There are several existing mechanisms in Core to determine if a given string contains valid UTF-8 bytes or not. These are spread out and depend on which extensions are installed on the running system and what is set for `blog_charset`. The `seems_utf8()` function is one of these mechanisms. `seems_utf8()` does not properly validate UTF-8, unfortunately, and is slow, and the purpose of the function is veiled behind its name and historic legacy. This patch deprecates `seems_utf()` and introduces `wp_is_valid_utf8()`; a new, spec-compliant, efficient, and focused UTF-8 validator. This new validator defers to `mb_check_encoding()` where present, otherwise validating with a pure-PHP implementation. This makes the spec-compliant validator available on all systems regardless of their runtime environment. Developed in https://github.com/WordPress/wordpress-develop/pull/9317 Discussed in https://core.trac.wordpress.org/ticket/38044 Props dmsnell, jonsurrell, jorbin. Fixes #38044. Built from https://develop.svn.wordpress.org/trunk@60630 git-svn-id: http://core.svn.wordpress.org/trunk@59966 1a063a9b-81f0-0310-95a4-ce76da25c4cd
42 KiB
42 KiB