This is the fourth in a series of patches to modernize and standardize UTF-8 handling. `wp_check_invalid_utf8()` has long been dependent on the runtime configuration of the system running it. This has led to hard-to-diagnose issues with text containing invalid UTF-8. The function has also had an apparent defect since its inception: when requesting to strip invalid bytes it returns an empty string. This patch updates the function to remove all dependency on the system running it. It defers to the `mbstring` extension if that’s available, falling back to the new UTF-8 scanning pipeline. To support this work, `wp_scrub_utf8()` is created with a proper fallback so that the remaining logic inside of `wp_check_invalid_utf8()` can be minimized. The defect in this function has been fixed, but instead of stripping the invalid bytes it will replace them with the Unicode replacement character for stronger security guarantees. Developed in https://github.com/WordPress/wordpress-develop/pull/9498 Discussed in https://core.trac.wordpress.org/ticket/63837 Follow-up to: [60768]. Props askapache, chriscct7, Cyrille37, desrosj, dmsnell, helen, jonsurrell, kitchin, miqrogroove, pbearne, shailu25. Fixes #63837, #29717. See #63863. Built from https://develop.svn.wordpress.org/trunk@60793 git-svn-id: http://core.svn.wordpress.org/trunk@60129 1a063a9b-81f0-0310-95a4-ce76da25c4cd
5.6 KiB
5.6 KiB