wordpress/wp-includes/compat.php
dmsnell aa40e03117 Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.
The existing polyfill for `mb_substr()` has several deficiencies: it relies on Unicode support in PCRE, assumes input strings are valid UTF-8, splits the input string into an array of characters (1,000 at a time, iterating until complete), and then re-joins them at the end.

This patch provides an updated polyfill that reliably parses UTF-8 strings even when they contain invalid bytes. It computes the byte boundaries of the substring with zero allocations and then performs a single `substr()` call at the end.
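The boundary-then-slice idea can be sketched outside PHP. The following is a minimal Python illustration, not the WordPress implementation: the helper names (`next_char_len`, `count_chars`, `skip_chars`, `utf8_substr`) are hypothetical, and it assumes each invalid or truncated byte sequence counts as one character using the Unicode "maximal subpart" convention. It tracks only integer byte offsets while scanning and takes a single slice at the end, mirroring the `substr()` call described above.

```python
from typing import Optional


def next_char_len(s: bytes, i: int) -> int:
    """Byte length of the character (or maximal invalid subpart) starting at s[i]."""
    b = s[i]
    if b < 0x80:
        return 1  # ASCII
    # Determine continuation count and the allowed range of the FIRST continuation byte.
    if 0xC2 <= b <= 0xDF:
        need, lo, hi = 1, 0x80, 0xBF
    elif 0xE0 <= b <= 0xEF:
        need = 2
        lo, hi = (0xA0, 0xBF) if b == 0xE0 else (0x80, 0x9F) if b == 0xED else (0x80, 0xBF)
    elif 0xF0 <= b <= 0xF4:
        need = 3
        lo, hi = (0x90, 0xBF) if b == 0xF0 else (0x80, 0x8F) if b == 0xF4 else (0x80, 0xBF)
    else:
        return 1  # invalid lead byte (0x80-0xC1, 0xF5-0xFF) counts as one character
    j = i + 1
    if j >= len(s) or not (lo <= s[j] <= hi):
        return 1  # lead byte alone is the maximal subpart
    j += 1
    for _ in range(need - 1):
        if j >= len(s) or not (0x80 <= s[j] <= 0xBF):
            return j - i  # truncated sequence: everything so far is one character
        j += 1
    return j - i


def count_chars(s: bytes) -> int:
    """Count characters by scanning; no intermediate array is built."""
    i = count = 0
    while i < len(s):
        i += next_char_len(s, i)
        count += 1
    return count


def skip_chars(s: bytes, at: int, k: int) -> int:
    """Return the byte offset reached after advancing k characters from byte offset `at`."""
    while k > 0 and at < len(s):
        at += next_char_len(s, at)
        k -= 1
    return at


def utf8_substr(s: bytes, start: int, length: Optional[int] = None) -> bytes:
    """mb_substr()-style extraction on UTF-8 bytes using character offsets."""
    needs_total = start < 0 or (length is not None and length < 0)
    total = count_chars(s) if needs_total else None
    if start < 0:
        start = max(0, total + start)  # negative start counts from the end
    begin = skip_chars(s, 0, start)
    if length is None:
        end = len(s)
    elif length >= 0:
        end = skip_chars(s, begin, length)
    else:
        stop = total + length  # negative length omits characters from the end
        if stop <= start:
            return b""
        end = skip_chars(s, begin, stop - start)
    return s[begin:end]  # the only slice taken
```

For example, `utf8_substr("héllo".encode(), 1, 3)` returns the UTF-8 bytes of `"éll"`, and `utf8_substr(b"a\xffb", 1, 1)` returns `b"\xff"` rather than failing on the invalid byte. Because only offsets are tracked during the scan, no per-character array has to be built and re-joined.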

This change improves the reliability of UTF-8 string handling and removes behavior that varied depending on the runtime system, such as whether PCRE Unicode support is available.

Developed in https://github.com/WordPress/wordpress-develop/pull/9829
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.

Built from https://develop.svn.wordpress.org/trunk@60969


git-svn-id: http://core.svn.wordpress.org/trunk@60305 1a063a9b-81f0-0310-95a4-ce76da25c4cd
2025-10-18 04:36:34 +00:00
