The existing polyfill for `mb_substr()` has several deficiencies: it relies on Unicode PCRE support, assumes input strings are valid UTF-8, splits input strings into an array of characters (1,000 at a time, iterating until complete), and re-joins them at the end.

This patch provides an updated polyfill which reliably parses UTF-8 strings even in the presence of invalid bytes. It computes the boundaries for the substring extraction with zero allocations and then returns the result of a single `substr()` call.

This change improves the reliability of UTF-8 string handling and removes behavioral variability based on the runtime system.

Developed in https://github.com/WordPress/wordpress-develop/pull/9829
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.

Built from https://develop.svn.wordpress.org/trunk@60969

git-svn-id: http://core.svn.wordpress.org/trunk@60305 1a063a9b-81f0-0310-95a4-ce76da25c4cd
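For illustration only, the boundary-scanning approach described in this message could look roughly like the sketch below. It is not the committed code: `sketch_char_length()` and `sketch_utf8_substr()` are hypothetical names, only non-negative `$start` and `$length` are handled, and the validity check is simplified (any invalid or truncated sequence counts as one single-byte character, and overlong or surrogate encodings are not rejected). PHP 8 is assumed.

```php
<?php
/**
 * Hypothetical helper: byte length of the character starting at $at.
 *
 * Simplified check, not a full UTF-8 validator: an invalid lead byte,
 * a stray continuation byte, or a truncated sequence counts as a
 * single one-byte character.
 */
function sketch_char_length( string $str, int $at ): int {
	$byte = ord( $str[ $at ] );

	if ( $byte < 0x80 ) {
		return 1; // ASCII.
	} elseif ( $byte >= 0xC2 && $byte <= 0xDF ) {
		$need = 1; // Lead byte of a two-byte sequence.
	} elseif ( $byte >= 0xE0 && $byte <= 0xEF ) {
		$need = 2; // Lead byte of a three-byte sequence.
	} elseif ( $byte >= 0xF0 && $byte <= 0xF4 ) {
		$need = 3; // Lead byte of a four-byte sequence.
	} else {
		return 1; // Stray continuation byte or invalid lead byte.
	}

	for ( $i = 1; $i <= $need; $i++ ) {
		if ( ! isset( $str[ $at + $i ] ) || ( ord( $str[ $at + $i ] ) & 0xC0 ) !== 0x80 ) {
			return 1; // Missing or bad continuation: count the lead byte alone.
		}
	}

	return $need + 1;
}

/**
 * Hypothetical sketch: extract $length characters starting at character
 * offset $start by tracking byte offsets only, then slicing once.
 */
function sketch_utf8_substr( string $str, int $start, int $length ): string {
	$end = strlen( $str );
	$at  = 0;

	// Advance past the first $start characters.
	for ( $seen = 0; $at < $end && $seen < $start; $seen++ ) {
		$at += sketch_char_length( $str, $at );
	}
	$byte_start = $at;

	// Advance past up to $length more characters.
	for ( $seen = 0; $at < $end && $seen < $length; $seen++ ) {
		$at += sketch_char_length( $str, $at );
	}

	// No intermediate arrays or strings: one byte-based slice at the end.
	return substr( $str, $byte_start, $at - $byte_start );
}
```

Under these assumptions, `sketch_utf8_substr( "h\xC3\xA9llo", 1, 3 )` returns `"\xC3\xA9ll"` ("éll"), and an invalid byte such as `"\xFF"` is counted as one character and passed through unchanged instead of causing the extraction to fail.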