
That's path dependence [0]. When all of those languages were conceived in the nineties, 2-byte UCS-2 seemed to be enough to store all Unicode code points.

UTF-16 came only later, once it was clear that 65,536 code points were too few.

Had those languages been designed in the last 10 years, all of them would have picked UTF-8 as their internal string encoding.
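To illustrate the trade-off behind that choice, here's a quick Python sketch comparing the byte cost of the same text under UTF-8 and UTF-16 (the sample strings are my own, not from the thread):

```python
# Byte cost of identical text under UTF-8 vs. UTF-16.
ascii_text = "hello world" * 100              # typical source code / markup
cjk_text = "\u4f60\u597d\u4e16\u754c" * 100   # "hello world" in Chinese

for label, text in [("ASCII", ascii_text), ("CJK", cjk_text)]:
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # -le variant: no BOM prepended
    print(f"{label}: utf-8={utf8} bytes, utf-16={utf16} bytes")

# ASCII-heavy text: UTF-8 is half the size of UTF-16.
# CJK text: UTF-8 costs 3 bytes/char vs. UTF-16's 2, so it's 1.5x larger.
```

For the web (mostly ASCII markup wrapping any human language), the UTF-8 side of this trade usually wins, which is part of why newer languages default to it.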

[0]: https://en.wikipedia.org/wiki/Path_dependence



Some JavaScript runtimes (Firefox's SpiderMonkey, for one) have an optimization that stores some strings in a single-byte format where possible, to mitigate the cost of the awful original choice to use UCS-2 for JS strings. I expect some other runtimes do this too, but I don't know any off-hand.

IIRC this was motivated by Firefox OS (strings eat up a lot of RAM on memory-starved $50 smartphones) but it pays off on desktops too.


Python as of 3.3 uses any of three different internal storage mechanisms for strings: 1-byte (latin-1), 2-byte (UCS-2) or 4-byte (UCS-4) depending on the width of the highest code point in the string. This allows the internal storage to always be fixed-width, while still saving space for strings which contain, say, only code points representable in a single byte.
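You can observe this from sys.getsizeof: widening the widest code point in a string bumps the per-character storage. (Exact sizes vary by CPython version; the relative growth is the point.)

```python
import sys

ascii_s  = "a" * 1000            # fits in 1 byte per char (latin-1 storage)
bmp_s    = "\u0101" * 1000       # needs 2 bytes per char (UCS-2 storage)
astral_s = "\U0001F600" * 1000   # needs 4 bytes per char (UCS-4 storage)

# All three have len() == 1000, but very different footprints:
print(sys.getsizeof(ascii_s))    # ~1000 bytes + object header
print(sys.getsizeof(bmp_s))      # ~2000 bytes + object header
print(sys.getsizeof(astral_s))   # ~4000 bytes + object header
```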

Prior to 3.3, the internal storage of Unicode was determined by a flag during compilation of the interpreter; a "narrow" compiled interpreter would use 2-byte strings with surrogate pairs for non-BMP code points, and a "wide" compiled interpreter would use 4-byte strings.
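For reference, the surrogate-pair arithmetic a narrow build applied to non-BMP code points (the same mapping UTF-16 itself uses) can be sketched as:

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point (>= 0x10000) into a UTF-16 surrogate pair."""
    assert cp >= 0x10000
    cp -= 0x10000                   # 20 significant bits remain
    high = 0xD800 + (cp >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low

# U+1F600 (an emoji) splits into D83D DE00 -- on a narrow build,
# len('\U0001F600') was therefore 2.
print([hex(u) for u in to_surrogate_pair(0x1F600)])
```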


V8 and JavaScriptCore do it, I believe.


Java has taken a couple different shots at this, going back a decade or more, and the newer option is currently enabled in Java 9.

Some background: https://stackoverflow.com/q/8833385/149138



