wait can some not?

CodesInChaos · on Aug 3, 2023

UTF-16 surrogates are code points, but UTF-8 prohibits encoding them. However, Unicode strings are sequences of Unicode scalar values and can't contain surrogates, so you can encode every Unicode string in UTF-8.
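
A quick way to see this, as a minimal sketch in Python. It relies on the fact that Python's `str` is a sequence of code points rather than scalar values, so it can hold a lone surrogate that a strict Unicode string could not; the helper name `is_utf8_encodable` is made up for illustration.

```python
def is_utf8_encodable(s: str) -> bool:
    """Return True if s can be encoded by Python's strict UTF-8 codec."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for surrogate code points (U+D800..U+DFFF).
        return False


scalar_only = "\U0001F600"   # a scalar value outside the BMP
lone_surrogate = "\ud800"    # a code point, but not a scalar value

print(is_utf8_encodable(scalar_only))     # True
print(is_utf8_encodable(lone_surrogate))  # False
```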

(There is also the WTF-8 encoding, which allows surrogates as long as they are unpaired. This can be useful for losslessly storing wide strings, i.e. the unvalidated UTF-16 used by Windows, C#, Java and JavaScript, in a UTF-8-like encoding.)
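
A rough sketch of that round trip in Python, using the standard `surrogatepass` error handler. This only approximates WTF-8 and is not a validating implementation: the decoding direction would also accept a surrogate pair written as two 3-byte sequences, which real WTF-8 forbids. The function names are hypothetical.

```python
def wide_to_wtf8_like(units: bytes) -> bytes:
    """Convert unvalidated UTF-16-LE bytes to a WTF-8-like byte string."""
    # The UTF-16 decoder merges valid surrogate pairs into one code point;
    # "surrogatepass" lets unpaired surrogates through unchanged.
    text = units.decode("utf-16-le", "surrogatepass")
    # Unpaired surrogates become 3-byte sequences; everything else is plain UTF-8.
    return text.encode("utf-8", "surrogatepass")


def wtf8_like_to_wide(data: bytes) -> bytes:
    """Inverse of wide_to_wtf8_like for data produced by it."""
    text = data.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass")


# "a", an unpaired high surrogate U+D800, then "b", as UTF-16-LE code units.
wide = b"a\x00" + b"\x00\xd8" + b"b\x00"
stored = wide_to_wtf8_like(wide)
print(stored)                             # b'a\xed\xa0\x80b'
print(wtf8_like_to_wide(stored) == wide)  # True
```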

1-more · on Aug 7, 2023

ohhh that's a fun and subtle thing. Thanks!