
Wait, can some not?


UTF-16 surrogates are code points, but UTF-8 prohibits encoding them. However, Unicode strings are sequences of Unicode scalar values (code points excluding the surrogate range U+D800 to U+DFFF) and so can't contain surrogates, which means every Unicode string can be encoded in UTF-8.
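
A quick way to see this for yourself, sketched in Python. The helper names `is_scalar_value` and `try_encode_utf8` are made up for illustration. One caveat: Python's `str` is a sequence of code points rather than scalar values, so it can hold a lone surrogate; it is the strict UTF-8 codec that refuses it.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point outside U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def try_encode_utf8(cp: int):
    """Return the UTF-8 bytes for a code point, or None if the strict codec rejects it."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        # Raised for surrogate code points: strict UTF-8 has no encoding for them.
        return None


euro = try_encode_utf8(0x20AC)       # a scalar value in the BMP
emoji = try_encode_utf8(0x1F600)     # a scalar value outside the BMP
lone = try_encode_utf8(0xD800)       # a surrogate code point, rejected
```

Here `euro` and `emoji` come back as bytes while `lone` is `None`, which matches the rule above: every scalar value has a UTF-8 encoding, and surrogates do not.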

(There is also the WTF-8 encoding, which allows unpaired surrogates; a properly paired high and low surrogate must still be encoded as the single supplementary code point it represents. This is useful for losslessly storing wide strings (unvalidated UTF-16), as used by Windows, C#, Java and JavaScript, in a UTF-8-like encoding.)
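
To make the round trip concrete, here is a rough Python sketch that approximates WTF-8 using the standard `surrogatepass` error handler. The function names `utf16_units_to_wtf8` and `wtf8_to_utf16_units` are hypothetical, and this is not a validating WTF-8 implementation: the decoding direction will also accept byte sequences that the WTF-8 spec rejects (such as a surrogate pair written as two three-byte sequences).

```python
import struct


def utf16_units_to_wtf8(units):
    """Encode possibly ill-formed UTF-16 code units as WTF-8-style bytes."""
    raw = struct.pack(f"<{len(units)}H", *units)
    # Valid surrogate pairs become one supplementary code point;
    # unpaired surrogates are passed through as-is.
    text = raw.decode("utf-16-le", "surrogatepass")
    # Unpaired surrogates get the three-byte generalized UTF-8 form.
    return text.encode("utf-8", "surrogatepass")


def wtf8_to_utf16_units(data):
    """Inverse direction: recover the original 16-bit code units."""
    text = data.decode("utf-8", "surrogatepass")
    raw = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


units = [0x41, 0xD800, 0x42]              # "A", unpaired high surrogate, "B"
encoded = utf16_units_to_wtf8(units)
restored = wtf8_to_utf16_units(encoded)   # same code units as the input
```

The unpaired surrogate survives the trip, whereas plain `encode("utf-8")` would raise on it.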


ohhh that's a fun and subtle thing. Thanks!



