Hacker News

The best advice I've heard is never to use the character type in your programming language. Instead, store each character in its own string; an array of strings can then serve as a string of characters. In this approach, characters become opaque blobs of bytes, which makes it easy to get the two numbers you care about: length in characters and size in bytes.
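A minimal Python sketch of the array-of-strings idea. The helper names (`split_characters`, `length_in_characters`, `size_in_bytes`) are made up for illustration, and the segmentation rule is a loudly simplified assumption: one code point plus any following combining marks counts as a "character". Full extended grapheme cluster segmentation (Unicode UAX #29, e.g. via ICU) is more involved.

```python
import unicodedata


def split_characters(text: str) -> list[str]:
    """Store each user-perceived character as its own string.

    Simplification (an assumption, not full UAX #29 segmentation): a
    "character" is one code point plus any following combining marks
    (Unicode general category M*). Emoji ZWJ sequences, flags, Hangul
    jamo, etc. are NOT grouped; a library such as ICU handles those.
    """
    chars: list[str] = []
    for cp in text:
        if chars and unicodedata.category(cp).startswith("M"):
            chars[-1] += cp  # attach the mark to the preceding character
        else:
            chars.append(cp)
    return chars


def length_in_characters(chars: list[str]) -> int:
    # One array element per character, so the count is the array length.
    return len(chars)


def size_in_bytes(chars: list[str], encoding: str = "utf-8") -> int:
    # Each element is an opaque blob; its size is its encoded length.
    return sum(len(c.encode(encoding)) for c in chars)


chars = split_characters("cafe\u0301!")  # "é" written as e + U+0301
print(length_in_characters(chars))  # 5
print(size_in_bytes(chars))         # 7
print(len("cafe\u0301!"))           # 6 code points
```

Note how the plain `len()` of the Python string (6 code points) matches neither of the two numbers the array gives you directly.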

There is some overhead to this, so it's probably a technique better suited to backends. Normalization, sanitization and validation steps are best performed in the frontend.
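One possible shape for those three steps on the web server, as a hedged sketch: the `clean_username` function, the 64-byte limit and the "reject category C code points" rule are all hypothetical policy choices made up for illustration, not a standard.

```python
import unicodedata

MAX_USERNAME_BYTES = 64  # hypothetical limit, for illustration only


def clean_username(raw: str) -> str:
    """Hypothetical server-side policy: normalize, sanitize, then validate."""
    # Normalize: NFC so that "e + U+0301" and "é" are stored identically.
    name = unicodedata.normalize("NFC", raw).strip()
    # Sanitize: reject control/format/unassigned code points (category C*).
    if any(unicodedata.category(cp).startswith("C") for cp in name):
        raise ValueError("username contains control or format characters")
    # Validate: check the limit on the encoded size, not on len().
    if not name or len(name.encode("utf-8")) > MAX_USERNAME_BYTES:
        raise ValueError("username is empty or too long")
    return name


print(clean_username("  cafe\u0301 "))  # café
```

Normalizing first matters: otherwise two visually identical names could pass or fail the byte limit differently.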

Also worth knowing is the ICU library, which is often the easiest way to work with Unicode consistently regardless of programming language.

Finally, Punycode (RFC 3492) is a standard, reversible way to represent arbitrary Unicode strings as ASCII. It was designed for internationalized domain names, which is why it's built into every web browser. You can apply size limits to the Punycode representation.
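A small sketch using Python's built-in `punycode` codec, which performs raw RFC 3492 encoding without the `xn--` prefix that domain names use. The `fits_limit` helper and the limit values are hypothetical, just to show checking a size limit on the ASCII form.

```python
def to_punycode(text: str) -> bytes:
    # Python's built-in "punycode" codec does raw RFC 3492 encoding
    # (no "xn--" prefix; that is added by the separate "idna" codec).
    return text.encode("punycode")


def from_punycode(data: bytes) -> str:
    return data.decode("punycode")


def fits_limit(text: str, max_ascii_bytes: int) -> bool:
    # Hypothetical helper: enforce the size limit on the ASCII form.
    return len(to_punycode(text)) <= max_ascii_bytes


encoded = to_punycode("bücher")
print(encoded)                  # b'bcher-kva'
print(from_punycode(encoded))   # bücher
print(fits_limit("bücher", 9))  # True
```

The encoded length depends on where the non-ASCII code points fall, so a limit on the Punycode form is not a fixed multiple of the character count.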

BTW, you shouldn't store passwords in strings in the first place. Many programming languages have an alternative for holding secrets in memory more safely (in Java, for example, a char[] that you can overwrite once you're done with it).



> validation steps are best performed in the frontend.

I'm really hoping we have very different definitions of "frontend".


I meant the web server, not in the end user's browser! (So by backend, I meant the application and data layers.)


This is generally a bad idea, even if you ignore the obvious overhead. At some point you are going to create a "real" string out of the thing you have, and it is not going to behave as you expect if you blindly compute lengths and sizes from the array's properties. Nor will those numbers have well-defined semantics unless you are careful about what the "characters" you store in the strings actually are.
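A short Python illustration of that mismatch, under an assumed scenario: the array was built by naive per-code-point splitting, so a combining accent became its own "character", and the rebuilt string is then NFC-normalized. The `counts` helper is hypothetical.

```python
import unicodedata


def counts(chars: list[str]) -> tuple[int, int, int, int]:
    """Compare what the array says with what the rebuilt string says."""
    array_len = len(chars)
    array_bytes = sum(len(c.encode("utf-8")) for c in chars)
    # The "real" string you eventually build, normalized as a server might.
    real = unicodedata.normalize("NFC", "".join(chars))
    return array_len, array_bytes, len(real), len(real.encode("utf-8"))


# Hypothetical array built by naive per-code-point splitting:
# the combining accent U+0301 ended up as its own "character".
naive = ["c", "a", "f", "e", "\u0301"]
print(counts(naive))  # (5, 6, 4, 5)
```

Both the length and the byte size reported by the array differ from those of the string it turns into.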


>length in characters and size in bytes

you switch from "length" to "size" as if those words have inherent meanings that we can draw upon. they don't.

it would be clearer to write "length in characters and length in bytes"

[linguistically speaking, words don't carry meanings, it is us who ascribe meaning to words. we use words to say what we want to say, but words don't limit us in what we can say]


You are correct. It's just that I am loquacious by nature and often use a plethora of words when a paucity would better and more succinctly convey meaning precisely.

My bad!


Swift’s Character type represents an extended grapheme cluster, which is the correct thing to do.
