Best advice I've heard is to never use the character type in your programming language. Instead, store each character as its own string; an array of such strings then serves as a string of characters. In this approach, characters become opaque blobs of bytes. This makes it easy to get the two numbers you care about: length in characters (the number of array elements) and size in bytes (the sum of the elements' encoded sizes).
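To make the idea concrete, here is a minimal Python sketch. It assumes the text has already been split into user-perceived characters; the split below is done by hand, and a real program would need a grapheme-cluster segmenter (ICU provides one). The names `chars`, `length_in_chars` and `size_in_bytes` are just for illustration.

```python
# Each element is one user-perceived character, kept as an opaque string.
# Hand-segmented for illustration: "e" + combining acute accent,
# thumbs-up + skin-tone modifier, and a plain ASCII "a".
chars = ["e\u0301", "\U0001F44D\U0001F3FD", "a"]

# Length in characters: the number of array elements.
length_in_chars = len(chars)

# Size in bytes: sum of the UTF-8 encoded size of each element.
size_in_bytes = sum(len(c.encode("utf-8")) for c in chars)

# For comparison: the number of code points, which is what a naive
# len() on the joined string would report in Python.
code_points = len("".join(chars))

print(length_in_chars, size_in_bytes, code_points)
```

The three numbers differ (3 characters, 12 bytes, 5 code points), which is the reason for keeping them apart.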
There is some overhead for this, so it may be a technique better suited to backends. Normalization, sanitization and validation steps are best performed in the frontend.
Also worth knowing is the ICU library, which is often the easiest way to work with Unicode consistently regardless of programming language.
Finally, Punycode is a standard way to represent arbitrary Unicode strings as ASCII. It's reversible too (and built into every web browser, which uses it for internationalized domain names). You can enforce size limits on the Punycode representation.
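As a rough sketch of the size-limit idea, Python's standard library ships a "punycode" codec. Note that this is the raw encoding, without the "xn--" prefix that domain names add. The helper `fits_limit` and the chosen limits are hypothetical, made up for this example.

```python
def fits_limit(text, max_ascii_bytes):
    """Return True if the Punycode form of text fits in max_ascii_bytes."""
    return len(text.encode("punycode")) <= max_ascii_bytes

# Round trip: Unicode -> ASCII bytes -> Unicode.
original = "b\u00fccher"               # "bücher"
encoded = original.encode("punycode")  # pure ASCII bytes
decoded = encoded.decode("punycode")

print(encoded, decoded == original)
print(fits_limit(original, 9), fits_limit(original, 8))
```

Because the encoded form is plain ASCII, its length in bytes and its length in characters are the same number, which is what makes it convenient for limits.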
BTW, you shouldn't store passwords in strings in the first place. Many programming languages have an alternative to hold secrets in memory safely.
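Python itself has no dedicated secret type in its standard library, so the following is only a sketch of the general idea under that assumption: keep the secret in a mutable buffer and overwrite it when done, rather than in an immutable string that lingers until garbage collection. The helper `wipe` is hypothetical, and this gives no hard guarantee, since the runtime may have made copies elsewhere.

```python
def wipe(buf):
    """Overwrite a mutable byte buffer in place with zeros."""
    for i in range(len(buf)):
        buf[i] = 0

# A bytearray is mutable, so its contents can be overwritten in place.
secret = bytearray(b"hunter2")
try:
    # ... use the secret here, e.g. pass it to a hashing function ...
    secret_length = len(secret)
finally:
    # Clear the buffer as soon as the secret is no longer needed.
    wipe(secret)

print(secret_length, secret)
```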
This is generally a bad idea, even if you ignore the obvious overhead from doing so. At some point you are going to create a "real" string out of the thing you have, and it is not going to behave like you expect if you just blindly use the array's properties to compute its length and size. Nor will those numbers really have well-defined semantics unless you are careful about what the "characters" you're storing in strings are.
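A small Python sketch of the kind of mismatch meant here, assuming the "real" string gets NFC-normalized somewhere along the way (a common step, but an assumption in this example):

```python
import unicodedata

# Case 1: one array element, "e" followed by a combining acute accent.
chars = ["e\u0301"]
array_bytes = sum(len(c.encode("utf-8")) for c in chars)

# Building the real string and normalizing it changes the byte size:
# NFC composes the pair into the single code point U+00E9.
real = unicodedata.normalize("NFC", "".join(chars))
real_bytes = len(real.encode("utf-8"))

# Case 2: the array was split per code point instead of per
# user-perceived character, so its length overcounts.
parts = ["e", "\u0301"]
composed = unicodedata.normalize("NFC", "".join(parts))

print(array_bytes, real_bytes)     # sizes disagree
print(len(parts), len(composed))   # lengths disagree
```

In both cases the numbers taken from the array no longer describe the string that actually gets stored or sent.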
you change the word you use as if those words have inherent meanings that we can draw upon. they don't.
it would be more clear to write "length in characters and length in bytes"
[linguistically speaking, words don't carry meanings, it is us who ascribe meaning to words. we use words to say what we want to say, but words don't limit us in what we can say]
You are correct. It's just that I am loquacious by nature and often use a plethora of words when a paucity would better and more succinctly convey meaning precisely.