I'm going to split some hairs, because it matters for the topic at hand.
>Unicode of course takes up more space and fills up your buffer sooner. Looks like the jump happens after 8 chars.
It sounds like you are conflating Unicode with UTF-8. There is more than one way to represent Unicode code points, and UTF-8 is one of them. Further, it seems like you assume that "Unicode characters" have a constant size. This is a potentially dangerous misunderstanding of how UTF-8 works. UTF-8 encodes each code point as a variable number of bytes (from one to four). You happen to have copied some code points that take 3 bytes each.
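To make the variable-length point concrete, here's a quick sketch in Python. The sample characters are my own illustrative picks (not the ones from the original buffer), chosen to hit each possible encoded length:

```python
# UTF-8 is variable-length: one code point can encode to 1-4 bytes.
samples = ["A", "é", "€", "🙂"]  # 1-, 2-, 3-, and 4-byte examples

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r} -> {len(encoded)} byte(s): {encoded.hex()}")
```

Running this shows "A" taking a single byte while the emoji takes four, which is exactly why counting characters tells you nothing reliable about how fast a byte buffer fills up.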
I also used to believe Unicode and UTF-8 were two different encodings until someone corrected me, and an exchange just like the one above is exactly why I had thought such a thing in the first place.
The UTF-8 encoding scheme is a great compromise, and the Wikipedia article is easy to follow: http://en.wikipedia.org/wiki/UTF-8
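The scheme really is simple enough to write down by hand: the leading byte's high bits announce how many bytes follow, and each continuation byte starts with `10`. Here's a minimal encoder sketch for a single code point (it skips details a real encoder must handle, such as rejecting surrogates in the U+D800–U+DFFF range):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point per the UTF-8 scheme (simplified)."""
    if cp < 0x80:
        # ASCII range: 0xxxxxxx, a single byte
        return bytes([cp])
    elif cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    elif cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    elif cp <= 0x10FFFF:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")

# Sanity check against Python's built-in encoder:
assert utf8_encode(ord("€")) == "€".encode("utf-8")
```

Note how ASCII comes out unchanged as single bytes; that backward compatibility is a big part of the compromise the article describes.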