How much of an improvement does SIMD offer for something like this? It looks like it's only used for strings and comments, but I'd assume that in most programming languages, long strings and comments make up only a small share of the source. I'm also curious whether there's a performance penalty for taking the SIMD path when most comments and strings are short.