I want to share a really dumb but very practical project I packaged this summer to make string operations much faster. I was using Python to work with a multi-terabyte newline-delimited file, and reading, splitting, and shuffling it was a nightmare. So I wrapped a trivial hardware-friendly heuristic I've been using for the last few years into a CPython library.
The part I enjoyed the most was implementing SIMD-like behavior without SIMD instructions: using ordinary 64-bit words to operate on eight 8-bit lanes at once. Unlike conventional SIMD, the code stays the same for ~~almost~~ any hardware. Let this library be a reminder of how awesome bit-level hacks are! Feel free to use it when working with CommonCrawl or any other sizeable textual dataset.
One random example of the technique (SWAR, i.e. SIMD within a register):
https://lemire.me/blog/2022/01/21/swar-explained-parsing-eig...
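
To make the 64-bit-word trick concrete, here is a minimal sketch of a SWAR byte search in Python. It is not the library's actual code: `match_mask` and `find_byte` are hypothetical names made up for illustration, and the real thing would be written in C, where the same arithmetic maps onto plain 64-bit registers. In pure Python this will not be fast; it only shows the bit manipulation.

```python
# Illustrative SWAR sketch: treat a 64-bit integer as eight 8-bit lanes.
# All names here are hypothetical, not taken from any library.

ONES = 0x0101010101010101    # 0x01 in every byte lane
LOW7 = 0x7F7F7F7F7F7F7F7F    # low 7 bits of every byte lane
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so clamp to 64 bits


def match_mask(word: int, byte: int) -> int:
    """Return a mask with 0x80 set in every lane of `word` equal to `byte`."""
    x = word ^ (ONES * byte)  # lanes equal to `byte` become 0x00
    # Adding 0x7F to the low 7 bits sets a lane's high bit iff those bits are
    # nonzero; the sum is at most 0xFE, so nothing carries into the next lane.
    t = ((x & LOW7) + LOW7) | x | LOW7  # 0xFF for nonzero lanes, 0x7F for zero
    return ~t & MASK64


def find_byte(data: bytes, byte: int) -> int:
    """Return the index of the first occurrence of `byte`, or -1 if absent."""
    n = len(data)
    i = 0
    while i + 8 <= n:
        # Little-endian load: data[i] ends up in the lowest lane.
        word = int.from_bytes(data[i:i + 8], "little")
        m = match_mask(word, byte)
        if m:
            lowest_bit = (m & -m).bit_length() - 1  # index of lowest set bit
            return i + lowest_bit // 8
        i += 8
    # Scalar fallback for the last (fewer than 8) bytes.
    for j in range(i, n):
        if data[j] == byte:
            return j
    return -1
```

For example, `find_byte(b"hello\nworld", ord("\n"))` checks the first eight bytes with a handful of integer operations instead of eight separate comparisons. The slightly longer expression in `match_mask` is chosen because it is exact for every lane, so the mask can also be used to count matches, not just to locate the first one.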