As per this paper:
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
Zhiying Jiang, Matthew Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, Jimmy Lin
https://aclanthology.org/2023.findings-acl.426/
via: twitter.com/goodside/status/1679358632431853568
Here is the gzip kNN implementation I tested: https://github.com/jimmc414/document_intelligence/blob/main/...
Note that these implementations compare entire text files, not individual sentences.
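For anyone who wants the gist without opening the repo, here is a minimal sketch of the gzip + kNN idea from the paper. This is not the code from the linked repository. The function names (clen, ncd, classify), the default k=3, and joining the two texts with a single space are my own assumptions for illustration. The distance is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the gzip-compressed length in bytes. The strings can just as well be the full contents of text files.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts."""
    ca, cb = clen(a), clen(b)
    # Assumption: the two texts are joined with a single space.
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, labeled_docs: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k labeled docs closest to query by NCD.

    labeled_docs is a list of (text, label) pairs; k=3 is an arbitrary default.
    """
    ranked = sorted(labeled_docs, key=lambda item: ncd(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, made up for this example only.
    docs = [
        ("The team scored a late goal and won the football match. " * 5, "sports"),
        ("Simmer the tomato sauce with garlic and basil, then add pasta. " * 5, "cooking"),
    ]
    print(classify("Simmer the tomato sauce with garlic and basil.", docs, k=1))
```

There is no training step and there are no parameters to fit. Every prediction compresses the query against each labeled document, so the cost grows linearly with the size of the labeled set.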