
Ah, OK. So this is for resuming chat context cheaply. What I said is still correct: 3FS is not part of the inference flow and is not relevant to the paper, which is about optimizing KV cache usage at runtime.

