Using linear algebra to convert a large code model (gist.github.com)
53 points by moyix on July 28, 2022 | 3 comments


Amazing work! Is it possible to fine-tune this model on your own code, or a subset of it? Fine-tuning this model on PyTorch code to help with tensor manipulation would be awesome!


And what is the code being used for? It's hard to visualize something like code without knowing its purpose.


> I would love to be able to run CodeGen models locally and fast, ideally fast enough that they can be used for interactive tasks like code completion. [...] GPT-J is a very popular model and a lot of work has been put into making fast implementations, like the one in FasterTransformer. [...] Unfortunately, these don't work with CodeGen. Even though the two are 99.9% identical, they're just different enough that you can't naively transfer over the CodeGen weights and run them in a GPT-J implementation.



