Hacker News

How do you fit Llama2-70b into a V100? A V100 has 16GB (32GB at most). Llama2-70b at 4-bit would require up to 40GB. Also, what do you use for inference to get 300+ tokens/s?
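The 40GB figure can be sanity-checked with a rough back-of-the-envelope: 70B parameters at 4 bits each is ~35GB for the weights alone, before KV cache and activation overhead. A minimal sketch (the function name and the flat overhead estimate are illustrative assumptions, not measurements):

```python
# Rough VRAM estimate for a quantized LLM.
# Counts weights only; real usage adds KV cache, activations,
# and framework overhead on top of this.
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

llama2_70b_4bit = weights_gb(70e9, 4)
print(f"{llama2_70b_4bit:.0f} GB")  # 35 GB for weights alone
```

Even the weights-only figure already exceeds a single 16GB (or 32GB) V100, which is the point of the question.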


