
While models such as XLNet incorporate recurrence (the segment-level recurrence inherited from Transformer-XL), GPT-{2,3} are mostly just plain decoder-only transformer models.[1]
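To make "plain decoder-only" concrete, here is a rough sketch of the causal self-attention at the core of such a model. It assumes a single head and skips the learned query/key/value projections, so it is a simplification for illustration and not how GPT-2 or GPT-3 is actually implemented; the function name is made up.

```python
import math

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats) with equal dimension.
    Hypothetical helper for illustration; real models add learned
    projections, multiple heads, residuals, and layer norm.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t scores only positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
```

Each output depends only on the current and earlier tokens inside the same window, and nothing is carried over from a previous segment. That is the contrast with the recurrence mentioned above.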

[1] https://arxiv.org/abs/2005.14165

[2] https://d4mucfpksywv.cloudfront.net/better-language-models/l...


