
They're probably taking shortcuts, such as exploiting sparsity. Various tricks like that are described in papers, although the big companies are getting increasingly secretive about how their models work, so you won't necessarily find proof.

The latest DeepSeek model has sparse attention, though sparse attention is still not linear. Close enough, perhaps.
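
For intuition, here's a toy NumPy sketch of top-k sparse attention (a hypothetical illustration, not DeepSeek's actual kernel; the function name and parameters are made up). Each query still scores every key in the selection pass, which is why this kind of sparsity isn't truly linear, but the softmax and value mixing only run over the top_k keys per query. Real systems use a much cheaper scorer for the selection step; this toy version just reuses the full-precision scores.

    import numpy as np

    def sparse_topk_attention(q, k, v, top_k=64):
        # Toy top-k sparse attention (sketch only, not a production kernel).
        # The selection pass still scores every (query, key) pair, so it is
        # still O(n^2); only the softmax + value mixing are restricted to the
        # top_k keys per query, shrinking that part of the cost from
        # O(n^2 * d) toward O(n * top_k * d). Causal masking is ignored here.
        n, d = q.shape
        top_k = min(top_k, n)
        scores = q @ k.T / np.sqrt(d)      # full O(n^2) scoring pass
        out = np.zeros_like(v)
        for i in range(n):
            idx = np.argpartition(scores[i], -top_k)[-top_k:]  # keep top_k keys
            s = scores[i, idx]
            w = np.exp(s - s.max())
            w /= w.sum()
            out[i] = w @ v[idx]
        return out

    # Usage: n = 512 tokens, d = 64 dims per head
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((512, 64)) for _ in range(3))
    print(sparse_topk_attention(q, k, v, top_k=32).shape)  # (512, 64)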


