Hacker News
goldenarm
41 days ago
on: Ollama is now powered by MLX on Apple Silicon in p...
How many tokens per second?
LuxBennu
40 days ago
Roughly 8-12 tokens/s on generation, depending on context length. Prompt processing is faster, obviously. I haven't benchmarked it carefully though, just eyeballing the llama.cpp output.
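For anything firmer than eyeballing, the timing fields that an Ollama `/api/generate` response reports can be turned into a rate. A minimal sketch, assuming the `eval_count` / `eval_duration` fields (duration in nanoseconds); the response values below are made up for illustration, not real benchmark numbers:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: tokens emitted per wall-clock second."""
    return eval_count / (eval_duration_ns / 1e9)

# Hypothetical response fragment, for illustration only.
response = {"eval_count": 240, "eval_duration": 24_000_000_000}  # 24 s of decode
print(round(tokens_per_second(response["eval_count"], response["eval_duration"]), 1))
# -> 10.0
```

Averaging over a few runs with the same prompt and context length gives a more stable number than a single sample.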