Not really. Getting extra CPU performance generally means adding more cores, or some other general-purpose compute silicon. That stuff tends to be quite big, simply because it’s so flexible.
NPUs focus on one specific type of computation, matrix multiplication, usually with low-precision integers, because that’s all a neural net needs. That vast reduction in flexibility means you can take lots of shortcuts in your design, allowing you to cram more compute into a smaller footprint.
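For a rough idea of what that one operation looks like in software (a minimal sketch with made-up shapes, not how the Neural Engine is actually programmed): int8 inputs, int32 accumulation, no floating point needed.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-128, 128, size=(64, 256), dtype=np.int8)  # activations (hypothetical shapes)
    W = rng.integers(-128, 128, size=(256, 32), dtype=np.int8)  # weights

    # Accumulate in int32 so the int8 products don't overflow; an NPU
    # essentially hard-wires this one pattern instead of staying general-purpose.
    C = A.astype(np.int32) @ W.astype(np.int32)
    print(C.shape, C.dtype)  # (64, 32) int32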
If you look at the M1 chip[1], you can see the entire 16-core Neural Engine has a footprint about the size of four performance cores (excluding their caches). It’s not a perfect comparison without numbers on what the performance cores achieve in ops/second vs the Neural Engine, but it seems reasonable to believe that the Neural Engine can handily outperform the performance core complex when doing matmul operations.
[1] https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...