
OK, bear with me as we get slightly contrived here... (and deviate a little from my earlier, somewhat less thought-out example).

Say your workload inherently requires random access (throughout its entire execution) to a dataset of size N. If you run it on a single processor with a cache of size N/2, you'll see a lot of cache misses that end up getting serviced from the next level in the storage hierarchy, slowing down execution a lot. If you add another processor with another N/2 units of cache, both will still see about the same cache miss rate, but those misses no longer necessarily have to be satisfied from the next level down: at least some of the time they can be serviced from the other processor's cache, which is likely to be significantly faster. This holds whether you're talking about CPU caches relative to DRAM in an SMP system, or memory relative to disk between two separate compute nodes over a network link.
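
To make that concrete, here's a toy simulation of exactly this setup (my own sketch, not from any paper): uniform random access to N keys, comparing one node with an N/2-entry LRU cache against two nodes that each have an N/2-entry cache and can answer each other's misses. The relative latencies (1 for a local hit, 10 for a peer-cache hit, 100 for the backing store) are made-up illustrative numbers:

    import random
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()

        def __contains__(self, key):
            return key in self.entries

        def insert(self, key):
            if key in self.entries:
                self.entries.move_to_end(key)     # refresh recency
            elif len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[key] = True

    N, ACCESSES = 10_000, 200_000
    LOCAL, PEER, BACKING = 1, 10, 100  # assumed relative access latencies

    # One node with a cache half the dataset's size: every miss goes
    # all the way to the backing store.
    single = LRUCache(N // 2)
    cost_one = 0
    for _ in range(ACCESSES):
        k = random.randrange(N)
        cost_one += LOCAL if k in single else BACKING
        single.insert(k)

    # Two nodes, requests split randomly between them: a local miss
    # can still be a (cheaper) hit in the peer's cache.
    a, b = LRUCache(N // 2), LRUCache(N // 2)
    cost_two = 0
    for _ in range(ACCESSES):
        local, peer = (a, b) if random.random() < 0.5 else (b, a)
        k = random.randrange(N)
        if k in local:
            cost_two += LOCAL
        elif k in peer:
            cost_two += PEER
        else:
            cost_two += BACKING
        local.insert(k)

    print(f"avg cost per access, 1 node : {cost_one / ACCESSES:.1f}")
    print(f"avg cost per access, 2 nodes: {cost_two / ACCESSES:.1f}")

With those (assumed) latencies, the two-node average cost per access drops well below the single-node cost, and since each of the two nodes is only serving half the request stream, total throughput improves by more than 2x with 2x the hardware: superlinear speedup from the aggregate cache alone.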

For a more concrete example somewhat related (though not entirely congruent) to the above scenario, see http://www.ece.cmu.edu/~ece845/docs/lardpaper.pdf (page 211).



Hmm. I think the reason there's superlinear speedup in the paper you linked is that the requests must be serviced in order. If you only cared about throughput and could service requests out of order, you could use LARD within a single serial process too, to improve cache locality, and achieve 1/Nth the speed of N-machine LARD. But to serve requests online you can't do that reordering, so with one cache you'd be constantly invalidating it; thus the increased aggregate cache across the various machines is what produces the superlinear speedup.
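
For intuition, the core of LARD's dispatch (sketched from memory, with hypothetical names; the actual paper also reassigns targets when back-end load gets too imbalanced) is just: route each target back to whichever back-end already has it cached, so each back-end's cache stays hot on a stable subset of targets:

    class Dispatcher:
        def __init__(self, n_backends):
            self.assignment = {}           # target -> back-end index
            self.load = [0] * n_backends   # outstanding requests per back-end

        def route(self, target):
            if target not in self.assignment:
                # First time we've seen this target: give it to the
                # least-loaded back-end.
                self.assignment[target] = self.load.index(min(self.load))
            backend = self.assignment[target]
            self.load[backend] += 1
            return backend

        def finish(self, target):
            self.load[self.assignment[target]] -= 1

Each back-end ends up serving a stable subset of targets, which is exactly the aggregate-cache effect described upthread.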

So, mission accomplished! I now believe that superlinear speedup is a real thing, and know of one example!



