
I was making an assumption that using a vector of Arc<T> would be the best way to handle a global LRU cache. Perhaps I should have specified it, but it seemed pretty obvious. Sorry if it wasn’t.

If there’s a better way to handle a global LRU cache, I’m all ears.



Assuming only one thread at a time needs to access the LRU cache (not hard with the shared-nothing, message-passing architecture we employ here), the lifetime of an object checked out of the cache can be understood at compile time, and we can use the borrow checker to keep it that way: we hold a mutable reference to the LRU, and through it we get a mutable reference to an object within the LRU. By the time the function mutating the data returns, all references to the object must be dead (the borrow checker enforces this). Since all of this information is available at compile time, runtime ref-counting (via Rc/Arc) is not necessary.
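A minimal sketch of the idea (illustrative names, eviction bookkeeping omitted; not our actual code):

    use std::collections::HashMap;

    // Stand-in for an LRU cache; recency/eviction bookkeeping is
    // omitted to keep the sketch short.
    struct LruCache {
        entries: HashMap<String, Vec<u8>>,
    }

    impl LruCache {
        // Borrowing `self` mutably and handing back a mutable
        // reference into it ties the entry's lifetime to the borrow
        // of the cache itself.
        fn get_mut(&mut self, key: &str) -> Option<&mut Vec<u8>> {
            self.entries.get_mut(key)
        }
    }

    fn handle_request(cache: &mut LruCache) {
        if let Some(value) = cache.get_mut("user:42") {
            value.push(0xFF); // mutate the cached object in place
        } // `value` is dead here; the borrow checker enforces it
    }

    fn main() {
        let mut cache = LruCache { entries: HashMap::new() };
        cache.entries.insert("user:42".into(), vec![1, 2, 3]);
        handle_request(&mut cache);
        // `cache` is plain owned data: no Rc, no Arc, no runtime
        // ref-counting anywhere.
    }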

This is made possible by Rust's memory model: the compiler understands ownership of data and the lifetime of each reference taken from that owned data. That means it can statically determine how long an object needs to live, and verify that references to the object don't outlive the owned data. For use cases where the lifetimes of references can be statically understood, an Arc/Rc is not required. This blog post goes into it in much better detail than I can: https://words.steveklabnik.com/borrow-checking-escape-analys...


Yes, I'm quite familiar with Rust's borrow-checking model. I've programmed some in Rust, and the rest has been beaten into my head quite thoroughly by Rustaceans. I don't care for Rust, but I understand it.

Locking to one thread at a time seems like a pretty obvious performance flaw. It just doesn't seem like an appropriate design for the given workload: lots of requests, lots of stored items, largely write-only (except for an item's position in the queue). It would make a lot more sense to grant multiple threads access to the LRU at any given time.

And premature optimization and all that aside, building the LRU in such a way that it can be easily restricted to one thread or opened up to several makes the most sense to me. Otherwise, you get to rewrite the LRU (and all the code that accesses it) if it's ever identified as a bottleneck.
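As a rough illustration of what I mean (hypothetical names, with a plain HashMap standing in for a real LRU): hide the synchronization behind a small wrapper, so the interior can go from single-threaded to locked or sharded without touching call sites.

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use std::thread;

    // Callers see put/get, not the locking strategy; swapping the
    // interior (Mutex, RwLock, shards, ...) doesn't change them.
    #[derive(Clone)]
    struct SharedCache {
        inner: Arc<Mutex<HashMap<String, Vec<u8>>>>,
    }

    impl SharedCache {
        fn new() -> Self {
            Self { inner: Arc::new(Mutex::new(HashMap::new())) }
        }
        fn put(&self, key: String, val: Vec<u8>) {
            self.inner.lock().unwrap().insert(key, val);
        }
        fn get(&self, key: &str) -> Option<Vec<u8>> {
            self.inner.lock().unwrap().get(key).cloned()
        }
    }

    fn main() {
        let cache = SharedCache::new();
        let handles: Vec<_> = (0..4)
            .map(|i| {
                let c = cache.clone();
                thread::spawn(move || c.put(format!("key{i}"), vec![i as u8]))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert!(cache.get("key2").is_some());
    }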

Of course, I'm not responsible for the code or truly involved in the design process, so my perspective may be limited.


In practice, for our service, most of our CPU time is spent not on data mutation but on networking and serialization (this is, btw, the same conclusion Redis came to when they added "multi-threading").

You can scale out by running multiple instances of the service (shared-nothing, N of them depending on how many cores you want to run on). Or you can do message passing between cores.

In this case, we have two modes of scaling up/out: add more nodes to the cluster, or add more shared-nothing, internally partitioned LRU caches within the process, allowing for more concurrency.
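Roughly, the partitioned mode looks like this (a sketch with illustrative names, using std channels rather than our actual runtime): each key is routed to the shard that owns it, and only that shard's thread ever touches its data.

    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};
    use std::sync::mpsc;
    use std::thread;

    // Each worker owns its shard outright (shared-nothing), so the
    // shard itself needs no locks and no Arc.
    enum Msg {
        Put(String, Vec<u8>),
        // a real Get would carry a reply channel; omitted for brevity
    }

    fn shard_for(key: &str, shards: usize) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % shards
    }

    fn main() {
        let shards = 4;
        let mut senders = Vec::new();
        let mut workers = Vec::new();
        for _ in 0..shards {
            let (tx, rx) = mpsc::channel::<Msg>();
            senders.push(tx);
            workers.push(thread::spawn(move || {
                let mut shard: HashMap<String, Vec<u8>> = HashMap::new();
                for msg in rx {
                    match msg {
                        Msg::Put(k, v) => {
                            shard.insert(k, v);
                        }
                    }
                }
            }));
        }
        let key = "user:42".to_string();
        senders[shard_for(&key, shards)]
            .send(Msg::Put(key, vec![1, 2, 3]))
            .unwrap();
        drop(senders); // close channels so workers drain and exit
        for w in workers {
            w.join().unwrap();
        }
    }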

We, however, only run one LRU per node, as it turns out that this part is not the bottleneck here, nor will it probably ever be.


What kind of design do you have in mind? I assume you don't mean simultaneous reads/writes from multiple threads without synchronization - yolo! There are a lot of possible designs: mutex, read/write lock, concurrent hashmap. I've never worked on an LRU cache; I'm asking because I'm interested in what plays well in that use case, and how you would approach it in another language.
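For reference, here's what the read/write-lock option from that list might look like (a hypothetical sketch, with a plain HashMap in place of a real LRU). One caveat: a true LRU mutates recency metadata on every read, so touch-on-read would need the write lock (or a separate lock-free recency structure), which is why many designs fall back to a mutex or sharding.

    use std::collections::HashMap;
    use std::sync::RwLock;

    // Read/write-lock sketch: many concurrent readers, or one writer.
    struct RwCache {
        map: RwLock<HashMap<String, Vec<u8>>>,
    }

    impl RwCache {
        // Lookup only; a real LRU would also bump recency here,
        // which requires write access.
        fn get(&self, key: &str) -> Option<Vec<u8>> {
            self.map.read().unwrap().get(key).cloned()
        }
        fn put(&self, key: String, val: Vec<u8>) {
            self.map.write().unwrap().insert(key, val);
        }
    }

    fn main() {
        let cache = RwCache { map: RwLock::new(HashMap::new()) };
        cache.put("k".into(), vec![1]);
        assert_eq!(cache.get("k"), Some(vec![1]));
    }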



