Use SIEVE as underlying LRU? #446
Hi. Sorry for the long delay. I was very busy with work between July and September last year, and I also had a health problem between August and December. Now I feel better and can work on this again.

Thanks for the information about SIEVE. I will use the Caffeine simulator to measure the performance (hit rate) of SIEVE and …

As for the …
These deques are implemented as doubly-linked lists.
I ran the Caffeine simulator using two different cache trace files: ARC S3 and Corda.

How to run: https://github.com/moka-rs/caffeine-sim-drivers/tree/w-tinylfu

Results for ARC S3: (chart not captured in this extract)

Results for Corda: (chart not captured in this extract)
In terms of performance (hit rate), I think it will be better to keep the underlying LRU as is, because of the second result above (Corda): replacing the underlying LRU with SIEVE will not improve the hit rate. In terms of simplicity, SIEVE may eliminate the need for applying batched reads/writes to the underlying LRU, especially for reads. However, I would rather keep the current implementation.

CC: @ben-manes
Corda is a blockchain, so I think it is recent transactions being validated by multiple miners. That makes it very recency-biased, like stream processing where there is a small fanout per item. These workloads only need a small LRU to become optimal, so I mainly use it for the stress test (corda => 5x loop => corda) to force the cache to reconfigure itself between LRU and MRU patterns.

The simulator's versions of these algorithms were verified against the authors' simulator. Note that after publication and promotion, they changed the algorithm in their code base but kept the same name. That option has not been reimplemented, so it is v0 in their new terminology. The difference is only in the initial warm-up: "This version (S3FIFO.c) differs from the original S3-FIFO (S3FIFOv0.c) in that when the small queue is full, but the cache is not full, the original S3-FIFO will insert into the small queue".

Sieve is a very minor adjustment to the Clock algorithm from the 1960s (A Paging Experiment with the Multics System), which mimicked LRU for the OS page buffer. In that era the miss penalty was a "drum-core transfer" (8–20 milliseconds) of program memory, so recently used memory pages were important, and programmers would actively design for that because page faults were incredibly costly to a program's running time. This small amount of memory, managed by very predictable algorithms, was important to fit within the scheduled allocation of computer time (pre-timesharing). Sieve adjusts this slightly by decoupling the insertion hand from the victim hand, a well-known technique called a two-handed clock in Solaris (1990s), to more aggressively evict recent arrivals that are inactive. The intent of the original authors was the same as Sieve's: recent arrivals may be "one-hit wonders", so evict them more aggressively.

S3-FIFO is a very minor adjustment to the 2Q algorithm from the early 1990s (A Low Overhead High Performance Buffer Management Replacement Algorithm).
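For reference, the Clock-with-a-visited-bit mechanics that Sieve builds on can be sketched as below. This is a toy, single-threaded illustration under my own assumptions (a `Vec` with O(n) removal standing in for a linked list), not moka's or the paper authors' implementation; the key point is that a hit only flips a flag, while the victim hand sweeps from oldest to newest independently of where new entries are inserted.

```rust
// Toy Sieve sketch: insertion order is never changed by reads; a "hand"
// scans from the oldest entry, sparing each visited entry once.
// Updates of existing keys are omitted for brevity.
use std::collections::HashMap;
use std::hash::Hash;

struct Sieve<K: Hash + Eq + Clone, V> {
    capacity: usize,
    order: Vec<K>, // index 0 = oldest; a real cache would use a linked list
    map: HashMap<K, (V, bool)>, // value + "visited" bit
    hand: usize, // index of the next eviction candidate
}

impl<K: Hash + Eq + Clone, V> Sieve<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: Vec::new(), map: HashMap::new(), hand: 0 }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        match self.map.get_mut(key) {
            Some(entry) => {
                entry.1 = true; // mark visited; never reorder the queue
                Some(&entry.0)
            }
            None => None,
        }
    }

    fn insert(&mut self, key: K, value: V) {
        if self.map.len() >= self.capacity {
            self.evict();
        }
        self.order.push(key.clone());
        self.map.insert(key, (value, false));
    }

    fn evict(&mut self) {
        loop {
            if self.hand >= self.order.len() {
                self.hand = 0; // wrap back to the oldest entry
            }
            let key = self.order[self.hand].clone();
            let visited = &mut self.map.get_mut(&key).unwrap().1;
            if *visited {
                *visited = false; // second chance: spare it once
                self.hand += 1;
            } else {
                self.map.remove(&key);
                self.order.remove(self.hand); // O(n) here; O(1) with a list
                return;
            }
        }
    }
}
```

Because eviction starts the sweep at the hand rather than at the insertion point, an unvisited recent arrival can be evicted before an older visited entry, which is the "quick demotion" of one-hit wonders described above.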
This algorithm was used broadly in databases and introduced the concept of using recent eviction history in the replacement decision. The only slight change was to use Clock queues instead of LRU queues, due to Clock's well-known concurrency characteristics. That wasn't a concern for the original authors, and it isn't noteworthy, as developers will typically adjust an algorithm for their needs; this is a pragmatic change to expect to see in existing systems.

Clock-based concurrency is very nice for reads, at the penalty of O(n) eviction scans on write. That write is typically under a lock, but the S3-FIFO authors instead suggest using lock-free queues. This becomes challenging because (a) the queue head/tail become contention points, (b) moving "weighted" entries between queues breaks the capacity accounting, and (c) it often does not combine well with other features like expiration. Instead of using clock or random-sampling policies to cope with concurrency, Caffeine/Moka use request buffering to batch and replay the events. This avoids contention, does not require that writers wait or block on eviction, offers similar performance, and allows for more efficient O(1) algorithms without race concerns. It is a more complicated technique for high performance, but more flexible than the 1960s approach (in fact, it's a 1970s idea: a write-ahead log).

S3-FIFO does not use Sieve for its queues, as the overall 2Q structure of a probation and a protected queue serves the same purpose. W-TinyLFU, used in Moka, has a very similar structure, but replaces ghost entries with a more compact frequency sketch to allow for a larger history and to make better replacement decisions. All of these require some heuristic, such as the hand speed or the queue sizes, to determine how aggressively they bias for or against recency. We later solved this by using hill climbing rather than a static configuration, which allows for higher hit rates across a larger set of workloads without manual tuning.
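To make the frequency-sketch idea concrete, here is a hedged sketch of a TinyLFU-style admission filter: a small count-min sketch approximates each key's access frequency, and an eviction candidate only replaces the victim if it has been seen more often. The hashing scheme, sizes, and `admit` helper are my own illustrative assumptions, not moka's actual parameters (real sketches pack 4-bit counters and periodically halve them to age the history).

```rust
// Count-min sketch: d rows of w saturating counters; an item's estimated
// frequency is the minimum of its counters, which can only over-count.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct FrequencySketch {
    rows: Vec<Vec<u8>>,
    width: usize,
}

impl FrequencySketch {
    fn new(width: usize, depth: usize) -> Self {
        Self { rows: vec![vec![0; width]; depth], width }
    }

    // Derive a per-row index by seeding the hash with the row number.
    fn index<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        item.hash(&mut h);
        (h.finish() as usize) % self.width
    }

    fn record<T: Hash>(&mut self, item: &T) {
        for row in 0..self.rows.len() {
            let i = self.index(item, row);
            self.rows[row][i] = self.rows[row][i].saturating_add(1);
        }
    }

    fn estimate<T: Hash>(&self, item: &T) -> u8 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.index(item, row)])
            .min()
            .unwrap_or(0)
    }
}

// TinyLFU-style admission: only admit the candidate if it is "hotter"
// than the entry the policy would otherwise evict.
fn admit<T: Hash>(sketch: &FrequencySketch, candidate: &T, victim: &T) -> bool {
    sketch.estimate(candidate) > sketch.estimate(victim)
}
```

Compared to 2Q/S3-FIFO ghost entries, the sketch stores a few bits per slot rather than a key per remembered eviction, which is what allows the larger history mentioned above.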
There's really no benefit to adopting Sieve / S3-FIFO, because they are renames of historic algorithms and modern techniques work better. Hit-rate simulations and concurrency benchmarks have shown this; e.g. Otter attempted to implement S3-FIFO's concurrency with its author's help, but had to replace it with Caffeine / Moka's approach due to poor performance. Research papers are often biased or incorrect because the authors often have financial motives, so performing your own analysis, like @tatsuya6502 did here, is critical for making good engineering decisions.
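The request-buffering approach mentioned above can be sketched roughly as follows. This is a minimal, single-buffer illustration under my own assumptions (a `Mutex<Vec<_>>` standing in for the lock-free, striped ring buffers that Caffeine and Moka actually use); the point is the shape of the technique: readers record events cheaply, and the policy's internal lists are only mutated when a drain replays the batch.

```rust
// Readers append access events to a bounded buffer instead of touching the
// policy's lists; a drain later replays them against the policy in one pass.
use std::sync::Mutex;

struct ReadBuffer<K> {
    events: Mutex<Vec<K>>, // real implementations use lock-free ring buffers
    capacity: usize,
}

struct Policy<K: PartialEq> {
    order: Vec<K>, // stand-in for the access-order deque; front = LRU
}

impl<K: PartialEq> Policy<K> {
    // Only ever called from the drain, so no concurrent list mutation.
    fn on_access(&mut self, key: K) {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            let k = self.order.remove(pos);
            self.order.push(k); // move to the MRU position
        } else {
            self.order.push(key);
        }
    }
}

impl<K> ReadBuffer<K> {
    fn new(capacity: usize) -> Self {
        Self { events: Mutex::new(Vec::new()), capacity }
    }

    // Record a read; returns true when the buffer is full and should be
    // drained. A full buffer may simply drop events, since access events
    // are hints, not correctness requirements.
    fn record(&self, key: K) -> bool {
        let mut events = self.events.lock().unwrap();
        if events.len() < self.capacity {
            events.push(key);
        }
        events.len() >= self.capacity
    }

    fn drain_into<F: FnMut(K)>(&self, mut apply: F) {
        let drained: Vec<K> = std::mem::take(&mut *self.events.lock().unwrap());
        for key in drained {
            apply(key);
        }
    }
}
```

In the real caches the drain runs under a try-lock, so a reader that loses the race just moves on; this is how reads avoid ever blocking on the eviction policy.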
I recently read about SIEVE (https://cachemon.github.io/SIEVE-website/, https://www.usenix.org/system/files/nsdi24-zhang-yazhuo.pdf), which is advertised as a simpler and better alternative to LRU.
It is also mentioned that it can serve as an underlying primitive for more advanced eviction algorithms.
So I wanted to give it a shot and actually implement it within moka, but got sidetracked (#445 😅). I'm still trying to fully piece together all the moving parts in my head, and I stumbled across the access-order and write-order deques. Now I'm wondering: does any other piece of functionality have a hard requirement on having the elements sorted in access order, as opposed to write order? SIEVE does not reorder elements at all, so it maintains write order naturally.
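To illustrate the distinction in the question above: an access-order deque must unlink and re-append an entry on every hit, which is exactly the reordering SIEVE avoids. The sketch below is a toy O(n) stand-in of my own (moka's real deques are intrusive doubly-linked lists with O(1) unlink), just to show what "sorted in access order" buys and costs.

```rust
// Classic access-order (LRU) bookkeeping: a hit moves the entry to the
// MRU end, so the front is always the least recently used victim.
use std::collections::VecDeque;

struct LruOrder<K: PartialEq> {
    deque: VecDeque<K>, // front = least recently used
}

impl<K: PartialEq> LruOrder<K> {
    fn new() -> Self {
        Self { deque: VecDeque::new() }
    }

    fn insert(&mut self, key: K) {
        self.deque.push_back(key);
    }

    // On a hit, LRU reorders: unlink the entry and re-append it at the
    // MRU end. SIEVE would instead just flip a per-entry "visited" bit
    // and leave the entry where it was inserted.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.deque.iter().position(|k| k == key) {
            let k = self.deque.remove(pos).unwrap();
            self.deque.push_back(k);
        }
    }

    fn victim(&self) -> Option<&K> {
        self.deque.front() // evict from the cold end
    }
}
```

A write-order deque only ever appends and removes, so anything that truly needs access order (e.g. picking the coldest entry by recency) cannot be served by it alone; that is the crux of whether SIEVE's single write-ordered queue can replace both.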
Does it make sense to continue with this effort? Is there a possibility this could be worthwhile?