Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIEVE eviction algorithm support #129

Closed
bessey opened this issue Jan 14, 2024 · 3 comments
Closed

SIEVE eviction algorithm support #129

bessey opened this issue Jan 14, 2024 · 3 comments

Comments

@bessey
Copy link

bessey commented Jan 14, 2024

This crossed my mind off the back of reading through rails/rails#50443 and in particular @byroot's concerns of SolidCache relying on FIFO for evictions.

Doing the rounds on HN recently is a new algorithm for cache invalidation called SIEVE. Compared to FIFO it significantly improves cache hit ratios. Compared to LRU it significantly reduces cache write frequency. Most importantly, its a very simple algorithm that in the paper is laid out in just 16 lines of pseudo-code.

I thought those qualities sounded like rather a good fit for solving the concerns laid out in rails/rails#50443. This really isn't my area of expertise, but I've spent some time taking a stab at sketching out what this algorithm might look like in SolidCache with minimal changes to the existing structure.

I have not yet got as far as actually implementing it and measuring performance. Before I go that far, is there appetite for such a change / addition to the gem?

@byroot
Copy link
Member

byroot commented Jan 15, 2024

This algorithm assumes fast access to the entry metadata.

What you implemented in AR/SQL will locks lots of records for prolonged amount of time in a long running transaction. That would be terrible for performance.

@bessey
Copy link
Author

bessey commented Jan 15, 2024

Yep fair. The only purpose of the outer transaction though is to ensure no 2 eviction processes are running (and trying to move the pointer) at once.

With that in mind, an advisory lock would probably be a better option.

@djmb
Copy link
Collaborator

djmb commented Jan 22, 2024

Hi @bessey, thanks for the suggestion!

The SIEVE algorithm looks like its designed mainly for in-memory caches with a single writer and fixed size items. As such I'm not sure that would translate well to Solid Cache.

It would require:

  • Writing each entry when it is read to set the visited bit, so we need to add an UPDATE query to each cache write
  • Tracking the size of the cache so each write or delete would then involve atomically updating a counter. That would be hard to do performantly.
  • Potentially scanning lots of records on a cache miss
  • Co-ordinating between different processes that would be evicting at the same time.

Maybe there's an adaptation that would allow multiple processes to run at once, but I suspect it would no longer look so simple.

The bet with Solid Cache and FIFO is that the inferior algorithm doesn't matter so much when your cache is large and items are long lived. I'm not sure we'd see hit rate improvements similar to what they found in that paper. It would certainly need investigation first.

Finally by using FIFO and deleting records sequentially we reduce fragmentation in records and on disk. Reducing disk fragmentation means we can store more records and reducing record fragmentation allows us to quickly get a rough estimate for the number of cache entries by subtracting the max and min IDs.

@djmb djmb closed this as completed Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants