feat: JWT cache implementation based on sieve algorithm #4084
Conversation
This PR contains:
Very interesting! 👀 👀
Maybe we could expose those as metrics. That would help with passing code coverage too, since it looks like it's detecting those functions as dead code:
I also see the JWT loadtest failing on CI https://github.com/PostgREST/postgrest/actions/runs/15030111955?pr=4084:
Not sure what's going on there, but you should be able to reproduce it locally with
Not sure what the problem is. I tried to find typos in my changes to the env variable settings, but main (0b3c8c9) has the same issue on my machine.
Weird, it's like you're using an old version of the nix tools and it's ignoring the

```
$ postgrest-loadtest -k jwt
Created 50000 targets in ./test/load/gen_targets.http (0.79s)
...

## this is the output you're getting
$ postgrest-loadtest -k mixed # same as just "postgrest-loadtest"
delaying data to/from postgres by 0ms
```

Maybe try going out of
Yeah, that was it, thanks. I have serious doubts about … Nevertheless, there indeed seems to be an issue with the implementation in this PR: all percentiles (i.e. median up to 99th) show better results than
Agree, we can change that.
Note that the jwt loadtest actually generates unique JWTs; it was mainly done to test the jwt decoding perf while also ensuring the JWT cache purging doesn't slow things down (#4034). The "mixed" loadtest does have hardcoded JWTs, and the perf looks more or less maintained (or slightly better).
@steve-chavez @wolfgangwalther @taimoorzaeem I am not sure what to do with the outstanding test coverage issues. This is about … The question is: how to handle invalid JWTs? There are several options:
I've decided option three makes sense, as it also speeds up repeated requests with invalid tokens. OTOH, it opens up the possibility of cache-filling attacks using randomly generated tokens. It is disputable whether that is a bigger problem than overloading the CPU with, e.g., JWTs signed with a random key. It looks to me like option 4 would be the best compromise, because it would prevent filling up the cache with random garbage but would still offload signature validation. I haven't implemented it as I don't know how to split parsing and signature validation with JOSE. WDYT?
Parsing a JWT is really simple: split on `.`, … The parser that is used for … But do we really need to? Parsing a JWT is really simple, but creating parseable tokens with invalid signatures is just as easy. So I don't really see the additional value of option 4 vs option 3. If somebody wants to fill the cache, they can do so easily.
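For illustration, a minimal sketch of such a "parse without verifying" step, assuming plain aeson / base64-bytestring and a hypothetical helper name (the PR itself goes through JOSE):

```haskell
import qualified Data.Aeson                 as JSON
import qualified Data.ByteString            as BS
import qualified Data.ByteString.Base64.URL as B64URL
import qualified Data.ByteString.Char8      as BS8

-- Hypothetical helper: check that a token at least *looks* like a JWS compact
-- serialization and that its claims decode as JSON, without touching the
-- signature at all.
parseClaimsNoVerify :: BS.ByteString -> Either String JSON.Value
parseClaimsNoVerify token =
  case BS8.split '.' token of
    [_header, payload, _sig] -> do
      raw <- B64URL.decodeUnpadded payload  -- JWT segments are unpadded base64url
      JSON.eitherDecodeStrict raw           -- claims must be valid JSON
    _ -> Left "not a JWS compact serialization"
```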
Done. Three new counters are provided:

- `pgrst_jwt_cache_requests_total`
- `pgrst_jwt_cache_hits_total`
- `pgrst_jwt_cache_evictions_total`
Indeed - any random bytes are a signature that consumes CPU to validate. So that leaves us with the choice between caching negative results or caching only valid JWTs. I've made the cache implementation polymorphic over the value-computation monad, so it is easy to change strategies. The latest commits introduce the possibility of selecting one of two variants in PostgREST.Auth.JwtCache - it is a matter of changing
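The two variants aren't spelled out above, so here is a minimal sketch of the general idea, using a plain Map in an IORef instead of the PR's stm-hamt + SIEVE structure. The `notCachingErrors` name appears later in this thread; `cachingErrors` and everything else below are placeholder names. The point is that choosing the computation monad decides whether failures are inserted: computing in ExceptT short-circuits before the insert, while computing an `Either` in IO stores the `Left` as a negative result.

```haskell
import           Control.Monad.Except   (ExceptT, runExceptT)
import           Control.Monad.IO.Class (MonadIO, liftIO)
import           Data.IORef             (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict        as Map

-- A toy compute-through cache, polymorphic over the monad used to produce
-- missing values (hypothetical, much simpler than the PR's implementation).
newtype Cache k v = Cache (IORef (Map.Map k v))

newCache :: IO (Cache k v)
newCache = Cache <$> newIORef Map.empty

lookupOrCompute :: (Ord k, MonadIO m) => Cache k v -> (k -> m v) -> k -> m v
lookupOrCompute (Cache ref) compute key = do
  found <- Map.lookup key <$> liftIO (readIORef ref)
  case found of
    Just v  -> pure v                                 -- hit
    Nothing -> do
      v <- compute key                                -- may short-circuit in m
      liftIO (modifyIORef' ref (Map.insert key v))    -- only reached on success
      pure v

-- "Caching errors": the cached value is an Either, so invalid JWTs are stored
-- as negative results and rejected cheaply on repeated requests.
cachingErrors :: Ord k => Cache k (Either e v) -> (k -> IO (Either e v)) -> k -> IO (Either e v)
cachingErrors = lookupOrCompute

-- "Not caching errors": computing in ExceptT means a failure aborts before
-- the insert, so only successfully validated values end up in the cache.
notCachingErrors :: Ord k => Cache k v -> (k -> ExceptT e IO v) -> k -> IO (Either e v)
notCachingErrors cache compute key = runExceptT (lookupOrCompute cache compute key)
```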
I wonder whether we can cache all valid JWTs, but including expired ones? Aka when validation fails because of expiry, still cache. When validation fails because of something else, don't. The assumption is that:
That's exactly option 2, which is currently implemented as the notCachingErrors variant (switchable in JwtCache.init). As described in the PR description (first comment), we don't cache AuthResults but raw parsed claims, which are re-validated for each request.
Well, not exactly, but close enough, I agree. We'd still cache those that fail, for example, an audience check or so. But that's totally fine, yes.
Right - it is not exactly the same. The reasons I decided to leave claims checking until after the cache lookup are two-fold:
And, of course, claims checking is very fast, so there is little sense in caching it.
Yes, this makes a lot of sense! So the only change that requires a reset is the change of secret, right?
That, and - of course - turning off caching altogether (i.e. setting the cache max size to 0).
Changes:

1. Refactoring and some cleanup of the JWT handling code:
   * Moved JWT parsing and validation to a separate module Auth.JWT
   * Split JWT decoding, parsing and signature validation, and claims validation into separate functions
   * Instead of caching AuthResult, cache the decoded claims (whose signature was verified). Validating claims and determining the role is done after the cache lookup
   * Cleaned up the API so that its usage is simplified: `lookupJwtCache cache key >>= parseClaims configJwtAud time`
   * Handling of JwtCacheState initialization and updates of configuration is encapsulated in the Auth.JwtCache module
2. Generic, high-performance (hopefully), scalable, dynamically resizeable cache implementation based on stm, stm-hamt and the SIEVE algorithm. It also integrates with the PostgREST measurements infrastructure, providing usage stats (i.e. hit ratio, eviction count)
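For readers unfamiliar with SIEVE, here is a tiny, pure model of its eviction rule. The PR implements it on an STM doubly linked list plus stm-hamt, so the list-based structure and all names below are purely illustrative:

```haskell
-- Each cached entry carries a "visited" bit. A hit only sets the bit; on
-- eviction a "hand" scans from the tail (oldest) towards the head (newest),
-- clearing visited bits it passes and removing the first unvisited entry.
-- (The real algorithm keeps the hand position between evictions and wraps
-- around; this model restarts at the tail each time for brevity.)
data Entry k v = Entry { entryKey :: k, entryValue :: v, visited :: Bool }

-- On a cache hit, mark the entry as visited.
touch :: Eq k => k -> [Entry k v] -> [Entry k v]
touch k = map (\e -> if entryKey e == k then e { visited = True } else e)

-- The list is kept newest-first, so the tail-to-head scan is a scan over the
-- reversed list.
evictOne :: [Entry k v] -> [Entry k v]
evictOne = reverse . go . reverse
  where
    go []        = []
    go (e:newer)
      | visited e = e { visited = False } : go newer  -- second chance, keep in place
      | otherwise = newer                             -- unvisited: this one is evicted

-- Insert a new entry at the head, evicting first if the cache is full.
insert :: Int -> k -> v -> [Entry k v] -> [Entry k v]
insert maxEntries k v es
  | length es >= maxEntries = Entry k v False : evictOne es
  | otherwise               = Entry k v False : es
```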
@steve-chavez @wolfgangwalther @taimoorzaeem After some more work on this PR I think it is now in a mergeable state (pending documentation changes, changelog adjustments etc. - and of course code review). I am pretty confident it is working fine, as I've added JWT cache behavior tests that verify hits/misses and evictions using metrics. Please let me know if there are any adjustments / changes required (or if you think the whole idea is wrong).
src/PostgREST/Config.hs
Outdated

```haskell
-- <*> (fromMaybe 0 <$> optInt "jwt-cache-max-lifetime")
<*> (fromMaybe 0 <$> optInt "jwt-cache-max-size")
```
So the `size` in there seems a bit confusing, as it may be confused with size in memory. How about renaming the config to `jwt-cache-max-entries`? It seems more appropriate, no?
Fine for me. @steve-chavez WDYT?
Yes, agree with `jwt-cache-max-entries`.
Looked at other implementations, looks like "size" is used to mean max entries:

- Max Entries in JWT Cache - `org.forgerock.agents.jwt.cache.size`

Not to say that we cannot be explicit in our config name. As an alternative, we could also do `jwt-cache-max-length` or maybe just `jwt-cache-max`?
```haskell
let auth = genToken [json|{"exp": 9999999999, "role": "postgrest_test_author", "id": "jdoe1"}|]

expectCounters
  [ requests (+ 1)
  , hits (+ 0)
  ] $
```
This is a really elegant way to test the metrics 💯 So easy to read!
```haskell
request methodGet "/authors_only" [jwt1] ""
  *> request methodGet "/authors_only" [jwt2] ""
  -- this one should hit the cache
  *> request methodGet "/authors_only" [jwt1] ""
  -- this one should trigger eviction of jwt2 (not FIFO)
  *> request methodGet "/authors_only" [jwt3] ""
  -- these two should hit the cache
  *> request methodGet "/authors_only" [jwt1] ""
  *> request methodGet "/authors_only" [jwt3] ""
```
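Why jwt2, rather than jwt1 (the FIFO candidate), is the one evicted here: assuming this test caps the cache at 2 entries, the SIEVE bookkeeping goes roughly like this (`*` marks the visited bit, newest entry on the left):

```
insert jwt1                     [jwt1]
insert jwt2                     [jwt2, jwt1]
hit    jwt1                     [jwt2, jwt1*]   -- visited bit set
insert jwt3 (cache is full)     hand scans from the tail:
                                jwt1 is visited   -> clear its bit, keep it
                                jwt2 is unvisited -> evict jwt2
                                [jwt3, jwt1]
hit    jwt1, hit jwt3           [jwt3*, jwt1*]
```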
An idea to better test the contents of the JWT cache would be to add a `/jwtcache` endpoint to the Admin Server. That could print all the cached JWTs in order, possibly decoded.
Not to say it should be done in this PR, it can be separate.
```
## Enables JWT Cache and sets its max size, disables caching with 0
# jwt-cache-max-size = 0
```
I think we should make the default 1000. That means the cache will be enabled by default for next major.
1000 is a rough estimation, mentioned before on #3802 (comment)
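If that default lands, the shipped config comment quoted above would presumably change along these lines (a sketch only, assuming the option keeps its current name rather than the `jwt-cache-max-entries` rename discussed earlier):

```
## Enables the JWT cache and sets its max size; caching is disabled with 0
# jwt-cache-max-size = 1000
```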
```
, stm >= 2.5 && < 3
, stm-hamt >= 1.2 && < 2
, focus >= 1.0 && < 2
, some >= 1.0.4.1 && < 2
```
Any concerns with the runtime overhead of `some`?

> However, due to GHC issue #1965, the direct implementation of this datastructure is less efficient than it could be. As a result, this library uses a more complex approach that implements it as a newtype, so there's no runtime overhead associated with wrapping and unwrapping Some values.

https://github.com/haskellari/some

Is the dependency really needed?
The implementation is based on a doubly linked list: each node has two TVars pointing to the previous and next node. To simplify handling of the empty cache there is a single empty "head" node that has a different type than the "entry" nodes that are put in the hash map.

Both node types are defined as two constructors of a GADT, along the lines of:

```haskell
data ListNode (k :: Bool) where
  Head  :: ListNode 'False
  Entry :: ListNode 'True
```

(The above is a simplification to illustrate the point.)

So the TVars point either to an "entry" node or to the "head" - to do that we need existentials, and hence the TVars are of type `TVar (Some ListNode)`. We could define our own data type `data OurSome f = forall (k :: Bool). OurSome (f k)`, but I found this library, which simplifies this, provides some helpers and promises to use a `newtype` instead.
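Spelled out a little further (still a simplification with illustrative fields, not the PR's actual definitions), the links could look like this, which is where `Some` comes in:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Control.Concurrent.STM (TVar)
import Data.Some              (Some)

-- Both node shapes carry prev/next links; since a link may point at either
-- shape, the TVars hold an existentially wrapped node.
data ListNode k v (isEntry :: Bool) where
  Head  :: TVar (Some (ListNode k v))   -- prev (i.e. the tail of the list)
        -> TVar (Some (ListNode k v))   -- next (i.e. the head of the list)
        -> ListNode k v 'False
  Entry :: k                            -- cache key (also indexes the hash map)
        -> v                            -- cached value
        -> TVar Bool                    -- SIEVE "visited" bit
        -> TVar (Some (ListNode k v))   -- prev
        -> TVar (Some (ListNode k v))   -- next
        -> ListNode k v 'True
```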
@mkleczek Awesome! How about a gauge for the number of cached JWTs? I'm currently load testing the feature and that would help me ensure the cache size is maxed out and that it goes down. This might be good for tests too?

I noticed that the previous JWT purge had a memory usage problem (#3889 (comment)) and this is now gone 🚀 I've recorded a video using postgrest-benchmark's OPTIONSUniqueJWT.js (this has the same logic as our jwt loadtest):

Screencast.from.05-27-2025.09.56.23.PM.webm

Sharing the run results here for completeness:
Also the metrics after the run (counters are high because I did some previous runs):
@mkleczek I've noticed that the jwt loadtest results show a perf drop compared to main and latest version. Is that expected?
Co-authored-by: Steve Chavez <[email protected]>
Our load test is the worst possible case for this (and I would say: any bounded) cache: all JWTs are different, so there is no caching, but:

In other words - cache thrashing at its best :) What's more: in the case of symmetric JWT keys I don't think a cache lookup is faster than simply performing the JWT verification.
This is also important for the default of enabling the cache or not. Can we default to enabling the cache only for asymmetric keys?
Hmm... Not a bad idea - sensible self-configuration is much better than forcing the user to make decisions. On the other hand, we don't have any cache-sizing self-tuning, so the user has to configure this anyway.
@steve-chavez @wolfgangwalther Looks like combining the default cache size from #4084 (comment) with auto-configuration of the cache idea from #4084 (comment) allows us to have a pretty good user experience:
Implemented in c6e73b1
With the new default, we should change the JWT load test to use asymmetric keys - and then see how much perf we lose in that worst case.
I thought about it but decided not to implement this and the reasoning is:
This is somewhat surprising - we do not remove from the cache explicitly right now, so it should always be full (hence occupying constant memory). Maybe what you see is GC effects?
Cool. Fair enough.
Right. It looks like it's indeed GC, as RSS and VSZ both decrease simultaneously, as seen on: Screencast.from.05-28-2025.02.12.05.PM.webm

I detected a problem with the previous JWT cache that this PR solves: #4107.
* Changed `postgrest-loadtest -k jwt` to generate 1000 JWTs signed by an RSA 4096 key.
* Added parameter `--jwtcache=off` to `postgrest-loadtest` to turn off JWT caching.
@wolfgangwalther @steve-chavez @taimoorzaeem I've changed the load test to use 1000 JWTs signed by an RSA 4096 key. Also added the additional `--jwtcache=off` parameter. In the case of asymmetric RSA, caching gives around a 4x performance boost (all response times up to the 95th percentile are 4x lower with caching turned on).
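So the comparison can be reproduced with something like this (assuming the new flag combines with `-k` as shown):

```
$ postgrest-loadtest -k jwt                  # RSA-signed JWTs, cache enabled
$ postgrest-loadtest -k jwt --jwtcache=off   # same targets, caching turned off
```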
@mkleczek Cool! Can you split those commits into another PR for easier review/merge?
The commit history in the PR currently represents the steps you took while you were working on it. Now we need to get it into a reviewable and mergeable shape. That's because we can't squash merge, but we also can't merge it as-is if we want a clean history.
So, it would be great if you could rebase to create a nice history of self-contained commits. At least with the following:
- refactor for the split of auth module
- refactor in the Metrics module
- improving the jwt loadtest as Steve mentioned
- ... probably some more refactors that I just didn't spot, but which are independent of the main change ...
- and finally the main change to change the JWT cache implementation
We'll need to slice this nicely, otherwise there is no chance to review this properly. I still left a few nits while skimming the code.
```diff
@@ -97,7 +97,8 @@ data AppConfig = AppConfig
   , configJwtRoleClaimKey     :: JSPath
   , configJwtSecret           :: Maybe BS.ByteString
   , configJwtSecretIsBase64   :: Bool
-  , configJwtCacheMaxLifetime :: Int
+  -- , configJwtCacheMaxLifetime :: Int
```
There are some left-overs in this file (more below)
```diff
-  poolTimeouts <- register $ counter (Info "pgrst_db_pool_timeouts_total" "The total number of pool connection timeouts")
-  poolAvailable <- register $ gauge (Info "pgrst_db_pool_available" "Available connections in the pool")
-  poolWaiting <- register $ gauge (Info "pgrst_db_pool_waiting" "Requests waiting to acquire a pool connection")
-  poolMaxSize <- register $ gauge (Info "pgrst_db_pool_max" "Max pool connections")
-  schemaCacheLoads <- register $ vector "status" $ counter (Info "pgrst_schema_cache_loads_total" "The total number of times the schema cache was loaded")
-  schemaCacheQueryTime <- register $ gauge (Info "pgrst_schema_cache_query_time_seconds" "The query time in seconds of the last schema cache load")
-  setGauge poolMaxSize (fromIntegral configDbPoolSize)
-  pure $ MetricsState poolTimeouts poolAvailable poolWaiting poolMaxSize schemaCacheLoads schemaCacheQueryTime
+  metricState <- MetricsState <$>
+    register (counter (Info "pgrst_db_pool_timeouts_total" "The total number of pool connection timeouts")) <*>
+    register (gauge (Info "pgrst_db_pool_available" "Available connections in the pool")) <*>
+    register (gauge (Info "pgrst_db_pool_waiting" "Requests waiting to acquire a pool connection")) <*>
+    register (gauge (Info "pgrst_db_pool_max" "Max pool connections")) <*>
+    register (vector "status" $ counter (Info "pgrst_schema_cache_loads_total" "The total number of times the schema cache was loaded")) <*>
+    register (gauge (Info "pgrst_schema_cache_query_time_seconds" "The query time in seconds of the last schema cache load")) <*>
+    register (counter (Info "pgrst_jwt_cache_requests_total" "The total number of JWT cache lookups")) <*>
+    register (counter (Info "pgrst_jwt_cache_hits_total" "The total number of JWT cache hits")) <*>
+    register (counter (Info "pgrst_jwt_cache_evictions_total" "The total number of JWT cache evictions"))
+  setGauge (poolMaxSize metricState) (fromIntegral configDbPoolSize)
+  pure metricState
```
Most of this change could be a separate refactor commit.
src/PostgREST/Auth/Jwt.hs
Outdated
This file should also be created in a separate refactor commit ahead of all the other changes.
The problem with this is that currently in main we don't really have a proper interface for caching and everything is intermingled. I thought it did not make much sense to perform the refactoring while keeping the current JWT cache implementation.
+1
+1
See above.
Thanks for this review.
Co-authored-by: Wolfgang Walther <[email protected]>
Co-authored-by: Wolfgang Walther <[email protected]>
Draft version of a JWT cache implementation based on the SIEVE algorithm: https://cachemon.github.io/SIEVE-website/blog/2023/12/17/sieve-is-simpler-than-lru/