
feat: JWT cache implementation based on sieve algorithm #4084


Open
wants to merge 16 commits into base: main

Conversation

mkleczek
Contributor

Draft version of JWT cache implementation based on https://cachemon.github.io/SIEVE-website/blog/2023/12/17/sieve-is-simpler-than-lru/ algorithm

@mkleczek
Contributor Author

mkleczek commented May 14, 2025

This PR contains:

  1. Refactoring and some cleanup of JWT handling code:
    • Moved JWT parsing and validation to a separate module Auth.JWT
    • Split JWT decoding, parsing and signature validation, and claims validation into separate functions
    • Instead of caching AuthResult, cache the decoded claims (whose signature has been verified); claims validation and role determination are done after the cache lookup
    • Cleaned up the API so that using it is simple: lookupJwtCache cache key >>= parseClaims configJwtAud time
    • Handling of JwtCacheState initialization and configuration updates is encapsulated in the Auth.JwtCache module
  2. A generic, high-performance, (hopefully) scalable, dynamically resizable cache implementation based on stm, stm-hamt and the SIEVE algorithm. It also provides usage stats (i.e. hit ratio, eviction count, size)
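For illustration, here is a minimal, purely functional sketch of SIEVE eviction. It is only a sketch: the cache in this PR is a concurrent implementation on top of stm and stm-hamt, and every name below is made up for the example.

module SieveSketch (emptySieve, lookupSieve, insertSieve) where

import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

data Entry k v = Entry { eKey :: k, eValue :: v, eVisited :: Bool }

-- Entries are kept in insertion order (lowest key = oldest); a hit only flips the
-- visited bit, and the "hand" scans from old to new when something must be evicted.
data Sieve k v = Sieve
  { sCapacity :: Int
  , sEntries  :: Map Int (Entry k v) -- keyed by insertion number
  , sNextSeq  :: Int
  , sHand     :: Maybe Int           -- insertion number the hand points at
  }

emptySieve :: Int -> Sieve k v
emptySieve cap = Sieve cap Map.empty 0 Nothing

-- A hit only sets the visited bit; unlike LRU, the entry is never moved.
-- (Linear scan for brevity; the real cache keys a hash map by the JWT.)
lookupSieve :: Eq k => k -> Sieve k v -> (Maybe v, Sieve k v)
lookupSieve k s =
  case [ (i, e) | (i, e) <- Map.toList (sEntries s), eKey e == k ] of
    []         -> (Nothing, s)
    (i, e) : _ -> ( Just (eValue e)
                  , s { sEntries = Map.insert i e { eVisited = True } (sEntries s) } )

-- A miss inserts at the head; when full, the hand scans from old to new, clearing
-- visited bits, and evicts the first unvisited entry it finds. Assumes the key is
-- not already cached (lookupSieve is called first).
insertSieve :: k -> v -> Sieve k v -> Sieve k v
insertSieve k v s0 =
  let s1 = if Map.size (sEntries s0) >= sCapacity s0 then evict s0 else s0
  in s1 { sEntries = Map.insert (sNextSeq s1) (Entry k v False) (sEntries s1)
        , sNextSeq = sNextSeq s1 + 1
        }

evict :: Sieve k v -> Sieve k v
evict s@Sieve{sEntries = es0, sHand = hand0}
  | Map.null es0 = s                                          -- capacity 0: nothing to do
  | otherwise    = go es0 (start hand0)
  where
    start (Just h) | Map.member h es0 = h                     -- resume where the hand stopped
    start _                           = fst (Map.findMin es0) -- otherwise start at the tail
    go es h =
      let e = es Map.! h
      in if eVisited e
           then go (Map.insert h e { eVisited = False } es) (next es h)
           else s { sEntries = Map.delete h es, sHand = Just (next es h) }
    -- move toward newer entries, wrapping back to the oldest after the newest
    next es h = maybe (fst (Map.findMin es)) fst (Map.lookupGT h es)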

@steve-chavez
Member

Very interesting! 👀 👀

It also provides usage stats (ie. hit ratio, evictions count, size)

Maybe we could expose those as metrics. That would help with passing code coverage too, since it looks like it's detecting those functions as dead code:

postgrest-13.1-inplace: src/PostgREST/Cache/Sieve.hs:124:1: delete
postgrest-13.1-inplace: src/PostgREST/Cache/Sieve.hs:133:1: deleteIO
postgrest-13.1-inplace: src/PostgREST/Cache/Sieve.hs:136:1: resetIO
postgrest-13.1-inplace: src/PostgREST/Cache/Sieve.hs:142:1: accessStats
postgrest-13.1-inplace: src/PostgREST/Cache/Sieve.hs:148:1: evictionsCount

I also see the JWT loadtest failing on CI https://github.com/PostgREST/postgrest/actions/runs/15030111955?pr=4084:

Options "http://postgrest/authors_only": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Not sure what's going on there, but you should be able to reproduce it locally with postgrest-loadtest -k jwt.

@mkleczek
Contributor Author

mkleczek commented May 15, 2025

postgrest-loadtest -k jwt gives me:

delaying data to/from PostgREST by 0ms
jwt": -c: line 1: unexpected EOF while looking for matching `"'

Not sure what the problem is - I tried to find typos in my changes to the env variable settings, but main (0b3c8c9) has the same issue on my machine.

@steve-chavez
Member

postgrest-loadtest -k jwt gives me:
delaying data to/from PostgREST by 0ms
jwt": -c: line 1: unexpected EOF while looking for matching `"'

Weird, it's like you're using an old version of the nix tools and it's ignoring the -k jwt arg. On my machine I get:

$ postgrest-loadtest -k jwt
Created 50000 targets in ./test/load/gen_targets.http (0.79s)
...

## this is the output you're getting
$ postgrest-loadtest -k mixed # same as just "postgrest-loadtest"
delaying data to/from postgres by 0ms 

Maybe try going out of nix-shell and running it again.

@mkleczek
Contributor Author

Yeah, that was it, thanks.

I have serious doubts about postgrest-loadtest -k jwt. When testing main on my machine (a beefy MacBook) there is no difference in results between PGRST_JWT_CACHE_MAX_LIFETIME=86400 and PGRST_JWT_CACHE_MAX_LIFETIME=0. I guess that's because symmetric crypto is used to generate/validate tokens; that's probably too fast to be worth caching or to show any difference between cached and uncached access. I think meaningful tests have to be based on asymmetric cryptography.

Nevertheless, there does seem to be an issue with the implementation in this PR: all percentiles (i.e. median up to the 99th) show better results than main, but the max is 30s (which causes timeouts and skews the average response time). Looking into it right now.

@steve-chavez
Member

That's probably too fast to be worth caching or to show any difference between cached and uncached access. I think meaningful tests have to be based on asymmetric cryptography.

Agree, we can change that.

When testing main on my machine (beefy MacBook) there is no difference in results between PGRST_JWT_CACHE_MAX_LIFETIME=86400 and PGRST_JWT_CACHE_MAX_LIFETIME=0.

Note that the jwt loadtest actually generates unique JWTs; it was mainly done to test the jwt decoding perf while also ensuring the JWT cache purging doesn't slow things down (#4034).

The "mixed" loadtest does have hardcoded JWTs, and the perf looks more or less maintained (or slightly better).

@mkleczek mkleczek marked this pull request as ready for review May 16, 2025 09:09
@mkleczek mkleczek changed the title feat: (Draft) JWT cache implementation based on sieve algorithm feat: JWT cache implementation based on sieve algorithm May 16, 2025
@mkleczek
Contributor Author

@steve-chavez @wolfgangwalther @taimoorzaeem

I am not sure what to do with the outstanding test coverage issues. They are about delete and deleteIO not being used - and indeed, right now there is no explicit cache entry removal anywhere. These functions are meant to be used in case of claims validation errors.

The question is: how to handle invalid JWTs? There are several options:

  1. Do not cache invalid JWTs at all
  2. Do not cache JWTs that cannot be parsed or have an invalid signature. Cache all other JWTs, including JWTs containing invalid claims.
  3. Cache all token parsing and signature validation results - this is the currently implemented option (i.e. the type of the values in the cache is Either Error JSON.Object)
  4. Cache all tokens that can be parsed (but possibly having an invalid signature).

I've decided option 3 makes sense as it also speeds up repeated invalid-token requests. OTOH, it opens up the possibility of cache-filling attacks using randomly generated tokens. It is disputable whether that is a bigger problem than overloading the CPU with e.g. JWTs signed with a random key.

It looks to me like option 4 would be the best compromise because it would prevent filling up the cache with random garbage but would still offload signature validation. I haven't implemented it as I don't know how to split parsing and signature validation with JOSE.

WDYT?

@wolfgangwalther
Member

It looks to me like option 4 would be the best compromise because it would prevent filling up the cache with random garbage but would still offload signature validation. I haven't implemented it as I don't know how to split parsing and signature validation with JOSE.

Parsing a JWT is really simple. Split on ., decode the first and second parts as base64 - the result is two JSON objects, one the header, the other the payload.
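A rough sketch of that, assuming the base64-bytestring and aeson packages (this is not the code used in the PR, and the function name is made up):

module UnverifiedJwt (unverifiedClaims) where

import qualified Data.Aeson as JSON
import qualified Data.ByteString.Base64.URL as B64
import qualified Data.ByteString.Char8 as BS

-- Split a compact JWT on '.', base64url-decode the second segment and parse it as
-- a JSON object. No signature check is performed here.
unverifiedClaims :: BS.ByteString -> Maybe JSON.Object
unverifiedClaims token =
  case BS.split '.' token of
    [_header, payload, _signature] -> JSON.decodeStrict (B64.decodeLenient payload)
    _                              -> Nothing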

The parser that is used for jose-jwt is here: https://hackage.haskell.org/package/jose-jwt-0.10.0/docs/src/Jose.Internal.Parser.html#jwt - it's internal, so I don't think you can use it.

But do we really need to? Parsing a JWT is really simple - but creating parseable tokens with invalid signatures is just as easy. So I don't really see the additional value of option 4 vs option 3. If somebody wants to fill the cache, they can do so easily.

@mkleczek
Contributor Author

mkleczek commented May 16, 2025

Very interesting! 👀 👀

It also provides usage stats (ie. hit ratio, evictions count, size)

Maybe we could expose those as metrics.

@steve-chavez

Done. Three new counters are provided:

  • total number of cache lookups
  • total number of cache hits
  • total number of cache evictions

@mkleczek
Contributor Author

mkleczek commented May 17, 2025

But do we really need to? Parsing a JWT is really simple - but creating parseable tokens with invalid signatures is just as easy. So I don't really see the additional value of option 4 vs option 3. If somebody wants to fill the cache, they can do so easily.

Indeed - any random bytes are a signature that consumes CPU to validate.

So that leaves us with the choice between caching negative results or caching only valid JWTs.

I've made the cache implementation polymorphic over the value-computation monad so it is easy to change strategies. The latest commits introduce the possibility of selecting one of two variants in PostgREST.Auth.JwtCache - it is a matter of changing cachingErrors to notCachingErrors in newJwtCache. For now I've left both variants and disabled the unused-local-bindings warning to quiet the linter. Once we decide which one is better we can remove the other (or we can decide to make it configurable and keep both).
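To illustrate the difference between the two variants, here is a minimal sketch in which a plain IORef-backed map stands in for the real stm/stm-hamt cache; only the names cachingErrors and notCachingErrors are taken from this PR, everything else is made up:

import           Data.IORef
import qualified Data.Map.Strict as Map

type Cache k v = IORef (Map.Map k v)

-- Look the key up; on a miss run the computation and store whatever it returns.
fetch :: Ord k => Cache k v -> k -> IO v -> IO v
fetch ref k compute = do
  m <- readIORef ref
  case Map.lookup k m of
    Just v  -> pure v
    Nothing -> do
      v <- compute
      modifyIORef' ref (Map.insert k v)
      pure v

-- cachingErrors: the cached value type is Either, so failed validations are cached too.
cachingErrors :: Ord k => Cache k (Either e claims) -> k -> IO (Either e claims) -> IO (Either e claims)
cachingErrors = fetch

-- notCachingErrors: only successful validations are inserted, so an invalid JWT is
-- re-validated on every request.
notCachingErrors :: Ord k => Cache k claims -> k -> IO (Either e claims) -> IO (Either e claims)
notCachingErrors ref k compute = do
  m <- readIORef ref
  case Map.lookup k m of
    Just v  -> pure (Right v)
    Nothing -> do
      r <- compute
      case r of
        Left err -> pure (Left err)
        Right v  -> modifyIORef' ref (Map.insert k v) >> pure (Right v)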

@wolfgangwalther
Member

So that leaves us with the choice between caching negative results or caching only valid JWTs.

I wonder whether we can cache all valid JWTs, including expired ones? I.e. when validation fails because of expiry, still cache; when validation fails because of something else, don't.

The assumption is, that:

  • In regular operation the only real failure case that can happen regularly is expiry. We should stay fast for that.
  • If somebody wants to overload the system, they will find ways - or put differently: Just because we cache or don't cache invalid JWTs, attacks are not impossible. You'll need a different layer of protection against that anyway.

@mkleczek
Contributor Author

So that leaves us with the choice between caching negative results or caching only valid JWTs.

I wonder whether we can cache all valid JWTs, but including expired ones? Aka when validation fails because of expiry, still cache. When validation fails because of something else, don't.

That’s exactly option 2, which is currently implemented as the notCachingErrors variant (switchable in JwtCache.init).

As described in the PR description (first comment), we don’t cache AuthResults but the raw parsed claims, which are re-validated for each request.

@wolfgangwalther
Member

That’s exactly option 2.

Well, not exactly, but close enough, I agree. We'd still cache those that fail, for example, an audience check or so. But that's totally fine, yes.

@mkleczek
Contributor Author

mkleczek commented May 17, 2025

That’s exactly option 2.

Well, not exactly, but close enough, I agree. We'd still cache those that fail, for example, an audience check or so. But that's totally fine, yes.

Right - it is not exactly the same.

The reasons I decided to leave claims checking until after the cache lookup are two-fold:

  1. Time-sensitive claims validation depends on, well... time :) Not only exp but also for example nbf - a token might become valid in the future.
  2. On the other hand, aud validation is configuration-sensitive, and I wanted to minimize the number of configuration options which require a whole-cache reset when changed.

And, of course, claims checking is very fast, so there is little sense in caching it.
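A hypothetical sketch of what that post-lookup check could look like (this is not the PR's parseClaims; it assumes aeson >= 2 for Data.Aeson.KeyMap, treats aud as a plain string and ignores clock leeway, purely for illustration):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as JSON
import qualified Data.Aeson.KeyMap as KM
import           Data.Scientific (Scientific)
import           Data.Text (Text)

-- Re-check the time-sensitive (exp/nbf) and configuration-sensitive (aud) claims on
-- every request, after the cache lookup; "now" is the current time in epoch seconds.
checkClaims :: Maybe Text -> Scientific -> JSON.Object -> Either Text JSON.Object
checkClaims configAud now claims
  | Just (JSON.Number t) <- KM.lookup "exp" claims, t <= now = Left "JWT expired"
  | Just (JSON.Number t) <- KM.lookup "nbf" claims, t > now  = Left "JWT not yet valid"
  | Just aud <- configAud
  , Just (JSON.String tokenAud) <- KM.lookup "aud" claims
  , aud /= tokenAud                                          = Left "JWT audience mismatch"
  | otherwise                                                = Right claims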

@wolfgangwalther
Member

I wanted to minimize the number of configuration options which require whole cache reset when changed.

Yes, this makes a lot of sense!

So the only change that requires a reset is the change of secret, right?

@mkleczek
Contributor Author

mkleczek commented May 17, 2025

I wanted to minimize the number of configuration options which require whole cache reset when changed.

Yes, this makes a lot of sense!

So the only change that requires a reset is the change of secret, right?

That, and - of course - turning off caching altogether (i.e. setting jwt-cache-max-size=0).

Changes:

1. Refactoring and some cleanup of JWT handling code:
* Moved JWT parsing and validation to a separate module Auth.JWT
* Split JWT decoding, parsing and signature validation, claims validation into separate functions
* Instead of caching AuthResult, cache the decoded claims (whose signature has been verified); claims validation and role determination are done after the cache lookup
* Cleaned up the API so that using it is simple: lookupJwtCache cache key >>= parseClaims configJwtAud time
* Handling of JwtCacheState initialization and updates of configuration is encapsulated in Auth.JwtCache module

2. A generic, high-performance, (hopefully) scalable, dynamically resizable cache implementation based on stm, stm-hamt and the SIEVE algorithm. It also integrates with the PostgREST measurements infrastructure, providing usage stats (i.e. hit ratio, eviction count)
@mkleczek
Contributor Author

@steve-chavez @wolfgangwalther @taimoorzaeem

After some more work on this PR I think it is now in a mergeable state (pending documentation changes, changelog adjustments etc. - and of course code review).

I am pretty confident it is working fine as I've added JWT cache behavior tests that verify hits/misses and evictions using metrics.

Please, let me know if there are any adjustments / changes required (or if you think the whole idea is wrong).

Comment on lines 292 to 293
-- <*> (fromMaybe 0 <$> optInt "jwt-cache-max-lifetime")
<*> (fromMaybe 0 <$> optInt "jwt-cache-max-size")
Collaborator

The size in there seems a bit confusing as it may be confused with size in memory. How about renaming the config to jwt-cache-max-entries? It seems more appropriate, no?

Contributor Author

The size in there seems a bit confusing as it may be confused with size in memory. How about renaming the config to jwt-cache-max-entries? It seems more appropriate, no?

Fine for me. @steve-chavez WDYT?

Member

Yes, agree with jwt-cache-max-entries.

Member

Looked at other implementations; it looks like "size" is used to mean max entries:

Max Entries in JWT Cache - org.forgerock.agents.jwt.cache.size

Not to say that we cannot be explicit in our config name. As an alternative, we could also do jwt-cache-max-length or maybe just jwt-cache-max?

Comment on lines +31 to +37
let auth = genToken [json|{"exp": 9999999999, "role": "postgrest_test_author", "id": "jdoe1"}|]

expectCounters
  [
    requests (+ 1)
  , hits (+ 0)
  ] $
Member

This is a really elegant way to test the metrics 💯 So easy to read!

Comment on lines +108 to +116
request methodGet "/authors_only" [jwt1] ""
*> request methodGet "/authors_only" [jwt2] ""
-- this one should hit the cache
*> request methodGet "/authors_only" [jwt1] ""
-- this one should trigger eviction of jwt2 (not FIFO)
*> request methodGet "/authors_only" [jwt3] ""
-- these two should hit the cache
*> request methodGet "/authors_only" [jwt1] ""
*> request methodGet "/authors_only" [jwt3] ""
Member

An idea to better test the contents of the JWT cache would be to add a /jwtcache endpoint to the Admin Server. That could print all the cached JWTs in order, possibly decoded.

Not to say it should be done in this PR, can be separate.

Comment on lines +206 to +207
|## Enables JWT Cache and sets its max size, disables caching with 0
|# jwt-cache-max-size = 0
Member

I think we should make the default 1000. That means the cache will be enabled by default for next major.

Member

1000 is a rough estimation, mentioned before on #3802 (comment)

Contributor Author

I think we should make the default 1000. That means the cache will be enabled by default for next major.

#4084 (comment)

, stm >= 2.5 && < 3
, stm-hamt >= 1.2 && < 2
, focus >= 1.0 && < 2
, some >= 1.0.4.1 && < 2
Member

Any concerns with the runtime overhead of some?

However, due to GHC issue #1965, the direct implementation of this datastructure is less efficient than it could be. As a result, this library uses a more complex approach that implements it as a newtype, so there's no runtime overhead associated with wrapping and unwrapping Some values.
https://github.com/haskellari/some

Is the dependency really needed?

Contributor Author

The implementation is based on a doubly linked list: each node has two TVars pointing to the previous and next nodes. To simplify handling of an empty cache there is a single empty "head" node that has a different type than the "entry" nodes that are put in the hash map.
Both node types are defined as two constructors of a GADT along the lines of:

data ListNode (k :: Bool) where
  Head :: ListNode 'False
  Entry :: ListNode 'True

(The above is a simplification to illustrate the point)

So the TVars point either to an "entry" node or to "head" - to do that we need to use existentials and hence the TVars are of type TVar (Some ListNode).

We could define our own data type data OurSome f = forall (k :: Bool). OurSome (f k) but I found this library that simplifies this, provides some helpers and promises to use a newtype instead.
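A hedged expansion of that simplification, just to show why the prev/next TVars need Some (the field layout and names here are guesses, not the PR's actual definitions):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
import Control.Concurrent.STM (TVar)
import Data.Some (Some)

-- prev/next pointers may reference either node shape, hence TVar (Some (ListNode k v)).
data ListNode k v (entry :: Bool) where
  Head  :: TVar (Some (ListNode k v))   -- first entry (or Head itself when empty)
        -> TVar (Some (ListNode k v))   -- last entry
        -> ListNode k v 'False
  Entry :: k
        -> TVar v
        -> TVar (Some (ListNode k v))   -- previous node
        -> TVar (Some (ListNode k v))   -- next node
        -> ListNode k v 'True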

@steve-chavez
Member

steve-chavez commented May 28, 2025

Done. Three new counters are provided:
total number of cache lookups
total number of cache hits
total number of cache evictions

@mkleczek Awesome! How about a gauge for the number of cached JWTs? I'm currently load-testing the feature and that would help me ensure the cache size is maxed out and that it goes down. This might be good for tests too?


I noticed that the previous JWT purge had a memory usage problem (#3889 (comment)) and this is now gone 🚀

I've recorded a video using postgrest-benchmark's OPTIONSUniqueJWT.js (this has the same logic as our jwt loadtest, but on dedicated hardware and using k6 instead of vegeta). Here you can see the memory reaches ~20% tops and then goes down to ~12%. Note: PostgREST is on a t3a.nano, so only has 0.5 GB.

Screencast.from.05-27-2025.09.56.23.PM.webm

Sharing the run results here for completeness:

     data_received..................: 51 MB  1.6 MB/s
     data_sent......................: 150 MB 4.8 MB/s
     http_req_blocked...............: avg=3.28µs  min=950ns    med=2.5µs    max=4.61ms  p(90)=3.17µs  p(95)=3.66µs 
     http_req_connecting............: avg=370ns   min=0s       med=0s       max=4.25ms  p(90)=0s      p(95)=0s     
   ✓ http_req_duration..............: avg=1.05ms  min=317.36µs med=961.44µs max=15.6ms  p(90)=1.55ms  p(95)=1.86ms 
       { expected_response:true }...: avg=1.05ms  min=317.36µs med=961.44µs max=15.6ms  p(90)=1.55ms  p(95)=1.86ms 
   ✓ http_req_failed................: 0.00%  ✓ 0           ✗ 254542
     http_req_receiving.............: avg=32.9µs  min=11.93µs  med=29.64µs  max=7.91ms  p(90)=45.02µs p(95)=49.7µs 
     http_req_sending...............: avg=16.81µs min=7.42µs   med=14.05µs  max=4.42ms  p(90)=28.89µs p(95)=32.53µs
     http_req_tls_handshaking.......: avg=0s      min=0s       med=0s       max=0s      p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=1ms     min=279.8µs  med=912.3µs  max=15.55ms p(90)=1.5ms   p(95)=1.8ms  
     http_reqs......................: 254542 8156.901769/s
     iteration_duration.............: avg=1.16ms  min=434.1µs  med=1.06ms   max=1.1s    p(90)=1.66ms  p(95)=1.97ms 
     iterations.....................: 254542 8156.901769/s
     vus............................: 10     min=0         max=10  
     vus_max........................: 10     min=10        max=10  

Also the metrics after the run (counters are high because I did some previous runs):

curl localhost:3001/metrics
# HELP pgrst_jwt_cache_evictions_total The total number of JWT cache evictions
# TYPE pgrst_jwt_cache_evictions_total counter
pgrst_jwt_cache_evictions_total 184905.0
# HELP pgrst_jwt_cache_hits_total The total number of JWT cache hits
# TYPE pgrst_jwt_cache_hits_total counter
pgrst_jwt_cache_hits_total 2817.0
# HELP pgrst_jwt_cache_requests_total The total number of JWT cache lookups
# TYPE pgrst_jwt_cache_requests_total counter
pgrst_jwt_cache_requests_total 188758.0

@mkleczek I've noticed that the jwt loadtest results show a perf drop compared to main and latest version. Is that expected?

@mkleczek
Contributor Author

mkleczek commented May 28, 2025

@mkleczek I've noticed that the jwt loadtest results show a perf drop compared to main and latest version. Is that expected?

Our load test is the worst possible case for this (and, I would say, any bounded) cache: all JWTs are different so there are no cache hits, but:

  • two additional hash map lookups (one for the cache miss, another to put the entry into the cache)
  • if the cache is smaller than the number of requests then we pay the price of eviction and garbage collection
  • updating metrics

In other words - cache thrashing at its best :)

What's more: in case of symmetric JWT keys I don't think cache lookup is faster than simply performing JWT verification.

@wolfgangwalther
Member

What's more: in case of symmetric JWT keys I don't think cache lookup is faster than simply performing JWT verification.

This is also important for the default of enabling the cache or not. Can we default to enabling the cache only for asymmetric keys?

@mkleczek
Contributor Author

What's more: in case of symmetric JWT keys I don't think cache lookup is faster than simply performing JWT verification.

This is also important for the default of enabling the cache or not. Can we default to enabling the cache only for asymmetric keys?

Hmm... Not a bad idea - sensible self-configuration is much better than forcing the user to make decisions. On the other hand we don't have any cache sizing self-tuning, so the user has to configure this anyway.
It might be confusing to keep the cache turned off even though the size is configured to be > 0.

@mkleczek
Contributor Author

@steve-chavez @wolfgangwalther

Looks like combining the default cache size from #4084 (comment) with the cache auto-configuration idea from #4084 (comment) allows us to have a pretty good user experience:

  1. If max size is not configured, perform auto-configuration that turns on a JWT cache of size 1000, but only if asymmetric keys are configured.
  2. Max size of <= 0 disables caching regardless of the configured keys.
  3. Max size > 0 forces caching regardless of the configured keys.

Implemented in c6e73b1
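A tiny sketch of that decision logic (names are hypothetical; the actual code in the commit will differ):

data JwtCacheDecision = CacheDisabled | CacheEnabled Int deriving (Eq, Show)

decideJwtCache
  :: Maybe Int  -- configured jwt-cache-max-size, Nothing if not set
  -> Bool       -- True when the configured JWT key is asymmetric
  -> JwtCacheDecision
decideJwtCache Nothing  asymmetric = if asymmetric then CacheEnabled 1000 else CacheDisabled
decideJwtCache (Just n) _
  | n <= 0    = CacheDisabled   -- an explicit <= 0 always disables caching
  | otherwise = CacheEnabled n  -- an explicit > 0 always enables it, even with symmetric keys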

@mkleczek
Contributor Author

What's more: in case of symmetric JWT keys I don't think cache lookup is faster than simply performing JWT verification.

This is also important for the default of enabling the cache or not. Can we default to enabling the cache only for asymmetric keys?

#4084 (comment)

@wolfgangwalther
Member

Our load test is the worst possible case for this (and I would say any bounded) cache: all JWTs are different so no caching

With the new default, we should change the JWT load test to use asymmetric keys - and then see how much perf we lose in that worst case.

@mkleczek
Contributor Author

@mkleczek Awesome! How about a gauge for number of cached JWTs? I'm currently load testing the feature and that would help me ensure the cache size is maxed and that it goes down. This might be good for tests too?

I thought about it but decided not to implement this and the reasoning is:

  • the cache is bounded and we aim for it to always be full (but big enough to have a high hit rate) - so the actual number of entries is not really that interesting in this context
  • If no cache resets were done, you can calculate the size with the formula: requests - hits - evictions
  • updating yet another metric (one of questionable value) would eat up additional cycles
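For illustration with made-up numbers: after 10000 lookups with 7000 hits and 2000 evictions, the cache would currently hold 10000 - 7000 - 2000 = 1000 entries.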

I noticed that the previous JWT purge had a memory usage problem (#3889 (comment)) and this is now gone 🚀

I've recorded a video using postgrest-benchmark's OPTIONSUniqueJWT.js (this has the same logic as our jwt loadtest, but on dedicated hardware and using k6 instead of vegeta). Here you can see the memory reaches ~20% tops and then goes down to ~12%. Note: PostgREST is on a t3a.nano, so only has 0.5 GB.

This is somewhat surprising - we do not remove from the cache explicitly right now, so it should always be full (and hence occupy constant memory). Maybe what you see is GC effects?

@steve-chavez
Member

updating yet another metric (and of questionable value) would eat up additional cycles

Cool. Fair enough.

This is somewhat surprising - we do not remove from the cache explicitly right now, so it should be always full (hence occupying constant memory). Maybe what you see is GC effects?

Right. It looks like it's indeed GC as RSS and VSZ both decrease simultaneously as seen on:

Screencast.from.05-28-2025.02.12.05.PM.webm

I detected a problem with the previous JWT cache that this PR solves: #4107.

* Changed postgrest-loadtest -k jwt to generate 1000 JWTs signed by an RSA 4096 key.

* Added a parameter --jwtcache=off to postgrest-loadtest to turn off JWT caching
@mkleczek
Contributor Author

mkleczek commented May 29, 2025

Our load test is the worst possible case for this (and I would say any bounded) cache: all JWTs are different so no caching

With the new default, we should change the JWT load test to use asymmetric keys - and then see how much perf we lose in that worst case.

@wolfgangwalther @steve-chavez @taimoorzaeem
See 4d7ccc4

I've changed the load test to use 1000 JWTs signed by an RSA 4096 key. I also added a --jwtcache=off parameter to turn off JWT caching, to compare results.

In the case of asymmetric RSA, caching gives around a 4x performance boost (all response times up to the 95th percentile are 4x lower with caching turned on).

@steve-chavez
Member

I've changed load test to use 1000 JWTs signed by RSA4096 key. Also added additional parameter --jwtcache=off to turn off JWT caching to compare results.

@mkleczek Cool! Can you split those commits into another PR for easier review/merge?

@wolfgangwalther (Member) left a comment

The commit history in the PR currently represents the steps you took while you were working on it. Now we need to get it into a reviewable and mergeable shape: we can't squash-merge, but we also can't merge it as-is and keep a clean history.

So, it would be great if you could rebase to create a nice history of self-contained commits. At least with the following:

  • refactor for the split of auth module
  • refactor in the Metrics module
  • improving the jwt loadtest as Steve mentioned
  • ... probably some more refactors that I just didn't spot, but which are independent of the main change ...
  • and finally the main change to change the JWT cache implementation

We'll need to slice this nicely, otherwise there is no chance to review this properly. I still left a few nits while skimming the code.

@@ -97,7 +97,8 @@ data AppConfig = AppConfig
, configJwtRoleClaimKey :: JSPath
, configJwtSecret :: Maybe BS.ByteString
, configJwtSecretIsBase64 :: Bool
, configJwtCacheMaxLifetime :: Int
-- , configJwtCacheMaxLifetime :: Int
Member

There are some left-overs in this file (more below)

Comment on lines -26 to +48
poolTimeouts <- register $ counter (Info "pgrst_db_pool_timeouts_total" "The total number of pool connection timeouts")
poolAvailable <- register $ gauge (Info "pgrst_db_pool_available" "Available connections in the pool")
poolWaiting <- register $ gauge (Info "pgrst_db_pool_waiting" "Requests waiting to acquire a pool connection")
poolMaxSize <- register $ gauge (Info "pgrst_db_pool_max" "Max pool connections")
schemaCacheLoads <- register $ vector "status" $ counter (Info "pgrst_schema_cache_loads_total" "The total number of times the schema cache was loaded")
schemaCacheQueryTime <- register $ gauge (Info "pgrst_schema_cache_query_time_seconds" "The query time in seconds of the last schema cache load")
setGauge poolMaxSize (fromIntegral configDbPoolSize)
pure $ MetricsState poolTimeouts poolAvailable poolWaiting poolMaxSize schemaCacheLoads schemaCacheQueryTime
metricState <- MetricsState <$>
register (counter (Info "pgrst_db_pool_timeouts_total" "The total number of pool connection timeouts")) <*>
register (gauge (Info "pgrst_db_pool_available" "Available connections in the pool")) <*>
register (gauge (Info "pgrst_db_pool_waiting" "Requests waiting to acquire a pool connection")) <*>
register (gauge (Info "pgrst_db_pool_max" "Max pool connections")) <*>
register (vector "status" $ counter (Info "pgrst_schema_cache_loads_total" "The total number of times the schema cache was loaded")) <*>
register (gauge (Info "pgrst_schema_cache_query_time_seconds" "The query time in seconds of the last schema cache load")) <*>
register (counter (Info "pgrst_jwt_cache_requests_total" "The total number of JWT cache lookups")) <*>
register (counter (Info "pgrst_jwt_cache_hits_total" "The total number of JWT cache hits")) <*>
register (counter (Info "pgrst_jwt_cache_evictions_total" "The total number of JWT cache evictions"))
setGauge (poolMaxSize metricState) (fromIntegral configDbPoolSize)
pure metricState
Member

Most of this change could be a separate refactor commit.

Member

This file should also be created in a separate refactor commit ahead of all the other changes.

@mkleczek
Contributor Author

The commit history in the PR currently represents which steps you took while you were working on it. Now, we need to get it into a reviewable and mergeable shape. That's because we can't squash merge, but also not merge it as-is for a clean history.

So, it would be great if you could rebase to create a nice history of self-contained commits. At least with the following:

  • refactor for the split of auth module

The problem with this is that currently in main we don't really have a proper interface for caching and everything is intermingled. I thought it does not make much sense to perform the refactoring while keeping the current JWT cache implementation.
Will think about how to do that without too much effort, if possible.

  • refactor in the Metrics module

+1

  • improving the jwt loadtest as Steve mentioned

+1

  • ... probably some more refactors that I just didn't spot, but which are independent of the main change ...
  • and finally the main change to change the JWT cache implementation

See above.

We'll need to slice this nicely, otherwise there is no chance to review this properly. I still left a few nits while skimming the code.

Thanks for this review.

mkleczek and others added 2 commits May 30, 2025 17:48