Single pool for multiple databases #806

Open

opened

on Sep 8, 2024

Context
We have a multi-tenant SaaS setup where each tenant has a database in a managed database cluster (Scaleway/DigitalOcean/...). Currently we are using a single cluster. The goal for us for using pgcat is that we can limit the amount of connections to our database cluster as the cluster has a limited amount of connections available.

Problem
As we have one cluster we essentially only need one pgcat pool, the only issue I'm bouncing against is that it is impossible for us to allow multiple different databases inside a single pool. And if I were to create separate pools for each tenant, then the implementation with pgcat has no benefit as we can't limit the connections as a while towards the database cluster. A simple example:

Max database cluster connections: 100
Tenant pools: 200
Pgcat connections to our cluster: 200

Potential solution
It would be awesome if we could allow a regex pattern for which databases should connect to a certain pool, in that way we could say that tenant-1 until tenant-50 should connect to pool 1 and tenant-51 until tenant-100 should connect to pool 2.

[general]
# ...

[pools]
[pools.postgres]
# ...

[pools.postgres.users.0]
# ...

[pools.postgres.shards.0]
servers = [["...", 5432, "primary"]]
database = "^tenant-(?:[1-9]|[1-4][0-9]|50)$"

Or is all this unneeded and am I missing something?

nealoke

Author

I just noticed that we can have a shard per tenant and that would fix the issue where we can't limit the connections overall but these are not possible to be added through the pgcat admin api which would mean that we need to manually add them to the pgcat.toml file and restart the service everytime a tenant gets created which seems like a hassle and will cause interuptions in service of tenants using the application.

magec

Collaborator

Hey, I just read the issue. I see you've already kind of found the answer. Although both things seem 'doable' there is no way of having 'dynamic db names' on pools neither adding new pool config through admin api (which is mostly for statistics and a few commands). Either way, you can leverage pgcat live configuration reloading functionality to achive what you want. The only piece left is a simple service that handles the config changes, rewrite the config using some kind of templating system and signal pgcat to reload the configuration, that would be the way I'd do it. Also, bear in mind that you have to execute SET SHARD TO 'SHARD_NUMBER'; before each query/transaction for this to work.

nealoke

Author

Thanks for the reply, we are working with Prisma so we can't really alter the SQL query in there so I'm affraid this won't be an option. If you have any other ideas that would be great, or other tools that could work as I'm not finding anything 🥲

magec

Collaborator

There is also a possibility, If you can 'annotate' with comments your query, pgcat can extract that information and do the sharding selection.

rvsoni

@nealoke

You can try the #855 fix
this would allow having a PGBouncer-type proxy behavior, and only on pool config would it allow connecting many databases; if the database is not pooled, it would be created dynamically on the fly and start serving.

Vad1mo

@nealoke @rvsoni, we have the same challenges, multiple tenants each has own DB with username/pwd, and multiple tenants share a DB cluster. We are currently looking to make it work with PgCat, PgBouncer or Supavisor. (in that order)

On top there are a few less known pooler (at least to me) odyssey
pgagroal
cf-pgbouncer

@rvsoni, why did you try to make it work with pgcat, hence your pr?

It seems there are quite a few options out there already.

rvsoni

Hello @Vad1mo

This fix works as you described: set one Pool on pgcat with Auth_quey config and make proxy = true (in my PR). This would dynamically create a pool of connections for each user/db on the same cluster.

From the client side, you need to maintain just only one pool to pgcat and keep login using your user/pass map to database (user name and database name must need to be same)

Ravi

nealoke

Author

@Vad1mo I'm not that familiar with all that is related to revers proxies and half of what the pgcat tries to do 😅. So I read your comment and also looked at the messages in the #855 and the PR but can't seem to understand how this would solve my issue or how I would even need to implement this.

I don't directly see the link to how I can map tenant-1 to tenant-50 to shard 0 for example.

rvsoni

Hello @Vad1mo, @nealoke

The most important point this discussion misses is the need for an authentication configuration on PgCat based on an auth_query.

You need to set a Pool on pgcat points to a Postgre cluster with auth method of auth_query;
on each new authentication, pgcat query postgres cluster pg_shadow table like auth_query="SELECT usename, passwd FROM pg_shadow WHERE usename='$1'.

This query returns a password from the PostgreSQL cluster for each tenant user you try to authenticate over the pgcat pool setting. This allows you to connect to many databases on one pgcat pool setting.

Refer https://github.com/rvsoni/pgcat/blob/main/pgcat.proxy.toml and https://hub.docker.com/r/rvsoni/pgcat docker image to test this dynamic connection pool creation for all tenant database,

szguoxz

So, you PR is not merged yet?
I thought pgcat is a drop-in replacement of pgbouncer. I am really suprised to find out it does not have this feature.

But I don't quite like your design, it's kindof hacky, I still like pgbouncer implementation. At least, the enduser does not need to mess around with username/databasename. What do you think?

Maybe also implement a default pool which will accept all different database names?

Calvin

Hello @Vad1mo, @nealoke

The most important point this discussion misses is the need for an authentication configuration on PgCat based on an auth_query.

You need to set a Pool on pgcat points to a Postgre cluster with auth method of auth_query; on each new authentication, pgcat query postgres cluster pg_shadow table like auth_query="SELECT usename, passwd FROM pg_shadow WHERE usename='$1'.

This query returns a password from the PostgreSQL cluster for each tenant user you try to authenticate over the pgcat pool setting. This allows you to connect to many databases on one pgcat pool setting.

Refer https://github.com/rvsoni/pgcat/blob/main/pgcat.proxy.toml and https://hub.docker.com/r/rvsoni/pgcat docker image to test this dynamic connection pool creation for all tenant database,

rvsoni

@Vad1mo

I implemented this fix in PgCat to handle opening many connections on the client application.
I am working on a Java application and using PGBouncer with an Auth_query based setup would open too many connections on the Java application too,

for example: 100 Tenant database, 10 Pool for tech tenant scenario,

The Java application connection pool would be set to 1000 connections for all 100 Active tenants. In my case, I had to support 30k to 50k active tenants. PGBouncer would also not be able to work on my case,

With PgCat, I get a single proxy database (same name as PgCat Pool name), so my Java app would need to manage the pool only with the proxy DB of the PgCat pool. Behind PgPool, I can open and manage as many databases as I require per tenant database.

PgCat uses bb8 rust lib to create a connection pool internally, https://docs.rs/bb8/latest/bb8/index.html

This fix helps me manage just a few connections on a Java application. Using the PgCat pool, I can switch to any tenant database connection on the fly. I can also use the same PgCat pool against many PostgreSQL Cluster setups, reducing my resource requirements.

With auth_query mode, PGBouncer would connect to one PostgreSQL cluster only,

Having 10 PostgreSQL clusters set up would need 10 PGBouncer,

rvsoni

@szguoxz
PGBouncer and a transparent Proxy setup, all tenant databases are directly visible to the client,

PgCat uses a proxy database (same as pool Name) and hides all the tenant databases behind it so that you can do a connection routine with pgcat for each resource requirement on the client side for opening and managing connections.

p-bakker

@nealoke I don't quite get the 'single pool for multiple databases' question: a pooler pools connections and connections in postgres are per (user &) database

So if each tenant has it's own database within a Postgres database server, then a pooler MUST have at least one pool per database (and more if you want o connect to the database with multiple users)

Sure, some poolers allow connections using app users, but then connect to a database using a shared db user, to multiple app users can share the same pool of connections to a database, but at a fundamental level, you still have a pool of connections to a single database for a single db user

The PR of @rvsoni seems to (just) be about:

dynamic pool creation, based on the details in the connection(string) from the app to the pooler, so in a multi-tenant setup you don't have to reconfigure the pooler (pgCat) everytime a new tenant is added
delegating the authentication to Postgres, so you don't have to configure all the users in the pooler either

@rvsoni is that a fair statement?

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Single pool for multiple databases #806

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Single pool for multiple databases #806

Description

Activity

nealoke commented on Sep 8, 2024

magec commented on Sep 12, 2024

nealoke commented on Sep 12, 2024

magec commented on Sep 16, 2024

rvsoni commented on Nov 17, 2024

Vad1mo commented on Jan 14, 2025

rvsoni commented on Jan 16, 2025

nealoke commented on Jan 24, 2025

rvsoni commented on Jan 25, 2025

szguoxz commented on May 5, 2025

rvsoni commented on May 19, 2025

rvsoni commented on May 19, 2025

p-bakker commented on Jul 14, 2025

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions