Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically delete shape(s) when a table is dropped #1733

Open
kevin-dp opened this issue Sep 17, 2024 · 3 comments
Open

Automatically delete shape(s) when a table is dropped #1733

kevin-dp opened this issue Sep 17, 2024 · 3 comments

Comments

@kevin-dp
Copy link
Contributor

When a table is dropped from Postgres, users currently need to manually delete all shapes that are defined on that table. This is needed to avoid any problems that would occur when recreating the shape (e.g. with a different schema or different data) as Electric would still contain outdated data for that table potentially even in the old (invalid) format.

Currently, Electric is subscribed to a Postgres publication that streams changes (DML statements) about all tables involved in one or more shapes. However, Postgres does not stream DDL statements (such as DROP TABLE), which currently prevents Electric from detecting dropped tables and hence Electric can't automatically delete the corresponding shapes.

The goal of this issue is to discuss/propose ways of detecting tables being dropped in order to automate shape deletion.

@thruflo proposed a very neat approach of using an event trigger that calls a special "purge" function when a table is dropped. That special function would write a row into a meta table, which will come through the replication stream. We could check for this special row on the replication stream to detect a table being dropped right in time. The only downside here is that event triggers require superuser access and don't work in single-user mode (e.g. which is the case of PGlite iirc).

@thruflo
Copy link
Contributor

thruflo commented Sep 17, 2024

Yup, I think the permission issue makes event triggers a bit of a non starter.

I was thinking more that we can install a user defined function and users could choose to call it in the migration where they do their drop. Because you know when you do a big drop table and we can provide a function that makes it easy to notify Electric about it.

@thruflo
Copy link
Contributor

thruflo commented Sep 17, 2024

Can we compare table identifiers (i.e.: internal table identities, not the reusable name):

a) at initial sync query time?
b) when rows come through logical replication?

I.e.: for (b) do we get a table ID in the row metadata? For (a) presumably we could validate the table still has the same internal identifier that we're expecting?

Now ... with (b) a dropped table won't yield any new rows. I don't know if there's something in this. Like could we detect a sequence having been reset somehow?

@kevin-dp
Copy link
Contributor Author

@thruflo When PG sends a Relation message on the stream it contains the relation's OID. So we could potentially detect any changes there, but.. PG only sends this relation if needed for a change it is going to stream. So, if you drop a table and recreate it, it would only send the relation when you insert a row in that new table. While that's not the case, we won't get the Relation message and we will still serve stale data. Also i doubt PG will send the relation message because i'm assuming that PG removes that table from the publication when the table is dropped (need to verify this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants