Is your feature request related to a problem? Please describe.
pw.io.mssql.read does not persist its position in any mode. When persistence is enabled, the connector immediately aborts with an error stating that persistence is not supported. Without persistence, on restart it re-reads everything from the beginning.
Describe the solution you'd like
Use the position in the MS SQL Server transaction log, as exposed by Change Data Capture (CDC), as the persistence offset. Concretely, store the last successfully processed LSN (Log Sequence Number) in Pathway's persistence layer after each committed batch. On restart, resume CDC replication from the first change after that LSN.
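A minimal sketch of the checkpoint-and-resume behaviour described above, not the connector's actual implementation. `OffsetStore` and `replay` are hypothetical names standing in for Pathway's persistence layer and the connector's read loop. The sketch assumes changes arrive as `(lsn, row)` pairs sorted by LSN, and it treats all rows sharing one LSN as a single batch, because rows changed in the same transaction share a commit LSN and a checkpoint in the middle of them could not be resumed correctly by LSN alone.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Optional, Tuple

LSN_SIZE = 10  # SQL Server LSNs are binary(10) values


@dataclass
class OffsetStore:
    """Hypothetical stand-in for the persistence layer: holds one LSN."""

    saved_lsn: Optional[bytes] = None

    def commit(self, lsn: bytes) -> None:
        if len(lsn) != LSN_SIZE:
            raise ValueError(f"expected a {LSN_SIZE}-byte LSN, got {len(lsn)} bytes")
        self.saved_lsn = lsn


def replay(changes: Iterable[Tuple[bytes, str]], store: OffsetStore) -> List[str]:
    """Deliver rows newer than the saved LSN, checkpointing after each batch."""
    delivered: List[str] = []
    for lsn, group in groupby(changes, key=itemgetter(0)):
        # Everything at or below the saved LSN was delivered before the restart.
        if store.saved_lsn is not None and lsn <= store.saved_lsn:
            continue
        delivered.extend(row for _, row in group)
        # Persist the offset only after the whole batch has been handed over.
        store.commit(lsn)
    return delivered
```

Calling `replay` a second time with the same `store` and a longer change list delivers only the rows whose LSN is greater than the stored one, which is the incremental behaviour a restart should have.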
If the saved LSN is no longer available on the server — because the CDC log retention window has expired and the corresponding log records have been purged — the connector must raise a descriptive error at startup, for example:
"Saved CDC offset (LSN ...) is no longer available on the MS SQL Server. The CDC log may have been purged. Manual re-snapshot or persistence reset is required."
Silently falling back to reading from the start or from the current position would produce incorrect results (duplicate emission or data loss respectively) and must not be done. The error path should have a code comment referencing this issue.
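The startup check could look roughly like the sketch below. `CdcOffsetUnavailableError` and `validate_saved_lsn` are hypothetical names, and the sketch assumes the caller has already fetched the oldest retained LSN for the capture instance, for example via SQL Server's `sys.fn_cdc_get_min_lsn`. The comparison is intentionally conservative: any saved LSN below the low watermark is rejected, since changes between the two positions may already have been purged.

```python
LSN_SIZE = 10  # SQL Server LSNs are binary(10) values


class CdcOffsetUnavailableError(RuntimeError):
    """Raised when the persisted LSN precedes the oldest LSN still retained."""


def validate_saved_lsn(saved_lsn: bytes, min_available_lsn: bytes) -> None:
    """Raise if the saved offset can no longer be resumed from."""
    if len(saved_lsn) != LSN_SIZE or len(min_available_lsn) != LSN_SIZE:
        raise ValueError(f"LSNs must be exactly {LSN_SIZE} bytes long")
    # Equal-length byte strings compare byte by byte, left to right,
    # which matches the ordering of binary(10) LSN values.
    if saved_lsn < min_available_lsn:
        # Deliberately no fallback here: restarting from the beginning would
        # emit duplicates, and restarting from the current position would
        # lose data. See the issue that requested this behaviour.
        raise CdcOffsetUnavailableError(
            f"Saved CDC offset (LSN 0x{saved_lsn.hex()}) is no longer available "
            "on the MS SQL Server. The CDC log may have been purged. "
            "Manual re-snapshot or persistence reset is required."
        )
```

Running this check once at startup, before any rows are read, keeps the failure early and explicit instead of surfacing later as a gap in the output.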
Describe alternatives you've considered
Storing a row-level timestamp or a row count as the offset is unreliable: timestamps are not guaranteed to be unique or monotonic across transactions, and row counts do not survive schema changes or deletes. The LSN is the only correct and stable resumption point for CDC-based connectors, and is the same approach used by pw.io.mongodb.read.
Additional context
MS SQL Server's CDC retention is controlled by a SQL Server Agent cleanup job (default retention: 3 days). There is no equivalent of a replication slot that holds the log on behalf of the consumer, so the server does not guarantee log availability for slow or offline consumers. This is why a missing LSN must produce a hard error rather than any silent fallback.
Testing should extend the existing MS SQL Server integration test suite with: restart/resume tests that verify correct incremental delivery after a checkpoint, and a test that verifies the descriptive error is raised when the connector is restarted with an LSN that has been purged from the CDC tables.