feature: Add Remote (HTTP(S)) Support for SQLite Databases #154
Hello!
Looks cool, but I wonder why this VFS is limited to HTTP. Why not just wrap DuckDB's regular filesystem? You could then let DuckDB handle the underlying abstraction and ensure SQLite can access files through all DuckDB-supported filesystems (e.g. compressed files, or regular files on WASM).
[Updated] @Maxxen Thanks for the excellent suggestions! I've re-implemented this based on your feedback:
- Integrated with DuckDB's filesystem and caching infrastructure
- Simplified concurrency model
- Adaptive read-ahead optimization

The implementation is much cleaner now: it properly delegates to DuckDB's existing infrastructure rather than reimplementing caching logic. Thanks again for the great foundation and suggestions!
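The comment doesn't spell out how the adaptive read-ahead works. A typical policy, shown here purely as an illustration, grows the prefetch window while reads stay sequential and resets it on a random seek. The class `AdaptiveReadAhead` and its doubling-with-cap behavior are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical read-ahead policy (not the PR's code): double the prefetch
// window on each sequential read, up to a cap; reset it on random access.
class AdaptiveReadAhead {
public:
    AdaptiveReadAhead(uint64_t min_window, uint64_t max_window)
        : min_window_(min_window), max_window_(max_window), window_(min_window) {}

    // Returns how many bytes to fetch for a read of `amount` bytes at `offset`.
    uint64_t NextFetchSize(uint64_t offset, uint64_t amount) {
        if (has_last_ && offset == next_expected_) {
            // Sequential access: widen the window, but never past the cap.
            window_ = std::min(window_ * 2, max_window_);
        } else {
            // First read or a seek elsewhere: start small again.
            window_ = min_window_;
        }
        has_last_ = true;
        next_expected_ = offset + amount;
        // Always fetch at least what the caller actually asked for.
        return std::max(amount, window_);
    }

private:
    uint64_t min_window_;
    uint64_t max_window_;
    uint64_t window_;
    uint64_t next_expected_ = 0;
    bool has_last_ = false;
};
```

Bytes fetched beyond the requested range would be served from the cache on the following sequential reads, so the policy only needs to track where the next read is expected to start.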
This is very cool! Does it work with hosted SQLite solutions like Turso?
Thank you @Alex-Monahan! I wasn't familiar with Turso, but it seems to implement a custom wire format rather than exposing the underlying database file over HTTP, so regrettably it likely wouldn't work.
Also adds yum support for sqlite3 installation in CI.
Opening SQLite connections for remote files can trigger HTTP requests while the MetaTransaction lock is held, potentially causing deadlocks. This change defers connection initialization until first use, with thread-safe lazy initialization using atomic flags and per-database mutexes.
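The lazy initialization described above is essentially double-checked locking. The following is a minimal sketch, assuming a hypothetical `Connection` stand-in and a caller-supplied opener function. It does not model DuckDB's MetaTransaction lock; it only shows how the expensive, possibly HTTP-bound open can be moved from construction time to the first `Get()` call.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an open SQLite connection; real code would
// wrap a sqlite3* handle.
struct Connection {
    std::string path;
};

// Defers opening until first use. The fast path reads only an atomic flag;
// the slow path takes a per-database mutex so at most one thread opens.
class LazyConnection {
public:
    using Opener = std::function<std::unique_ptr<Connection>()>;

    explicit LazyConnection(Opener opener) : opener_(std::move(opener)) {}

    Connection &Get() {
        // Acquire pairs with the release store below: a thread that sees
        // `true` also sees the fully constructed connection.
        if (!initialized_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard(init_mutex_);
            // Re-check under the lock: another thread may have opened it.
            if (!initialized_.load(std::memory_order_relaxed)) {
                // If the opener throws, the flag stays false and a later
                // call retries.
                connection_ = opener_();
                initialized_.store(true, std::memory_order_release);
            }
        }
        return *connection_;
    }

private:
    Opener opener_;
    std::unique_ptr<Connection> connection_;
    std::atomic<bool> initialized_{false};
    std::mutex init_mutex_;  // one mutex per database, not a global lock
};
```

Keeping the mutex per database means a slow remote open only blocks threads that need that particular database.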
This commit adds support for querying remote SQLite databases over HTTP/HTTPS using the sqlite_scan() function. Remote files are accessed through a custom VFS with caching support.

Implementation details:
- Add OpenWithVFS() to handle remote SQLite databases via the custom VFS
- Implement helper methods for clean separation of local vs. remote handling
- Register a VFS cleanup callback to properly clean up on connection close
- Add comprehensive error handling with HTTP status code mapping
- Pass ClientContext through the scanner to enable VFS functionality

The implementation uses DuckDB's CachingFileSystem for efficient block-level caching of remote file data, minimizing network requests when accessing SQLite databases over HTTP.
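To illustrate how a VFS read can sit on top of block-level caching, here is a toy sketch. It replaces DuckDB's CachingFileSystem with a hypothetical in-memory `BlockCache` whose fetch callback stands in for an HTTP range request, and the hypothetical `ReadAt` mimics the shape of a SQLite `xRead` method. Zero-filling the buffer on a short read follows the documented SQLite VFS contract; the result-code constants mirror the values in sqlite3.h but are redefined here so the example compiles without it.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Values mirror sqlite3.h: SQLITE_OK and SQLITE_IOERR_SHORT_READ.
constexpr int kSqliteOk = 0;
constexpr int kSqliteIoErrShortRead = 522;

// Hypothetical block cache: fetches fixed-size blocks through a callback
// (e.g. an HTTP range request) and keeps them in memory. A block can be
// shorter than block_size (or empty) at end of file.
class BlockCache {
public:
    using Fetcher = std::function<std::string(uint64_t offset, uint64_t length)>;

    BlockCache(uint64_t block_size, Fetcher fetch)
        : block_size_(block_size), fetch_(std::move(fetch)) {}

    const std::string &GetBlock(uint64_t index) {
        auto it = blocks_.find(index);
        if (it == blocks_.end()) {
            ++fetch_count_;  // cache miss: one remote request per block
            it = blocks_.emplace(index, fetch_(index * block_size_, block_size_)).first;
        }
        return it->second;
    }

    uint64_t block_size() const { return block_size_; }
    int fetch_count() const { return fetch_count_; }

private:
    uint64_t block_size_;
    Fetcher fetch_;
    std::unordered_map<uint64_t, std::string> blocks_;
    int fetch_count_ = 0;
};

// xRead-style read of `amount` bytes at `offset` into `out`. On a short
// read, the unread tail is zero-filled and SQLITE_IOERR_SHORT_READ returned.
int ReadAt(BlockCache &cache, void *out, int amount, int64_t offset) {
    auto *dst = static_cast<char *>(out);
    uint64_t pos = static_cast<uint64_t>(offset);
    uint64_t remaining = static_cast<uint64_t>(amount);
    const uint64_t bs = cache.block_size();
    while (remaining > 0) {
        const std::string &block = cache.GetBlock(pos / bs);
        const uint64_t in_block = pos % bs;
        if (in_block >= block.size()) {
            break;  // reached end of file
        }
        const uint64_t n = std::min<uint64_t>(remaining, block.size() - in_block);
        std::memcpy(dst, block.data() + in_block, n);
        dst += n;
        pos += n;
        remaining -= n;
    }
    if (remaining > 0) {
        std::memset(dst, 0, remaining);
        return kSqliteIoErrShortRead;
    }
    return kSqliteOk;
}
```

Reads that fall inside an already fetched block are served without another request, which is the point of caching at block rather than request granularity.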
This commit completes the HTTP SQLite support by enabling ATTACH functionality for remote databases and improving error handling throughout the codebase.

Key changes:
- Replace generic std::runtime_error with specific DuckDB exception types (BinderException, ConnectionException, IOException, InternalException)
- Update GetSQLiteTransaction to handle missing transactions gracefully
- Add validation for busy_timeout in SQLiteAttach to prevent overflow
- Implement proper move semantics using swap
- Add Copy/Equals methods to AttachFunctionData for proper state management
- Enable ATTACH functionality for remote SQLite databases
- Defer transaction start to prevent deadlocks with remote file access
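As a sketch of the busy_timeout overflow check: sqlite3_busy_timeout() takes its timeout as an int in milliseconds, so a wider option value has to be range-checked before narrowing. The function name is hypothetical, std::out_of_range stands in for the BinderException mentioned in the commit, and rejecting negative values is an assumption of this sketch (SQLite itself treats a non-positive timeout as turning the busy handler off).

```cpp
#include <climits>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: range-check a 64-bit busy_timeout option before
// narrowing it to the int that sqlite3_busy_timeout() expects.
int ValidateBusyTimeout(int64_t requested_ms) {
    // Assumption: negative values are rejected rather than passed through.
    if (requested_ms < 0 || requested_ms > static_cast<int64_t>(INT_MAX)) {
        // Stand-in for the BinderException named in the commit message.
        throw std::out_of_range("busy_timeout must be between 0 and " +
                                std::to_string(INT_MAX) + " ms, got " +
                                std::to_string(requested_ms));
    }
    return static_cast<int>(requested_ms);
}
```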
I've reorganized this PR into 5 logical commits to make review easier. Each commit is self-contained, compiles, and passes all tests:
Commit 1: Add ClientContext parameter to SQLiteDB::Open methods
Commit 2: Fix potential deadlock when opening remote SQLite databases
Commit 3: Add SQLite VFS implementation for remote file support
Commit 4: Enable HTTP/HTTPS SQLite database access via sqlite_scan
Commit 5: Add HTTP ATTACH support and improve error handling
Let me know if you'd like me to adjust the commit structure, or if you have any questions or feedback about specific changes.
Remote (HTTP(S)) Support for SQLite Databases
This PR adds support for querying remote SQLite databases over HTTP/HTTPS to the SQLite scanner extension, leveraging DuckDB's CachingFileSystem.
Features
- Support for both sqlite_scan() and ATTACH syntax
Implementation
Usage
Related Issues: #39, #141