
feature: Add Remote (HTTP(S)) Support for SQLite Databases #154



Open: ak2k wants to merge 5 commits into main from feat/http-sqlite-support


ak2k commented Jun 5, 2025

Remote (HTTP(S)) Support for SQLite Databases

This PR adds support to the SQLite scanner extension for querying remote SQLite databases over HTTP/HTTPS, built on DuckDB's CachingFileSystem.

Features

  • Query SQLite databases directly from HTTP/HTTPS URLs without downloading the entire file
  • Works with both sqlite_scan() and ATTACH syntax
  • Adaptive read-ahead (1MB-128MB) to reduce network round trips

Implementation

  • SQLiteDuckDBCacheVFS: Custom SQLite VFS that delegates file I/O to DuckDB's CachingFileSystem
    • Integrates with DuckDB's external file cache for efficient block caching
  • DuckDBCachedFile: Wrapper around DuckDB's CachingFileHandle with adaptive read-ahead
    • Dynamically adjusts read-ahead size based on access patterns
  • Integration:
    • Leverages DuckDB's httpfs extension for HTTP client functionality
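For readers unfamiliar with SQLite VFS callbacks, here is a minimal sketch of what a read callback that delegates to another file layer has to do. `BackingFile`, `MemoryFile`, and `VfsRead` are hypothetical names, not the PR's classes, and the result codes are copied from sqlite3.h so the snippet builds without SQLite. The non-obvious rule it illustrates is SQLite's short-read contract: when fewer bytes than requested exist (for example, reading past end of file), the callback must zero-fill the rest of the buffer and return SQLITE_IOERR_SHORT_READ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Result codes with the same values as in sqlite3.h, declared locally so the
// sketch compiles without linking SQLite.
constexpr int kSqliteOk = 0;                          // SQLITE_OK
constexpr int kSqliteIoErr = 10;                      // SQLITE_IOERR
constexpr int kSqliteIoErrShortRead = 10 | (2 << 8);  // SQLITE_IOERR_SHORT_READ (522)

// Hypothetical stand-in for a DuckDB file handle: reads bytes at an absolute
// offset and reports how many were actually read.
class BackingFile {
public:
    virtual ~BackingFile() = default;
    // Returns the number of bytes read (0 at end of file), or -1 on error.
    virtual int64_t ReadAt(void *buf, int64_t len, int64_t offset) = 0;
};

// Shape of a VFS xRead callback: fill `buf` with `amount` bytes from `offset`.
int VfsRead(BackingFile &file, void *buf, int amount, int64_t offset) {
    int64_t got = file.ReadAt(buf, amount, offset);
    if (got < 0) {
        return kSqliteIoErr;
    }
    if (got < amount) {
        // SQLite requires the unread tail to be zeroed on a short read.
        std::memset(static_cast<char *>(buf) + got, 0,
                    static_cast<std::size_t>(amount - got));
        return kSqliteIoErrShortRead;
    }
    return kSqliteOk;
}

// In-memory BackingFile for exercising VfsRead without network access.
class MemoryFile : public BackingFile {
public:
    explicit MemoryFile(std::string data) : data_(std::move(data)) {}

    int64_t ReadAt(void *buf, int64_t len, int64_t offset) override {
        const int64_t size = static_cast<int64_t>(data_.size());
        if (offset < 0) return -1;
        if (offset >= size) return 0;
        const int64_t n = (len < size - offset) ? len : size - offset;
        std::memcpy(buf, data_.data() + offset, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string data_;
};
```

In a real VFS, `ReadAt` would forward to DuckDB's file handle; the zero-fill and return-code handling stays the same regardless of what sits underneath.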

Usage

-- Load required extensions
LOAD sqlite_scanner;
INSTALL httpfs;
LOAD httpfs;

-- Query remote SQLite database
SELECT * FROM sqlite_scan('https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite', 'Artist');

-- Attach as a database
ATTACH 'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite' AS remote_db (TYPE sqlite);
SELECT * FROM remote_db.Album;

Related Issues: #39, #141

Maxxen (Member) left a comment

Hello!

Looks cool, but I wonder why (or whether) this VFS is limited to HTTP. Why not just wrap DuckDB's regular filesystem? You can then let DuckDB handle the underlying abstraction and ensure SQLite can access files through all DuckDB-supported filesystems (e.g. compressed files, or regular files on WASM).

ak2k (Author) commented Jun 5, 2025

[Updated]

@Maxxen Thanks for the excellent suggestions! I've re-implemented based on your feedback.

Integrated with DuckDB's filesystem and caching infrastructure

  • Now wraps DuckDB's FileSystem API for all remote file access (HTTP/HTTPS via httpfs)
  • Uses DuckDB's CachingFileSystem, which provides and manages block-level caching

Simplified concurrency model

  • Removed per-file mutexes; file operations take no locks
  • Only uses mutex for VFS registry operations (registration/unregistration)
  • Relies on CachingFileSystem's internal synchronization for thread safety
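The locking split described above can be illustrated with a small reference-counted registry: one mutex serializes registering and unregistering the VFS, and nothing on the read path touches it. `VfsRegistry` is a hypothetical name, and the callbacks stand in for SQLite's `sqlite3_vfs_register` / `sqlite3_vfs_unregister`. Running them while holding the lock is a deliberate choice in this sketch: it keeps a second connection from using the VFS before the first one has actually finished registering it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical reference-counted registry of VFS names. The mutex guards only
// this bookkeeping; reads on an already-open file never take it.
class VfsRegistry {
public:
    // First user of `name` triggers registration; later users just add a reference.
    void Acquire(const std::string &name, const std::function<void()> &register_vfs) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = refcounts_.find(name);
        if (it == refcounts_.end()) {
            register_vfs();  // if this throws, no entry is recorded
            refcounts_.emplace(name, 1);
        } else {
            ++it->second;
        }
    }

    // Last user of `name` triggers unregistration.
    void Release(const std::string &name, const std::function<void()> &unregister_vfs) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = refcounts_.find(name);
        if (it == refcounts_.end()) {
            return;  // unbalanced release: nothing registered under this name
        }
        if (--it->second == 0) {
            unregister_vfs();
            refcounts_.erase(it);
        }
    }

    std::size_t ActiveCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        return refcounts_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, int> refcounts_;
};
```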

Adaptive read-ahead optimization

  • Implements dynamic read-ahead (1MB-128MB) to reduce network round trips
  • Adjusts read size based on sequential vs random access patterns
  • Works on top of CachingFileSystem's block caching
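One way a sequential-versus-random heuristic like this could work is a fetch window that doubles on each sequential read and snaps back to the minimum after a seek. The sketch below assumes exactly that policy; it is not the PR's DuckDBCachedFile logic, `ReadAheadPolicy` is a hypothetical name, and the bounds are the 1MB-128MB range quoted above (taken as MiB).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical read-ahead policy: double the fetch window while reads are
// sequential, reset it to the minimum after a random seek.
class ReadAheadPolicy {
public:
    static constexpr int64_t kMin = int64_t(1) << 20;    // 1 MiB
    static constexpr int64_t kMax = int64_t(128) << 20;  // 128 MiB

    // Bytes to fetch for a read of `len` bytes at `offset`. Never less than
    // `len`, so one large read is always satisfied by a single request.
    int64_t NextFetchSize(int64_t offset, int64_t len) {
        if (offset == next_expected_) {
            // Sequential: this read starts exactly where the previous one ended.
            window_ = (window_ * 2 > kMax) ? kMax : window_ * 2;
        } else {
            // Random access (or the very first read): start small again.
            window_ = kMin;
        }
        next_expected_ = offset + len;
        return len > window_ ? len : window_;
    }

private:
    int64_t window_ = kMin;
    int64_t next_expected_ = -1;  // no previous read yet
};
```

A full-table scan then ramps from 1 MiB to 128 MiB after seven sequential reads, while B-tree lookups that jump around the file keep each request at 1 MiB. A real implementation would presumably also clamp the fetch to the remaining file size.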

The implementation is much cleaner now - it properly delegates to DuckDB's existing infrastructure rather than reimplementing caching logic. Thanks again for the great foundation and suggestions!

ak2k force-pushed the feat/http-sqlite-support branch from a5fdad2 to da96c0b on June 5, 2025 14:54
Alex-Monahan commented

This is very cool! Does it work with hosted SQLite solutions like Turso?

ak2k force-pushed the feat/http-sqlite-support branch 3 times, most recently from 277539a to 49f7aba on June 5, 2025 15:42
ak2k (Author) commented Jun 6, 2025

This is very cool! Does it work with hosted SQLite solutions like Turso?

Thank you @Alex-Monahan! I wasn't familiar with Turso, but it appears to use its own wire protocol rather than exposing the raw SQLite database file over HTTP, so unfortunately it likely wouldn't work.

ak2k requested a review from Maxxen on June 6, 2025 12:19
ak2k marked this pull request as draft on June 8, 2025 19:35
ak2k force-pushed the feat/http-sqlite-support branch 2 times, most recently from 6d1c0cf to 9cc202e on June 14, 2025 04:07
ak2k marked this pull request as ready for review on June 14, 2025 04:08
ak2k force-pushed the feat/http-sqlite-support branch 5 times, most recently from 3ce1800 to 063a5bd on June 14, 2025 04:56
ak2k changed the title from "feature: Add HTTP/HTTPS support for remote SQLite databases" to "feature: Add Remote (HTTP(S)) Support for SQLite Databases" on Jun 14, 2025
ak2k force-pushed the feat/http-sqlite-support branch from 063a5bd to cfd246b on June 14, 2025 05:10
ak2k added 5 commits June 27, 2025 12:11
Also adds yum support for sqlite3 installation in CI.

Opening SQLite connections for remote files can trigger HTTP requests while
the MetaTransaction lock is held, potentially causing deadlocks. This change
defers connection initialization until first use, with thread-safe lazy
initialization using atomic flags and per-database mutexes.
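The lazy initialization described in this commit message is a double-checked pattern: an atomic flag gives a lock-free fast path once the connection exists, and a per-database mutex ensures only one thread performs the open. The sketch below assumes exactly those two ingredients; `LazyDatabase`, `Connection`, and the opener callback are hypothetical names, not the PR's types.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an open SQLite connection.
struct Connection {
    std::string path;
};

// Defers the (possibly HTTP-backed) open from construction to the first Get()
// call, so it no longer runs inside code paths that hold other locks.
class LazyDatabase {
public:
    using Opener = std::function<std::unique_ptr<Connection>(const std::string &)>;

    LazyDatabase(std::string path, Opener opener)
        : path_(std::move(path)), opener_(std::move(opener)) {}

    Connection &Get() {
        // Fast path: once the flag is set, no lock is taken. The acquire load
        // pairs with the release store below, so conn_ is fully visible.
        if (!initialized_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard(init_mutex_);  // per-database mutex
            // Re-check: another thread may have opened it while we waited.
            if (!initialized_.load(std::memory_order_relaxed)) {
                conn_ = opener_(path_);  // if this throws, the flag stays false
                initialized_.store(true, std::memory_order_release);
            }
        }
        return *conn_;
    }

    bool IsOpen() const { return initialized_.load(std::memory_order_acquire); }

private:
    std::string path_;
    Opener opener_;
    std::mutex init_mutex_;
    std::atomic<bool> initialized_{false};
    std::unique_ptr<Connection> conn_;
};
```

`std::call_once` with a `std::once_flag` would give the same guarantee with less code (and likewise retries if the opener throws); the explicit flag-plus-mutex form simply mirrors the wording of the commit message.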
This commit adds support for querying remote SQLite databases over HTTP/HTTPS
using the sqlite_scan() function. Remote files are accessed through DuckDB's
custom VFS with caching support.

Implementation details:
- Add OpenWithVFS() to handle remote SQLite databases via custom VFS
- Implement helper methods for clean separation of local vs remote handling
- Register VFS cleanup callback to properly clean up on connection close
- Add comprehensive error handling with HTTP status code mapping
- Pass ClientContext through scanner to enable VFS functionality

The implementation uses DuckDB's CachingFileSystem for efficient block-level
caching of remote file data, minimizing network requests when accessing
SQLite databases over HTTP.
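To make "HTTP status code mapping" concrete: when a remote read fails, the VFS must hand SQLite a result code and ideally keep a message that DuckDB can surface to the user. The table below is a hypothetical illustration, not the PR's actual mapping. Result-code values are copied from sqlite3.h, and `MapHttpStatus` and `MappedError` are invented names. 206 Partial Content is included because a ranged GET that succeeds returns it rather than 200.

```cpp
#include <cassert>
#include <string>

// Values copied from sqlite3.h so the sketch builds without SQLite.
constexpr int kSqliteOk = 0;         // SQLITE_OK
constexpr int kSqlitePerm = 3;       // SQLITE_PERM
constexpr int kSqliteIoErr = 10;     // SQLITE_IOERR
constexpr int kSqliteCantOpen = 14;  // SQLITE_CANTOPEN

struct MappedError {
    int code;             // SQLite result code returned from the VFS callback
    std::string message;  // human-readable detail for the DuckDB-side error
};

// Hypothetical mapping from an HTTP status to an SQLite result code.
MappedError MapHttpStatus(int status, const std::string &url) {
    const std::string http = "HTTP " + std::to_string(status);
    if (status == 200 || status == 206) {
        return {kSqliteOk, ""};  // full response or successful range response
    }
    if (status == 401 || status == 403) {
        return {kSqlitePerm, http + ": access denied for " + url};
    }
    if (status == 404 || status == 410) {
        return {kSqliteCantOpen, http + ": remote database not found at " + url};
    }
    // Anything else (e.g. 5xx) is treated as a generic I/O error.
    return {kSqliteIoErr, http + " while reading " + url};
}
```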
This commit completes the HTTP SQLite support by enabling ATTACH functionality
for remote databases and improving error handling throughout the codebase.

Key changes:
- Replace generic std::runtime_error with specific DuckDB exception types
  (BinderException, ConnectionException, IOException, InternalException)
- Update GetSQLiteTransaction to handle missing transactions gracefully
- Add validation for busy_timeout in SQLiteAttach to prevent overflow
- Implement proper move semantics using swap
- Add Copy/Equals methods to AttachFunctionData for proper state management
- Enable ATTACH functionality for remote SQLite databases
- Defer transaction start to prevent deadlocks with remote file access
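The busy_timeout validation guards a narrowing conversion: `sqlite3_busy_timeout()` takes its millisecond argument as a plain `int`, so a 64-bit option value above `INT_MAX` would not fit. A minimal sketch, assuming the option arrives as an `int64_t`; `ValidateBusyTimeout` is a hypothetical name, and `std::invalid_argument` stands in for whichever DuckDB exception type the extension actually throws. Rejecting negative values is this sketch's own choice.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical check applied before calling sqlite3_busy_timeout(), whose
// millisecond argument is a plain int.
int ValidateBusyTimeout(int64_t ms) {
    if (ms < 0 || ms > INT_MAX) {
        // A DuckDB exception type would be used in the extension;
        // std::invalid_argument keeps the sketch free of DuckDB dependencies.
        throw std::invalid_argument("busy_timeout must be between 0 and " +
                                    std::to_string(INT_MAX) + " ms, got " +
                                    std::to_string(ms));
    }
    return static_cast<int>(ms);  // safe: value is known to fit in an int
}
```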
ak2k force-pushed the feat/http-sqlite-support branch from cfd246b to 5d4e241 on June 27, 2025 22:48
ak2k (Author) commented Jun 27, 2025

I've reorganized this PR into 5 logical commits to make review easier. Each commit is self-contained, compiles, and passes all tests:

Commit 1: Add ClientContext parameter to SQLiteDB::Open methods

  • Minimal API change that gives the subsequent commits access to DuckDB's ClientContext

Commit 2: Fix potential deadlock when opening remote SQLite databases

  • Implements lazy initialization to prevent deadlocks during remote file access
  • Includes necessary validation logic for busy_timeout

Commit 3: Add SQLite VFS implementation for remote file support

  • Core VFS implementation that integrates with DuckDB's CachingFileSystem
  • Handles HTTP error mapping and adaptive read-ahead optimization
  • No user-facing changes yet

Commit 4: Enable HTTP/HTTPS SQLite database access via sqlite_scan

  • Activates HTTP/HTTPS support for sqlite_scan() function
  • Includes 7 working tests demonstrating basic functionality
  • ATTACH support intentionally deferred to next commit

Commit 5: Add HTTP ATTACH support and improve error handling

  • Completes the implementation with ATTACH functionality
  • Improves error handling throughout the PR
  • Adds remaining tests for complex queries

Let me know if you'd like me to adjust the commit structure or if you have any questions about specific changes or feedback.

3 participants