Floci Duck


A lightweight DuckDB-powered executor sidecar for Floci. It provides an HTTP interface to execute SQL queries using DuckDB, with built-in support for S3-compatible storage via the httpfs extension.

Features

  • DuckDB Integration: Leverages the high-performance DuckDB engine.
  • S3 Support: Built-in integration with S3-compatible storage.
  • Firehose Mode: Direct SQL execution for DDL/DML operations.
  • Athena Mode: Automated query result export to S3 in CSV format (similar to AWS Athena).
  • S3 Select Mode: Execute a query and return rows as JSON via /query.
  • Parquet Support: Read and write Parquet files directly from/to S3.
  • Correlation ID Tracing: Every request is tagged with a correlation ID propagated through all log lines.
  • Lightweight: Written in Rust for minimal overhead and high performance.

Getting Started

Prerequisites

  • Docker and Docker Compose (for the containerized setup)
  • A Rust toolchain with cargo (for building and running locally)

Running with Docker Compose

The easiest way to get started is using the provided docker-compose.yml, which spins up both Floci and Floci Duck.

docker-compose up --build

The executor will be available at http://localhost:3000.

Local Build and Run

cargo build --release
./target/release/floci-duck

Startup behaviour

On startup, floci-duck runs a preflight check that installs the httpfs DuckDB extension to local disk. This ensures every subsequent request can load the extension instantly without a network download.
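
Conceptually, the preflight runs DuckDB's standard extension install statement (a sketch; the exact statements the executor issues may differ):

```sql
INSTALL httpfs;  -- downloads the extension into the local extension directory
LOAD httpfs;     -- verifies the extension loads cleanly
```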

INFO floci_duck::executor: Preflight: installing httpfs extension...
INFO floci_duck::executor: Preflight: httpfs installed successfully
INFO floci_duck: Starting floci-duck sidecar on 0.0.0.0:3000

If the preflight fails (e.g. no network on first boot), a warning is logged and the extension is installed on the first request that needs it.


API Reference

GET /health

Returns 200 OK with body OK. Used for liveness checks.


POST /query

Executes a SQL query and returns the result rows as a JSON array. Useful for reading data from S3 (CSV, Parquet, JSON) or running analytical queries inline.

Request Body

Note: S3 credentials are optional if the corresponding FLOCI_DUCK_S3_* environment variables are set.

  • sql (String): The SQL query to execute.
  • s3_endpoint (String): S3-compatible endpoint (e.g., http://floci:4566).
  • s3_region (String, optional): S3 region. Defaults to FLOCI_DUCK_S3_REGION or us-east-1.
  • s3_access_key (String, optional): S3 access key ID.
  • s3_secret_key (String, optional): S3 secret access key.
  • s3_use_ssl (Boolean, optional): Use SSL. Auto-detected from the endpoint scheme if omitted.
  • s3_url_style (String, optional): path or vhost. Default: path.
  • setup_sql (String, optional): SQL executed before the main query — use it to create views, temp tables, or load extensions.

Response Body

{
  "status": "success",
  "rows": [
    { "id": 1, "name": "Alice", "amount": 99.5 },
    { "id": 2, "name": "Bob",   "amount": 150.0 }
  ]
}

On error, status is "error" and message contains the details. The rows field is omitted on error.
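
An error response looks roughly like this (the message text here is illustrative; the actual content depends on the failure):

```json
{
  "status": "error",
  "message": "Catalog Error: Table with name missing_table does not exist"
}
```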

Example: Read a Parquet file from S3

{
  "sql": "SELECT * FROM 's3://my-bucket/data/sales.parquet' WHERE amount > 100",
  "s3_endpoint": "http://floci:4566"
}

Example: Query with setup_sql

Use setup_sql to define a view or register a data source before the main query runs. Both statements execute within the same DuckDB session.

{
  "sql": "SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
  "s3_endpoint": "http://floci:4566",
  "setup_sql": "CREATE VIEW sales AS SELECT * FROM 's3://my-bucket/data/sales.parquet';"
}

POST /execute

Executes a SQL statement with no row output (fire-and-forget). Supports two modes:

  • Firehose mode — runs any SQL directly (DDL, DML, COPY, etc.)
  • Athena mode — wraps the SQL in a COPY … TO … FORMAT CSV and writes results to S3

Request Body

Note: S3 credentials are optional if the corresponding FLOCI_DUCK_S3_* environment variables are set.

  • sql (String): The SQL statement to execute.
  • s3_endpoint (String): S3-compatible endpoint (e.g., http://floci:4566).
  • s3_region (String, optional): S3 region. Defaults to FLOCI_DUCK_S3_REGION or us-east-1.
  • s3_access_key (String, optional): S3 access key ID.
  • s3_secret_key (String, optional): S3 secret access key.
  • s3_use_ssl (Boolean, optional): Use SSL. Auto-detected from the endpoint scheme if omitted.
  • s3_url_style (String, optional): path or vhost. Default: path.
  • output_s3_path (String, optional): If provided, enables Athena Mode — results are exported to this S3 path as CSV.
  • variables (Map, optional): Key-value pairs substituted into the SQL as {{key}} placeholders.

Response Body

{ "status": "success" }

On error, status is "error" and message contains the details.

Example: Firehose Mode (Direct SQL)

{
  "sql": "CREATE TABLE users AS SELECT * FROM read_csv_auto('s3://bucket/data.csv');",
  "s3_endpoint": "http://floci:4566"
}

Example: Athena Mode (Query with S3 Output)

{
  "sql": "SELECT name, count(*) FROM users GROUP BY 1",
  "s3_endpoint": "http://floci:4566",
  "s3_region": "us-east-1",
  "output_s3_path": "s3://results-bucket/report.csv"
}

Example: Variable substitution

{
  "sql": "SELECT * FROM 's3://my-bucket/{{env}}/data.parquet'",
  "s3_endpoint": "http://floci:4566",
  "variables": { "env": "production" }
}
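
The substitution is a plain string replacement of {{key}} placeholders before execution. A local sketch of the equivalent transform using sed (the server's actual implementation may differ):

```shell
# Illustrative sketch of {{key}} substitution (not the server's actual code)
sql="SELECT * FROM 's3://my-bucket/{{env}}/data.parquet'"
substituted=$(echo "$sql" | sed 's/{{env}}/production/g')
echo "$substituted"
# → SELECT * FROM 's3://my-bucket/production/data.parquet'
```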

Configuration

  • FLOCI_DUCK_PORT (default: 3000): Port the server listens on.
  • FLOCI_DUCK_LOG (default: info): Log level (error, warn, info, debug, trace).
  • FLOCI_DUCK_EXT_DIR (default: DuckDB default): Override the DuckDB extension directory (useful in Docker to persist extensions across restarts).
  • FLOCI_DUCK_S3_REGION (default: us-east-1): Default S3 region.
  • FLOCI_DUCK_S3_ACCESS_KEY (default: flociadmin): Default S3 access key ID.
  • FLOCI_DUCK_S3_SECRET_KEY (default: flociadmin): Default S3 secret access key.
  • FLOCI_DUCK_S3_USE_SSL (default: auto): Default SSL usage. Auto-detected from the endpoint scheme if not set.
  • FLOCI_DUCK_S3_URL_STYLE (default: path): Default S3 URL style (path or vhost).
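
For example, to raise log verbosity and persist extensions across container restarts (the values and the extension path here are illustrative):

```shell
# Illustrative environment configuration for a Docker deployment
export FLOCI_DUCK_PORT=3000
export FLOCI_DUCK_LOG=debug
export FLOCI_DUCK_EXT_DIR=/data/duckdb-extensions
export FLOCI_DUCK_S3_REGION=us-east-1
export FLOCI_DUCK_S3_URL_STYLE=path
```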

Observability

Correlation ID

Every request is assigned a correlation ID — either taken from the incoming x-correlation-id header or auto-generated as a UUID v4. The ID is propagated through every log line produced by that request, including logs emitted deep inside the executor.

Log format

The correlation ID appears as a bare value inside the span context — no key= prefix:

INFO execute{0232f4ad-4ea6-4b24-99e9-4f478998b848}: floci_duck::handlers: Received execute request
INFO execute{0232f4ad-4ea6-4b24-99e9-4f478998b848}: floci_duck::executor: Configuring S3: endpoint=floci:4566, region=us-east-1
INFO execute{0232f4ad-4ea6-4b24-99e9-4f478998b848}: floci_duck::executor: Firehose mode detected. Running raw SQL.
INFO execute{0232f4ad-4ea6-4b24-99e9-4f478998b848}: floci_duck::executor: Executing final SQL: SELECT ...
INFO execute{0232f4ad-4ea6-4b24-99e9-4f478998b848}: floci_duck::handlers: Query executed successfully

INFO query{3e4adc71-366f-4cf7-8ce9-41d441d7e755}: floci_duck::handlers: Received query request
INFO query{3e4adc71-366f-4cf7-8ce9-41d441d7e755}: floci_duck::executor: Configuring S3: endpoint=floci:4566, region=us-east-1
INFO query{3e4adc71-366f-4cf7-8ce9-41d441d7e755}: floci_duck::executor: Executing query SQL: SELECT * FROM ...
INFO query{3e4adc71-366f-4cf7-8ce9-41d441d7e755}: floci_duck::executor: Query returned 4 rows
INFO query{3e4adc71-366f-4cf7-8ce9-41d441d7e755}: floci_duck::handlers: Query returned 4 rows

Passing a correlation ID from the client

curl -X POST http://localhost:3000/query \
  -H "Content-Type: application/json" \
  -H "x-correlation-id: my-trace-id-123" \
  -d '{ "sql": "SELECT 1", "s3_endpoint": "http://floci:4566" }'

If the header is omitted, a UUID v4 is generated automatically.


Development

Testing

duck-test is the integration test CLI for floci-duck. It exercises every endpoint and scenario, organized into named suites.

Prerequisites

  • A running floci-duck server (make run or docker-compose up)
  • jq and curl (always required)
  • aws CLI (required for init, parquet, and validate suites)

Quick start

# Bring up infrastructure and create S3 resources
make dev-infra

# Run all test suites
./duck-test all

# Run specific suites
./duck-test health query
./duck-test parquet --bucket my-bucket

# Verbose output (prints full JSON responses)
./duck-test all -v

Suites

  • init: Creates the S3 bucket and lists resources.
  • health: Server liveness (GET /health).
  • query: /query endpoint — basic SELECT, NULLs, numeric types, setup_sql, correlation ID, error handling.
  • execute: /execute endpoint — firehose mode, athena mode (CSV → S3), variable substitution.
  • parquet: Full S3 round-trip — write Parquet, SELECT *, filter, aggregate, DESCRIBE schema.
  • http: httpfs extension loads and S3 settings are applied.
  • validate: Downloads the latest result file from S3 and prints it.
  • all: Runs every suite in order.

Options

./duck-test [OPTIONS] <SUITE> [SUITE...]

  --url URL              floci-duck server URL        [default: http://localhost:3000]
  --s3-endpoint URL      S3 endpoint (server-facing)  [default: http://floci:4566]
  --s3-endpoint-cli URL  S3 endpoint (aws CLI / host) [default: http://localhost:4566]
  --s3-region REGION     S3 region                    [default: us-east-1]
  --s3-access-key KEY    S3 access key                [default: flociadmin]
  --s3-secret-key KEY    S3 secret key                [default: flociadmin]
  --bucket BUCKET        S3 bucket name               [default: test-bucket]
  -v, --verbose          Print full response bodies
  -h, --help             Show help

All options can also be set via environment variables (FLOCI_DUCK_URL, FLOCI_DUCK_S3_ENDPOINT, etc.).

Example output

duck-test  →  http://localhost:3000  |  S3: http://floci:4566

══ health ══
  [PASS] /health → 200 OK

══ parquet  (S3 round-trip) ══
  [PASS] write parquet to S3
  [PASS] SELECT * from parquet
        row count: 4 (expected 4)
  [PASS] parquet filter WHERE amount > 100
  [PASS] parquet aggregate SUM/COUNT
        total=4 (expected 4), revenue=492.25 (expected 492.25)
  [PASS] parquet DESCRIBE schema

══════════════════════════════
  PASS: 6     FAIL: 0     SKIP: 0
══════════════════════════════

License

This project is licensed under the MIT License; see the LICENSE file for details.

Related Projects

  • Floci: The main Floci project.
