SQL Grammar & PostgreSQL Compatibility

Overview

Cassandra-SQL provides extensive PostgreSQL-compatible SQL support in KV mode. This document details supported SQL syntax and gaps compared to PostgreSQL.

Supported SQL Statements

Data Query Language (DQL)

SELECT

✅ Fully Supported

SELECT [DISTINCT] column_list
FROM table_name [alias]
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]
[ORDER BY column_list [ASC|DESC]]
[LIMIT count] [OFFSET offset]

Features:

Column projection (SELECT col1, col2)
Wildcard selection (SELECT *)
Column aliases (SELECT col AS alias)
Literal values (SELECT 'constant' AS name)
Arithmetic expressions (SELECT price * quantity AS total)
DISTINCT
WHERE clause with complex predicates
ORDER BY (single or multi-column)
LIMIT and OFFSET
Subqueries (see below)

Examples:

-- Basic SELECT
SELECT id, name, email FROM users WHERE age > 18;

-- With ORDER BY and LIMIT
SELECT * FROM products ORDER BY price DESC LIMIT 10;

-- With expressions
SELECT id, price * 1.1 AS price_with_tax FROM products;

-- With DISTINCT
SELECT DISTINCT category FROM products;

JOINs

✅ Fully Supported

SELECT columns
FROM table1 [alias1]
[INNER | LEFT | RIGHT | FULL | CROSS] JOIN table2 [alias2]
  ON join_condition
[WHERE condition]
[ORDER BY columns]
[LIMIT count]

Supported JOIN Types:

INNER JOIN - Returns matching rows from both tables
LEFT JOIN (LEFT OUTER JOIN) - All rows from left table + matching from right
RIGHT JOIN (RIGHT OUTER JOIN) - All rows from right table + matching from left
FULL JOIN (FULL OUTER JOIN) - All rows from both tables
CROSS JOIN - Cartesian product

Multi-way JOINs (3+ tables):

SELECT *
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN products p ON o.product_id = p.id
WHERE o.status = 'shipped';

Examples:

-- INNER JOIN
SELECT u.name, o.order_date
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.name;

-- FULL JOIN
SELECT *
FROM table_a a
FULL JOIN table_b b ON a.id = b.id;

-- CROSS JOIN
SELECT *
FROM colors
CROSS JOIN sizes;

Aggregations

✅ Fully Supported

SELECT aggregate_functions
FROM table_name
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]

Aggregate Functions:

COUNT(*), COUNT(column), COUNT(DISTINCT column)
SUM(column)
AVG(column)
MIN(column)
MAX(column)

Examples:

-- Simple aggregation
SELECT COUNT(*) FROM users;

-- GROUP BY
SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category;

-- HAVING clause
SELECT category, COUNT(*) AS count
FROM products
GROUP BY category
HAVING COUNT(*) > 10;

-- Multiple aggregates
SELECT 
  COUNT(*) AS total_orders,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM orders
WHERE status = 'completed';

Subqueries

✅ Fully Supported

Types:

Scalar subqueries (returns single value)
IN subqueries
EXISTS subqueries
Correlated subqueries

Examples:

-- IN subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- EXISTS subquery
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Correlated subquery
SELECT u.name, 
  (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u;

-- Subquery in FROM clause
SELECT category, avg_price
FROM (
  SELECT category, AVG(price) AS avg_price
  FROM products
  GROUP BY category
) AS subquery
WHERE avg_price > 100;

Window Functions

✅ Fully Supported

SELECT column,
  window_function() OVER (
    [PARTITION BY partition_columns]
    [ORDER BY order_columns]
    [ROWS|RANGE frame_clause]
  )
FROM table_name

Supported Functions:

ROW_NUMBER() - Sequential number within partition
RANK() - Rank with gaps
DENSE_RANK() - Rank without gaps
LAG(column, offset) - Previous row value
LEAD(column, offset) - Next row value
FIRST_VALUE(column) - First value in window
LAST_VALUE(column) - Last value in window

Examples:

-- ROW_NUMBER
SELECT id, name, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
FROM employees;

-- PARTITION BY
SELECT department, name, salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- LAG/LEAD
SELECT date, revenue,
  LAG(revenue, 1) OVER (ORDER BY date) AS prev_revenue,
  LEAD(revenue, 1) OVER (ORDER BY date) AS next_revenue
FROM daily_sales;

UNION / UNION ALL

✅ Fully Supported

SELECT columns FROM table1
UNION [ALL]
SELECT columns FROM table2
[ORDER BY columns]
[LIMIT count]

Examples:

-- UNION (removes duplicates)
SELECT name FROM customers
UNION
SELECT name FROM suppliers;

-- UNION ALL (keeps duplicates)
SELECT product_id FROM orders_2023
UNION ALL
SELECT product_id FROM orders_2024;

-- With ORDER BY and LIMIT
SELECT name, 'customer' AS type FROM customers
UNION ALL
SELECT name, 'supplier' AS type FROM suppliers
ORDER BY name
LIMIT 100;

Data Manipulation Language (DML)

INSERT

✅ Fully Supported

INSERT INTO table_name [(column_list)]
VALUES (value_list) [, (value_list), ...]

Features:

Single row insert
Multi-row insert
Auto-increment columns (SERIAL, GENERATED AS IDENTITY)
DEFAULT values
NULL handling

Examples:

-- Basic INSERT
INSERT INTO users (name, email, age)
VALUES ('John Doe', 'john@example.com', 30);

-- Multi-row INSERT
INSERT INTO products (name, price)
VALUES 
  ('Product A', 19.99),
  ('Product B', 29.99),
  ('Product C', 39.99);

-- With auto-increment
INSERT INTO orders (user_id, total)
VALUES (1, 99.99);  -- id is auto-generated

-- With DEFAULT
INSERT INTO users (name, email, created_at)
VALUES ('Jane', 'jane@example.com', DEFAULT);

UPDATE

✅ Fully Supported

UPDATE table_name
SET column = value [, column = value, ...]
[WHERE condition]

Features:

Single/multiple column updates
Arithmetic expressions (balance = balance + 100)
WHERE clause filtering
Automatic index maintenance

Examples:

-- Basic UPDATE
UPDATE users SET age = 31 WHERE id = 1;

-- Multiple columns
UPDATE products
SET price = 29.99, stock = 100
WHERE id = 5;

-- Arithmetic expression
UPDATE accounts
SET balance = balance + 100
WHERE id = 123;

-- Complex WHERE
UPDATE orders
SET status = 'shipped'
WHERE status = 'pending' AND created_at < '2024-01-01';

DELETE

✅ Fully Supported

DELETE FROM table_name
[WHERE condition]

Features:

WHERE clause filtering
Automatic index cleanup
Transactional (ACID)

Examples:

-- Delete specific rows
DELETE FROM users WHERE age < 18;

-- Delete with complex condition
DELETE FROM orders
WHERE status = 'cancelled' AND created_at < '2023-01-01';

-- Delete all rows (use TRUNCATE for better performance)
DELETE FROM temp_table;

Data Definition Language (DDL)

CREATE TABLE

✅ Fully Supported

CREATE TABLE [IF NOT EXISTS] table_name (
  column_name data_type [constraints],
  ...
  [PRIMARY KEY (column_list)]
)

Supported Data Types:

INT, INTEGER - 32-bit integer
BIGINT - 64-bit integer
SMALLINT - 16-bit integer
SERIAL - Auto-incrementing integer
BIGSERIAL - Auto-incrementing bigint
TEXT, VARCHAR(n) - Variable-length string
DOUBLE PRECISION, DOUBLE, FLOAT - Double precision float
BOOLEAN, BOOL - Boolean
JSON, JSONB - JSON data
UUID - UUID
TIMESTAMP - Timestamp (future)
DATE - Date (future)

Constraints:

PRIMARY KEY - Single or composite
NOT NULL - Non-nullable column
GENERATED ALWAYS AS IDENTITY - Auto-increment (PostgreSQL 10+)
GENERATED BY DEFAULT AS IDENTITY - Auto-increment with override

Examples:

-- Basic table
CREATE TABLE users (
  id INT PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT,
  age INT
);

-- With auto-increment
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL,
  total DOUBLE PRECISION,
  status TEXT
);

-- Composite primary key
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

-- With GENERATED AS IDENTITY
CREATE TABLE products (
  id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name TEXT NOT NULL,
  price DOUBLE PRECISION
);

-- IF NOT EXISTS
CREATE TABLE IF NOT EXISTS logs (
  id BIGSERIAL PRIMARY KEY,
  message TEXT,
  created_at TIMESTAMP
);

Hidden rowid Column: If no PRIMARY KEY is specified, Cassandra-SQL automatically adds a hidden rowid column (similar to PostgreSQL's ctid).

CREATE INDEX

✅ Fully Supported

CREATE [UNIQUE] INDEX [IF NOT EXISTS] index_name
ON table_name (column_list)

Features:

Single-column indexes
Multi-column (composite) indexes
UNIQUE indexes (future)
Automatic index maintenance

Examples:

-- Single-column index
CREATE INDEX idx_users_email ON users (email);

-- Multi-column index
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- IF NOT EXISTS
CREATE INDEX IF NOT EXISTS idx_products_category ON products (category);

DROP TABLE

✅ Fully Supported

DROP TABLE [IF EXISTS] table_name [, table_name, ...]

Features:

Single or multiple tables
IF EXISTS clause
Lazy deletion (VacuumJob cleans up data)

Examples:

-- Drop single table
DROP TABLE users;

-- Drop multiple tables
DROP TABLE orders, order_items;

-- IF EXISTS
DROP TABLE IF EXISTS temp_table;

DROP INDEX

✅ Fully Supported

DROP INDEX [IF EXISTS] index_name

TRUNCATE

✅ Fully Supported

TRUNCATE TABLE table_name

Fast O(1) operation using table versioning. Data is marked as invisible and cleaned up by VacuumJob.

ALTER TABLE

⚠️ Partially Supported

ALTER TABLE table_name
  ADD PRIMARY KEY (column_list)

Supported:

ADD PRIMARY KEY - Add primary key constraint

Not Yet Supported:

ADD COLUMN - Add new column
DROP COLUMN - Remove column
ALTER COLUMN - Modify column type
ADD CONSTRAINT - Add constraints (FOREIGN KEY, CHECK, etc.)

Transaction Control Language (TCL)

✅ Fully Supported

BEGIN [TRANSACTION]
START TRANSACTION

COMMIT [TRANSACTION]

ROLLBACK [TRANSACTION]
ABORT

Features:

ACID transactions
Snapshot isolation
Optimistic concurrency control
Automatic conflict detection

Examples:

-- Transfer money between accounts
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Rollback on error
BEGIN;
INSERT INTO users (name, email) VALUES ('Test', 'test@example.com');
-- ... error occurs ...
ROLLBACK;

Utility Commands

EXPLAIN / EXPLAIN ANALYZE

✅ Fully Supported

EXPLAIN [ANALYZE] query

Shows query execution plan and cost estimates. With ANALYZE, also executes the query and shows actual statistics.

Examples:

-- Show query plan
EXPLAIN SELECT * FROM users WHERE age > 18;

-- Show plan with execution statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';

VACUUM / ANALYZE

✅ Supported (No-op)

VACUUM
ANALYZE

These commands are no-ops in KV mode. Background jobs handle cleanup and statistics collection automatically.

COPY

⚠️ Partially Supported

COPY table_name FROM STDIN
COPY table_name TO STDOUT

Basic support for bulk data loading/export.

PostgreSQL Meta-Commands (psql)

✅ Supported

\dt - List tables
\di - List indexes
\d table_name - Describe table
\l - List databases
\c database_name - Connect to database
\q - Quit

WHERE Clause Operators

✅ Fully Supported

Comparison: =, !=, <>, <, <=, >, >=
Logical: AND, OR, NOT
Pattern matching: LIKE, NOT LIKE
Range: BETWEEN, NOT BETWEEN
Set membership: IN, NOT IN
NULL testing: IS NULL, IS NOT NULL

Examples:

-- Comparison
SELECT * FROM users WHERE age >= 18;

-- Logical operators
SELECT * FROM products WHERE price > 10 AND price < 100;

-- LIKE
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- BETWEEN
SELECT * FROM orders WHERE total BETWEEN 100 AND 500;

-- IN
SELECT * FROM users WHERE status IN ('active', 'pending');

-- IS NULL
SELECT * FROM users WHERE deleted_at IS NULL;

-- Complex conditions
SELECT * FROM orders
WHERE (status = 'pending' OR status = 'processing')
  AND total > 100
  AND created_at >= '2024-01-01';

PostgreSQL Compatibility Gap Analysis

✅ Fully Compatible

Feature	Status	Notes
Basic SELECT	✅	Full support
JOINs (all types)	✅	INNER, LEFT, RIGHT, FULL, CROSS
Aggregations	✅	COUNT, SUM, AVG, MIN, MAX
GROUP BY / HAVING	✅	Full support
Subqueries	✅	Scalar, IN, EXISTS, correlated
Window functions	✅	ROW_NUMBER, RANK, LAG, LEAD, etc.
UNION / UNION ALL	✅	Full support
INSERT / UPDATE / DELETE	✅	Full support
Transactions	✅	ACID with snapshot isolation
CREATE/DROP TABLE	✅	Full support
CREATE/DROP INDEX	✅	Single and multi-column
ORDER BY / LIMIT / OFFSET	✅	Full support
DISTINCT	✅	Full support
Auto-increment	✅	SERIAL, GENERATED AS IDENTITY

⚠️ Partially Compatible

Feature	Status	Limitations
ALTER TABLE	⚠️	Only ADD PRIMARY KEY supported
Data types	⚠️	Missing TIMESTAMP, DATE, INTERVAL, ARRAY
COPY	⚠️	Basic support only
Constraints	⚠️	No FOREIGN KEY, CHECK, UNIQUE yet
Functions	⚠️	Limited built-in functions
CASE expressions	⚠️	Not yet implemented
CTEs (WITH)	⚠️	Not yet implemented
Recursive queries	⚠️	Not yet implemented

❌ Not Yet Supported

Feature	Status	Priority
FOREIGN KEY constraints	❌	High
CHECK constraints	❌	Medium
UNIQUE constraints	❌	High
DEFAULT values (complex)	❌	Medium
Views	❌	High
Materialized views	❌	Medium
Triggers	❌	Low
Stored procedures	❌	Low
User-defined functions	❌	Low
CTEs (WITH clause)	❌	High
CASE expressions	❌	High
CAST / type conversion	❌	Medium
String functions	❌	Medium
Date/time functions	❌	High
Math functions	❌	Medium
ARRAY data type	❌	Medium
TIMESTAMP / DATE types	❌	High
INTERVAL type	❌	Medium
Sequences (CREATE SEQUENCE)	❌	Medium
UPSERT (INSERT ... ON CONFLICT)	❌	High
RETURNING clause	❌	Medium
Lateral joins	❌	Low
Recursive CTEs	❌	Low
Table partitioning	❌	Low
Inheritance	❌	Low

Performance Considerations

Optimized Operations

Index scans: Automatically used when beneficial
Predicate pushdown: WHERE clauses pushed to storage layer
Projection pushdown: Only requested columns retrieved
JOIN optimization: Cost-based join order selection
LIMIT pushdown: Early termination for LIMIT queries

Operations to Avoid

Full table scans on large tables: Use indexes
SELECT * on wide tables: Project only needed columns
Unbounded queries: Always use LIMIT for large result sets
Complex subqueries in WHERE: Consider JOINs instead
High-contention transactions: Can cause conflicts

Migration from PostgreSQL

Compatible Patterns

Most PostgreSQL queries work as-is:

-- These work identically in Cassandra-SQL
SELECT * FROM users WHERE age > 18 ORDER BY name LIMIT 10;
INSERT INTO orders (user_id, total) VALUES (1, 99.99);
UPDATE products SET price = price * 1.1 WHERE category = 'electronics';
DELETE FROM logs WHERE created_at < '2023-01-01';

Patterns Requiring Changes

-- PostgreSQL: RETURNING clause
INSERT INTO users (name) VALUES ('John') RETURNING id;
-- Cassandra-SQL: Query after insert
INSERT INTO users (name) VALUES ('John');
SELECT id FROM users WHERE name = 'John' ORDER BY id DESC LIMIT 1;

-- PostgreSQL: UPSERT
INSERT INTO users (id, name) VALUES (1, 'John')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
-- Cassandra-SQL: Separate UPDATE or INSERT
-- (Check existence first, then INSERT or UPDATE)

-- PostgreSQL: CTEs
WITH recent_orders AS (
  SELECT * FROM orders WHERE created_at > '2024-01-01'
)
SELECT * FROM recent_orders WHERE total > 100;
-- Cassandra-SQL: Use subquery
SELECT * FROM (
  SELECT * FROM orders WHERE created_at > '2024-01-01'
) AS recent_orders WHERE total > 100;

Testing SQL Compatibility

Use the provided test scripts:

# Run SQL compatibility tests
./gradlew test --tests "*KvQueryExecutor*"

# Run JOIN tests
./gradlew test --tests "*JoinTest*"

# Run transaction tests
./gradlew test --tests "*TransactionTest*"

Future Roadmap

See FEATURE_ROADMAP.md for planned features and priorities.

FilesExpand file tree

SQL_GRAMMAR.md

Latest commit

History

SQL_GRAMMAR.md

File metadata and controls

SQL Grammar & PostgreSQL Compatibility

Overview

Supported SQL Statements

Data Query Language (DQL)

SELECT

JOINs

Aggregations

Subqueries

Window Functions

UNION / UNION ALL

Data Manipulation Language (DML)

INSERT

UPDATE

DELETE

Data Definition Language (DDL)

CREATE TABLE

CREATE INDEX

DROP TABLE

DROP INDEX

TRUNCATE

ALTER TABLE

Transaction Control Language (TCL)

Utility Commands

EXPLAIN / EXPLAIN ANALYZE

VACUUM / ANALYZE

COPY

PostgreSQL Meta-Commands (psql)

WHERE Clause Operators

PostgreSQL Compatibility Gap Analysis

✅ Fully Compatible

⚠️ Partially Compatible

❌ Not Yet Supported

Performance Considerations

Optimized Operations

Operations to Avoid

Migration from PostgreSQL

Compatible Patterns

Patterns Requiring Changes

Testing SQL Compatibility

Future Roadmap