@wojiaodoubao
Contributor

Lance transactions do not provide strict serializable isolation. Here is an example of write skew.

import lance
import pyarrow as pa


def write_skew():
    uri = './alice_and_bob.lance'
    table = pa.Table.from_batches([], schema=pa.schema([("fruit", pa.string())]))
    ds = lance.write_dataset(table, uri, enable_v2_manifest_paths=True, mode="append")
    read_version = ds.version

    # Transaction-1 starts. Add pear if there is no apple.
    txn_1 = None
    if ds.scanner(filter="fruit='apple'", limit=1).to_table().num_rows == 0:
        data = ["pear"]
        table = pa.table([data], schema=pa.schema([("fruit", pa.string())]))
        frag_meta = lance.fragment.LanceFragment.create(uri, table)
        txn_1 = lance.LanceOperation.Append([frag_meta])

    # Transaction-2 starts. Add apple if there is no pear.
    txn_2 = None
    if ds.scanner(filter="fruit='pear'", limit=1).to_table().num_rows == 0:
        data = ["apple"]
        table = pa.table([data], schema=pa.schema([("fruit", pa.string())]))
        frag_meta = lance.fragment.LanceFragment.create(uri, table)
        txn_2 = lance.LanceOperation.Append([frag_meta])

    # Transaction-1 commits against the version both transactions read.
    if txn_1 is not None:
        ds.commit(uri, read_version=read_version, operation=txn_1)

    # Transaction-2 commits against the same read_version and also succeeds.
    if txn_2 is not None:
        ds.commit(uri, read_version=read_version, operation=txn_2)

    # The count is 2, but any serial execution would have added only one row
    # (either apple or pear).
    ds = lance.dataset(uri)
    print(ds.count_rows())

In a partitioned namespace, we use a special Lance table, __manifest, as the metadata store. When committing a cross-partition write, commits to __manifest must be serialized to guarantee that the write is atomic.

@wojiaodoubao wojiaodoubao force-pushed the strict-acid-serial-isolation branch from 57e3216 to d1aec0a Compare January 8, 2026 13:52
@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 97.18310% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/dataset.rs | 83.33% | 1 Missing ⚠️ |
| rust/lance/src/dataset/write/commit.rs | 97.36% | 0 Missing and 1 partial ⚠️ |

@wjones127 wjones127 self-assigned this Jan 8, 2026
@wojiaodoubao
Contributor Author

When a Lance dataset is opened, it is bound to a read_version. So the span from when the dataset is opened until its first write can be seen as a transaction.

To support strict serial isolation, there are 3 approaches:

  • Simple approach: Succeed only when there are no new commits after read_version. (This PR's choice)
  • Optimistic approach: Track all rows involved in the dataset's read operations (including non-existent rows) and record them in the transaction. Upon commit, check if any of the read rows have been modified since the read_version; if so, the commit fails.
  • Pessimistic approach: Introduce row locks to Lance tables. This might require an external service to implement row locking, so I'm less inclined to adopt this approach.
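
The first two approaches above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyTable`, `commit_simple`, `commit_optimistic`, and `run_write_skew` are hypothetical names, not part of the Lance API, and reads are modeled as plain Python predicates over row values.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Predicate = Callable[[str], bool]


class CommitConflict(Exception):
    """Raised when a commit fails its isolation check."""


@dataclass
class ToyTable:
    """Hypothetical stand-in for a versioned, append-only table (not Lance)."""

    # history[i] holds the rows appended by the commit that produced version i + 1.
    history: List[List[str]] = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.history)

    def rows(self) -> List[str]:
        return [row for commit in self.history for row in commit]

    def commit_simple(self, read_version: int, rows: List[str]) -> int:
        # Simple approach: reject if anything was committed after read_version.
        if self.version != read_version:
            raise CommitConflict("new commits exist after read_version")
        self.history.append(list(rows))
        return self.version

    def commit_optimistic(
        self, read_version: int, reads: List[Predicate], rows: List[str]
    ) -> int:
        # Optimistic approach: reject only if a later commit added a row that
        # matches something this transaction read (including empty reads).
        for commit in self.history[read_version:]:
            if any(pred(row) for row in commit for pred in reads):
                raise CommitConflict("a row this transaction read has changed")
        self.history.append(list(rows))
        return self.version


def run_write_skew(kind: str) -> List[str]:
    """Replay the apple/pear example with the chosen commit check."""
    table = ToyTable()
    read_version = table.version  # both transactions read version 0
    txns = [
        ([lambda r: r == "apple"], ["pear"]),  # add pear if there is no apple
        ([lambda r: r == "pear"], ["apple"]),  # add apple if there is no pear
    ]
    for reads, rows in txns:
        try:
            if kind == "simple":
                table.commit_simple(read_version, rows)
            else:
                table.commit_optimistic(read_version, reads, rows)
        except CommitConflict:
            pass  # a real caller would re-read and retry
    return table.rows()
```

In this model both checks reject the second transaction, so only one fruit is written. The difference is in unrelated writes: the simple check fails any commit whose read_version is stale, while the optimistic check lets through a stale commit whose reads were not affected, at the cost of tracking every read.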

Hi @wjones127 @jackye1995 @Xuanwo @majin1102, please let me know your thoughts. Thanks very much!

@majin1102
Contributor

majin1102 commented Jan 9, 2026

Let me quote how Iceberg describes serializable isolation: https://iceberg.apache.org/docs/1.6.0/reliability/

Serializable isolation: All table changes occur in a linear history of atomic table updates
Reliable reads: Readers always use a consistent snapshot of the table without holding a lock

From my understanding, what you're trying to implement in this PR follows a very traditional approach to serializability, which is indeed essential for OLTP workloads. However, for lakehouse formats like Lance, serializable isolation of writes is often sufficient. That said, we still need to address the underlying issue in this scenario. In my opinion, we might need to introduce some form of external locking mechanism beyond the Lance format itself.

@jackye1995
Contributor

jackye1995 commented Jan 13, 2026

Lance operates under SNAPSHOT ISOLATION, so write skew is expected. (Maybe we should make that clear in the format spec.)

I personally like this blog a lot: https://brooker.co.za/blog/2024/12/17/occ-and-isolation.html

It describes well why SI is good enough for most cases, and the perf tradeoff. I think that has been more or less proven in Iceberg as well: for write performance, many customers I interacted with ended up switching to SI instead of the default SERIALIZABLE behavior. Also, Iceberg has been using the term SERIALIZABLE to describe its single-table guarantee, which is misleading, because after all any table format provides only Read-Committed Snapshot Isolation (RCSI) across tables, which is technically weaker than SI.

So I think we would need a very strong reason to support SERIALIZABLE. Why do we want to add the support now? Can we just work with SI?
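
To illustrate why write skew is expected under snapshot isolation, here is a minimal sketch of a generic textbook SI model with first-committer-wins on write sets. It is an assumption-laden toy, not Lance's actual conflict resolution: `SnapshotStore`, `Txn`, and the two helper functions are hypothetical names introduced only for this example.

```python
from typing import Dict, List, Optional, Set, Tuple


class SnapshotStore:
    """Hypothetical key-value store with snapshot isolation (not the Lance API)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.commit_log: List[Set[str]] = []  # keys written by each commit

    def begin(self) -> "Txn":
        # Each transaction reads from a private copy taken at start time.
        return Txn(self, dict(self.data), len(self.commit_log))


class Txn:
    def __init__(self, store: SnapshotStore, snapshot: Dict[str, int], start: int) -> None:
        self.store = store
        self.snapshot = snapshot
        self.start = start
        self.writes: Dict[str, int] = {}

    def read(self, key: str) -> Optional[int]:
        # Reads see this transaction's own writes, then its snapshot.
        return self.writes.get(key, self.snapshot.get(key))

    def write(self, key: str, value: int) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # Only write-write overlap with later commits is checked; reads are not.
        for written in self.store.commit_log[self.start:]:
            if not written.isdisjoint(self.writes):
                return False
        self.store.data.update(self.writes)
        self.store.commit_log.append(set(self.writes))
        return True


def write_skew_under_si() -> Tuple[bool, bool, List[str]]:
    store = SnapshotStore()
    t1, t2 = store.begin(), store.begin()
    if t1.read("apple") is None:  # add pear if there is no apple
        t1.write("pear", 1)
    if t2.read("pear") is None:  # add apple if there is no pear
        t2.write("apple", 1)
    return t1.commit(), t2.commit(), sorted(store.data)


def write_write_conflict() -> Tuple[bool, bool]:
    store = SnapshotStore()
    t1, t2 = store.begin(), store.begin()
    t1.write("kiwi", 1)
    t2.write("kiwi", 2)
    return t1.commit(), t2.commit()
```

In this model the two fruit transactions have disjoint write sets, so both commit even though each decision depended on a read the other invalidated. Two transactions writing the same key are still caught, which is the guarantee SI does give.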

@wojiaodoubao
Contributor Author

> So I think we would need a very strong reason to support SERIALIZABLE. Why do we want to add the support now? Can we just work with SI?

The partitioned namespace uses a Lance table, __manifest, to store metadata. This metadata table needs SERIALIZABLE isolation to implement cross-partition transactions. Here is an example: lance-format/lance-namespace#296 (comment).

@jackye1995
Contributor

As we discussed in that PR, I think this is no longer needed?

Labels

enhancement New feature or request java

4 participants