feat!: define default index name and return IndexMetadata after building index #5645

wjones127 · 2026-01-06T23:36:30Z

BREAKING CHANGE: create_index now returns IndexMetadata of the new index in Rust and Java. Previously it returned nothing. (Python is unchanged, as it returns the Dataset itself, and changing that would be too disruptive.)

Defines the behavior of default index names, particularly for fields with mixed case, non-alphanumeric characters, and nested fields. I tried to align it with the previous behavior as much as possible. For example, a field with dashes like my-column gets my-column_idx. This helps libraries that were relying on a predictable algorithm.

However, I noticed we don't handle name collisions, so I added behavior for that: if an index with the default name already exists, the default name gets a _2 suffix (then _3, and so on).
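
A minimal Rust sketch of the naming rule as described here, assuming the existing index names are available as a set. `default_index_name` with this signature is a hypothetical stand-in written for illustration, not the function in `rust/lance/src/index/create.rs`, and it does not attempt to model the mixed-case handling, which the description mentions but does not spell out.

```rust
use std::collections::HashSet;

/// Hypothetical sketch (not Lance's code): join the field path with `.`,
/// append `_idx`, and on a collision append `_2`, `_3`, ... until unused.
fn default_index_name(fields: &[&str], existing: &HashSet<String>) -> String {
    // e.g. ["meta-data", "user-id"] -> "meta-data.user-id_idx"
    let base_name = format!("{}_idx", fields.join("."));
    let mut candidate = base_name.clone();
    let mut counter = 2; // no suffix first, then _2, _3, ...
    while existing.contains(&candidate) {
        candidate = format!("{base_name}_{counter}");
        counter += 1;
    }
    candidate
}
```

Dashes and other characters pass through untouched, which is what keeps names like my-column_idx backwards compatible.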

Finally, because the function now chooses the name, I made create_index return the IndexMetadata of the new index, so callers can find out which name was chosen.
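
To illustrate why returning the metadata helps, here is a toy in-memory model in Rust. `IndexMetadata` below is a simplified stand-in with only two fields (the real Lance struct has more), and `Catalog::create_index` is a hypothetical method, not Lance's API; the point is only that a caller passing no name can read back the name that was picked.

```rust
/// Simplified stand-in for Lance's `IndexMetadata`; only `name` and
/// `column` are modeled here.
#[derive(Debug, Clone, PartialEq)]
struct IndexMetadata {
    name: String,
    column: String,
}

/// Hypothetical in-memory catalog used only for this illustration.
#[derive(Default)]
struct Catalog {
    indices: Vec<IndexMetadata>,
}

impl Catalog {
    /// Creates an index and returns its metadata, so a caller that passed
    /// `None` for the name learns which default name was chosen.
    fn create_index(&mut self, column: &str, name: Option<&str>) -> IndexMetadata {
        let name = match name {
            Some(n) => n.to_string(),
            None => {
                let base = format!("{column}_idx");
                let mut candidate = base.clone();
                let mut counter = 2;
                // Same collision rule as described above: _2, _3, ...
                while self.indices.iter().any(|i| i.name == candidate) {
                    candidate = format!("{base}_{counter}");
                    counter += 1;
                }
                candidate
            }
        };
        let meta = IndexMetadata { name, column: column.to_string() };
        self.indices.push(meta.clone());
        meta
    }
}
```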

codecov · 2026-01-07T00:25:12Z

Codecov Report

❌ Patch coverage is 94.25287% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/scanner.rs | 50.00% | 0 Missing and 3 partials ⚠️ |
| rust/lance/src/index/create.rs | 97.53% | 2 Missing ⚠️ |

jackye1995 · 2026-01-08T17:10:12Z

rust/lance/src/index/create.rs

+///
+/// Joins field names with `.` to create the base index name.
+/// For example: `["meta-data", "user-id"]` -> `"meta-data.user-id"`
+fn default_index_name(fields: &[&str]) -> String {


can we directly use https://github.com/lance-format/lance/blob/main/rust/lance-core/src/datatypes/schema.rs#L1521

That function is exactly what we don't want, particularly for kebab-case. For backwards compatibility, we want user-id to format to:

user-id_idx

But with that function you are pointing at, user-id formats to

`user-id`_idx
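
To make the difference concrete, here is a small Rust sketch contrasting a plain `.` join with a quoting formatter. `join_quoted` is a hypothetical approximation written for this illustration, assuming the quoting rule wraps any segment containing characters other than ASCII letters, digits, or `_` in backticks; it is not the lance-core function linked above, whose exact rules may differ.

```rust
/// Plain join, as in the PR's `default_index_name`: ["user-id"] -> "user-id".
fn join_plain(fields: &[&str]) -> String {
    fields.join(".")
}

/// Hypothetical approximation of a quoting formatter (not lance-core's):
/// segments with characters other than ASCII alphanumerics or `_` get backticks.
fn join_quoted(fields: &[&str]) -> String {
    fields
        .iter()
        .map(|f| {
            if f.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
                f.to_string()
            } else {
                format!("`{f}`")
            }
        })
        .collect::<Vec<_>>()
        .join(".")
}
```

With the plain join, `user-id` yields `user-id_idx`; with the quoting formatter it yields `` `user-id`_idx ``, which breaks anything expecting the old name.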

I see. My thought process was that if a library relies on the name, it is basically trying to parse the fields from the index name. In that case, if we don't backquote, columns with dots would be an issue. But I guess we didn't have columns with dots in the past anyway, so what about this:

- if any field name contains a dot, use format_field_path

- otherwise, join with dots
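
The proposed hybrid rule could look roughly like this Rust sketch. `needs_quoting` approximates the quoting formatter under the same assumption as before (backticks around any segment with characters outside ASCII alphanumerics and `_`); both function names are hypothetical and not part of Lance.

```rust
/// Hypothetical approximation of when a quoting formatter would backquote a segment.
fn needs_quoting(segment: &str) -> bool {
    !segment.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical hybrid rule: quote only when some field name contains a dot.
fn hybrid_index_base(fields: &[&str]) -> String {
    if fields.iter().any(|f| f.contains('.')) {
        // Dotted names: use the quoting formatter so the path stays unambiguous.
        fields
            .iter()
            .map(|f| if needs_quoting(f) { format!("`{f}`") } else { f.to_string() })
            .collect::<Vec<_>>()
            .join(".")
    } else {
        // Otherwise keep the backwards-compatible plain join.
        fields.join(".")
    }
}
```

Kebab-case names keep their old form, and only the new case (dots inside a field name) switches to quoting.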

> it would be an issue for columns with dots

Why is it an issue for columns with dots?

I think it's fine if nested.column_idx could refer either to a column literally called nested.column or to a field column inside a struct nested. It's easier to type and remember than

`nested.column`_idx

In the case of collisions, we have logic in place to add the _2 suffix.

> I think it's fine if nested.column_idx can either be a column literally called nested.column or a field column inside of a struct nested. It's easier to type and remember

My thought process is that we are doing this to ensure backwards compatibility for places that rely on the index name. If some process relies on the name and cannot break, it is presumably parsing the fields from the index name. In that case, it can no longer derive the right column for the example above, because the name is ambiguous.
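
The ambiguity can be shown in a few lines of Rust. `naive_parse` is a hypothetical parser of the kind a downstream tool might write (strip `_idx`, split on `.`); it is not taken from any real library.

```rust
/// Plain-join naming: ["nested", "column"] -> "nested.column_idx".
fn plain_index_name(fields: &[&str]) -> String {
    format!("{}_idx", fields.join("."))
}

/// Hypothetical downstream parser: strip `_idx`, then split on `.`.
fn naive_parse(name: &str) -> Option<Vec<String>> {
    name.strip_suffix("_idx")
        .map(|base| base.split('.').map(String::from).collect::<Vec<String>>())
}
```

A top-level column literally named `nested.column` and the field `column` inside struct `nested` both produce `nested.column_idx`, and the parser returns `["nested", "column"]` for either, which is wrong for the first case.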

wkalt · 2026-01-08T17:16:30Z

rust/lance/src/index/create.rs

+            .execute()
+            .await
+            .unwrap();
+        assert_eq!(idx2.name, "b_idx_2");


There are possible arguments for making this "1" instead of "2". It might be worth checking that whatever we're doing is stylistically in line with other parts of our interfaces.

We don't have any other interfaces that do this right now. _2 felt natural given the original is implicitly 1 (assuming 1-based indexing). It feels natural enough to say.

Although I don't care too much, because I think it's better if users just name their indexes something sensible instead.

jackye1995 · 2026-01-08T17:20:48Z

rust/lance/src/index/create.rs

+            let column_path = default_index_name(&names);
+            let base_name = format!("{column_path}_idx");
+            let mut candidate = base_name.clone();
+            let mut counter = 2; // Start with no suffix, then use _2, _3, ...


If we start to allow multiple indexes on the same column, I think we should also include the index type in the index name for informational purposes, like col_idx_btree (keeping the original string and only appending at the end).
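
A minimal Rust sketch of this suggestion, assuming the type suffix is simply appended after the existing `{column}_idx` prefix. `IndexKind` and `typed_index_name` are hypothetical names chosen for illustration, not Lance's index-type enum or API.

```rust
/// Hypothetical index-type enum for illustration only.
#[derive(Debug, Clone, Copy)]
enum IndexKind {
    BTree,
    Bitmap,
}

impl IndexKind {
    fn suffix(self) -> &'static str {
        match self {
            IndexKind::BTree => "btree",
            IndexKind::Bitmap => "bitmap",
        }
    }
}

/// Keeps the backwards-compatible `{column}_idx` prefix and only appends the type.
fn typed_index_name(column: &str, kind: IndexKind) -> String {
    format!("{column}_idx_{}", kind.suffix())
}
```

Because the original string stays a prefix, code that matches on `starts_with("col_idx")` would keep working, while exact-match checks on `{column}_idx` would not.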

Another thing is that this will create a situation where concurrent index creation on the same column with the same name should fail. We currently treat that as a delta index, but now I think we need to update the transaction model to distinguish these two cases with a flag like is_delta_index.
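
One possible reading of the proposed flag, as a Rust sketch. `CreateIndexOp` and `conflicts` are hypothetical and heavily simplified; Lance's real transaction types and conflict resolution are different. The rule assumed here is that two concurrent operations on the same index name are compatible only when both are deltas.

```rust
/// Hypothetical, simplified stand-in for a concurrent create-index operation.
struct CreateIndexOp<'a> {
    name: &'a str,
    /// Flag suggested in review: true if this op adds a delta to an existing
    /// index instead of creating a brand-new one.
    is_delta_index: bool,
}

/// Assumed rule: same name conflicts unless both operations are deltas.
fn conflicts(a: &CreateIndexOp<'_>, b: &CreateIndexOp<'_>) -> bool {
    a.name == b.name && !(a.is_delta_index && b.is_delta_index)
}
```

Today both cases would be treated as deltas; with the flag, two fresh creations of `b_idx` would be rejected instead of silently merged.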

> If we start to allow multiple indexes on the same column, I think we should also have the index type in the index name for information purpose

Yeah, I think we should do that, balanced against the need for backwards compatibility. Unfortunately, there are places in LanceDB where we hard-code the expectation that the index name is {column}_idx. Hopefully we can fix that soon and it won't be an issue.

> Another thing is that this will start to create a situation that concurrent index creation on the same column and same name should fail.

Yeah that would be good to handle better. We shouldn't mix index types for the same index name.

wjones127 added 2 commits January 6, 2026 10:07

write tests for current behavior

819cbea

normalize index names

32e8b09

github-actions bot added the enhancement and python labels Jan 6, 2026

wjones127 added 2 commits January 7, 2026 09:53

backwards compatible index name defaults

e7bb14b

return index metadata

5907cec

wjones127 changed the title ~~feat: define default index name behavior for nested field and special characters~~ feat: define default index name and return IndexMetadata after building index Jan 7, 2026

Return from create_index method too

aa81ce0

github-actions bot added the java label Jan 7, 2026

wjones127 changed the title ~~feat: define default index name and return IndexMetadata after building index~~ feat!: define default index name and return IndexMetadata after building index Jan 7, 2026

github-actions bot added the breaking-change label Jan 7, 2026

wjones127 force-pushed the feat/index-name-normalization branch from 208984f to 4d93fc1 January 7, 2026 23:22

return IndexMetadata from Python and Java create_index APIs

8a46b24

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

wjones127 force-pushed the feat/index-name-normalization branch from 4d93fc1 to 8a46b24 January 7, 2026 23:42

wjones127 marked this pull request as ready for review January 8, 2026 00:51

jackye1995 reviewed Jan 8, 2026

wkalt reviewed Jan 8, 2026

jackye1995 reviewed Jan 8, 2026

wjones127 requested review from jackye1995 and wkalt January 13, 2026 17:26

wkalt approved these changes Jan 15, 2026
