Merged
6 changes: 6 additions & 0 deletions gradle/libs.versions.toml
@@ -24,6 +24,7 @@ hive = "3.1.3"
iceberg = "1.9.2" # Ensure to update the iceberg version in regtests to keep regtests up-to-date
quarkus = "3.25.4"
immutables = "2.11.3"
jmh = "1.37"
picocli = "4.7.7"
scala212 = "2.12.19"
spark35 = "3.5.6"
@@ -76,6 +77,9 @@ jandex = { module = "io.smallrye.jandex:jandex", version ="3.4.0" }
javax-servlet-api = { module = "javax.servlet:javax.servlet-api", version = "4.0.1" }
junit-bom = { module = "org.junit:junit-bom", version = "5.13.4" }
keycloak-admin-client = { module = "org.keycloak:keycloak-admin-client", version = "26.0.6" }
jcstress-core = { module = "org.openjdk.jcstress:jcstress-core", version = "0.16" }
jmh-core = { module = "org.openjdk.jmh:jmh-core", version.ref = "jmh" }
jmh-generator-annprocess = { module = "org.openjdk.jmh:jmh-generator-annprocess", version.ref = "jmh" }
logback-classic = { module = "ch.qos.logback:logback-classic", version = "1.5.18" }
micrometer-bom = { module = "io.micrometer:micrometer-bom", version = "1.15.3" }
microprofile-fault-tolerance-api = { module = "org.eclipse.microprofile.fault-tolerance:microprofile-fault-tolerance-api", version = "4.1.2" }
@@ -102,6 +106,8 @@ testcontainers-keycloak = { module = "com.github.dasniko:testcontainers-keycloak
threeten-extra = { module = "org.threeten:threeten-extra", version = "1.8.0" }

[plugins]
jcstress = { id = "io.github.reyerizo.gradle.jcstress", version = "0.8.15" }
jmh = { id = "me.champeau.jmh", version = "0.7.3" }
openapi-generator = { id = "org.openapi.generator", version = "7.12.0" }
quarkus = { id = "io.quarkus", version.ref = "quarkus" }
rat = { id = "org.nosphere.apache.rat", version = "0.8.1" }
6 changes: 6 additions & 0 deletions gradle/projects.main.properties
@@ -48,3 +48,9 @@ polaris-extensions-federation-hive=extensions/federation/hive
polaris-config-docs-annotations=tools/config-docs/annotations
polaris-config-docs-generator=tools/config-docs/generator
polaris-config-docs-site=tools/config-docs/site

# id generation
polaris-idgen-api=persistence/nosql/idgen/api
polaris-idgen-impl=persistence/nosql/idgen/impl
polaris-idgen-mocks=persistence/nosql/idgen/mocks
polaris-idgen-spi=persistence/nosql/idgen/spi
55 changes: 55 additions & 0 deletions persistence/nosql/idgen/README.md
@@ -0,0 +1,55 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Unique ID generation framework and monotonic clock

Provides a framework and implementations for unique ID generation, including a monotonically increasing timestamp/clock
source.

It also includes a
[Snowflake ID](https://medium.com/@jitenderkmr/demystifying-snowflake-ids-a-unique-identifier-in-distributed-computing-72796a827c9d)
implementation.

Consuming production code should primarily use the `IdGenerator` and `MonotonicClock` interfaces.

## Snowflake ID source

The Snowflake ID source is configurable for each backend instance. To prevent ID conflicts, the configuration cannot
be changed for an existing backend instance.

The epoch of these timestamps is 2025-03-01 00:00:00.0 GMT. Timestamps occupy 41 bits at
millisecond precision, which covers about 69 years from that epoch. Node IDs are 10 bits, which allows 1024
concurrently active "JVMs running Polaris". The sequence number uses 12 bits, which allows each node to generate
4096 IDs per millisecond. One bit is reserved for future use.
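
The bit widths above can be illustrated with a small packing sketch. This is not the code from this PR: the class name
`SnowflakeLayoutSketch`, its method names, and in particular the field order (timestamp in the high bits, then
sequence, then node ID) are assumptions made for illustration only.

```java
/** Illustrative only: packs and unpacks a 64-bit ID using the bit widths described above. */
final class SnowflakeLayoutSketch {
  static final int TIMESTAMP_BITS = 41;
  static final int SEQUENCE_BITS = 12;
  static final int NODE_ID_BITS = 10;

  // Assumed field order, most to least significant: timestamp, sequence, node ID.
  static final int SEQUENCE_SHIFT = NODE_ID_BITS;
  static final int TIMESTAMP_SHIFT = NODE_ID_BITS + SEQUENCE_BITS;

  static final long TIMESTAMP_MAX = (1L << TIMESTAMP_BITS) - 1;
  static final long SEQUENCE_MAX = (1L << SEQUENCE_BITS) - 1;
  static final long NODE_ID_MAX = (1L << NODE_ID_BITS) - 1;

  /** Combines the three fields; the timestamp is in milliseconds since the ID epoch. */
  static long pack(long timestampMillis, long sequence, long nodeId) {
    if (timestampMillis < 0 || timestampMillis > TIMESTAMP_MAX) {
      throw new IllegalArgumentException("timestamp out of range: " + timestampMillis);
    }
    if (sequence < 0 || sequence > SEQUENCE_MAX) {
      throw new IllegalArgumentException("sequence out of range: " + sequence);
    }
    if (nodeId < 0 || nodeId > NODE_ID_MAX) {
      throw new IllegalArgumentException("node ID out of range: " + nodeId);
    }
    // 41 + 12 + 10 = 63 bits, so the top bit of the long always stays zero here.
    return (timestampMillis << TIMESTAMP_SHIFT) | (sequence << SEQUENCE_SHIFT) | nodeId;
  }

  static long timestampOf(long id) {
    return (id >>> TIMESTAMP_SHIFT) & TIMESTAMP_MAX;
  }

  static long sequenceOf(long id) {
    return (id >>> SEQUENCE_SHIFT) & SEQUENCE_MAX;
  }

  static long nodeOf(long id) {
    return id & NODE_ID_MAX;
  }

  public static void main(String[] args) {
    long id = pack(1_000L, 7L, 42L);
    System.out.println(timestampOf(id) + " / " + sequenceOf(id) + " / " + nodeOf(id));
    // Number of years that 2^41 milliseconds span.
    double years = (double) (1L << TIMESTAMP_BITS) / (1000.0 * 60 * 60 * 24 * 365.25);
    System.out.println("years covered: " + years);
  }
}
```

With these widths, 2^10 = 1024 node IDs and 2^12 = 4096 sequence values per millisecond fall out directly from the
masks, and the three fields together fill 63 of the 64 bits.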

Node IDs are leased by every "JVM running Polaris" for a period of time. The ID generator implementation guarantees
that no IDs will be generated for a timestamp that exceeds the "lease time". Leases can be extended. The implementation
leverages atomic database operations (CAS) for the lease implementation.
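
As a rough illustration of leasing via compare-and-swap, the following sketch uses an in-memory `ConcurrentMap` as a
stand-in for the database. The class `NodeLeaseSketch`, its methods, and the "node ID to expiry timestamp" record
shape are hypothetical and are not part of this PR; the actual lease implementation lives elsewhere and may differ
substantially.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: node ID leases guarded by compare-and-swap style updates. */
final class NodeLeaseSketch {
  // Stand-in for a database table: node ID -> lease expiry in milliseconds.
  private final ConcurrentMap<Integer, Long> leases = new ConcurrentHashMap<>();
  private final int numNodes;

  NodeLeaseSketch(int numNodes) {
    this.numNodes = numNodes;
  }

  /** Tries to lease a free or expired node ID; returns empty if all IDs are actively leased. */
  Optional<Integer> acquire(long nowMillis, long leaseMillis) {
    long newExpiry = nowMillis + leaseMillis;
    for (int nodeId = 0; nodeId < numNodes; nodeId++) {
      Long current = leases.get(nodeId);
      boolean won;
      if (current == null) {
        // Only succeeds if no other caller created the lease in the meantime.
        won = leases.putIfAbsent(nodeId, newExpiry) == null;
      } else if (current <= nowMillis) {
        // Only succeeds if the stored expiry is still the value that was read.
        won = leases.replace(nodeId, current, newExpiry);
      } else {
        won = false;
      }
      if (won) {
        return Optional.of(nodeId);
      }
    }
    return Optional.empty();
  }

  /** Extends a lease only if the stored expiry still matches the caller's expectation. */
  boolean extend(int nodeId, long expectedExpiry, long newExpiry) {
    return leases.replace(nodeId, expectedExpiry, newExpiry);
  }
}
```

The conditional `putIfAbsent` and `replace` calls play the role of the atomic database operation: when two callers
race for the same node ID, only one of the conditional updates can succeed.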

ID generators must not use timestamps outside the lease period, and they must never reuse an older timestamp. This
requirement is satisfied using a monotonic clock implementation.
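
A minimal sketch of how a clock can be kept from going backwards, and how a timestamp can be checked against a lease
window, is shown below. The class `MonotonicMillisSketch` and its methods are hypothetical and do not reflect the
actual `MonotonicClock` implementation in this PR; the sketch only guarantees non-decreasing values, and the lease
bounds are passed in by the caller.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Illustrative only: a millisecond clock that never returns a smaller value than before. */
final class MonotonicMillisSketch {
  private final LongSupplier wallClockMillis;
  private final AtomicLong lastMillis = new AtomicLong(Long.MIN_VALUE);

  MonotonicMillisSketch(LongSupplier wallClockMillis) {
    this.wallClockMillis = wallClockMillis;
  }

  /** Returns the larger of the wall clock reading and the highest value handed out so far. */
  long currentTimeMillis() {
    long now = wallClockMillis.getAsLong();
    return lastMillis.accumulateAndGet(now, Math::max);
  }

  /** Returns a timestamp inside the lease window, or fails if the clock is outside of it. */
  long timestampWithinLease(long leaseStartMillis, long leaseEndMillisExclusive) {
    long ts = currentTimeMillis();
    if (ts < leaseStartMillis || ts >= leaseEndMillisExclusive) {
      throw new IllegalStateException("timestamp " + ts + " is outside the lease period");
    }
    return ts;
  }
}
```

For example, `new MonotonicMillisSketch(System::currentTimeMillis)` would wrap the system wall clock; if the wall
clock is stepped back, the sketch keeps returning the last value until real time catches up.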

## Code structure

The code is structured into multiple modules. Consuming code should almost always pull in only the API module.

* `polaris-idgen-api` provides the necessary Java interfaces and immutable types.
* `polaris-idgen-impl` provides the storage-agnostic implementation.
* `polaris-idgen-mocks` provides mocks for testing.
* `polaris-idgen-spi` provides the necessary interfaces to construct ID generators.
42 changes: 42 additions & 0 deletions persistence/nosql/idgen/api/build.gradle.kts
@@ -0,0 +1,42 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

plugins {
  id("org.kordamp.gradle.jandex")
  id("polaris-server")
}

description = "Polaris ID generation API"

dependencies {
  compileOnly(libs.jakarta.annotation.api)
  compileOnly(libs.jakarta.validation.api)
  compileOnly(libs.jakarta.inject.api)
  compileOnly(libs.jakarta.enterprise.cdi.api)

  compileOnly(libs.smallrye.config.core)
  compileOnly(platform(libs.quarkus.bom))
  compileOnly("io.quarkus:quarkus-core")

  compileOnly(project(":polaris-immutables"))
  annotationProcessor(project(":polaris-immutables", configuration = "processor"))

  implementation(platform(libs.jackson.bom))
  implementation("com.fasterxml.jackson.core:jackson-databind")
}
@@ -0,0 +1,45 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.ids.api;

/** The primary interface for generating a contention-free ID. */
public interface IdGenerator {
  /** Generate a new, unique ID. */
  long generateId();

  /** Generate the system ID for a node, solely used for node management. */
  long systemIdForNode(int nodeId);
Contributor

Is this ID expected to be the same for all IdGenerator implementations? What if a future impl. is not node-based?

Should this be pushed down to the Snowflake ID code?

Member

This function isn't specific to the particular implementation.
See the upcoming node-id-lease stuff: there is one implementation with one constant configuration per setup.

Contributor Author

Actually, the more I read about the node-id lease stuff, the less sure I am that this should be in this module. Here's my thinking:

  1. It seems that NodeID generation is special and not a Snowflake ID generation. The implementation is only based upon the node id passed in and some system-wide variables such as SnowflakeIdGeneratorImpl#timestampMax, SnowflakeIdGeneratorImpl#timestampShift, & SnowflakeIdGeneratorImpl#sequenceBits. So, in practice, it's really just the Node ID passed in.
  2. So, given that, I think we could pull this out and just put it into the node leasing modules. That way, we can keep the IdGenerator clean for the cases that require the distributed id generation.

What do y'all think?

Contributor

+1 to Adam's point 1.

... however, I have a bigger concern. Suppose we run with Snowflake IDs for a while and then change to another ID generator. Assume generateId() outputs do not clash. Still, do we expect systemIdForNode(X) to return the same value for all generator implementations and for all possible values of X?


  default String describeId(long id) {
    return Long.toString(id);
  }

  IdGenerator NONE =
      new IdGenerator() {
        @Override
        public long generateId() {
          throw new UnsupportedOperationException("NONE IdGenerator cannot generate IDs.");
        }

        @Override
        public long systemIdForNode(int nodeId) {
          throw new UnsupportedOperationException("NONE IdGenerator cannot generate IDs.");
        }
      };
}
@@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.ids.api;

import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;
import io.smallrye.config.WithDefault;
import java.util.Map;
import org.apache.polaris.immutables.PolarisImmutable;
import org.immutables.value.Value;

@PolarisImmutable
@JsonSerialize(as = ImmutableIdGeneratorSpec.class)
@JsonDeserialize(as = ImmutableIdGeneratorSpec.class)
public interface IdGeneratorSpec {
  @WithDefault("snowflake")
  String type();

  Map<String, String> params();

  @PolarisImmutable
  interface BuildableIdGeneratorSpec extends IdGeneratorSpec {
Contributor

This class looks odd? Afaict we could use ImmutableIdGeneratorSpec.builder() instead. The only difference seems to be that there is a default type here, whereas in IdGeneratorSpec there isn't, but it's certainly possible to overcome this limitation somehow.

    static ImmutableBuildableIdGeneratorSpec.Builder builder() {
      return ImmutableBuildableIdGeneratorSpec.builder();
    }

    @Override
    Map<String, String> params();

    @Override
    @Value.Default
    default String type() {
      return "snowflake";
    }
  }
}
@@ -0,0 +1,69 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.ids.api;

import java.time.Instant;

/**
* Provides a clock providing the current time in milliseconds, microseconds and instant since
* 1970-01-01-00:00:00.000. The returned timestamp values increase monotonically.
*
* <p>The functions provide nanosecond/microsecond/millisecond precision, but not necessarily the
* same resolution (how frequently the value changes) - no guarantees are made.
*
* <p>Implementation <em>may</em> adjust to wall clocks advancing faster than the real time. If and
* how exactly depends on the implementation, as long as none of the time values available via this
* interface "goes backwards".
*
* <p>Implementer notes: {@link System#nanoTime() System.nanoTime()} does not guarantee that the
* values will be monotonically increasing when invocations happen from different
* CPUs/cores/threads.
*
* <p>A default implementation of {@link MonotonicClock} can be injected as an application scoped
* bean in CDI.
*/
public interface MonotonicClock extends AutoCloseable {
  /**
   * Current timestamp as microseconds since epoch, can be used as a monotonically increasing wall
   * clock.
   */
  long currentTimeMicros();

  /**
   * Current timestamp as milliseconds since epoch, can be used as a monotonically increasing wall
   * clock.
   */
  long currentTimeMillis();

  /**
   * Current instant with nanosecond precision, can be used as a monotonically increasing wall
   * clock.
   */
  Instant currentInstant();

  /** Monotonically increasing timestamp with nanosecond precision, not related to wall clock. */
  long nanoTime();

  void sleepMillis(long millis);

  @Override
  void close();

  void waitUntilTimeMillisAdvanced();
Contributor

Could be good to add javadocs here (and above), it's not immediately clear what this method is supposed to do (spin-wait until the clock ticks?).

Also neither this method nor sleepMillis throw InterruptedException, which is surprising since they are clearly blocking. It could be good to add an @implSpec note about how the interrupt flag is expected to be handled.

}
@@ -0,0 +1,60 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.ids.api;

import jakarta.annotation.Nonnull;
import java.time.Instant;
import java.util.UUID;

public interface SnowflakeIdGenerator extends IdGenerator {
  /** Offset of the snowflake ID generator since the 1970-01-01T00:00:00Z epoch instant. */
  Instant ID_EPOCH = Instant.parse("2025-03-01T00:00:00Z");

  /**
   * Offset of the snowflake ID generator in milliseconds since the 1970-01-01T00:00:00Z epoch
   * instant.
   */
  long ID_EPOCH_MILLIS = ID_EPOCH.toEpochMilli();

  int DEFAULT_NODE_ID_BITS = 10;
  int DEFAULT_TIMESTAMP_BITS = 41;
  int DEFAULT_SEQUENCE_BITS = 12;

  long constructId(long timestamp, long sequence, long node);

  long timestampFromId(long id);
Contributor

I suppose this is understood as "since the Snowflake Epoch (2025-03-01)". Could be good to add javadocs to clarify.


  long timestampUtcFromId(long id);
Contributor

"Timestamp UTC" does not make sense to me 🤔
From the implementation class, it seems you meant "timestamp since Unix Epoch" instead.


  long sequenceFromId(long id);

  long nodeFromId(long id);

  UUID idToTimeUuid(long id);

  String idToString(long id);

  long timeUuidToId(@Nonnull UUID uuid);

  int timestampBits();

  int sequenceBits();

  int nodeIdBits();
}