Conversation
@fsk119 fsk119 commented Sep 25, 2025


What is the purpose of the change

Refactor codegen and runner to reuse utils for MLPredict and Lookup join

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)


flinkbot commented Sep 25, 2025

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@lihaosky left a comment

Thanks for the contribution! Left some comments.

callWithDataType
}

try {

Contributor:

Why do we move this out of createLookupTypeInference?

fsk119 (Member, Author):

I don't understand. Which part do you mean?

callContext: FunctionCallContext,
udf: UserDefinedFunction,
operands: Seq[GeneratedExpression]) => {
val inference = TypeInference

Contributor:

Why not call udf.getTypeInference(dataTypeFactory)?

fsk119 (Member, Author):

Because type inference relies on the eval method signature to determine input and output types. However, a predict function's eval method always takes RowData as input, so the actual types cannot be inferred from the signature. Instead, we build the type inference directly from the model's input/output schema.
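To illustrate the point above, here is a small self-contained sketch in plain Java (not Flink's actual classes; the class names and schema values are made up): reflecting over an eval signature that only exposes a generic row container yields no per-column types, which is why the inference has to be built from the model's declared schema instead.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

public class ExplicitInferenceSketch {

    // A predict-style function: its eval signature only exposes a generic
    // row container, so the parameter type says nothing about column types.
    public static class PredictFn {
        public Object eval(Object[] rowData) {
            return rowData;
        }
    }

    public static void main(String[] args) throws Exception {
        Method eval = PredictFn.class.getMethod("eval", Object[].class);
        // Reflection over the signature only sees "Object[]" -- the
        // per-column types are not recoverable from here.
        String inferredInput = eval.getParameterTypes()[0].getSimpleName();
        System.out.println(inferredInput); // prints "Object[]"

        // So the caller supplies the types explicitly from the model schema
        // (hypothetical values, standing in for what the planner would pass
        // to an explicit TypeInference builder).
        List<String> modelInputSchema = Arrays.asList("STRING", "INT");
        String modelOutputType = "DOUBLE";
        System.out.println(modelInputSchema + " -> " + modelOutputType);
    }
}
```

The same shape applies in Flink's planner: rather than asking reflection for the argument types, the code hands the builder explicit input and output data types taken from the model.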

fsk119 (Member, Author):

But here udf is not ML_PREDICT; it is a subclass of PredictFunction, and Flink still uses the method signature to determine the type inference.

Contributor:

Ah, right


private void registerMetric(MetricGroup metricGroup) {
metricGroup.gauge(
"ai_queue_length", () -> asyncBufferCapacity + 1 - resultFutureBuffer.size());

Contributor:

Is the resultFutureBuffer size fixed to be the same as asyncBufferCapacity?

fsk119 (Member, Author):

It just means how many requests are in flight.

Contributor:

I mean resultFutureBuffer.size() is always equal to asyncBufferCapacity + 1 after open(), so this gauge is always 0?

fsk119 (Member, Author):

Oh, the resultFutureBuffer size is not fixed. When a record arrives, the operator takes a result future from the buffer to handle the execution. CC
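The accounting described in this thread can be sketched in plain Java (hypothetical names and capacity, not the actual operator code): open() pre-fills a pool with asyncBufferCapacity + 1 result-future slots, each arriving record takes one slot, completion returns it, and the gauge capacity + 1 - buffer.size() therefore counts in-flight requests.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class AiQueueLengthSketch {

    static final int ASYNC_BUFFER_CAPACITY = 4;

    // Pool of reusable "result future" slots, filled in open().
    static final ArrayBlockingQueue<Object> resultFutureBuffer =
            new ArrayBlockingQueue<>(ASYNC_BUFFER_CAPACITY + 1);

    // Mirrors the gauge: a full pool means nothing is in flight.
    static int aiQueueLength() {
        return ASYNC_BUFFER_CAPACITY + 1 - resultFutureBuffer.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // open(): pre-fill the pool with all slots.
        for (int i = 0; i < ASYNC_BUFFER_CAPACITY + 1; i++) {
            resultFutureBuffer.put(new Object());
        }
        System.out.println(aiQueueLength()); // 0: no request in flight

        Object slot = resultFutureBuffer.take(); // a record arrives
        System.out.println(aiQueueLength()); // 1: one request in flight

        resultFutureBuffer.put(slot); // the request completes, slot returned
        System.out.println(aiQueueLength()); // 0 again
    }
}
```

So the gauge is only 0 while the pool is full; every outstanding async request lowers resultFutureBuffer.size() by one and raises the reported queue length by one.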

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Oct 15, 2025
@lihaosky left a comment

LGTM!



fsk119 commented Oct 16, 2025

The failed tests are not related to this PR. Merging...

@fsk119 fsk119 merged commit 8b6e69e into apache:master Oct 16, 2025