-
Notifications
You must be signed in to change notification settings - Fork 13.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-37232][runtime] Fix for broken synchronization assumption on the AdaptiveScheduler's side introduced by FLIP-272 #26088
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. CI is failing due to spotless.
./mvnw -pl flink-runtime spotless:apply
should do the trick.
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/StateTransitions.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Restarting.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/ExecutingTest.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Restarting.java
Show resolved
Hide resolved
1fd69a2
to
8f21e96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for going through my comments. Looks good from my end. 👍
@XComp Thanks for review, I addressed the comments and updated the description. PTAL. |
@flinkbot run azure |
@ztison looks like you bot command did not take- we still see yesterdays CI FAILURE |
@flinkbot run azure |
1 similar comment
@flinkbot run azure |
…he AdaptiveScheduler's side introduced by FLIP-272
0d2c955
to
a60f95b
Compare
…eRequirementsITCase
a60f95b
to
78855d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job and thanks for adding the ITCase. 👍
I verified that we can reproduce the error with the ITCase and the changes not being applied:
StandaloneDispatcher [flink-pekko.actor.default-dispatcher-4] INFO Job 98f97effc400a2ee5004f62ca5c4cea4 reached terminal state FAILED.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough resources available for scheduling.
at org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.lambda$determineParallelism$26(AdaptiveScheduler.java:1136)
I have one question, though. PTAL
...ests/src/test/java/org/apache/flink/test/scheduling/UpdateJobResourceRequirementsITCase.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing else to add. Thanks for fixing the issue. 👍 Please provide backports for release-2.0
and release-2.0-preview1-rc1
What is the purpose of the change
The PR fixes the issue with the rescale state transition short cut that was introduced in FLIP-472 where the
WaitingForRequirements
state is omitted and the transition goes directly from the Restarting state to theCreatingExecutionGraph
state.Brief change log
Restarting
state and check when the job is cancelled whether that parallelism has changed.Verifying this change
This change is already covered by existing tests, such as (please describe tests).
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation