-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Fix ReplaceAliasingEvalWithProject in case of shadowing #137025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hi @alex-spies, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, interesting it didn't surface earlier.
I've only left style optional notes.
| plan.forEachDown(EsRelation.class, relation -> { | ||
| for (Attribute attr : relation.output()) { | ||
| if (attr.name().equals(name)) { | ||
| result.set((FieldAttribute) attr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw if multiple are found? Just to make the testing less "exciting", in case this method ends up being used a plan with multiple sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the multiple case will likely happen in case of lookup join if the name is both in the main index and the lookup index. We should test this
| // the renaming also needs to be propagated to eval fields to the right | ||
| renamesToPropagate.put(field.toAttribute(), newField.toAttribute()); | ||
|
|
||
| field = newField; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: for legibility, I'd reflow this into an if () {} else if () {} else {}, rather than reuse the field var.
| for (int i = 0, size = fields.size(); i < size; i++) { | ||
| Alias field = fields.get(i); | ||
| Expression child = field.child(); | ||
| var attribute = field.toAttribute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this in the "basic renaming" branch.
| List<Alias> newEvalFields = new ArrayList<>(); | ||
|
|
||
| var fields = eval.fields(); | ||
| for (int i = 0, size = fields.size(); i < size; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional, but since changing this method "sufficiently"...: 'for' loop can be replaced with enhanced 'for'
| basicAliasSourcesBuilder.put(attribute, field); | ||
| // propagate all previous aliases into the current field | ||
| field = (Alias) field.transformUp(e -> renamesToPropagate.build().resolve(e, e)); | ||
| Expression child = field.child(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: can be inlined.
| import static org.hamcrest.Matchers.startsWith; | ||
|
|
||
| public class ReplaceAliasingEvalWithProjectTests extends AbstractLogicalPlanOptimizerTests { | ||
| /** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from test | ||
| | KEEP emp_no, salary | ||
| | EVAL emp_no2 = emp_no, salary2 = 2*salary, emp_no3 = emp_no2, salary3 = 3*salary2 | ||
| | KEEP * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a case where there is a separate eval after that does something similar?
| from test | ||
| | KEEP emp_no, salary | ||
| | EVAL emp_no2 = emp_no, emp_no = 2*salary, emp_no3 = emp_no2, salary3 = 2*emp_no | ||
| | KEEP * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test case with lookup join where emp_no can come from both the main index and the lookup index?
| // Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP * | ||
| var plan = plan(""" | ||
| from test | ||
| | KEEP salary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have it already, can you please add a test with a LOOP of evals to make sure the logic does not get in an infinite loop? e.g
row a = 1
eval b = a, a = b, b = a, a = b, b = a, a = b
row a = 1
eval b = a, c = b, a = c
| // Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP * | ||
| var plan = plan(""" | ||
| from test | ||
| | KEEP salary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we throw in some renames in the mix?
| * \_EsRelation[test][_meta_field{f}#26, emp_no{f}#20, first_name{f}#21, ..] | ||
| */ | ||
| public void testAliasForShadowedNonAlias() { | ||
| // Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the rule is also turned on by aggregate. Can we have at least one aggregate example?
| import static org.elasticsearch.xpack.esql.optimizer.rules.logical.TemporaryNameUtils.locallyUniqueTemporaryName; | ||
|
|
||
| /** | ||
| * Replace aliasing evals (eval x=a) with a projection which can be further combined / simplified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we require project on top? It seems we are missing a lot of optimization opportunities.
e.g. maybe we will duplicate a billion doc field many times for this ES|QL? I tested with logical planning, not sure if we have rules later to handle this.
from test
| EVAL salary = salary+1, salary = salary +1, salary = salary +1
Eval[[salary{f}#17 + 1[INTEGER] AS salary#5, salary{r}#5 + 1[INTEGER] AS salary#8, salary{r}#8 + 1[INTEGER] AS sala
ry#11]]
\_Limit[1000[INTEGER],false,false]
\_EsRelation[test][_meta_field{f}#18, emp_no{f}#12, first_name{f}#13, ..]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example above should be optimized to just this, no project needed
from test
| EVAL salary = salary+3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule doesn't propagate shadowed, internal columns from an eval. We could do that! But I don't think we do.
I expect this is what you'd want this to become?
from test
| EVAL salary = ((salary+1)+1)+1
(Which should be simplified by some other rule to salary+3, I think.)
Why do we require project on top?
Great question! That rule is super old, and I don't recall why we don't trigger it always. My hunch is that we wanted it mostly to combine the aliases from the eval with downstream projections. But we could profit from propagating the aliases more generally.
That said, on its own, there is no performance difference between a simple alias in an eval vs. in a projection. Both are cheap! They just incRef the underlying block. (Unless that block is sent over the wire. But that could also be tackled on the serialization level.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would expect us to get to salary = ((salary+1)+1)+1 and then fold the constants in evals eventually. You don't have to address it in this PR, it seems like a bigger change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule is pretty complicated, I feel like it tries to do too much at the same time and the same function. I wonder if we can refactor it somehow, e.g. have a method that does just the aliasing, have a method that does just the renaming, have a method that builds the final output. The changes themselves seem correct. Also added some testing recommendations in the comments.

Fix #137019