Skip to content

Conversation

@alex-spies
Copy link
Contributor

Fix #137019

@alex-spies alex-spies added >bug auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.2.1 v9.3.0 v8.19.7 v9.1.7 labels Oct 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies alex-spies marked this pull request as ready for review October 24, 2025 11:40
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 24, 2025
@bpintea bpintea self-requested a review October 24, 2025 13:02
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, interesting it didn't surface earlier.
I've only left style optional notes.

plan.forEachDown(EsRelation.class, relation -> {
for (Attribute attr : relation.output()) {
if (attr.name().equals(name)) {
result.set((FieldAttribute) attr);
Copy link
Contributor

@bpintea bpintea Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw if multiple are found? Just to make the testing less "exciting", in case this method ends up being used a plan with multiple sources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the multiple case will likely happen in case of lookup join if the name is both in the main index and the lookup index. We should test this

// the renaming also needs to be propagated to eval fields to the right
renamesToPropagate.put(field.toAttribute(), newField.toAttribute());

field = newField;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: for legibility, I'd reflow this into an if () {} else if () {} else {}, rather than reuse the field var.

for (int i = 0, size = fields.size(); i < size; i++) {
Alias field = fields.get(i);
Expression child = field.child();
var attribute = field.toAttribute();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this in the "basic renaming" branch.

List<Alias> newEvalFields = new ArrayList<>();

var fields = eval.fields();
for (int i = 0, size = fields.size(); i < size; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but since changing this method "sufficiently"...: 'for' loop can be replaced with enhanced 'for'

basicAliasSourcesBuilder.put(attribute, field);
// propagate all previous aliases into the current field
field = (Alias) field.transformUp(e -> renamesToPropagate.build().resolve(e, e));
Expression child = field.child();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: can be inlined.

import static org.hamcrest.Matchers.startsWith;

public class ReplaceAliasingEvalWithProjectTests extends AbstractLogicalPlanOptimizerTests {
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
/*

To avoid this default rendering (and since it's not really javadoc :) ):
Screenshot 2025-10-24 at 15 07 49

from test
| KEEP emp_no, salary
| EVAL emp_no2 = emp_no, salary2 = 2*salary, emp_no3 = emp_no2, salary3 = 3*salary2
| KEEP *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a case where there is a separate eval after that does something similar?

from test
| KEEP emp_no, salary
| EVAL emp_no2 = emp_no, emp_no = 2*salary, emp_no3 = emp_no2, salary3 = 2*emp_no
| KEEP *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test case with lookup join where emp_no can come from both the main index and the lookup index?

// Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP *
var plan = plan("""
from test
| KEEP salary
Copy link
Contributor

@julian-elastic julian-elastic Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have it already, can you please add a test with a LOOP of evals to make sure the logic does not get in an infinite loop? e.g
row a = 1
eval b = a, a = b, b = a, a = b, b = a, a = b

row a = 1
eval b = a, c = b, a = c

// Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP *
var plan = plan("""
from test
| KEEP salary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we throw in some renames in the mix?

* \_EsRelation[test][_meta_field{f}#26, emp_no{f}#20, first_name{f}#21, ..]
*/
public void testAliasForShadowedNonAlias() {
// Rule only kicks in if there's a Project or Aggregate above, so we add a KEEP *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the rule is also turned on by aggregate. Can we have at least one aggregate example?

import static org.elasticsearch.xpack.esql.optimizer.rules.logical.TemporaryNameUtils.locallyUniqueTemporaryName;

/**
* Replace aliasing evals (eval x=a) with a projection which can be further combined / simplified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we require project on top? It seems we are missing a lot of optimization opportunities.
e.g. maybe we will duplicate a billion doc field many times for this ES|QL? I tested with logical planning, not sure if we have rules later to handle this.

from test
| EVAL salary = salary+1, salary = salary +1, salary = salary +1
Eval[[salary{f}#17 + 1[INTEGER] AS salary#5, salary{r}#5 + 1[INTEGER] AS salary#8, salary{r}#8 + 1[INTEGER] AS sala
ry#11]]
\_Limit[1000[INTEGER],false,false]
  \_EsRelation[test][_meta_field{f}#18, emp_no{f}#12, first_name{f}#13, ..]

Copy link
Contributor

@julian-elastic julian-elastic Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example above should be optimized to just this, no project needed

from test
| EVAL salary = salary+3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule doesn't propagate shadowed, internal columns from an eval. We could do that! But I don't think we do.
I expect this is what you'd want this to become?

from test
| EVAL salary = ((salary+1)+1)+1

(Which should be simplified by some other rule to salary+3, I think.)

Why do we require project on top?

Great question! That rule is super old, and I don't recall why we don't trigger it always. My hunch is that we wanted it mostly to combine the aliases from the eval with downstream projections. But we could profit from propagating the aliases more generally.

That said, on its own, there is no performance difference between a simple alias in an eval vs. in a projection. Both are cheap! They just incRef the underlying block. (Unless that block is sent over the wire. But that could also be tackled on the serialization level.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would expect us to get to salary = ((salary+1)+1)+1 and then fold the constants in evals eventually. You don't have to address it in this PR, it seems like a bigger change

Copy link
Contributor

@julian-elastic julian-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule is pretty complicated, I feel like it tries to do too much at the same time and the same function. I wonder if we can refactor it somehow, e.g. have a method that does just the aliasing, have a method that does just the renaming, have a method that builds the final output. The changes themselves seem correct. Also added some testing recommendations in the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.7 v9.1.7 v9.2.1 v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: bug in ReplaceAliasingEvalWithProject: optimized incorrectly due to missing references

4 participants