-
Couldn't load subscription status.
- Fork 25.6k
ESQL: introduce a new interface to declare functions depending on the @timestamp attribute
#137040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @bpintea, I've created a changelog YAML for you. |
| if (plan instanceof Rerank r) { | ||
| return resolveRerank(r, childrenOutput); | ||
| } | ||
| return switch (plan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. Left a first round of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative to this interface is the TimestampAware interface with a timestamp() getter-like method. This would act as a marker interface which can be used by the FunctionRegistry to know if the timestamp needs to be injected or not.
To avoid compiler ambiguities, potentially add the tcab wrapper.
| public TBucket(Source source, Expression buckets, TimestampAttributeSupplier timestampSupplier) { | ||
| this(source, buckets, timestampSupplier.timestampAttribute()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor could be removed and instead the function would rely only on the TBucket constructor with 2 parameters where the users supplies the bucket while the timestamp by the ESQL infra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a timestamp() method would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. It's not absolutely needed now, but having it available for later. It fits well with some of the implementing methods' implicit timestamp() existing method.
| if (arg.foldable() && ((arg instanceof EsqlScalarFunction) == false)) { | ||
| if (i < targetDataTypes.size()) { | ||
| targetDataType = targetDataTypes.get(i); | ||
| int targetDataTypesIdx = f instanceof TimestampAware ? i - 1 : i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using TimestampAware.timestamp() ?
| /** | ||
| * Marker interface to identify classes of functions that operate on the {code @timestamp} field of an index. | ||
| * Implementations of this interface need to expect the associated {@code Attribute} to be passed as the following argument after the | ||
| * {@code Source} one, which is always the first one. | ||
| */ | ||
| public interface TimestampAware {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for having the Timestamp expression right after Source and not the last Expression, that is:
Class(Source, Expression field, Expression timestamp)
Class(Source, Expression field, Expression timestamp, Configuration config)
Class(Source, Expression field, Expression timestamp, List)
In case of ambiguity, this can be sorted out inside the function registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it makes the most sense to have it following source:
- it's always there, just like source and always on same position for all TimestampAware functions;
- it's consistent to report (verification issues) about its position, if needed;
- it's clearer to apply positional transformations, like implicit casting (example).
But I've reverted this change, leaving the code inline with the other TS functions.
|
Hi @bpintea, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| import static org.hamcrest.Matchers.startsWith; | ||
|
|
||
| //@TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug") | ||
| @TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| defTS(FirstOverTime.class, FirstOverTime::new, "first_over_time"), | ||
| def(PercentileOverTime.class, bi(PercentileOverTime::new), "percentile_over_time"), | ||
| // dense vector function | ||
| def(TextEmbedding.class, bi(TextEmbedding::new), "text_embedding") } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if First/Last functions should be updated as well?
Conceptually they need @timestamp although (if I remember correctly) it is possible to supply it or another date/time field explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sort can be anything (as long as dt -> dt == DataType.LONG || dt == DataType.DATETIME || dt == DataType.DATE_NANOS), @timestamp included, but not restricted to it.
This change only applies to functions that only work with the @timestamp field and which assume it's there, i.e. it's an implicit dependency (which this change provides/injects).
This updates the way the
@timestampfield is injected into the functions that require it implicitly: these functions no longer need to declare an attribute themselves, the function registry will do it for them.Followingly, this can be traced from the source and eventually wired into the functions (so that renames no longer be problematic).
Closes #136772