Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft(MQL): Parse multi-entity type formulas #5494

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

enochtangg
Copy link
Member

This is a draft PR for implementing what parsing multi-entity type formulas might look like in snuba.

Depends on:

This change breaks up the parsing pipeline into multiple sub-tasks:

  1. parse_mql_query_body(): Parses the MQL into the InitialParseResult tree
  2. populate_query_ast_initial(): Creates LogicalQuery | CompositeQuery objects with the appropriate expressions. If multi-type formula query, calls some processing function (process_table_names_in_selected_expressions()) which traverses AST and adds the appropriate join table names to the expressions.
  3. populate_query_from_mql_context Parses MQL Context and transforms fields into expressions that inject into the AST. If multi-type formula query, patch the appropriate join table names.

To see preview of CompositeQuery AST (with joins), run pytest tests/query/parser/test_formula_mql_query.py::test_multi_type_formula -s -vv

@evanh
Copy link
Member

evanh commented Feb 5, 2024

How does resolving fit into this flow?

@enochtangg
Copy link
Member Author

How does resolving fit into this flow?

I'm wondering if resolving can function as-is, since processing a multi-type formula query only involves updating the table_name on certain expressions (e.g. Column). From what I can tell, the resolution step is only concerned with updating the names and values of those expressions (e.g. column name) and can operate separately?

@evanh
Copy link
Member

evanh commented Feb 6, 2024

Reading over this, I think the refactor proposed is generally good. However, there is no requirement to keep using -If for single type formulas. And in fact, we can't support time spent percentage if we keep doing that anyways.

Given that's the case, we can and probably should rewrite the how formulas get parsed. There's no need to parse the initial SelectedExpressions with -If for example, and that simplifies some of the JOIN parsing code.

@enochtangg
Copy link
Member Author

There's no need to parse the initial SelectedExpressions with -If for example, and that simplifies some of the JOIN parsing code.

I agree. This was something I was also contemplating, it would be nice if we just had one mechanism (JOINs) for all formulas. One thing we should assess is the performance of these join queries in production. While the -If queries are not performant to begin with, are we sure that JOINs won't be significantly worse before we make this refactor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants