improvements to lexer + parser performance #293

Open
allancalix opened this issue Sep 9, 2022 · 4 comments
@allancalix (Contributor)

Description

I noticed that parse times increase with query size more than I'd expect. I suspect this has to do with the number of allocations happening during lexing/parsing, as switching to jemalloc improved things by ~15%. I'm not sure if it's something specific to this query or if any query of this size sees the increase.
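
For reference, swapping in jemalloc as the global allocator of a Rust binary is a small change. A minimal sketch, assuming the `tikv-jemallocator` crate (the issue doesn't say which jemalloc binding was used):

```rust
// Cargo.toml (assumed): tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// Route every heap allocation in the process, including those made while
// lexing/parsing, through jemalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```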

10 aliased fields

parser_peek_n           time:   [21.519 µs 21.535 µs 21.562 µs]                           
                        change: [-97.894% -97.891% -97.887%] (p = 0.00 < 0.05)
                        Performance has improved.

100 aliased fields

Gnuplot not found, using plotters backend
parser_peek_n           time:   [197.55 µs 197.71 µs 197.91 µs]                          
                        change: [+816.92% +818.31% +819.65%] (p = 0.00 < 0.05)
                        Performance has regressed.

1000 aliased fields

Gnuplot not found, using plotters backend
parser_peek_n           time:   [2.0601 ms 2.0618 ms 2.0636 ms]                           
                        change: [+940.25% +941.65% +943.08%] (p = 0.00 < 0.05)
                        Performance has regressed.

Steps to reproduce

  1. Parse a large query (see example); a standalone reproduction sketch is shown below.
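
A standalone reproduction sketch (the query shape is a guess, since the linked example isn't inlined here), using `apollo_parser::Parser` as the entry point:

```rust
use std::time::Instant;

use apollo_parser::Parser;

fn main() {
    // Build a query with many aliased selections, e.g.
    // { alias0: field alias1: field ... }
    let n = 1_000;
    let mut query = String::from("{\n");
    for i in 0..n {
        query.push_str(&format!("  alias{i}: field\n"));
    }
    query.push('}');

    let start = Instant::now();
    let _tree = Parser::new(&query).parse();
    println!("parsed {n} aliased fields in {:?}", start.elapsed());
}
```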

Expected result

Parse time for large queries should grow more slowly.

Actual result

Every 10x increase in the number of selections increases parse times by more than 800%.

Environment

  • Linux / Darwin
  • apollo-rs apollo-parser revision = HEAD
@lrlna (Member) commented Sep 9, 2022

Interesting! This seems to be an issue specific to the particular Document you're benchmarking. We use apollo-rs on Documents that are MBs in size (as opposed to the 57 KB in the example you provided), and we have yet to observe such a large spike in parse times.

I'll need to investigate this a bit more.

lrlna self-assigned this Sep 12, 2022
@allancalix (Contributor, Author)

It looks like this issue is common across GraphQL parsers: gqlparser, async-graphql-parser, and graphql-parser all show it. While I wasn't able to improve how these queries scale overall, reducing the number of allocations in the lexer does improve performance and seems to make apollo-parser the fastest parsing option of the bunch.

Maybe this could go further in certain cases with rust-analyzer/rowan#119. I more or less hit a wall, with all CPU time spent parsing the selection-set fields.
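
To illustrate the per-token allocation point (a sketch of the general technique, not the actual apollo-rs lexer): tokens that borrow `&str` slices of the source cost nothing on the heap, whereas tokens that own a `String` pay one allocation each, which adds up over thousands of selection-set fields.

```rust
// Illustrative sketch only, not the apollo-rs lexer: a token that borrows
// its text from the source, so lexing a name performs no heap allocation.
#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Name,
}

#[derive(Debug, Clone, Copy)]
struct Token<'a> {
    kind: TokenKind,
    text: &'a str, // borrowed from the source; no per-token String
}

fn lex_name(src: &str, start: usize) -> Token<'_> {
    // Take characters until the first one that can't appear in a GraphQL name.
    let end = src[start..]
        .find(|c: char| !c.is_ascii_alphanumeric() && c != '_')
        .map(|off| start + off)
        .unwrap_or(src.len());
    Token { kind: TokenKind::Name, text: &src[start..end] }
}

fn main() {
    let src = "alias0: field";
    let tok = lex_name(src, 0);
    assert_eq!(tok.text, "alias0");
    println!("{:?}", tok);
}
```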

(flamegraph attached)

Here are some benchmarks with the available Rust parsers: https://github.com/allancalix/graphql-benchmark.

Baseline small query

async_graphql_parser    time:   [7.5938 µs 7.6117 µs 7.6288 µs]
                        change: [-0.5916% -0.1561% +0.2545%] (p = 0.46 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild

apollo_graphql_parser   time:   [6.7100 µs 6.7274 µs 6.7436 µs]
                        change: [-1.8937% -1.5113% -1.1105%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild

apollo_fork_graphql_parser
                        time:   [4.7675 µs 4.7785 µs 4.7904 µs]
                        change: [-2.1926% -1.7055% -1.2349%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

graphql_parser          time:   [5.6228 µs 5.6353 µs 5.6472 µs]
                        change: [-0.3519% +0.0893% +0.6192%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Pathological query

async_graphql_parser #2 time:   [2.3517 ms 2.3572 ms 2.3625 ms]
                        change: [+1.3489% +1.6262% +1.9296%] (p = 0.00 < 0.05)
                        Performance has regressed.

apollo_graphql_parser #2
                        time:   [3.0019 ms 3.0110 ms 3.0203 ms]
                        change: [+1.9770% +2.3294% +2.6782%] (p = 0.00 < 0.05)
                        Performance has regressed.

apollo_fork_graphql_parser #2
                        time:   [1.9258 ms 1.9290 ms 1.9322 ms]
                        change: [+0.6851% +0.9991% +1.3283%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

graphql_parser #2       time:   [2.2415 ms 2.2494 ms 2.2584 ms]
                        change: [+0.1431% +0.5157% +0.9513%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

lrlna removed the "bug" ("Something isn't working") label Sep 22, 2022
lrlna changed the title from "Query parse times increase rapidly with query size" to "improvements to lexer + parser performance" Sep 22, 2022
@lrlna (Member) commented Sep 22, 2022

whoaaa thank you so much for putting all this work in for benchmarking @allancalix! i really appreciate it!

There are definitely a whole bunch of improvements we can make when it comes to the parser's performance. The work you did in #294 is one way to go about it; there is also the idea of having a streaming interface for the lexer (like in #115), or simply having the parser consume the lexer's tokens without that extra allocation at all.
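
For context, a rough sketch of the shape a streaming lexer interface could take (hypothetical types, not the actual #115 design): the lexer implements `Iterator`, so the parser pulls tokens on demand instead of lexing the whole document into a `Vec<Token>` up front.

```rust
// Hypothetical sketch of a streaming lexer: the parser drives it one token
// at a time, so no intermediate Vec<Token> is ever allocated.
struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

#[derive(Debug)]
struct Token<'a> {
    text: &'a str,
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip whitespace, then emit the next run of non-whitespace
        // characters (a stand-in for real GraphQL lexing rules).
        let rest = &self.src[self.pos..];
        let start = self.pos + rest.find(|c: char| !c.is_whitespace())?;
        let len = self.src[start..]
            .find(char::is_whitespace)
            .unwrap_or(self.src.len() - start);
        self.pos = start + len;
        Some(Token { text: &self.src[start..start + len] })
    }
}

fn main() {
    // A parser would own the Lexer and call .next() for lookahead as needed.
    let lexer = Lexer { src: "{ alias0: field }", pos: 0 };
    for token in lexer {
        println!("{:?}", token);
    }
}
```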

I realise I won't have enough time to think of a direction for us to take this until after GraphQL Summit, which is happening in the first week of October. This goes in my backlog for October though.

@allancalix (Contributor, Author) commented Sep 22, 2022

I've actually done both of these things to solve some bottlenecks for my use case. It's fairly rough currently, but once I get a chance to clean it up I'd be happy to upstream the changes if you'd be willing to review the PR. I included the excellent work done in #115, in addition to removing all the allocations from the Token types themselves.

You can see the direction of the changes here: main...allancalix:apollo-rs:finite-state-lexer. I've observed a ~40-50% decrease in parse times for both simple and complex queries using the new lexer.
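
To make the "no allocations in the Token types" idea concrete (a hypothetical sketch, not the finite-state-lexer branch itself): a token can carry only a kind plus a byte span into the source, making it `Copy` and allocation-free, with the text sliced out lazily when it's actually needed.

```rust
// Hypothetical span-based token, sketching the "no owned String per token"
// idea rather than reproducing the finite-state-lexer branch.
#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Name,
}

#[derive(Debug, Clone, Copy)]
struct Token {
    kind: TokenKind,
    start: usize,
    end: usize,
}

impl Token {
    /// Slice the token's text out of the original source only when needed.
    fn text<'a>(&self, src: &'a str) -> &'a str {
        &src[self.start..self.end]
    }
}

fn main() {
    let src = "alias0: field";
    let tok = Token { kind: TokenKind::Name, start: 0, end: 6 };
    assert_eq!(tok.text(src), "alias0");
    println!("{:?} -> {}", tok.kind, tok.text(src));
}
```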
