Reduce the number of allocations needed to find a specific child/sibling #119

theo-lw · 2021-10-03T19:33:26Z

A very common operation in rust-analyzer is finding a child/sibling of a SyntaxNode that matches a specific SyntaxKind. This is currently implemented by iterating through children and finding the first matching node. However, this causes unnecessary allocations - the children that don't match are allocated and quickly discarded.

This PR aims to reduce these unnecessary allocations by providing methods to iterate through only the children/siblings that match the desired SyntaxKind. Children/siblings that don't match the desired SyntaxKind are not allocated in these iterators.

So, I've only implemented this optimization for iterating through children, but this optimization already seems quite promising. Below is the DHAT output from running rust-analyzer's integrated completion benchmarks.

DHAT output before this change (on the master branch in my fork):

==196375== Total:     15,179,316,140 bytes in 107,064,174 blocks
==196375== At t-gmax: 587,308,085 bytes in 2,611,269 blocks
==196375== At t-end:  845,424 bytes in 1,480 blocks
==196375== Reads:     33,131,042,097 bytes
==196375== Writes:    17,949,507,478 bytes

DHAT output after this change (on this branch where I used children_matching in support::child).

==82931== Total:     14,585,180,134 bytes in 97,752,267 blocks
==82931== At t-gmax: 585,555,693 bytes in 2,599,462 blocks
==82931== At t-end:  845,584 bytes in 1,483 blocks
==82931== Reads:     32,637,529,753 bytes
==82931== Writes:    17,171,029,456 bytes

This is roughly a 8.5% decrease in total blocks allocated and a 4% decrease in the total bytes allocated.

…ren of SyntaxNode/SyntaxElement

…the generic parameter over Fn(SyntaxKind) from the outwards facing api

theo-lw · 2021-10-03T23:23:28Z

DHAT output of the integrated completion benchmarks with the latest commits:

==73109== Total:     13,722,837,242 bytes in 84,207,767 blocks
==73109== At t-gmax: 585,402,987 bytes in 2,596,664 blocks
==73109== At t-end:  846,800 bytes in 1,487 blocks
==73109== Reads:     31,817,411,560 bytes
==73109== Writes:    16,038,960,745 bytes

Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.

CAD97

Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.

👀 that's pretty significant (for the measured workload)!

~~Needs a cargo fmt, but otherwise~~ LGTM.

It might also be interesting to instead expose this as e.g. .children().filter_by_kind(|kind| ..) (cf. filter + *_by_key) rather than .children_matching(|kind| ..), but I don't really have a strong preference here.

r? @matklad?

src/api.rs

matklad · 2021-10-23T09:41:25Z

Nice! Wanted to do this for ages.

In terms of API, I suggesting using _by_kind() suffix, rather than _matching, as that's more specific to what exactly we match.

I also like CAD97's suggestion to makes this look like foo.children().by_kind(|it| it == BLOCK_EXPR).

Not sure how to express the predicate. I can see three options: pass generic impl Fn, pass fn pointer, pass a single kind, pass an array of kinds. I think I'd go with fn pointer -- that's slightly unusual API, but seems to work best for this case. Though, no strong opinion.

Finally, I suggest to start with the smallest API extension we can do here. Looking at the rowan diff, it seems that .by_kind on iterators is all we need here?

matklad · 2021-10-23T09:42:28Z

Would be curious to see what's wall-clock time difference on rowan as well? (rust-analyzer analysis-stats path/to/directory)

theo-lw · 2021-10-23T16:31:08Z

Running cargo run --release -- analysis-stats /rowan shows some improvement.

Before this change:

After this change:

Roughly a 4.6% improvement in the total wall-clock time

CAD97 · 2021-10-23T18:41:43Z

Not sure how to express the predicate. I can see three options: pass generic impl Fn, pass fn pointer, pass a single kind, pass an array of kinds. I think I'd go with fn pointer -- that's slightly unusual API, but seems to work best for this case. Though, no strong opinion.

(Completely overkill but) We could do something like the str.matches(impl Pattern) API and answer "yes" to how to express the predicate

theo-lw · 2021-10-23T21:03:00Z

I'm expressing the predicate as a generic impl Fn for now as it seems to be the most flexible. What's more, I've been able to integrate these changes in rust-analyzer without needing to add type parameters everywhere (see this branch).

theo-lw · 2021-10-23T21:34:49Z

I see I'm failing tests in rust-analyzer, I will investigate.

Edit: fixed the tests.

…t (i.e, only if we start iterating through them) to avoid some unnecessary allocations when calling by_kind

theo-lw · 2021-10-24T03:29:05Z

Current DHAT output of running the integrated completions benchmark:

On the master branch of my fork without these changes:

On my branch that includes these changes:

We can also see that the completions benchmark went from 88.9 billion instructions to 83.7 billion instructions, roughly a 6% decrease.

matklad · 2021-10-24T10:44:03Z

src/api.rs

+    pub fn by_kind<F: Clone + Fn(SyntaxKind) -> bool>(
+        self,
+        matcher: F,
+    ) -> SyntaxNodeChildrenMatching<F, L> {


Suggested change

) -> SyntaxNodeChildrenMatching<F, L> {

) -> SyntaxNodeChildrenByKind<F, L> {

matklad · 2021-10-24T10:47:19Z

src/api.rs

+}
+
+#[derive(Debug, Clone)]
+pub struct SyntaxNodeChildrenMatching<F: Clone + Fn(SyntaxKind) -> bool, L: Language> {


I'd probably still stick with fn(SyntaxKind) -> bool. Fn trait is indeed more flexible, but that comes at the expense of compile times, as generics break separate compilation. With fn, we compile everything when compiling rowan. With Fn, donstream crates would have to carry compilation time burden.

Would there be any disadvantage to a &dyn Fn vs. an fn?

Good question. It'll have to carry a lifetime, and would be two pointers wide, while what we actually need in practice is fn. I guess, my primary intuition here is about avoiding premature generalization.

I think support::token is a place where we would have to pass a closure to get by_kinds to work. Because of this, I'm using &dyn Fn right now.

matklad · 2021-10-24T10:48:48Z

src/cursor.rs

+    parent: SyntaxNode,
    next: Option<SyntaxElement>,
+    next_initialized: bool,


Hm, why do we need to change anything here? I'd expect SyntaxElementChildren to remain exactly as they were

Good question. I changed this so we could avoid creating SyntaxElementChildren::next when calling children().by_kind. This actually makes a pretty significant impact - here's the DHAT output of the integrated completions benchmark without this optimization:

And here's the DHAT output of the integrated completions benchmark after this optimization:

98 million blocks allocated (before this optimization) vs 86 million blocks allocated (after this optimization).

Sounds odd to me that using first_child_or_token_by_kind with these extra changes is so much faster than seeking to the next sibling via next_sibling_or_token_by_kind, as both ways come down to iterating a slice no?

Hmm, my observation is that using first_child_or_token_by_kind allows us to jump to the first child/token node that matches a kind without needing to allocate any other nodes. Contrast this to using first_child_or_token followed by next_sibling_or_token_by_kind (if the first child doesn't match the kind), which can potentially allocate the first child before jumping to the desired node. I think its this allocation avoidance that explains the improvement in the DHAT outputs.

… instead of generic impl Fn.

src/api.rs

src/cursor.rs

Veykril · 2021-11-08T12:27:46Z

src/cursor.rs

+    parent: SyntaxNode,
    next: Option<SyntaxElement>,
+    next_initialized: bool,


Sounds odd to me that using first_child_or_token_by_kind with these extra changes is so much faster than seeking to the next sibling via next_sibling_or_token_by_kind, as both ways come down to iterating a slice no?

Attempt to reduce the allocations needed to iterate through the child…

3d93945

…ren of SyntaxNode/SyntaxElement

theo-lw marked this pull request as draft October 3, 2021 19:33

Reduce allocations when looking through children with tokens, remove …

fd63d90

…the generic parameter over Fn(SyntaxKind) from the outwards facing api

Run cargo fmt

05e47e9

theo-lw marked this pull request as ready for review October 3, 2021 23:28

theo-lw changed the title ~~[DRAFT] Reduce the number of allocations needed to find a specific child/sibling~~ Reduce the number of allocations needed to find a specific child/sibling Oct 3, 2021

CAD97 approved these changes Oct 3, 2021

View reviewed changes

src/api.rs Outdated Show resolved Hide resolved

CAD97 requested a review from matklad October 3, 2021 23:29

Moved 'a to children_with_tokens_matching

9b96915

Expose the api as children().by_kind

5d62e45

Theodore Luo Wang added 3 commits October 23, 2021 18:12

Remove unnecessary functions, fix off-by-one error

0dad088

Lazily create SyntaxNodeChildren::next and SyntaxElementChildren::nex…

94b039c

…t (i.e, only if we start iterating through them) to avoid some unnecessary allocations when calling by_kind

Run cargo fmt

bff5da8

matklad reviewed Oct 24, 2021

View reviewed changes

Theodore Luo Wang added 4 commits October 24, 2021 11:45

Change Syntax*ChildrenMatching -> Syntax*ChildrenByKind

8ba46a7

Run cargo fmt

2b94548

Change function names to use by_kind instead of matching. Use &dyn Fn…

3f283c9

… instead of generic impl Fn.

Expose SyntaxNodeChildrenByKind

04e1a2a

Veykril reviewed Nov 8, 2021

View reviewed changes

Changes per Veykril's comments

39b5410

allancalix mentioned this pull request Sep 18, 2022

improvements to lexer + parser performance apollographql/apollo-rs#293

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce the number of allocations needed to find a specific child/sibling #119

Reduce the number of allocations needed to find a specific child/sibling #119

theo-lw commented Oct 3, 2021 •

edited

Loading

theo-lw commented Oct 3, 2021

CAD97 left a comment •

edited

Loading

matklad commented Oct 23, 2021

matklad commented Oct 23, 2021

theo-lw commented Oct 23, 2021

CAD97 commented Oct 23, 2021

theo-lw commented Oct 23, 2021

theo-lw commented Oct 23, 2021 •

edited

Loading

theo-lw commented Oct 24, 2021

matklad Oct 24, 2021

theo-lw Oct 24, 2021

matklad Oct 24, 2021

flodiebold Oct 24, 2021

matklad Oct 24, 2021

theo-lw Oct 24, 2021 •

edited

Loading

matklad Oct 24, 2021

theo-lw Oct 24, 2021

Veykril Nov 8, 2021 •

edited

Loading

theo-lw Nov 9, 2021

Veykril Nov 8, 2021 •

edited

Loading

	) -> SyntaxNodeChildrenMatching<F, L> {
	) -> SyntaxNodeChildrenByKind<F, L> {

Reduce the number of allocations needed to find a specific child/sibling #119

Are you sure you want to change the base?

Reduce the number of allocations needed to find a specific child/sibling #119

Conversation

theo-lw commented Oct 3, 2021 • edited Loading

theo-lw commented Oct 3, 2021

CAD97 left a comment • edited Loading

Choose a reason for hiding this comment

matklad commented Oct 23, 2021

matklad commented Oct 23, 2021

theo-lw commented Oct 23, 2021

CAD97 commented Oct 23, 2021

theo-lw commented Oct 23, 2021

theo-lw commented Oct 23, 2021 • edited Loading

theo-lw commented Oct 24, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

theo-lw Oct 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veykril Nov 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veykril Nov 8, 2021 • edited Loading

Choose a reason for hiding this comment

theo-lw commented Oct 3, 2021 •

edited

Loading

CAD97 left a comment •

edited

Loading

theo-lw commented Oct 23, 2021 •

edited

Loading

theo-lw Oct 24, 2021 •

edited

Loading

Veykril Nov 8, 2021 •

edited

Loading

Veykril Nov 8, 2021 •

edited

Loading