Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the number of allocations needed to find a specific child/sibling #119

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

theo-lw
Copy link
Contributor

@theo-lw theo-lw commented Oct 3, 2021

A very common operation in rust-analyzer is finding a child/sibling of a SyntaxNode that matches a specific SyntaxKind. This is currently implemented by iterating through children and finding the first matching node. However, this causes unnecessary allocations - the children that don't match are allocated and quickly discarded.

This PR aims to reduce these unnecessary allocations by providing methods to iterate through only the children/siblings that match the desired SyntaxKind. Children/siblings that don't match the desired SyntaxKind are not allocated in these iterators.

So, I've only implemented this optimization for iterating through children, but this optimization already seems quite promising. Below is the DHAT output from running rust-analyzer's integrated completion benchmarks.

DHAT output before this change (on the master branch in my fork):

==196375== Total:     15,179,316,140 bytes in 107,064,174 blocks
==196375== At t-gmax: 587,308,085 bytes in 2,611,269 blocks
==196375== At t-end:  845,424 bytes in 1,480 blocks
==196375== Reads:     33,131,042,097 bytes
==196375== Writes:    17,949,507,478 bytes

DHAT output after this change (on this branch where I used children_matching in support::child).

==82931== Total:     14,585,180,134 bytes in 97,752,267 blocks
==82931== At t-gmax: 585,555,693 bytes in 2,599,462 blocks
==82931== At t-end:  845,584 bytes in 1,483 blocks
==82931== Reads:     32,637,529,753 bytes
==82931== Writes:    17,171,029,456 bytes

This is roughly a 8.5% decrease in total blocks allocated and a 4% decrease in the total bytes allocated.

@theo-lw theo-lw marked this pull request as draft October 3, 2021 19:33
…the generic parameter over Fn(SyntaxKind) from the outwards facing api
@theo-lw
Copy link
Contributor Author

theo-lw commented Oct 3, 2021

DHAT output of the integrated completion benchmarks with the latest commits:

==73109== Total:     13,722,837,242 bytes in 84,207,767 blocks
==73109== At t-gmax: 585,402,987 bytes in 2,596,664 blocks
==73109== At t-end:  846,800 bytes in 1,487 blocks
==73109== Reads:     31,817,411,560 bytes
==73109== Writes:    16,038,960,745 bytes

Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.

@theo-lw theo-lw marked this pull request as ready for review October 3, 2021 23:28
@theo-lw theo-lw changed the title [DRAFT] Reduce the number of allocations needed to find a specific child/sibling Reduce the number of allocations needed to find a specific child/sibling Oct 3, 2021
Copy link
Collaborator

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly a 21% decrease in total blocks allocated, a 9.6% decrease in total bytes allocated.

👀 that's pretty significant (for the measured workload)!

Needs a cargo fmt, but otherwise LGTM.

It might also be interesting to instead expose this as e.g. .children().filter_by_kind(|kind| ..) (cf. filter + *_by_key) rather than .children_matching(|kind| ..), but I don't really have a strong preference here.

r? @matklad?

src/api.rs Outdated Show resolved Hide resolved
@CAD97 CAD97 requested a review from matklad October 3, 2021 23:29
@matklad
Copy link
Member

matklad commented Oct 23, 2021

Nice! Wanted to do this for ages.

In terms of API, I suggesting using _by_kind() suffix, rather than _matching, as that's more specific to what exactly we match.

I also like CAD97's suggestion to makes this look like foo.children().by_kind(|it| it == BLOCK_EXPR).

Not sure how to express the predicate. I can see three options: pass generic impl Fn, pass fn pointer, pass a single kind, pass an array of kinds. I think I'd go with fn pointer -- that's slightly unusual API, but seems to work best for this case. Though, no strong opinion.

Finally, I suggest to start with the smallest API extension we can do here. Looking at the rowan diff, it seems that .by_kind on iterators is all we need here?

@matklad
Copy link
Member

matklad commented Oct 23, 2021

Would be curious to see what's wall-clock time difference on rowan as well? (rust-analyzer analysis-stats path/to/directory)

@theo-lw
Copy link
Contributor Author

theo-lw commented Oct 23, 2021

Running cargo run --release -- analysis-stats /rowan shows some improvement.

Before this change:

Screen Shot 2021-10-23 at 12 25 05 PM

After this change:

Screen Shot 2021-10-23 at 12 24 53 PM

Roughly a 4.6% improvement in the total wall-clock time

@CAD97
Copy link
Collaborator

CAD97 commented Oct 23, 2021

Not sure how to express the predicate. I can see three options: pass generic impl Fn, pass fn pointer, pass a single kind, pass an array of kinds. I think I'd go with fn pointer -- that's slightly unusual API, but seems to work best for this case. Though, no strong opinion.

(Completely overkill but) We could do something like the str.matches(impl Pattern) API and answer "yes" to how to express the predicate

@theo-lw
Copy link
Contributor Author

theo-lw commented Oct 23, 2021

I'm expressing the predicate as a generic impl Fn for now as it seems to be the most flexible. What's more, I've been able to integrate these changes in rust-analyzer without needing to add type parameters everywhere (see this branch).

@theo-lw
Copy link
Contributor Author

theo-lw commented Oct 23, 2021

I see I'm failing tests in rust-analyzer, I will investigate.

Edit: fixed the tests.

Theodore Luo Wang added 3 commits October 23, 2021 18:12
…t (i.e, only if we start iterating through them) to avoid some unnecessary allocations when calling by_kind
@theo-lw
Copy link
Contributor Author

theo-lw commented Oct 24, 2021

Current DHAT output of running the integrated completions benchmark:

On the master branch of my fork without these changes:

Screen Shot 2021-10-23 at 11 18 31 PM

On my branch that includes these changes:

Screen Shot 2021-10-23 at 11 19 05 PM

We can also see that the completions benchmark went from 88.9 billion instructions to 83.7 billion instructions, roughly a 6% decrease.

src/api.rs Outdated
pub fn by_kind<F: Clone + Fn(SyntaxKind) -> bool>(
self,
matcher: F,
) -> SyntaxNodeChildrenMatching<F, L> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> SyntaxNodeChildrenMatching<F, L> {
) -> SyntaxNodeChildrenByKind<F, L> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

src/api.rs Outdated
}

#[derive(Debug, Clone)]
pub struct SyntaxNodeChildrenMatching<F: Clone + Fn(SyntaxKind) -> bool, L: Language> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably still stick with fn(SyntaxKind) -> bool. Fn trait is indeed more flexible, but that comes at the expense of compile times, as generics break separate compilation. With fn, we compile everything when compiling rowan. With Fn, donstream crates would have to carry compilation time burden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any disadvantage to a &dyn Fn vs. an fn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It'll have to carry a lifetime, and would be two pointers wide, while what we actually need in practice is fn. I guess, my primary intuition here is about avoiding premature generalization.

Copy link
Contributor Author

@theo-lw theo-lw Oct 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think support::token is a place where we would have to pass a closure to get by_kinds to work. Because of this, I'm using &dyn Fn right now.

Comment on lines +1306 to +1308
parent: SyntaxNode,
next: Option<SyntaxElement>,
next_initialized: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why do we need to change anything here? I'd expect SyntaxElementChildren to remain exactly as they were

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I changed this so we could avoid creating SyntaxElementChildren::next when calling children().by_kind. This actually makes a pretty significant impact - here's the DHAT output of the integrated completions benchmark without this optimization:

Screen Shot 2021-10-24 at 11 29 39 AM

And here's the DHAT output of the integrated completions benchmark after this optimization:

Screen Shot 2021-10-23 at 11 19 05 PM

98 million blocks allocated (before this optimization) vs 86 million blocks allocated (after this optimization).

Copy link
Member

@Veykril Veykril Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds odd to me that using first_child_or_token_by_kind with these extra changes is so much faster than seeking to the next sibling via next_sibling_or_token_by_kind, as both ways come down to iterating a slice no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my observation is that using first_child_or_token_by_kind allows us to jump to the first child/token node that matches a kind without needing to allocate any other nodes. Contrast this to using first_child_or_token followed by next_sibling_or_token_by_kind (if the first child doesn't match the kind), which can potentially allocate the first child before jumping to the desired node. I think its this allocation avoidance that explains the improvement in the DHAT outputs.

src/api.rs Outdated Show resolved Hide resolved
src/api.rs Outdated Show resolved Hide resolved
src/cursor.rs Outdated Show resolved Hide resolved
Comment on lines +1306 to +1308
parent: SyntaxNode,
next: Option<SyntaxElement>,
next_initialized: bool,
Copy link
Member

@Veykril Veykril Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds odd to me that using first_child_or_token_by_kind with these extra changes is so much faster than seeking to the next sibling via next_sibling_or_token_by_kind, as both ways come down to iterating a slice no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants