GuillaumeGomez
Member

I was very bothered by the complexity of this file, in particular the handling of pending_elems which was very tricky to follow.

So instead, here comes a saner approach: the content is stored in a stack-like type which handles "levels" of HTML (i.e., a macro expansion can contain other HTML tags, which can themselves contain others, etc.). This makes it much simpler to keep track of what's going on.
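
To illustrate the idea of a stack of HTML "levels", here is a minimal sketch. It is not the code from this PR: `Level`, `HtmlStack` and all method names are hypothetical, and the real type tracks more state. Each open tag gets its own level, and closing a level folds it into its parent.

```rust
/// One open HTML "level": its tags plus the content collected inside it.
struct Level {
    open: String,
    close: String,
    content: String,
}

/// Hypothetical stack of nested levels; index 0 is the root output.
struct HtmlStack {
    levels: Vec<Level>,
}

impl HtmlStack {
    fn new() -> Self {
        let root = Level { open: String::new(), close: String::new(), content: String::new() };
        Self { levels: vec![root] }
    }

    /// Opens a nested level, e.g. when a macro expansion or a link starts.
    fn enter(&mut self, open: &str, close: &str) {
        self.levels.push(Level {
            open: open.to_owned(),
            close: close.to_owned(),
            content: String::new(),
        });
    }

    /// Appends text to the innermost open level.
    fn push_text(&mut self, text: &str) {
        // `levels` always holds at least the root, so `last_mut` cannot fail.
        self.levels.last_mut().unwrap().content.push_str(text);
    }

    /// Closes the innermost level and folds it, wrapped in its tags, into its parent.
    fn exit(&mut self) {
        if self.levels.len() > 1 {
            let done = self.levels.pop().unwrap();
            let parent = self.levels.last_mut().unwrap();
            parent.content.push_str(&done.open);
            parent.content.push_str(&done.content);
            parent.content.push_str(&done.close);
        }
    }

    /// Closes any levels still open and returns the root content.
    fn finish(mut self) -> String {
        while self.levels.len() > 1 {
            self.exit();
        }
        self.levels.pop().unwrap().content
    }
}
```

Because every level owns its content until it is closed, nothing is emitted half-open, at the cost of holding intermediate strings in memory.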

r? @lolbinarycat

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Sep 24, 2025
@GuillaumeGomez
Member Author

Also need to check the impact on performance (likely slower).

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Sep 24, 2025
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 24, 2025
@rust-bors

rust-bors bot commented Sep 24, 2025

☀️ Try build successful (CI)
Build commit: 6020c97 (6020c97e3046a35e53fd9885eda164c570010a6c, parent: 15283f6fe95e5b604273d13a428bab5fc0788f5a)

let mut closing_tag = None;
for part in &self.content {
    let text: &dyn Display =
        if part.needs_escape { &EscapeBodyText(&part.text) } else { &part.text };
Contributor

FYI, Either impls Display, which can be nicer than a dyn ref (and maybe slightly more performant)
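
A minimal sketch of that suggestion follows. To keep it dependency-free, it defines a local `Either` that mirrors the `Display` impl of the `either` crate's type; `Part`, `render` and the simplified `EscapeBodyText` are stand-ins assumed for illustration, not rustdoc's actual definitions.

```rust
use std::fmt::{self, Display};

// Local stand-in for `either::Either`, so the sketch needs no external crate.
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L: Display, R: Display> Display for Either<L, R> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Either::Left(l) => l.fmt(f),
            Either::Right(r) => r.fmt(f),
        }
    }
}

// Simplified, hypothetical escaper standing in for rustdoc's `EscapeBodyText`.
struct EscapeBodyText<'a>(&'a str);

impl Display for EscapeBodyText<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for c in self.0.chars() {
            match c {
                '<' => f.write_str("&lt;")?,
                '>' => f.write_str("&gt;")?,
                '&' => f.write_str("&amp;")?,
                _ => write!(f, "{c}")?,
            }
        }
        Ok(())
    }
}

// Hypothetical shape of one content part.
struct Part {
    text: String,
    needs_escape: bool,
}

fn render(part: &Part) -> String {
    // Both branches have the same concrete type, so no `&dyn Display` is needed.
    let text = if part.needs_escape {
        Either::Left(EscapeBodyText(&part.text))
    } else {
        Either::Right(&part.text)
    };
    text.to_string()
}
```

The enum is dispatched with a plain `match` instead of a vtable call, which is where the possible small performance gain would come from.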

Member Author

Good idea, thanks!

Comment on lines 256 to 240
for part in elem.content.drain(..) {
    last.content.push(part);
}
Contributor

Suggested change
- for part in elem.content.drain(..) {
-     last.content.push(part);
- }
+ last.content.append(&mut elem.content);

Both shorter and might also be slightly more performant (can probably pre-reserve just enough capacity in target vector)
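
As a quick check that the two forms are equivalent, here is a small sketch. The `Element` struct is a hypothetical stand-in that only keeps the `content` field named in the snippet.

```rust
// Hypothetical stand-in for the PR's element type.
struct Element {
    content: Vec<String>,
}

// Original form: move the parts over one by one.
fn merge_with_loop(last: &mut Element, elem: &mut Element) {
    for part in elem.content.drain(..) {
        last.content.push(part);
    }
}

// Suggested form: `Vec::append` moves everything in one call and
// leaves `elem.content` empty, just like the `drain(..)` loop.
fn merge_with_append(last: &mut Element, elem: &mut Element) {
    last.content.append(&mut elem.content);
}
```

Both functions leave `elem.content` empty and `last.content` holding all parts in their original order.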

Comment on lines 282 to 266
for elem in elements {
    self.elements.push(elem);
}
Contributor

Suggested change
- for elem in elements {
-     self.elements.push(elem);
- }
+ self.elements.extend(elements);

Same deal as https://github.com/rust-lang/rust/pull/146992/files#r2376863766
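
For reference, a minimal sketch of the `extend` form. The `Elements` wrapper and `push_all` are hypothetical names; only the `elements` field comes from the snippet above.

```rust
// Hypothetical wrapper holding the `elements` vector from the snippet.
struct Elements {
    elements: Vec<String>,
}

impl Elements {
    // `Vec::extend` accepts any `IntoIterator` and can use the iterator's
    // size hint to reserve capacity before moving the items in.
    fn push_all(&mut self, elements: Vec<String>) {
        self.elements.extend(elements);
    }
}
```

Unlike `append`, `extend` consumes its argument, so it fits when the source collection is owned and no longer needed afterwards.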

@yotamofek
Contributor

Code reads much better IMHO!

@rust-timer
Collaborator

Finished benchmarking commit (6020c97): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 2.2% | [0.7%, 6.6%] | 19 |
| Regressions ❌ (secondary) | 3.2% | [0.1%, 13.3%] | 17 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.2% | [0.7%, 6.6%] | 19 |

Max RSS (memory usage)

Results (primary 0.4%, secondary 12.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 6.2% | [6.2%, 6.2%] | 1 |
| Regressions ❌ (secondary) | 12.0% | [2.6%, 31.9%] | 6 |
| Improvements ✅ (primary) | -2.5% | [-2.8%, -2.2%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.4% | [-2.8%, 6.2%] | 3 |

Cycles

Results (primary 2.8%, secondary 5.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 4.3% | [3.2%, 6.5%] | 3 |
| Regressions ❌ (secondary) | 8.3% | [3.6%, 12.8%] | 6 |
| Improvements ✅ (primary) | -1.6% | [-1.6%, -1.6%] | 1 |
| Improvements ✅ (secondary) | -1.5% | [-1.5%, -1.4%] | 2 |
| All ❌✅ (primary) | 2.8% | [-1.6%, 6.5%] | 4 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.213s -> 470.794s (-0.09%)
Artifact size: 387.83 MiB -> 387.93 MiB (0.03%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 24, 2025
@GuillaumeGomez
Member Author

> Code reads much better IMHO!

Agreed (unsurprisingly 😆), but sadly I think this solution is unlikely to get much better performance-wise, so it's unlikely to be merged.

However, I now have much cleaner code, so I think I'll go back to the original "streaming content" design, but with a much cleaner approach.

@lolbinarycat
Contributor

If I had to guess the reason for the perf regression, I would say it probably has to do with all the extra intermediate string allocations. I feel like if we had a way to delay formatting (maybe using an enum, or with dyn Display, or just make it so the final writer is given up front), this would have a lot less overhead.
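
One way to picture the "delay formatting" idea is the sketch below, which combines the enum option with a writer that is handed in up front. It is an assumption-laden illustration, not code from this PR: `Part`, its variants and `render` are hypothetical, and the escaping is simplified.

```rust
use std::fmt::{self, Display, Write};

// Hypothetical deferred part: it borrows the source text and only
// produces output when it is finally written.
enum Part<'a> {
    Raw(&'a str),
    Escaped(&'a str),
}

impl Display for Part<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Part::Raw(s) => f.write_str(s),
            Part::Escaped(s) => {
                // Simplified escaping, done while writing instead of
                // building an intermediate `String` first.
                for c in s.chars() {
                    match c {
                        '<' => f.write_str("&lt;")?,
                        '>' => f.write_str("&gt;")?,
                        '&' => f.write_str("&amp;")?,
                        _ => f.write_char(c)?,
                    }
                }
                Ok(())
            }
        }
    }
}

// The final writer is passed in up front, so parts go straight into it.
fn render<W: Write>(out: &mut W, parts: &[Part<'_>]) -> fmt::Result {
    for part in parts {
        write!(out, "{part}")?;
    }
    Ok(())
}
```

With this shape, the only allocation is the output buffer itself; whether that accounts for the measured regression is exactly what the perf runs would have to show.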

@GuillaumeGomez
Member Author

Possibly. Want to try pushing it even further before I try to turn this back into a streaming algorithm? Same question for you @yotamofek. 😉

Start from my branch and open PRs with your commits so we can run perf checks on them.

@yotamofek
Contributor

I'll give it a go, but my gut says the extra string allocations aren't causing the lion's share of the regressions. Worth a shot, though.

@bors
Collaborator

bors commented Sep 26, 2025

☔ The latest upstream changes (presumably #147037) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Collaborator

bors commented Oct 6, 2025

☔ The latest upstream changes (presumably #147397) made this pull request unmergeable. Please resolve the merge conflicts.

@rustbot
Collaborator

rustbot commented Oct 7, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@GuillaumeGomez
Member Author

So I realized that there are some bugs in the existing code which are fixed by this PR:

image

In the example above, only self should be underlined, not the whitespace. Anyway, looking a bit more into this performance issue.

@GuillaumeGomez
Member Author

@bors2 try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Oct 7, 2025
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 7, 2025
@rust-bors

rust-bors bot commented Oct 7, 2025

☀️ Try build successful (CI)
Build commit: 1629a3c (1629a3c85e350f2f8dfc39a15d3e40c3b895d4fc, parent: 4a54b26d30dac43778afb0e503524b763fce0eee)

@rust-timer
Collaborator

Finished benchmarking commit (1629a3c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.5% | [0.4%, 4.4%] | 19 |
| Regressions ❌ (secondary) | 3.0% | [0.2%, 9.3%] | 12 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.5%] | 3 |
| All ❌✅ (primary) | 1.5% | [0.4%, 4.4%] | 19 |

Max RSS (memory usage)

Results (primary 4.8%, secondary 13.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 4.8% | [4.8%, 4.8%] | 1 |
| Regressions ❌ (secondary) | 13.1% | [4.1%, 32.0%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.8% | [4.8%, 4.8%] | 1 |

Cycles

Results (primary 3.6%, secondary 4.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 3.6% | [3.2%, 4.0%] | 2 |
| Regressions ❌ (secondary) | 5.1% | [1.7%, 9.3%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.6% | [-2.6%, -2.6%] | 1 |
| All ❌✅ (primary) | 3.6% | [3.2%, 4.0%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 473.216s -> 474.103s (0.19%)
Artifact size: 388.39 MiB -> 388.39 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 7, 2025
@GuillaumeGomez
Member Author

A bit better. I have some ideas to reduce the memory usage as well.

@GuillaumeGomez
Member Author

Now let's see if flushing more often makes it better for memory and CPU.

@bors try @rust-timer queue
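
The thread does not spell out what "flushing more often" means in the implementation. One possible reading, sketched here purely as an assumption, is to hand buffered content to the final writer as soon as no nested level is open, instead of holding everything until the end. `Buffered` and its methods are hypothetical names.

```rust
use std::fmt::{self, Write};

// Hypothetical buffer that holds content only while a tag is open.
struct Buffered<W: Write> {
    out: W,
    pending: String,
    depth: usize,
}

impl<W: Write> Buffered<W> {
    fn new(out: W) -> Self {
        Self { out, pending: String::new(), depth: 0 }
    }

    /// Opening a tag starts (or deepens) a region that must be kept together.
    fn open(&mut self, tag: &str) {
        self.depth += 1;
        self.pending.push_str(tag);
    }

    fn text(&mut self, text: &str) -> fmt::Result {
        self.pending.push_str(text);
        self.flush_if_idle()
    }

    fn close(&mut self, tag: &str) -> fmt::Result {
        self.pending.push_str(tag);
        self.depth = self.depth.saturating_sub(1);
        self.flush_if_idle()
    }

    /// Writes out the pending content once no level is open anymore,
    /// so the buffer never grows beyond one top-level element.
    fn flush_if_idle(&mut self) -> fmt::Result {
        if self.depth == 0 && !self.pending.is_empty() {
            self.out.write_str(&self.pending)?;
            self.pending.clear();
        }
        Ok(())
    }
}
```

`String::clear` keeps the buffer's capacity, so repeated flushes reuse the same allocation.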

rust-bors bot added a commit that referenced this pull request Oct 8, 2025
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 8, 2025
@rust-bors

rust-bors bot commented Oct 8, 2025

☀️ Try build successful (CI)
Build commit: 5ce920c (5ce920c89699da2a07c1b7692830149792b9b657, parent: 5767910cbcc9d199bf261a468574d45aa3857599)

@rust-timer
Collaborator

Finished benchmarking commit (5ce920c): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.4% | [0.4%, 4.1%] | 19 |
| Regressions ❌ (secondary) | 2.3% | [0.2%, 8.4%] | 15 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 2 |
| All ❌✅ (primary) | 1.4% | [0.4%, 4.1%] | 19 |

Max RSS (memory usage)

Results (primary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 2.4% | [2.4%, 2.4%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.4% | [2.4%, 2.4%] | 1 |

Cycles

Results (primary 2.4%, secondary 3.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 2.4% | [2.4%, 2.4%] | 1 |
| Regressions ❌ (secondary) | 3.1% | [1.5%, 4.3%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.4% | [2.4%, 2.4%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 473.657s -> 473.163s (-0.10%)
Artifact size: 388.42 MiB -> 388.42 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 8, 2025
@GuillaumeGomez
Member Author

Memory usage dropped by a lot (still 2.4% more) and performance loss is now around 8.4%. Much better. :)

@yotamofek
Contributor

> Memory usage dropped by a lot (still 2.4% more) and performance loss is now around 8.4%. Much better. :)

Nice!
Perf regression is getting very close to being worth the extra code clarity. I'll see if I have any other optimization ideas next week.

@krtab
Contributor

krtab commented Oct 9, 2025

I think this helps restore correct performance: krtab@2cffe1f

Basically, we are creating a whole lot of elements, each with a one-element vec, before merging them together, and all these allocations were previously discarded; this reuses one such vec as much as possible.
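
The general pattern described here can be sketched as follows. This is not the code from krtab@2cffe1f; `collect_fresh` and `collect_reused` are hypothetical functions that only contrast allocating a one-element vec per item with reusing a single scratch vec.

```rust
// Before: every token gets its own freshly allocated one-element vec,
// which is thrown away right after being merged.
fn collect_fresh(tokens: &[&str]) -> Vec<String> {
    let mut merged = Vec::new();
    for token in tokens {
        let mut single = vec![token.to_string()];
        merged.append(&mut single);
    }
    merged
}

// After: one scratch vec is reused. `Vec::append` empties it but keeps
// its capacity, so later pushes do not need a new allocation.
fn collect_reused(tokens: &[&str]) -> Vec<String> {
    let mut merged = Vec::new();
    let mut scratch: Vec<String> = Vec::with_capacity(1);
    for token in tokens {
        scratch.push(token.to_string());
        merged.append(&mut scratch);
    }
    merged
}
```

Both functions return the same result; the second avoids one small heap allocation and deallocation per token for the temporary vec.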

Labels

perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.

7 participants