Remove Mustermann leaf matcher to improve performance#284
Remove Mustermann leaf matcher to improve performance#284dcr8898 wants to merge 30 commits intohanami:mainfrom
Conversation
|
This commit addresses a second issue identified and fixed by @kyleplump in #279: using a regular expression to split route and path strings is much slower than using a string ("/"). With this commit, operating speed is faster than 2.1, but start up time and memory usage are unchanged from the previous commit. Commit label: 2.2b |
|
This commit is the proof of concept for removing Mustermann. Basic functionality is implemented (see below). Performance is improved on all metrics. Some tests are failing, as expected at this point. But this proof of concept also comes with a serious reality check. DiscussionThe graphs below show the performance benefits of eliminating Mustermann:
The price of these increases, however, is features. This commit implements "segment-sized" dynamic variables, such as However, I now see from most of the currently failing tests that the specs do document more exotic usage patterns, not documented in the guides or README, that are available through the use of Mustermann. This includes, for example, dynamic variables embedded within a segment, such as: I'm not sure if I can duplicate all of these usage patterns using my current approach, but I want to do more thinking about this. This raises the question of what features we want to support? Are we willing to sacrifice some of the flexibility of Mustermann for better performance? These are our current options, as I see it:
I am happy to elaborate on my current approach, if anyone would like to help me think through alternatives. Unless the discussion here pushes me in a different directions, I plan to implement constraints next (via regular expressions), and URI decoding of all variable segments (interestingly, a core feature of Mustermann, but not exercised in the current test suite). Commit label: 2.2c |
|
@kyleplump @cllns @timriley I am tagging you all for a temperature check as to the direction I am going. No rush to respond. I will be traveling for nine days starting Tuesday, so I thought this would be a good time for reflection and conversation. It may also be a good time to broadcast this exploration to the wider community for feedback. I doubt I will do any coding while traveling, but I may be able to comment. It might be possible to add basic constraints before I go, but no promises. Constraints would have no effect on this benchmark, since the benchmark doesn't exercise them. Way Point No. 1Let's see where we stand so far. Here are three graphs comparing:
I will say again that r10k is not a great tool for reflecting real-world usage (at all), but it may be useful for gross comparisons. It is clear from these graphs that Hanami router in any configuration occupies a sweet spot in terms of performance and developer experience: developer experience on par with Rails, and performance approaching that of Roda. This is Luca's achievement and should be recognized and appreciated. 🙇 Rails routing performance isn't close on any measure. This is food for thought. Questions for the community
|
|
This commit adds basic constraints using regular expressions. There is a variation on regular expression constraints available with Mustermann that allows POSIX-like notations. I added this as a feature in the Not Implemented list. Let me know if this should be a stretch goal. At this point, we have essentially matched the feature set described in the Hanami router guide. However, I don't think we have full parity until we have URI decoding in place (a feature of Mustermann). Even though this is not mentioned in the guide, I see it as basic functionality. I will try to implement that next. As of now, only 16 tests are failing (down from 35 in the prior commit). Four of these are unit tests that I need to re-work, remove, or skip. The other 12 are "exotic" applications of Mustermann, like optional segments, multiple dynamic variables in a segment, and escaped characters. Benchmark performance is indistinguishable from 2.2c. Commit label: 2.2d
Next is fixing unit tests and considering skipping remaining failing tests as undocumented behavior that is not representative of current requirements. |
|
Thank you for your work on this, @dcr8898! It is exemplary as always. I'll get to looking at the code soon, but I did want to answer this question of yours:
If I look at the README overview for the Rails flavour of Mustermann, I'm interested in us supporting the features exercised in all three of the example routes: pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil
pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil }
pattern.params("/foo/bar") # => { "example" => "foo", "optional" => "bar" }
pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }
As for get "/guides/:org/:version/:slug", to: "guides.show"
get "/guides/:org/:version/:slug/*path", to: "guides.show"A route of If we can support these three things, then I think we'll be able to express most routes well enough. It also means we'll largely maintain compatibility with the routing support we've offered historically. I'm not sure if this throws a spanner in your works, but I'm keen to hear your response to the above, @dcr8898 😄 |
|
@timriley Thanks for the guidance! It's invaluable! Quick overview of the router's function:
To address your three feature requests out of order:
pattern = Mustermann.new('/:example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => false
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => nil
pattern = Mustermann.new('/*example', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar" }
pattern.params("/foo/bar") # => { "example" => "foo/bar" }
pattern = Mustermann.new('/:example(/:optional)', type: :rails)
pattern === "/foo.bar" # => true
pattern === "/foo/bar" # => true
pattern.params("/foo.bar") # => { "example" => "foo.bar", "optional" => nil }
pattern.params("/foo/bar") # => { "example" => "foo", "optional" => "bar" }TakeawaysMy takeaway, since two of your requests should be working now.
Further thoughtsThe current test suite includes what I've been calling "exotic" usages of Mustermann. For example, multiple dynamic variables in a segment or dynamic variables that are a portion of a segment. I believe that these usage patterns could be compatible with my current approach by . . . using Mustermann for these specific cases. We should be able to detect these "exotic" usages and only use Mustermann for the segments that are affected. I like this approach of matching power to demand. The majority of use cases would be unaffected. This was the primary performance problem of the 2.2 router: invoking the full power of Mustermann for all routes, when it is not needed for the most common use cases. |
|
I'm not sure what you mean by this bit:
The splat doesn't capture the leading / in ( |
|
This commit refactors tests to gain clarity before moving forward. At the start of the spike there were 37 skipped tests. Prior to this commit, 16 tests were failing and 41 were skipped (I skipped a few more unit tests along the way). This commit does the following:
Thankfully, CI failures should now be gone too. 🥳 Benchmark performance is indistinguishable from 2.2d. Commit label: 2.2e Next up is support for optional clauses. |
|
This commit implements optional route clauses. Nested optional clauses and consecutive optional clauses are supported. This is implemented by defining routes for every possible permutation of optional clauses. This concludes the basic features that @timriley requested, although some more polish is probably needed, like automatic URI decoding. This is a good place for another way point and a meaningful conversation about where to go from here, especially as pertains to Mustermann. However, before we do that, I am curious to pursue a few more performance tweaks. Benchmark performance is indistinguishable from 2.2e. Commit label: 2.2f Next up is some performance experiments. |
|
I have exhausted my ideas for performance improvements. I feel that this commit concludes this spike. I will declare a way point below and invite a discussion as to whether this spike should be pursued as an actual PR, and what further changes would be needed. ImprovementsThe changes I made here centered mostly around replacing calls to Next, I converted the "globbed route" and "optional route" detection in Router from a Regexp to a String method. However, this did not seem to have much effect. I find this a little odd, even though these types of routes are not exercised by this benchmark, because these checks are made for every route when compiling the Tries. In fact, it seems to me that the rps benchmarks may have been slightly lower after this change for the 10, 100, and 1000 route cases. Weird. Another change I attempted was to use When using A keen eye will show that RPS and Runtime are slightly improved from 2.2f. Commit label: 2.2g |
cllns
left a comment
There was a problem hiding this comment.
First of all, thank you so much for continuing work on this @dcr8898! This is a huge lift.
I made a couple tiny suggestions, but overall this is good with me, based on your description of the changes. I'm fine with removing support for features that were only documented via specs. If those features breaking is a problem for users, they can revert to an earlier version for now, and we can figure out how to add support for them, or provide a workaround.
I renamed this PR to be more accurate, since we're not actually removing all uses of Mustermann, just one important use of it.
Curious what @kyleplump and @timriley have to say as well!
| [ | ||
| +EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match, | ||
| +EMPTY_STRING << match_data.pre_match << match_data.post_match | ||
| ].each do |new_path| |
There was a problem hiding this comment.
Instead of creating this intermediary array to iterate over it, could we repeat the conditional below for each new_path we create? Less DRY but possibly more performant?
There was a problem hiding this comment.
I will try this. It makes sense.
This point also highlights the difference between writing speed-sensitive library code and writing apps: apps are optimized for maintainability, while the library code is optimized for performance. That often requires abandoning a lot of code hygiene practices that we usually promote (like DRY in this case). Many of those good practices are based on indirection, like using constants to label magic values, but that indirection impacts performance. This was the basis of my question above: How much performance is enough? I'm not sure it's worth in-lining all of the constants in the Router, for example, but others might disagree.
It's also hard to know if these micro-optimizations have an effect when we are using gross measurements like r10k. With that said, I'm curious to try this one and see what happens. 🤓
There was a problem hiding this comment.
Taking a fresh look at this, since it only applies to optional routes, which are uncommon and not exercised by this benchmark at all, I wonder if this change is worthwhile. Thoughts?
There was a problem hiding this comment.
According to Fast Ruby, interpolation should be slightly faster than the shovel method. Do you think this change is worth making?
[
-"#{match_data.pre_match}#{match_data[1]}#{match_data.post_match}",
-"#{match_data.pre_match}#{match_data.post_match}"
].each do |new_path|This is what I was getting at with the "Question for All." I like searching for performance tweaks, and I think a router should be as performant as possible, but I agonize over the loss of readability and maintainability. In this case, I think switching to interpolation is probably okay, because the code remains understandable (I think). But I hesitate to un-DRY the code for what is a rare usage (optional clauses).
What do you all think?
kyleplump
left a comment
There was a problem hiding this comment.
this is honestly an insane amount of work @dcr8898, and all of the commendations are well deserved 🎉 💪
(made a few code level comments)
in terms of philosophy, I agree with a lot of what you brought up. I think about this problem from two points of view and I think you've landed at ideal solutions for both (in my opinion):
-
'Hanami in a bubble': if Hanami was the only framework in the world, I think the important metric to capture with this change is to make the common use case as fast as possible, while providing clear instructions on the 'exotic' cases. The 'common case' is something Tim defined, and you've achieved 'as fast as possible' so big win there ❤️ . for the 'edge / exotic' use cases I agree with what Sean said: we can provide workarounds / documentation on how to accomplish these cases, and even support them using a slower Mustermann based approach. all good on this point, 10/10
-
'The real world': unfortunately Hanami does not exist in a bubble and will be constantly compared to rails, whether that's something we collectively like or not. Since that comparison will be at the forefront of developers (especially newcomers) minds, we should honestly lean into the comparison. your observation here: "... offer a feature set and developer experience that rival that of Rails, while achieving performance comparable to Roda", I think is a huge selling point of this approach / the Hanami router generally. it's easy to get stuck optimizing against yourself, but it's important to remember the broader context in which the framework exists. your work widens the gap with Rails and I think makes this spike super appealing
in summary: 'LGTM 👍'
excellent as always @dcr8898 , I'm very much in favor of the change as long as @timriley and @cllns are
| [ | ||
| +EMPTY_STRING << match_data.pre_match << match_data[1] << match_data.post_match, | ||
| +EMPTY_STRING << match_data.pre_match << match_data.post_match | ||
| ].each do |new_path| |
@cllns The fastest strategy I found was the one you suggested. I called it "SneakyArray." To keep it faithful to our use case, I still used It may be related to the fact that re-implementing return unless path[1..].split("/") do |segment|
node = node.get(segment, param_values)
break false if node.nil?
endThis made the benchmark about 10% slower in the 10K route case. 🤔 |
Correction. If I inline the splitting exactly as above in However, when I do the same in Question for allIs this performance increase (about 2% with 10K routes, less with fewer routes) worth the increased complexity of this change? |
To answer your question in general: we should optimize performance for 10-100 routes, never 10k routes. The reason I had an issue with 10k routes in a previous implementation (what started this saga 😅 ) is that it was causing the startup time to be so high (~10 seconds) that it was essentially un-usable (since it would add so much time to starting the server + for integration tests). It makes sense that performance degrades at the upper limit, in terms of execution time. We need to be careful to not let it explode, but a small increase is fine. Sorry, I didn't realize you were already operating on a string. Since we use frozen string literals, we'll need to get copies instead of mutating the object anyway. Also, based on a quick benchmark, This code below is really hard for me to parse. Is there a way to simplify it? return unless split iterate break false. Could we just iterate over the segments and return if the node is nil? |
|
With the above changes there is a noticeable performance increase--about 5% in the 10K route case. Memory use and start-up time are not noticeably affected. I don't know how far we want to go in pursuing further performance gains. This implementation is already markedly better than 2.1, 2.2, or 2.2b (which was the starting baseline for this exploration). Performance isn't the current goal, per se, but more of a baseline. But it is fun to see how far we can go. 🚀 The Big QuestionsIt seems like this is a viable path. Some "features" that were exercised in tests are not currently implemented, but I am confident most or all of them could be implemented in the future (although some them could impact performance). With that said:
Commit label: 2.2h |
Interesting! I will try it! 🤓 The docs say |
|
Hi @dcr8898, I just want to thank you so much for this fantastic work. Whatever direction we take, this has been a hugely informative investigation that I'm sure we'll refer back to in the future. This coming week I'm in the midst of talk writing (Baltic Ruby keynote coming up). This will take most of my focus. As soon as that's done I'll get onto responding to this PR :) |
Thank you! I agree, this has been an important learning journey, especially since Luca has stepped back and this was totally his creation. I look forward to whatever path we choose! It's all upwards from here for Hanami! 💮
No rush! I need to get back to looking for a job anyway! 🤓 |
Yeah, that's awful. 🙄 I'm going to chalk this up to tunnel-vision refactoring. I changed it to this: def find(path)
node = @root
param_values = []
path[1..].split(SEGMENT_SEPARATOR) do |segment|
node = node.get(segment, param_values)
break if node.nil?
end
node&.match(param_values)&.then { |found| [found.to, found.params] }
endIs that better? Any more indirection will reverse the performance gain I think. |
|
This seems like a good time to pause and wait for @timriley's input. I believe that the results so far show that this is a viable path. It's faster than using Mustermann because this approach shrinks the problem space. Mustermann is a powerful library for string pattern recognition, extraction, and expansion. However, our purpose is limited to path string parsing, not all string parsing. This is a smaller problem space. In this space, we can take advantage of the fact that path strings have a defined structure: segments, separated by I want to reiterate that we remain reliant on Mustermann for both globbed routes and all named route expanders (all or most routes in most apps), neither of which are exercised with this benchmark tool. This means our start-up time concerns remain unless we use an alternative for path string expansion. I think this is doable. The graphs below include 2.1 and 2.2b as reference points for our progress. Commit label: 2.2i |
Comments following the 2.3 releaseI was glad to see @kyleplump's performance fix get merged. In light of that and the 2.3 release, I thought this would be a good time to renew this conversation. This PRI started this spike to investigate ways to speed up the router, both at start up and in production. My main focus was on exploring the removal of the Mustermann dependency. I think the work here shows that Mustermann can be avoided for the most common routing use cases, and suggests that Mustermann can be avoided for all routing with just a little more work. However, routing is only half of the issue. The other half is path "expanding": providing properly constructed paths for link helpers. This is still currently done with Mustermann (Mustermann objects perform both route matching and path expanding). I believe we can gradually replace Mustermann for this purpose as well by creating lighter objects for each use case, starting with the most common ones (the way I approached refactoring routing here). With that said, that's a lot of stuff to put in one PR. Therefore, I think it is probably preferable to close this PR as a successful research spike and set out a roadmap to incorporate these ideas in a series of smaller PRs. Some suggested items for roadmap
Thoughts?@kyleplump @cllns @timriley and everyone else, I would love to hear other thoughts and ideas. Thank you all! |
|
Not sure what the status is, and I don't have a grasp of how the router works at the moment, but I felt like chiming in here, as Mustermann being so complex is my fault. I did initially explore writing a trie-based router as part of Mustermann. The internal AST structure used for compiling patterns is actually pretty ideal for that. At the time, I discarded that idea, as the main reason for Mustermann was to replace the somewhat messy pattern compilation in Sinatra (we kept having edge cases around regex capture greediness that people struggled with – this is something you might want to keep an eye on when building your own solution, if you want to support sub-segment matching). Shipping custom code might also be tricky if you want to support features like the pipe operator, as Rails does. They've shipped broken versions of it in stable releases – I think this is part of the reason no one has touched their route compilation since Rails 5.0. You would probably end up partially reimplementing an AST-based parser (like Journey or Mustermann). Turns out, at least back then, that the potential performance gains from trie-based routing were negligible or even negative for the typical Sinatra application, given that the number of routes is usually in the tens rather than the thousands. I have been revisiting Mustermann and am open to giving trie-based routing another try if you're interested. It would not help at all with the startup/compilation time. To be honest, I'm not really sure what to do about that, other than adding a shortcut compilation for simple patterns. It might be an option, given that Rails patterns are much simpler than others.
FWIW, Mustermann does realize this, too. They have a special AST node. They are also used for atomic grouping in regular expressions to speed up matching (i.e., you don't need to backtrack beyond them). Splitting by them just isn't necessary from a Mustermann perspective, as we're not building a trie. Mustermann is written with path-based matching and expansion as the main use case. That said, if you only ever have placeholders that span exactly one segment, or multiple segments, then Mustermann is probably way too complex. |



























Router Improvement
This will be a long-running, exploratory spike intended to remove the router's dependency on Mustermann and improve performance overall. Router performance is defined by two metrics: start up time and requests-per-second, both measured using the r10k benchmarking tool. I feel that this tool, r10k, is problematic for a number of reasons, but it is readily available and offers a general sense of performance changes as the spike progresses.
I invite any and all comments as this goes along. I am tagging the following contributors to request their input:
@kyleplump @cllns @timriley
Goals
Mustermann is a powerful string matching library that offers many features, most of which our router does not use. In fact, some of the work done by Mustermann is duplicated by the work the current router does in splitting route and path strings into segments in order to utilize trie data structures. Since Mustermann pattern objects are complex and resource intensive to create, eliminating this dependency should improve router startup time, performance in production, and reduced memory usage. I hope. 🙄
I believe that further improvements may be possible on these same metrics through a more general refactor of the router's operation. I would like to explore these ideas as well.
Checklists
Following are a series of checklists to guide progress on this spike. They should be considered "living lists" that will change during the process. Please feel free to edit the lists (or suggest edits) if they are not complete or detailed enough.
Current features
The Hanami Routing guide commits to supporting the following features (checked items have been re-implemented so far without Mustermann):
get "/books(/:id)".Features NOT implemented
These are undocumented--so far as the Hanami guides and the router README are concerned--features of the 2.1 and 2.2 routers that are powered by Mustermann and technically possible, but possibly can't be duplicated with 1-1 parity using my current approach.
"/test\\:variable""/test.:format".character, like"/hey.:greed.html"(captures:greed).Edge cases
I am aware of the following edge cases that aren't specifically covered in the guides, but merit discussion. It would be good to establish explicit rules for them.
/books/:idandbooks/:id/as distinct, matchable routes (because it uses Mustermann to match the complete, original route definition).String#spliton path strings, which ignores the final character if it is the#splitparameter.Stretch goals
I believe these goals are possible with my current approach. I will implement them if there is consensus on their value.
#match?, for example, and receive a string argument).Starting Point
The current 2.2 router introduced a performance regression compared to 2.1 (see #278 ). This commit (2.2a) corrects the core issue causing that regression, as identified and fixed by @kyleplump (see #279) and restores near parity to 2.1's speed, while increasing start up time (primarily due to Mustermann object creation during start up). This is our starting point, as shown in the following graphs.
Benchmark machine: AMD Ryzen 7 laptop with 24GB of RAM and no attempt to restrict background processes (in other words, YMMV).
Commit label: 2.2a
Test status: 544 examples, 0 failures, 37 pending
expand to see graphs