Inline code for function calls in interpreter #5330

Merged: 20 commits into trunk from cp/inline-func-calls on Sep 10, 2024
Conversation

@ChrisPenner (Contributor) commented Sep 5, 2024

Overview

Previously, function calls/applications stored only references to the code they jump to, even though the code each reference refers to is already known at the time we translate to MCode.

This change statically resolves each reference to the code it refers to at code generation time, avoiding lookups at runtime.

Implementation notes

  • Parameterizes most of the MCode types over their comb type so that we can build the computations using CombIx, then "tie the knot" and resolve to Combs via a fixed point (see the sketch below).
  • Adds code to do the resolution in the appropriate spots (code-gen time and deserialization time).
  • Also downgrades parser-typechecker to -O1 at Dan's suggestion; -O2 is pretty intense (and slow) and can probably be sequestered off to just the runtime package now that it's split off, to improve build times. (Reverted at Paul's request.)
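
Roughly what the "tie the knot" step above amounts to, as a minimal sketch: the type names (CombIx, Comb, RComb, resolveCombs) mirror the PR, but the shapes are simplified and this is illustrative rather than the PR's actual implementation.

```haskell
module KnotSketch where

import qualified Data.Map as M

-- Simplified stand-ins for the real MCode types.
newtype CombIx = CombIx Int
  deriving (Eq, Ord, Show)

-- A combinator parameterized over how calls to other combinators are
-- represented: CombIx while generating code, direct pointers afterwards.
data Comb call = Comb
  { combIx :: CombIx -- keep the symbolic index around
  , body   :: [call] -- the code locations this combinator jumps to
  }

-- Resolved form: call sites point straight at other resolved combinators.
newtype RComb = RComb (Comb RComb)

-- "Tie the knot": replace every CombIx with the combinator it names.
-- The result map is defined in terms of itself; Data.Map's lazy values
-- mean each call site holds a thunk that is only forced when followed,
-- so no lookup by reference is needed at runtime.
resolveCombs :: M.Map CombIx (Comb CombIx) -> M.Map CombIx RComb
resolveCombs unresolved = resolved
  where
    resolved =
      M.map
        (\(Comb ix calls) -> RComb (Comb ix (map (resolved M.!) calls)))
        unresolved
```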

Interesting/controversial decisions

Test coverage

  • Existing tests should cover it.

Benchmarks:

@pchiusano/misc-benchmarks/chris:.suite

| Benchmark | this-branch | trunk |
| --- | --- | --- |
| Decode Nat | 748ns | 947ns |
| Generate 100 random numbers | 470.88µs | 623.831µs |
| Count to 1 million | 449.3574ms | 574.758ms |
| Json parsing (per document) | 377.202µs | 461.911µs |
| Count to N (per element) | 533ns | 690ns |
| Count to 1000 | 547.879µs | 691.485µs |
| Mutate a Ref 1000 times | 1.011772ms | 1.288178ms |
| CAS an IO.ref 1000 times | 1.259371ms | 1.589608ms |
| List.range (per element) | 651ns | 875ns |
| List.range 0 1000 | 666.46µs | 879.3µs |
| Set.fromList (range 0 1000) | 3.05485ms | 3.849484ms |
| Map.fromList (range 0 1000) | 2.314902ms | 2.987285ms |
| Map.lookup (1k element map) | 5.877µs | 7.849µs |
| Map.insert (1k element map) | 15.034µs | 19.599µs |
| List.at (1k element list) | 580ns | 768ns |
| Text.split / | 40.449µs | 52.997µs |

@mitchellwrosen/mbta/@mitchellwrosen/branch-for-chris:.runTheBenchmark

| Step | this-branch | trunk |
| --- | --- | --- |
| read bytes | 0.201ms | 0.188ms |
| utf8 decode bytes | 0.165ms | 0.166ms |
| json decode text | 387.17ms | 490.019ms |
| jsonapi parse json | 126.466ms | 170.002ms |
| mbta parse jsonapi | 139.128ms | 182.414ms |

Loose ends

Nope.

@pchiusano (Member)

Looking pretty good so far! Though less dramatic than I hoped. I'll be curious to see it with the same treatment applied to foreign function calls (not sure if you're planning a separate PR for that); I'm guessing that will make the JSON decoding and other benchmarks that use a lot of foreign calls go faster.

Note that trunk is going to be worse the more code you have loaded, since the IntMap will have greater depth. A best-case scenario for trunk is running benchmarks after freshly starting up UCM (or you can use the compile command with ucm run.compiled).

But this isn't terribly realistic since (for instance) Unison Cloud nodes are kept running and are loading new code all the time. And most users aren't bouncing UCM willy-nilly.

```haskell
| Just n <- M.lookup exceptionRef rfTy,
-- TODO: Should I special-case this raise ref and pass it down from the top rather than always looking it up?
```
@ChrisPenner (author):

cc @dolio What do you think about this particular case?

@dolio (Contributor):

I doubt special casing this will make much difference anywhere. This is essentially just the top level handler for exceptions, which will kill the interpreter with an error message when it's invoked.

@ChrisPenner marked this pull request as ready for review on September 6, 2024 at 18:01.
@pchiusano (Member) left a comment:

Nice progress, I'll let @dolio weigh in. I'm assuming you're going to do the analogous ForeignCall optimization as a separate PR?

One note: I think releases should still build everything with -O2 though, including parser-typechecker. We aren't really sure how much it matters there, but we also don't have a good way of testing it, so I'm inclined to leave it alone.

If you want to switch up the release process to support building with different flags than the default then that's cool, but I don't think it does that currently. Or you could do this in a separate PR.

@dolio (Contributor) commented Sep 6, 2024

Reading through some of this, something occurred to me. Is there even a reason for us to number the term references at this point?

I guess it's a little more economical to make numbers for the references and store the latter once for e.g. compiled data. But for actual code functionality, we just make up numbers for the references, then get rid of the numbers by making things circular, right?

```haskell
resolveCombs mayExisting combs =
  -- Fixed point lookup;
  -- We make sure not to force resolved Combs or we'll loop forever.
  let ~resolved =
```
@dolio (Contributor):

This ~ doesn't actually do anything. let is irrefutable by default unless perhaps the Strict extension is enabled, which I wouldn't recommend, because what it actually does can be rather confusing.
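
Not from the PR, just a tiny illustration of that default laziness:

```haskell
module LazyLetDemo where

-- Without the Strict extension, a let binding is lazy (irrefutable),
-- so the error below is never forced and this prints "ok".
-- Under {-# LANGUAGE Strict #-} the binding *would* be forced here,
-- and writing ~_resolved is what would restore the laziness.
main :: IO ()
main = do
  let _resolved = error "never forced" :: Int
  putStrLn "ok"
```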

@ChrisPenner (author):

I can remove it if you like; I'm mostly using it as a signpost, since Haskell doesn't have explicit let-recs 😢

@dolio (Contributor) left a comment:

Okay, I've read through this. I don't see any major issues. It looks about like what I was expecting.

I left a couple comments about minor things, but that's it.

@ChrisPenner (author)

And yes, I'll do the Foreign Funcs as a separate PR so we can track each performance change individually.

@ChrisPenner (author) commented Sep 9, 2024

> @dolio: Is there even a reason for us to number the term references at this point?
>
> I guess it's a little more economical to make numbers for the references and store the latter once for e.g. compiled data. But for actual code functionality, we just make up numbers for the references, then get rid of the numbers by making things circular, right?

Yeah, I think you're right, though tbh it's probably more work to remove them than it's worth at this point; I don't think they're doing any harm 🤷🏼‍♂️

@dolio (Contributor) commented Sep 9, 2024

Yeah, I wouldn't try to remove the numbering I guess.

Actually, the CI failures look suspicious. The mac one reports a stack overflow on the codeops test, so something there might be doing something bad with a circular representation.

@ChrisPenner (author) commented Sep 9, 2024

There's an infinite loop when testing equality on recursive functions; I'm guessing we're naively crawling through and checking equality on all contained sections. We probably just need to lower back to CombIxs when checking equality. Don't worry, we won't merge until CI is passing 😄

Failure case:

```unison
testcase = do
  f n = if n == 0 then 0 else f (Nat.drop 1 n)
  f == f
```

@dolio (Contributor) commented Sep 9, 2024

Oh, right.

Actually, you should just not derive Eq and Ord for RComb. It should test equality via the stored CombIx, no?

For that matter, I guess the Show instance for RComb could be more informative using the CombIx, too, right?
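
A minimal sketch of what that could look like (the field names here are made up; the real RComb carries the resolved code, which is circular and must not drive Eq/Ord/Show):

```haskell
module RCombSketch where

newtype CombIx = CombIx Int
  deriving (Eq, Ord, Show)

-- Illustrative shape only: the symbolic index plus the resolved,
-- possibly circular, call targets.
data RComb = RComb
  { rCombIx   :: CombIx
  , rCombCode :: [RComb]
  }

-- Delegate everything to the stored CombIx so recursive combinators
-- don't send equality, ordering, or printing into an infinite loop.
instance Eq RComb where
  a == b = rCombIx a == rCombIx b

instance Ord RComb where
  compare a b = compare (rCombIx a) (rCombIx b)

instance Show RComb where
  show r = "RComb " ++ show (rCombIx r)
```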

@ChrisPenner (author)

Haha, yeah I had the exact same ideas 😄

@ChrisPenner (author)

We've passed the nimbus test-suite now as well ✅

@ChrisPenner merged commit 814f968 into trunk on Sep 10, 2024 (32 checks passed) and deleted the cp/inline-func-calls branch on September 10, 2024 at 19:10.