Cache Pure Top Level Definitions on startup #5379
Conversation
@@ -4,14 +4,13 @@
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ViewPatterns #-}
-- TODO: Fix up all the uni-patterns
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
I replaced the uni-patterns with explicit errors so I could re-enable this.
combs :: EnumMap Word64 RCombs
combs =
  ( mapWithKey
srcCombs :: EnumMap Word64 Combs
I now include all the original non-recursive Combs in the CCache because it makes it much easier to serialize.
🎉
So, I noticed one thing that seems weird to me. Having a cached closure in a PAp seems wrong, so I think we need to tweak something. I'm not sure what yet, though. I'll think about it and look things over some more.
(useg, bseg) <- closeArgs C ustk bstk useg bseg args
ustk <- discardFrame =<< frameArgs ustk
bstk <- discardFrame =<< frameArgs bstk
apply !env !denv !activeThreads !ustk !bstk !k !ck !args = \case
I think it doesn't really make much sense to have a partial application to a CachedClosure, similar to the Let situation. I haven't finished reading everything, but maybe there needs to be a type that is actually just combinator references, not both those and pre-evaluated closures.
Okay, so, maybe it's as simple as this... Occurrences of the new, separate type don't need to be lazy, because they aren't from knot tying; they're the concrete entry info for an actual combinator.
A PAp should only contain an actual combinator, not a cached value. So, the combinator case has been factored out and unpacked into GComb. This way a PAp can refer to the factored-out part.
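The factoring described here can be pictured with a small, purely illustrative sketch. None of these names are Unison's actual definitions; the point is only that once combinator entry info gets its own type, a partial application can refer to a real combinator and nothing else:

```haskell
-- Illustrative stand-ins, not the PR's actual GComb/PAp types.

-- Entry info for an actual combinator: its arity plus a body stand-in.
data CombEntry = CombEntry {entryArity :: Int, entryBody :: [Int] -> Int}

-- The cache-level type: a real combinator, or an already-evaluated value.
data GComb
  = Comb CombEntry -- can be the target of a partial application
  | Cached Int -- pre-evaluated pure result; never needs application

-- A PAp holds a CombEntry directly, so "partial application of a
-- cached value" is unrepresentable by construction.
data PAp = PAp CombEntry [Int]

applyPAp :: PAp -> Int -> Int
applyPAp (PAp c args) x
  | length args + 1 == entryArity c = entryBody c (args ++ [x])
  | otherwise = error "sketch: still unsaturated"
```

The design choice mirrored here is "make illegal states unrepresentable": rather than checking at runtime that a PAp never wraps a cached closure, the type simply offers no way to build one.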
@ChrisPenner I'm curious about when this kicks in and what "on startup" means. Could you help me understand? Some of my questions:
Here's one case that I'm wondering about: Historically it hasn't been too bad to have a (pure) test in a project that checks 100k samples and takes 10 seconds to evaluate, because the result gets cached. But are we going to pay that 10 second penalty just to cache
I should have mentioned in my last comment that I'm excited about this change @ChrisPenner! I've seen repeated evaluation bite newcomers, and I've been curious to what extent it might be hitting us with certain
This allows code sent between machines to pre-evaluate things if applicable. Naturally, this requires serialization version changes. Some tweaks have been made to avoid changing hashes as much as possible between versions. Old serialized values and code can still be loaded, but obviously they will be treated as completely uncacheable.
It will run and cache each pure top-level definition as part of adding it to the runtime's code-cache. We only load the dependencies of whatever we're trying to run, so it'll evaluate dependencies of a
See above, for
It only pre-evaluates the dependencies of things you try to
This is only an issue if you reference the test in your
We may want to look into optimizations for docs and termLinks where we don't necessarily depend on the actual result of a referenced definition 🤔
One downside of this change is if you have code like this:
Previously
So this change will be good for longer-term processes like cloud and webservers, but potentially worse for short-lived processes like cli apps. If the problem proves noticeable enough, we can probably fix this delay by continuing the Closure serialization work I mention in the overview, which I think we still want to do; the work was just starting to hit diminishing returns for how tricky it was when we have bigger possible gains with other work 😄.
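The trade-off being discussed — paying for every cacheable definition at startup versus only for the ones a short-lived process actually demands — can be sketched like this. All names here (`Def`, `mkDef`, `demo`) are hypothetical, not the runtime's API; the `IORef` counters just record how many definition bodies actually get evaluated under each policy:

```haskell
import Data.IORef

-- A "definition" whose body records that it was actually evaluated.
data Def = Def {defName :: String, defBody :: IO Int}

mkDef :: IORef Int -> String -> Int -> Def
mkDef counter name result =
  Def name (modifyIORef' counter (+ 1) >> pure result)

demo :: IO (Int, Int)
demo = do
  eager <- newIORef 0
  lazy <- newIORef 0
  -- Eager startup (this change): every cacheable def is forced, used or not.
  let eagerDefs = [mkDef eager "big" 1000, mkDef eager "unused" 42]
  _cache <- mapM defBody eagerDefs
  -- On-demand (closer to the old behavior): only the demanded def runs.
  let lazyDefs = [mkDef lazy "big" 1000, mkDef lazy "unused" 42]
  _ <- defBody (head lazyDefs)
  (,) <$> readIORef eager <*> readIORef lazy
```

A long-lived server amortizes the eager column over many requests; a one-shot CLI invocation pays for `"unused"` and exits, which is exactly the short-lived-process downside described above.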
I noticed this broke the serialized-form round-trip golden tests. Which might be fine, since I'd expect this could change the serialized form for some definitions, but it's worth checking. @SystemFw do we have Unison Cloud client tests to make sure that no hashes change for schema types and functions? The way these could work is to just
We now have a test in the Cloud client to test for the relevant hash changes. It passes on this branch.
It was meant to be a test in a `match` expression, but was missing a #:when
fwiw, contrary to the majority opinion in the meeting, I don't think this would be a good idea. afaik other languages don't do it, apart maybe from some trivial constant folding — if someone really wants to precompute a pure result, they can always use |
@@ -14,8 +14,8 @@ on:
env:
  ## Some version numbers that are used during CI
  ormolu_version: 0.7.2.0
  jit_version: "@unison/internal/releases/0.0.20"
  runtime_tests_version: "@unison/runtime-tests/releases/0.0.1"
  jit_version: "@unison/internal/releases/0.0.21"
just curious why did the jit version change?
I implemented the new serialization format for the jit. This means it now has access to the cacheability information as well.
I was going to implement the top-level value behavior in the jit, too, but after some struggling, I realized that my stronglyConnectedComponents implementation does not actually output a topological sorting of the SCCs, which I need to output the racket definitions in an acceptable order. So I'll probably just merge this as is.
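For comparison, on the Haskell side the containers library's `Data.Graph.stronglyConnComp` does guarantee the ordering needed here: it returns SCCs in reverse topological order, so every component appears after the components it depends on, and definitions can be emitted in list order. The dependency graph below is made up for illustration:

```haskell
import Data.Graph (flattenSCC, stronglyConnComp)

-- A made-up dependency graph: (node, key, keys this node depends on).
deps :: [(String, String, [String])]
deps =
  [ ("main", "main", ["helper", "a"]) -- main depends on helper and a
  , ("helper", "helper", [])
  , ("a", "a", ["b"]) -- a and b are mutually recursive: one SCC
  , ("b", "b", ["a"])
  ]

-- Reverse topological order: each SCC comes after its dependencies,
-- so emitting definitions in this order is always safe.
emitOrder :: [[String]]
emitOrder = map flattenSCC (stronglyConnComp deps)
```

Here `emitOrder` has three components, with the `{a, b}` cycle kept together as a single SCC and `"main"` guaranteed to come last, since it depends on everything else.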
I guess I wouldn't have expected it to change in an interpreter-related PR, but is it just a compatibility thing? That the interpreter was updated to match the new serialization format?
The interpreter, jit and ucm use a common format for code interchange, so when that changes (to include the cacheability information), the jit needs to be updated to understand it. Otherwise ucm could send it stuff that would just make it bomb. For instance, the format is used for run.native and compile.native to tell the executable what code to run/compile.
Pretty sure it wouldn't even get past CI if I didn't update it in this case, because the version was incremented.
@SystemFw cool, I'm good with merging this then, whenever @dolio and/or @ChrisPenner are ready.
Overview
Before this change, all pure top-level definitions were evaluated at every call site, which is obviously unnecessary.
Now we can detect these using type info from the codebase, and inline the evaluated result into usage sites.
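One way to picture the "detect using type info" step is a toy predicate over a made-up type representation (Unison's real types are much richer, and this deliberately rejects conservatively anything containing a function or effect):

```haskell
-- Toy type representation; illustrative only.
data Type
  = Arrow Type Type -- a function: needs arguments, so don't pre-evaluate
  | Effect Type -- requires abilities/effects: not a pure value
  | App Type Type -- type application, e.g. Map Nat Text
  | Base String -- e.g. Nat, Text
  deriving (Show)

-- A top-level definition counts as "cacheable" here when its type
-- contains no arrows or effects anywhere: a plain, fully-evaluable value.
-- (This conservatively rejects values that merely contain functions.)
isCacheable :: Type -> Bool
isCacheable ty = case ty of
  Arrow _ _ -> False
  Effect _ -> False
  App f x -> isCacheable f && isCacheable x
  Base _ -> True
```

Under this sketch a definition of type `Map Nat Text` is cacheable, while `Nat -> Nat` is not, since evaluating it would require arguments.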
We also discussed whether to bake the evaluated results into the compiled artifact.
The consensus was yes, to serialize the results; I'd still like to do that, but after a day of fiddling it proved tougher than I expected.
Issues with that were:
- Foreigns, which are arbitrary Haskell values. Most of these won't appear in a pure value, but some things, like stdin/stdout Handles, do, patterns as well, and things like text/bytes literals. These are all serializable in their own way, but it takes some care to ensure we do it correctly.
- Seg 'BX, but this is fixed in the Stack implementation to include inlined Sections, which we can't serialize. We can most likely either parameterize this type, or replace usage sites with a more generic version, but it affects a big chunk of things and is going to be a bit of a pain to figure out and propagate.

So serializing the pre-evaluated pure results is possible with more work, but it'd be nice to release the non-serializing version for now. The currently implemented version pre-evaluates cacheable values on executable startup.
When running something from a cold runtime (e.g. via ucm run or a fresh ucm session) there'll be a short delay while ucm evaluates everything, but it'll be snappy after that.

Implementation notes
- Removed the RComb indirection by including the CombIx directly in the Section/Instr, to avoid an additional redirection. This will also help with serializing closures, but as mentioned above that part isn't done yet.
- Added getTypeOfTerm to Runtime CodeLookup.

It's a bit annoying, because in order to evaluate the pure defns I need the rest of all the combs to be in the right format for the runtime, so here's how it goes:
Benchmarks
Made a test which references a 1k element map definition:
Loose ends
See overview about serializing closures.