Skip to content

Conversation

@widlarizer
Copy link
Collaborator

Intends to fix #5509. The core idea is that CellTypes usage for Yosys internal cells can be directed to a simple compile time constructed integer lookup table for the type IdString index instead of hashing the type. Also improves memory locality with data spread across multiple arrays rather than vector of struct CellTypes.

So far, I've already seen 0-2% gain in the current partial implementation, which only for known_cells and only for CellTypes that only touch internal cells. I'm pretty sure internal cells can be fast pathed to this in CellTypes constructed from user designs, too. Compilation times seem fine so far

  • Add unit test that comprehensively checks equivalence with old celltypes

@widlarizer
Copy link
Collaborator Author

opt_expr is like 2.5 faster now leading to 8-10% faster jpeg synth without even excluding abc, wowie

@widlarizer
Copy link
Collaborator Author

I hope everybody is excited to have 40+ second link times with LTO on though!

@widlarizer
Copy link
Collaborator Author

Up to a 16% speedup now, probably mostly thanks to opt_dff by porting ModWalker

@whitequark
Copy link
Member

Laughs in 15 minute Wasm LTO link times

@widlarizer
Copy link
Collaborator Author

Sounds like this change would make the LTO link time for WASI even worse. If I rewrite this into a C++ code generator, we should suffer less with LTO across platforms... annoying, but easy

@widlarizer
Copy link
Collaborator Author

Further testing reveals that the performance gains are specific to designs with wide hierarchy (lots of module instances, like jpeg from ORFS) and maybe some other similar patterns. The directly observable effect of this is that early flatten (like what you get when you read_verilog; synth -auto-top -flatten) effectively guarantees there is no performance impact to this PR. The reason for this is probably that before, a hash table would be built per CellTypes using pass, which would contain cell type information for all types, which means internal as well as design modules. Having a lot of design modules involved meant that looking up information for internal cells would suffer from bad memory locality.

Either that, or setting up those hash maps with every run of opt_expr was the actual overhead. Not sure. Either way, I'm still happy with this change

@widlarizer
Copy link
Collaborator Author

FYI @KrystalDelusion I was testing against celltypes.h, but then is_evaluable didn't make sense (#5024), so I changed celltypes.h to match newcelltypes.h rather the other way around. Let me know if you have a different idea. My guess is evaluable means eval or consteval.h should cover it and that's definitely not the case

@whitequark
Copy link
Member

Sounds like this change would make the LTO link time for WASI even worse. If I rewrite this into a C++ code generator, we should suffer less with LTO across platforms... annoying, but easy

The WASI LTO link time is completely dominated by the relooper doing something, probably duplicating code for that clever switch/case construct Claire likes. (This is probably the one major potential argument against using it that I can have, although I have not specifically proven that it is the thing at fault and I'd have to do that for it to be an actual argument.)

45 more seconds do not change the overall outlook TBQH.

@KrystalDelusion
Copy link
Member

FYI KrystalDelusion I was testing against celltypes.h, but then is_evaluable didn't make sense (#5024), so I changed celltypes.h to match newcelltypes.h rather the other way around. Let me know if you have a different idea. My guess is evaluable means eval or consteval.h should cover it and that's definitely not the case

I also did this in #4910 (b48881e) with the following comment:

Also remove is_evaluable from cell types that aren't actually eval-able, instead using is_combinatorial which afaict is what the flag was being used as a proxy for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Less wasteful CellTypes usage in opt_clean

4 participants