Perf improvements #314
base: main
`ios` isn't very accurate as these are only outputs.
Verilator's lint complains about this being implicit when selecting a value of 8. See future pinmapping commit.
These pins are only used for spi so don't need to be pinmuxed.
This removes blocks that have no options from the register map
Updates the definitions of the number of GPIO, UART and SPI devices that are available in the software definitions file for Sonata to match the changed number of devices in the new pin mapping configuration. This commit aims to keep changes minimal to get tests passing - it does not look at solving any other issues with e.g. outdated definitions for now.
Updates the manual `pinmux_checker` application along with the `pinmux_all_blocks_check` that uses it to match the new pin mapping changes under Pinmux for 1.0. This involves changing the pinmux assignment for a couple of pins to match the changed order in the pinmux config changes. It also involves reducing the number of UARTs that can be used by the pinmux checker to 1-2, down from 1-4, since the overall number of UARTs has also been reduced by recent hardware changes, and thus these non-existent UART devices should not be exposed. Finally, it changes the number of SPIs exposed to just SPI0 and SPI1, which are now exposed via Pinmux, and makes the necessary changes to the pinmux checker to facilitate this.
…ernal flash Updates the pinmux tests for the changes introduced with the new pinmux mapping. This primarily involves just changing the pins / devices used to the devices that are now mapped on those pins, and updating documentation to reflect this. Also resets all pinmux logic after muxing to ensure that, if more tests are added afterwards or if the tests are run multiple times, errors are not introduced by the pinmux tests changing state. Since the Application flash pinmux pins are no longer available through pinmux, this also involves converting the SPI Pinmux test to instead use an external SPI PMOD SF3 connected to PMOD1. This repurposes the existing test logic used by the pinmux checker to reduce the amount of additional code.
Adds a basic PMOD SF3 DPI in Verilator, attached to the PMOD1 pins so that the device can be used. This is modelled to be the same as the existing Application Flash on Sonata, but with a different JEDEC ID that it reports. Makes the Spi Flash take a JEDEC ID as a parameter to reduce duplicated logic between these DPIs. Disclaimer: I am not a hardware engineer, the hardware changes may contain problems.
Co-authored-by: Elliot Baptist <[email protected]>
Timing failures have been observed on this loopback path, it has minimal logic levels but a very long delay, possibly due to I/O timing constraints. Adding the register stage cuts the internal path. With the register stage the internal loopback cannot run at full speed, however as this is for testing purposes only this is acceptable.
Need a minimum of 2 (this is what is used in OpenTitan) to enable back to back requests without stall cycles.
Update code from upstream repository https://github.com/lowrisc/cheriot-ibex.git to revision ea2df9db3bcea776f0dc72d6d89c31c73798ecd4 * Feed RV32M through ibexc_top_tracing/ibexc_top (Greg Chadwick) * Switch to no bitmanip by default (Greg Chadwick) * Feed RV32B through in ibexc_top (Greg Chadwick) Signed-off-by: Greg Chadwick <[email protected]>
This is effectively a no-op change. Before the latest Ibex was vendored we had no bitmanip (the RV32BFull parameter was not fully passed through) and RV32M was the fast multiplier. Sadly the single cycle multiplier seems to be increasing timing pressure. It does just meet timing but greatly increases synthesis times. As it's implemented with in-built FPGA DSP blocks it shouldn't be a big issue to use it so something to examine here but for now leave things as they are.
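The "minimum of 2" note in the commit messages above can be illustrated with a toy cycle model. This is only a sketch under stated assumptions: it assumes the value is the capacity of a FIFO-like structure tracking outstanding requests, that each request completes one cycle after it is accepted, and that the ready signal is derived from registered occupancy only (a same-cycle completion does not free a slot until the next cycle). The function name `accepted_requests` is hypothetical and does not come from the Sonata or OpenTitan sources.

```python
def accepted_requests(depth, cycles):
    """Count requests accepted over `cycles` when a request is always pending.

    Toy model (assumption): `depth` is the number of outstanding requests
    that can be tracked, and each request completes one cycle after it
    was accepted.
    """
    occupancy = 0  # outstanding requests at the start of the cycle
    accepted = 0
    for _ in range(cycles):
        # Both decisions use the occupancy registered at the start of the cycle.
        ready = occupancy < depth
        completes = occupancy > 0
        if ready:
            accepted += 1
            occupancy += 1
        if completes:
            occupancy -= 1
    return accepted


if __name__ == "__main__":
    for depth in (1, 2):
        print(f"depth={depth}: {accepted_requests(depth, 8)} requests in 8 cycles")
```

In this model a depth of 1 accepts a request only every other cycle, while a depth of 2 accepts one every cycle, which matches the back-to-back behaviour the commit message describes.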
dc38d53 to 8235bc7
For reference here is the timing summary from the CI run (from the implementation reports artefact)
Here's the utilization report
Build time was 11m 30s which is in line with previous build times in CI
I've also seen an LCD SPI path with 24 ns of delay on a single net. The hold slack on the path was only 13 ns, so I suppose it might be hold-fixing gone wrong.
@@ -86,6 +86,8 @@ jobs:
runs-on: [ubuntu-22.04-fpga, sonata]
env:
BITSTREAM_PATH: build/lowrisc_sonata_system_0/synth-vivado/lowrisc_sonata_system_0.bit
TIMING_RPT: build/lowrisc_sonata_system_0/synth-vivado/lowrisc_sonata_system_0.runs/impl_1/top_sonata_timing_summary_routed.rpt
Is there a way to make this conditional on whether "top_sonata_timing_summary_postroute_physopted.rpt" exists, so we avoid looking at the wrong report when we switch to the new 2024.1-compatible post-route optimisation flow?
I think it's easier to just remember to fix this up when we alter the tool version. It's reasonable for CI to specifically work with the currently agreed tool versions and need modest updates when we change versions.
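For reference, the conditional selection asked about in this thread could look roughly like the sketch below. It assumes both reports would land in the same `impl_1` directory and that the post-route report keeps the file name quoted in the question; the helper name `pick_timing_report` is hypothetical, and this is not what the PR implements (the thread settles on updating the path by hand when the tool version changes).

```python
from pathlib import Path

# Directory taken from the TIMING_RPT path in the workflow diff above.
IMPL_DIR = Path(
    "build/lowrisc_sonata_system_0/synth-vivado/"
    "lowrisc_sonata_system_0.runs/impl_1"
)


def pick_timing_report(impl_dir):
    """Prefer the post-route phys-opt report when it exists, else the routed one."""
    impl_dir = Path(impl_dir)
    # Assumption: the newer flow writes this file alongside the routed report.
    physopted = impl_dir / "top_sonata_timing_summary_postroute_physopted.rpt"
    routed = impl_dir / "top_sonata_timing_summary_routed.rpt"
    return physopted if physopted.exists() else routed
```

A CI step could call `pick_timing_report(IMPL_DIR)` and pass the result on to whatever uploads or inspects the report.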
assign spi_cipo = reg2hw.control.int_loopback.q ? spi_copi_o : spi_cipo_i;
logic spi_copi_q;

always_ff @(posedge clk_i) begin
I have noticed the lack of reset but think it is fine. The u_spi_core/copi_shift_q driving it also lacks a reset, so this is just replicating the behaviour one cycle later.
Yeah, I prefer to leave resets off flops that don't strictly need them. Given the SPI ignores incoming data when it's not been specifically commanded to clock things in/out, it won't ever actually 'see' the first bit out of reset anyway.
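The "one cycle later" behaviour discussed in this thread can be sketched as a small cycle model. The diff excerpt cuts off before the body of the `always_ff` block, so this assumes the new flop simply captures `spi_copi_o` on each clock edge and that the loopback mux then selects `spi_copi_q`; the function name `simulate_loopback` is hypothetical.

```python
def simulate_loopback(copi_bits, registered):
    """Return the CIPO value seen in each cycle with internal loopback enabled.

    `copi_bits` is the value of spi_copi_o in successive clock cycles.
    With `registered=False` this models the original combinational loopback;
    with `registered=True` it models the assumed extra register stage.
    """
    spi_copi_q = None  # no reset: value is unknown before the first clock edge
    seen = []
    for spi_copi_o in copi_bits:
        # Value visible on the loopback input during this cycle.
        seen.append(spi_copi_q if registered else spi_copi_o)
        # Clock edge at the end of the cycle: spi_copi_q <= spi_copi_o.
        spi_copi_q = spi_copi_o
    return seen


if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print(simulate_loopback(bits, registered=False))  # [1, 0, 1, 1]
    print(simulate_loopback(bits, registered=True))   # [None, 1, 0, 1]
```

The leading `None` stands in for the unknown value of the unreset flop, which is the "first bit out of reset" mentioned above.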
I've run a bitstream build of this PR and a baseline build with the changes it is based on (up to and including the opt=2 change) using Vivado 2021.1 on my local machine. The breaking of the SPI loopback path did seem to improve the overall WNS (-1.451 -> -0.391). Though, for these two builds at least, the TNS went the other way (-8.168 -> -27.834), though the numbers are still relatively small. Build times also increased substantially (19m -> 32m), though that is muddied by builds going on in parallel. All this is somewhat concerning (for the state of Sonata if not this PR) but not conclusive. More testing to follow.
Adding the timing constraint updates (#316) and rebuilding showed a more reasonable build time (15m) and timing passing.
Thanks for checking it out @elliotb-lowrisc. Frustrating it's giving you some failures. I've certainly seen some flaky behaviour from the implementation flow; one version of this (pre pinmux changes) actually gave a noticeable improvement in timing, taking us to 0.5 ns WNS (positive)! I've rebased this locally now the pinmux changes are in and am doing another run, will see what results!
Well, rebasing and running locally gives a terrible timing result (something like -1.5 WNS) and increased build times. So sadly I think we'll have to reject this for 1.0 :( I will then try the two changes separately and see if they work in isolation. As @elliotb-lowrisc says, I think this is more a reflection of the current state of Sonata than of the specific things this PR is doing; it's all too easy to push things the wrong way with what look like modest design changes.
I think the SPI loopback change is definitely worth a quick PR and merge if possible |
Agreed I'll spin out a separate PR |
I have had this just passing timing on latest main but it does look very tight and was starting to push up build times. So I think we're best off doing this post-1.0. With @elliotb-lowrisc's crossbar change and a couple of simple Ibex timing fixes I'd hope this becomes a lot more comfortable.
Proposed performance improvements (improving SRAM throughput and Ibex single cycle multiplier), rebased on top of the pinmux changes (#309).
Timing is passing and build time looking good.
Two things were required to achieve this:
I've also added saving the utilization and timing reports from the FPGA CI as artifacts so they can be examined after (in particular so you can look at the timing results for any particular PR).
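Once the timing report is available as an artifact, the WNS and TNS figures quoted in this conversation could be pulled out with something like the sketch below. It assumes the report contains a summary table whose header row starts with `WNS(ns)` and `TNS(ns)`, followed by a row of dashes and then a row of values; the sample text is hand-written to approximate that layout, using the WNS/TNS numbers from the local build mentioned above and placeholder endpoint counts. The function name `parse_wns_tns` is hypothetical.

```python
# Hand-written approximation of the summary table (assumed layout).
# WNS/TNS values are from the discussion; endpoint counts are placeholders.
SAMPLE_REPORT = """\
| Design Timing Summary
| ---------------------

    WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints
    -------      -------  ---------------------  -------------------
     -0.391      -27.834                     42                 1000
"""


def parse_wns_tns(report_text):
    """Return (WNS, TNS) in ns from the first summary table in the report."""
    lines = report_text.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:2] == ["WNS(ns)", "TNS(ns)"]:
            # Skip blank lines and the dashed separator, then read the values.
            for candidate in lines[i + 1:]:
                fields = candidate.split()
                if len(fields) >= 2 and not fields[0].startswith("---"):
                    return float(fields[0]), float(fields[1])
    raise ValueError("no timing summary table found")


if __name__ == "__main__":
    wns, tns = parse_wns_tns(SAMPLE_REPORT)
    print(f"WNS = {wns} ns, TNS = {tns} ns, setup timing met: {wns >= 0.0}")
```

This only looks at setup slack; a fuller check would also read the hold columns from the same table.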