-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
speed optimisations, fix for restoreColorLossy, code cleanup #4895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis change introduces a new per-function optimization macro, refactors color scaling/fade utilities and their call sites, adjusts function attributes across FX and particle code, hardens 2D bounds checks, and adds per-bus applied-brightness tracking for ABL in the bus manager. One HTML file received formatting-only edits. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai there is an #ifdef #else that excludes one or the other. |
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed:
Length of output: 4997 @DedeHai Thank you for the clarification! You're absolutely right. Looking at the FX.h file, I can see there are indeed two different implementations of
The #ifdef/#else structure ensures that only one of these definitions is active in any given build configuration, which completely resolves the ODR concern I mentioned. This is the correct approach for handling the optional 2D functionality in WLED. I apologize for the confusion in my original comment - the conditional compilation makes this implementation perfectly valid. 🧠 Learnings used
|
A few tweaks to increase rendering speed just a bit more. On ESP32 this comes at a cost of ~500bytes of flash use, on C3 and ESP8266 it saves ~300bytes. The flash and speed numbers in the text below are for ESP32.
Created WLED_O2_ATTR alias for `attribute((optimize("O2")))
Moved
fastColorScale()
function from PS to colors.h, changed it to use 32bit math and put it to good use wherever scaling accuracy does not matter too much, i.e. inSegment::fadeToBlackBy()
andSegment::blur()
, making these function much faster. I also made it an inline function as the function call overhead is using more flash than the function itself. Since no byte access is used in the function, this is safe for all color operations, even if in 32bit access RAM or at least I'd exptect so, did not test that but it will show soon enogh ;)Added new
_NPBbri
variable to bus: need to track total brightness scaling (new ABL code) for NPB buffers orBusDigital::getPixelColor()
will return incorrect color if ABL is engaged. Function is currently not used but if we remove the global_pixels[]
buffer, the Copy FX will need it. I did not test this particular code change, someone please check if the logic holds (the math should be correct, I checked that).color_blend()
:adding WLED_O2_ATTR gives a rendering speed improvement of about 1% at the cost of 100 bytes of flash.Segment::setPixelColor()
: removed IRAM_ATTR, using WLED_O2_ATTR instead: significant FPS improvement of 4% for the cost of 350bytes of flash.Segment::getPixelColor()
removed IRAM_ATTR, using WLED_O2_ATTR instead: faster in tests and even saves a bit of flash.isPixelClipped()
1D & 2D version: removing IRAM_ATTR will highly likely just inline the function as it is called only once. A small flash use reduction seems to confirm that. This is faster than forcing a function call to IRAM. Since its only used during transitions I did not measure the speed impact and I would not expect it to be huge.Get/setPixelColorX
: instead of castingvWidth/vHeight
to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour.Code cleanup:
adjust_color()
and fixed indentationIn summary: this PR and in combination with #4889 there was a significant improvement in FPS in my test. From 68FPS to 78FPS (4 Layers, 32x32 on ESP32), each PR contributes roughly half of that improvement. On a more general test on the C3 the gain was less significant but still visible (+2FPS at 50FPS just this PR alone).