Collaborator

@DedeHai DedeHai commented Aug 31, 2025

A few tweaks to increase rendering speed a bit more. On ESP32 this costs ~500 bytes of flash; on C3 and ESP8266 it saves ~300 bytes. The flash and speed numbers in the text below are for ESP32.

  • Created WLED_O2_ATTR alias for `__attribute__((optimize("O2")))`

  • Moved fastColorScale() function from PS to colors.h, changed it to use 32-bit math and put it to good use wherever scaling accuracy does not matter too much, i.e. in Segment::fadeToBlackBy() and Segment::blur(), making these functions much faster. I also made it an inline function, as the function call overhead uses more flash than the function itself. Since the function uses no byte access, it should be safe for all color operations, even on data in 32-bit-access RAM. At least I'd expect so; I did not test that, but it will show soon enough ;)

  • Added new _NPBbri variable to bus: need to track total brightness scaling (new ABL code) for NPB buffers or BusDigital::getPixelColor() will return incorrect color if ABL is engaged. Function is currently not used but if we remove the global _pixels[] buffer, the Copy FX will need it. I did not test this particular code change, someone please check if the logic holds (the math should be correct, I checked that).

  • color_blend(): adding WLED_O2_ATTR gives a rendering speed improvement of about 1% at the cost of 100 bytes of flash.

  • Segment::setPixelColor(): removed IRAM_ATTR, using WLED_O2_ATTR instead: significant FPS improvement of 4% for the cost of 350 bytes of flash.

  • Segment::getPixelColor(): removed IRAM_ATTR, using WLED_O2_ATTR instead: faster in tests and even saves a bit of flash.

  • isPixelClipped() 1D & 2D version: removing IRAM_ATTR will most likely let the compiler inline the function, as it is called only once. A small reduction in flash use seems to confirm that. This is faster than forcing a function call into IRAM. Since it's only used during transitions, I did not measure the speed impact, and I would not expect it to be huge.

  • Get/setPixelColorXY: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. This makes the code smaller, so I assume it is faster too. I also thought about making X/Y coordinates unsigned in general for more consistency throughout the code, but decided to leave that to a future endeavour.
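For readers unfamiliar with the unsigned-cast trick from the last bullet, here is a minimal sketch. It is not the code from this PR: the function names `inBoundsSigned` and `inBoundsUnsigned` and their parameters are made up for illustration. The idea is that a negative `int` converted to `unsigned` wraps to a very large value, so a single `<` comparison per axis rejects both negative and too-large coordinates.

```cpp
// Illustrative only: hypothetical helpers, not WLED functions.

// Conventional form: explicit negative check plus upper-bound check.
inline bool inBoundsSigned(int x, int y, int width, int height) {
  return x >= 0 && y >= 0 && x < width && y < height;
}

// Unsigned-cast form: a negative coordinate wraps to a huge unsigned value,
// so it fails the same "< width" / "< height" test as an oversized one.
inline bool inBoundsUnsigned(int x, int y, unsigned width, unsigned height) {
  return static_cast<unsigned>(x) < width && static_cast<unsigned>(y) < height;
}
```

Both forms should agree for any non-negative width and height; the second simply gives the compiler fewer comparisons to emit.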

Code cleanup:

  • Changed parameter order in settings_leds.htm to match cpp file as it was confusing, no functional change.
  • Bugfix in adjust_color() and fixed indentation
  • Removed IRAM_ATTR_YN from unused gamma functions (just in case that attribute forces the compiler to put them there)

In summary: in combination with #4889, this PR gave a significant FPS improvement in my test: from 68 FPS to 78 FPS (4 layers, 32x32 on ESP32), with each PR contributing roughly half of that improvement. In a more general test on the C3, the gain was less significant but still visible (+2 FPS at 50 FPS from this PR alone).

@DedeHai DedeHai requested a review from willmmiles August 31, 2025 17:00
Contributor

coderabbitai bot commented Aug 31, 2025

Walkthrough

This change introduces a new per-function optimization macro, refactors color scaling/fade utilities and their call sites, adjusts function attributes across FX and particle code, hardens 2D bounds checks, and adds per-bus applied-brightness tracking for ABL in the bus manager. One HTML file received formatting-only edits.

Changes

| Cohort / File(s) | Summary of changes |
|---|---|
| Optimization attribute unification: wled00/const.h, wled00/FX_fcn.cpp, wled00/FXparticleSystem.cpp, wled00/colors.cpp | Added WLED_O2_ATTR macro and applied it to selected functions; removed/standardized prior IRAM-related/`__attribute__` annotations; updated signatures accordingly without changing parameters. |
| Color scaling/fade refactor: wled00/colors.h, wled00/colors.cpp, wled00/FX_fcn.cpp, wled00/FX_2Dfcn.cpp, wled00/FXparticleSystem.cpp | Introduced fast_color_scale(...); moved color_fade(...) into NeoGammaWLEDMethod; replaced color_fade uses with fast_color_scale in FX and 2D blur; removed local fast_color_scale from particle system; fixed boolean in adjust_color condition. |
| Bus ABL applied-brightness tracking: wled00/bus_manager.h, wled00/bus_manager.cpp | Added _NPBbri member; relocated _autoWhiteMode out of bitfield; initialized _NPBbri; updated applyBriLimit, show, and getPixelColor to track/use _NPBbri for color restoration under ABL. |
| 2D bounds checks and signatures: wled00/FX_2Dfcn.cpp | Removed IRAM_ATTR_YN from Segment::isPixelXYClipped; replaced negative coordinate checks with unsigned casts in setPixelColorXY/getPixelColorXY; updated 2D blur to fast_color_scale. |
| Gamma/utility qualifiers cleanup: wled00/colors.cpp | Added WLED_O2_ATTR to color_blend; removed IRAM qualifiers from NeoGammaWLEDMethod::Correct and inverseGamma32. |
| HTML formatting only: wled00/data/settings_leds.htm | Reordered local variable declarations within bLimits; no logic changes. |
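As an illustration of the two ideas summarized above (a per-function O2 attribute and 32-bit color scaling without byte access), here is a minimal sketch. It is an assumption-laden example, not the code in colors.h or const.h: the names `SKETCH_O2_ATTR`, `scaleColorSketch` and `fadeBufferSketch` are hypothetical, and the color is assumed to be packed as 0xWWRRGGBB. Two channels are scaled per multiplication by masking them into separate 16-bit lanes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro mirroring the idea of a per-function O2 attribute.
// The optimize attribute is GCC-specific, so it is left empty elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
  #define SKETCH_O2_ATTR __attribute__((optimize("O2")))
#else
  #define SKETCH_O2_ATTR
#endif

// Scale all four 8-bit channels of a packed 32-bit color by scale/256.
// Only 32-bit operations are used; no individual byte is loaded or stored.
// Note the small inaccuracy: scale == 255 maps 255 to 254, not 255.
inline uint32_t scaleColorSketch(uint32_t c, uint8_t scale) {
  // Red and blue sit in separate 16-bit lanes, so one multiply scales both.
  uint32_t rb = (((c & 0x00FF00FFu) * scale) >> 8) & 0x00FF00FFu;
  // White and green are shifted down first, then kept in the high bytes.
  uint32_t wg = (((c >> 8) & 0x00FF00FFu) * scale) & 0xFF00FF00u;
  return rb | wg;
}

// Example caller compiled with O2 regardless of the global optimization level.
SKETCH_O2_ATTR void fadeBufferSketch(uint32_t* px, size_t n, uint8_t fadeBy) {
  const uint8_t keep = static_cast<uint8_t>(255 - fadeBy);
  for (size_t i = 0; i < n; i++) px[i] = scaleColorSketch(px[i], keep);
}
```

Because each channel product is at most 255 * 255, it fits in 16 bits and cannot spill into the neighbouring lane, which is why two channels can share one multiplication.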

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • willmmiles
  • blazoncek

coderabbitai[bot]

This comment was marked as resolved.

@DedeHai
Collaborator Author

DedeHai commented Aug 31, 2025

> wled00/FX_2Dfcn.cpp (1), 149-184: ODR/behavior mismatch with FX.h inline stub for isPixelXYClipped()
>
> // FX.h (inside class Segment)
>
>   • inline bool isPixelXYClipped(int x, int y) const { return isPixelClipped(x); }
>   • bool isPixelXYClipped(int x, int y) const;

@coderabbitai there is an #ifdef/#else that excludes one or the other.

Contributor

coderabbitai bot commented Aug 31, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine the FX.h file to look for the isPixelXYClipped function and any #ifdef logic
fd "FX.h" --type f --exec cat {} \; | grep -A 10 -B 10 "isPixelXYClipped"

Length of output: 4997


@DedeHai Thank you for the clarification! You're absolutely right. Looking at the FX.h file, I can see there are indeed two different implementations of isPixelXYClipped() separated by conditional compilation:

  1. The full 2D implementation: [[gnu::hot]] bool isPixelXYClipped(int x, int y) const; (declaration for the implementation in FX_2Dfcn.cpp)
  2. The 1D fallback stub: inline bool isPixelXYClipped(int x, int y) const { return isPixelClipped(x); }

The #ifdef/#else structure ensures that only one of these definitions is active in any given build configuration, which completely resolves the ODR concern I mentioned. This is the correct approach for handling the optional 2D functionality in WLED.

I apologize for the confusion in my original comment - the conditional compilation makes this implementation perfectly valid.
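The pattern discussed here can be sketched as follows. This is a simplified illustration, not the actual FX.h: the class name `SegmentSketch`, the guard macro `SKETCH_DISABLE_2D` and the clipping logic are all hypothetical placeholders. The point is only that the preprocessor leaves exactly one declaration of isPixelXYClipped() in any given build, so the inline stub and the out-of-line definition never coexist.

```cpp
// Hypothetical guard macro; the real project may use a different name.
// #define SKETCH_DISABLE_2D

class SegmentSketch {
 public:
  SegmentSketch(int w, int h) : width_(w), height_(h) {}

  // Placeholder 1D check: treats anything outside [0, width) as clipped.
  bool isPixelClipped(int i) const { return i < 0 || i >= width_; }

#ifndef SKETCH_DISABLE_2D
  // 2D build: declaration only, defined out of line below.
  bool isPixelXYClipped(int x, int y) const;
#else
  // 1D-only build: inline stub that ignores y and forwards to the 1D check.
  inline bool isPixelXYClipped(int x, int y) const { (void)y; return isPixelClipped(x); }
#endif

 private:
  int width_;
  int height_;
};

#ifndef SKETCH_DISABLE_2D
// Only compiled when the declaration above is the active one.
bool SegmentSketch::isPixelXYClipped(int x, int y) const {
  // Placeholder 2D check: outside the width x height rectangle counts as clipped.
  return static_cast<unsigned>(x) >= static_cast<unsigned>(width_) ||
         static_cast<unsigned>(y) >= static_cast<unsigned>(height_);
}
#endif
```

Defining the guard macro switches the whole class to the stub, and the out-of-line definition drops out of the build at the same time.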


🧠 Learnings used
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.

@DedeHai DedeHai requested a review from blazoncek August 31, 2025 20:30