Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed ::abs to ::fabs in bool fba::Hardware::position_xy_bad which #470

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

craigwarner-ufastro
Copy link

caused issue #258. r_min was being calculated as 0 in the case where it was ::abs(-0.1718) in gcc 11.2 in desi 23.10 and this was leading to a different result than gcc 13 with desi 24.4 where it was correctly resolved to the abs of a double.

Note that it seems from this like the newer version actually has the correct behavior, opposite of what was expected, unless the desired behavior is to treat r_min as an int.

caused issue #258.  r_min was being calculated as 0 in the case where it
was ::abs(-0.1718) in gcc 11.2 in desi 23.10 and this was leading to a
different result than gcc 13 with desi 24.4 where it was correctly
resolved to the abs of a double.

Note that it seems from this like the newer version actually has the
correct behavior, opposite of what was expected, unless the desired
behavior is to treat r_min as an int.
@schlafly
Copy link
Contributor

Wow, great find, thank you!

I agree that that just looks like a bug. It will clearly change fiber assignment, etc., but the "old fiber assignment" was just broken. We should probably discuss on the Monday survey ops call what we want to do. Our usual policy is to keep fiber assign reproducible, which would mean doing something ridiculous like "keep this buggy behavior for early dates". It feels weird to follow that policy in this case, though.

If 0 -> use old 'buggy' compiler specific ::abs call
If 1 or undefined, use ::fabs

While implementing this I found an interesting behavior that should be
documented and took me a while to figure out.  If I include <stdlib.h>
then ::abs always performs correctly as a floating point abs.  I had
thought I needed to include stdlib.h to handle the env var, though it
turned out I did not.

This may be useful in tracking down exactly where the discrepancy occurs
since we see a difference between desi 23.10 / gcc 11.2 and desi 24.4 /
gcc 11.2 - it is probably in libstdc++ or glibc or some other library
module that is loaded.
@schlafly
Copy link
Contributor

Hi Craig---two questions here:

  1. for the "buggy" mode, I think we want to change it so that it's "always buggy" not "only buggy if a buggy compiler is used." i.e., we want to be able to use a new bug-free compiler and get the buggy behavior when we "want" it and not when we don't. I think that means something like casting to an int in the buggy case?
  2. It's not clear to me when the fa_behavior_env variable gets set. @araichoor or @ashleyjross should tell us, but I suspect that we want to be able to do something like this pseudocode (I actually have no idea what the real API looks like, and am just trying to describe the conceptual flow that exists somewhere):
for tileid, design_date in zip(tileids, design_dates):
    fiberassign.assign(tileid, old_buggy_mode=(design_date < bug_fix_date))

I think in your implementation this would look like

for tileid, design_date in zip(tileids, design_dates):
    os.environ['FIBERASSIGN_BEHAVIOR'] = '0' if design_date < bug_fix_date else '1'
    fiberassign.assign(tileid)

which is fine. But I expect that that getenv only gets called at the initial import time and so the updated environment variable wouldn't get seen without some weird tricks to force that module to get reloaded.

But maybe I'm wrong and the alt-MTL always comes in through the CLI or something and triggers that 'getenv' call to be executed anew. @araichoor or @ashleyjross , can you advise on what the right entry point into fiberassign is to set/check this environment variable / trigger this new behavior? Thanks!

@ashleyjross
Copy link

For the altmtl, currently, a separate .sh script gets created for every tile and the needed environment is loaded for it. So, it would be easy to add this new environment variable in there. (This dates back to SV3, where many different versions of fiberassign were required to reproduce everything properly. I had been thinking recently that it might be nice to not do things this way, but it would have been some work to change it...)

@forero
Copy link
Member

forero commented Mar 18, 2025

Thanks for the great detective work!

Two more questions:

  • Is there some kind of unit test we could add to be sure the FIBERASSIGN_BEHAVIOR is behaving as expected?
  • Do we also want to keep FIBERASSIGN_BEHAVIOR in the FITS header?

@craigwarner-ufastro
Copy link
Author

@schlafly 1. Sure - I could force it to be an int.

@ALL I think there are a lot of ways so however its decided is the best/easiest to set the environment variable (and how we want to handle the undefined case which I defaulted to using the new/correct logic) - just let me know if there's anything else on my end (I could move the getenv within the method for instance).

And I think Anand's python script can serve as the basis for a easy unit test I could make since we have a case known to be an issue. I'll work on that today.

@schlafly
Copy link
Contributor

Yeah, moving the getenv within the method would be an option if that isn't a hot path or getenv is really fast. I have no idea. If it doesn't appreciably impact performance I agree that that would be a good approach. I was trying to think of a place less likely to be a hot path, though.

@craigwarner-ufastro
Copy link
Author

@schlafly
I moved into the method and timed it and 3 microsec to read env var and cast to int (2 us to detect that its not set) :)

I also changed the behavior to cast as an int in the case where FIBERASSIGN_BEHAVIOR=0 so that we get reproducible 'buggy' behavior regardless of compiler and environment.

Last thing I was working on was a unit test but then I realized while working on it I need clarification of what exactly you want the unit test to do? Since this is not a return value - do you want to test by asserting that the boolean return value is what we'd expect for that particular tile/fiber if the behavior is as expected based on the env var?

E.g.

    if (r < r_min + inner_keepout_buffer_mm) {
        return true;
    }

This conditional returns true when the buggy behavior is enabled but a false is returned later when it is not. So we could test setting env var to 0, assert return is true, set env var to 1, assert return is false.

Or should we set up a separate method just to test the int vs double value of abs? But then that's a separate method so the unit test wouldn't even notice if someone changed things in fba::Hardware::position_xy_bad

Updated logic to force "buggy" behavior of casting abs as int
when FIBERASSIGN_BEHAVIOR=0.
@schlafly
Copy link
Contributor

Great, thanks Craig. I'm not personally very worried about the unit test. It will be adding very good coverage of a very specific case that's unlikely to reappear. But yes, I think that your proposed unit test makes sense---checking that we get the expected true / false depending on the buggy-behavior-flag for this specific combination. If that is too ugly to set up, though, I wouldn't bother---I don't know how good our test coverage here is in general.

@craigwarner-ufastro
Copy link
Author

I went ahead and added a unit test to test_hardware.py that sets the env var to 0, runs Anand's test, asserts the correct True return value, then sets to 1, runs again and asserts the correct False return value.

not have the DESI environment set up correctly so this would require a
more compilicated (or better abstracted) unit test.  Since it sounds
like its not a big deal I will just remove it.
@craigwarner-ufastro
Copy link
Author

@schlafly turns out the unit test I wrote fails on github's automatic testing on the line

    fafn = os.path.join(os.getenv("DESI_ROOT"), "target", "fiberassign", "tiles", "trunk", tileidpad[:3], "fiberassign-{}.fits.gz".format(tileidpad))

I think clearly the github environment isn't set up to be the same as the DESI environment, nor does it have access likely to that particular tile/fiber. So the unit test would either need to be much more complex or much more abstracted and since you said don't bother if its too ugly to set up, I just removed it.

src/hardware.cpp Outdated
logger.debug("FIBERASSIGN_BEHAVIOR = 1 -- using ::fabs correctly on r_min");
r_min = ::fabs(r_min);
}
std::cout << "R_MIN = " << r_min << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to ourselves: we ll want to disable this printing of R_MIN, as it "pollutes" the fiberassign-TILEID.log files.
here s an example of a file run with that: https://data.desi.lbl.gov/desi/users/sybenzvi/fa_holding_pen_bugfix_issue_258_main_20250324/002/fiberassign-002212.log

@sybenzvi
Copy link

sybenzvi commented Mar 28, 2025

I committed a few updates to the coordinate transformations in fiberassign/hardware.py that @araichoor implemented to force the use of a statically calculated polar misalignment matrix. The default is to turn this on by default. Before merging, we can discuss if we want this off by default, with a command-line switch in fba_launch that enables the static calculation.

Note that unit tests aren't updated to the new interface; that will wait for our decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants