Skip to content

Comments

Improvements to pycbc_brute_bank#5281

Open
ahnitz wants to merge 11 commits intogwastro:masterfrom
ahnitz:bank_updates
Open

Improvements to pycbc_brute_bank#5281
ahnitz wants to merge 11 commits intogwastro:masterfrom
ahnitz:bank_updates

Conversation

@ahnitz
Copy link
Member

@ahnitz ahnitz commented Feb 4, 2026

Standard information about the request

This is a feature update for pycbc brute bank.

This affects bank generation, so could affect use in all searches, though only indirectly.

Motivation

The main motivation is to improve the computational (both wall clock time and memory use) of pycbc_brute_bank.

Contents

There are three features changes.

  1. The options to do match calculations / proposal point checking in parallel. This does affect the behavior and is not an entirely drop-in option. It enables parallelization by assuming that proposals do not need to be checked against each other. That means if two proposals were identical points, they would both end up getting added to the bank, since no check between them is done. However, I believe that in most cases, this not really a problem. Without this option, requesting multiple cores will only parallelize bank generation as is currently the case. It think that is the safest default for the general case.

  2. Option to limit the amount of store information on the inter-bank matches. (through option --max-connections XX). By default this is set to infinite which matches the old behavior. If set to some number, only the most important matches are actually stored for each template (in the way we do things these are actually the 'low' matches). This means that if you get a particularly large bank, you can avoid being dominated in memory usage by the connections themselves instead of the cached waveforms. There are cases, where this can result in 5-10x memory reduction (again only for a large bank region) without noticeable change to result bank size. It could potentially increase the number of matches required though if set too small.

  3. The option to save the bank even with an early termination. If you send a sigterm or ctrl-c (interrupt) signal to the job, it will now try to intercept this and do an emergency save of the bank.

Links to any issues or associated PRs

This should be merged first #5227

Testing performed

I've verified that by default banks are not modified. The defaults resort to the old behavior.

  • The author of this pull request confirms they will adhere to the code of conduct

Copy link
Contributor

@GarethCabournDavies GarethCabournDavies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this looks good, and as you say if defaults are used, it reverts to earlier behaviour, that is good as well

There are a few places where there are minor code-styling changes / questions.

I also don't understand why a couple of line have changes from a variable (len(...)) to a constant (2)

#keep[args.max_connections//2: -args.max_connections//2] = False
keep[args.max_connections:] = False

hp.matches = msorted[keep].copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this array copy not now increase the memory footprint, particularly in the max_connections infinite / default case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's the opposite, otherwise loose reference could keep something in memory when we don't really need it anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A further clarification may help here. If you just return a view, even if that's all you have, the full array is always kept. I explicitly want the reduced form to be what is stored and not the full array, hence explicitly asking for a new array that contains a copy of what the view pointed to. This means that the original values can be cleaned up by python's garbage collector.

# Defensive initialization if matches/indices are missing
if not hasattr(hc, 'matches'):
hc.matches = numpy.empty(len(self))
hc.matches = numpy.empty(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure I understand this change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In truth, I'd like to remove these lines. I can't really figure out why they are needed. The only guess I have is that perhaps there is some boundary issue that this is masking. However, to the point, there is no reason to add these with the full size of the existing bank. Since the information added is meaningless, it ends up being a waste of memory. Putting it at 2 was simply a lazy way to force the correct dimensions.

parser.add_argument('--tau0-cutoff-frequency', type=float, default=15.0)
parser.add_argument('--nprocesses', type=int, default=1,
help='Number of processes to use for waveform generation parallelization. If not given then only a single core will be used.')
parser.add_argument('--parallel-check', action='store_true', help="Do bank checking parallel, note that this means that proposals WILL NOT be checked against each other.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this line be wrapped to match maximum line length requirements?

if hp.checked or force_add:
num_added += 1
self.insert(hp)
self.insert(hp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

logging.info("Waveform generation failed!")
continue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

@ahnitz
Copy link
Member Author

ahnitz commented Feb 18, 2026

@GarethCabournDavies I'll address the style issues before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants