Conversation
GarethCabournDavies
left a comment
I think this looks good, and as you say, if defaults are used it reverts to the earlier behaviour, which is good as well.
There are a few places with minor code-styling changes / questions.
I also don't understand why a couple of lines have changed from a variable (len(...)) to a constant (2).
#keep[args.max_connections//2: -args.max_connections//2] = False
keep[args.max_connections:] = False

hp.matches = msorted[keep].copy()
Does this array copy not now increase the memory footprint, particularly in the max_connections infinite / default case?
Actually it's the opposite; otherwise a loose reference could keep something in memory when we don't really need it anymore.
A further clarification may help here. If you just return a view, the full array is always kept alive, even if the view is all you use. I explicitly want the reduced form to be what is stored, not the full array, hence explicitly asking for a new array that contains a copy of what the view pointed to. This means that the original values can be cleaned up by Python's garbage collector.
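To illustrate the view-vs-copy point (a minimal numpy sketch, not code from this PR): a slice is a view that keeps the parent buffer alive via its `.base` attribute, while `.copy()` produces an independent array, so the parent can be garbage-collected once no views remain.

```python
import numpy

big = numpy.arange(10_000_000, dtype=numpy.float64)  # ~80 MB parent array
view = big[:10]         # a view: keeps the whole parent buffer alive
copy = big[:10].copy()  # an independent 10-element array

# The view references the parent through .base; the copy does not.
assert view.base is big
assert copy.base is None

# After `del big`, the ~80 MB buffer is only freed once `view` is also
# gone; `copy` never pins it.
```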
# Defensive initialization if matches/indices are missing
if not hasattr(hc, 'matches'):
-    hc.matches = numpy.empty(len(self))
+    hc.matches = numpy.empty(2)
I'm not sure I understand this change.
In truth, I'd like to remove these lines; I can't really figure out why they are needed. My only guess is that they are masking some boundary issue. In any case, there is no reason to allocate these at the full size of the existing bank: since the information added is meaningless, it ends up being a waste of memory. Putting it at 2 was simply a lazy way to force the correct dimensions.
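For scale, a hypothetical sketch (the `bank_size` value is illustrative, not from the code): sizing the placeholder to the full bank spends megabytes on meaningless values, while a fixed shape of 2 keeps the correct dimensionality at negligible cost.

```python
import numpy

bank_size = 1_000_000  # hypothetical size of an existing bank

full = numpy.empty(bank_size)  # placeholder sized to the whole bank: 8 MB of garbage
small = numpy.empty(2)         # same defensive purpose at 16 bytes

assert full.nbytes == bank_size * 8  # float64 default: 8 bytes per entry
assert small.nbytes == 16
```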
parser.add_argument('--tau0-cutoff-frequency', type=float, default=15.0)
parser.add_argument('--nprocesses', type=int, default=1,
    help='Number of processes to use for waveform generation parallelization. If not given then only a single core will be used.')
parser.add_argument('--parallel-check', action='store_true', help="Do bank checking parallel, note that this means that proposals WILL NOT be checked against each other.")
Could this line be wrapped to match maximum line length requirements?
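One possible wrapping, purely as a suggestion for the line quoted above (implicit string concatenation keeps the help text unchanged):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--parallel-check', action='store_true',
                    help="Do bank checking parallel, note that this means "
                         "that proposals WILL NOT be checked against each "
                         "other.")
```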
if hp.checked or force_add:
    num_added += 1
    self.insert(hp)

logging.info("Waveform generation failed!")
continue
@GarethCabournDavies I'll address the style issues before merging. |
Standard information about the request
This is a feature update for pycbc brute bank.
This affects bank generation, so could affect use in all searches, though only indirectly.
Motivation
The main motivation is to improve the computational efficiency (both wall-clock time and memory use) of pycbc_brute_bank.
Contents
There are three feature changes.
The option to do match calculations / proposal-point checking in parallel. This does affect the behavior and is not an entirely drop-in option: it enables parallelization by assuming that proposals do not need to be checked against each other. That means if two proposals were identical points, they would both end up getting added to the bank, since no check between them is done. However, I believe that in most cases this is not really a problem. Without this option, requesting multiple cores will only parallelize bank generation, as is currently the case. I think that is the safest default for the general case.
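The caveat above can be sketched as follows (a toy illustration with made-up names: `passes_bank_check` stands in for the real match calculation, and threads stand in for the worker processes). Each proposal in a batch is checked against the existing bank only, never against the other proposals, so duplicate proposals can both be accepted.

```python
from concurrent.futures import ThreadPoolExecutor

def passes_bank_check(proposal, bank):
    # Hypothetical distance test standing in for the real match calculation.
    return all(abs(proposal - t) > 0.1 for t in bank)

def parallel_check(proposals, bank, nworkers=2):
    # Each proposal is checked against the existing bank only, never against
    # the other proposals in the batch, so two identical proposals can both
    # pass and both be added to the bank.
    with ThreadPoolExecutor(nworkers) as pool:
        results = list(pool.map(lambda p: passes_bank_check(p, bank),
                                proposals))
    return [p for p, ok in zip(proposals, results) if ok]

bank = [0.0, 1.0]
accepted = parallel_check([0.5, 0.5], bank)  # both identical proposals pass
```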
The option to limit the amount of stored information on the inter-bank matches (through the option --max-connections XX). By default this is set to infinite, which matches the old behavior. If set to some number, only the most important matches are actually stored for each template (in the way we do things, these are actually the 'low' matches). This means that if you get a particularly large bank, you can avoid being dominated in memory usage by the connections themselves instead of the cached waveforms. There are cases where this can result in a 5-10x memory reduction (again, only for a large bank region) without a noticeable change to the resulting bank size. It could potentially increase the number of matches required, though, if set too small.
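A minimal sketch of the pruning idea, following the keep-mask lines visible in the diff (the match values here are made up): after sorting, only the lowest `max_connections` matches are retained.

```python
import numpy

max_connections = 3  # the --max-connections cap
matches = numpy.array([0.91, 0.40, 0.75, 0.55, 0.98, 0.62])

# Sort so the most important (lowest) matches come first,
# then mask off everything past the cap.
msorted = numpy.sort(matches)
keep = numpy.ones(len(msorted), dtype=bool)
keep[max_connections:] = False

# Boolean indexing already yields a new array; the explicit .copy()
# mirrors the PR's intent that no view pins the full sorted array.
stored = msorted[keep].copy()
```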
The option to save the bank even on early termination. If you send a SIGTERM or ctrl-c (interrupt) signal to the job, it will now try to intercept it and do an emergency save of the bank.
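A sketch of how such an intercept can work (`install_emergency_save` and `save_fn` are illustrative names, not the PR's actual code): a handler registered for SIGTERM and SIGINT saves the bank, then exits.

```python
import signal

def install_emergency_save(save_fn):
    """Install SIGTERM/SIGINT handlers that save the bank before exiting."""
    def handler(signum, frame):
        save_fn()            # emergency save of the bank
        raise SystemExit(1)  # then shut down
    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
```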
Links to any issues or associated PRs
This should be merged first #5227
Testing performed
I've verified that, by default, banks are not modified; the defaults revert to the old behavior.