-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BlockIdManagerSelector #3560
Add BlockIdManagerSelector #3560
Conversation
you might elaborate on "Will be used to more effectively distribute tasks." |
switching this to draft status because @matthewc2003 is working on gathering evidence on when this does and does not help. |
some documentation ideas: the same summary you posted above, put into a docstring for the block id manager selector, and then add to docs/reference.rst (perhaps in a new section) add a description of manager_selector to the HighThroughputExecutor selector, with a link to the docstring on block id manager selector. |
@@ -19,6 +19,14 @@ def sort_managers(self, ready_managers: Dict[bytes, ManagerRecord], manager_list | |||
|
|||
class RandomManagerSelector(ManagerSelector): | |||
|
|||
"""Returns a shuffled list of interesting_managers | |||
|
|||
Maintains the behavior of the original interchange. Works well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs should primarily be about the version now, rather than about changes since some other version the user isn't using, so you could get rid of this first sentence.
a related thing to express would be that this is (I think) the default - see how some other docstrings on HighThroughputExecutor talk about defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but the commit message that this gets merged under (which hopefully will come from the description text of this PR) is the right place to describe how the new code is different/similar to the old code - so describing how the default behaviour now is the same or different to the previous behaviour before this is merged is relevant there: the audience for that is people who want to know what has changed in parsl, vs audience for the user guide is what is stuff like now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded to identify the random manager selector as the default
docs/reference.rst
Outdated
@@ -78,6 +78,16 @@ Executors | |||
parsl.executors.FluxExecutor | |||
parsl.executors.radical.RadicalPilotExecutor | |||
|
|||
Executors Miscellaneous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could call this Manager selectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
It would be good to get some testing in here for the newly introduced code. There are a few things you could do: make a test that invokces BlockIdManagerSelector.sort_managers with some example inputs and check that it sorts them the right way launch a DFK with the BlockIdManagerSelector configured and check that a simple no-op task runs. level 2 of that: run a configuration with 2 blocks, no scaling, run a few tasks and check that they all end up in the 2nd block and never in the first block: you should see that in the @khk-globus might have some other ideas too around this space |
@pytest.mark.local | ||
def test_block_id_selection(try_assert): | ||
blockids = [] | ||
try_assert(lambda: (time.sleep(1) or get_worker_pid().result() == "1"), timeout_ms=20000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the sleep(1) for? in test cases that's usually something that should be waited for explicitly, or is something broken that the author hopes goes away after a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe you should be checking that both blocks are available when this test runs - eg lambda: len(htex.connected_managers()) == 2
- so that there is the opportunity for buggy behaviour to place blocks onto block 0, and then the opportunity for the test to detect that.
manager_selector: ManagerSelector | ||
Determines what strategy the interchange uses to select managers during task distribution. | ||
See API reference under "Manager Selectors" regarding the various manager selectors. | ||
Default: RandomManagerSelector() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this and you'll get a hyperlink to the documentation for RandomManagerSelector:
$ git diff
diff --git a/parsl/executors/high_throughput/executor.py b/parsl/executors/high_throughput/executor.py
index 09e634f41d..92a256ae54 100644
--- a/parsl/executors/high_throughput/executor.py
+++ b/parsl/executors/high_throughput/executor.py
@@ -149,7 +149,7 @@ GENERAL_HTEX_PARAM_DOCS = """provider : :class:`~parsl.providers.base.ExecutionP
manager_selector: ManagerSelector
Determines what strategy the interchange uses to select managers during task distribution.
See API reference under "Manager Selectors" regarding the various manager selectors.
- Default: RandomManagerSelector()
+ Default: `RandomManagerSelector`
""" # Documentation for params used by both HTEx and MPIEx
in distributing workloads equally across all availble compute | ||
resources. Is not effective in conjunction with elastic scaling | ||
behavior as it leads to wasted resource consumption. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe put subjects in these sentences such as it
- this isn't charged by the word.
|
||
Observations: | ||
1. BlockID manager selector helps with workloads that see a varying | ||
amount of tasks over time. We see new blocks are prioritized with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid academic-we in user facing documentation: just write "New blocks are prioritizied"
for i in range(10): | ||
future = get_worker_pid() | ||
blockids.append(future.result()) | ||
print(future.result()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't print in tests - the output is either passing or failing.
|
||
class BlockIdManagerSelector(ManagerSelector): | ||
|
||
"""Returns a interesting_managers list sorted by block ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/a/an/
I removed most of your PR description from the commit message, rather than harrassing @matthewc2003 to go rewrite it - it was mostly about stuff the isn't implemented, rather than stuff that this PR implements. |
Description
Adds the BlockIdManagerSelector to the list of available manager selectors. This selector returns a sorted list of managers by their block id, from greatest (newest) to least (oldest). Will be used to more effectively distribute tasks. Eventually, this selector will be used in conjunction with some method of sorting the task queue based on runtime to a) ensure tasks won't get assigned to managers that have a shorter remaining wall time than that of the task's execution time and b) put tasks with similar execution times on the same block, thereby resulting in a higher likelihood that the block completely empties of tasks, allowing the scale down logic to kick in.
Changed Behavior
Users can now choose the BlockIdManagerSelector() in the HTEX config.
Fixes
Work as part of #3323
Type of change