Information about load balancing/resource allocation #276

Open
eljost opened this issue Oct 2, 2023 · 2 comments

eljost commented Oct 2, 2023

Dear developers,

The README.md states that

> pathos also provides a very basic automated load balancing service,
> as well as the ability for the user to directly select the resources.

Is this documented somewhere? Grepping the pathos repo for terms like (load|balanc|resource|cost) did not turn up anything related to my question.

Does pathos support something like balancing tasks over workers, when each task has an associated cost?

All the best
Johannes

mmckerns (Member) commented Oct 2, 2023

I'm not sure exactly what you are looking for. What pathos provides is different strategies/options for structuring your computations on the available resources. You can build and configure a hierarchy of maps (etc.) that use threads, processes, MPI, schedulers, and so on. The interaction with MPI and schedulers has been pushed into pyina, and the interaction with threads and processes is in multiprocess. For example, with pyina, there are options for how you want to structure the workers (i.e. split a task list evenly between the workers, or use a pool of workers).

eljost (Author) commented Oct 12, 2023

Thanks for getting back to me so quickly. I was looking for something along the lines of what Dask.distributed offers:

There I can create a LocalCluster with a worker that has certain (abstract) resources, e.g., CPU cores:

```python
# ncores: total number of CPU cores on the machine
cluster_kwargs = {
    # One worker per cluster with multiple threads
    "threads_per_worker": ncores,
    "n_workers": 1,
    # Register the available cores as an abstract resource
    "resources": {
        "CPU": ncores,
    },
}
```

By specifying the resource usage of a task, I can submit tasks that use different numbers of CPU cores:

```python
# pal_parallel: number of cores this particular task needs
client.submit(
    func,
    image,
    resources={
        "CPU": pal_parallel,
    },
)
```

Given a set of 10 tasks on a machine with 8 CPU cores, this way I can submit the first 8 tasks with "CPU": 1 and the remaining two tasks with "CPU": 4, to make efficient use of my available cores. Dask prevents any oversubscription; e.g., the first task with "CPU": 4 will only ever start after four of the "CPU": 1 jobs have finished.

Is something like this easily possible with pathos, or do I have to do the resource management by hand?
If it is not possible, that is perfectly fine; I'll just close this issue.

All the best
Johannes
