Feature Request: Run jobs through the future API #126

@zeehio

Idea:

  • With brickster and DBI, I can use dplyr to write SQL queries. That's great!
  • With brickster and future, I could use purrr/furrr to parallelize loops on Databricks. Let's do this!

The future API is a well-designed API for distributing R work among workers. It's nice because it lets you use furrr (the purrr equivalent built on futures), it supports nice progress bars, and it is performant and robust. It has been around for years and is very well maintained.

https://future.futureverse.org/

Future separates "what to parallelize", which is defined by the package developer (e.g. "As a package developer, I want this expensive computation to run in parallel"), from "how to parallelize", which is chosen by the package user (e.g. "As a package user, I want that parallelization to use 4 cores on my laptop" or "I want to run this heavy thing on a Slurm cluster of computers").
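
To make that split concrete, here is a minimal sketch using only the existing future and furrr packages. The function names `slow_square` and `heavy_computation` are made up for illustration, and the local `multisession` plan just stands in for whatever backend a user might pick. A Databricks backend does not appear in this sketch because, as far as I know, it does not exist yet; that is what this issue asks for.

```r
library(future)
library(furrr)

# "What to parallelize": written once by the package developer.
# slow_square() and heavy_computation() are hypothetical names.
slow_square <- function(x) {
  Sys.sleep(0.1)  # stand-in for an expensive computation
  x^2
}

heavy_computation <- function(xs) {
  # future_map_dbl() runs on whatever plan the user has set
  future_map_dbl(xs, slow_square)
}

# "How to parallelize": chosen by the package user.
# Here: two background R sessions on the local machine.
plan(multisession, workers = 2)
result <- heavy_computation(1:4)

# Go back to sequential evaluation and shut down the workers.
plan(sequential)
```

Note that `heavy_computation()` never mentions the backend. If a Databricks backend existed, the only line that would presumably need to change is the `plan()` call.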

It also provides documentation on how to define a new "future backend":

https://cran.r-project.org/web/packages/future/vignettes/future-6-future-api-backend-specification.html

I'd love to have a future backend that sends heavy calculations to a Databricks cluster.

Labels: open discussion (Discussion welcomed for a decision yet to be made)