I wonder if there are either 1) existing solutions for this or 2) easy ways to add the ability to run a looper pipeline in an ad hoc manner. What I mean by that is this: occasionally, the overhead of a traditional workflow can be a bit daunting, but I really enjoy the ease of dispatching jobs through slurm + looper.
I would love to replace traditional bash for loops with looper calls.
An example
I have a folder with hundreds of mixed-type files. Some of these might be bedGraph files. I want to convert these to `.bw` format. I can use `bigtools bedGraphToBigWig`. Traditionally, I might just use a for loop:
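Something like the following, sketched here under the assumption that `bigtools` is on `PATH` and that a `chrom.sizes` file is available (the argument order follows the usual bedGraphToBigWig convention of input, chrom sizes, output):

```shell
# Convert every bedGraph in the current directory, one file at a time.
# `bigtools` and `chrom.sizes` are assumed to be available here.
for f in *.bdg; do
  bigtools bedGraphToBigWig "$f" chrom.sizes "${f%.bdg}.bw"
done
```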
But this takes a while since it goes one-by-one, and there are hundreds. I'd love to fire them all off at once using looper and slurm:
```shell
ls *.bdg | looper run "bigtools bedGraphToBigWig {$1} {$1}.bw"
```
I suppose I am trying to identify, or nail down, a potential gap between traditional workflows and the flexibility researchers often need for quick, ad hoc job submission.
@nleroy917 this is a good idea. IIRC, way back in time, @nsheff had an example or two like this that sort of "pushed the limits" / "thought outside the box" (if I'm permitted some clichés) of looper in this way; maybe he already has a working example or something close to this that would make a good starting point?
What to do with the command template?
Maybe using `-y` (command-extra-override) is a way to provide a command template when there was none to begin with?
Can we make a PEP on the fly given some source of sample info?
Sure... we can make it accept stdin and then what I wrote would work...?
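A plain-shell stop-gap along these lines, sketched under the assumption that `xargs -P` is used for local parallelism and that `bigtools` and a `chrom.sizes` file are available:

```shell
# Run one conversion per available core with xargs.
# -P "$(nproc)" caps concurrency at the number of local cores;
# `bigtools` and `chrom.sizes` are assumed to be available.
ls *.bdg | xargs -P "$(nproc)" -I {} \
  sh -c 'bigtools bedGraphToBigWig "$1" chrom.sizes "${1%.bdg}.bw"' _ {}
```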
Only works when $(nproc) returns a value greater than one, of course... so you would still need to allocate some cores for yourself. It's an interesting stop-gap, but I still think the looper version proposed above would be way better.