Workflow option failOnIgnore causes workflow to hang #5291

Closed
kgalens opened this issue Sep 9, 2024 · 4 comments · Fixed by #5293

kgalens commented Sep 9, 2024

Bug report

Expected behavior and actual behavior

When using the ignore errorStrategy with the workflow option failOnIgnore, the pipeline hangs when there's a task failure.

Steps to reproduce the problem

workflow.nf

process process1 {
    input:
    val sample_id

    output:
    val sample_id, emit: sample_ids

    script:
    """
    if [[ $sample_id == "SAMP1" ]]; then
        exit 2
    fi
    ls -lah .*
    """
}

process process2 {
    input:
    val ready

    output:
    stdout

    script:
    """
    ls -lah .*
    """

}

workflow {
    input_channel = channel.of("SAMP1", "SAMP2", "SAMP3")
    process1(input_channel)
    process2(process1.out.sample_ids.collect())
}

Nextflow Config

workflow {
    failOnIgnore = true
}

process {
    errorStrategy = 'ignore'
}

I would expect that the workflow would complete with a non-zero exit status.

Program output

 N E X T F L O W   ~  version 24.05.0-edge

Launching `/path/to/workflows/nextflow/hello_world/main.nf` [infallible_cajal] DSL2 - revision: fe2c285334

executor >  local (3)
executor >  local (3)
[09/916c3a] process1 (2) [100%] 3 of 3, failed: 1 ✔
[-        ] process2     [  0%] 0 of 1
[f2/61fcf1] NOTE: Process `process1 (1)` terminated with an error exit status (2) -- Error is ignored

At this point the run hangs and never finishes.

Environment

  • Nextflow version: [24.05.0-edge]
  • Java version: [openjdk version "21.0.2" 2024-01-16]
  • Operating system: [macOS]
  • Bash version: (use the command $SHELL --version) [zsh 5.9 (arm-apple-darwin22.1.0)]

adamrtalbot (Collaborator) commented

I can reproduce the issue on 24.08.0-edge:

> /usr/local/bin/nextflow-24.08.0-edge run .
N E X T F L O W  ~  version 24.08.0-edge
Launching `./main.nf` [focused_blackwell] DSL2 - revision: bc82ab126c
[af/ce2eeb] Submitted process > process1 (3)
[f7/ada6f2] Submitted process > process1 (1)
[12/e82491] Submitted process > process1 (2)
[f7/ada6f2] NOTE: Process `process1 (1)` terminated with an error exit status (2) -- Error is ignored

(hangs forever)

  • Nextflow: 24.08.0-edge
  • Java: openjdk 17.0.3
  • Shell: zsh 5.9 (x86_64-apple-darwin23.0)

adamrtalbot (Collaborator) commented

Attached log: failOnError.nextflow.log

bentsherman (Member) commented

It looks like process1 completes, then the process2 task is scheduled, but never run:

Sep-09 20:52:35.981 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - Invoking task > process2 with params=id=4; index=1; values=[[SAMP2, SAMP3], true]
Sep-09 20:52:35.981 [Actor Thread 12] TRACE nextflow.processor.TaskProcessor - <process2> Process state changed to: StateObj[submitted: 1; completed: 0; poisoned: false ] -- finished: false
Sep-09 20:52:35.981 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> Control message arrived $ => groovyx.gpars.dataflow.operator.PoisonPill@e3b762d
Sep-09 20:52:35.982 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> Poison pill arrived; port: 1
Sep-09 20:52:35.982 [Actor Thread 2] TRACE nextflow.processor.TaskContext - Binding names for 'process2' > []
Sep-09 20:52:35.983 [Actor Thread 12] TRACE nextflow.processor.StateObj - <process2> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
Sep-09 20:52:35.983 [Actor Thread 12] TRACE nextflow.processor.TaskProcessor - <process2> Process state changed to: StateObj[submitted: 1; completed: 0; poisoned: true ] -- finished: false
Sep-09 20:52:35.986 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] Store dir not set -- return false
Sep-09 20:52:35.989 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] Cacheable folder=null -- exists=false; try=1; shouldTryCache=false; entry=null
Sep-09 20:52:35.991 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] actual run folder: /home/bent/projects/sketches/work/d3/ba17a52e118f36fd05c1434927dd8a
Sep-09 20:52:35.995 [Actor Thread 2] TRACE n.processor.TaskPollingMonitor - Scheduled task > TaskHandler[id: 4; name: process2; status: NEW; exit: -; error: -; workDir: /home/bent/projects/sketches/work/d3/ba17a52e118f36fd05c1434927dd8a]
Sep-09 20:52:35.996 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - <process2> After run
Sep-09 20:52:35.996 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> After stop
Sep-09 20:52:36.036 [Task monitor] TRACE n.processor.TaskPollingMonitor - Scheduler queue size: 0 (iteration: 9)

In fact, if I comment out process2 then the run finishes. Strange that it only happens with failOnIgnore.

Right now I suspect a race condition in the task polling monitor is preventing it from submitting the task when it should be able to.

bentsherman (Member) commented

Bingo:

protected int submitPendingTasks() {
    int count = 0
    def itr = pendingQueue.iterator()
    while( itr.hasNext() && session.isSuccess() ) {

boolean isSuccess() { !aborted && !cancelled && !failOnIgnore }
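
To make the interaction between those two snippets concrete, here is a minimal Groovy sketch of how the guard could stall the queue. SessionModel and MonitorModel are hypothetical stand-ins, not the real Nextflow classes. The loop body is a simplification, and the sketch assumes the failOnIgnore flag is set as soon as an ignored task failure is recorded.

```groovy
// Hypothetical stand-in for the session state; field names mirror the quoted snippet.
class SessionModel {
    boolean aborted = false
    boolean cancelled = false
    boolean failOnIgnore = false   // assumption: set once an ignored failure is recorded

    // Same expression as the quoted isSuccess()
    boolean isSuccess() { !aborted && !cancelled && !failOnIgnore }
}

// Hypothetical stand-in for the monitor's pending queue and submit loop.
class MonitorModel {
    SessionModel session
    Queue<String> pendingQueue = new ArrayDeque<String>()
    List<String> submitted = []

    int submitPendingTasks() {
        int count = 0
        def itr = pendingQueue.iterator()
        // Same guard as the quoted loop header; the body is simplified for illustration
        while( itr.hasNext() && session.isSuccess() ) {
            submitted << itr.next()
            itr.remove()
            count++
        }
        return count
    }
}

def session = new SessionModel()
def monitor = new MonitorModel(session: session)

// process1 (1) fails and the error is ignored, which (by assumption) sets the flag
session.failOnIgnore = true

// process2 is scheduled afterwards, as in the trace log above
monitor.pendingQueue.add('process2')

int n = monitor.submitPendingTasks()
println "submitted=${n} stillPending=${monitor.pendingQueue.size()}"
```

In this model the script prints `submitted=0 stillPending=1`: the task stays in the pending queue because the guard is already false, which would match process2 being scheduled but never run. Whether the real monitor behaves exactly this way depends on code not shown in this thread.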
