ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)' #121

physnano · 2024-09-30T15:51:46Z

My workflow keeps failing at the reference_assembly:map_reads step:

ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)'

Caused by:
  Process `pipeline:reference_assembly:map_reads (1)` terminated with an error exit status (140)

Command executed:

  minimap2 -t 1 -ax splice -uf genome_index.mmi seqs.fastq.gz        | samtools view -q 40 -F 2304 -Sb -        | seqkit bam -j 1 -x -T 'AlnContext: { Ref: "GRCh38.primary_assembly.genome.fa", LeftShift: -24,
      RightShift: 24, RegexEnd: "[Aa]{8,}",
      Stranded: True,Invert: True, Tsv: "internal_priming_fail.tsv"} ' -        | samtools sort --write-index -@ 1 -o "E3_rep2_reads_aln_sorted.bam##idx##E3_rep2_reads_aln_sorted.bam.bai" - ;
  ((cat "E3_rep2_reads_aln_sorted.bam" | seqkit bam -s -j 1 - 2>&1)  | tee E3_rep2_read_aln_stats.tsv ) || true
  
  # Add sample id header and column
  sed "s/$/E3_rep2/" "E3_rep2_read_aln_stats.tsv"         | sed "1 s/E3_rep2/sample_id/" > tmp
  mv tmp "E3_rep2_read_aln_stats.tsv"
  
  if [[ -s "internal_priming_fail.tsv" ]];
      then
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
  fi

Command exit status:
  140

Command output:
  (empty)

Exit code 140 suggests a memory/CPU constraint; however, adding the following to the config file has not resolved the issue:

process {
    withName: 'makeReport' {
        queue = 'himem'
        memory = '512.GB'
    }

    withName: 'reference_assembly:map_reads' {
        memory = '32.GB'
    }
}

--->

WARN: There's no process matching config selector: reference_assembly:map_reads


nrhorner · 2024-09-30T19:51:03Z

Hi @physnano

Only the process name should be used in the process selector, like so:

    withName: 'map_reads' {
        memory = '32.GB'
    }
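The warning above suggests the selector is compared against the whole process name, so 'reference_assembly:map_reads' matches neither the simple name (map_reads) nor the fully qualified name printed in the error (pipeline:reference_assembly:map_reads). A minimal sketch of selectors that should match, assuming a Nextflow version whose withName accepts fully qualified names and regular expressions; the memory value is a placeholder, not a recommendation:

```groovy
process {
    // Simple process name: matches map_reads wherever it is called
    withName: 'map_reads' {
        memory = '32 GB'
    }

    // Alternative 1: the fully qualified name exactly as printed in the error.
    // Use only one of these forms for a given process.
    // withName: 'pipeline:reference_assembly:map_reads' {
    //     memory = '32 GB'
    // }

    // Alternative 2: a regular expression matching any workflow prefix
    // withName: '.*:map_reads' {
    //     memory = '32 GB'
    // }
}
```

The simple name is usually the least brittle choice, since it keeps working if the workflow's sub-workflow nesting changes.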

physnano · 2024-10-03T15:50:46Z

Thanks @nrhorner, that, along with clusterOptions = '--qos=long', seemed to help. However, I am now seeing the following:
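For reference, the memory setting and the scheduler flag can live in the same selector block. A minimal sketch, assuming the SLURM executor (which passes clusterOptions through to the job submission) and that a --qos=long option exists on this particular cluster; the 32 GB figure is carried over from the config above, not a tested value:

```groovy
process {
    withName: 'map_reads' {
        memory         = '32 GB'
        // Extra scheduler flags; '--qos=long' is site-specific
        clusterOptions = '--qos=long'
        // Optional: request walltime explicitly with the time directive,
        // if the site allows the requested duration under the chosen QOS
        // time = '48h'
    }
}
```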

ERROR ~ Error executing process > 'pipeline:split_bam (2)'

Caused by:
  Process `pipeline:split_bam (2)` terminated with an error exit status (137)

Command executed:

  n=`samtools view -c isob11_rep2_reads_aln_sorted.bam`
  if [[ n -lt 1 ]]
  then
      echo 'There are no reads mapping for isob11_rep2. Exiting!'
      exit 1
  fi
  
  re='^[0-9]+$'
  
  if [[ 50000 =~ $re ]]
  then
      echo "Bundling up the bams"
      seqkit bam -j 4 -N 50000 isob11_rep2_reads_aln_sorted.bam -o  bam_bundles/
      let i=1
      for b in bam_bundles/*.bam; do
          echo $b
          newname="isob11_rep2_batch_${i}.bam"
          mv $b $newname
         ((i++))
      done
  else
      echo 'no bundling'
      ln -s isob11_rep2_reads_aln_sorted.bam isob11_rep2_batch_1.bam
  fi

Command exit status:
  137

It seems that many steps of this workflow are not allocated enough memory by default...
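Exit status 137 is 128 + 9, meaning the task was killed with SIGKILL, which commonly indicates it exceeded its memory limit (via the kernel OOM killer or the scheduler). One common Nextflow pattern is to retry the task with more memory on each attempt. A sketch, assuming a user config loaded with -c takes precedence over any errorStrategy the workflow already sets for split_bam; the numbers are placeholders:

```groovy
process {
    withName: 'split_bam' {
        // Retry only on exit codes that usually indicate resource limits
        errorStrategy = { task.exitStatus in [137, 140] ? 'retry' : 'terminate' }
        maxRetries    = 3
        // 16 GB on attempt 1, then 32, 48 and 64 GB on the retries
        memory        = { 16.GB * task.attempt }
    }
}
```

Scaling on task.attempt keeps the first attempt small for typical samples and only escalates for the ones that actually fail.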

nrhorner · 2024-10-10T06:25:36Z

Hi @physnano

Ok, thanks for the update. We will review memory allocations for this workflow. Would you be able to share a bit of information about your data? How many samples and how many reads in total are you using? Also, which version of the workflow are you using, and what command did you run?

Thanks,

Neil

physnano · 2024-10-22T16:15:38Z

Hi @nrhorner, in my case there are 3 replicates for each of 2 samples (6 total), split across two PromethION flow cells, so ~40-50M raw reads per barcode. According to my monitoring, the makeReport step spikes to ~200 GB of memory. I am using the latest version, v1.4.0. Command used:

nextflow run ${wfPath}wf-transcriptomes \
    --fastq ${fqPath} \
    --de_analysis \
    --ref_genome ${refPath}GRCh38.primary_assembly.genome.fa \
    --ref_annotation ${refPath}gencode.v46.primary_assembly.annotation.gtf \
    --ref_transcriptome ${refPath}gencode.v46.transcripts.fa \
    --sample_sheet ${wfPath}sample_sheet.csv \
    --cdna_kit SQK-PCB114 \
    --out_dir ${wfPath}outdir-de \
    -profile singularity \
    -c ${wfPath}wf-transcriptomes/nextflow.config \
    --threads 4 \
    -resume

nrhorner · 2024-11-06T07:34:32Z

Hi @physnano

It's not good that the report generation step is using so much memory. I will investigate this.

nrhorner · 2024-12-16T20:32:44Z

@physnano

Would you be able to try version 1.6.0 and check whether memory consumption has been reduced, please?

physnano · 2024-12-18T20:33:15Z

Hi @nrhorner, I am rerunning on v1.6.0 today and will let you know how it goes when it completes!

physnano · 2024-12-29T18:00:27Z

Hi @nrhorner, I have run v1.6.0 and it completes. However, since I am running the workflow with the singularity profile on a cluster, I needed to specify job runtimes in the config (clusterOptions = '--qos=long'), mainly for the map_reads step.
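As an alternative to relying only on a QOS flag, walltime can also be requested per process with Nextflow's time directive and raised on retry. A sketch under stated assumptions: the SLURM executor, exit status 140 reflecting a walltime kill on this cluster (consistent with the --qos=long fix in this thread, but not confirmed by the logs), and placeholder durations that may exceed a site's default QOS limit:

```groovy
process {
    withName: 'map_reads' {
        // Assumption: 140 is how this cluster reports a walltime kill
        errorStrategy  = { task.exitStatus == 140 ? 'retry' : 'terminate' }
        maxRetries     = 2
        // 12 h on attempt 1, 24 h on attempt 2, 36 h on attempt 3
        time           = { 12.h * task.attempt }
        // Still needed if the requested time exceeds the default QOS limit
        clusterOptions = '--qos=long'
    }
}
```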

I am also noticing that my "results_dge.tsv" file contains raw read counts (rerunning the script gave the same result) instead of the "gene", "logFC", "logCPM", "F", "PValue", "FDR" columns expected from the DGE analysis. Strangely, this doesn't happen when I process a different dataset (PacBio reads) with a nearly identical script, so I am confused as to why it occurs here. Any ideas why this would be the case? (I can share the log file if needed.)

physnano · 2025-01-17T16:52:01Z

Closing, as this final issue is identified in #139.

physnano added the "question" label (Further information is requested) on Sep 30, 2024

physnano closed this as completed on Jan 17, 2025
