Essentials
Running Information
After the pipeline is invoked, Nextflow reports progress to stdout, with each row representing a process.
N E X T F L O W ~ version 23.10.1
Launching `/thunderData/pipeline/starscope/scRNA-seq/main.nf` [adoring_ekeblad] DSL2 - revision: 8e27902b23
executor > slurm (9)
[e0/1d00d4] process > scRNAseq:CAT_FASTQ (human_test) [100%] 1 of 1 ✔
[37/8c0795] process > scRNAseq:TRIM_FASTQ (human_test) [100%] 1 of 1 ✔
[20/1edf9b] process > scRNAseq:MULTIQC (human_test) [100%] 1 of 1 ✔
[5a/e0becc] process > scRNAseq:STARSOLO (human_test) [100%] 1 of 1 ✔
[02/15a3b1] process > scRNAseq:CHECK_SATURATION (human_test) [100%] 1 of 1 ✔
[09/e25428] process > scRNAseq:GET_VERSIONS (get_versions) [100%] 1 of 1 ✔
[48/703c20] process > scRNAseq:FEATURESTATS (human_test) [100%] 1 of 1 ✔
[79/cd2784] process > scRNAseq:GENECOVERAGE (human_test) [100%] 1 of 1 ✔
[e6/808adf] process > scRNAseq:REPORT (human_test) [100%] 1 of 1 ✔
Completed at: 09-May-2024 09:07:55
Duration : 25m 9s
CPU hours : 3.7
Succeeded : 9
If an error occurs, Nextflow stops the run and prints the error message directly to stderr.
The error message can also be found in the run log file, .nextflow.log:
$ head .nextflow.log
May-09 08:42:37.523 [main] DEBUG nextflow.cli.Launcher - $> nextflow run /thunderData/pipeline/starscope/scRNA-seq -c /thunderData/pipeline/nf_scRNAseq_config/latest/thunderbio_human_config --input sampleList.csv
May-09 08:42:37.924 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 23.10.1
May-09 08:42:38.096 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/xzx/.nextflow/plugins; core-plugins: nf-amazon@2.1.4,nf-azure@1.3.3,nf-cloudcache@0.3.0,nf-codecommit@0.1.5,nf-console@1.0.6,nf-ga4gh@1.1.0,nf-google@1.8.3,nf-tower@1.6.3,nf-wave@1.0.1
May-09 08:42:38.147 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-09 08:42:38.150 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-09 08:42:38.163 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-09 08:42:38.234 [main] INFO org.pf4j.AbstractPluginManager - No plugins
May-09 08:42:42.225 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /thunderData/pipeline/starscope/scRNA-seq/nextflow.config
May-09 08:42:42.231 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /thunderData/pipeline/nf_scRNAseq_config/latest/thunderbio_human_config_v2
May-09 08:42:42.233 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /thunderData/pipeline/starscope/scRNA-seq/nextflow.config
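When the log is long, filtering by level can be quicker than reading it from the top. The following is a minimal sketch, assuming the log keeps the "timestamp [thread] LEVEL logger - message" layout shown in the excerpt above. The helper name nf_log_problems is hypothetical and is not a Nextflow command.

```shell
# Print WARN and ERROR entries from a Nextflow log file (default: .nextflow.log).
# Assumes each entry starts with "Mon-DD HH:MM:SS.mmm [thread] LEVEL ...",
# as in the excerpt above. nf_log_problems is an illustrative helper name.
nf_log_problems() {
    log_file=${1:-.nextflow.log}
    grep -E '^[A-Z][a-z]{2}-[0-9]{2} [0-9:.]+ \[[^]]*\] (WARN|ERROR) ' "$log_file"
}
```

Calling nf_log_problems with no argument reads .nextflow.log in the current directory. Only the first line of each entry is matched, so any stack trace that follows an ERROR entry still has to be read in the log itself.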
Nextflow Log CLI
After each invocation, the run information can be retrieved with the nextflow log command. Its output lists the RUN NAME, STATUS and SESSION ID of each run.
$ nextflow log
TIMESTAMP DURATION RUN NAME STATUS REVISION ID SESSION ID COMMAND
2024-05-09 08:42:44 25m 12s adoring_ekeblad OK 8e27902b23 8670925f-ce5a-4f7a-b327-a98b288e6aa6 nextflow run /thunderData/pipeline/starscope/scRNA-seq -c /thunderData/pipeline/nf_scRNAseq_config/latest/thunderbio_human_config --input sampleList.csv
Work Dir and Intermediate Files
Each task of a process runs in its own sub-directory of the workDir set in the Nextflow configuration file. By default, StarScope sets this to the work folder under the project running directory. To find a task's working directory, look up its hash ID with the command below, where adoring_ekeblad is the RUN NAME from the nextflow log output.
$ nextflow log adoring_ekeblad -f hash,name,exit,status
e0/1d00d4 scRNAseq:CAT_FASTQ (human_test) 0 COMPLETED
09/e25428 scRNAseq:GET_VERSIONS (get_versions) 0 COMPLETED
37/8c0795 scRNAseq:TRIM_FASTQ (human_test) 0 COMPLETED
20/1edf9b scRNAseq:MULTIQC (human_test) 0 COMPLETED
5a/e0becc scRNAseq:STARSOLO (human_test) 0 COMPLETED
79/cd2784 scRNAseq:GENECOVERAGE (human_test) 0 COMPLETED
48/703c20 scRNAseq:FEATURESTATS (human_test) 0 COMPLETED
02/15a3b1 scRNAseq:CHECK_SATURATION (human_test) 0 COMPLETED
e6/808adf scRNAseq:REPORT (human_test) 0 COMPLETED
To check the working directory of the CAT_FASTQ task, use its hash ID (e0/1d00d4) to locate the folder under work:
$ ls -a work/e0/1d00d49d7d562790a4d4f5993852ba/
. .command.begin .command.log .command.run .command.trace human_test_1.merged.fq.gz human_test.R1.fq.gz
.. .command.err .command.out .command.sh .exitcode human_test_2.merged.fq.gz human_test.R2.fq.gz
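Because nextflow log prints only a prefix of the task hash, the full directory name has to be completed by hand or with a glob. The following sketch shows one way to do that, assuming the default work folder. The helper name nf_task_dir is hypothetical and is not part of Nextflow or StarScope.

```shell
# Resolve a short task hash such as "e0/1d00d4" to its full task directory.
# The second argument is the workDir; it defaults to "work".
# nf_task_dir is an illustrative helper name, not part of Nextflow.
nf_task_dir() {
    short_hash=$1
    work_root=${2:-work}
    # Expand the glob into the positional parameters; the trailing slash
    # restricts matches to directories.
    set -- "$work_root"/"$short_hash"*/
    # An unmatched glob stays literal, so also check that the result exists.
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one directory for hash '$short_hash' under '$work_root'" >&2
        return 1
    fi
    # Strip the trailing slash added by the glob pattern.
    printf '%s\n' "${1%/}"
}
```

For the run above, ls -a "$(nf_task_dir e0/1d00d4)" would list the same folder as the explicit path.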
The work directory always contains several important hidden files:
.command.out: STDOUT from the tool.
.command.err: STDERR from the tool.
.command.log: both STDOUT and STDERR from the tool.
.command.begin: created as soon as the job launches.
.exitcode: created when the job ends, containing the exit code.
.command.trace: log of compute resource usage.
.command.run: wrapper script used to run the job.
.command.sh: process command used for this task.
$ cat work/e0/1d00d49d7d562790a4d4f5993852ba/.command.sh
#!/bin/bash -ue
ln -s human_test.R1.fq.gz human_test_1.merged.fq.gz
ln -s human_test.R2.fq.gz human_test_2.merged.fq.gz
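The hidden files described above are enough to tell whether a task started, whether it finished, and what it wrote to STDERR. The sketch below combines them into a short summary. The helper name nf_task_summary is hypothetical, and the choice of five STDERR lines is arbitrary.

```shell
# Summarise one task directory: exit status plus the last lines of STDERR.
# Relies only on the hidden files listed above.
# nf_task_summary is an illustrative helper name, not part of Nextflow.
nf_task_summary() {
    task_dir=$1
    if [ -f "$task_dir/.exitcode" ]; then
        # .exitcode exists only once the job has ended.
        echo "exit code: $(cat "$task_dir/.exitcode")"
    elif [ -f "$task_dir/.command.begin" ]; then
        # .command.begin without .exitcode: the job launched but has not ended.
        echo "exit code: none yet (task started, no .exitcode)"
    else
        echo "exit code: none (task has not started)"
    fi
    # Show STDERR only when the file exists and is not empty.
    if [ -s "$task_dir/.command.err" ]; then
        echo "last lines of .command.err:"
        tail -n 5 "$task_dir/.command.err"
    fi
}
```

A typical call would be nf_task_summary work/e0/1d00d49d7d562790a4d4f5993852ba.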
Running in Background
The Nextflow pipeline can be executed in the background with the -bg option:
starscope gex --input sampleList.csv --config custom_config -bg
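Once the run is detached from the terminal, it helps to have a way to check whether it is still alive. Nextflow saves the process ID of a background launch to a PID file (.nextflow.pid by default, or the file named by the NXF_PID_FILE environment variable). The sketch below assumes that the starscope wrapper passes -bg through to Nextflow unchanged, which this page does not confirm. The helper name nf_bg_status is hypothetical.

```shell
# Report whether the background Nextflow process recorded in the PID file is alive.
# Assumption: starscope forwards -bg to Nextflow, so the PID file is written in
# the project running directory. nf_bg_status is an illustrative helper name.
nf_bg_status() {
    pid_file=${1:-${NXF_PID_FILE:-.nextflow.pid}}
    if [ ! -f "$pid_file" ]; then
        echo "no PID file at $pid_file"
        return 2
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (PID $pid)"
    else
        echo "not running (PID $pid)"
        return 1
    fi
}
```

Run nf_bg_status from the project running directory. A "not running" result does not say whether the run succeeded; the STATUS column of nextflow log answers that.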
Resume Previous Run
One of the core features of Nextflow is the ability to cache task executions and re-use them in subsequent runs to minimize duplicate work. Resumability is useful both for recovering from errors and for iteratively developing a pipeline. It is similar to checkpointing, a common practice used by HPC applications.
To resume the previous run, change into the project running directory and run:
starscope gex --input sampleList.csv --config custom_config -bg -resume
Or resume a specific run by its session ID (shown in the nextflow log output):
starscope gex --input sampleList.csv --config custom_config -bg -resume 8670925f-ce5a-4f7a-b327-a98b288e6aa6
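Copying the session ID by hand is error-prone, so it can instead be extracted from the nextflow log output by run name. The following is a minimal sketch that reads that output on standard input and relies only on the session ID being the first UUID on the matching line, as in the listing shown earlier. The helper name nf_session_id is hypothetical.

```shell
# Print the session ID of the run with the given RUN NAME.
# Reads `nextflow log` output on standard input.
# nf_session_id is an illustrative helper name, not part of Nextflow.
nf_session_id() {
    run_name=$1
    # Keep lines containing the run name as a whole word, then take the
    # first UUID found, which is the SESSION ID column.
    grep -w -F -- "$run_name" \
        | grep -Eo '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
        | head -n 1
}
```

It could be used as nextflow log | nf_session_id adoring_ekeblad, with the printed value passed to -resume. If the run name is not found, nothing is printed.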
Additional resources: