Snakefile config
The config.yaml file contains a list of parameters that are read in by the Snakefile. Instead of editing the Snakefile whenever you want to change a parameter, just create a new copy of the config.yaml file. Now that's what I call reproducibility.
The config.yaml file looks something like this:
path:
    root: /path/to/your/project/folder/on/the/cluster
    scratch: $SCRATCH_FOLDER_VARIABLE_SPECIFIC_TO_YOUR_CLUSTER
folder:
    data: dataset
    logs: logs
    assemblies: assemblies
    ...
scripts:
    kallisto2concoct: kallisto2concoct.py
    prepRoary: prepareRoaryInput.R
    binFilter: binFilter.py
    ...
cores:
    fastp: 4
    megahit: 48
    crossMap: 24
    ...
params:
    cutfasta: 10000
    assemblyPreset: meta-sensitive
    assemblyMin: 1000
    ...
envs:
    metagem: metagem
    metawrap: metawrap
    prokkaroary: prokkaroary
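As a rough illustration of how such nested parameters are typically consumed: Snakemake exposes the contents of a config file as a Python dictionary named config once the file is declared with the configfile directive. The sketch below uses a plain dictionary as a stand-in, with values copied from the excerpt above. The helper folder_path is hypothetical and is not taken from the metaGEM Snakefile.

```python
import os

# Stand-in for the dictionary Snakemake builds from config.yaml.
# Values mirror the excerpt above; the root path is a placeholder.
config = {
    "path": {"root": "/path/to/your/project/folder/on/the/cluster"},
    "folder": {"data": "dataset", "logs": "logs", "assemblies": "assemblies"},
    "cores": {"fastp": 4, "megahit": 48, "crossMap": 24},
    "params": {"cutfasta": 10000, "assemblyPreset": "meta-sensitive", "assemblyMin": 1000},
}


def folder_path(cfg, key):
    """Join the project root with one of the named sub-folders (hypothetical helper)."""
    return os.path.join(cfg["path"]["root"], cfg["folder"][key])


# Nested lookups follow the YAML structure: section first, then key.
assembly_dir = folder_path(config, "assemblies")
megahit_threads = config["cores"]["megahit"]
print(assembly_dir, megahit_threads)
```

Because every rule reads its values through lookups like these, swapping in a different copy of config.yaml changes the parameters without touching the Snakefile.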
The root path is set automatically by the metaGEM.sh parser to the folder you are submitting jobs from.
This is where folders will be created to store the generated files:
~/cluster_login_home/
|-project_X/
|--root/
|---logs
|---dataset
|---qfiltered
|---assemblies
...
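The layout above can be sketched in a few lines of Python. This is only an illustration of the resulting structure, assuming a hypothetical helper create_layout and a throwaway directory in place of the real project root. It does not claim to show how metaGEM itself creates these folders.

```python
import tempfile
from pathlib import Path


def create_layout(root, folders):
    """Create each sub-folder under root and return the created paths (hypothetical helper)."""
    created = []
    for name in folders:
        target = Path(root) / name
        # exist_ok=True makes the call safe to repeat across job submissions.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# Demo in a temporary directory standing in for the project root.
with tempfile.TemporaryDirectory() as root:
    made = create_layout(root, ["logs", "dataset", "qfiltered", "assemblies"])
    names = sorted(p.name for p in made if p.is_dir())
    print(names)
```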
The scratch path is cluster specific, and you will likely need to consult your institution's cluster wiki to determine how it should be set. Generally there is some directory for high I/O jobs, usually exposed through a variable such as $SCRATCHDIR, $TMPDIR, or $TMP. The Snakefile assumes that this variable points to a unique location for each job submission. Do not set the scratch path to a fixed directory if you are submitting jobs in parallel: multiple jobs would then copy and read files in the same temporary directory, which can cause errors.
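To make the uniqueness requirement concrete, here is a minimal sketch of one way to derive a per-job scratch directory. It assumes the cluster exports one of the variables named above, and falls back to the system temporary directory otherwise. The functions resolve_scratch and job_scratch are hypothetical and are not part of metaGEM.

```python
import os
import tempfile


def resolve_scratch(candidates=("SCRATCHDIR", "TMPDIR", "TMP")):
    """Return the first existing scratch directory named in the environment."""
    for name in candidates:
        value = os.environ.get(name)
        if value and os.path.isdir(value):
            return value
    # Assumption: fall back to the system temp dir when no variable is set.
    return tempfile.gettempdir()


def job_scratch(base):
    """Create a directory unique to this job so parallel jobs never share files."""
    return tempfile.mkdtemp(prefix="job_", dir=base)


base = resolve_scratch()
first = job_scratch(base)
second = job_scratch(base)
print(first != second)

# Clean up the empty demo directories.
for directory in (first, second):
    os.rmdir(directory)
```

If your scheduler already provides a fresh directory per job through its scratch variable, the second step is unnecessary. It matters only when the base location is shared between jobs.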
- Quality filter reads with fastp
- Assembly with megahit
- Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
- Refine & reassemble bins with metaWRAP
- Taxonomic assignment with GTDB-tk
- Relative abundances with bwa
- Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
- Species metabolic coupling analysis with SMETANA