How to investigate an outbreak using civet
Find detailed information about input file options here.
In brief, either provide an input csv file, an ID string or a config yaml file via the -i / --input option. Alternatively, define search criteria with -fm / --from-metadata.
civet summarises information around a set of sequences of interest. It relies on the user providing a background tree, alignment and metadata file. civet can be run on CLIMB with the --CLIMB flag, remotely with the -r / --remote flag, or locally by specifying a local directory with -d / --datadir. If none of these options is given, civet will automatically look for a directory called civet-cat in the current working directory.
Find detailed information about background data options here.
The data files civet looks for are:
cog_global_alignment.fasta
cog_global_metadata.csv
cog_global_tree.nexus
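Before running civet against a local directory, it can help to confirm that the three files listed above are present. The following is a minimal shell sketch, not part of civet itself: the function name check_civet_datadir is hypothetical, and it assumes the files sit directly inside the data directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of civet): report which of the expected
# background data files are missing from a data directory.
check_civet_datadir() {
    dir="${1:-civet-cat}"   # default mirrors the directory civet looks for
    missing=0
    for f in cog_global_alignment.fasta cog_global_metadata.csv cog_global_tree.nexus; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Exit status 0 when all three files exist, 1 otherwise.
    return "$missing"
}
```

A possible use is `check_civet_datadir civet-cat && civet -i input.csv -d civet-cat`, so that civet only starts when the directory looks complete.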
- Activate the environment:
conda activate civet
- To run a simple analysis, pass your preferred options on the command line:
civet -i input.csv -f input.fasta -r -uun <your-user-name>
where <your-user-name> is your unique CLIMB identifier.
- If you run a similar analysis repeatedly, a configuration file simplifies the process. You can write the config file by hand, or generate it by adding the --generate-config flag to the command line options you require:
civet -fm adm2=Edinburgh sample_date=2020-10-01:2020-01-05 -d civet-cat -sc EDIN --generate-config
Running civet is then simple:
civet -i config.yaml
- Config keys are insensitive to '-' and '_' differences. For example, --from-metadata can be added to the config.yaml file as from_metadata: or from-metadata:
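One way to picture this equivalence is to reduce every spelling of a key to a single canonical form before comparing. The shell sketch below does that; it only illustrates the idea and is not civet's own code, and the function name normalise_key is hypothetical.

```shell
#!/bin/sh
# Illustration only: map an option spelling to one canonical form so that
# '-' and '_' variants compare equal. This is not civet's implementation.
normalise_key() {
    key="${1#--}"                          # drop a leading "--" if present
    printf '%s\n' "$key" | sed 's/-/_/g'   # treat '-' and '_' as the same character
}
```

With this helper, `normalise_key --from-metadata`, `normalise_key from-metadata` and `normalise_key from_metadata` all print from_metadata.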
- All command line options can be set in the config file, with the following exceptions:
  - An input csv file is specified with query: input.csv
  - An ID string is specified with ids: EGID001,EGID002
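As a sketch of how these keys might fit together, the snippet below writes a small config.yaml from the shell. Only the query key is confirmed by the text above. The fasta and datadir keys are assumptions that follow the long option names (--fasta, --datadir), and the function name write_civet_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal civet config file.
# Only `query` is confirmed by the documentation; `fasta` and `datadir` are
# assumed to mirror the --fasta and --datadir command line options.
write_civet_config() {
    out="${1:-config.yaml}"
    cat > "$out" <<'EOF'
query: input.csv
fasta: input.fasta
datadir: civet-cat
EOF
}
```

After `write_civet_config config.yaml`, the file could be passed to civet with `civet -i config.yaml`. Comparing the result against a file produced by --generate-config is a reasonable way to check the assumed keys.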
input-output options:
-i INPUT, --input INPUT
Input config file in yaml format, csv file (with
minimally an input_column header, Default=`name`) or
comma-separated id string with one or more query ids.
Example: `EDB3588,EDB3589`.
-fm [FROM_METADATA [FROM_METADATA ...]],
--from-metadata [FROM_METADATA [FROM_METADATA ...]]
Generate a query from the metadata file supplied.
Define a search that will be used to pull out
sequences of interest from the large phylogeny. E.g.
-fm adm2=Edinburgh sample_date=2020-03-01:2020-04-01
-o OUTDIR, --outdir OUTDIR
Output directory. Default: current working directory
-f FASTA, --fasta FASTA
Optional fasta query.
--max-ambiguity MAX_AMBIGUITY
Maximum proportion of Ns allowed to attempt analysis.
Default: 0.5
--min-length MIN_LENGTH
Minimum query length allowed to attempt analysis.
Default: 10000
data source options:
-d DATADIR, --datadir DATADIR
Local directory that contains the data files. Default:
civet-cat
-m BACKGROUND_METADATA, --background-metadata BACKGROUND_METADATA
Custom metadata file that corresponds to the large
global tree/ alignment. Should have a column
`sequence_name`.
--CLIMB Indicates you're running CIVET from within CLIMB, uses
default paths in CLIMB to access data
-r, --remote Remotely access lineage trees from CLIMB
-uun UUN, --your-user-name UUN
Your CLIMB COG-UK username. Required if running with
--remote flag
--input-column INPUT_COLUMN
Column in input csv file to match with database.
Default: name
--data-column DATA_COLUMN
Option to search COG database for a different id type.
Default: COG-UK ID
report customisation:
-sc SEQUENCING_CENTRE, --sequencing-centre SEQUENCING_CENTRE
Customise report with logos from sequencing centre.
--display-name DISPLAY_NAME
Column in input csv file with display names for seqs.
Default: same as input column
--sample-date-column SAMPLE_DATE_COLUMN
Column in input csv with sampling date in it.
Default='sample_date'
--colour-by COLOUR_BY
Comma separated string of fields to display as
coloured dots rather than text in report trees.
Optionally add colour scheme eg adm1=viridis
--tree-fields TREE_FIELDS
Comma separated string of fields to display in the
trees in the report. Default: country
--label-fields LABEL_FIELDS
Comma separated string of fields to add to tree report
labels.
--date-fields DATE_FIELDS
Comma separated string of metadata headers containing
date information.
--node-summary NODE_SUMMARY
Column to summarise collapsed nodes by. Default =
Global lineage
--table-fields TABLE_FIELDS
Fields to include in the table produced in the report.
Query ID, name of sequence in tree and the local tree
it's found in will always be shown
--include-snp-table Include information about closest sequence in database
in table. Default is False
--no-snipit Don't run snipit graph
--include-bars Render barcharts in the output report
--omit-appendix Omit the appendix section. Default=False
--private remove adm2 references from background sequences.
Default=True
tree context options:
--distance DISTANCE Extraction from large tree radius. Default: 2
--up-distance UP_DISTANCE
Upstream distance to extract from large tree. Default:
2
--down-distance DOWN_DISTANCE
Downstream distance to extract from large tree.
Default: 2
--collapse-threshold COLLAPSE_THRESHOLD
Minimum number of nodes to collapse on. Default: 1
-p [PROTECT [PROTECT ...]],
--protect [PROTECT [PROTECT ...]]
Protect nodes from collapse if they match the search
query in the metadata file supplied. E.g. -p
adm2=Edinburgh sample_date=2020-03-01:2020-04-01
map rendering options:
--local-lineages
Contextualise the cluster lineages at local regional
scale. Requires at least one adm2 value in query csv.
--date-restriction
Choose whether to date-restrict comparative sequences
at regional-scale.
--date-range-start DATE_RANGE_START
Define the start date from which COG sequences
will be used for local context. YYYY-MM-DD
format required.
--date-range-end DATE_RANGE_END
Define the end date up to which COG sequences
will be used for local context. YYYY-MM-DD
format required.
--date-window DATE_WINDOW
Define the window +- either side of cluster sample
collection date-range. Default is 7 days.
--map-sequences Map the sequences themselves by adm2, coordinates or
outer postcode.
--map-info MAP_INFO columns containing EITHER x and y coordinates as a
comma separated string OR outer postcodes for mapping
sequences OR Adm2
--input-crs INPUT_CRS
Coordinate reference system for sequence coordinates
--colour-map-by COLOUR_MAP_BY
Column to colour mapped sequences by
run options:
--cluster Run cluster civet pipeline. Requires -i/--input csv
--update Check for changes from previous run of civet. Requires
-fm/--from-metadata option in a config.yaml file from
previous run
-c, --generate-config
Rather than running a civet report, just generate a
config file based on the command line arguments
provided
-b, --launch-browser Optionally launch md viewer in the browser using grip
misc options:
--safety-level SAFETY_LEVEL
Level of anonymisation for users. Options: 0 (no
anonymity), 1 (no COGIDs on background data), 2 (no
adm2 on data). Default: 1
--tempdir TEMPDIR Specify where you want the temp stuff to go. Default:
$TMPDIR
--no-temp Output all intermediate files, for dev purposes.
--verbose Print lots of stuff to screen
--art Print art
-t THREADS, --threads THREADS
Number of threads
-v, --version show program's version number and exit
-h, --help
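The -fm and -p examples in the help text pair a metadata column with either a single value (adm2=Edinburgh) or a start:end date range (sample_date=2020-03-01:2020-04-01). The shell sketch below splits one such term into its parts. It only illustrates the shape of the syntax and is not civet's parser; the function name split_criterion is hypothetical.

```shell
#!/bin/sh
# Illustration only (not civet's parser): split one search term of the form
# column=value or column=start:end into its parts.
split_criterion() {
    column="${1%%=*}"   # text before the first '='
    value="${1#*=}"     # text after the first '='
    case "$value" in
        *:*) echo "$column ${value%%:*} ${value#*:}" ;;  # start:end range
        *)   echo "$column $value" ;;                    # single value
    esac
}
```

For example, `split_criterion sample_date=2020-03-01:2020-04-01` prints the column name followed by the start and end dates, which makes it easy to check that the start date precedes the end date before running civet.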