From 262a2fd4e040e847cea1dc9b1fca0c5e80372d05 Mon Sep 17 00:00:00 2001 From: Marian Freisleben <115372379+infinity-a11y@users.noreply.github.com> Date: Thu, 22 Aug 2024 09:01:24 +0200 Subject: [PATCH] Rename Manual.html to user-manual.html --- docs/Manual.html | 3373 ----------------------------------------- docs/user-manual.html | 1 + 2 files changed, 1 insertion(+), 3373 deletions(-) delete mode 100644 docs/Manual.html create mode 100644 docs/user-manual.html diff --git a/docs/Manual.html b/docs/Manual.html deleted file mode 100644 index 32dcf5e..0000000 --- a/docs/Manual.html +++ /dev/null @@ -1,3373 +0,0 @@ - - - - -
- - - - - - - - -PhyloTrace Version 1.5.0
-Web: www.phylotrace.com
-Contact: info@phylotrace.com
-Github: https://github.com/infinity-a11y/PhyloTrace
-PhyloTrace is a platform for bacterial pathogen monitoring on a -genomic level. Its components revolve around Core-Genome Multilocus -Sequence Typing (cgMLST) and Antimicrobial Resistance Screening. Complex -analyses and computation are wrapped into an appealing and -easy-to-handle graphical user interface. Users build a local database -comprising analyzed isolates, manageable directly with the application. -The visualization of isolate relationship and genetic profile is highly -interactive, helping to reveal patterns that explain outbreak dynamics and -events by connecting genomic information with epidemiologic variables. -PhyloTrace achieves universal compatibility by assigning unique hashes -based on sequence and allele information. This implementation enables -efficient comparison and sharing of inter-lab results.
-PhyloTrace is intended for research and -academic purposes only.
-Install the application by following the steps described in the README -document on GitHub. Launch PhyloTrace from the applications menu of your -system. The app runs in the system’s default browser. PhyloTrace is -optimized for Chrome, Chromium, Brave as well as Opera and Vivaldi. -Avoid using Firefox, as some elements are distorted or not visible at -all.
-PhyloTrace does not require but encourages you to build a local database and
-iteratively add new bacterial isolates together with their respective allelic
-profile and meta data. Upon first launch, either load an already existing
-database or create a new one.
To start completely from scratch with no previously built database
-available, select + Create New
on the start screen
-(Figure 1) and choose a path where the database should
-be built. A folder named Database will be created in the
-respective location. Make sure to select a location that has read and
-write permissions. Since there are no entries added or schemes
-downloaded yet, the database is empty and you are immediately directed
-to the > Manage Scheme
tab after
-clicking on Load
. The drop down menu lists all bacterial
-species that are available in the cgMLST.org Nomenclature Server
-(h25). Selecting a species will display information about the
-scheme, such as the seed genome or the curators. Pick the species you
-want to work with and press Download
. You can now proceed
-to type the first assemblies belonging to the respective bacterial
-species (see 3 Allelic Typing).
-
If you or your working group or institution have already used
-PhyloTrace before, the respective database folder might already be saved
-on the internal file system. Click Browse
on the start
-screen (Figure 1) and select the path of the database
-folder. PhyloTrace will automatically recognize if the selected folder
-contains compatible data.
The database is structured by folders for each bacterial species you
-have worked with (see Figure 3). Therefore, when
-loading a local database, select which species you want to work with in
-this session. For example, if the database contains entries typed with
-Bordetella pertussis, Burkholderia pseudomallei and
-Klebsiella pneumoniae schemes, you can choose one of
-them. Proceed by clicking on the Load
button. The database
-section containing data regarding the selected species will load.
-
If the already existing database doesn’t include the species you want
-to work with, pick any arbitrary species and load the database. Then head
-over to the > Manage Scheme
tab and
-select your desired bacterial species from the list. Proceed to download
-the scheme files comprising gene variants and scheme info by clicking
-Download
. After the download is complete you are prompted
-to load the database again (see Figure 4). Select the
-species that was just downloaded and confirm. Proceed to start the first
-typing process for this species (see 3 Allelic
-Typing).
The currently loaded species/scheme is displayed at the top of the -sidebar below the PhyloTrace logo. If there is more than one scheme -available in the current database directory, it can be changed in the -same session. To switch, click the button next to the displayed scheme -and choose the new one. After confirmation, the database is loaded with -the newly selected scheme. If you would like to switch to a scheme present in -a database located in a different directory, restart the app and select -the respective path linking to this database folder.
-The typing process is the fundamental step which generates the data -(i.e. the allelic profile) for the genomic comparison. The method -applied is based on core-genome multilocus sequence typing (cgMLST). An -allelic profile is generated for selected bacterial isolates. The -allelic profile determines which allele variants are present for each -gene in the cgMLST scheme. If the process was successful, the results, -i.e. the allelic profile of the respective isolate as well as -epidemiologic meta data, are added as an entry to the local database (see -4 Database Browser). By repeating this -process with further isolates, a foundation for a library of bacterial -isolates is created. Technically there is no limit for the number of -entries in the database, although the performance might be reduced if -there are several hundred entries in the currently loaded scheme -(depends on system capacity). The variant calling and alignment steps of -the typing process are performed by BLAT (BLAST-like Alignment Tool) -for whole genome assemblies and the KMA (k-mer alignment) algorithm -for raw reads 1,2. Allelic -typing for raw reads will be available soon.
-In the sidebar of the
-> Allelic Typing
tab select
-☑ Single | ☐ Multi
(see Figure 5).
-Clicking on Browse
will open a window so that an assembly
-file from the local system can be selected. Any of the commonly used
-FASTA file formats (.fasta, .fna or .fa) are accepted. Selecting an
-incompatible file type will inhibit the start of the typing process.
-Make sure that the assembly file contains sequence data of a bacterial
-species that matches the selected scheme. Afterwards the basic meta data
-(i.e. Assembly ID
, Assembly Name
,
-Isolation Date
, Host
, Country
,
-City
) can be declared. Filling out every field is not
-mandatory if you don’t wish to or don’t have the respective information.
-Note that the Assembly ID
has to be unique; proceeding is
-not possible if the same ID is already present in the local database.
-Except for the Assembly ID
, these isolate variables can
-still be changed afterwards in the
->> Browse Entries
tab. Clicking on
-Confirm
will save the metadata and render the process
-executable.
Before starting the process, select whether to save the assembly file
-to the local database. If an assembly file is not saved, screening for
-resistance and virulence genes will later not be available for the
-respective isolate (see 5 AMR Screening).
-The assembly file cannot be added retroactively. Pressing
-Start
will launch the typing process. The alignment
-algorithm is now searching the selected assembly for the alleles
-contained in the scheme and checking which variant is present. The
-loading bar provides feedback on this progress. The duration varies
-depending on the capability of your system and the number of alleles and
-variants included in the scheme and can take a while. Once 100% is
-reached the typing results are evaluated and appended to the local
-database. Database changes in the tab
->> Browse Entries
are automatically
-inhibited during this finalization step to avoid issues. After this last
-step is finished you can reset to start another one. If the typing was
-successful, the addition of a new entry is indicated by a pulsating
-button in the >> Browse Entries
tab.
-Click this button to load the updated database including the newly added
-entry.
Multi typing is recommended for larger collections of several
-assemblies belonging to the same species. This saves the time needed to
-start the process one by one. In the sidebar of the
-> Allelic Typing
tab switch to
-☐ Single | ☑ Multi
and click on Browse
to
-select a folder containing the assemblies. If you plan to type just a
-subset of the selected folder, untick the unwanted assemblies in the
-table below and choose a compatible Assembly ID
. The multi
-typing process is only startable if no incompatible files are ticked.
-Because all the files are seamlessly piped into the process, the basic
-meta data can only be declared once for all assemblies. The values
-declared for Isolation Date
, Host
,
-Country
and City
will apply for every new
-entry that is produced in this multi typing process. The
-Assembly Name
will first be identical with the
-Assembly ID
, representing unique identifiers of the
-assembly. The file name of the respective assembly is automatically
-assigned to both. However, all of the basic meta data values, except
-Assembly ID
, can be changed in retrospect once the entry
-has been successfully added to the database. After confirming the
-metadata the Start
button will be rendered. Note that if
-you choose not to save the assembly files to the local database,
-screening for resistance and virulence genes will not be available for
-the respective isolate later (see 5 AMR
-Screening). The assembly file also cannot be added retroactively.
-Upon starting the multi typing process, a field where the progress is
-logged is displayed. The process can be monitored with this overview.
-The log of the multi typing process can be downloaded as text file.
-Notifications, providing feedback about the status of the multi typing
-process, show up for every relevant event, such as the (un-)successful
-addition of an entry or the finalization of the multi typing process. A
-pending typing process can be canceled by clicking
-Terminate
. While the process is in the typing or alignment
-phase (indicated by Processing in the log), you can keep
-working with PhyloTrace, e.g. visualizing or editing the local database.
-However, just as for single typing, the app automatically recognizes
-when the process switches to the evaluation and addition phase
-(indicated by Attaching in the log), during which any database changes
-are prohibited. After each successful addition you can reload the
-database in the >> Browse Entries
-tab, to inspect the new entry. Unsuccessful typing attempts are captured
-in the log and in the multi typing summary once the process has been
-finalized (see Figure 6). Individual results can be
-inspected by choosing them from the selector in the right column.
-Displayed are only notable events in which e.g. a new allele variant was
-found or unsuccessful allele calling attempts. Press Reset
-to start another multi typing process.
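The folder selection rules described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the file paths, the column names, the `existing_ids` vector and the helper `can_start()` are hypothetical and do not reflect PhyloTrace's internal code.

```r
# Hypothetical folder content and IDs already present in the local database
files <- c("/data/run1/BP_001.fasta", "/data/run1/BP_002.fna", "/data/run1/readme.txt")
existing_ids <- c("BP_007")

selection <- data.frame(
  File = basename(files),
  # The file name without extension serves as Assembly ID
  Assembly_ID = tools::file_path_sans_ext(basename(files)),
  # Only the FASTA extensions named in the manual count as compatible
  Compatible = grepl("\\.(fasta|fna|fa)$", files, ignore.case = TRUE),
  Ticked = TRUE,
  stringsAsFactors = FALSE
)

# Multi typing may start only if every ticked file is compatible
# and every ticked Assembly ID is unique and new to the database
can_start <- function(sel, known_ids) {
  ticked <- sel[sel$Ticked, ]
  nrow(ticked) > 0 &&
    all(ticked$Compatible) &&
    anyDuplicated(ticked$Assembly_ID) == 0 &&
    !any(ticked$Assembly_ID %in% known_ids)
}

blocked <- can_start(selection, existing_ids)   # readme.txt is still ticked
selection$Ticked[!selection$Compatible] <- FALSE
ready <- can_start(selection, existing_ids)

# The shared meta data is applied to every new entry; the name starts out equal to the ID
ticked <- selection[selection$Ticked, ]
new_entries <- data.frame(
  Assembly_ID = ticked$Assembly_ID,
  Assembly_Name = ticked$Assembly_ID,
  Host = "Human",
  Country = "Austria",
  stringsAsFactors = FALSE
)
```

Unticking the incompatible file is what flips the start condition from blocked to ready in this sketch.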
After each variant from the cgMLST scheme has been searched and
-aligned to the assembly, the results are evaluated to determine which
-allele variant is present for each locus. This is conducted by a
-conditional multi-step process that ensures correctness and minimizes
-false positive assignments. The steps and the logic applied in this
-process are shown in Figure 7. If none of the variants
-from the scheme could be found in the bacterial isolate, the presence of
-a potential new gene variant is evaluated (see 3.3.1 New Variant Validation).
-
In case none of the variants from the locally available scheme match
-perfectly, the locus is checked for the existence of a new and valid
-variant. To ascertain whether this variant is valid, the locus must
-fulfill conditions such that it is likely to encode a gene. If there are
-multiple different nucleotide regions in the assembly possibly coding
-for a gene, each of them is sorted and passed through the validation
-logic (see Figure 8).
Unlike the genetic distance between a pair of sequences, which sums up -the number of positions in which nucleotides are different, the -calculation of allelic distance considers entire loci/alleles. -To obtain the allelic distance, algorithms based on the -distance calculation method introduced by Hamming in 1950, originally -meant for information technology, are used3. The Hamming distance is a -metric that quantifies the discrepancy between two strings of equal -length. It counts the number of positions where the characters -differ between the two strings. Essentially, it indicates the minimum -number of substitutions required to transform one string into the other. -For cgMLST with PhyloTrace, hashes, i.e. 64-bit words, organized in an -array represent the allelic profile. The positions of the array elements -correspond to the loci in the scheme and the hash represents the allele -sequence for the respective locus. This allelic profile is generated -during the typing process. Thus, for pairwise comparison of the allelic -profiles of two isolates, the total number of discrepant alleles results -in the allelic distance value. Comparing a selection of isolates results -in a distance matrix (see 4.4 Distance -Matrix), which is then used to compute a tree (see 6 Visualization).
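As a minimal sketch of this idea, the following base R snippet computes the Hamming distance between toy allelic profiles without missing values and collects the results in a distance matrix. The short strings merely stand in for allele hashes, and the names `profiles` and `allelic_distance` are illustrative assumptions.

```r
# Toy allelic profiles: rows are isolates, columns are loci,
# values are placeholders for allele hashes
profiles <- rbind(
  iso1 = c(L1 = "a3f1", L2 = "09bc", L3 = "77d0", L4 = "e21a"),
  iso2 = c(L1 = "a3f1", L2 = "5c4e", L3 = "77d0", L4 = "e21a"),
  iso3 = c(L1 = "b802", L2 = "5c4e", L3 = "77d0", L4 = "1f9d")
)

# Hamming distance: number of loci at which the two profiles differ
allelic_distance <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x != y)
}

# Pairwise comparison of all isolates yields the distance matrix
n <- nrow(profiles)
dist_matrix <- matrix(0L, n, n,
                      dimnames = list(rownames(profiles), rownames(profiles)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dist_matrix[i, j] <- allelic_distance(profiles[i, ], profiles[j, ])
  }
}
dist_matrix
```

Because whole alleles are compared, two alleles differing by one or by fifty nucleotides both contribute exactly 1 to the distance.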
-If no variant could be assigned for some genes contained in the -scheme, NA values are placed in the allelic profile at the -respective position of the gene/locus. This can happen if the -corresponding gene is not found in the assembly sequence, if there are -multiple hits or if the variant in the assembly is non-coding (refer -to 3.3 Variant Assignment).
-In order to showcase how allelic distances are calculated for
-isolates with missing values, we set up an example. For simplicity,
-we consider just three isolates, Isolate 1
,
-Isolate 2
and Isolate 3
with three loci only,
-Locus A
, Locus B
and Locus C
. For
-Isolate 1
let Locus A
have variant 1,
-Locus B
a missing value NA and Locus
-C
variant 1. For Isolate 2
let Locus
-A
be a missing value NA, Locus B
-variant 1 and Locus C
variant 1. For
-Isolate 3
let Locus A
be 2, Locus
-B
also 2 and Locus C
1.
allelic_profile <- data.frame(A = c(1, NA, 2), B = c(NA, 1, 2), C = c(1, 1, 1),
- row.names = c("Isolate 1", "Isolate 2", "Isolate 3"))
-allelic_profile
-## A B C
-## Isolate 1 1 NA 1
-## Isolate 2 NA 1 1
-## Isolate 3 2 2 1
-Option 1: Ignore missing values for pairwise -comparison
-Selecting the first option as the missing value handling strategy causes -NA values to be ignored in the pairwise comparison between two isolates. -Unlike Option 2, only single missing values are ignored, not the entire -locus.
-# Option 1
-
-hamming.distIgnore <- function(x, y) {
- sum( (x != y) & !is.na(x) & !is.na(y) )
-}
-
-proxy::dist(allelic_profile, method = hamming.distIgnore)
-## Isolate 1 Isolate 2
-## Isolate 2 0
-## Isolate 3 1 1
-Isolates 1 & 2 each have an NA for one of the first two
-loci A
and B
with the third locus
-C
being identical. Their allelic distance is 0,
-hence these two isolates are considered identical in their allelic
-profile. The two other pairs Isolate 1 & 3 as well as 2 & 3 both
-result in an allelic distance of 1.
Option 2: Omit loci with missing values for all -assemblies
-If the second option is selected, loci containing at least one -missing value will be ignored for the calculation of allelic distances. -Unlike Option 1, the loci with missing values are entirely omitted for -all pairwise comparisons. Even if both isolates of a pair have valid -variant numbers for a locus, it is not included in the analysis if the -locus contains just one NA for another isolate. For the missing -value statistics shown in Figure 10 [4.5 Missing -Values], 41 loci, displayed as columns in the missing value table, would -not be considered for the distance calculation. For this option the -respective loci are filtered out from the allelic profile before -applying the distance computation. Because of the potential to skew the -whole picture with this option, choosing it is only recommended if -very few loci are affected by missing values.
-# Option 2
-
-hamming.distOmit <- function(x, y) {
- sum(x != y)
-}
-
-allelic_profile_noNA <- dplyr::select(allelic_profile, -A, -B)
-
-proxy::dist(allelic_profile_noNA, method = hamming.distOmit)
-## Isolate 1 Isolate 2
-## Isolate 2 0
-## Isolate 3 0 0
-Locus A
and B
are omitted before
-calculating the distance. This leads to all isolates being considered
-identical with an allelic distance of 0, because they all carry
-variant 1 for the only remaining locus C
.
Option 3: Treat missing values as allele variant
-The third option is rather specific and, considering the consequences -for subsequent calculation of allelic distances and analyses, should be -used with caution. Here, NA values are treated as if they were -a separate variant.
-# Option 3
-
-hamming.distCategory <- function(x, y) {
- sum((x != y | xor(is.na(x), is.na(y))) & !(is.na(x) & is.na(y)))
-}
-
-proxy::dist(allelic_profile, method = hamming.distCategory)
-## Isolate 1 Isolate 2
-## Isolate 2 2
-## Isolate 3 2 2
-Because both NA values are treated as a further valid variant, -all isolate pairs receive an allelic distance of 2.
-Depending on the option for NA handling applied to these allelic -profiles, the resulting allelic distances differ. The -results of these example calculations are summarized in the table -below.
-Pair | Option 1 | Option 2 | Option 3 |
---|---|---|---|
-Isolate 1 & 2 | 0 | 0 | 2 |
-Isolate 1 & 3 | 1 | 0 | 2 |
-Isolate 2 & 3 | 1 | 0 | 2 |
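The comparison of the three options can also be reproduced in base R alone, without the proxy or dplyr packages. The sketch below is an alternative formulation for illustration; the helper names `dist_ignore`, `dist_category` and `pair_dist` are assumptions, and loci with missing values are detected programmatically rather than removed by name.

```r
allelic_profile <- data.frame(A = c(1, NA, 2), B = c(NA, 1, 2), C = c(1, 1, 1),
                              row.names = c("Isolate 1", "Isolate 2", "Isolate 3"))
m <- as.matrix(allelic_profile)

# Option 1: comparisons involving an NA are dropped from the sum
dist_ignore <- function(x, y) sum(x != y, na.rm = TRUE)

# Option 3: an NA counts as its own variant; two NAs count as equal
dist_category <- function(x, y) {
  sum(x != y | xor(is.na(x), is.na(y)), na.rm = TRUE)
}

# Option 2: remove every locus that has at least one NA in any isolate
m_complete <- m[, colSums(is.na(m)) == 0, drop = FALSE]

# Evaluate a distance function for every pair of isolates
pairs <- combn(rownames(m), 2)
pair_dist <- function(mat, f) {
  apply(pairs, 2, function(p) f(mat[p[1], ], mat[p[2], ]))
}

summary_table <- data.frame(
  Pair = apply(pairs, 2, paste, collapse = " & "),
  Option1 = pair_dist(m, dist_ignore),
  Option2 = pair_dist(m_complete, dist_ignore),
  Option3 = pair_dist(m, dist_category)
)
summary_table
```

Filtering with `colSums(is.na(m)) == 0` scales to schemes with thousands of loci, where listing the affected loci by hand is not practical.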
The > Database Browser
tab lets you
-examine and manage information saved in the local database of the
-selected scheme. It is divided into the
->> Browse Entries
,
->> Scheme Info, >> Loci Info
,
->> Distance Matrix
and
->> Missing Values
tabs.
Each assembly that has been successfully typed is added to the table
-in >> Browse Entries
. This overview
-allows you to edit (see 4.1.1 Edit Meta Data),
-delete (see 4.1.3 Delete Entries), inspect
-(see 4.1.4 Browse the Allelic
-Profile) and add (see 4.1.2 Custom
-Variables) information connected with the entries. The table can
-also be downloaded (see 4.1.5 Download
-Entry Table). The table contains both the meta data and the allelic
-profile for each entry. The meta data as well as custom variables (see
-4.1.2 Custom Variables) appear first on
-the left part of the table, while the allelic profile with the assigned
-variants is positioned on the right part of the table (see 4.1.4 Browse the Allelic
-Profile). The Index
automatically assigns a number to
-each entry and is updated accordingly if entries are deleted (see 4.1.3 Delete Entries). The
-Include
status determines the inclusion or exclusion of
-the respective entry for further analyses, such as Visualization (see 6 Visualization).
The basic meta data comprising Assembly Name
,
-Isolation Date
, Host
, Country
and
-City
can be edited in the entry table by left-clicking in
-the corresponding field. As soon as changes are detected, a pulsating
-button appears that saves the changes when clicked. If you decide
-otherwise, press the Undo
button and go back to the
-previous state. Assembly ID
is the name of the isolate in
-the Isolate directory of the local database and can’t be
-changed. The Index
number as well as the assigned hashes
-representing the allele variants in the allelic profile also can’t be
-edited because this would invalidate the analysis.
There is also the option to add custom variables using the controls
-in the >> Browse Entries
sidebar.
-Choose a name for the variable and press the green +
button
-to add it. In the dialogue window select the variable type, categorical
-(character) or continuous (numeric). After confirmation the variable is
-ready to be filled with values. These can be changed in retrospect in
-the same way as basic meta data (see 4.1.1
-Edit Meta Data). Note that the database needs to be saved,
-otherwise the custom variables are not permanently added. The custom
-variable type and name can’t be changed in retrospect, but they can be
-deleted by selecting them from the drop-down menu in the sidebar and
-clicking the red -
button. If more than five custom
-variables are present, a table summarizing them is displayed in the
-sidebar.
The Delete Entries panel on the top right corner of the
->> Browse Entries
tab lets you
-delete single or multiple entries at once. Select one or multiple
-entries to be deleted according to their Index
in the
-drop-down menu. Clicking the red x
button will open a
-dialogue window, prompting for confirmation about the intention to
-irreversibly delete the selection. The deletion will lead to a complete
-removal of the respective entry together with all the meta data, custom
-variable values and allelic profile. However, if the database is not
-saved after the deletion, the entry will reappear in the next session. The
-deletion can also be undone with the Undo
button in the same
-session. Note that if you select all entries
-for deletion, confirmation will immediately and irreversibly empty the
-database for the currently selected scheme and you will
-not have the option to undo this action.
Scrolling the entry table to the right will reveal the allelic
-profile. The variant numbers for each allele/locus are sorted
-column-wise for each entry. By default, only the first 20 loci are
-displayed. It’s possible to manually change which loci are shown by
-selecting or deselecting them in the Compare Loci panel on the
-right below the Delete Entries panel. The assigned
-hash, representing a distinct allele sequence, is truncated to the first
-and last four digits. Locus columns, containing at least one entry with
-an allele variant that is different from the others, are highlighted in
-green.
If the Only Varying Loci
option is activated, only loci
-with differing variants (i.e. the columns highlighted in red) are
-displayed. For missing variant values, i.e. if no variant could be
-allocated to a locus (see 3.4.1
-Missing Value Handling), the corresponding cell appears empty.
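The display logic described above can be approximated with a short base R sketch. It rests on assumptions that the manual does not specify: missing values are ignored when deciding whether a locus varies, the `...` separator in the truncated hash is chosen arbitrarily, and the hash strings themselves are made up.

```r
# Toy allelic profile with made-up hash strings; NA marks a missing variant
profile <- data.frame(
  L1 = c("9f86d081884c7d65", "9f86d081884c7d65", "9f86d081884c7d65"),
  L2 = c("2c26b46b68ffc68f", "fcde2b2edba56bf4", "2c26b46b68ffc68f"),
  L3 = c("3e23e8160039594a", NA, "3e23e8160039594a"),
  row.names = c("iso1", "iso2", "iso3"),
  stringsAsFactors = FALSE
)

# A locus varies if it holds more than one distinct (non-missing) allele
is_varying <- vapply(
  profile,
  function(col) length(unique(col[!is.na(col)])) > 1,
  logical(1)
)
varying_loci <- names(profile)[is_varying]

# Show only the first and last four characters; missing values give an empty cell
truncate_hash <- function(h) {
  ifelse(is.na(h), "",
         paste0(substr(h, 1, 4), "...", substr(h, nchar(h) - 3, nchar(h))))
}

displayed <- as.data.frame(lapply(profile[varying_loci], truncate_hash),
                           row.names = rownames(profile))
displayed
```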
The entry table can be downloaded as CSV file. There are two options
-to control this output. As the user sometimes might choose to only
-include a subset of entries in a current analysis, there is the option
-to include only the entries of interest in the output file. Activate the
-switch Only included Entries
to include only the entries
-that are checkmarked in the Include
column. Control the
-Include
status either by checking or unchecking the
-checkboxes in the Include
column or select or unselect all
-at once by using the buttons on the top-left of the entry table. Note
-that the database has to be saved for the changes to take effect. The
-Index
of the entries marked as included are highlighted in
-green and exclusively selected to be considered in visualization (see 6 Visualization). Moreover you can choose if
-and which loci should be included in the download. By default only the
-meta data and custom variables of the entries are included in the csv
-file. If you activate the switch Include Displayed Loci
,
-the currently displayed loci are included as well. Use the control in
-the Compare Loci
box, to decide which and how many loci are
-displayed. Upon clicking the Download
button you can choose
-to which location on your system the file should be saved.
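The effect of the two download switches can be illustrated with a small base R sketch. The column names, the example table and the helper `export_entries()` are hypothetical and only mirror the behavior described above.

```r
# Hypothetical entry table: meta data first, allelic profile to the right
entries <- data.frame(
  Index = 1:3,
  Include = c(TRUE, FALSE, TRUE),
  Assembly_ID = c("iso1", "iso2", "iso3"),
  Country = c("Austria", "Germany", "Austria"),
  L1 = c("a3f1", "a3f1", "b802"),
  L2 = c("09bc", "5c4e", "5c4e"),
  stringsAsFactors = FALSE
)
meta_cols <- c("Index", "Include", "Assembly_ID", "Country")
displayed_loci <- c("L1")   # loci currently chosen in the Compare Loci box

export_entries <- function(entries, only_included, include_displayed_loci) {
  # Row filter corresponds to the "Only included Entries" switch
  rows <- if (only_included) entries$Include else rep(TRUE, nrow(entries))
  # Column filter corresponds to the "Include Displayed Loci" switch
  cols <- if (include_displayed_loci) c(meta_cols, displayed_loci) else meta_cols
  entries[rows, cols, drop = FALSE]
}

out <- export_entries(entries, only_included = TRUE, include_displayed_loci = TRUE)
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)
```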
The tab >> Scheme Info
lets you
-inspect the properties of the currently selected scheme. The table
-displays information regarding the cgMLST scheme downloaded from the cgMLST.org Nomenclature Server (h25).
-It comprises the name of the scheme, the version, the seed genome, genus
-and species, the number of loci included, the complex type distance and
-count parameters, the date of the most recent changes, the official
-curators, publications addressing this scheme as well as the accessory
-scheme.
The overview in the tab
->> Loci Info
provides information on
-the loci included in the scheme as well as the distribution of alleles
-among isolates present in the local database. The table lets you browse
-the Locus ID (e.g. BP0001, BP2483), if known the gene identifier
-(e.g. glpK, pykA), the position of the loci in the seed genome, the
-length in nucleotides (e.g. 1233), the gene product (e.g. pyruvate
-kinase, chromosome partitioning protein) as well as the number of
-variants included in the base scheme. There is the option to filter the
-table by keywords or numbers. Note that this applies to all attributes,
-so searching for “566” would result in the display of loci having an ID
-that includes this number (e.g. “BP0566”, “BP1566”, etc.), or
-position (e.g. 317566, 1255669), length (e.g. 1566) and every other
-attribute containing the keywords or numbers.
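A filter that matches a keyword against every attribute at once could look like the following base R sketch. The table content and the function `filter_all_columns()` are invented for illustration; the actual table widget may match differently (e.g. case-insensitively).

```r
# Hypothetical excerpt of a loci table
loci <- data.frame(
  Locus = c("BP0566", "BP0001", "BP2483", "BP1200"),
  Gene = c("glpK", "pykA", NA, NA),
  Position = c(40210, 317566, 90011, 5000),
  Length = c(1233, 900, 1566, 780),
  stringsAsFactors = FALSE
)

filter_all_columns <- function(df, keyword) {
  # One logical column per attribute: does the cell contain the keyword?
  hits <- vapply(
    df,
    function(col) grepl(keyword, as.character(col), fixed = TRUE),
    logical(nrow(df))
  )
  # Keep every row with a hit in at least one attribute
  df[rowSums(hits) > 0, , drop = FALSE]
}

result <- filter_all_columns(loci, "566")
result
```

Here the three returned rows match through three different attributes: the locus ID, the position and the length.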
Selecting a locus from the table will render alleles present in the
-database and their respective DNA sequence. Browse alleles by choosing
-them from the selector showing the respective frequency of the selected
-allele in the database. The sequence can be copied to the clipboard. A
-FASTA file comprising all hashed allele sequences from the currently
-selected locus can be exported with Save FASTA
. To export
-the table with metadata of all loci included in the scheme, click the
-download button right next to the header Loci at the top.
The tab >> Distance Matrix
shows
-a heatmap matrix of the allelic distances between the entries. For
-details on how the allelic distances are derived refer to 3.4 Calculation of Allelic
-Distance. For each pair of entries, the sum of allele variants that
-are not identical, i.e. allelic distance, is displayed in the respective
-cell. Here the choice of how missing values, i.e. entries having
-unsuccessful variant allocations for some loci, are handled can have a small or
-large impact on the values, depending on different parameters (see 3.4.1 Missing Value Handling). In
-addition to the visualization with tree plots, changes in the missing
-value handling strategy can be directly observed in this overview. The
-readability of the matrix is enhanced by a heatmap. The values contained
-are normalized resulting in a color gradient from light green to dark
-red. The lowest value, which is always 0 in the diagonal (allelic
-distance of the same entry logically is zero), is highlighted in light
-green. The highest value (dark red) varies and depends on the highest
-allelic distance value in the matrix.
There is the option to change the appearance of the matrix. Choose
-whether Assembly Name
, Assembly ID
or
-Index
is displayed as column or row headers. As sometimes
-the focus might be centered on the subset of entries that are marked as
-included in the entry table, the switch
-Only Included Entries
can be toggled to show only this
-selection. Also the display of the diagonal line and the upper triangle
-can be activated or deactivated using the switches
-Show Diagonal
and Show Upper Triangle
-respectively. The distance matrix can be downloaded as a CSV file. Note
-that the matrix is downloaded as currently displayed, including all the
-changes made to the appearance (e.g. with or without diagonal or
-Index
instead of Assembly Name
as header).
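The color normalization and the display switches described above can be sketched in base R as follows. The exact colors and scaling used by PhyloTrace are not documented here, so the named colors `lightgreen` and `darkred`, the division by the maximum and the helper `display_matrix()` are assumptions.

```r
dist_matrix <- matrix(
  c(0,  2, 10,
    2,  0,  8,
    10, 8,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("iso1", "iso2", "iso3"), c("iso1", "iso2", "iso3"))
)

# Scale distances to [0, 1]; guard against a matrix that contains only zeros
max_dist <- max(dist_matrix)
normalized <- if (max_dist > 0) dist_matrix / max_dist else dist_matrix

# Map 0 to light green and the largest distance to dark red
ramp <- grDevices::colorRamp(c("lightgreen", "darkred"))
cell_colors <- matrix(
  grDevices::rgb(ramp(as.vector(normalized)), maxColorValue = 255),
  nrow = nrow(dist_matrix), dimnames = dimnames(dist_matrix)
)

# Blank out the diagonal and/or the upper triangle before display or export
display_matrix <- function(d, show_diagonal = TRUE, show_upper = TRUE) {
  out <- d
  if (!show_upper) out[upper.tri(out)] <- NA
  if (!show_diagonal) diag(out) <- NA
  out
}

shown <- display_matrix(dist_matrix, show_diagonal = FALSE, show_upper = FALSE)
csv_path <- tempfile(fileext = ".csv")
write.csv(shown, csv_path, na = "")   # exported as currently displayed
```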
Missing values occur if a locus cannot be found in the assembly or
-if the present allele contains mutations leading to a dysfunctional
-gene. As long as no entry in the local database has any missing values,
-the >> Missing Values
tab is not
-displayed. When a new entry with NA value(s) is added to a
-local database that contained no missing values so far, reloading the
-database will automatically render the
->> Missing Values
tab to
-call attention to the newly occurring missing values. This tab provides
-statistical information about the occurrence of missing values and, most
-importantly, control buttons to select the strategy by which
-missing values are treated in subsequent analyses. The choice of how
-these values should be handled directly impacts the calculation of the
-allelic distances between the bacterial isolates. The options to choose
-from are detailed in 3.4.1 Missing
-Value Handling. Due to the importance of missing values and how they
-are treated, upon loading local databases containing at least one
-missing value, the >> Missing Values
-tab will always be rendered first.
Figure 14 shows statistics about the missing values -of the entries in this database. There are 1069 unsuccessful allele -allocations in total, i.e. the global sum of NA values of all -entries and loci. There are 2983 loci in total in the selected -Bordetella pertussis scheme and 217 of these have one or more -missing values, which makes up about 7.3 %. Isolates for which more than -5% of loci contain missing values are highlighted in orange. These -should be included in further analyses with caution because a -significant share of alleles couldn’t be determined.
-Each row in the table on the right shows an entry that contains at
-least one missing value. The next column, Errors
,
-respectively includes the sum of missing values for that isolate. The
-following columns are loci including at least one missing value (denoted
-by NA
).
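The statistics described in this section can be derived from the allelic profile with a few base R calls. The toy profile below is invented; with only five loci the 5% threshold is exceeded by a single missing value, which would not be the case for a real scheme.

```r
# Toy allelic profile: rows are isolates, columns are loci, NA = no variant assigned
profile <- rbind(
  iso1 = c(L1 = 1, L2 = 1, L3 = 3, L4 = 2, L5 = 5),
  iso2 = c(1, NA, 3, 2, 5),
  iso3 = c(NA, NA, 3, 1, 5),
  iso4 = c(2, 1, 3, 2, 5)
)
na_mask <- is.na(profile)

# Global sum of unsuccessful allele allocations
total_na <- sum(na_mask)

# Loci with one or more missing values, and their share of all loci
loci_with_na <- colSums(na_mask) > 0
share_loci <- round(100 * sum(loci_with_na) / ncol(profile), 1)

# Isolates with missing values at more than 5% of the loci
na_per_isolate <- rowSums(na_mask)
flagged <- names(na_per_isolate)[na_per_isolate / ncol(profile) > 0.05]

# Table of affected entries: error count followed by the affected loci
affected <- na_per_isolate > 0
na_table <- data.frame(
  Errors = na_per_isolate[affected],
  profile[affected, loci_with_na, drop = FALSE]
)
na_table
```

Applied to the numbers quoted above, `round(100 * 217 / 2983, 1)` gives the stated 7.3 %.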
Screening for species-specific genes of interest, e.g. antibiotic
-resistance, virulence or stress genes, can be performed using the
-integrated NCBI/AMRFinder
-tool. The tab > Resistance Profile
-provides the interface for this feature and lets users inspect the
-screening results in
->> Browse Entries
and perform the
-screening from the tab >> Screening
.
-Note that not every species is available for screening with AMRFinder.
-The availability for the currently selected scheme is automatically
-checked.
Use the tab >> Screening
to run
-AMRFinder. Selecting one or multiple isolates and clicking
-Start
initiates the process. The runtime is estimated at less
-than a minute per isolate. Only isolates for which the respective
-assembly file is present in the local database can be applied to gene
-screening. The results can be inspected in parallel using the selector
-on the right, appearing once at least one isolate finalized the
-screening. Feedback on unsuccessful typing attempts is displayed as
-well.
There are two viewing modes available to browse the resistance
-profile, resulting from gene screening. Selecting the view mode
-☑ Picker | ☐ Table
renders the option to select isolates
-from a simple selector. The table showing the resistance profile
-(including also virulence genes, stress genes, etc.) for the selected
-isolate will appear below. The view mode
-☐ Picker | ☑ Table
, shows the isolate entry table above the
-resistance profile instead of the selector and therefore, next to
-providing a good overview, enables filtering and sorting. Select an
-entry from the table to render the respective resistance profile for
-this isolate. The currently selected table can be exported as CSV with
-Profile Table
.
Based on the allelic distances in the distance matrix (see 4.4 Distance Matrix), different tree plots
-can be created. PhyloTrace lets you choose between three different tree
-construction algorithms, Minimum-Spanning
,
-Neighbour-Joining
and UPGMA
. This tree type
-can be selected in the sidebar of the
-> Visualization
tab (see Figure
-17). On click of the Create Tree
button, a tree
-plot of the currently selected tree type will be computed and displayed.
-You can switch to a different tree type and create another tree without
-losing the tree created before. If you switch back to the previous tree
-type, you will still have the previously created tree. Unless you create
-a new tree for the same tree type, the plot is preserved for the
-current session. Switching between tree types makes it possible to
-seamlessly compare trees created from the same data set with different
-tree construction algorithms. Changes to the entry table in the
->> Browse Entries
tab, such as
-inclusion of additional isolates (via ticking Include
) or
-edited variables, only take effect in the tree plot if you save
-the database with the changes and click Create Tree
again.
-Once a tree has been created, it can be modified and customized without
-having to reload it.
The minimum-spanning-tree (MST) algorithm constructs a tree by -connecting the closest points or nodes of the distance matrix without -forming cycles. It connects all nodes in such a way that the total -edge length of the tree is minimal. -Refer to 6.1.1 MST Modification to find -out how the tree appearance can be modified. The nodes represent single -bacterial isolates. Isolates with an identical allelic profile, i.e. a -distance value of 0, are summarized in a single node. If the allelic -distance between isolates lies within a certain threshold, clusters are -drawn.
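The idea of a minimum-spanning tree can be illustrated with a short Python sketch. It uses Prim's algorithm, one standard way to build an MST from a distance matrix; the manual does not state which MST implementation PhyloTrace uses, and the function name and the four-isolate distance matrix are hypothetical.

```python
def minimum_spanning_tree(dist):
    """Return MST edges as (i, j, distance) for a symmetric distance matrix.

    Illustrative Prim's algorithm; not taken from the PhyloTrace code base.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = {0}  # start growing the tree from the first isolate
    edges = []
    while len(in_tree) < n:
        best = None
        # find the shortest edge leading from the tree to a node outside it
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i][j] < best[2]):
                    best = (i, j, dist[i][j])
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical allelic distances between four isolates
dist = [
    [0, 3, 10, 12],
    [3, 0, 8, 11],
    [10, 8, 0, 2],
    [12, 11, 2, 0],
]
edges = minimum_spanning_tree(dist)
total_length = sum(d for _, _, d in edges)
print(edges)         # [(0, 1, 3), (1, 2, 8), (2, 3, 2)]
print(total_length)  # 13
```

All four isolates are connected by three edges, and no other cycle-free set of connections has a smaller total length.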
-Figure 18 shows the modification panels for MST
-plots. These are divided into Layout (see 6.1.1.1 Layout), Nodes (see 6.1.1.2 Nodes) and Edges (see 6.1.1.3 Edges). There are several options to customize
-MST graphs, e.g. colors, forms, sizes, titles, labels, and more. Note
-that due to the way MST plots are generated, the plot is reset
-to its initial position when one of the modification
-parameters is changed. MST graphs can be enriched with information by mapping
-variables to the plot.
The Layout control panel allows you to add a title, subtitle and
-footer to the graph by typing them in the text fields. Change the color
-of each individually using the color button below the text fields.
-The overall background color can be modified as well. Toggle the
-Transparent
switch to make the background transparent.
The Nodes control panel lets you control the appearance of -the nodes and related elements such as the label. The upper left -controls are related to the label, i.e. which isolates are represented -by the respective node. Using the drop-down menu, the label can be -changed to any variable present for the respective isolates according to -the entry table. The color of the node labels can be modified using the -color button and their size by clicking on the blue menu button right -next to it. The color of the nodes themselves can be changed using the -color button in the control panel on the upper right. Clicking the -menu buttons lets you change the opacity.
-Node colors can also be used to map a variable to the graph. Nodes -are colored according to the value present for the respective isolates -and turned into a pie chart to show the distribution of values if -there are several clonal isolates summarized in a single node. Currently, -only variables of categorical type can be used in this feature.
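As a rough illustration of the pie chart mapping, the following Python sketch computes the share of each value among the isolates summarized in one node. The function name and the example values are hypothetical and only meant to show the principle.

```python
from collections import Counter


def pie_shares(values):
    """Return the fraction of each categorical value within one node."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical node summarizing four clonal isolates with their City values
shares = pie_shares(["Graz", "Graz", "Vienna", "Graz"])
print(shares)  # {'Graz': 0.75, 'Vienna': 0.25}
```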
-The node size can be controlled from the bottom left controls. The
-size of nodes containing multiple isolates with identical allelic
-profiles can be scaled by the number of isolates contained in them.
-Toggle the Scale by Duplicates
switch to activate this
-feature. The slider that sets the node size then changes from a
-single value to a range selection. In this way, the size of the
-smallest nodes (containing just one isolate), the size of the nodes
-containing the most isolates and the overall range can be
-controlled.
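Scaling by duplicates can be pictured as mapping the isolate count of a node onto the selected size range. The Python sketch below assumes a simple linear interpolation, which is an assumption for illustration; the manual does not specify the exact scaling function, and all names and numbers are hypothetical.

```python
def scaled_node_size(n_isolates, max_isolates, size_range):
    """Map an isolate count onto a (smallest, largest) node size range.

    Assumes linear interpolation between the two ends of the range.
    """
    smallest, largest = size_range
    if max_isolates <= 1:
        # no node holds duplicates, so every node gets the smallest size
        return float(smallest)
    fraction = (n_isolates - 1) / (max_isolates - 1)
    return smallest + fraction * (largest - smallest)


# Hypothetical range slider set to 20..40, largest node holds 5 isolates
print(scaled_node_size(1, 5, (20, 40)))  # 20.0
print(scaled_node_size(3, 5, (20, 40)))  # 30.0
print(scaled_node_size(5, 5, (20, 40)))  # 40.0
```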
The form of the nodes can be customized using the control panels on
-the bottom right. Activate the switch Show Shadows
to
-display shadows for the nodes. The shape of the nodes can be changed
-here as well. Choose between shapes that render the node labels below
-(Diamond
, Hexagon
, Dot
,
-Square
) or inside them (Circle
,
-Box
, Text
). If a variable is mapped, the form
-Pie Chart
is locked in and cannot be changed.
The Edges control panel lets you control the appearance of the edges
-and related elements. Each edge is labelled with the allelic
-distance between the isolates of the nodes it connects.
-Apart from its appearance, this label currently can’t be changed. The color
-and size can be modified using the upper left controls Label.
-The color of the edges themselves can be controlled by Color in
-the upper right controls. Click the menu button to see the control for
-the transparency of the edges. On the bottom left, there is the option
-to scale the edge lengths by the allelic distance they represent. Toggle
-the Scale Edge Length
switch to activate this effect. The
-multiplier of this effect can be customized using the slider below.
-Activating this option when the isolates displayed in the MST
-graph have very different allelic distances, e.g. a maximum of 200
-and a minimum of 10, can make the plot look untidy. Drag the
-slider to lower values to minimize this issue.
The clustering controls are located in the Edges panel at the -bottom right. By default, the “Complex Type Distance” value disclosed for -each scheme available on the cgMLST.org Nomenclature Server is selected -as the current cluster threshold. The threshold value can be modified to -any desired value. Nodes with distances that lie within the selected -threshold are enclosed by cluster shapes. These are -colored differently in order to distinguish between the cluster groups. -Choose between the Rainbow and Viridis scales to -modify the coloring. There are two types of cluster shapes available: -Area and Skeleton. The cluster type Area -renders an area surrounding nodes that are part of a cluster. Skeleton -instead uses the edges to visualize clusters. This can be particularly -useful if the selection of isolates is complex, which can potentially -lead to overlapping clusters with the Area cluster type.
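The effect of the cluster threshold can be sketched in Python as follows. The sketch assumes that nodes belong to the same cluster when they are linked by MST edges whose distance is at or below the threshold; whether PhyloTrace treats the threshold as inclusive, and how it groups nodes internally, is not stated in this manual. Function name, edges and threshold are hypothetical.

```python
def clusters_within_threshold(n_nodes, edges, threshold):
    """Group nodes linked by edges with distance <= threshold (assumption).

    edges is a list of (i, j, distance); returns clusters with >= 2 nodes.
    """
    parent = list(range(n_nodes))

    def find(x):
        # follow parent links to the representative of x's group
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, distance in edges:
        if distance <= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical MST edges between four nodes and a threshold of 5
mst_edges = [(0, 1, 3), (1, 2, 8), (2, 3, 2)]
clusters = clusters_within_threshold(4, mst_edges, threshold=5)
print(clusters)  # [[0, 1], [2, 3]]
```

Raising the threshold to 8 in this example would merge all four nodes into a single cluster.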
-The Neighbour-Joining (NJ) method constructs a tree by iteratively -joining pairs of nodes based on their pairwise distances. It aims to -minimize the total branch length in the tree and is commonly used for -constructing phylogenetic trees from distance matrices. Refer to 6.4 NJ and UPGMA Modification for -information on how the tree appearance can be modified.
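The pair selection step of Neighbour-Joining can be illustrated with a small Python sketch. It computes the standard NJ criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and picks the pair with the lowest value. This covers only the first joining step, the distance matrix is a hypothetical example, and the code is not taken from PhyloTrace.

```python
def q_matrix(dist):
    """Compute the Neighbour-Joining Q criterion for a distance matrix."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    q = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def closest_pair(dist):
    """Return the pair (i, j) with the lowest Q value, joined first by NJ."""
    q = q_matrix(dist)
    n = len(dist)
    pairs = [(q[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    _, i, j = min(pairs)
    return i, j


# Hypothetical distances between five isolates
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(q_matrix(dist)[0][1])  # -50
print(q_matrix(dist)[3][4])  # -48
print(closest_pair(dist))    # (0, 1)
```

Note that isolates 3 and 4 have the smallest raw distance (3), yet NJ joins 0 and 1 first, because the criterion also accounts for how far each isolate is from all others.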
-The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) -builds a tree by grouping the two most similar isolates or clusters -together at each step and then averaging their distances to the -remaining clusters. It produces an ultrametric tree, in which all tips -are equidistant from the root, and is often used for hierarchical -clustering of data. Refer to 6.4 NJ -and UPGMA Modification for information on how the tree appearance -can be modified.
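A minimal Python sketch of UPGMA is shown below. It repeatedly merges the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. The function name, labels and distances are hypothetical, and the sketch is not the implementation used by PhyloTrace.

```python
def upgma(dist, labels):
    """Return UPGMA merges as (cluster_a, cluster_b, height) tuples."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (name, size)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # pick the closest pair of clusters
        (a, b), distance = min(d.items(), key=lambda item: item[1])
        name_a, size_a = clusters.pop(a)
        name_b, size_b = clusters.pop(b)
        merges.append((name_a, name_b, distance / 2))
        # size-weighted average distance from the new cluster to the others
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        for c, value in new_d.items():
            d[(c, next_id)] = value
        clusters[next_id] = (f"({name_a},{name_b})", size_a + size_b)
        next_id += 1
    return merges


# Hypothetical distances between five isolates a..e
dist = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]
merges = upgma(dist, ["a", "b", "c", "d", "e"])
heights = [height for _, _, height in merges]
print(heights)  # [8.5, 11.0, 14.0, 16.5]
```

Each merge height is half the distance between the merged clusters, so every tip ends up at the same distance from the root (16.5 in this example).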
-The tree elements can be customized in great detail and supplemented
-with additional information such as variables (see 6.4.4 Variable Mapping). However, the basic
-appearance, e.g. text and element sizes, is automatically adjusted to
-the number and properties of the entries that were selected to be
-included in the tree. Because data sets vary,
-it is sometimes necessary to manually readjust some elements to
-achieve a balanced look. While Minimum-Spanning trees have slightly
-different modification features and control inputs, NJ and UPGMA trees
-share the same control inputs. This is due to the different
-visualization technique used for the creation and display of MST plots.
-The controls to modify the tree are arranged in panels and divided into
-Layout
, Label
, Elements
and
-Variables
. In some panels you will find small menu buttons
-(highlighted in light blue). They let you modify the elements
-addressed by the respective panel in more detail (e.g. position or
-font-style).
The appearance of the general layout can be modified in detail. There
-is a range of different options, e.g. for controlling theme, colors,
-title & subtitle, size, legend and other elements. To switch to
-these controls navigate to the
->> Visualization
tab and click the
-Layout
button in the menu to the left of the control panels.
-
Layout themes change the geometrical appearance of the tree. You can -choose from a selection of themes that are categorized into linear -and circular layouts. While the visual look changes when switching -between linear and circular themes, the structure, i.e. the order and -arrangement, of the hierarchical NJ and UPGMA trees stays the same.
-Linear: Rectangular
,
-Roundrect
, Slanted
and
-Ellipse
Circular: Circular
,
-Inward
Moreover, a Rootedge
can be added by turning on the
-switch. The root of the tree can be considered as starting point,
-representing a theoretical “common ancestor” with an initial allelic
-profile, from which all other isolates developed. Besides aesthetics,
-displaying this element can help to distinguish “normal” branches,
-representing actual allelic distance between the isolates, from the
-root. The root menu lets you further modify its length and line
-type.
The Ladderize
switch is turned on by default. It sorts
-the tree branches by their length.
The color of lines, text and background can be modified in
-the Color panel. The colored buttons show the color currently
-displayed as well as the respective HEXA code. Clicking them opens the
-color menu. You can either select a color by choosing it directly from
-the gradient field or by providing a HEXA or RGBA code. Note that the
-Lines/Text color applies to the tree branches, legend text and
-title, but not to the tip labels. Their color can be modified in the
-respective Label
menu (see 6.4.2.1
-Tips).
Add a title and subtitle in the Title panel. Their color -changes in accordance with the selected Lines/Text color, but can be -modified separately. The title menu lets you customize the font -size.
-The Sizing panel provides control of plot dimensions and
-position. For the aspect ratio, you can choose from 16:10
,
-16:9
and 4:3
. The overall size can be scaled
-with the slider below. If some elements are cut off, you can zoom out
-using the slider at the bottom. Trees with a circular
-layout in particular can appear small with too much white space around them. In
-this case, zooming in might be beneficial. The Sizing menu
-lets you position the content horizontally and vertically.
Legend and tree scale controls share the same panel. The tree scale -helps to estimate the actual allelic distance represented by the branch -length. If you prefer not to show this element, you can hide it by -toggling the switch. Its length can be changed in the tree scale menu -and scales proportionally with the branch length. If the scale -overlaps other elements, adjust its position by dragging the sliders -in the menu.
-If variables are mapped to the plot, a legend will appear. For the
-orientation, the options are either horizontal or vertical (see
-Figure 22). The legend menu allows to also adjust
-position and size.
The Label
menu lets you control whether and how certain
-labels are displayed. There are three different kinds of labels:
-Tips, Branches and Custom Labels. They can be
-modified in many different ways, e.g. in color, size or position.
-
The labels at the tips represent the actual entries with their allelic
-profile that determined their position in the tree. By default, the
-Assembly Name
is displayed as the tip label. However, it is
-possible to select other basic variables, e.g. Host
,
-Country
, City
or Isolation Date
,
-from the drop down menu, or even choose not to show tip labels at all by
-toggling the Show switch. Instead of the tip labels being positioned
-right next to the tips, they can be aligned to the right by activating
-the Align
switch. UPGMA trees always have the tip labels
-aligned and NJ trees only have this activated by default for circular
-layouts. The menu on the right provides further customization options.
-The Opacity
slider can be used to change the transparency.
-The Position
parameter modifies the offset of the labels
-from the tip. Angle, size and font face can be changed as well.
-Customize the color of the label text with the color button and the
-color of the panel with the color button below. The panels envelop the
-tip labels and are not shown by default. The controls in the panel menu
-let you modify the size of the panels (not the text itself) and smooth
-their shape.
Branch labels supplement the tree with additional
-information by labelling the branches leading to the tips with
-variables that are connected to the respective isolate. To show this
-element toggle the Show
switch in the Branches
-panel. The drop down lets you choose which variable or meta data to
-annotate. The color of the panel surrounding the branch label can be
-changed with the color button below. The menu button includes further
-controls, e.g. opacity, size, horizontal and vertical position, font
-face as well as edge smoothing. Note that branch labels don’t
-work for trees with a circular layout. In addition, more complex linear trees with
-many isolates usually leave too little space for branch
-labels. Instead, consider mapping a variable to other tree elements such
-as tip points (see 6.4.4 Variable
-Mapping).
If there is a need for labels somewhere other than tips or branches,
-there is the option to create customized ones. The panel
-Custom Labels
lets you define the label. Click the green
-+
button to add it. The label will be positioned at the plot
-center. Create more labels by giving them a name and adding them again.
-To change the size and position, select the respective label from the
-drop down and open the menu next to the +
button. Make the
-desired changes and click the Apply
button for them to take
-effect. Figure 25 shows a tree with two
-highlighted clades (see 6.4.3.5 Clade
-Highlight). The custom label function was used to annotate them.
-
The Elements
menu provides control over several special
-elements such as tip and node points or a heatmap. These are not
-essential but can amplify the explanatory power of the tree. Elements
-can be deactivated or activated and their appearance can be changed.
-
Tip points are located at the end of the tree branches and correspond
-to the isolates displayed. They can be modified in color or size to
-bring the ends of the tree into prominence. Alternatively this element
-can be used to map a variable (see 6.4.4
-Variable Mapping).
Node points, in contrast to tip points, solely represent theoretical -predecessors and relatives with respect to the isolates and their -allelic profile. Apart from the option to map a variable, their look can be -customized in the same way as tip points. Mapping variables is not -possible because they connect several isolates which may potentially -have discrepant values for a chosen variable.
-Tiles are supplementary elements that can be used to map variables to
-the plot. They work with both circular and linear layouts. Up to five
-different tiles can be added by activating them in the
-Variables
menu (see 6.4.4
-Variable Mapping). To modify opacity, width or position, select the
-tile that you wish to change with the selector at the top
-left corner of the panel. Any modifications apply only to the
-selected tile. Opacity defines the transparency of the tile, which makes it
-possible to overlay it on other elements, e.g. the tree. The width slider controls the width.
-When the position is changed, tiles in linear layouts are moved
-horizontally, while in circular layouts they are moved inwards or
-outwards in relation to the center of the circle.
Heatmaps can be a powerful tool to visualize related variables of the
-same type (either categorical or continuous). For more details refer to
-6.4.4 Variable Mapping. If the heatmap
-is activated in the Variables
menu, it can be modified using
-the respective control panel in the Elements
menu. Width
-changes apply to the heatmap overall, not to single columns. Just as
-with tiles, the position control moves the heatmap horizontally for
-linear layouts and inwards or outwards for circular layouts. In some
-situations, e.g. for long variable names or in circular layouts, it
-might make sense to modify the angle and/or position of the column
-headers. This can be done by using the controls in the heatmap menu.
-
Isolates are grouped in distinct hierarchical clades, which are
-defined by nodes that comprise several isolates or other daughter nodes
-and their respective isolates. In order to emphasize one or several
-clades toggle the Node View switch and inspect the respective node index
-of the clades you wish to highlight. Select the nodes in the drop-down
-menu below and deactivate the Node View again to see the highlighted
-clades. If only one clade is highlighted, you can
-customize its color with the color button below. If more than
-one clade is selected, you choose from a color scale instead. Use the
-menu in the Clade Highlight control panel to control the
-alignment of the clade highlights to each other. The borders of the
-colored squares can be set to a rounded or rectangular appearance.
-
Clades that are located within another clade higher in the
-hierarchy can also be highlighted (see Figure 31).
-
Mapping variables, representing epidemiologic metadata or other
-properties of the isolates displayed, is a powerful way of enriching the
-plot with information. The Variables
menu provides full
-control over which variables are mapped, the elements they are mapped to and
-the color scale that represents the different values of the selected
-variable. The control panel is ordered into Element, Variable and Color
-Scale columns (see Figure 32). The switches in the
-Element column can be turned on or off to activate or deactivate the
-display of a variable with the respective element. Select the variable
-to be mapped from the drop-down menu to the right of the element switch. It
-contains the basic meta data (Isolation Date
,
-Host
, City
, Country
) as well as
-the manually added custom variables (see 4.1.2 Custom Variables). The currently
-selected variable is checked for its number of distinct values and
-variable type (categorical or continuous). As this information is
-relevant for selecting the color scale, it is displayed directly next to
-the color scale selection menus. For categorical variables, the
-selectable color scales automatically change depending on the number of
-distinct values. If the number of distinct values is 7 or fewer, you can
-select from qualitative color scales. As there is a limited number of
-distinct colors available in the qualitative color scales, they are not
-selectable if the variable exceeds 7 distinct values. Instead, gradient
-color scales can be selected. Continuous variables have continuous
-and divergent color scales available. Using the colorblind-friendly
-gradients Viridis and Cividis is recommended.
-Divergent color scales are useful for visualizing data with a
-clear central point of interest, highlighting positive and negative
-deviations from a central value like 0. An example use case is
-gene expression data, e.g. fold change values, with colors
-indicating whether the change is positive (upregulation) or negative
-(downregulation) relative to a baseline expression level of 0 (no
-change).
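The rule for which color scale types are offered can be summarized in a short Python sketch. It follows the description above: categorical variables with up to 7 distinct values get qualitative scales, larger ones get gradient scales, and continuous variables get continuous and divergent scales. Whether gradient scales also remain selectable for small categorical variables is not stated here, so the sketch returns only the type named in the text. The function name and return values are hypothetical.

```python
def offered_scale_types(values, continuous, max_qualitative=7):
    """Return the color scale types offered for a variable (illustrative)."""
    if continuous:
        return ["continuous", "divergent"]
    distinct = len(set(values))
    if distinct <= max_qualitative:
        return ["qualitative"]
    # too many categories for the limited number of qualitative colors
    return ["gradient"]


print(offered_scale_types(["Graz", "Vienna", "Graz"], continuous=False))
# ['qualitative']
print(offered_scale_types([f"city_{i}" for i in range(8)], continuous=False))
# ['gradient']
print(offered_scale_types([0.5, -1.2, 3.0], continuous=True))
# ['continuous', 'divergent']
```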
In Figure 33 the Isolation Date
-variable is mapped to the tip label color (see 6.4.2.1
-Tips). Hence the tip labels indicate both the
-Assembly Name
and the Isolation Date
, with the
-Greys color scale highlighting more recently added isolates in
-darker shades. The tip point color is assigned to display the
-categorical City
variable, i.e. the city in which the sample was acquired.
-In this example it has only two values, the cities Graz and
-Vienna. The qualitative scale Set2 is chosen to
-distinguish the variable as well as possible from other variables. The
-tip point shapes circle and triangle represent the host from which the
-bacterial sample was taken. As the variable values are represented by
-shapes instead of colors, there is no color scale for this option.
-Continuous values can’t be represented by shapes. There are six
-different shapes available, hence selecting the tip point shape to
-represent categorical variables is only possible if there are six or fewer
-distinct values. The custom variables Patient Age
and
-ftsA
, which stands for expression values of the
-ftsA gene, are mapped to Tile 1 and 2 respectively. Except for
-color values, which are assigned by the variable mapping, the appearance of
-the elements, such as tip point sizes, can still be modified (see e.g. 6.4.3.1 Tip Points).
Figure 35 shows an example of gene expression fold
-changes mapped to a heatmap. While white/yellow colors indicate
-baseline expression levels around 0, green colors indicate upregulation
-and red colors downregulation. When a diverging scale is selected, you
-can choose the midpoint of the scale (Zero
,
-Mean
or Median
) using the drop down menu that
-appears to the right of the color scale selector. Zero
assigns the
-middle color of the diverging color scale to the value 0. The choices
-Mean
and Median
assign the middle color to the
-arithmetic mean or median, respectively, of the value range. The appearance
-of the heatmap, such as width and position, can be modified using the
-respective control panel from the Elements
menu (see 6.4.3.4 Heatmap).
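The three midpoint choices can be sketched in Python as follows. The sketch assumes that mean and median are computed from the observed values of the mapped variable; the function name and the fold change values are hypothetical.

```python
from statistics import mean, median


def scale_midpoint(values, mode):
    """Return the value that receives the middle color of a diverging scale."""
    if mode == "Zero":
        return 0.0
    if mode == "Mean":
        return mean(values)
    if mode == "Median":
        return median(values)
    raise ValueError(f"unknown midpoint mode: {mode}")


# Hypothetical fold change values
fold_changes = [-2.0, -0.5, 0.0, 1.5, 6.0]
print(scale_midpoint(fold_changes, "Zero"))    # 0.0
print(scale_midpoint(fold_changes, "Mean"))    # 1.0
print(scale_midpoint(fold_changes, "Median"))  # 0.0
```

With skewed data such as this example, the mean is pulled towards the outlier, so the choice of midpoint changes which values appear as neutral in the plot.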
Neighbour-Joining and UPGMA trees can be downloaded in PNG, JPEG, BMP
-and SVG format. Minimum-Spanning trees can be downloaded in PNG, JPEG
-and BMP format. In addition they can be downloaded as HTML to preserve
-the interactivity of dragging, zooming and moving the MST graph. To
-initiate the download, head to the
-> Visualization
tab. In the sidebar,
-below the Create Tree
button, you find the drop down to
-select the file type as well as the download button right next to it.
-Note: In order for the download to work, the plots have to be created
-first.
A report in HTML format can be created by clicking the button
-Print Report
, located in the sidebar of the
-> Visualization
tab. There are several
-options to control which information is included in the report. The
-elements are categorized in Entry Table
,
-General
, Analysis
and
-Attach Plot
(see 7.2.1 Report
-Elements). Note that the report requires prior creation of a tree
-plot. The entry table in the report, the attached plot and some
-analysis parameters, such as the tree algorithm, are all fixed at the
-moment a tree is created. Therefore, a proper report can only be
-generated after tree creation. The entry table in the report does not list the
-entire local database for the respective scheme, but only the isolates of
-interest, i.e. the ones that were used to generate the currently
-displayed tree. The file is saved
-to the download location set in your browser settings.
-
The sub-elements belonging to General
are
-Date
, Operator
, Institute
and
-Comment
. If you wish to include only a selection of these
-elements, tick or untick them accordingly. Unticking the
-General
element will deactivate the display of any
-sub-elements as well.
Ticking the Isolate Table
prints the entry table,
-comprising the isolate names as well as the selected metadata columns, in
-the report. Note that only entries that are marked as
-Included
in the database
-(>> Browse Entries
) are printed.
-Hence only isolates that are shown in the current tree are included.
The sub-elements belonging to the Analysis
parameters
-are Scheme
, Tree
, Distance
,
-NA Handling
and Version
. These parameters are
-automatically derived from the session as well as the created tree and
-can only be selected to be shown or hidden. As with the
-General
parameters, unticking
-Analysis Parameter
will hide all sub-elements.