Filament – A large-scale structure in the universe, consisting of a network of galaxies and galaxy clusters interconnected by dark matter and gas.
Why Filament?
In the life sciences, most analyses are performed in the context of an organism. For example, to understand genetic variation in, say, a mosquito, you want to map reads against a particular Anopheles or Culex reference genome.
Note
There are exceptions to this "one reference" paradigm that include genome assembly analysis (a reference does not exist), as well as metagenomic/metatranscriptomic types of studies.
As a result, in most cases analysis does not actually start in Galaxy. It starts at some external resource such as NCBI, EBI, VEuPathDB, the UCSC Genome Browser and so on. Of these, only UCSC has a direct link to Galaxy. In all other cases researchers have no way of knowing that (most of) the analyses they need can be done via Galaxy.
Filament is a lightweight application that will provide access to an arbitrary set of genomic data, allow storage of additional data that does not fit into the NCBI/EBI/UCSC frameworks, and allow invoking (Galaxy) workflows.
Design overview
The first prototype of the Filament framework will be built to satisfy the needs of two projects: BRC Analytics and VGP (GenomeArk).
This is a static site populated from data as described in #135. See below for an explanation of each page.
Organism list
The data explorer is a searchable list of organisms currently supported by a given Filament instance. Two views should be supported:
List view
Hierarchical view (tree)
List view
The list view is a simple list with a search pane on the left (just like the current https://brc-analytics.org or https://explore.anvilproject.org/datasets). In this list species names are unique: if a species has multiple assemblies associated with it, the species is listed only once. Multiple assemblies will be visible in the species view.
The list contains the following elements:
Checkbox to enable multiple selection
Species name
TaxId linkable to NCBI Taxonomy
# of references = the number of genomes associated with this taxon
Tags = tags will enable fine-grained classification of taxa. For example: "VEuPath", "VGP", "T2T" etc.
Clicking on a species name will bring the user to the "Taxa page". If the user selects multiple species using the checkboxes, this creates a "Go to Taxa page" button, which will point to the "Taxa page" as well.
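The "one row per species" rule above can be sketched as a small grouping step. This is only an illustration under assumptions: the `Assembly` and `SpeciesRow` records, their field names, and the sample values are hypothetical and are not taken from the data format described in #135.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assembly:
    """One reference genome record (hypothetical field names)."""
    accession: str
    species: str
    tax_id: int
    tags: tuple = ()


@dataclass
class SpeciesRow:
    """One row of the list view: a species appears exactly once."""
    species: str
    tax_id: int
    n_references: int = 0
    tags: set = field(default_factory=set)


def build_species_rows(assemblies):
    """Collapse assemblies into one row per taxon, sorted by species name."""
    rows = {}
    for asm in assemblies:
        row = rows.setdefault(asm.tax_id, SpeciesRow(asm.species, asm.tax_id))
        row.n_references += 1      # "# of references" column
        row.tags.update(asm.tags)  # union of tags across assemblies
    return sorted(rows.values(), key=lambda r: r.species)


def search(rows, query):
    """Case-insensitive substring match on the species name."""
    q = query.strip().lower()
    return [r for r in rows if q in r.species.lower()]


# Illustrative placeholder records; accessions and tags are made up,
# and TaxIds should be checked against NCBI Taxonomy before reuse.
SAMPLE = [
    Assembly("ASM_PLACEHOLDER_1", "Anopheles gambiae", 7165, ("VEuPath",)),
    Assembly("ASM_PLACEHOLDER_2", "Anopheles gambiae", 7165, ("T2T",)),
    Assembly("ASM_PLACEHOLDER_3", "Culex quinquefasciatus", 7176, ("VEuPath",)),
]

ROWS = build_species_rows(SAMPLE)
```

Grouping by TaxId rather than by name is a design choice of this sketch; it keeps a species on a single row even if the name is spelled differently across sources.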
Tree view
The tree view will provide a hierarchical view of all data. It can be a phylogenetic tree, a treemap etc.
It should also support selecting either a single species or multiple species. One way to enable this is to let the user click on a species and then add that species to a "cart".
Genomes page
The Genomes page provides a detailed view of the reference genomes available within a given Filament instance. A user gets to this page from the Organism view page. If a single organism was selected on the Data Explorer page, all genomes available for this organism are listed here. If multiple organisms were selected, the page shows a list of genomes available for all selected taxa.
Each row contains the following columns:
Checkbox allowing for multiple selection
Action Buttons (see below)
Universal assembly ID (e.g., RefSeq ID)
# Scaffolds = a measure of assembly fragmentation (fewer scaffolds generally means a more contiguous assembly)
N50 = another measure of assembly contiguity
Tags
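For readers unfamiliar with the N50 column, here is a minimal sketch of how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function names are illustrative and not part of Filament.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum
        # to at least half of the total assembly length.
        if 2 * running >= total:
            return length


def assembly_columns(lengths):
    """Values for the '# Scaffolds' and 'N50' columns of one row."""
    lengths = list(lengths)
    return {"scaffolds": len(lengths), "n50": n50(lengths)}
```

In practice these values would more likely be read from assembly metadata than recomputed from sequence; the sketch only shows what the number means.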
The Action Buttons should be a configurable set of buttons. In this example the buttons are:
"Analyze" (see below)
UCSC = Link to the UCSC Genome Browser
NCBI = Link to NCBI Datasets
EBI = Link to EBI
PDN = Link to the future Pathogen Data Network site
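One way to make the button set configurable is to describe each button as a label plus a URL template and let each Filament instance enable a subset. The sketch below assumes this approach; the `ActionButton` type, the `{accession}` placeholder, and every URL (all `example.org` stand-ins) are hypothetical and do not reflect the real link formats of UCSC, NCBI, EBI or PDN.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ActionButton:
    label: str
    url_template: str  # "{accession}" is replaced by the assembly ID


# Placeholder templates only; real link formats must come from each resource.
BUTTONS = [
    ActionButton("Analyze", "/analyze/{accession}"),
    ActionButton("UCSC", "https://example.org/ucsc/{accession}"),
    ActionButton("NCBI", "https://example.org/ncbi/{accession}"),
    ActionButton("EBI", "https://example.org/ebi/{accession}"),
    ActionButton("PDN", "https://example.org/pdn/{accession}"),
]


def render_buttons(accession, buttons=BUTTONS, enabled=None):
    """Return (label, url) pairs for one genome row.

    `enabled` is an optional set of labels; None means show every button.
    """
    safe_id = quote(accession, safe="")
    links = []
    for button in buttons:
        if enabled is not None and button.label not in enabled:
            continue
        links.append((button.label, button.url_template.format(accession=safe_id)))
    return links
```

An instance that only links to NCBI could then call `render_buttons(assembly_id, enabled={"Analyze", "NCBI"})`, leaving the shared button list untouched.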
Clicking on the "Analyze" button will bring the user to the final Filament page. This page will look different depending on whether the user selected a single species or multiple species.
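The branching described here could be expressed as a small dispatch function. This is a sketch under the assumption that the selection arrives as a list of species names, one per selected genome; the page identifiers `"species"` and `"comparative"` are made-up labels for the two pages described below.

```python
def analyze_target(selected_species):
    """Pick the page that "Analyze" should open for the current selection.

    Several genomes of the same species still count as a single species.
    """
    unique = sorted(set(selected_species))
    if not unique:
        raise ValueError("nothing selected")
    page = "species" if len(unique) == 1 else "comparative"
    return page, unique
```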
Species page
This page will list the Galaxy workflows available for this reference as well as additional files. For prototyping the additional files functionality we will use current data from VGP's GenomeArk. This data includes intermediate analysis datasets, haplotype assemblies, QC metrics, JBrowse2 instances generated by post-curation workflows etc.
Comparative page
The multi-species analysis page will be the key component of the comparative genomics aspect of the Filament framework.
It will contain a list of the genomes selected on the previous page. We still need to think through which columns this list will have. One of the columns should indicate whether the species is included in a pre-computed multiple alignment generated by VGP, Zoonomia or other projects.
It will allow performing analyses that involve "multiple" species such as alignment generation, visualization in comparative browsers (such as the one developed by CGR at NCBI) and performing selection analyses with tools such as HyPhy.
@nekrut How should we present species information where we have no ready assemblies? Most of the data presented on the Genomes page will be missing. Should we have an additional page for species that have genomic data but no assemblies?