TASK-5388 - Add AlphaMissense predictions #698

Open
wants to merge 115 commits into
base: develop
278fdbd
lib: download polygenic scores (PGS), #TASK-5406, #TASK-5387
jtarraga Dec 18, 2023
d327c47
lib: update the build command to support PGS data, #TASK-5407, #TASK-…
jtarraga Dec 19, 2023
fd6ac92
lib: copy PGS version file from download to generated json directory,…
jtarraga Dec 19, 2023
53aa886
lib: update the load command to support PGS data, #TASK-5410, #TASK-5387
jtarraga Dec 19, 2023
ecfee15
lib: update CellBase downloader to take into account the AlphaMissens…
jtarraga Dec 20, 2023
967d4cc
lib: update builder to build AlphaMissense predictions, #TASK-5419, #…
jtarraga Dec 21, 2023
345cd1c
lib: update PGS builder according to PGS data models changes, #TASK-…
jtarraga Dec 22, 2023
4f44661
lib: update PGS builder according to Pubmed data models changes, #TA…
jtarraga Dec 22, 2023
436f534
lib: add PGS manager and adaptor, #TASK-5411, #TASK-5387
jtarraga Dec 22, 2023
314786d
lib: add support for annotating using PGS, #TASK-5411, #TASK-5387
jtarraga Jan 2, 2024
b0aeca4
lib: add PGS data for variant annotation, #TASK-5411, #TASK-5387
jtarraga Jan 3, 2024
c6c628b
lib: add MongoDB indexes for PGS collections, #TASK-5410, #TASK-5387
jtarraga Jan 3, 2024
e079931
lib: update AlphaMissenseBuilder according to biodata changes, #TASK-…
jtarraga Jan 5, 2024
0aa0349
lib: improve AlphaMissenseBuilder by skipping incomplete lines from d…
jtarraga Jan 5, 2024
b06cc8a
lib: update RevelScoreBuilder according to biodata changes, #TASK-544…
jtarraga Jan 5, 2024
c43436a
lib: fix chromosome name and trancript ID in AlphaMissense builder, #…
jtarraga Jan 9, 2024
fa99029
app: minor improvements in AlphaMissense downloader, #TASK-5419, #TAS…
jtarraga Jan 9, 2024
351efe0
lib: add exception to AbstractDownloadManager, #TASK-5419, #TASK-5388
jtarraga Jan 9, 2024
ecf1d59
lib: add indexes to the collection protein_substitution_predictions, …
jtarraga Jan 9, 2024
53c433b
lib: add the alphamissenseVersion.json content into the data release …
jtarraga Jan 10, 2024
ea02c6f
app: update Perl script to generate sift/polyphen data according to t…
jtarraga Jan 10, 2024
9229576
app: update Perl script to generate JSON files for the sift and polyp…
jtarraga Jan 10, 2024
2e764a4
lib: update loader for Revel data, #TASK-5441, #TASK-5388
jtarraga Jan 11, 2024
cacfd78
lib: update loader for sift/polyphen data, #TASK-5442, #TASK-5388
jtarraga Jan 11, 2024
4167282
lib: fix Perl script to download sift/polyphen data, #TASK-5442, #TAS…
jtarraga Jan 12, 2024
bbabc8d
app: update exporter for protein substitution predictions (sift, poly…
jtarraga Jan 12, 2024
a4bbb37
app: update protein manager and DB adaptor to retreive protein substi…
jtarraga Jan 12, 2024
e3e9e21
lib: update Revel downloader, #TASK-5441, #TASK-5388
jtarraga Jan 17, 2024
a090053
lib: update Revel builder, #TASK-5441, #TASK-5388
jtarraga Jan 17, 2024
6346c97
download: gwas catalog fixes
imedina Jan 21, 2024
d343315
lib: take into account the NumberFormatException, #TASK-5407, #TASK-5387
jtarraga Jan 29, 2024
268f591
lib: improve PGS builder, #TASK-5407, #TASK-5387
jtarraga Feb 2, 2024
5972c15
lib: remove System.out in PGS builder, #TASK-5407, #TASK-5387
jtarraga Feb 2, 2024
d9ad898
lib: filter contigs (only chromosomes) when processing PGS scores, #T…
jtarraga Feb 7, 2024
d3ad8c9
lib: minor changes, #TASK-5407, #TASK-5387
jtarraga Feb 7, 2024
19368e6
Merge branch 'TASK-5392' into TASK-5564
imedina Mar 1, 2024
89264c2
Update configuration.yml
imedina Mar 1, 2024
2be5f21
core: update pubmed URLs in the configuration file, #TASK-5775, #TASK…
jtarraga Mar 7, 2024
fe05795
core: update pubmed version in the configuration file, #TASK-5775, #T…
jtarraga Mar 7, 2024
50f7008
core: improve Ontology downloader, #TASK-5775, #TASK-5564
jtarraga Mar 7, 2024
a8a9328
lib: take into account PubMed version from config file, and fix sonna…
jtarraga Mar 7, 2024
f84734e
lib: improve clinvar and gwas downloader by removing hardcode filenam…
jtarraga Mar 7, 2024
1a5ba4a
core: update clinvar version in config file, #TASK-5775, #TASK-5564
jtarraga Mar 7, 2024
4cdd046
lib: improve gene downloader by updating versions from config file, a…
jtarraga Mar 8, 2024
3cea3f3
lib: improve repeat downloader by updating versions from config file,…
jtarraga Mar 8, 2024
f308f25
lib: improve conservation downloader by updating versions from config…
jtarraga Mar 22, 2024
2e6e895
lib: update regulation download manager, and the configuration file, …
jtarraga Mar 28, 2024
a6688d0
lib: update configuration file; and create version files for COSMIC a…
jtarraga Apr 3, 2024
c2345d4
lib: update CellBase builder for clinical variants, #TASK-5776, #TASK…
jtarraga Apr 3, 2024
df0f1e0
lib: fix Gwas Catalog builder for clinical variants, #TASK-5776, #TAS…
jtarraga Apr 4, 2024
d4cba15
lib: refactor code by changing the DownloadProperties.URLProperties, …
jtarraga Apr 5, 2024
a3e9684
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
c7ad55d
Rename get file name method
imedina Apr 11, 2024
e92b676
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
281fb22
Resolve conflicts, #TASK-5564
jtarraga Apr 11, 2024
e18506b
lib: update CellBase downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 12, 2024
69a58bf
core: update CellBase configuration file, #TASK-5775, #TASK-5564
jtarraga Apr 15, 2024
d4e0cd6
lib: update MANE Select downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
6ee2f78
lib: update LRG, HGNC, Cancer HotSpot, DGIDB, Gene Uniprot Xref, Gene…
jtarraga Apr 18, 2024
d794ceb
lib: update RefSeq downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
1b751de
lib: update missense scores (REVEL) downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
b635333
lib: update CADD and clinical variant downloaders, #TASK-5775, #TASK-…
jtarraga Apr 18, 2024
106b96d
lib: update protein downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
55afe6b
lib: update gene downloader (specially for ensembl data), and improve…
jtarraga Apr 19, 2024
d81b68f
Merge branch 'TASK-5564' into TASK-5387
jtarraga Apr 19, 2024
88c2b17
core: add Ensembl primary fasta URL into the configuration file for t…
jtarraga Apr 22, 2024
eee13e3
lib: update genome download manager by declaring and using constants …
jtarraga Apr 22, 2024
cd367b9
app: update genome builder by using constants from the class EtlCommo…
jtarraga Apr 22, 2024
ce6f8d5
app: fix sonnar issues in BuildCommandExecutor, #TASK-5564
jtarraga Apr 22, 2024
3566e01
app: improve log/exception messages in DownloadCommandExecutor, #TASK…
jtarraga Apr 22, 2024
cd94452
app: update repeats builder, and improve log/exception messages, #TAS…
jtarraga Apr 22, 2024
148814f
lib: update the repeats builder by removing the hardcoded filenames a…
jtarraga Apr 22, 2024
30a4c87
lib: update conservation builder by removing the hardcoded filenames …
jtarraga Apr 22, 2024
85e17db
lib: call bigWigToBedGraph to convert the GERP bigwig to bed graph fi…
jtarraga Apr 23, 2024
0223cb5
lib: include log messages, #TASK-5564
jtarraga Apr 23, 2024
833c337
lib: improve ProteinBuilder by removing hardcoded file names, adding …
jtarraga Apr 23, 2024
01deb0c
lib: move DataSource reader from ConservationBuilder to the parent Ce…
jtarraga Apr 24, 2024
9416894
lib: move the function to split UniProt into chuncks from the protein…
jtarraga Apr 24, 2024
909c0b2
core: fix regulation URLs in the configuration file, #TASK-5775, #TAS…
jtarraga Apr 24, 2024
71d8056
lib: launch a CellBase exception if executing a command (wget, gunzip…
jtarraga Apr 24, 2024
1544824
lib: fix sonnar issues, #TASK-5775, #TASK-5564
jtarraga Apr 24, 2024
3e43874
lib: move the function to parse and build PFMs from the regulation do…
jtarraga Apr 24, 2024
959e423
core: update ontology section of the CellBase configuration since ont…
jtarraga Apr 25, 2024
158c259
lib: update ontology download since ontology versions will be taken f…
jtarraga Apr 25, 2024
0b83831
app: update the build command executor to check/copy the ontology ver…
jtarraga Apr 25, 2024
39f0f41
lib: improve the ontology builder by removing hardcoded filenames, ad…
jtarraga Apr 25, 2024
5c3dae0
lib: improve the PharmGKB downloader by moving the function to unzip …
jtarraga Apr 25, 2024
971235e
lib: improve the PharmGKB builder by adding checks and log messages; …
jtarraga Apr 25, 2024
cd444b0
lib: improve the PubMed downloader by adding log messages and fixing …
jtarraga Apr 25, 2024
e19fe73
lib: create maps to get the names, categories and version filenames f…
jtarraga Apr 26, 2024
a29afe3
lib: update according to the EtlCommons changes, #TASK-5775, #TASK-5564
jtarraga Apr 26, 2024
377ee9c
lib: improve PubMed builder by adding checks, log messages and fixing…
jtarraga Apr 26, 2024
997c8ec
lib: update CADD downloader according to last changes, #TASK-5775, #T…
jtarraga Apr 26, 2024
96078b7
lib: improve the CADD builder by adding checks, log messages, cleanin…
jtarraga Apr 26, 2024
3163a90
lib: update the REVEL downloader according to the last changes, and a…
jtarraga Apr 26, 2024
bc22fad
lib: add log messages, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
0c9a299
lib: improve the Revel builder by fixing sonnar issues and adding che…
jtarraga Apr 29, 2024
4f9e39a
lib: update CellBase downloaders according to the last changes, #TASK…
jtarraga Apr 29, 2024
1586a77
app: update load command executor according to the EtlCommons changes…
jtarraga Apr 29, 2024
c7c398a
lib: update CellBase builders according to the EtlCommons changes, #T…
jtarraga Apr 29, 2024
754384a
lib: fix revel builder, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
24eb091
configuration: update versions
imedina May 7, 2024
fc09da4
app: add bash script to fix the downloaded MirTarBase file, #TASK-577…
jtarraga May 7, 2024
09d33a0
core: add some comments to the configuration file, #TASK-5775, #TASK-…
jtarraga May 7, 2024
303585d
lib: update Ensembl/RefSeq indexers and builders (include major impro…
jtarraga May 7, 2024
68c47ef
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga May 7, 2024
312c654
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 7, 2024
5665648
Merge branch 'TASK-5387' into TASK-5388
jtarraga May 7, 2024
a25b9c1
core: fix PGS section in the configuration file, #TASK-5406, #TASK-5387
jtarraga May 8, 2024
df05c91
app: add PGS_DATA (polygenic scores) as valid data in the CellBase bu…
jtarraga May 8, 2024
e7c2385
lib: update clinical variant downloader by moving the split ClinVar f…
jtarraga May 10, 2024
f5b7c34
lib: update clinical variant builder by including the split ClinVar f…
jtarraga May 10, 2024
a4fca6b
lib: update code to the last changes, #TASK-5564
jtarraga May 10, 2024
8598f08
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 10, 2024
dec89f8
Merge branch 'TASK-5387' into TASK-5388
jtarraga May 10, 2024
8 changes: 4 additions & 4 deletions cellbase-app/app/scripts/ensembl-scripts/DB_CONFIG.pm
@@ -134,10 +134,10 @@ our $ENSEMBL_GENOMES_PORT = "4157";
 our $ENSEMBL_GENOMES_USER = "anonymous";

 ## Vertebrates
-our $HOMO_SAPIENS_CORE = "homo_sapiens_core_110_38";
-our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_110_38";
-our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_110_38";
-our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_110_38";
+our $HOMO_SAPIENS_CORE = "homo_sapiens_core_111_38";
+our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_111_38";
+our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_111_38";
+our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_111_38";
 #our $HOMO_SAPIENS_CORE = "homo_sapiens_core_78_38";
 #our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_78_38";
 #our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_78_38";
@@ -6,6 +6,10 @@
use Digest::MD5 qw(md5 md5_hex md5_base64);
use JSON;

#use lib "~/appl/cellbase/build/scripts/ensembl-scripts/";
#use lib "~/soft/ensembl-variation/modules/";
#use lib "~/soft/ensembl/modules/";

use DB_CONFIG;

my $species = 'Homo sapiens';
@@ -87,6 +91,37 @@
#}
#print join("=", $polyphen2->get_prediction(1, 'G'))."\n";

##################################################################

# Get the current time
my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
# Adjust the year and month values (year is years since 1900, and month is 0-based)

$year += 1900;
$mon += 1;

# Format the date and time
my $formatted_date = sprintf("%04d%02d%02d_%02d%02d%02d", $year, $mon, $mday, $hour, $min, $sec);

my $jsonVersion = {};
$jsonVersion->{"date"} = $formatted_date;
$jsonVersion->{"data"} = "protein_substitution_predictions";
$jsonVersion->{"version"} = "Ensembl 104";
my @urls = ();
push @urls, "ensembldb.ensembl.org:3306";
$jsonVersion->{"url"} = \@urls;

print "Generating the JSON file for the Sift version.\n";
$jsonVersion->{"name"} = "sift";
open(FILE, ">".$outdir."/siftVersion.json") || die "error opening file\n";
print FILE to_json($jsonVersion) . "\n";
close(FILE);

print "Generating the JSON file for the PolyPhen version\n";
$jsonVersion->{"name"} = "polyphen";
open(FILE, ">".$outdir."/polyphenVersion.json") || die "error opening file\n";
print FILE to_json($jsonVersion) . "\n";
close(FILE);

my ($translation, $seq, $md5seq, @preds, @all_predictions);
#my @transcripts = @{$transcript_adaptor->fetch_all_by_biotype('protein_coding')};
@@ -126,42 +161,56 @@

## HASH ##
my $effect = {};
$effect->{"chromosome"} = $trans->seq_region_name;
$effect->{"transcriptId"} = $trans->stable_id;
$effect->{"checksum"} = $md5seq;
$effect->{"size"} = length($seq);

foreach my $u (@{ $trans->get_all_xrefs('Uniprot/SWISSPROT') }){
$effect->{"uniprotId"} = $u->display_id();
}

$effect->{"source"} = "polyphen";
my $polyphen2 = $prot_function_adaptor->fetch_polyphen_predictions_by_translation_md5($md5seq);
for(my $i=1; $i<=length($seq); $i++) {
foreach (my $j=0; $j < @aa_code; $j++) {
if(defined $polyphen2) {
if(defined $polyphen2) {
for(my $i=1; $i<=length($seq); $i++) {
$effect->{"aaPosition"} = $i;
my @scores = ();
foreach (my $j=0; $j < @aa_code; $j++) {
@preds = $polyphen2->get_prediction($i, $aa_code[$j]);
$effect->{"aaPositions"}->{$i}->{$aa_code[$j]}->{"pe"} = $effect_code{$preds[0]};
$effect->{"aaPositions"}->{$i}->{$aa_code[$j]}->{"ps"} = $preds[1];
if(defined $preds[0] || defined $preds[1]) {
push @scores, {"aaAlternate" => $aa_code[$j], "score" => $preds[1], "effect" => $preds[0]};
$effect->{"scores"} = \@scores;
}
}
if(@scores) {
print FILE to_json($effect)."\n";
}
}
}

my $sift = $prot_function_adaptor->fetch_sift_predictions_by_translation_md5($md5seq);
for(my $i=1; $i<=length($seq); $i++) {
foreach (my $j=0; $j < @aa_code; $j++) {
if(defined $sift) {
@preds = $sift->get_prediction($i, $aa_code[$j]);
$effect->{"aaPositions"}->{$i}->{$aa_code[$j]}->{"se"} = $effect_code{$preds[0]};
$effect->{"aaPositions"}->{$i}->{$aa_code[$j]}->{"ss"} = $preds[1];
}
}
}
print FILE to_json($effect)."\n";
$effect->{"source"} = "sift";
my $sift = $prot_function_adaptor->fetch_sift_predictions_by_translation_md5($md5seq);
if(defined $sift) {
for(my $i=1; $i<=length($seq); $i++) {
$effect->{"aaPosition"} = $i;
my @scores = ();
foreach (my $j=0; $j < @aa_code; $j++) {
@preds = $sift->get_prediction($i, $aa_code[$j]);
if(defined $preds[0] || defined $preds[1]) {
push @scores, {"aaAlternate" => $aa_code[$j], "score" => $preds[1], "effect" => $preds[0]};
$effect->{"scores"} = \@scores;
}
}
if(@scores) {
print FILE to_json($effect)."\n";
}
}
}
}
}
close(FILE);

## GZip output to save space in Amazon AWS
# exec("gzip prot_func_pred_chr_".$chrom->seq_region_name);
exec("gzip " . $outdir . "/prot_func_pred_chr_" . $chr->seq_region_name . ".json");
}

sub print_parameters {
60 changes: 60 additions & 0 deletions cellbase-app/app/scripts/mirtarbase/fix-gene-symbol.sh
@@ -0,0 +1,60 @@
#!/bin/bash

# The original MirTarBase hsa_MTI.xlsx contains invalid Gene Symbols in 793 lines.
# To fix it, that file has to be converted to a CSV file, i.e.: hsa_MTI.csv
#
# After converting to CSV file, we can see the errors from the original file for the Gene Symbols (column 4),
# e.g.: 06-mar:
# MIRT050267,hsa-miR-25-3p,Homo sapiens,06-mar,10299,Homo sapiens,CLASH,Functional MTI (Weak),23622248
# MIRT051174,hsa-miR-16-5p,Homo sapiens,06-mar,10299,Homo sapiens,CLASH,Functional MTI (Weak),23622248
#
# This script fixes those lines and converts column 4 to a valid Gene Symbol:
#
# MIRT050267,hsa-miR-25-3p,Homo sapiens,MARCHF6,10299,Homo sapiens,CLASH,Functional MTI (Weak),23622248
# MIRT051174,hsa-miR-16-5p,Homo sapiens,MARCHF6,10299,Homo sapiens,CLASH,Functional MTI (Weak),23622248

# Check the number of parameters
if [ "$#" -ne 1 ]; then
echo "Usage: $0 <csv_file>"
exit 1
fi

# Check CSV file
csv_file="$1"
if [ ! -f "$csv_file" ]; then
echo "CSV file '$csv_file' does not exist."
exit 1
fi

# Fix gene-symbol
while IFS=$'\t' read -r c1 c2 c3 c4 c5 c6 c7 c8 c9 || [[ -n "$c1" ]]; do
# Apply the conditions
if [ "$c5" = "10299" ]; then
c4="MARCHF6"
elif [ "$c5" = "51257" ]; then
c4="MARCHF2"
elif [ "$c5" = "54708" ]; then
c4="MARCHF5"
elif [ "$c5" = "54996" ]; then
c4="MTARC2"
elif [ "$c5" = "55016" ]; then
c4="MARCHF1"
elif [ "$c5" = "57574" ]; then
c4="MARCHF4"
elif [ "$c5" = "64757" ]; then
c4="MTARC1"
elif [ "$c5" = "64844" ]; then
c4="MARCHF7"
elif [ "$c5" = "92979" ]; then
c4="MARCHF9"
elif [ "$c5" = "115123" ]; then
c4="MARCHF3"
elif [ "$c5" = "220972" ]; then
c4="MARCHF8"
elif [ "$c5" = "441061" ]; then
c4="MARCHF11"
fi

# Print line
echo -e "$c1\t$c2\t$c3\t$c4\t$c5\t$c6\t$c7\t$c8\t$c9"
done < "$csv_file"
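A minimal sketch of the same Entrez ID to gene symbol fix, written with a lookup table instead of the `if`/`elif` chain. It is not part of the PR: the function name `fix_gene_symbols` and the array name `SYMBOL_BY_ENTREZ` are hypothetical, and bash 4 or later is assumed for associative arrays. Like the loop above, it splits on tabs, even though the header comment describes a CSV file, so it would only apply to a tab-separated export.

```shell
#!/bin/bash
# Sketch only: names below are illustrative, not taken from the CellBase repository.
# Mapping of Entrez gene ID -> valid gene symbol, copied from fix-gene-symbol.sh.
declare -A SYMBOL_BY_ENTREZ=(
    [10299]=MARCHF6 [51257]=MARCHF2 [54708]=MARCHF5 [54996]=MTARC2
    [55016]=MARCHF1 [57574]=MARCHF4 [64757]=MTARC1 [64844]=MARCHF7
    [92979]=MARCHF9 [115123]=MARCHF3 [220972]=MARCHF8 [441061]=MARCHF11
)

# Reads tab-separated rows from stdin and rewrites column 4 when column 5
# (the Entrez ID) is present in the table. Columns after the fifth are kept as-is.
fix_gene_symbols() {
    local c1 c2 c3 c4 c5 rest
    while IFS=$'\t' read -r c1 c2 c3 c4 c5 rest || [[ -n "$c1" ]]; do
        if [[ -n "$c5" && -n "${SYMBOL_BY_ENTREZ[$c5]:-}" ]]; then
            c4="${SYMBOL_BY_ENTREZ[$c5]}"
        fi
        printf '%s\t%s\t%s\t%s\t%s' "$c1" "$c2" "$c3" "$c4" "$c5"
        if [[ -n "$rest" ]]; then
            printf '\t%s' "$rest"
        fi
        printf '\n'
    done
}
```

It could be used as `fix_gene_symbols < hsa_MTI.tsv > hsa_MTI.fixed.tsv`, where both file names are placeholders. Keeping the mapping in one table means a new symbol correction is a one-line change rather than another `elif` branch.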
@@ -19,11 +19,14 @@
import com.beust.jcommander.*;
import org.opencb.cellbase.app.cli.CliOptionsParser;
import org.opencb.cellbase.core.api.key.ApiKeyQuota;
import org.opencb.cellbase.lib.EtlCommons;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static org.opencb.cellbase.lib.EtlCommons.*;

/**
* Created by imedina on 03/02/15.
*/
@@ -87,12 +90,15 @@ public class DownloadCommandOptions {
@ParametersDelegate
public SpeciesAndAssemblyCommandOptions speciesAndAssemblyOptions = speciesAndAssemblyCommandOptions;

@Parameter(names = {"-d", "--data"}, description = "Comma separated list of data to download: genome, gene, "
+ "variation, variation_functional_score, regulation, protein, conservation, "
+ "clinical_variants, repeats, svs, pubmed and 'all' to download everything", required = true, arity = 1)
@Parameter(names = {"-d", "--data"}, description = "Comma separated list of data to download: " + GENOME_DATA + "," + GENE_DATA
+ "," + VARIATION_FUNCTIONAL_SCORE_DATA + "," + REGULATION_DATA + "," + PROTEIN_DATA + "," + CONSERVATION_DATA + ","
+ CLINICAL_VARIANT_DATA + "," + REPEATS_DATA + "," + ONTOLOGY_DATA + "," + PUBMED_DATA + "," + PHARMACOGENOMICS_DATA
+ "," + PGS_DATA + "," + REVEL_DATA + "," + ALPHAMISSENSE_DATA + "; or use 'all' to download everything", required = true,
arity = 1)
public String data;

@Parameter(names = {"-o", "--outdir"}, description = "Downloaded files will be saved in this directory.", required = true, arity = 1)
@Parameter(names = {"-o", "--outdir"}, description = "Downloaded files will be saved in this directory.", required = true,
arity = 1)
public String outputDirectory;
}

@@ -102,9 +108,11 @@ public class BuildCommandOptions {
@ParametersDelegate
public CommonCommandOptions commonOptions = commonCommandOptions;

@Parameter(names = {"-d", "--data"}, description = "Comma separated list of data to build: genome, genome_info, "
+ "gene, variation, variation_functional_score, regulation, protein, ppi, conservation, drug, "
+ "clinical_variants, repeats, svs, splice_score, pubmed. 'all' builds everything.", required = true, arity = 1)
@Parameter(names = {"-d", "--data"}, description = "Comma separated list of data to build: " + GENOME_DATA + "," + GENE_DATA + ","
+ VARIATION_FUNCTIONAL_SCORE_DATA + "," + REGULATION_DATA + "," + PROTEIN_DATA + "," + CONSERVATION_DATA + ","
+ CLINICAL_VARIANT_DATA + "," + REPEATS_DATA + "," + ONTOLOGY_DATA + "," + SPLICE_SCORE_DATA + "," + PUBMED_DATA + ","
+ PHARMACOGENOMICS_DATA + "," + PGS_DATA + "," + REVEL_DATA + "," + ALPHAMISSENSE_DATA + "; or use 'all' to build"
+ " everything", required = true, arity = 1)
public String data;

@Parameter(names = {"-s", "--species"}, description = "Name of the species to be built, valid formats include 'Homo sapiens' or 'hsapiens'", required = false, arity = 1)
@@ -190,8 +198,9 @@ public class LoadCommandOptions {
public CommonCommandOptions commonOptions = commonCommandOptions;

@Parameter(names = {"-d", "--data"}, description = "Data model type to be loaded: genome, gene, variation,"
+ " conservation, regulation, protein, clinical_variants, repeats, regulatory_pfm, splice_score, pubmed, pharmacogenomics."
+ " 'all' loads everything", required = true, arity = 1)
+ " conservation, regulation, protein, clinical_variants, repeats, regulatory_pfm, splice_score, pubmed, pharmacogenomics,"
+ " protein_functional_prediction, missense_variation_functional_score, alphamissense; and 'all' loads everything",
required = true, arity = 1)
public String data;

@Parameter(names = {"-i", "--input"}, required = true, arity = 1,
@@ -237,8 +246,8 @@ public class ExportCommandOptions {
public CommonCommandOptions commonOptions = commonCommandOptions;

@Parameter(names = {"-d", "--data"}, description = "Data model type to be loaded: genome, gene, variation, "
+ "conservation, regulation, protein, clinical_variants, repeats, regulatory_pfm, splice_score, pubmed. 'all' "
+ " loads everything", required = true, arity = 1)
+ EtlCommons.PROTEIN_SUBSTITUTION_PREDICTION_DATA + ", conservation, regulation, protein, clinical_variants, repeats,"
+ " regulatory_pfm, splice_score, pubmed. 'all' exports everything", required = true, arity = 1)
public String data;

@Parameter(names = {"--db", "--database"}, description = "Database name, e.g., cellbase_hsapiens_grch38_v5", required = true,
@@ -98,10 +98,10 @@ public static void main(String[] args) {
commandExecutor.execute();
} catch (IOException | URISyntaxException | CellBaseException e) {
commandExecutor.getLogger().error("Error: " + e.getMessage());
e.printStackTrace();
System.exit(1);
}
}
}
}

}
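
The `--data` options in the command option classes above take a comma-separated list of data names. As a rough illustration of that convention, and not of how CellBase itself parses the value, the sketch below splits such a list and reports the first unknown name. The function name `validate_data_list` is hypothetical, and the accepted names are copied from the LoadCommandOptions help string in this diff rather than from the `EtlCommons` constants, whose values are not shown here.

```shell
#!/bin/bash
# Sketch only: validate_data_list is a hypothetical helper, not a CellBase script.
# The names are taken from the LoadCommandOptions description string in the diff.
validate_data_list() {
    local known=" genome gene variation conservation regulation protein clinical_variants repeats regulatory_pfm splice_score pubmed pharmacogenomics protein_functional_prediction missense_variation_functional_score alphamissense all "
    local item
    local -a items
    # Split the single argument on commas into an array.
    IFS=',' read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        # Surrounding spaces make this a whole-word match against the list.
        if [[ "$known" != *" $item "* ]]; then
            echo "unknown data: $item"
            return 1
        fi
    done
    echo "ok"
}
```

For example, `validate_data_list "gene,alphamissense"` prints `ok`, while a list containing a name outside the set stops at the first mismatch and returns a non-zero status.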