ChEBI subsets #105
base: main
Changes from 2 commits
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
.PHONY: all chebi-subsets | ||
|
||
# All ------------------------------------------------------------------------------------------------------------------ | ||
all: chebi-subsets | ||
|
||
# Analysis ------------------------------------------------------------------------------------------------------------- | ||
input/analysis/: | ||
mkdir -p $@ | ||
|
||
output/analysis/: | ||
mkdir -p $@ | ||
|
||
# - ChEBI subsets | ||
PART_MAPPINGS=loinc_release/Loinc_2.78/AccessoryFiles/PartFile/PartRelatedCodeMapping.csv | ||
CHEBI_OWL=input/analysis/chebi.owl | ||
CHEBI_MODULE=output/analysis/chebi_module.txt | ||
CHEBI_OUT_BOT=output/analysis/chebi-subset-BOT.owl | ||
CHEBI_OUT_MIREOT=output/analysis/chebi-subset-MIREOT.owl | ||
|
||
chebi-subsets: $(CHEBI_OUT_BOT) $(CHEBI_OUT_MIREOT) | ||
|
||
input/analysis/chebi.owl.gz: | input/analysis/ | ||
wget -O $@ ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz | ||
Ontology PURL
This is more of a general thing, but always use PURLs for ontology download locations.

@matentzn This is a good idea. Don't we typically use Bioregistry as the central location for getting canonical PURLs? However, I don't see an There is a URI for it at Ontobee: These PURLs have the disadvantage though of pointing to the

No, we should always use official OBO PURLs: http://purl.obolibrary.org/obo/chebi.owl etc.

@matentzn OK, that's good. I just pushed a commit and now it's using the PURL! But what I'm really asking is: how do I know what the PURL is for a given ontology? Is there somewhere that I can search? I thought that place was Bioregistry, but that's not the case here. I suppose I could just type out http://purl.obolibrary.org/obo/MAIN_ONTOLOGY_SPELLING.owl or http://purl.obolibrary.org/obo/MAIN_ONTOLOGY_SPELLING/MAIN_ONTOLOGY_SPELLING.FILE_EXTENSION and check to see if they exist, but that's not a good UX.
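For what it's worth, the "type it out and check" approach from the last comment can be scripted. The sketch below is only an illustration and not part of this PR: `obo_purl` and `purl_exists` are hypothetical helper names, and it assumes the usual OBO convention of a lowercase ontology ID followed by `.owl`. The existence check needs `curl` and network access.

```shell
# Hypothetical helper (not part of this PR): build the conventional OBO PURL
# for an ontology ID by lowercasing it and appending ".owl".
obo_purl() {
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'http://purl.obolibrary.org/obo/%s.owl\n' "$id"
}

# Hypothetical helper: succeed only if the PURL resolves, following redirects.
# Sends a HEAD request, so nothing is downloaded.
purl_exists() {
  curl --silent --fail --location --head --output /dev/null "$(obo_purl "$1")"
}

obo_purl CHEBI   # prints http://purl.obolibrary.org/obo/chebi.owl
```

A call such as `purl_exists CHEBI && echo ok` would then confirm the guess before it goes into the Makefile. This does not replace a searchable registry; it only automates the manual check.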
||
|
||
input/analysis/chebi.owl: input/analysis/chebi.owl.gz | ||
gunzip -c $< > $@ | ||
rm $< | ||
|
||
# todo: bug fix for label comment: Always shows up as ' # ,'. Alternatively, I could just not include the label comment. | ||
$(CHEBI_MODULE): $(PART_MAPPINGS) | output/analysis/ | ||
awk -F'",' '/ebi\.ac\.uk\/chebi/ { \ | ||
Extracting module via
|
||
split($$0, parts, "\""); \ | ||
for (i=1; i<=NF; i++) { \ | ||
if (parts[i] ~ /CHEBI:/) { \ | ||
id = parts[i]; \ | ||
gsub(".*CHEBI:", "http://purl.obolibrary.org/obo/CHEBI_", id); \ | ||
gsub(",.*", "", id); \ | ||
print id " # " parts[i+1] \ | ||
} \ | ||
} \ | ||
}' $< > $@ | ||
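A possible explanation for the `' # ,'` symptom in the todo above: when a fully quoted CSV row is split on `"`, the quoted values land at even indexes and the `,` separators at odd ones, so `parts[i+1]` is always the separator. The sketch below shows one way around that, assuming every field in the file is double-quoted and that the display name is the field immediately after the ChEBI code. `chebi_terms` is a hypothetical helper name and the sample row is made up for illustration. It also loops over the count returned by `split` rather than `NF`, which counts fields under a different separator.

```shell
# Hypothetical helper (not part of this PR): read fully quoted CSV rows on stdin
# and print one "IRI # label" line per ChEBI code found.
chebi_terms() {
  awk '/ebi\.ac\.uk\/chebi/ {
    # Split on double quotes; n is the number of pieces produced.
    n = split($0, parts, "\"");
    for (i = 1; i <= n; i++) {
      if (parts[i] ~ /^CHEBI:/) {
        id = parts[i];
        sub(/^CHEBI:/, "http://purl.obolibrary.org/obo/CHEBI_", id);
        # parts[i+1] is the "," separator; the next quoted value is parts[i+2].
        print id " # " parts[i+2];
      }
    }
  }'
}

# Made-up sample row, for illustration only.
printf '%s\n' '"LP0","Glucose","COMPONENT","CHEBI:17234","glucose","https://www.ebi.ac.uk/chebi"' | chebi_terms
```

Inside the Makefile recipe the `$0` would still need to be written as `$$0`, as in the original rule.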
|
||
# BOT: use the SLME (Syntactic Locality Module Extractor) to extract a bottom module | ||
# - Source: https://robot.obolibrary.org/extract | ||
# - The BOT, or BOTTOM, -module contains mainly the terms in the seed, plus all their super-classes and the | ||
# inter-relations between them. The module is called BOT (or BOTTOM) because it takes a view from the BOTTOM of the | ||
# class-hierarchy upwards. Modules of this type are typically of a medium size and should be used if there is a need to | ||
# include all super-classes in the module. This is the most widely used module type - when in doubt, use this one. | ||
$(CHEBI_OUT_BOT): $(CHEBI_OWL) $(CHEBI_MODULE) | ||
robot extract --method BOT \ | ||
--input $(CHEBI_OWL) \ | ||
--term-file $(CHEBI_MODULE) \ | ||
--output $@ | ||
|
||
# MIREOT: Minimum Information to Reference an External Ontology Term | ||
# - Source: https://robot.obolibrary.org/extract | ||
# - To specify upper and lower term files, use --upper-terms and --lower-terms. The upper terms are the upper boundaries | ||
# of what will be extracted. If no upper term is specified, all terms up to the root (owl:Thing) will be returned. The | ||
# lower term (or terms) is required; this is the limit to what will be extracted, e.g. no descendants of the lower term | ||
# will be included in the result. | ||
$(CHEBI_OUT_MIREOT): $(CHEBI_OWL) $(CHEBI_MODULE) | ||
robot extract --method MIREOT \ | ||
--input $(CHEBI_OWL) \ | ||
--lower-terms $(CHEBI_MODULE) \ | ||
--output $@ |
Where to put ChEBI inputs
I have the goal of moving all of the sources somewhere in `input/`. I've decided that'll probably be `input/sources/`, or something to denote all of the inputs that actually go into generating our outputs.

As I'm not sure yet if ChEBI will be such an input, I'm putting it in `input/analysis/`.

I also plan to eventually move `owl-files/` and `data/` into `input/`, but won't do that in this PR.