ChEBI subsets #105
# input
input/*
!input/owl-files/
Where to put ChEBI inputs
My goal is to move all of the sources somewhere in `input/`. I've decided that'll probably be `input/sources/`, or something to denote all of the inputs that actually go into generating our outputs.
As I'm not sure yet if ChEBI will be such an input, I'm putting it in `input/analysis/`.
I also plan to eventually move `owl-files/` and `data/` into `input/`, but won't do that in this PR.
makefile
@ShahimEssaid The makefile is back. I think this will likely be the best way to do this work. Let me know if you have any thoughts otherwise.
Added an `all` goal which just includes `chebi-subsets`.
`chebi-subsets` includes:
CHEBI_OUT_BOT=output/analysis/chebi-subset-BOT.owl
CHEBI_OUT_MIREOT=output/analysis/chebi-subset-MIREOT.owl
These goals require this as input:
CHEBI_MODULE=output/analysis/chebi_module.txt
...which queries:
PART_MAPPINGS=loinc_release/Loinc_2.78/AccessoryFiles/PartFile/PartRelatedCodeMapping.csv
Results: Google Drive
Note there are more mappings to be found here: monarch-initiative/monarch-mapping-commons#35
- Add: makefile: To add goals for creating these outputs.
- Update: .gitignore: To include folders for these inputs/outputs.
8307fa8 to 003d12a
# todo: bug fix for label comment: Always shows up as ' # ,'. Alternatively, I could just not include the label comment.
$(CHEBI_MODULE): $(PART_MAPPINGS) | output/analysis/
	awk -F'",' '/ebi\.ac\.uk\/chebi/ { \
Extracting module via awk
@ShahimEssaid For extracting the unique list of mapped ChEBI terms from the LOINC mapping CSV, I wanted to avoid relying on Python, but I may change that for a few reasons:
- Windows doesn't come with `awk`, etc.
- Couldn't get it to display commented labels next to the terms.
- The `awk` command, and parsing CSV with default Unix commands in general, is still non-trivial. It's hard to read and maintain.
- Has an off-by-1 (row) error.
If it's easy enough, write an SSSOM adapter and add it to sssom-py? Note that you will have other mappings to process, e.g. monarch-initiative/monarch-mapping-commons#35, since the LOINC part mappings are always incomplete, so it makes sense to try to convert everything to SSSOM first.
Hmm, I think the SSSOM part is a separate issue, but thanks for the reminder. I think for this proof of concept a very quick `pandas` func will suffice.
Are there some docs on what it means to create a "SSSOM adapter" for a given source?
I wonder if that may be non-ideal / impossible in this case, because the source mapping CSV requires download of the full LOINC release, which is also behind a license.
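As a rough illustration of what converting these rows to SSSOM might involve, the sketch below writes a bare TSV using the SSSOM column names `subject_id`, `predicate_id`, `object_id`, and `mapping_justification`. It does not use sssom-py and is not its adapter API. The input column names, the `LOINC` prefix, the `Equivalence`-to-predicate table, and the choice of `semapv:UnspecifiedMatching` are all assumptions, and the metadata header of a full SSSOM file (CURIE map, license) is omitted.

```python
import csv
import io

SSSOM_COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# Assumption: how values in an "Equivalence" column translate to SKOS predicates.
# Anything not listed falls back to skos:relatedMatch as a placeholder.
PREDICATES = {"equivalent": "skos:exactMatch"}


def to_sssom_rows(rows, subject_prefix="LOINC"):
    """Convert mapping dicts (assumed column names) into SSSOM-style dicts."""
    for row in rows:
        yield {
            "subject_id": f"{subject_prefix}:{row['PartNumber']}",
            "predicate_id": PREDICATES.get(row["Equivalence"], "skos:relatedMatch"),
            "object_id": row["ExtCodeId"],  # assumed to already be a CURIE
            "mapping_justification": "semapv:UnspecifiedMatching",
        }


def write_sssom_tsv(rows, out):
    """Write a header line plus one tab-separated line per mapping."""
    writer = csv.DictWriter(out, fieldnames=SSSOM_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(to_sssom_rows(rows))


if __name__ == "__main__":
    # Hypothetical row; the identifiers are made up.
    example = [{"PartNumber": "LP00001-1", "Equivalence": "equivalent",
                "ExtCodeId": "CHEBI:0000001"}]
    buffer = io.StringIO()
    write_sssom_tsv(example, buffer)
    print(buffer.getvalue(), end="")
```

Because the conversion only reads rows from a file the user has already downloaded, a sketch like this would not by itself redistribute licensed LOINC content; whether the resulting mappings can be published is a separate question.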
How will we use these outputs?
One/both of these?
Or do we have other goals in mind?
makefile (Outdated)
chebi-subsets: $(CHEBI_OUT_BOT) $(CHEBI_OUT_MIREOT)

input/analysis/chebi.owl.gz: | input/analysis/
	wget -O $@ ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz
Ontology PURL
This is more of a general point, but always use PURLs for ontology download locations.
@matentzn This is a good idea. Don't we typically use Bioregistry as the central location for getting canonical PURLs?
However, I don't see an `.owl` URI there, only URIs for prefix maps.
There is a URI for it at Ontobee:
These PURLs have the disadvantage, though, of pointing to the `.owl`, not the `.owl.gz`, which I think is a better option whenever it is available.
No, we should always use official OBO PURLs:
http://purl.obolibrary.org/obo/chebi.owl
http://purl.obolibrary.org/obo/chebi/chebi.owl.gz
etc.
@matentzn OK that's good. I just pushed a commit and now it's using the PURL!
But what I'm really asking is: how do I know what the PURL is for a given ontology? Is there somewhere that I can search?
I thought that place was Bioregistry, but that's not the case here.
I suppose I could just type out http://purl.obolibrary.org/obo/MAIN_ONTOLOGY_SPELLING.owl or http://purl.obolibrary.org/obo/MAIN_ONTOLOGY_SPELLING/MAIN_ONTOLOGY_SPELLING.FILE_EXTENSION and check to see if they exist, but that's not a good UX.
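The guess-and-check approach described above could be sketched as follows. This is only an illustration of the two URL patterns mentioned in this comment, not an official lookup mechanism. The function names, the lowercasing of the ontology ID, and the default extension list are assumptions, and some servers may not answer `HEAD` requests the same way as `GET`.

```python
import urllib.error
import urllib.request

OBO_BASE = "http://purl.obolibrary.org/obo"


def candidate_purls(ontology_id, extensions=("owl", "owl.gz")):
    """Build candidate PURLs from the two patterns: obo/X.owl and obo/X/X.EXT."""
    ont = ontology_id.lower()  # assumption: file PURLs use the lowercase ID
    urls = [f"{OBO_BASE}/{ont}.owl"]
    urls += [f"{OBO_BASE}/{ont}/{ont}.{ext}" for ext in extensions]
    return urls


def url_exists(url, timeout=10):
    """Return True if a HEAD request (following redirects) succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False


def existing_purls(ontology_id, exists=url_exists):
    """Filter the candidates down to those the checker reports as resolvable."""
    return [url for url in candidate_purls(ontology_id) if exists(url)]


if __name__ == "__main__":
    # Performs real network requests when run directly.
    for purl in existing_purls("chebi"):
        print(purl)
```

The `exists` parameter is injectable so the URL-building logic can be tested without network access.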
- Delete: Alternative, commented out, variations of subsetting ChEBI.
6cdc025 to 957af41
- Update: Download URI: Changed to PURL
So far, this creates a sub-hierarchy of ChEBI containing only what is mapped to LOINC. The main future goal is to use it to create an alternative LOINC hierarchy.
Changes
ChEBI subsets
Results
Google Drive