-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Customise OBO products #2860
Customise OBO products #2860
Conversation
Use a new command in the Uberon ROBOT plugin to produce "customized" OBO artefacts in which: * all "untranslatable" OWL axioms (owl-axioms tag) are stripped (they were already stripped before, this just changes the way we do it); * all GCI axioms are stripped (#2856); * when a class has several rdfs:comment annotations (which is not allowed in OBO), they are merged into a single annotation (#2847).
$(ROBOT) merge -i $(SRC) \ | ||
remove --base-iri $(URIBASE)/CL_ --axioms external --trim false \ | ||
convert -f obo --check false -o $(TMPDIR)/cl-check.obo | ||
uberon:obo-export --merge-comments --obo-output $(TMPDIR)/cl-check.obo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh my woooooooord - I would love a generic OBO "fixer" tool that does things like merging comments and dropping duplicate labels! This should really be in ROBOT but hey! Plugin is just as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will certainly consider upstreaming that to ROBOT at some point. At the very least for the --strip-owl-axioms
option (removing the untranslatable axioms), since (1) the ODK already does that systematically, so presumably everybody is already fine with that behaviour, and (2) the OWLAPI already provides the required option to enable that behaviour, it’s just that ROBOT is not using it.
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@ | ||
|
||
$(ONT).obo: $(ONT).owl | all_robot_plugins | ||
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format). I am ok with this suggestion, but I would ask @cmungall for a review, and consider the alternative, which is that --merge-comments
is applied during preprocessing (was it called SRCMERGED?). Of course, I don't have anything against --strip-gci-axioms --strip-owl-axioms and in fact, I would kinda like that in ODK over grep -v
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes me a bit queasy too, but round-tripping is already broken for OBO products as untranslatable OWL axioms are stripped from header. I'm not aware of any products that use the OBO GCI axioms @cmungall - please yell if I'm wrong.
They are mainly an irritation as we get frequent tickets from people confused by them. I have a slight preference that comments get merged in all products but went with this as @gouttegd indicated it was much more straightforward for the pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format).
As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.
I personally couldn’t care less about that. From my point of view, the OBO format, for all its benefits, is a legacy format and people who choose to use an OBO artefact over any other format should already be aware that they may not get the full picture of what the ontology contains.
but I would ask @cmungall for a review
You may notice that this is exactly what I did. :)
consider the alternative, which is that --merge-comments is applied during preprocessing
Disagree. For two reasons.
First, merging the comments make them slightly less practical, because the annotations on those comments (cross-references) are merged as well, and there is no longer any way to know which cross-reference applies to which part of the comment. Okay, it may not be a big deal, but why should we make comments less useful in all artefacts in all formats just to accommodate the idiosyncrasies of one particular format?
Second, merging the comments early in the pipeline would not in fact be as easy as it sounds, because in this instance the comments are spread over two different files (comments from the editors in cl-edit.owl
, auto-generated comments from the CellMark folks in the CLM-CL component). So we would need to merge the CLM-CL component first. OK, we could do that in preprocessing, but then we would need to leave that component out of $(OTHER_SRC)
(basically stop treating as an ODK “component”, and more as a completely special file) -- otherwise it would be merged again later, along with its original comments.
And no, using $(SRCMERGED)
is of no use, because $(SRCMERGED)
is never actually used to build any release artefacts. It is used for some checks and for extracting the import seeds, nothing more.
I did in fact consider the alternative and try merging the comments early in the pipeline, and soon concluded that it was way too much of a hassle. Feel free to have a go at it if you want, but my position is that late merging specifically for the OBO products is the option that is the least disruptive to the existing pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.
Note my emphasis on "isomorphic (on content supported by OBO format)". Deleting axioms that cant be rendered in the format is a bit different than merging.
I am thoroughly convinced by your argumentation, and I want to add that "rdfs:comment" is really not a thing we need to concern ourselves with too much - merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".
Just for future: I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!
Thanks for the explanations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".
For completeness, I have implemented the "drop random one" behaviour as an option to obo-export
, should someone want that behaviour.
Basically with the next version of the Uberon plugin it will be possible to choose what happens when a class has more than one comment:
- merge them into a single one;
- discard them all;
- discard all but one (the one that is kept being chosen in a non-deterministic manner).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!
I personally wouldn’t care (as I don’t care much about the OBO format generally), but that’s noted.
But if we ever get to the point that there is a need to merge labels and/or definitions, then something must have gone horribly wrong long before the conversion to OBO, and it’d be much better to fix the root cause than trying to hide the dust under the rug during the OBO conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.
Soooo coool! Great. Eventually you will need to start thinking of the scope of your robot plugin, I think it goes way beyond Uberon already! But no worries, I can rename it to "gouttegd-tools" locally if I so wish it :D
it’d be much better to fix the root cause
Exactly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But no worries, I can rename it to "gouttegd-tools" locally if I so wish it
That is of course completely possible given the way the ROBOT plugins work, but that’s something that would be best avoided IMHO. If everybody starts renaming plugins locally according to their own wishes, it’s going to be a mess. I’d recommend that whenever a plugin is used, it is used under its original name, to minimise confusion.
The fact that a given plugin was initially named after a particular project because it was originally specifically intended for that project should not be a problem, and is not reason enough IMO to rename the plugin.
The FlyBase plugin was initially intended solely for FlyBase’s ontologies (hence its name), and now CL is also using it. Is that reason enough to rename it? I don’t think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice to see finally a path to OBO repair coming up! Thanks!!
@cmungall - last opportunity to comment as we need to merge ASAP for CL release this week. Thanks! |
I think practicality and simplicity dictates that we strip the GCIs from obo format, so I approve the PR. Non-normative comments follow: I do think this is different from the case of stripping the owl-axioms header, this is a more surprising difference. Usually the axioms that get stripped with owl-axioms are random artefacts, imports from other ontologies trying to be more clever with rococo axioms, etc. In contrast the GCIs are usually a best attempt to capture context-dependent biology, and are documented approaches to problems like taxon-specificity of relationships (in the case of Uberon - in the particular case that prompted this in CL, I don't think they are so useful). To solve the broader class of problems we need LPGs with simple and clear semantics. Clear ways to indicate contextual or quoted statements, with defined graph projections to OWL. The reason for the success of obo format was because people want a simple graph format for ontologies and KGs, but the direct coupling with OWL has been a mixed blessing. |
This PR add a couple of tweaks to CL’s OBO artefacts:
rdfs:comment
annotations (something that is not allowed in OBO), the comments are merged into a single annotation (requested in Addresses #2786 NTR CD57-positive enterocyte #2847, because one of the components we merge may contain automatically generated comments, which would then cause the resulting OBO files to be invalid if CL’s-edit
file also happened to contain a comment for the same terms);This is done by a new
obo-export
command recently added to the Uberon ROBOT plugin (code here for the technically inclined who would want to have a look).Since we now use a dedicated command to produce OBO files, while we’re at it that same command also takes care of removing the
owl-axioms
tag (containing all the “untranslatable“ axioms, that is the axioms that cannot be represented in pure OBO). This was normally done in the ODK by a call togrep
:but is now done by the
obo-export
command directly.