Customise OBO products #2860

gouttegd · 2024-12-12T15:36:44Z

This PR adds a couple of tweaks to CL’s OBO artefacts:

* when a class has several rdfs:comment annotations (something that is not allowed in OBO), the comments are merged into a single annotation. This was requested in #2847 (Addresses #2786 NTR CD57-positive enterocyte), because one of the components we merge may contain automatically generated comments, which would make the resulting OBO files invalid if CL’s -edit file also contained a comment for the same terms;
* GCI axioms are stripped entirely (requested in #2856, [Typo/Bug]CL:0000163 CL:0000164, on the grounds that they are a needless complication: users of OBO artefacts typically expect simple SubClassOf relationships and may not be equipped to deal with GCI axioms).

This is done by a new obo-export command recently added to the Uberon ROBOT plugin (code here for the technically inclined who would want to have a look).
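To make the GCI point concrete, here is a minimal Python sketch of what stripping GCI axioms amounts to at the level of OBO text. It assumes (this is not stated in the PR) that GCI axioms show up in the OBO output as `is_a:`/`relationship:` lines carrying `gci_relation`/`gci_filler` trailing qualifiers. The function `strip_gci_lines` and the `EX:` identifiers are hypothetical. The actual obo-export command works on the OWL ontology before serialisation, not on OBO text.

```python
import re

# Assumption: a GCI is serialised as a line whose trailing qualifier block
# ({...}) contains gci_relation= or gci_filler=. Hypothetical illustration only.
GCI_QUALIFIER = re.compile(r"\{[^}]*\bgci_(relation|filler)=")


def strip_gci_lines(obo_text: str) -> str:
    """Drop every line whose trailing qualifiers mark it as a GCI axiom."""
    kept = [line for line in obo_text.splitlines() if not GCI_QUALIFIER.search(line)]
    return "\n".join(kept) + "\n"


stanza = (
    "[Term]\n"
    "id: EX:0000001\n"
    "name: example cell\n"
    "is_a: EX:0000002\n"
    'relationship: part_of EX:0000003 {gci_relation="part_of", gci_filler="NCBITaxon:9606"}\n'
)
print(strip_gci_lines(stanza))
```

Only the plain `is_a:` line survives, which is the "simple SubClassOf relationships" view that OBO users typically expect.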

Since we now use a dedicated command to produce OBO files, while we’re at it, that same command also takes care of removing the owl-axioms tag (containing all the “untranslatable” axioms, that is, the axioms that cannot be represented in pure OBO). This was normally done in the ODK by a call to grep:

$(ROBOT) convert --input $< --check false -f obo $(OBO_FORMAT_OPTIONS) -o $@.tmp.obo && grep -v ^owl-axioms $@.tmp.obo > $@ && rm $@.tmp.obo

but is now done by the obo-export command directly.
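For comparison, here is a minimal Python sketch of what the `grep -v ^owl-axioms` step above does. The function name `strip_owl_axioms` is hypothetical and not part of ROBOT or the plugin. Every line that begins with `owl-axioms` is dropped and everything else passes through unchanged; like the grep, it assumes the tag fits on a single line.

```python
def strip_owl_axioms(obo_text: str) -> str:
    """Equivalent of `grep -v ^owl-axioms`: drop lines starting with the tag."""
    return "".join(
        line
        for line in obo_text.splitlines(keepends=True)
        if not line.startswith("owl-axioms")
    )


header = (
    "format-version: 1.2\n"
    "ontology: ex\n"
    "owl-axioms: Prefix(:=<http://example.org/>)\n"
    "\n"
    "[Term]\n"
    "id: EX:1\n"
)
print(strip_owl_axioms(header))
```

Doing this inside the exporter rather than with grep avoids the temporary file and the extra shell step in the Makefile rule.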

Use a new command in the Uberon ROBOT plugin to produce "customized" OBO artefacts in which:

* all "untranslatable" OWL axioms (owl-axioms tag) are stripped (they were already stripped before; this just changes the way we do it);
* all GCI axioms are stripped (#2856);
* when a class has several rdfs:comment annotations (which is not allowed in OBO), they are merged into a single annotation (#2847).

matentzn · 2024-12-12T19:20:21Z

src/ontology/cl.Makefile

 	$(ROBOT) merge -i $(SRC) \
 		 remove --base-iri $(URIBASE)/CL_ --axioms external --trim false \
-		 convert -f obo --check false -o $(TMPDIR)/cl-check.obo
+		 uberon:obo-export --merge-comments --obo-output $(TMPDIR)/cl-check.obo


oh my woooooooord - I would love a generic OBO "fixer" tool that does things like merging comments and dropping duplicate labels! This should really be in ROBOT but hey! Plugin is just as well!

I will certainly consider upstreaming that to ROBOT at some point. At the very least for the --strip-owl-axioms option (removing the untranslatable axioms), since (1) the ODK already does that systematically, so presumably everybody is already fine with that behaviour, and (2) the OWLAPI already provides the required option to enable that behaviour, it’s just that ROBOT is not using it.

matentzn · 2024-12-12T19:24:10Z

src/ontology/cl.Makefile

+	$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@
+
+$(ONT).obo: $(ONT).owl | all_robot_plugins
+	$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@


My only concern with this approach is that we artificially break isomorphism between OWL and OBO products (on content supported by OBO format). I am ok with this suggestion, but I would ask @cmungall for a review, and consider the alternative, which is that --merge-comments is applied during preprocessing (was it called SRCMERGED?). Of course, I don't have anything against --strip-gci-axioms --strip-owl-axioms and in fact, I would kinda like that in ODK over grep -v.

Makes me a bit queasy too, but round-tripping is already broken for OBO products, as untranslatable OWL axioms are stripped from the header. I'm not aware of any products that use the OBO GCI axioms @cmungall - please yell if I'm wrong.

They are mainly an irritation as we get frequent tickets from people confused by them. I have a slight preference that comments get merged in all products but went with this as @gouttegd indicated it was much more straightforward for the pipelines.

> My only concern with this approach is that we artificially break isomorphism between OWL and OBO products (on content supported by OBO format).

As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

I personally couldn’t care less about that. From my point of view, the OBO format, for all its benefits, is a legacy format and people who choose to use an OBO artefact over any other format should already be aware that they may not get the full picture of what the ontology contains.

> but I would ask @cmungall for a review

You may notice that this is exactly what I did. :)

> consider the alternative, which is that --merge-comments is applied during preprocessing

Disagree. For two reasons.

First, merging the comments makes them slightly less practical, because the annotations on those comments (cross-references) are merged as well, and there is no longer any way to know which cross-reference applies to which part of the comment. Okay, it may not be a big deal, but why should we make comments less useful in all artefacts in all formats just to accommodate the idiosyncrasies of one particular format?

Second, merging the comments early in the pipeline would not in fact be as easy as it sounds, because in this instance the comments are spread over two different files (comments from the editors in cl-edit.owl, auto-generated comments from the CellMark folks in the CLM-CL component). So we would need to merge the CLM-CL component first. OK, we could do that in preprocessing, but then we would need to leave that component out of $(OTHER_SRC) (basically stop treating it as an ODK “component” and treat it instead as a completely special file); otherwise it would be merged again later, along with its original comments.

And no, using $(SRCMERGED) is of no use, because $(SRCMERGED) is never actually used to build any release artefacts. It is used for some checks and for extracting the import seeds, nothing more.

I did in fact consider the alternative and tried merging the comments early in the pipeline, and soon concluded that it was way too much of a hassle. Feel free to have a go at it if you want, but my position is that late merging specifically for the OBO products is the option that is the least disruptive to the existing pipelines.

> As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

Note my emphasis on "isomorphic (on content supported by OBO format)". Deleting axioms that can't be rendered in the format is a bit different from merging.

I am thoroughly convinced by your argumentation, and I want to add that "rdfs:comment" is really not a thing we need to concern ourselves with too much - merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

Just for future: I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

Thanks for the explanations!

> merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Basically with the next version of the Uberon plugin it will be possible to choose what happens when a class has more than one comment:

* merge them into a single one;
* discard them all;
* discard all but one (the one that is kept being chosen in a non-deterministic manner).
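The three behaviours can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the plugin's code: the policy names (`merge`, `drop-all`, `keep-one`), the `Comment` type and the space used as a separator when merging are all hypothetical. It also shows the drawback raised earlier in the thread: once merged, the cross-references are pooled and can no longer be traced back to the comment they annotated.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Comment:
    text: str
    xrefs: list = field(default_factory=list)  # cross-references annotating the comment


def resolve_comments(comments, policy):
    """Apply one of three hypothetical policies when a class has >1 comment."""
    if len(comments) <= 1:
        return list(comments)  # already legal OBO, nothing to do
    if policy == "merge":
        text = " ".join(c.text for c in comments)  # separator is an assumption
        xrefs = [x for c in comments for x in c.xrefs]  # attribution is lost here
        return [Comment(text, xrefs)]
    if policy == "drop-all":
        return []
    if policy == "keep-one":
        return [random.choice(comments)]  # non-deterministic, as described above
    raise ValueError(f"unknown policy: {policy!r}")


cs = [Comment("Editor note.", ["EX:ref1"]), Comment("Auto-generated note.", ["EX:ref2"])]
print(resolve_comments(cs, "merge"))
```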

> I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

I personally wouldn’t care (as I don’t care much about the OBO format generally), but that’s noted.

But if we ever get to the point that there is a need to merge labels and/or definitions, then something must have gone horribly wrong long before the conversion to OBO, and it’d be much better to fix the root cause than trying to hide the dust under the rug during the OBO conversion.

> For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Soooo coool! Great. Eventually you will need to start thinking of the scope of your robot plugin, I think it goes way beyond Uberon already! But no worries, I can rename it to "gouttegd-tools" locally if I so wish it :D

> it’d be much better to fix the root cause

Exactly!

> But no worries, I can rename it to "gouttegd-tools" locally if I so wish it

That is of course completely possible given the way the ROBOT plugins work, but that’s something that would be best avoided IMHO. If everybody starts renaming plugins locally according to their own wishes, it’s going to be a mess. I’d recommend that whenever a plugin is used, it is used under its original name, to minimise confusion.

The fact that a given plugin was initially named after a particular project because it was originally specifically intended for that project should not be a problem, and is not reason enough IMO to rename the plugin.

The FlyBase plugin was initially intended solely for FlyBase’s ontologies (hence its name), and now CL is also using it. Is that reason enough to rename it? I don’t think so.

Fine by me :)

matentzn

Really nice to finally see a path to OBO repair coming up! Thanks!!

dosumis · 2024-12-16T14:40:36Z

@cmungall - last opportunity to comment as we need to merge ASAP for CL release this week. Thanks!

cmungall · 2024-12-17T01:58:13Z

I think practicality and simplicity dictate that we strip the GCIs from OBO format, so I approve the PR.

Non-normative comments follow:

I do think this is different from stripping the owl-axioms header; it is a more surprising difference. Usually the axioms that get stripped with owl-axioms are random artefacts, imports from other ontologies trying to be more clever with rococo axioms, etc. In contrast, the GCIs are usually a best attempt to capture context-dependent biology, and are documented approaches to problems like taxon-specificity of relationships (in the case of Uberon; in the particular case that prompted this in CL, I don't think they are so useful).

To solve the broader class of problems we need LPGs with simple and clear semantics: clear ways to indicate contextual or quoted statements, with defined graph projections to OWL. OBO format succeeded because people want a simple graph format for ontologies and KGs, but the direct coupling with OWL has been a mixed blessing.

gouttegd added 2 commits December 12, 2024 14:44

Commit re-generated Makefile.

594bd17

gouttegd self-assigned this Dec 12, 2024

gouttegd requested review from cmungall, dosumis and matentzn December 12, 2024 15:47

matentzn reviewed Dec 12, 2024


matentzn approved these changes Dec 13, 2024


cmungall mentioned this pull request Dec 17, 2024

[Typo/Bug]CL:0000163 CL:0000164 #2856

Closed

cmungall approved these changes Dec 17, 2024


gouttegd merged commit 1fec991 into master Dec 17, 2024
1 check passed

gouttegd deleted the customize-obo-products branch December 17, 2024 10:41

aleixpuigb mentioned this pull request Jan 7, 2025

Error during CL release with uberon:obo-export #2877

Closed
