Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customise OBO products #2860

Merged
merged 2 commits into from
Dec 17, 2024
Merged

Customise OBO products #2860

merged 2 commits into from
Dec 17, 2024

Conversation

gouttegd
Copy link
Collaborator

This PR add a couple of tweaks to CL’s OBO artefacts:

  • when a class has several rdfs:comment annotations (something that is not allowed in OBO), the comments are merged into a single annotation (requested in Addresses #2786 NTR CD57-positive enterocyte #2847, because one of the components we merge may contain automatically generated comments, which would then cause the resulting OBO files to be invalid if CL’s -edit file also happened to contain a comment for the same terms);
  • GCI axioms are stripped entirely (requested in [Typo/Bug]CL:0000163 CL:0000164 #2856, on the grounds that they are a needless complication -- users of OBO artefacts typically expect simple SubClassOf relationships and may not be equipped to deal with GCI axioms).

This is done by a new obo-export command recently added to the Uberon ROBOT plugin (code here for the technically inclined who would want to have a look).

Since we now use a dedicated command to produce OBO files, while we’re at it that same command also takes care of removing the owl-axioms tag (containing all the “untranslatable“ axioms, that is the axioms that cannot be represented in pure OBO). This was normally done in the ODK by a call to grep:

$(ROBOT) convert --input $< --check false -f obo $(OBO_FORMAT_OPTIONS) -o $@.tmp.obo && grep -v ^owl-axioms $@.tmp.obo > $@ && rm $@.tmp.obo

but is now done by the obo-export command directly.

Use a new command in the Uberon ROBOT plugin to produce "customized" OBO
artefacts in which:

* all "untranslatable" OWL axioms (owl-axioms tag) are stripped (they
  were already stripped before, this just changes the way we do it);
* all GCI axioms are stripped (#2856);
* when a class has several rdfs:comment annotations (which is not
  allowed in OBO), they are merged into a single annotation (#2847).
@gouttegd gouttegd self-assigned this Dec 12, 2024
$(ROBOT) merge -i $(SRC) \
remove --base-iri $(URIBASE)/CL_ --axioms external --trim false \
convert -f obo --check false -o $(TMPDIR)/cl-check.obo
uberon:obo-export --merge-comments --obo-output $(TMPDIR)/cl-check.obo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my woooooooord - I would love a generic OBO "fixer" tool that does things like merging comments and dropping duplicate labels! This should really be in ROBOT but hey! Plugin is just as well!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will certainly consider upstreaming that to ROBOT at some point. At the very least for the --strip-owl-axioms option (removing the untranslatable axioms), since (1) the ODK already does that systematically, so presumably everybody is already fine with that behaviour, and (2) the OWLAPI already provides the required option to enable that behaviour, it’s just that ROBOT is not using it.

$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT).obo: $(ONT).owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format). I am ok with this suggestion, but I would ask @cmungall for a review, and consider the alternative, which is that --merge-comments is applied during preprocessing (was it called SRCMERGED?). Of course, I don't have anything against --strip-gci-axioms --strip-owl-axioms and in fact, I would kinda like that in ODK over grep -v.

Copy link
Contributor

@dosumis dosumis Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me a bit queasy too, but round-tripping is already broken for OBO products as untranslatable OWL axioms are stripped from header. I'm not aware of any products that use the OBO GCI axioms @cmungall - please yell if I'm wrong.

They are mainly an irritation as we get frequent tickets from people confused by them. I have a slight preference that comments get merged in all products but went with this as @gouttegd indicated it was much more straightforward for the pipelines.

Copy link
Collaborator Author

@gouttegd gouttegd Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format).

As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

I personally couldn’t care less about that. From my point of view, the OBO format, for all its benefits, is a legacy format and people who choose to use an OBO artefact over any other format should already be aware that they may not get the full picture of what the ontology contains.

but I would ask @cmungall for a review

You may notice that this is exactly what I did. :)

consider the alternative, which is that --merge-comments is applied during preprocessing

Disagree. For two reasons.

First, merging the comments make them slightly less practical, because the annotations on those comments (cross-references) are merged as well, and there is no longer any way to know which cross-reference applies to which part of the comment. Okay, it may not be a big deal, but why should we make comments less useful in all artefacts in all formats just to accommodate the idiosyncrasies of one particular format?

Second, merging the comments early in the pipeline would not in fact be as easy as it sounds, because in this instance the comments are spread over two different files (comments from the editors in cl-edit.owl, auto-generated comments from the CellMark folks in the CLM-CL component). So we would need to merge the CLM-CL component first. OK, we could do that in preprocessing, but then we would need to leave that component out of $(OTHER_SRC) (basically stop treating as an ODK “component”, and more as a completely special file) -- otherwise it would be merged again later, along with its original comments.

And no, using $(SRCMERGED) is of no use, because $(SRCMERGED) is never actually used to build any release artefacts. It is used for some checks and for extracting the import seeds, nothing more.

I did in fact consider the alternative and try merging the comments early in the pipeline, and soon concluded that it was way too much of a hassle. Feel free to have a go at it if you want, but my position is that late merging specifically for the OBO products is the option that is the least disruptive to the existing pipelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

Note my emphasis on "isomorphic (on content supported by OBO format)". Deleting axioms that cant be rendered in the format is a bit different than merging.

I am thoroughly convinced by your argumentation, and I want to add that "rdfs:comment" is really not a thing we need to concern ourselves with too much - merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

Just for future: I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

Thanks for the explanations!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Basically with the next version of the Uberon plugin it will be possible to choose what happens when a class has more than one comment:

  • merge them into a single one;
  • discard them all;
  • discard all but one (the one that is kept being chosen in a non-deterministic manner).

Copy link
Collaborator Author

@gouttegd gouttegd Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

I personally wouldn’t care (as I don’t care much about the OBO format generally), but that’s noted.

But if we ever get to the point that there is a need to merge labels and/or definitions, then something must have gone horribly wrong long before the conversion to OBO, and it’d be much better to fix the root cause than trying to hide the dust under the rug during the OBO conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Soooo coool! Great. Eventually you will need to start thinking of the scope of your robot plugin, I think it goes way beyond Uberon already! But no worries, I can rename it to "gouttegd-tools" locally if I so wish it :D

it’d be much better to fix the root cause

Exactly!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no worries, I can rename it to "gouttegd-tools" locally if I so wish it

That is of course completely possible given the way the ROBOT plugins work, but that’s something that would be best avoided IMHO. If everybody starts renaming plugins locally according to their own wishes, it’s going to be a mess. I’d recommend that whenever a plugin is used, it is used under its original name, to minimise confusion.

The fact that a given plugin was initially named after a particular project because it was originally specifically intended for that project should not be a problem, and is not reason enough IMO to rename the plugin.

The FlyBase plugin was initially intended solely for FlyBase’s ontologies (hence its name), and now CL is also using it. Is that reason enough to rename it? I don’t think so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me :)

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice to see finally a path to OBO repair coming up! Thanks!!

@dosumis
Copy link
Contributor

dosumis commented Dec 16, 2024

@cmungall - last opportunity to comment as we need to merge ASAP for CL release this week. Thanks!

@cmungall
Copy link
Member

I think practicality and simplicity dictates that we strip the GCIs from obo format, so I approve the PR.

Non-normative comments follow:

I do think this is different from the case of stripping the owl-axioms header, this is a more surprising difference. Usually the axioms that get stripped with owl-axioms are random artefacts, imports from other ontologies trying to be more clever with rococo axioms, etc. In contrast the GCIs are usually a best attempt to capture context-dependent biology, and are documented approaches to problems like taxon-specificity of relationships (in the case of Uberon - in the particular case that prompted this in CL, I don't think they are so useful).

To solve the broader class of problems we need LPGs with simple and clear semantics. Clear ways to indicate contextual or quoted statements, with defined graph projections to OWL. The reason for the success of obo format was because people want a simple graph format for ontologies and KGs, but the direct coupling with OWL has been a mixed blessing.

@gouttegd gouttegd merged commit 1fec991 into master Dec 17, 2024
1 check passed
@gouttegd gouttegd deleted the customize-obo-products branch December 17, 2024 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants