Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customise OBO products #2860

Merged
merged 2 commits into from
Dec 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions src/ontology/Makefile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# ----------------------------------------
# Makefile for cl
# Generated using ontology-development-kit
# ODK Version: v1.5.3
# ODK Version: v1.5.4
# ----------------------------------------
# IMPORTANT: DO NOT EDIT THIS FILE. To override default make goals, use cl.Makefile instead

Expand All @@ -10,7 +10,7 @@
# More information: https://github.com/INCATools/ontology-development-kit/

# Fingerprint of the configuration file when this Makefile was last generated
CONFIG_HASH= fe26dd231ab531dab5609c794de598d2a28083168916941a3276fe672cfe8be2
CONFIG_HASH= b8fef012ac2d7e3df8e8871cfa305bf6b9d5b6a56394a042bdf48a83d3d2e019


# ----------------------------------------
Expand Down Expand Up @@ -46,7 +46,7 @@ REPORT_PROFILE_OPTS =
OBO_FORMAT_OPTIONS =
SPARQL_VALIDATION_CHECKS = equivalent-classes owldef-self-reference nolabels pmid-not-dbxref obsolete-replaced_by obsolete-alt-id orcid-contributor illegal-annotation-property label-synonym-polysemy illegal-date def-not-only-xref id-format
SPARQL_EXPORTS = cl_terms cl-edges cl-synonyms cl-xrefs cl-def-xrefs
ODK_VERSION_MAKEFILE = v1.5.3
ODK_VERSION_MAKEFILE = v1.5.4

TODAY ?= $(shell date +%Y-%m-%d)
OBODATE ?= $(shell date +'%d:%m:%Y %H:%M')
Expand Down Expand Up @@ -156,7 +156,7 @@ $(ROBOT_PLUGINS_DIRECTORY)/flybase.jar:
curl -L -o $@ https://github.com/FlyBase/flybase-robot-plugin/releases/download/flybase-robot-plugin-0.2.2/flybase.jar

$(ROBOT_PLUGINS_DIRECTORY)/uberon.jar:
curl -L -o $@ https://github.com/gouttegd/uberon-robot-plugin/releases/download/uberon-robot-plugin-0.3.1/uberon.jar
curl -L -o $@ https://github.com/gouttegd/uberon-robot-plugin/releases/download/uberon-robot-plugin-0.3.3/uberon.jar


# ----------------------------------------
Expand Down
2 changes: 1 addition & 1 deletion src/ontology/cl-odk.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ robot_plugins:
- name: flybase
mirror_from: https://github.com/FlyBase/flybase-robot-plugin/releases/download/flybase-robot-plugin-0.2.2/flybase.jar
- name: uberon
mirror_from: https://github.com/gouttegd/uberon-robot-plugin/releases/download/uberon-robot-plugin-0.3.1/uberon.jar
mirror_from: https://github.com/gouttegd/uberon-robot-plugin/releases/download/uberon-robot-plugin-0.3.3/uberon.jar
subset_group:
products:
- id: BDS_subset
Expand Down
34 changes: 32 additions & 2 deletions src/ontology/cl.Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -175,10 +175,10 @@ test: $(REPORTDIR)/taxon-constraint-check.txt
# ----------------------------------------

.PHONY: obocheck
obocheck: $(SRC)
obocheck: $(SRC) | all_robot_plugins
$(ROBOT) merge -i $(SRC) \
remove --base-iri $(URIBASE)/CL_ --axioms external --trim false \
convert -f obo --check false -o $(TMPDIR)/cl-check.obo
uberon:obo-export --merge-comments --obo-output $(TMPDIR)/cl-check.obo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my woooooooord - I would love a generic OBO "fixer" tool that does things like merging comments and dropping duplicate labels! This should really be in ROBOT but hey! Plugin is just as well!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will certainly consider upstreaming that to ROBOT at some point. At the very least for the --strip-owl-axioms option (removing the untranslatable axioms), since (1) the ODK already does that systematically, so presumably everybody is already fine with that behaviour, and (2) the OWLAPI already provides the required option to enable that behaviour, it’s just that ROBOT is not using it.

fastobo-validator $(TMPDIR)/cl-check.obo

test_obsolete: $(ONT).obo
Expand Down Expand Up @@ -258,6 +258,36 @@ pattern_docs: $(ALL_PATTERN_FILES)
-o ../../docs/patterns/


# ----------------------------------------
# CUSTOM OBO OUTPUT
# ----------------------------------------
OBO_EXPORT_OPTIONS = --merge-comments --strip-gci-axioms --strip-owl-axioms

$(SUBSETDIR)/%.obo: $(SUBSETDIR)/%.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT).obo: $(ONT).owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format). I am ok with this suggestion, but I would ask @cmungall for a review, and consider the alternative, which is that --merge-comments is applied during preprocessing (was it called SRCMERGED?). Of course, I don't have anything against --strip-gci-axioms --strip-owl-axioms and in fact, I would kinda like that in ODK over grep -v.

Copy link
Contributor

@dosumis dosumis Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me a bit queasy too, but round-tripping is already broken for OBO products as untranslatable OWL axioms are stripped from header. I'm not aware of any products that use the OBO GCI axioms @cmungall - please yell if I'm wrong.

They are mainly an irritation as we get frequent tickets from people confused by them. I have a slight preference that comments get merged in all products but went with this as @gouttegd indicated it was much more straightforward for the pipelines.

Copy link
Collaborator Author

@gouttegd gouttegd Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concerns with this approach is that we artificially break isomorphicity between owl and obo products (on content supported by OBO format).

As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

I personally couldn’t care less about that. From my point of view, the OBO format, for all its benefits, is a legacy format and people who choose to use an OBO artefact over any other format should already be aware that they may not get the full picture of what the ontology contains.

but I would ask @cmungall for a review

You may notice that this is exactly what I did. :)

consider the alternative, which is that --merge-comments is applied during preprocessing

Disagree. For two reasons.

First, merging the comments make them slightly less practical, because the annotations on those comments (cross-references) are merged as well, and there is no longer any way to know which cross-reference applies to which part of the comment. Okay, it may not be a big deal, but why should we make comments less useful in all artefacts in all formats just to accommodate the idiosyncrasies of one particular format?

Second, merging the comments early in the pipeline would not in fact be as easy as it sounds, because in this instance the comments are spread over two different files (comments from the editors in cl-edit.owl, auto-generated comments from the CellMark folks in the CLM-CL component). So we would need to merge the CLM-CL component first. OK, we could do that in preprocessing, but then we would need to leave that component out of $(OTHER_SRC) (basically stop treating as an ODK “component”, and more as a completely special file) -- otherwise it would be merged again later, along with its original comments.

And no, using $(SRCMERGED) is of no use, because $(SRCMERGED) is never actually used to build any release artefacts. It is used for some checks and for extracting the import seeds, nothing more.

I did in fact consider the alternative and try merging the comments early in the pipeline, and soon concluded that it was way too much of a hassle. Feel free to have a go at it if you want, but my position is that late merging specifically for the OBO products is the option that is the least disruptive to the existing pipelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said elsewhere: that horse has already left the barn. It left the barn at the latest when the ODK started stripping untranslatable axioms in all OBO products – a behaviour that is not configurable.

Note my emphasis on "isomorphic (on content supported by OBO format)". Deleting axioms that cant be rendered in the format is a bit different than merging.

I am thoroughly convinced by your argumentation, and I want to add that "rdfs:comment" is really not a thing we need to concern ourselves with too much - merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

Just for future: I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

Thanks for the explanations!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge-comments is probably better than "drop random one", which is the alternative if the goal is "legal OBO".

For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Basically with the next version of the Uberon plugin it will be possible to choose what happens when a class has more than one comment:

  • merge them into a single one;
  • discard them all;
  • discard all but one (the one that is kept being chosen in a non-deterministic manner).

Copy link
Collaborator Author

@gouttegd gouttegd Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue strongly against merging key properties like label and definition. Hopefully this is never going to be necessary!

I personally wouldn’t care (as I don’t care much about the OBO format generally), but that’s noted.

But if we ever get to the point that there is a need to merge labels and/or definitions, then something must have gone horribly wrong long before the conversion to OBO, and it’d be much better to fix the root cause than trying to hide the dust under the rug during the OBO conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, I have implemented the "drop random one" behaviour as an option to obo-export, should someone want that behaviour.

Soooo coool! Great. Eventually you will need to start thinking of the scope of your robot plugin, I think it goes way beyond Uberon already! But no worries, I can rename it to "gouttegd-tools" locally if I so wish it :D

it’d be much better to fix the root cause

Exactly!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no worries, I can rename it to "gouttegd-tools" locally if I so wish it

That is of course completely possible given the way the ROBOT plugins work, but that’s something that would be best avoided IMHO. If everybody starts renaming plugins locally according to their own wishes, it’s going to be a mess. I’d recommend that whenever a plugin is used, it is used under its original name, to minimise confusion.

The fact that a given plugin was initially named after a particular project because it was originally specifically intended for that project should not be a problem, and is not reason enough IMO to rename the plugin.

The FlyBase plugin was initially intended solely for FlyBase’s ontologies (hence its name), and now CL is also using it. Is that reason enough to rename it? I don’t think so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me :)


$(ONT)-base.obo: $(ONT)-base.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT)-full.obo: $(ONT)-full.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT)-simple.obo: $(ONT)-simple.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT)-basic.obo: $(ONT)-basic.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

$(ONT)-non-classified.obo: $(ONT)-non-classified.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@

cl-plus.obo: cl-plus.owl | all_robot_plugins
$(ROBOT) uberon:obo-export --input $< $(OBO_EXPORT_OPTIONS) --obo-output $@


# ----------------------------------------
# DIFFS/REPORTS
# ----------------------------------------
Expand Down
Loading