Consider using SHACL as a schema language #10

Open
MFSY opened this issue Aug 5, 2018 · 2 comments

MFSY commented Aug 5, 2018

Hi guys,
This initiative is great. Describing various neuroscience datasets so that they can be easily shared and discovered over the web is a great idea.

I want to raise awareness of the INCF/Neuroshapes project and I'm looking for synergies here.

The INCF/Neuroshapes project definitely shares some of your goals (describing and exploring neuroscience datasets through REST APIs and the web) and takes more or less the same approach, with slight differences: it uses JSON-LD (instead of plain JSON) and W3C SHACL as a schema language (instead of JSON Schema).

The differences are not that big:

  • A JSON-LD document is a JSON document to which a context object is added to disambiguate its content. Projects like schema.org and bioschemas.org use it a lot. When it comes to sharing, linking and connecting data over the web and through APIs, JSON-LD seems to be the way to go.
  • W3C SHACL is a superset of JSON Schema in terms of possible constraints, even if their syntaxes are quite different. There is a SHACL version of schema.org, and Bioschemas uses ShEx, which is very close to SHACL. SHACL can operate on a JSON-LD document and on any RDF document.
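
To make the first point concrete, here is a minimal sketch of how a context turns plain JSON into JSON-LD. It uses only the Python standard library, not a real JSON-LD processor. The record, the `example.org` IRI, and the helper names `to_jsonld` and `expand_keys` are assumptions made up for illustration; only `http://schema.org/name` is an existing term.

```python
import json

# Plain JSON record: the keys are just strings with no shared meaning.
plain = {"name": "Mouse cortex recordings", "species": "Mus musculus"}

# Hypothetical context mapping each key to an IRI.
# The example.org IRI is a placeholder, not a real vocabulary.
context = {
    "name": "http://schema.org/name",
    "species": "http://example.org/vocab/species",
}


def to_jsonld(record, ctx):
    """Return a JSON-LD document: the same JSON plus an @context."""
    return {"@context": ctx, **record}


def expand_keys(doc):
    """Naively replace each key with the IRI its context assigns to it.

    Handles only flat documents with simple term-to-IRI contexts;
    a real JSON-LD processor does much more than this.
    """
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}


doc = to_jsonld(plain, context)
print(json.dumps(doc, indent=2))
print(expand_keys(doc))
```

The original keys and values are untouched, so any consumer that only understands plain JSON can still read the document, while the context lets two datasets agree that their `name` fields mean the same thing.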

In Neuroshapes, I'm actively working on a schema called MINDS (Minimal Information about a Neuroscience DataSet). I have a pull request (not finalised yet) in Neuroshapes showing what it may look like. The question we are trying to answer with MINDS is: is there a minimal set of information one needs to know about a neuroscience dataset in order to reuse it (we are thinking of subject, brain location, contributors, ...)?

I'll be happy to join you guys at brainhackmtl 2018 in Montreal to work on MINDS and to produce SHACL schemas for the data formats you are working on.

Let me know if you're interested and see you during the hackathon.

gkiar (Member) commented Aug 5, 2018

Hey @MFSY !

Thanks for the comment, and I totally agree, that would be great. While I'm certainly aware of SHACL and JSON-LD, I must admit I was a bit intimidated by them at first glance, so perhaps at the hackathon you can teach me a bit about them, and help me build a parser for this data into Apine! How does that sound?

To ensure one part of this project is clear, the idea is to have datasets be easily queryable based on a schema, not to define any particular schema. I think MINDS is a fantastic use-case of this tool, and would be an awesome excuse to build in support for SHACL/JSON-LD schemas.

I'm also planning to add support for interpreting CSV headers naively this week (for @leiliew and some exciting data she has!), and I think these three forms (including the already-present JSON schema) cover 80% of schema/data definitions, so that'd be great!
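
As a rough illustration of what interpreting CSV headers naively could look like, here is a minimal standard-library sketch. It is not Apine's implementation: the function names `naive_schema_from_csv` and `guess_type`, the sample data, and the choice to guess types from the first data row only are all assumptions.

```python
import csv
import io


def guess_type(value):
    """Guess a coarse type label for one CSV cell."""
    for caster, label in ((int, "integer"), (float, "number")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "string"


def naive_schema_from_csv(text):
    """Map each header to a type guessed from the first data row."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    first_row = next(reader, [])
    schema = {}
    for i, name in enumerate(headers):
        # A missing cell is treated as an empty string.
        value = first_row[i] if i < len(first_row) else ""
        schema[name.strip()] = guess_type(value)
    return schema


# Hypothetical sample data.
sample = "subject_id,age,weight_kg,species\nsub-01,12,23.5,Mus musculus\n"
schema = naive_schema_from_csv(sample)
print(schema)
```

Looking at a single row is deliberately crude: a column whose first value happens to be an integer would be mislabelled if later rows contain decimals or text, so a more careful version would scan more rows.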

Let's definitely connect at the hackathon - I'll introduce myself when you show up on Tuesday.

MFSY (Author) commented Aug 7, 2018

Hi @gkiar ,
Thank you for the answer. I can definitely help with SHACL. I'm also interested in getting a SHACL version of BIDS, and perhaps you can help me with that.

See you at the hackathon.
