-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Example of the HED schema library for SCORE implementation #324
base: master
Are you sure you want to change the base?
Conversation
Before this example can be merged, this other pull request in the bids-specificartion that adds the use of an HED schema library bids-specification pull request needs to be merged first. Then the dataset_description.json can be updated to include the SCORE library schema. |
xeeg_hed_score/sub-eegSeizureTUH/ses-eeg01/eeg/sub-eegSeizureTUH_ses-eeg01_channels.tsv
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's nice to see a dataset with the MEF data format. Could you please make sure that all binary data files are zeroed out, that is, that they have 0kb?
crosslinking, this PR needs a spec PR and a following validator PR to be merged first: |
The PRs mentioned in #324 (comment) have meanwhile been merged, so I think you can address some of the reviewer comments above @tpatpa |
I made minor edits to match the latest HED-SCORE library schema. Validation is still failing because the validator is validating against a previous version of the HED-SCORE library schema. We are in the final stages before releasing the schema's latest version. Once released this will be validated and could be merged as well. |
We just released HED-SCORE V1.0.0. If you follow the link make sure to click on "Show another version" to browse. The remaining question for this BIDS-HED-SCORE example is on how to list a set of channels affected by an event. It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings, eg ['C3','F3'] or ['C3-F3','P3-C3'] in a separate channels column. Any preference for these examples? @sappelhoff @robertoostenveld @VisLab @guiomar @CPernet |
congrats Dora, Tal, et al!
yes, this should be a link to the specific discussion thread: gdoc discussion My preference is indeed the "pythonic list of strings" (clarified more in the above linked thread) but apart from Robert and me, nobody seems to have offered an opinion yet. |
+1 list of strings |
Could you clarify what would be meant by: ['C3-F3','P3-C3']? If it means a group of channels as ordered by the Also, given that quotes are usually not used in events files, would [C3, F3] be preferred? One further option, which perhaps the HED working group should discuss separately, is how HED might be used to define named groups of channels which could then be referred to by name. This does not preclude the need for deciding on a format for a channels column in the events files and their derivatives. |
Let me copy the discussion thread from the Gdoc here, perhaps that clarifies some things:
|
I am in favor of this format. I realize that this discussion pertains to derivatives, however, it might also be relevant to events in the main BIDS specification. Right now the BIDS specification of If If Would this new format of column value require a new |
Maybe we can use |
I agree that developing an "HED way" to deal with this problem would be very nice in addition to a "straight forward - non HED" BIDS solution.
you are raising a very important point here, thanks -- I have not considered that my proposal would introduce a new "data type" for rows in TSV files apart from those defined by We should:
In any case, we need some more people to chime in :-)
Not all JSON files have an |
One other possibility would be just to use a string for the channel list rather than an actual list. Then BIDS doesn't have to be concerned about validating necessarily, although it could. "[Cz7, O1, O2]" Downstream tools could then do their own thing to interpret a channels column, parse, and make sure that these are real channels. |
I think a
This may be similar or comparable to a case in the BIDS-connectivity-BEP, for channel-to-channel connectivity metrics where we also want to interpret a column with respect to an _channels.tsv or _electrodes.tsv file (discussing there whether this should be a sourceAtlas, but it is not an actual atlas...). |
For the format of this field a "pythonic list" has the downside of encouraging developers to do an eval which is not a safe practice. A comma separated list (or a single item) would work effectively as a representation. Brackets aren't strictly necessary as these files are a TSV. CCing @christinerogers for EEGnet |
I agree with you -- I guess I proposed it this way to distinguish |
Super, thanks @sappelhoff -- I'll raise this at the BIDS maintainers meeting, since @Andesha 's request for code-agnostic (consistent, comma-separated instead of pythonic) lists helps those making BIDS-supporting data platforms (like EEGNet which is webfacing) across modalities. Has this been discussed elsewhere in the BIDS specs, i.e. is there an implementation cost concern? cc @CPernet Does this make sense to you @arnodelorme @VisLab ? |
Discussing this with a few maintainers we settled on allowing json arrays whose only values are strings as an acceptable solution. The json specification allows json documents to start with a square bracket so any languages implementation of the json parser should be sufficient. We should also be able to represent this in the schema well enough to tip the validators off that elements of the column may need to be parsed. @effigies did I misrepresent anything here? |
Hi @rwblair , |
@rwblair nothing misrepresented, but here's a more extended version: We discussed two possible approaches:
Advantages to CSV:
Advantages to JSON arrays:
|
My apologies for the ambiguity and finality in tone of my post. This was from a chat after the maintainers meeting you presented at. I completely agree with you that it needs more discussion here and in the next maintainers meeting. |
thanks @rwblair - let's continue this detailed point in a more general/accessible place than this channel. cc @sappelhoff happy to follow your lead here. |
Looks good.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good --- but you will need to release SCORE 1.1.1 before this will validate.
Looks good to me, ok after it is validated! |
New example of the HED SCORE schema library used for event annotations in several EEG and invasive EEG datasets. Dataset validated with [email protected] and PENDING validation of HED annotations with HED python tools.
Annotations validates with example notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids_validate_dataset_with_libraries.ipynb) and HED python tools (https://pypi.org/project/hedtools/)
[ERR] The validation on this HED string returned an error. (code: 104 - HED_ERROR) -- HED validated separately
- dataset_description.json fixed and waiting for [bids-specification pull request](bids-standard/bids-specification#1106) - units added to _channels.tsv for TUH subjects - Files truncated via [terminal](https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size)
Correct annotations to match the latest prerelease library schema.
Example validated following latest schema release (HED score_1.0.0) and tools update using HED-examples jupyter notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids/bids_validate_dataset_with_libraries.ipynb).
Array data in .tsv cells discussion is converging towards a comma-separated list (possibly wrapped delimiters list), so the example was edited accordingly - combined rows representing the same event with a list of channels in the channel column. See full discussion here: bids-standard/bids-specification#1446
Channel column in _events.tsv should match the channel name so updated _channels.tsv to accommodate 2 different references.
Adding 'SourceDatasets' per comment (bids-standard#324 (comment))
Updated example to use the HED-SCORE schema in its partnered version (score_1.1.0). - changed specified HEDVersion in dataset_description.json - removed HED-SCORE prefix (sc:) - changed column name from 'HED' to 'HED_annotations' since the use of 'HED' is only allowed as a second-level child in these files. The example was validated with bids-validator and HED jupyter notebook - BIDS data set that uses partnered schema has no HED validation errors.
Removing Other-organized-rhythm/non-interesting events (bckg) annotations since the background is considered baseline. If we end up with empty files they are removed.
corrected bids-validator error
corrected bids-validator error
Trailing " removed
Corrected mix-up with the TUH dataset stop_time and duration.
Updated example to use the HED-SCORE schema with score_1.1.1 - changed specified HEDVersion in dataset_description.json - changed event annotations to match labels used for annotating the Temple University Hospital EEG Corpus (TUEG) and their corresponding SCORE HED library schema annotations.
@tpatpa I rebased your branch on NOTE if you have uncommitted local changes on your branch, do not run the above ☝️ ... just comment here in that case :-) And in the future, it would be good if you could make a new git branch before submitting a pull request. Submitting pull requests directly from your |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI fails with the following issue:
1: [ERR] The validation on this HED string returned an error. (code: 104 - HED_ERROR)
./dataset_description.json
Evidence: ERROR: [SCHEMA_LOAD_FAILED] Could not load HED schema "{"nickname":"","version":"1.1.1","library":"score","localPath":""}" from remote repository - "XMLHttpRequest is not defined". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#schema-load-failed.)
Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.
cc @tpatpa @VisLab @dorahermes
Are you also sharing the full data of this example somewhere? For example on OSF or GIN?
Are the channel names such as FP1-F7
also how they occur in the .edf
files? (in the full data, of course)
Furthermore: Why is this example called xeeg
with an x
, instead of just eeg
?
This PR will also need to add the new example to our example index. I can do that for you, however, once we have resolved the above issue.
After that, I'd be happy to merge this.
We're having a chicken and egg problem here... The HED version must be a released version, but SCORE 1.1.1 has not been released yet due to unresolved issues. @dorahermes can you review 1.1.1 and we can discuss the issues offline and move towards release. |
This PR includes examples of the HED schema library for the Standardized Computer-based Organized Reporting of EEG (SCORE) implementation in several EEG and invasive EEG datasets in the two ways to annotate HED in BIDS.
BIDS dataset was validated with [email protected] and HED annotations were separately validated with the example jupyter notebook and the hedtools that will be incorporated in the bids-validator soon.