Create an EnsureMappings() constraint #283

Open
adswa opened this issue Mar 3, 2023 · 3 comments

Member

adswa commented Mar 3, 2023

EnsureMapping only takes a mapping with a single key-value pair. A check for required keys in meta-add would need a similar constraint that can take multiple key-value pairs.
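
To make the request more concrete, here is a minimal, self-contained sketch of what a multi-key constraint could do. It is plain Python and not part of datalad_next: the name `ensure_mappings` and the idea of passing one callable per required key are assumptions for illustration only.

```python
from typing import Any, Callable, Mapping


def ensure_mappings(
    constraints: Mapping[str, Callable[[Any], Any]],
) -> Callable[[Mapping[str, Any]], dict]:
    """Build a validator that applies one callable per required key.

    Hypothetical helper, not an existing datalad_next constraint.
    """
    def validate(value: Mapping[str, Any]) -> dict:
        # Every key that has a constraint is treated as required.
        missing = [key for key in constraints if key not in value]
        if missing:
            raise ValueError(f"missing required keys: {missing}")
        # Copy the input so keys without a constraint pass through untouched.
        result = dict(value)
        for key, constraint in constraints.items():
            result[key] = constraint(value[key])
        return result

    return validate


check = ensure_mappings({"type": str, "extraction_time": float})
record = check(
    {"type": "file", "extraction_time": "1677600832.9", "path": "some.txt"}
)
```

In this sketch, `record` keeps the unconstrained `path` entry, and `extraction_time` is converted by its per-key callable.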

Member Author

adswa commented Mar 7, 2023

I am struggling to piece this together from smaller constraints. I feel like I might just be missing an important piece of information to get this going.

Here's an example input:

{'type': 'file',
 'dataset_version': 'c81cf6189e2a87d3a0ad3c19e3eb6e8f3c84c121',
 'path': 'someii.txt',
 'extractor_name': 'metalad_example_file',
 'extractor_version': '0.0.1',
 'extraction_parameter': {},
 'extraction_time': 1677600832.9462137,
 'agent_name': 'Adina Wagner',
 'agent_email': '[email protected]',
 'extracted_metadata': {'@id': 'datalad:SHA1-s14--45fccab8df4dfa8e7077260ad45771bd25090bef',
  'type': 'file',
  'path': 'some.txt',
  'content_byte_size': 0,
  'comment': 'example file extractor executed at 1677600832.9461908'}}

The output needs to stay like this, but I require validation that a certain set of keys is included in the dict.
I could have EnsureTupleOf extract and return the keys from this dict, and EnsureContainsSet (proposed in #284) could validate that they indeed contain all required keys like this:


In [36]: from datalad_next.constraints import EnsureTupleOf, NoConstraint, EnsureContainsSet

In [37]: required_keys = (
    ...:         "type",
    ...:         "extractor_name",
    ...:         "extractor_version",
    ...:         "extraction_parameter",
    ...:         "extraction_time",
    ...:         "agent_name",
    ...:         "agent_email",
    ...:         "dataset_id",
    ...:         "dataset_version",
    ...:         "extracted_metadata")

In [38]: metadata = {
    ...:     "type": "file",
    ...:     "dataset_id": "fe46d177-d92a-4f0b-beac-8655e1aec75d",
    ...:     "dataset_version": "c81cf6189e2a87d3a0ad3c19e3eb6e8f3c84c121",
    ...:     "path": "some.txt",
    ...:     "extractor_name": "metalad_example_file",
    ...:     "extractor_version": "0.0.1",
    ...:     "extraction_parameter": {},
    ...:     "extraction_time": 1678198812.9966538,
    ...:     "agent_name": "Adina Wagner",
    ...:     "agent_email": "[email protected]",
    ...:     "extracted_metadata": {
    ...:         "@id": "datalad:SHA1-s14--45fccab8df4dfa8e7077260ad45771bd25090bef",
    ...:         "type": "file",
    ...:         "path": "some.txt",
    ...:         "content_byte_size": 0,
    ...:         "comment": "example file extractor executed at 1678198812.9966106"}}

In [39]: EnsureContainsSet(*required_keys)(EnsureTupleOf(item_constraint=NoConstraint())(metadata))
Out[39]: 
('type',
 'dataset_id',
 'dataset_version',
 'path',
 'extractor_name',
 'extractor_version',
 'extraction_parameter',
 'extraction_time',
 'agent_name',
 'agent_email',
 'extracted_metadata')

However, at this point the returned value contains only the keys, not the original dict. Do you see a way for me to perform such checks and preserve the JSON/dict structure?
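
For reference, the behavior I am after could be sketched as a callable that checks key membership and then hands back the input object itself. This is plain Python with no datalad_next imports; the class name `EnsureHasKeys` is hypothetical, and the sample record is a shortened version of the one above.

```python
class EnsureHasKeys:
    """Hypothetical constraint: verify required keys, return the dict as-is."""

    def __init__(self, *required):
        self._required = frozenset(required)

    def __call__(self, value):
        if not isinstance(value, dict):
            raise ValueError(f"expected a dict, got {type(value).__name__}")
        # Iterating a dict yields its keys, so this is a key-only comparison.
        missing = self._required.difference(value)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        # Return the very same object: nested values are neither copied
        # nor reduced to a tuple of keys.
        return value


required_keys = ("type", "dataset_id", "extractor_name", "extracted_metadata")

metadata = {
    "type": "file",
    "dataset_id": "fe46d177-d92a-4f0b-beac-8655e1aec75d",
    "path": "some.txt",
    "extractor_name": "metalad_example_file",
    "extracted_metadata": {"type": "file", "content_byte_size": 0},
}

validated = EnsureHasKeys(*required_keys)(metadata)
```

The difference to the `EnsureTupleOf` chain is that the key check only inspects the value and does not transform it, so the nested `extracted_metadata` survives validation.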

Member Author

adswa commented Mar 7, 2023

NB: One perfectly fine outcome of this exploration could also be that a use case like this is out of scope for the parameter constraint validation system.

Member Author

adswa commented Mar 8, 2023

See also datalad/datalad-gooey#311
