Data processing lifecycle of `EDTFField`? #17

jmurty · 2017-06-05T06:42:56Z

We need to give some thought to the lifecycle of data going into the EDTFField.

Initial questions are:

1. when should the EDTF field be updated based on the value of the associated natural_text_field?
   1. if the natural_text_field is cleared, presumably the EDTF and other derived field values should also be cleared?
   2. if the natural_text_field value is invalid or cannot be turned into an EDTF, should the existing EDTF and other derived field values be cleared?
   3. since parsing of natural_text_field is very slow, could we somehow cache the before-and-after values and only re-parse it when it changes?
2. can/should we permit direct interaction with the EDTFField?
   1. the field is intended to be used only via automatic processing of the natural_text_field source field, but this is both slow and likely to lead to less precise data if there is already a known EDTF value that could be set directly (e.g. via the GLAMkit Collections API)
   2. if the EDTF field is set directly and the natural_text_field is not set, should this be reverse-generated from the EDTF value?
   3. what if the EDTF field is set directly, then a natural_text_field value is set subsequently? Should the EDTF value always be replaced by the parsed natural_text_field value?
   4. should we permit, and do we need to detect or handle, the other derived fields like lower_fuzzy_field being set directly instead of via the EDTFField's processing?


cogat · 2017-06-06T02:01:56Z

Great points @jmurty

I think the upshot is we need to know for an EDTFField whether it was automatically or manually populated, a bit like Occurrences for events.

I envisage this as an associated checkbox called "update automatically", default True. Given that, my answers to your questions are:

1. if unchecked, never. If checked, whenever the natural_text_field is changed.
   i. yes
   ii. yes
   iii. actually, parsing of natural text into an EDTF string is fast; parsing of an EDTF string into a Python object is slow. I think the best bang-for-buck is to speed up the parser. In the BNF grammar I implemented, there are a lot of very pedantic definitions of which strings of digits can specify a month, etc. I think relaxing these low-level constraints and validating them in Python will speed things up heaps. In answer to the question, comparing before-and-after EDTF strings seems reasonable. It would probably even be a speed-up to retrieve the object from the DB at pre-save and check that.
2. If unchecked, yes. If checked, then perhaps yes, provided the checkbox becomes unchecked when you interact directly with the field; otherwise no.
   i. agreed
   ii. in future it should be possible to derive a natural text description of an EDTF field. This is likely to be locale/client-specific though. I don't see a great need to store the description, and especially not in natural_text_field, where it could cause feedback loops.
   iii. checkbox should remove this possibility
   iv. the intention is that those fields are always derived from the EDTF field. Some collections systems have their own lower/upper bound dates. I doubt any such collections systems ALSO handle EDTF, so in those cases we can construct an EDTF date from the lower/upper bounds, which will get us the same derived dates. The difficulty arises when the lower/upper and natural dates contradict, which should be resolved by the above.
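
To make the proposed rules concrete, here is a minimal sketch of the decision logic in plain Python. It is not the real `EDTFField` code: `DateRecord`, `apply_lifecycle` and the `text_to_edtf` callable are hypothetical names used only for illustration, and the "previous" text is assumed to have been fetched from the DB before saving.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DateRecord:
    """Hypothetical stand-in for a model row; not the real EDTFField API."""
    natural_text: str = ""
    edtf: Optional[str] = None
    update_automatically: bool = True  # the proposed checkbox, default True


def apply_lifecycle(
    record: DateRecord,
    previous_natural_text: Optional[str],
    text_to_edtf: Callable[[str], Optional[str]],
) -> bool:
    """Update record.edtf in place; return True if the text was re-parsed."""
    if not record.update_automatically:
        return False  # manually populated: never overwrite the EDTF value
    if record.natural_text == previous_natural_text:
        return False  # unchanged text: skip the re-parse
    if not record.natural_text.strip():
        record.edtf = None  # cleared text clears the EDTF value
        return False
    # A parser that returns None for unparseable text also clears the value.
    record.edtf = text_to_edtf(record.natural_text)
    return True
```

In a Django setting the same checks would presumably live in a pre-save hook, with the other derived fields cleared or recomputed alongside `edtf`.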

How does that sound?
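
For the case of collections systems that only supply lower/upper bound dates, a rough sketch of building an interval string from the two bounds follows. `edtf_from_bounds` is a hypothetical helper, not part of the library; it relies only on the `start/end` interval form, and whether the derived fields then match the original bounds exactly has not been checked here.

```python
from datetime import date


def edtf_from_bounds(lower: date, upper: date) -> str:
    """Build a 'YYYY-MM-DD/YYYY-MM-DD' interval string from two bound dates."""
    if lower > upper:
        raise ValueError("lower bound must not be after upper bound")
    if lower == upper:
        return lower.isoformat()  # a single day needs no interval
    return f"{lower.isoformat()}/{upper.isoformat()}"
```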

- use faster pyparsing grammar constructs and arrangements to significantly speed up parsing
- enable skipped parsing unit tests now that they are not infeasibly slow
- add testing requirements to setup.py

Anecdotal speed increase is from about 30 seconds to run the `test_date_values` tests down to below 3 seconds. See also #17
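
The idea raised in the discussion, relaxing low-level grammar constraints and validating the values in Python, can be illustrated with a small sketch. This uses the standard `re` module as a stand-in for pyparsing and handles only plain `YYYY`, `YYYY-MM` and `YYYY-MM-DD` strings; `parse_simple_date` is a hypothetical name, and this is not necessarily how the linked commit achieves its speedup.

```python
import re

# Loose pattern: any two digits are accepted for month and day at this stage.
_LOOSE_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_simple_date(text: str):
    """Return (year, month, day), with None for absent parts, or raise ValueError."""
    match = _LOOSE_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple date: {text!r}")
    year, month, day = (int(g) if g is not None else None for g in match.groups())
    # Range checks happen in Python instead of being encoded in the grammar.
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # Deliberately coarse: does not account for month length or leap years.
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {day}")
    return year, month, day
```

The grammar stays small because it only recognises shape; the pedantic rules about which digits make a valid month move into ordinary comparisons.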

jmurty assigned cogat Jun 5, 2017

jmurty mentioned this issue Jun 6, 2017

#11 Achieve order of magnitude speedup of parser #18

Merged

cogat removed their assignment Jan 5, 2020
