Implementation proposal: separating core data and metadata within statements #477
-
See these slides for a more visual explanation of what would change (and why).
-
@rhiaro questioned offline whether the
The next piece of work for @rhiaro and myself to do is to review and update the new data structure here, bringing it into line with current thinking across all the 0.4 feature implementation proposals.
-
@kd-ods and I discussed the We could flatten the structure either by:
While thinking of the statement, record, and record details as separate "chunks" is helpful from a conceptual model and communication point of view, we don't necessarily need to impose this complexity on the schema if dropping it would make the data simpler to work with and process. I'd like to think about:
Of course, in the end anyone can grab and use or discard whichever bits of the data they want, but we can think about which routes through the data we are facilitating in particular, and what the impact of that might be (e.g. would leaving
Unflattened version:
Option 1:
Option 2:
-
This and other issues linked from #487 merge discussion of the conceptual model (what are people, entities, records, etc. and how do they relate) with the data model (how is information about people, etc. expressed in JSON objects and fields). I'll focus on the conceptual model, as the data model is typically pretty straightforward if the conceptual model is robust.

I've never been clear on the use cases for the many types of metadata in BODS. At base, people who care about BO information want to know about people, entities, and relationships between those. In addition to those, BODS cares about provenance, which is important, but BODS trends toward pathological attention to provenance.

In 0.1, BODS included statement metadata. In RDF-land, this would be like annotating a triple with information about its publication date, author, etc. In this issue (and #475), the proposal is to add a record, to model the entry in an IT system that contains the information about people, etc. Like provenance, this is a fascinating analytical problem that raises awesome epistemological questions, and that also seems dislocated from anything a real user might desire. (Edit: #482 furthermore wants to add a declaration, i.e. a collection of statements made at the same time by the same agent about the same entity.)

My suggestion would be, rather than finding more ways to fit more metadata into BODS, to take an axe to the metadata.

(1) What is the actual usage of all the metadata fields? For example, how frequently are publishers annotating statements to describe that one specific field was changed by so-and-so on this date for this reason? When a field is used, is the same value always provided? If not, do the values follow a predictable pattern? If a metadata field is unused, or its value is predictable, then it's a candidate for removal.

(2) What metadata can realistically be generated? My (1) looks to real data supply to answer this question, but as BODS is a young standard (at least in terms of adoption), we can also look at prospective data supply. It's no surprise to me that #475 cuts More research can be done to understand the technical capacities of publishers, but I doubt most can reach the level of detail that BODS allows for. Maintaining a "status" field (#475) is notoriously difficult, and I really doubt many publishers will ever develop the capacity to keep it updated correctly.

(3) What metadata could just as well be assumed or implied? For example, in most data standards, Similarly, a publisher might only modify information for a limited (perhaps even only one) reason, and attribute that modification to only one agent, etc., such that the information is more or less predictable, and the simple alternative here is to encourage publishers to describe (in a document) what assumptions can be made and/or what implications can be followed.

TLDR: I would really try to eliminate as much metadata as possible, before looking into ways to rearrange the metadata.
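The usage audit suggested in (1) could be sketched roughly as follows. This is a minimal illustration, not an existing BODS tool: the function names, the sample statements, and the field names `sourceType`, `annotations` and `statementDate` as used here are assumptions chosen for the example, and the "predictable" test only covers the simplest case of a single constant value.

```python
def audit_metadata_fields(statements, fields):
    """Report, per field, the share of statements using it and its distinct value count."""
    total = len(statements)
    report = {}
    for field in fields:
        # Treat a missing key and an explicit None as "not used".
        values = [s[field] for s in statements if s.get(field) is not None]
        report[field] = {
            "usage": len(values) / total if total else 0.0,
            "distinct_values": len({repr(v) for v in values}),
        }
    return report


def removal_candidates(report):
    """Fields that are never used, or that only ever carry one value."""
    return sorted(
        name
        for name, stats in report.items()
        if stats["usage"] == 0 or stats["distinct_values"] == 1
    )


# Hypothetical sample data, for illustration only.
statements = [
    {"statementID": "s1", "sourceType": "officialRegister", "annotations": None},
    {"statementID": "s2", "sourceType": "officialRegister", "statementDate": "2021-05-01"},
    {"statementID": "s3", "sourceType": "officialRegister", "statementDate": "2020-01-01"},
]
fields = ["sourceType", "annotations", "statementDate"]

report = audit_metadata_fields(statements, fields)
candidates = removal_candidates(report)
```

Here `annotations` is flagged because it is never populated and `sourceType` because it always has the same value; detecting values that merely follow a pattern would need more than this.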
-
I think "pathological attention to provenance" is probably a fair charge, @jpmckinney! We've been considering how to make the standard more approachable and @ScatteredInk has talked about us moving towards (or producing) a "BODS-lite". It would be good for us to have a live conversation about this at some point if you're interested.
So, we're going at this the other way around! With this 0.4 release of the standard we're hoping to tighten up the framework of the standard so that we can potentially lose some of the detail in future but retain the structure. It's to that end that we are (a) going to have a clearer statement of the conceptual model and (b) remodelling the schema.

On the maintenance of a status field: point taken. This is definitely an instance of a trade-off between pushing work to the publisher vs to the data user. Because of the difficulty of maintaining a status field, I don't think we can make it a required field. It does allow us to highlight the importance for data users of being able to infer a timeline from data and to track the lifecycle of a record. As you say: we can "encourage publishers to describe (in a document) what assumptions can be made and/or what implications can be followed" in order to meet that need though.
-
Noting here that we are going with @rhiaro's Option 1 and removing a level of nesting by having these properties appear at the top level of Statements:
The workbook has been updated accordingly.
-
This proposal on separating core vs metadata and the other proposals are very intertwined. So, to be clear, the decision I alluded to above was simply about removing a layer of unnecessary nesting that was in the original proposal.

Your questions here and elsewhere are a helpful level of scrutiny, @jpmckinney. How we eventually settle on the concepts and terminology that make it into v0.4 of BODS is a live issue. Getting from these early iterations of BODS to a v1 with robust governance around it is a journey where we are keeping pace with the wider establishment of the beneficial ownership domain. I've suggested to @StephenAbbott that we reboot discussions about how we manage and develop decision-making along the way. I know you'll be on a call with one another next week, so this might come up there.

I have little working time this week, but look forward to considering the detailed issues you have raised over the next weeks.
-
Thanks to everyone who shared their thoughts here.

Response to issues raised above
The idea of restructuring BODS statements to separate statement data from core details about a person, entity or relationship seems uncontroversial. Naming of properties has been raised: particularly the idea of using ‘subject’ to refer to the entity, person or relationship whose details are published in a statement. After some consideration, we are going to reserve ‘subject’ for referring to: (1) the subject of a relationship (as opposed to the interested party) and (2) the subject of a declaration (in both the conceptual and data model). As explained in #475, we will be introducing the concept of a record to the conceptual model. This means that ‘recordDetails’ will be the name of the property added to the schema to support the restructuring proposal.

Accepted proposal
The structure as represented in this workbook will be implemented. (As implementation of BODS 0.4 proceeds we expect to keep this spreadsheet up to date with the latest implementation details.)
-
Feature name
Separating core data and metadata within statements
Feature ticket
#465
Implementation proposal status
Active
Overview
This proposal leans heavily on introducing the concept of a 'record' to our conceptual and data model. From the change over time proposal:
Once we have this concept of a record, we can push core data about entities, people or relationships* down into a recordDetails field. All the metadata can be retained under a statementDetails field.
* See this proposal re terminology.
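As a rough illustration of the split described in this overview, the sketch below moves a flat statement's fields into `statementDetails` and `recordDetails`. Only those two property names come from this proposal. The `METADATA_KEYS` set, the `split_statement` helper and the sample statement are assumptions made for the example; the spreadsheet mapping remains the authoritative list of which field goes where.

```python
# Assumption: which keys count as statement metadata. Illustrative only.
METADATA_KEYS = {"statementID", "statementDate", "publicationDetails", "source", "annotations"}


def split_statement(flat_statement):
    """Separate metadata from core data about the entity, person or relationship."""
    statement_details = {k: v for k, v in flat_statement.items() if k in METADATA_KEYS}
    record_details = {k: v for k, v in flat_statement.items() if k not in METADATA_KEYS}
    return {"statementDetails": statement_details, "recordDetails": record_details}


# Hypothetical flat entity statement.
flat = {
    "statementID": "abc-123",
    "statementDate": "2023-01-01",
    "name": "Example Ltd",
    "entityType": "registeredEntity",
}

restructured = split_statement(flat)
```

Elsewhere in this thread a level of nesting was later removed, so the final 0.4 schema may not match this two-container shape exactly.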
Detailed implementation proposal
You can see how current BODS v0.3 fields would be mapped across to this new structure (in the case of an Entity Statement) in this spreadsheet.