Implementation proposal: separating core data and metadata within statements #477

kd-ods · 2023-03-06T16:08:52Z

kd-ods
Mar 6, 2023
Maintainer

Feature name

Separating core data and metadata within statements

Feature ticket

#465

Implementation proposal status

Active

Overview

This proposal leans heavily on introducing the concept of a 'record' to our conceptual and data model. From the change over time proposal:

> [...] a declaration or snapshot consists of a set of statements about the entities, people and relationships involved in a given beneficial ownership network at a given time. The information about those entities, people and relationships may be stored in records which are updated as new statements about them are made.

Once we have this concept of a record, we can push core data about entities, people or relationships* down into a recordDetails field. All the metadata can be retained under a statementDetails field.

* See this proposal re terminology.

Detailed implementation proposal

You can see how current BODS v0.3 fields would be mapped across to this new structure (in the case of an Entity Statement) in this spreadsheet.
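As a rough illustration of the split described above, here is a minimal Python sketch. The set of fields treated as metadata is an assumption made for the example only; the actual field-by-field mapping is the one in the spreadsheet, and `separate` is a hypothetical helper name.

```python
# Illustrative only: which fields count as metadata is assumed for this sketch,
# not taken from the mapping workbook.
ASSUMED_METADATA_FIELDS = {
    "statementID",
    "statementDate",
    "publicationDetails",
    "source",
    "annotations",
}


def separate(flat_statement: dict) -> dict:
    """Split a flat statement into statementDetails (metadata) and recordDetails (core data)."""
    statement_details = {
        key: value
        for key, value in flat_statement.items()
        if key in ASSUMED_METADATA_FIELDS
    }
    record_details = {
        key: value
        for key, value in flat_statement.items()
        if key not in ASSUMED_METADATA_FIELDS
    }
    return {"statementDetails": statement_details, "recordDetails": record_details}


# Invented sample data for a flat entity statement.
flat = {
    "statementID": "stmt-001",
    "statementDate": "2023-03-06",
    "source": {"type": ["officialRegister"]},
    "name": "Example Holdings Ltd",
    "incorporatedInJurisdiction": {"code": "GB"},
}

restructured = separate(flat)
```

In this sketch, anything not listed as metadata falls through to recordDetails, so the split depends entirely on the assumed list.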

kd-ods · 2023-03-16T15:45:08Z

kd-ods
Mar 16, 2023
Maintainer Author

See these slides for a more visual explanation of what would change (and why).

0 replies

kd-ods · 2023-07-10T14:00:12Z

kd-ods
Jul 10, 2023
Maintainer Author

@rhiaro questioned offline whether the record field and object were adding an unnecessary layer in the data model. The answer is 'yes'. I kept it there in case we wanted to consider grouping records into some kind of declaration object. But since that's not necessary at this point in BODS' development, let's get rid of the field and object and move everything 'up' a level.

The next piece of work for @rhiaro and myself to do is to review and update the new data structure here, bringing it into line with current thinking across all the 0.4 feature implementation proposals.

0 replies

rhiaro · 2023-08-14T18:34:51Z

rhiaro
Aug 14, 2023
Maintainer

@kd-ods and I discussed the record object nesting more today.

We could flatten the structure either by:

1. removing record, and having the record metadata (id, status, type) at the same level as the statement metadata. Or,
2. removing recordDetails and having the person/entity/relationship data sit alongside the record metadata.

While thinking of the statement, record, and record details as separate "chunks" is helpful from a conceptual model and communication point of view, we don't necessarily need to impose this complexity on the schema, if it would simplify ability to work with and process the data.

I'd like to think about:

if making it easy to grab just record details (without record metadata) hampers, enables or encourages particular uses of the data over others, and if that is beneficial or detrimental.
if making it easy to grab a full record (record metadata plus details, without statement metadata) hampers, enables or encourages particular uses of the data over others, and if that is beneficial or detrimental.

Of course in the end anyone can grab and use/discard whichever bits of the data they want, but we can think about what routes through the data we are facilitating in particular, and what the impact of that might be (eg. would leaving recordId sitting alongside the statement metadata make it more likely that statement source/date information is retained when data is reused? Does separating recordStatus from the record details mean that the currentness of a particular block of data becomes ambiguous if the record details are used alone?).
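To make the two flattening options concrete, here is a small Python sketch. The nested shape follows the description above (a record object holding id, status, type and recordDetails). The sample values and the names `option_1` and `option_2` are invented for illustration, as is the `statementId` field name.

```python
import copy

# Unflattened shape; all values are made up for the example.
nested = {
    "statementId": "stmt-001",
    "statementDate": "2023-08-14",
    "record": {
        "id": "rec-42",
        "status": "new",
        "type": "entity",
        "recordDetails": {"name": "Example Holdings Ltd"},
    },
}


def option_1(statement: dict) -> dict:
    """Remove `record`: record metadata sits beside the statement metadata."""
    flat = {
        key: copy.deepcopy(value)
        for key, value in statement.items()
        if key != "record"
    }
    record = statement["record"]
    flat["recordId"] = record["id"]
    flat["recordStatus"] = record["status"]
    flat["recordType"] = record["type"]
    flat["recordDetails"] = copy.deepcopy(record["recordDetails"])
    return flat


def option_2(statement: dict) -> dict:
    """Remove `recordDetails`: core data sits beside the record metadata."""
    result = copy.deepcopy(statement)
    details = result["record"].pop("recordDetails")
    result["record"].update(details)
    return result
```

With the first shape, core data can be grabbed on its own via `recordDetails`, but record status is left behind with the statement metadata. With the second, a whole record can be lifted out in one go, but separating core data from record metadata takes knowledge of which keys are which.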

Unflattened version: [diagram not preserved]

Option 1: [diagram not preserved]

Option 2: [diagram not preserved]

0 replies

jpmckinney · 2023-08-22T00:00:00Z

jpmckinney
Aug 22, 2023

This and other issues linked from #487 merge discussion of the conceptual model (what are people, entities, records, etc. and how do they relate) with the data model (how is information about people, etc. expressed in JSON objects and fields). I'll focus on the conceptual model, as the data model is typically pretty straightforward if the conceptual model is robust.

I've never been clear on the use cases for the many types of metadata in BODS.

At base, people who care about BO information want to know about people, entities, and relationships between those. In addition to those, BODS cares about provenance – which is important, but BODS trends toward pathological attention to provenance.

In 0.1, BODS included statement metadata. In RDF-land, this would be like annotating a triple with information about its publication date, author, etc.

In this issue (and #475), the proposal is to add a record, to model the entry in an IT system that contains the information about people, etc. Like provenance, this is a fascinating analytical problem, that raises awesome epistemological questions – that also seems dislocated from anything a real user might desire.

(Edit: #482 furthermore wants to add a declaration i.e. a collection of statements made at the same time by the same agent about the same entity)

My suggestion would be – rather than finding more ways to fit more metadata into BODS – take an axe to the metadata.

(1) What is the actual usage of all the metadata fields?

For example, how frequently are publishers annotating statements to describe that one specific field was changed by so-and-so on this date for this reason? When a field is used, is the same value always provided? If not, do the values follow a predictable pattern? If a metadata field is unused, or its value is predictable, then it's a candidate for removal.
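A rough way to answer these questions for a published dataset is sketched below in Python. It assumes the statements are already loaded as a list of dictionaries; `profile_field` and the sample data are hypothetical and only illustrate the idea.

```python
import json


def profile_field(statements: list, field: str) -> tuple:
    """Return (share of statements that use `field`, number of distinct values seen)."""
    # Serialise values so that nested objects and lists can be compared.
    values = [
        json.dumps(statement[field], sort_keys=True)
        for statement in statements
        if field in statement
    ]
    usage = len(values) / len(statements) if statements else 0.0
    return usage, len(set(values))


# Invented sample: `source` is used often but always with the same value.
sample = [
    {"statementId": "s1", "source": {"type": ["selfDeclaration"]}},
    {"statementId": "s2", "source": {"type": ["selfDeclaration"]}},
    {
        "statementId": "s3",
        "source": {"type": ["selfDeclaration"]},
        "annotations": [{"motivation": "commenting"}],
    },
    {"statementId": "s4"},
]

usage, distinct = profile_field(sample, "source")
```

A field with low usage, or with high usage but a single distinct value, would be a candidate for removal under the criteria above.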

(2) What metadata can realistically be generated?

My (1) looks to real data supply to answer this question, but as BODS is a young standard (at least in terms of adoption), we can also look at prospective data supply.

It's no surprise to me that #475 cuts replacesStatement, because most BO data providers do not track history in any way (or in any useful way). There are also alternatives for the use case of a user wanting to ask questions for a given point in time: the simplest alternative is for publishers to offer dated bulk downloads – and to not delete or replace them.

More research can be done to understand the technical capacities of publishers, but I doubt most can reach the level of detail that BODS allows for.

Maintaining a "status" field (#475) is notoriously difficult, and I really doubt many publishers will ever develop the capacity to keep it updated correctly.

(3) What metadata could just as well be assumed or implied?

For example, in most data standards, ~~authorship~~ the publisher is typically implied by the URL at which the data is originally available. Annotating individual statements with ~~authorship~~ the publisher's information is more a concern specific to an aggregator like the Register; it's important not to conflate the needs of the Register with those of all stakeholders.

Similarly, a publisher might only modify information for a limited (perhaps even only one) reason, and attribute that modification to only one agent, etc. such that the information is more or less predictable – and the simple alternative here is to encourage publishers to describe (in a document) what assumptions can be made and/or what implications can be followed.

TLDR: I would really try to eliminate as much metadata as possible, before looking into ways to rearrange the metadata.

0 replies

kd-ods · 2023-08-23T13:57:05Z

kd-ods
Aug 23, 2023
Maintainer Author

I think "pathological attention to provenance" is probably a fair charge, @jpmckinney!

We've been considering how to make the standard more approachable and @ScatteredInk has talked about us moving towards (or producing) a "BODS-lite". It would be good for us to have a live conversation about this at some point if you're interested.

> I would really try to eliminate as much metadata as possible, before looking into ways to rearrange the metadata.

So, we're going at this the other way around! With this 0.4 release of the standard we're hoping to tighten up the framework of the standard so that we can potentially lose some of the detail in future but retain the structure. It's to that end that we are (a) going to have a clearer statement of the conceptual model and (b) remodelling the schema.

On the maintenance of a status field: point taken. This is definitely an instance of a trade-off between pushing work to the publisher vs to the data user. Because of the difficulty of maintaining a status field, I don't think we can make it a required field. It does allow us to highlight the importance for data users of being able to infer a timeline from data and to track the lifecycle of a record. As you say: we can "encourage publishers to describe (in a document) what assumptions can be made and/or what implications can be followed" in order to meet that need though.

3 replies

jpmckinney Aug 23, 2023

I am happy to have a conversation if there's interest.

My unease about going that way around is that you could end up with empty buckets, e.g. the record in the above diagrams.

As I understand, a declaration isn't serialized "first-class" i.e. it appears nowhere as its own, independent object – but is instead just a pair of optional fields on a statement. This makes it easy to remove later (if desired) without leaving an empty bucket. (That said, this approach makes it harder to add more details about declarations in the future, but hopefully this is never desired.)

I don't know the lifecycle of BO registries, but I assume it works like "Company A submitted their annual self-declaration. This supersedes all their previous self-declarations." If that's all there is to it in a majority of cases, then replacesStatement and recordStatus are unnecessary – you just find the most recent statements about an entity (assuming they are annotated with a date).

Sidenote: Why is statementDate not required? If a registry can't fill in a date, I don't think they'll have any more luck filling in replacesStatement or recordStatus. It also seems like a date is essential to determine what is true at a given time.

In the above scenario, where a BO registry is basically as complex as a filing cabinet containing self-declarations (they could even just be piled on the floor), there is no "record" in the real world. There are only declarations, and whether one is true at a given point in time is just a question of checking whether it is the most recent from that declarant (relative to that point in time).

As for the declarations themselves, in the real world, the statements they contain are perhaps held together by being part of the same stapled document. In data, they can perhaps be held together by as little as date + declarant or date + entity.
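The "filing cabinet" model can be sketched in a few lines of Python. This assumes every statement carries a date and a declarant; the field names `declarant` and `statementDate` and the function `current_statements` are illustrative, not schema definitions.

```python
from datetime import date


def current_statements(statements: list, declarant: str, as_of: date) -> list:
    """Statements from the declarant's most recent declaration on or before `as_of`."""
    dated = [
        statement
        for statement in statements
        if statement["declarant"] == declarant and statement["statementDate"] <= as_of
    ]
    if not dated:
        return []
    latest = max(statement["statementDate"] for statement in dated)
    # Statements sharing declarant + date are treated as one declaration.
    return [statement for statement in dated if statement["statementDate"] == latest]


# Invented sample data: two declarations by company-a, one by company-b.
statements = [
    {"id": "s1", "declarant": "company-a", "statementDate": date(2022, 1, 10)},
    {"id": "s2", "declarant": "company-a", "statementDate": date(2022, 1, 10)},
    {"id": "s3", "declarant": "company-a", "statementDate": date(2023, 1, 12)},
    {"id": "s4", "declarant": "company-b", "statementDate": date(2023, 2, 1)},
]

mid_2022 = current_statements(statements, "company-a", date(2022, 6, 1))
mid_2023 = current_statements(statements, "company-a", date(2023, 6, 1))
```

No record identifier, status or replacement pointer is needed here, but the approach only works if each new declaration fully supersedes the previous one and dates are reliably present.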

Anyway, just some thoughts on how to avoid creating structures that could end up empty later.

kd-ods Sep 15, 2023
Maintainer Author

> Sidenote: Why is statementDate not required? If a registry can't fill in a date, I don't think they'll have any more luck filling in replacesStatement or recordStatus. It also seems like a date is essential to determine what is true at a given time.

I agree. We will do a wholesale review of required fields before a version 1 release.

> Anyway, just some thoughts on how to avoid creating structures that could end up empty later.

Yes. This is an issue. One of the biggest challenges that we're going to face is hitting the sweet-spot on the prescriptive--to--flexible spectrum of this data standard. If it's too flexible then where's the standardisation? And if it's too prescriptive then it won't be the right 'fit' for implementers. My hunch is that we're going to have to support publication of both 'high-resolution' and 'low-resolution' pictures of beneficial ownership over time. As you point out, maintaining a recordId and status may not be realistic for some publishers, but that shouldn't mean that they can't publish valid, useful BODS data.

StephenAbbott Nov 13, 2023
Maintainer

Echoing @kd-ods' comment above, we know that we need to do a full review of required fields ahead of getting to BODS version 1.0. This was already planned in but good to emphasise the necessity.

On @jpmckinney's questions about dates, point taken about balance of complexity vs simplicity.

I wanted to share that we have seen some registries where we weren't able to get a meaningful statementDate (see Indonesia BODS mapping exercise lessons https://github.com/openownership/indonesia-bods-mapping#caveats).

kd-ods · 2023-10-30T15:16:04Z

kd-ods
Oct 30, 2023
Maintainer Author

Noting here that we are going with @rhiaro's Option 1 and removing a level of nesting by having these properties appear at the top level of Statements:

recordId
recordStatus
recordType
recordDetails

The workbook has been updated accordingly.

8 replies

kd-ods Nov 7, 2023
Maintainer Author

> Simplify declarantRecord to simply declarant. "declarant" is meaningful to any reader familiar with BO. "declarantRecord" adds a little extra cognitive burden.

Yes - I think declarant might be better. Though as @kathryn-ods notes, it does invite confusion, as per your next point....

> Explain how declarantRecord differs from or interacts with source.assertedBy. (I'm not clear for which source types it's expected for declarantRecord to be set – whether it can be set for any or only some.)

source.assertedBy will sometimes be a lawyer or agent on behalf of the declaring entity or person. The function of the declarantRecord field is to identify the entity or person at the 'root' of the beneficial ownership network, whereas the function of the source object is to provide information about the statement's provenance.

I'm going to make a suggestion about the naming of declarantRecord over in #482 which might help.

> Conceptually, a statement is describing something, and a common term for that is a "subject". I think BODS would be easier to understand if instead of introducing a term that requires explanation ("record"), a term like "subject" were used.

'Subject' is already used for a concept crucial to the plumbing of BODS: relationship (ownership-or-control) statements have an interestedParty and a subject. Even if it were available as a term, I think 'subject' is too general to help us convey the lifecycle elements of BO information. It's going to be useful for us to say to publishers - eg - "When a person is no longer a BO of a declaring company, you will want to maintain a historical record of that fact on your system, but you should publish a statement with recordStatus 'closed'." I see 'record' as a useful concept, and any explanation required will be worthwhile because of the clarity it brings.
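One possible reading of the 'closed' example, sketched in Python: the publisher keeps its own history internally and publishes a further statement that reuses the existing recordId with recordStatus 'closed'. Whether the identifier is reused in this way is an assumption of the sketch, and `closing_statement`, `statementId` and the sample values are hypothetical.

```python
def closing_statement(previous: dict, statement_id: str, statement_date: str) -> dict:
    """Build a new statement that marks the record described by `previous` as closed."""
    return {
        "statementId": statement_id,
        "statementDate": statement_date,
        # Assumption: the closing statement points at the same record.
        "recordId": previous["recordId"],
        "recordType": previous["recordType"],
        "recordStatus": "closed",
        "recordDetails": dict(previous["recordDetails"]),
    }


# Invented earlier statement about a relationship record.
earlier = {
    "statementId": "stmt-100",
    "statementDate": "2022-05-01",
    "recordId": "rec-7",
    "recordType": "relationship",
    "recordStatus": "new",
    "recordDetails": {"interestedParty": "rec-3", "subject": "rec-1"},
}

closed = closing_statement(earlier, "stmt-215", "2023-11-07")
```

Under this reading the earlier statement is left untouched, and a data user infers the lifecycle by following statements that share a recordId.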

jpmckinney Nov 7, 2023

If we're open to renaming things, then subject in ownership-or-control statements is clearer as simply entity.

The word "subject" doesn't clearly suggest which "thing" is being referenced. If I'm investigating PEPs, then my main interest is the owners, and I might assume that they are the "subject" of a statement (after all, they are the "thing" that has agency and can control or own entities – they make for a good "subject"). In the current documentation, I had to dig around a bit, because subject is circularly defined as "The subject of an ownership or control relationship." A field name like entity would be less confusing. (Note: Entity is itself defined as "A statement identifying and describing the entity that is the subject of the ownership or control described in an ownership or control statement." It only made sense to me because I know BODS uses "entity" for "organization.")

Having freed up subject, it can now be used elsewhere, if desirable.

> When a person is no longer a BO of a declaring company, you will want to maintain a historical record of that fact on your system, but you should publish a statement with recordStatus 'closed'.

I have a great deal of trouble understanding this sentence. Is "historical record" meant to mirror the use of the word "record" in BODS? If it's not precisely picking out the same concept, then using "record" colloquially will be confusing. It remains unclear whether maintaining a historical record means creating a new record (with a new ID) or publishing a new statement that refers to an existing record ID, in order to say something about that record. You would have to accompany this sentence with additional explanation.

I think in all cases, explanation is worthwhile for clarity. We want explanation regardless :) But choosing the best possible terms can also alleviate some confusion and time spent refreshing one's memory about how BODS uses a given term. We do need a concept here – we are just trying to choose the best term for that concept.

kd-ods Nov 7, 2023
Maintainer Author

> choosing the best possible terms can also alleviate some confusion and time spent refreshing one's memory about how BODS uses a given term. We do need a concept here – we are just trying to choose the best term for that concept.

I agree. And I think we have to have scope for renaming things before reaching a v1 of BODS.

> If we're open to renaming things, then subject in ownership-or-control statements is clearer as simply entity.

Yes, interesting suggestion.

StephenAbbott Nov 13, 2023
Maintainer

As these changes won't be dealt with in version 0.4, I've recorded suggestions and discussion on an internal tracker to return to in future, @jpmckinney

StephenAbbott Nov 13, 2023
Maintainer

To @jpmckinney's point in an earlier comment, this level of discussion and feedback is really welcome and it is on me and the Open Ownership team to make sure that the governance mechanisms for BODS are clear and robust going forward.

This is an area where we know we need to do work to move on from the existing working group/virtual meetings structure.

We've started towards this via templates for raising feature requests, planned features for development and implementation proposals, inviting more discussion and flagging some of the key features we know we want to develop for future versions of BODS.

But we still need to do work on maintenance and support processes, version management, overall governance and tracking work/decisions about features.

kd-ods · 2023-10-31T13:28:01Z

kd-ods
Oct 31, 2023
Maintainer Author

> I'm confused as to how this has been decided

This proposal on separating core vs metadata and the other proposals are very intertwined. So - to be clear - the decision I alluded to above was simply about removing a layer of unnecessary nesting that was in the original proposal.

Your questions here and elsewhere are a helpful level of scrutiny, @jpmckinney. How we eventually settle on the concepts and terminology that make it into v0.4 of BODS is a live issue. Getting from these early iterations of BODS to a v1 with robust governance around it is a journey where we are keeping pace with the wider establishment of the beneficial ownership domain. I've suggested to @StephenAbbott that we reboot discussions about how we manage and develop decision-making along the way. I know you'll be on a call with one another next week, so this might come up there.

I have little working time this week, but look forward to considering the detailed issues you have raised over the next weeks.

3 replies

StephenAbbott Oct 31, 2023
Maintainer

Thanks @kd-ods.

@jpmckinney Will be good to reconnect next week and to see how best we can learn from these comments and document how our current BODS governance processes need to change as we move towards having updated mechanisms and procedures.

jpmckinney Oct 31, 2023

> the decision I alluded to above was simply about removing a layer of unnecessary nesting that was in the original proposal

@kd-ods Aha, so by "we are going with rhiaro's Option 1", you mean "we are amending the proposal to match rhiaro's Option 1", and not "rhiaro's proposal (Option 1) is accepted" ?

@StephenAbbott Looking forward to catching up!

kd-ods Nov 7, 2023
Maintainer Author

> so by "we are going with rhiaro's Option 1", you mean "we are amending the proposal to match rhiaro's Option 1"

Yes. (Though we do need to make final decisions soon.)

kd-ods · 2023-12-06T09:42:46Z

kd-ods
Dec 6, 2023
Maintainer Author

Thanks to everyone who shared their thoughts here.

Response to issues raised above

The idea of restructuring BODS statements to separate statement data from core details about a person, entity or relationship seems uncontroversial.

Naming of properties has been raised: particularly the idea of using ‘subject’ to refer to the entity, person or relationship whose details are published in a statement. After some consideration, we are going to reserve ‘subject’ for referring to: (1) the subject of a relationship (as opposed to the interested party) and (2) the subject of a declaration (in both the conceptual and data model). As explained in #475, we will be introducing the concept of a record to the conceptual model. This means that ‘recordDetails’ will be the name of the property added to the schema to support the restructuring proposal.

Accepted proposal

The structure as represented in this workbook will be implemented.

(As implementation of BODS 0.4 proceeds we expect to keep this spreadsheet up to date with the latest implementation details.)

0 replies
