Re-evaluate support for @vocab in base VCDM v2 context #1514

Closed
msporny opened this issue Jun 29, 2024 · 10 comments
Labels: CR1 (This item was processed during CR1), normative (The PR is a normative change to the CR specification), pr exists

@msporny
Member

msporny commented Jun 29, 2024

Previously, the VCWG decided to define a @vocab value in the base context (see #953). Recently, a security disclosure (which is still under debate) has led a number of individuals who had previously supported defining an @vocab in the base context to pull their support for the feature, since it is, at best, not very well understood and, at worst, leads to unexpected security-related concerns for those who do not understand the ramifications of using it.

We no longer have consensus for the feature (this is the new information that the security disclosure has highlighted). At a minimum, we need to poll the group again to see if @vocab has the support it needs to remain in the VCDM v2 base context.

There are additional proposal options, which include:

  • For the "Getting Started" section, create a "development context", which might just be the examples/v2 context.
  • Strongly advise against the use of @vocab in a production setting (but still allow it).
  • Ban the use of @vocab in any production setting (and implement normative specification text and tests that enforce the behaviour).
  • Create an "issuer-defined" context that moves the @vocab declaration to that document (for those that want to continue to create/use "private term" VCs).

We'll gather feedback in this issue and then implement whatever achieves consensus.

@msporny msporny added normative The PR is a normative change to the CR specification CR1 This item was processed during CR1 labels Jun 29, 2024
@msporny msporny self-assigned this Jun 29, 2024
@OR13
Contributor

OR13 commented Jun 30, 2024

It could be that, given the security issues with complex contexts, the base context should be reduced to only the terms and keywords necessary for the "claims data model", and not include the "securing specific" data model claims, such as:

And lastly:

I put @vocab last because it's not clear to me whether Data Integrity implementers (or the JSON-LD community in general) agree on the purpose of placing this keyword in a context.

IIRC, the VCWG originally added it because there was significant interest in "not requiring JSON-LD + RDF processing", in order to issue or verify credentials... but when you don't otherwise mandate valid JSON-LD + RDF, such as through a normative statement like:

The JSON-LD claimset MUST canonicalize to the same application/n-quads by the issuer and the verifier.

There is a chance that the verifier might process the claims differently than the issuer intended. More specifically, implementers of DID resolvers and credential graphs (network diagrams) observed that, without a default @vocab, much of the RDF that you might wish to analyze simply exploded at the point of analysis: issuers signed with securing mechanisms that don't require valid RDF, but verifiers then expected it to be producible later.

This RFC from the IAB is a much better take on this topic: https://datatracker.ietf.org/doc/rfc9413/

Time and experience show that negative consequences to
interoperability accumulate over time if implementations silently
accept faulty input. This problem originates from an implicit
assumption that it is not possible to effect change in a system the
size of the Internet. When one assumes that changes to existing
implementations are not presently feasible, tolerating flaws feels
inevitable.

The problem @vocab was targeted at was addressing these "silent faults", but they are originally introduced by the split in interpretation of what the claimset is.

If the claimset is JSON-LD, a fault can occur in the JSON processing (invalid json syntax, exceeded max depth, bad unicode handling, etc).

If the claimset is RDF, a fault can occur in the processing of the RDF concrete serialization (like application/rdf+xml, or application/n-quads).

If you say that a claimset is JSON-LD that MUST always produce valid RDF, you get faults from both categories.
If you say that a claimset is JSON-LD that MAY produce valid RDF, you still get faults from both categories, you just might ignore the RDF faults because they are expected, because of the normative framing.

I support re-evaluating including @vocab in the base context, I'll share my view of how to improve the specification, but I am not a W3C member, and the WG will need to decide how they want to handle this sort of thing.

Option 1: Keep Saying No RDF Processing is Required

Decide if you want silent failures to come in the form of "bad term definitions...@vocab stays" or "no rdf produced... @vocab removed".

Option 2: Make RDF Processing Mandatory

Drop @vocab, add normative text that explains that RDF needs to be producible regardless of the securing mechanism, and optionally move the securing mechanism details to separate contexts per best practices.

This way, you are clear to consumers with normative language that valid "high quality" RDF is expected for every valid instance of the data model, and you have given them normative guidance "you MUST understand this, in order to implement the spec properly", on how to produce "high quality RDF".

I'm in favor of Option 2. The reason is that I have observed over the years lots of confusion regarding JSON-LD, including as described here: https://tess.oconnor.cx/2023/09/polyglots-and-interoperability

I feel that without stronger guidance on how JSON-LD is expected to be processed in a dependent specification, like Decentralized Identifier or Verifiable Credentials, the ambiguity creates an open wound that festers and never heals.

It leads to conversations with customers and partners that sound like: "You don't have to look at the RDF, but if you don't a verifier might come complain to you in the future about what you issued", or "You will need a stricter regulator profile to ensure JSON-LD produces RDF, because the base specification does not actually ensure this property"....

IMO, @vocab is just a symptom of the underlying problem: there is no consensus on mandatory processing of RDF. It's better to fix that in specification text than to hide it in the details of a JSON-LD keyword.

@dlongley
Contributor

dlongley commented Jun 30, 2024

I think we should do all of the following:

  1. Remove @vocab from the core context.
  2. For the "Getting Started" section, create a "development context", which might just be the examples/v2 context.
  3. Strongly advise against the use of @vocab in a production setting (but still allow it). [Note: I think we should say that it cannot offer term protection, so use of the JSON-LD compaction API is recommended prior to the consumption of documents with contexts that use @vocab.]

As for this point:

Create an "issuer-defined" context that moves the @vocab declaration to that document (for those that want to continue to create/use "private term" VCs).

I think "issuer-defined" terms (and "private claims") are a footgun in the global, three-party model, so my preference is not to define a context at all with such an @vocab value. This approach doesn't prevent someone else from defining such a context (we can't prevent this), but we don't need to endorse that approach.

I think "private claims" are the actual basis of any coherent "polyglot problem", as they are ambiguous on a global scale. These sorts of claims are only remotely sensible in a two-party model where there is an assumption of a tight coupling between the issuer and verifier, and the holder functions not as a fully independent actor, but as a transporter of opaque envelopes of information.

I think endorsing this concept in our core context was a mistake as it encourages people to make assumptions based on the historically ever-familiar two-party model; a model that isn't applicable here. These assumptions can result in a number of problems including, but not limited to, making it more difficult for general purpose wallets to help holders make choices, unduly incentivizing centralization in the marketplace, failing to understand document contexts prior to consuming information, and harming privacy by requiring permission from the issuer to express the same information in different ways.

@aniltj

aniltj commented Jun 30, 2024

Remove @vocab from the core context.

+1

For the "Getting Started" section, create a "development context" ...

+1 particularly if the existence of that development context is something I can test for (and prevent via an implementation profile if need be)... and drop to the floor if it exists in production.

Strongly advise against the use of @vocab in a production setting (but still allow it).
[Note: I think we should say that it cannot offer term protection,
so use of the JSON-LD compaction API is recommended prior to the consumption
of documents with contexts that use @vocab.]

I favor not using undefined terms and banning the use of @vocab in production implementations, so I will wait to see how strong the "strongly advise against" language is, but I agree that this is a good step forward.

For those who are not planning on using JSON-LD aware APIs, should there not also be clear guidance that, since the VCDM v2 data model is JSON-LD compact form, you need checks in your processing logic to catch the equivalent errors that a JSON-LD aware API would catch?
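
As a rough illustration of the kind of check being asked about here, the following Python sketch flags property names that none of the listed contexts define, without running a JSON-LD processor. It is only a sketch under stated assumptions: the `KNOWN_TERMS` table is hypothetical and hand-maintained (it lists a small subset of base context terms), the `https://example.org/...` context is invented for the example, and scoped contexts and type-specific term definitions are ignored.

```python
# Hypothetical, hand-maintained table: context URL -> terms the application
# has vetted. Only a partial subset of the base context terms is listed here.
KNOWN_TERMS = {
    "https://www.w3.org/ns/credentials/v2": {
        "id", "type", "issuer", "validFrom", "validUntil",
        "credentialSubject", "proof",
    },
    # Invented context URL, used only for this example.
    "https://example.org/contexts/degree/v1": {"degree", "name"},
}

JSONLD_KEYWORDS = {"@context", "@id", "@type", "@value", "@language"}


def undefined_terms(credential):
    """Return the sorted property names that no listed context defines."""
    contexts = credential.get("@context", [])
    if isinstance(contexts, str):
        contexts = [contexts]

    allowed = set(JSONLD_KEYWORDS)
    for ctx in contexts:
        # Inline (object) contexts and unknown URLs are rejected outright.
        if not isinstance(ctx, str) or ctx not in KNOWN_TERMS:
            raise ValueError(f"unrecognized @context entry: {ctx!r}")
        allowed |= KNOWN_TERMS[ctx]

    missing = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key not in allowed:
                    missing.add(key)
                if key != "@context":
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(credential)
    return sorted(missing)
```

With a catch-all @vocab in effect, a misspelled property such as `degre` would still expand to an IRI; a check along these lines surfaces it as an undefined term instead.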

@kimdhamilton
Contributor

My votes:

  • +1 to remove @vocab from the base context
  • +1 to at least "Strong advice against" in production
  • Flexible on additional considerations on how this is achieved.

@longpd
Contributor

longpd commented Jun 30, 2024

Concurrence with:

  • +1 to remove base context @vocab
  • +1 to strong advice to
    o check processing logic to catch the equivalent errors that a JSON-LD aware API will catch
    o not use @vocab in production but open to how that is conveyed, and if allowed what to check first

@tplooker

As someone who was originally in favour of having @vocab in the core context, but who is also the author of the reported security vulnerabilities cited, I'd just like to clarify my POV on this issue.

  1. @vocab is a broadly useful feature with respect to JSON-LD, something that hasn't changed with the reporting of this security vulnerability.
  2. My position on having @vocab in the core context for developer ease of use hasn't changed; I believe that to be important. However, it's not a hill I am willing to die on any longer.
  3. What has become clear is a flaw / design issue with JSON-LD, which means the term protection feature offered by @protected doesn't extend to terms defined by @vocab.

In my opinion we should be focusing on the root cause of the issue here, which is fixing how @vocab can be used in data integrity, because simply removing it from the core context doesn't mean it won't be used. If it were fixed then many of the arguments in this thread about having @vocab in the core context or not would be less relevant, because using @vocab would be safe.
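
To make point 3 concrete, here is a deliberately simplified Python model of how contexts are merged. It is not the JSON-LD context processing algorithm: the function names, the flat `term -> IRI` shape, and the `example` IRIs are assumptions made for illustration. It shows only the relevant behaviour, namely that a term resolved through @vocab has no term definition, so a later context can define it without tripping the @protected check.

```python
class ProtectedTermError(Exception):
    """Raised when a later context tries to change a protected term."""


def merge_contexts(contexts):
    """Toy model of context merging; returns (terms, vocab)."""
    terms = {}   # term -> (iri, is_protected)
    vocab = None
    for ctx in contexts:
        protected = ctx.get("@protected", False)
        for key, value in ctx.items():
            if key == "@protected":
                continue
            if key == "@vocab":
                vocab = value
                continue
            if key in terms and terms[key][1]:
                # A protected term may only be restated identically.
                if terms[key][0] != value:
                    raise ProtectedTermError(key)
                continue
            terms[key] = (value, protected)
    return terms, vocab


def expand_term(term, terms, vocab):
    """Explicit definitions win; otherwise fall back to @vocab."""
    if term in terms:
        return terms[term][0]
    if vocab is not None:
        return vocab + term
    return None


base = {
    "@protected": True,
    "@vocab": "https://example.org/vocab#",  # illustrative value only
    "issuer": "https://www.w3.org/2018/credentials#issuer",
}
later = {"firstName": "https://other.example/terms#lastName"}

# With only the base context, the undefined term falls back to @vocab.
terms, vocab = merge_contexts([base])
before = expand_term("firstName", terms, vocab)

# A later context can silently change what the term means: no error is raised.
terms, vocab = merge_contexts([base, later])
after = expand_term("firstName", terms, vocab)
```

In this model, attempting to redefine `issuer` in `later` raises `ProtectedTermError`, while redefining `firstName` does not, which is the gap described above.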

@aniltj

aniltj commented Jun 30, 2024

@vocab is a broadly useful feature with respect to JSON-LD ...

Agree with @tplooker on this.

However, I also believe that having @vocab in the base context blinds developers to its existence and promotes its misuse.

An option that can serve both perspectives is the use of @vocab for development and testing purposes via a "development" or "secondary" context that a developer has to explicitly and with full awareness use.

@iherman
Member

iherman commented Jul 3, 2024

The issue was discussed in a meeting on 2024-07-03

  • no resolutions were taken

1.5. Re-evaluate support for @vocab in base VCDM v2 context (issue vc-data-model#1514)

See github issue vc-data-model#1514.

Brent Zundel: now we get to talk about 1514. Re-evaluate support for @vocab in base VCDM v2 context. coming out of a conversation in the Data Integrity spec. Some folks are suggesting there is a critical vulnerability. This could be a mitigation.

See github issue vc-data-integrity#272.

Manu Sporny: the discussion in the DI spec asserts a number of things...one is a realization that some people do not understand how @vocab works. because of that it has been misinterpreted and misused in that security disclosure. this discussion has led some to change their position on adding @vocab to the base context.
… the issue asserts we should remove @vocab from the base context. still up to us to decide how it could be used, if at all. the spec doesn't say 'don't use it in production' - folks in the thread think it must not be used in production (MUST vs SHOULD). how do we enforce that? should we? there are legitimate uses of @vocab/@base in production.
… there is enough here to raise a PR after we discuss this a bit more on the call today.

Ivan Herman: if @vocab must not be used that would require all participant parties to check that. that means off-the-shelf LD checkers cannot do this, since it is valid LD.

Dave Longley: +1 that we consider some language changes but not add a MUST NOT; any verifier must understand the contexts it consumes information from anyway, and they can only allow list contexts that don't include @vocab (so long as @vocab is removed from the core context).

Manu Sporny: you are right. there are some LD processors considering putting in a feature around this. I don't know if there is support for pulling this into our spec. There are legitimate uses of @vocab in production. Example: if the @vocab is the last entry in a context array, and your application knows that, ... it could be fine to use @vocab if you order it properly, and there are other similar scenarios.
… feels like we're closing off a bunch of use cases for no real reason. the current security disclosure specifically did not do checks that we highlight in the spec. do not think we'll get consensus. most we'll see is a 'should not' or strongly discourage it unless you know what you're doing.

Dave Longley: I tend to agree. a MUST NOT is a bridge too far. I do think removing @vocab from the base context is a good idea. any context should be vetted, verifiers do not need to accept with @vocab if they vet the core context (and we remove it).

Michael Jones: I was talking with Orie about this. The statements he made...he has a slight preference for always getting to RDF even if as a result of @vocab terms. If it is removed, then removal should mean terms are interpreted as JSON not RDF.

Ivan Herman: trying to make clear what I understand the proposal to be. 1 - remove from the core context. in parallel 2 - reinforce text to say 'don't use that if you can avoid it'. I agree with both proposals.

Manu Sporny: yes, your understanding of the proposal is correct, Ivan.

Ivan Herman: to Mike - I do not understand everything Orie is stating. I know he has this opinion that everything should be done on RDF only. I do not want to get into this, and not the right person to discuss this (RDF bias). His statement that we should treat it as JSON...I do not understand what he means.

Dave Longley: we decided a while ago that VC 2.0 uses LD compacted form. That requires that you understand the @context field. Not something you can just ignore. That makes things simple. We can clarify more if we need to do so. When you understand that...it prevents these problems from being raised.

Gabe Cohen: My main concern was to reduce the complexity on implementers that are more LD-averse, and I'm afraid that removing @vocab increases the burden on implementers. I can see the arguments for using LD and understanding what it's doing, but I like the convenience that @vocab provided for those that wanted the feature. Is there a middle ground here?

Michael Jones: I understand what Orie is saying -- we get a mapping for all terms that do not appear in context entries. This is why we added it. As an engineering mechanism I still think it's valuable. I am prone to leaving it alone.

Ivan Herman: +1 to selfissued, I understand now what Orie meant. Thx.

Manu Sporny: We cannot leave it alone anymore - there is no support from the WG. We can figure out what to do about it. Gabe you asked - is there middle ground here? Yes - I think that's what's being proposed. The section we had said 'don't worry, just use the base context' - that section can be updated to say - use these two contexts: the base and examples context since it has @vocab. Can work until they're ready for 'production'.
… IF they really want to use @vocab we can provide a template with an @vocab file..that is not a big ask.

Dave Longley: +1 to that plan, but none of that changes the fact that everyone must check the @context field, you can't ignore it (and the spec already says this).

Manu Sporny: LD-averse people can continue to use the mechanism, we can continue to strongly recommend they don't do that. one of the negotiations around vocab ... we were concerned that people that were LD-averse would split the group and start competitive work at IETF and negatively impact both communities...that happened. So, that weakens the argument to have @vocab at all.
… we have said you do not need to use an LD processor, use a simplified set of rules, said just check the context array and make sure you're OK with the contents, ... but there's only so much we can do. if developers are not going to use the spec since publishing a single context with an @vocab definition is too difficult, then I don't know we need to cater to those developers anymore.

Brent Zundel: it sounds like we have a proposed path forward. remove @vocab from the core context. create an example/experimental context with @vocab for test purposes. did not hear anyone say no. a possible 3rd step - if you want to keep using undefined terms, then you can publish your own @vocab context.
… let's spend one more minute and then move on to controller document.

Anil John: as someone implementing using DI and JOSE, using LD v2 using compact form is a credential for us. there will be no undefined terms in how we are creating credentials. all credentials we create will have clearly defined terms in the context..and can verify that the terms are coming from us.
… I am sympathetic that @vocab provides value. I disagree with having it in the base context. Developers become blind to it. The position that splits the difference (@vocab is bad vs @vocab adds value), we can add a secondary context that developers can add to note there are undefined terms.

Dave Longley: +1 to Anil, @vocab is useful in a closed setting like development, but it creates conflicts and problems in the general ecosystem.

Anil John: we support removing @vocab from the base context. support in a 2nd context for development purposes...so developers have to be aware of it...that's fine.

Michael Jones: is it the case now that conforming JSON-LD implementations will throw an error if there are undefined terms?

Manu Sporny: yes...not all of them but we can force them to.

Dave Longley: conforming implementations will throw an error, yes.

Michael Jones: thanks that is good data. responding to Brent's summary that no one has spoken against removal. I have spoken against removal. I would like to have this go out to some people - like Tobias - who are in different time zones, before making a decision.
… I would like more discussion before deciding on this call.

Dmitri Zagidulin: Responding to Manu's point about not worrying about LD-averse implementers. Not quite the case...I know of multiple implementers that are new to LD. any removal of friction, such as including an @vocab (though I understand the concern) -- let's not discount that audience.

Manu Sporny: Agree that we want to remove as much friction as possible for people that are new to LD (and even people that regularly use LD).

Dave Longley: +1 to not discount, but to move @vocab to examples and new developer space.

Dave Longley: +1 to Manu.

Dmitri Zagidulin: we want to remove friction. we could recommend an inline @vocab, which is an option.

Ivan Herman: +1 to fall back on inline @vocab.

Gabe Cohen: +1 if we can inline @vocab I'm less opposed.

Brent Zundel: I am open to reaching out to MATTR and others. Not sure how much they should dictate group direction.

Dave Longley: yeah, i don't see why we can't inline it -- verifiers in production would reject it if they haven't allowlisted it.

Phillip Long: +1 to in-line @vocab - which sounds like a good compromise.
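
The allowlisting behaviour described in the minutes above (verifiers vet contexts and reject an inline @vocab they have not explicitly accepted) could look roughly like the following Python sketch. The function name, the allowlist contents, and the `https://example.org/...` URL are hypothetical; only the base context URL comes from the specification.

```python
BASE_CONTEXT = "https://www.w3.org/ns/credentials/v2"

# Hypothetical verifier policy: contexts this verifier has reviewed and accepts.
ALLOWED_CONTEXTS = {
    BASE_CONTEXT,
    "https://example.org/contexts/degree/v1",  # invented for the example
}


def context_problems(credential):
    """Return a list of reasons to reject the credential's @context."""
    contexts = credential.get("@context")
    if isinstance(contexts, str):
        contexts = [contexts]
    if not isinstance(contexts, list) or not contexts:
        return ["@context is missing or empty"]

    problems = []
    if contexts[0] != BASE_CONTEXT:
        problems.append("first @context entry is not the base context")

    for entry in contexts:
        if isinstance(entry, dict):
            # Inline contexts are never on the allowlist in this sketch.
            if "@vocab" in entry:
                problems.append("inline context declares @vocab")
            else:
                problems.append("inline context is not allowlisted")
        elif not isinstance(entry, str) or entry not in ALLOWED_CONTEXTS:
            problems.append(f"context not allowlisted: {entry!r}")
    return problems
```

Under this policy a development-time credential that adds `https://www.w3.org/ns/credentials/examples/v2` or an inline `{"@vocab": ...}` entry is rejected in production unless the verifier deliberately extends `ALLOWED_CONTEXTS` or relaxes the inline-context rule.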

@msporny
Member Author

msporny commented Jul 16, 2024

PRs #1520, #1524, and #1525 have been raised to address this issue. This issue will be closed once those PRs have been merged.

@msporny
Member Author

msporny commented Jul 21, 2024

PRs #1520, #1524, and #1525 have been merged, closing.

@msporny msporny closed this as completed Jul 21, 2024