Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIDs as interfaces #203

Closed
Tracked by #199
achingbrain opened this issue Sep 26, 2022 · 9 comments
Closed
Tracked by #199

CIDs as interfaces #203

achingbrain opened this issue Sep 26, 2022 · 9 comments

Comments

@achingbrain
Copy link
Member

#161 was closed and it looks like we've backtracked on the CIDs as interfaces thing. Now we have a Link interface and CID class that implements the link interface. We also have CIDs as an ArrayBufferView which, I don't know, maybe didn't turn out to be as useful as we thought it might be at the time, but the end result is it has extra properties byteOffset and byteLength which are obscured from the uninterested caller by making them non-enumerable. This works but it is very slow.

If we're going to town on #199 maybe we could do the CID-as-interface. I'd like to propose a very minimal interface, any static methods would be exposed as named exports instead.

export type CIDVersion = 0 | 1

export interface CID {
  version: CIDVersion
  code: number
  multihash: MultihashDigest
  bytes: Uint8Array

  equals: (other: CID | Uint8Array | string) => boolean
}

// create CIDs
export function createCID (version: CIDVersion, code: number, digest: MultihashDigest): CID
export function cidFromString (string: string): CID
export function cidFromBytes (bytes: Uint8Array): CID

// housekeeping
export function toV0 (cid: CID): CID
export function toV1 (cid: CID): CID
export function isCID (value: any): value is CID

Some thoughts:

  • toV0/toV1 can we live without these? A quick grep of the js-ipfs codebase and we only call them in tests and for top-level UI type output - the cli and the http gateway, would vote to remove and let users have their own utils for this
  • asCID/isCID my (perhaps unpopular) opinion is that these are named too similarly and that asCID is clunky to use in practice as it requires null-guarding on the output. If the CID interface is only the fields above perhaps we don't need to convert old impls into new ones any more (the original purpose of asCID IIRC)
  • parse/decode vs cidFromString/cidFromBytes - the meaning of the words 'parse' and 'decode' are so close as to be synonymous which makes the API hard to use without reference to documentation; let's just say what they do. @libp2p/peer-id has taken this approach and it makes the codebase a bit more straightforward to the newcomer.
  • toJSON can we live without this? Similar to toV0 and toV1 I only see it used in UIs.
  • toString(encoder?: MultibaseEncoder) omission - maybe if you have a non-default encoder you could just use it to encode the .bytes property of the CID instead of passing it to the toString method? Calling cid.toString() would only ever return the default encoding (base58btc for v0, base32 for v1). The current impl maintains a cache of strings which might be a reason to keep it? I am not sure where this has been a bottleneck in the past though.
  • CIDIface/CIDImpl - @multiformats/multiaddr and @libp2p/peer-id have avoided this sort of naming scheme by intentionally not exporting the default implementation. You only get factory functions that take some input and return something that conforms to the interface. I suggest following that convention here too.
@rvagg
Copy link
Member

rvagg commented Sep 27, 2022

A lot of these want a @Gozala answer

@rvagg rvagg mentioned this issue Sep 27, 2022
5 tasks
@Gozala
Copy link
Contributor

Gozala commented Sep 27, 2022

If we're going to town on #199 maybe we could do the CID-as-interface. I'd like to propose a very minimal interface, any static methods would be exposed as named exports instead.

I think using distinct interface name from class name was good move and removes room for whole lot of confusion. It is also better aligned with the IPLD schemas which uses term Link. There is more discussion and rational behind this in here #161 (comment)

I would much rather have discussion about specifics of the Link interface and their methods in the PR here https://github.com/multiformats/js-multiformats/pull/199/files#diff-8f1fa16ca1a04cad324542dd17bac9be3096a4827d12966eed96fa5f2b89eac4R21-R42 as opposed to a new alternative interface here, because it simply drops bunch of things that had been considered in the design of the Link interface.

That said I'll try to provide some constructive feedback on individual points:

toV0/toV1 can we live without these? A quick grep of the js-ipfs codebase and we only call them in tests and for top-level UI type output - the cli and the http gateway, would vote to remove and let users have their own utils for this

I won't particularity miss those and I got to say I'm ready to let v0 legacy go. As you may have notices I've dropped toV0 in Link interface and only added toV1 as it felt convenient in case I have to ensure I'm working with v1. If you all want to drop this I'm not the one to hold you back.

asCID/isCID my (perhaps unpopular) opinion is that these are named too similarly and that asCID is clunky to use in practice as it requires null-guarding on the output. If the CID interface is only the fields above perhaps we don't need to convert old impls into new ones any more (the original purpose of asCID IIRC)

This circular reference is a very useful trick for retaining CID-ness when transferring them across JS realms / workers, because circular reference stays intact while class information is lost. This had been a primary motivation while asCID was a nice side effect.

The CID.asCID(v) had been useful to adopt CID class even if v is of different class or if it was transferred across JS realms / workers in which case all methods will be erased. Please note that CID.isCID(v) in that case may return true but expected methods will not exist on v.

Given the tradeoffs I still think that asCID is a good idea, even if it appears bit odd. That said I have left it out from Link interface because it's mostly concerns the implementation and not the interface.

  • parse/decode vs cidFromString/cidFromBytes - the meaning of the words 'parse' and 'decode' are so close as to be synonymous which makes the API hard to use without reference to documentation; let's just say what they do.

All of the multicodec / multibase stuff uses those names and terms. I'd say stay consistent, yet I would personally not look into updating all the code to change from decode / encode / parse / format to fromBytes / toBytes / fromString / toString.

  • toJSON can we live without this? Similar to toV0 and toV1 I only see it used in UIs.

I think it useful in interface (as we have some code that uses that for inference) & in implementation it is crucial as it needs to leave out asCID field. What's the reason to cut it out ?

  • toString(encoder?: MultibaseEncoder) omission - maybe if you have a non-default encoder you could just use it to encode the .bytes property of the CID instead of passing it to the toString method? Calling cid.toString() would only ever return the default encoding (base58btc for v0, base32 for v1). The current impl maintains a cache of strings which might be a reason to keep it? I am not sure where this has been a bottleneck in the past though.

Previous implementation used to take base names e.g. base32 for this, which I swapped with passing base encoder. There is some caching and slight differences in v0 / v1 handling.

While I have no personal attachment to this method I believe a lot of nft.storage / web.storage assumes that we can pass base32 encoder there to get appropriate representation and if things just start ignoring that parameter it would probably lead to some problems that would be easy to miss.

  • CIDIface/CIDImpl - @multiformats/multiaddr and @libp2p/peer-id have avoided this sort of naming scheme by intentionally not exporting the default implementation. You only get factory functions that take some input and return something that conforms to the interface. I suggest following that convention here too.

I'm not sure I fully understand this suggestion. With Link interface in place CID class becomes implementation detail that everyone could gradually move away from. I don't believe massive breaking changes are better in any regard as it forces everyone to upgrade or leave things behind as opposed to provide a carrot, which is what I was hoping to achieve with #197

I also can not compare to @multiformats/multiaddr or @libp2p/peer-id as we don't use those. We do however use this library all over the place and #197 was aiming to address exact problems we've been struggling with, which this proposal I'm afraid isn't going to address.

@Gozala
Copy link
Contributor

Gozala commented Sep 27, 2022

I also have to say it is hard not to be frustrated after:

  1. My work on "CID as interface" CID interface #161 was (rightly) called out to be incredibly disruptive .
  2. So I have proposed and implemented less disruptive alternative (and was encouraged to do so).
  3. Long threads explaining why certain type parameters were introduced and how they are useful, yet won't get into way (thank to provided defaults).
  4. @rvagg's heroic effort of verifying that indeed added types caused no problems in code that did not care for those.
  5. Recognizing that "CID as Link interface implementation" was less disruptive & less confusing & implementing it in feat: implement link interface & module #197
  6. Getting that work blocked and folded under v10.0.0 tracking #199 PR.
  7. Blocking all that now on this even more radical and more disruptive change for which I can't seem to see any rational.

Please don't get this a wrong way, but it's been almost a year now & I can't help but feel that all of this efforts had been a waste of time or worse because bunch of code had been built around upcoming Link interface (and it's type parameters).

@Gozala
Copy link
Contributor

Gozala commented Sep 27, 2022

My constructive suggestion here would be to move along with #199.

These breaking changes could be discussed and considered separately. I don't believe existence of Link interface would prevent changing CID class to CID interface. Same goes for dropping certain methods and properties if it would appear to be beneficial. At least that way we'll have a choice to opt-in into those breaking changes when we ready.

@achingbrain
Copy link
Member Author

it is hard not to be frustrated

I think this is completely valid, but I also think the landscape has shifted a bit over the past 12+ months. With the renaming of go-ipfs to kubo we're vacating the ipfs namespace to get a bit more experimental, more single-issue and less kitchen-sink.

There's a desire to do something different to js-ipfs in the same vein. Given that it needs to be be smaller and more lightweight, I think it's ok to look at streamlining some of the other code in the stack.

That said, maybe it'd be better to just start with Link instead of CID, though I have similar thoughts about some of the fields there.

toV0/toV1

Given how much data is in CIDv0 it's not going away any time soon, but I also don't see us converting between V0 and V1 very often, certainly not often enough to require utility functions like this to be on the interface. Sounds like we agree here.

asCID

That said I have left it out from Link interface because it's mostly concerns the implementation and not the interface.

This is a quite central to the point I think, if it's a useful thing, it could live in a separate file as a util an only imported if required. It doesn't need to be on the interface.

toJSON

in implementation it is crucial as it needs to leave out asCID field

asCID is non-enumerable, even after #202 so I don't think this is an issue

What's the reason to cut it out ?

I didn't see a reason to include it in the interface, it could be something that's an implementation detail.

That is, I think an implementation of a CID could be valid/useful without a toJSON method.

Similar to toString really, these are serialization concerns and maybe could be something we leave out of the interface.

toString

..assumes that we can pass base32 encoder there to get appropriate representation

Hopefully type checkers and tests would catch regressions here?

It just seems like it could be a concern external to the interface (e.g. the expectation is you operate on the .bytes property) and the interface could be a bit smaller/simpler.

parse/decode

Would still really prefer these methods just said what they did.

CIDIface/CIDImpl

I'm not sure I fully understand this suggestion.

This was mostly in response to this comment. The point is by not exporting the default class you don't tie yourself in knots over the names of things.

...

My constructive suggestion here would be to move along with #199.

I agree, I don't think #199 should be blocked by this.

@Gozala
Copy link
Contributor

Gozala commented Sep 30, 2022

@achingbrain I think there are few things to consider here, when thinking about trimming some methods from the interface:

  1. Minimal interface is nice because it's easy to implement and I think that is primary motivation here.
  2. Ideally consumer code should not need to import extra module to make use of the value being passed to it. That is to say if we turning CIDv0 to CIDv1 isn't uncommon it's very convenient to have that method on interface as that would remove need for pulling in some dependency (this only applies if you don't create new CIDs).

With above rational following fields had been added to Link interface in #199

  1. equals - I actually don't have find myself using it much, yet it would seem odd to introduce dependency on multiformats just to compare two links.
  2. toString - It takes base thing and I find using it fairly often. That base32.encode(link.bytes) is probably fine only thing you'd loose is cashing and even than we could probably add that to base encoders instead if desired.
  3. toV1 - We have bunch of code that upgrades all CIDs to v1. That code does not create CIDs just stores them in DB etc... Having to introduce dependency on multiformats just to turn them into v1 seems undesirable, that said not huge problem.
  4. link - Function that gives you back link. It may seem odd, yet it is very convenient when working with mix of links and data that can be turned into links. There is more context on this here Proposal: add some method e.g. .toLink(): CID on CID object #185
  5. toJSON - I have added it because IMO it produces unexpected results and as a result messes type inference in type aware RPC libraries which without would assume JSON structure maps actual object structure, which in this case is not. If we remove the method it's probably fine to remove it from interface as well, however if we do override default behavior it's good idea to have this on interface so that it does not mislead type aware tools and people.

I think asCID is a special case & I have considered adding it to Link interface, but then dropped it. It was tough call, but my rational here was following:

  1. Usually code that needs to worry about it also needs to use do something like CID.asCID(link) otherwise it has no guarantees that any of the methods are around (because those get dropped across realms/threads).
  2. Most of the other code does not need to care about this field & it seems easier to add things to remove things so I thought we could leave it out and if turns out to be a bad idea we can always add it back.

I would like to also address byteOffset, byteLength fields. I don't think they were a bad idea, it's just I was not allowed to complete implementation of that idea. What I really wanted (and did it in some other places) to make CID a subclass of Uint8Array so you could just point to a byte range in the memory. However it was difficult to make shift from current implementation which was used as a class with a specific constructor and all. I'm still hoping to get there over time by 1. moving away from constructor to factory functions and 2. moving from class to interface at that point it will not be a breaking change anymore.

@RangerMauve
Copy link
Collaborator

Given how much data is in CIDv0 it's not going away any time soon, but I also don't see us converting between V0 and V1 very often, certainly not often enough to require utility functions like this to be on the interface. Sounds like we agree here.

I'm constantly making use of this method in my code. 😅

achingbrain added a commit that referenced this issue Oct 15, 2022
From #203

> I would like to also address byteOffset, byteLength fields. I don't think they were a bad idea, it's just I was not allowed to complete implementation of that idea.

Given that the feature is incomplete and untested, remove it for
now and if it's required it can be implemented fully in the future.

Fixes #208

BREAKING CHANGE: The CID class no longer has `.byteOffset` and `.byteLength` properties - these can be accessed via the `.bytes` property instead
achingbrain added a commit that referenced this issue Oct 15, 2022
From #203

> I would like to also address byteOffset, byteLength fields. I don't think they were a bad idea, it's just I was not allowed to complete implementation of that idea.

Given that the feature is incomplete and untested, remove it for
now and if it's required it can be implemented fully in the future.

Fixes #208

BREAKING CHANGE: The CID class no longer has `.byteOffset` and `.byteLength` properties - these can be accessed via the `.bytes` property instead
achingbrain added a commit that referenced this issue Oct 15, 2022
This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface
achingbrain added a commit that referenced this issue Oct 15, 2022
This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface
achingbrain added a commit that referenced this issue Oct 15, 2022
This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface
achingbrain added a commit that referenced this issue Oct 15, 2022
This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface
@achingbrain
Copy link
Member Author

I've taken all the above on board and opened a PR that implements this here for further discussion: #211

achingbrain added a commit that referenced this issue Oct 15, 2022
This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface
@rvagg
Copy link
Member

rvagg commented Dec 6, 2022

Closing for now, I think this is redundant (although I acknowledge that it may not be ideal yet and we may want to make further changes as discussed here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants