Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make block size limit part of the BlockEncoder API #223

Open
Gozala opened this issue Nov 8, 2022 · 5 comments
Open

Make block size limit part of the BlockEncoder API #223

Gozala opened this issue Nov 8, 2022 · 5 comments
Assignees

Comments

@Gozala
Copy link
Contributor

Gozala commented Nov 8, 2022

Currently BlockEncoder interface just encodes passed input into bytes

export interface BlockEncoder<Code extends number, T> {
name: string
code: Code
encode: (data: T) => ByteView<T>
}

Problem is:

  1. If you go beyond block size limit, your block won't be bit-swappable.
  2. You may not even know you've gone past block size limit.
  3. Most people is not even aware of the block size limit.

Proposed solution:

I would like to propose amending our BlockEncoder interface as follows:

export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  /**
   * Encodes given data. If `buffer` is provided data is written into it and
   * a `Uint8Array` view of the written bytes is returned.  If `buffer` is not
   * passed new buffer is allocated with a `byteLength` corresponding to
   * default block size limit in IPFS. If encoded data does not fit the `buffer`
   * `RangeError` exception is thrown.
   */
  encode: (data: T, buffer?: Uint8Array) => ByteView<T>
}

Idea here is that:

  1. User should be able to optionally pass in the buffer to write data into.
  2. If data does not fit the block size limit encode fails.
  3. User is still able to encode blocks larger than a block size limit by passing in larger buffer.

This will be non-breaking change at the API level, but it would be breaking in the sense that errors will occur if block is larger than a block size limit. Never the less it seems like a better default than silently letting things slip.

@Gozala
Copy link
Contributor Author

Gozala commented Nov 8, 2022

There is some overlap with #222, however this is much smaller in scope and does no attempt to reduce computational overhead.

@rvagg
Copy link
Member

rvagg commented Dec 6, 2022

I think I'm fine with this but a little dubious of this as the right way to achieve the stated goal because it requires you allocating the maximum size byte array for each encode, which may end up being very wasteful if your encode regularly gives you only a fraction of that size. At least now our codecs (mostly?) only request allocation of what they actually need (or close to it, there's some trickery involved but it's not a very large optimistic allocation).

The change would also be fairly invasive if you want to go changing codecs. dag-pb would probably the easiest to change to start with but it's still pretty invasive (off the top of my head). So if you want to experiment with the API and have time then go ahead and see how it works out!

@rvagg rvagg moved this to 🏃‍♀️ In Progress in IPLD team's weekly tracker Dec 6, 2022
@Gozala
Copy link
Contributor Author

Gozala commented Dec 7, 2022

I think I'm fine with this but a little dubious of this as the right way to achieve the stated goal because it requires you allocating the maximum size byte array for each encode, which may end up being very wasteful if your encode regularly gives you only a fraction of that size. At least now our codecs (mostly?) only request allocation of what they actually need (or close to it, there's some trickery involved but it's not a very large optimistic allocation).

This does not match my experience. We find ourselves allocating fixed size buffers (frame of sorts) that we intended to pack with some data and send it off, then repeat until we're done.

With current APIs we have two choices:

  1. Collect blocks until we reach frame size, then allocate frame and copy all blocks. This causes spikes in memory use as there are moments where things are double allocated.
  2. Allocate frame ahead and copy bytes from every block then drop the block. This does not cause spikes but we have to release blocks or create a replicas that point into byte range in the frame.

With proposed API we no longer have to choose between either of two, instead we can just allocate frame and encode blocks directly into it. That does mean that for each block we'll create a new Uint8Array view into buffer from the current offset with max block length, but those views will be short lived and are a lot cheaper.

@rvagg
Copy link
Member

rvagg commented Dec 8, 2022

Oh, right, so you're allocating a large chunk and then wanting to present a slice to the decoder to fill; I was imagining allocating a new large chunk for each one but that's unnecessary with Uint8Arrays if you have a nice queue lined up. That's fair enough.

@BigLep
Copy link

BigLep commented Jan 3, 2023

2023-01-03 IPLD triage conversation: @Gozala are you going to take this on?

@rvagg rvagg moved this from 🏃‍♀️ In Progress to 🥞 Todo in IPLD team's weekly tracker Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants