Add metadata api #267
Conversation
- metadata can be attached to changes
- global metadata can be set on the repo with `repo.setGlobalMetadata({...})`
- global metadata is attached to each change that's made through the repo
- metadata can also be set locally by passing a metadata object to `DocHandle.change` or `DocHandle.changeAt`
- local metadata overrides global metadata
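The override rule in the last bullet amounts to a plain object merge. A minimal sketch (`mergeMetadata` is a hypothetical helper for illustration, not part of the automerge-repo API):

```typescript
type ChangeMetadata = Record<string, number | string | boolean>

// Illustrative helper: later spreads win, so per-change (local) metadata
// overrides the repo-level (global) metadata on a key-by-key basis.
function mergeMetadata(
  global: ChangeMetadata,
  local: ChangeMetadata = {}
): ChangeMetadata {
  return { ...global, ...local }
}
```

For example, merging `{ author: "bob", app: "editor" }` with a local `{ author: "alice" }` keeps `app` from the global metadata but takes `author` from the local override.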
```typescript
}

return {
  time: options.time,
```
I'm assuming this is an epoch? I wouldn't use it unless there was some sort of indication if it's a client or server timestamp, a way to sign client time (trusted timestamp), and a spec for the format.
Being able to add timestamps to changes is not a new feature; it already existed before this PR. The timestamps are purely advisory and are not used internally by any algorithms.
Ok. Sorry, that was a very convoluted way to explain myself.
I don't understand what it's for; should it be in the metadata instead?
I agree it would be more consistent not to make the timestamp special and instead just put it into the metadata. What do you think @alexjg?
Yeah I think the API should be just to have the timestamp in the metadata and then internally pull it out and separately pass it to automerge.
Oh, is it used internally somehow? I would actually feel better if it was separate from the metadata then, so we're not hiding a potential complexity.
Timestamps are not used by the CRDT logic – they are only informational, to enable things like history visualisation. But the current Automerge data format already has a place for storing timestamps; removing it and moving it into the generic metadata would be a breaking change to the data format.
```typescript
// in export class DocHandle<T>
#timeoutDelay: number
#remoteHeads: Record<StorageId, A.Heads> = {}

// Reference to global metadata that is set on the repo, to be attached to each change
#globalMetadataRef?: ChangeMetadataRef
```
It might be worth allowing this to be a function `(DocumentId) -> Record<..>`?
Yes I think that would be a better API
I have been pondering for a while how best to proceed with adding change metadata. This seems like a good compromise. However, I have two thoughts:
For my own part, I am really only looking for a way to identify which user has made a change. My current thought is to store a list of userIds and their corresponding actorIds within the document. This could either be the same doc or a separate one.
I would use this to allow users to annotate their changes, and when debugging. I wouldn't have a use for globalMetadata. I have been using …
@acurrieclark you're correct that this will bloat the document a lot. This PR is a test to see if the API is useful. If it is useful, then we'll store the metadata in a columnar encoded fashion so that it compresses well.
Ah OK, so this is a precursor to adding it at the automerge level?
```typescript
  patchCallback?: A.PatchCallback<T>
}

export type ChangeMetadata = Record<string, number | string | boolean>
```
Looking ahead to how we compress this I am not sure this API will do everything we need. When compressing this metadata we don't store the names of the fields, but instead an integer column ID. This means that the application will need to provide some mapping from a column ID to the name of the field in the metadata object.
I’d suggest that for app-defined metadata columns we identify the columns by name+type, rather than by a numeric column ID. That would simplify the API and only cost a few bytes more space. The question is how the type should be identified in the API. With a non-null value we could check if the value is an integer, string, or byte array, and assign it to the appropriate typed column. With a null value we could just treat it as absent, and any metadata columns that exist because of non-null values on other changes will just be filled in with null anyway.
Ah, that makes sense. This would imply storing the union of every metadata key of every change in the document in a lookup table somewhere in the serialized document chunk, right?
Yeah I think so. The first time a change has a non-null value in its metadata, we create a column identified by its metadata key and the type of the value. The serialised document will have to store every metadata column that exists on any of the changes. Changes that don't mention a particular metadata column just fill it in with null, as is the behaviour for the Automerge-internal columns.
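A rough sketch of the column-creation rule described above, under the assumption that a column is identified by its metadata key plus the runtime type of its value (the names and `"key:type"` encoding here are hypothetical, not the actual Automerge columnar format):

```typescript
type MetadataValue = number | string | boolean

// Hypothetical column identifier, e.g. "author:string" or "seq:number".
type ColumnId = string

// Collect the union of typed columns across all changes. A null value never
// creates a column, matching the behaviour discussed above: changes that
// don't mention a column would just be filled in with null on serialisation.
function columnsFor(
  changes: Record<string, MetadataValue | null>[]
): Set<ColumnId> {
  const columns = new Set<ColumnId>()
  for (const meta of changes) {
    for (const [key, value] of Object.entries(meta)) {
      if (value !== null) columns.add(`${key}:${typeof value}`)
    }
  }
  return columns
}
```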
This does mean that if a user only ever writes null as the value for a metadata key, we would have to do something like not write it to the document at all (because we don't know what column type to write). That makes it impossible to distinguish between a null value and a not-present value. Maybe we should say that you can't write null values, to avoid the ambiguity?
Yup agree, let's disallow nulls.
I've been thinking a little bit more about this. At some point we're going to want to have some kind of squash/rebase workflow I think. In such a workflow we would need to decide what to do with the metadata on each change. I think ideally we would just encode all the metadata into the squashed change. This suggests to me that we should actually treat the metadata as a multimap, somewhat like the query parameters in a URL. @ept @paulsonnentag what do you think?
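The multimap idea could look roughly like this (an illustrative sketch only, not a proposed implementation): each key maps to a list of values, and squashing concatenates the lists per key, much like repeated query parameters in a URL.

```typescript
type MetadataValue = number | string | boolean
type MetadataMultimap = Map<string, MetadataValue[]>

// Squashing two changes' metadata concatenates the value lists per key,
// so no metadata from either change is lost in the squashed result.
function mergeMultimaps(
  a: MetadataMultimap,
  b: MetadataMultimap
): MetadataMultimap {
  const out = new Map(a)
  for (const [key, values] of b) {
    out.set(key, [...(out.get(key) ?? []), ...values])
  }
  return out
}
```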
What would a squash look like? Could a user still override the metadata to set it to something custom for the squashed change, or would it be purely mechanical that the metadata of the squashed changes would always be the union of the metadata of the individual changes?
@alexjg Good point. I'd think that a squash commit would need to bring in custom logic for compacting the metadata: for example, we might not want to keep every single timestamp, but only the minimum and maximum among the timestamps in the squashed range. For authors we might want to keep the set of distinct users who have contributed at least one change, and for signatures we might want to keep the most recent signature per branch per signing key. This suggests to me that we can keep the data model for metadata on a single change simple (a single value per entry in the map), and figure out how to represent changes on squash commits once we get to that point.
I just realised something: if we do author attribution using metadata on changes, it would probably not be possible to do attribution on a squash commit or shallow clone, because the per-change information is not available. On the other hand, if we do attribution by mapping actor IDs to user IDs, attribution should still be possible, because the squash should preserve opIds, and the actor-to-user mapping can be included in the squash. That would be an argument for using the actorIds for attribution.
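To make the compaction idea above concrete, here is a sketch of per-key squash rules (min/max for timestamps, distinct set for authors); the shapes and names are invented for illustration and assume a non-empty range of changes:

```typescript
interface ChangeMeta {
  time: number
  author: string
}

interface SquashedMeta {
  timeMin: number
  timeMax: number
  authors: string[] // distinct contributors across the squashed range
}

// Compact the metadata of a range of changes into a single squashed entry:
// keep only the minimum and maximum timestamp, plus the set of distinct
// authors who contributed at least one change in the range.
function compactForSquash(changes: ChangeMeta[]): SquashedMeta {
  const times = changes.map(c => c.time)
  return {
    timeMin: Math.min(...times),
    timeMax: Math.max(...times),
    authors: [...new Set(changes.map(c => c.author))],
  }
}
```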
I like this idea a lot. I think maybe the global/local terminology is going to be confusing - "local" suggests "device-level", when that's more like what you mean by "global". Rather than come up with alternative wording, I'd maybe just drop the word …
Just throwing out a use case in case it affects how we think about using/implementing this: I'm interested in trying to use this feature to tag changes with information about domain-level actions that created the change. Imagine a JSON object representing the inputs to a function like
+1
I've implemented the API @alexjg suggested. Instead of setting change metadata globally on the repo, you can configure a `changeMetadata` function in the repo config:

```typescript
const repo = new Repo({
  ...
  changeMetadata: (documentId) => ({ author: "bob" }),
})
```

I've also added a way to set metadata when a document is initially created:

```typescript
const handle = repo.create({ metadata: { author: "bob" } })
```
```typescript
  metadata = {}
}

Object.assign(metadata, options.metadata)
```
Per https://github.com/automerge/automerge-repo/pull/267/files#r1449148679 I think we should check that the values here are only of the allowed types and throw if not.
- time is also passed in with the metadata object and extracted when storing it in automerge, to take advantage of the fact that automerge stores timestamps as deltas
- throw an error if metadata contains values that are not primitive (number, string, boolean)
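The time-extraction step could look something like this (a sketch only; `splitTime` is a hypothetical helper illustrating the idea of pulling `time` out of the metadata before handing it to automerge separately):

```typescript
type ChangeMetadata = Record<string, number | string | boolean>

// Pull a numeric `time` entry out of the metadata so it can be passed to
// automerge on its own (automerge stores timestamps as deltas); everything
// else stays in the metadata object. A non-numeric `time` is left alone.
function splitTime(metadata: ChangeMetadata): {
  time?: number
  rest: ChangeMetadata
} {
  const { time, ...rest } = metadata
  return typeof time === "number" ? { time, rest } : { rest: metadata }
}
```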
```typescript
  continue
}

if (type !== "number" && type !== "string" && type !== "boolean") {
```
We can also allow Uint8Array here.
Although I guess we currently can't serialize that to JSON in a nice way so maybe we leave that for the future.
LGTM
- undefined values are removed and don't show up in the metadata
- if the changeMetadata function adds a value to a key but subsequently, in the DocHandleChangeOptions, the key is set to undefined, the value is removed
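The undefined-removal rule can be sketched as follows (illustrative only; `resolveMetadata` is not the actual implementation): per-change options are merged over the changeMetadata function's output, and any key left undefined is dropped entirely.

```typescript
// Merge repo-level metadata (from the changeMetadata function) with
// per-change options. A key set to undefined in the options deletes the
// repo-level value rather than appearing in the result.
function resolveMetadata(
  fromRepo: Record<string, unknown>,
  fromOptions: Record<string, unknown> = {}
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...fromRepo, ...fromOptions }
  for (const key of Object.keys(merged)) {
    if (merged[key] === undefined) delete merged[key]
  }
  return merged
}
```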
I absolutely agree with the addition of metadata here, and see why (in the short term) it needs to override the …
So @paulsonnentag and I discussed this, and I think we shouldn't merge this until/unless the lower-level Automerge changes get made. The immediate problem this patch solves for @paulsonnentag is that it's tricky to set a commit message on a commit made inside the automerge-codemirror plugin (used for author attribution). But the implementation is expensive (something like 20+ characters per keystroke!) and the API is complicated in a way that solves the problem but that I would like to avoid. I showed him a way of working around the problem by replacing the change function on the handle that gets passed into the editor; while that is a horrible hack, it does at least solve the problem at prototype quality. He's also got a forked version of automerge-repo in use in his prototype that works for him for now.

So: closing this as WONTFIX for today, but I think it's a meaningful problem and we should return to it in the future.

On another topic, thinking about the problem above (passing the full handle + path to automerge-codemirror) has got @paulsonnentag and me thinking about what a better API might look like... hopefully we'll have time to explore that some time soon.
More details here:
https://tiny-essay-editor.netlify.app/#automerge:ZJSzpLWWLBWJz67TiNM1c4fb2sa