Add metadata api #267

Closed
wants to merge 8 commits
69 changes: 64 additions & 5 deletions packages/automerge-repo/src/DocHandle.ts
@@ -42,6 +42,9 @@ export class DocHandle<T> //
#timeoutDelay: number
#remoteHeads: Record<StorageId, A.Heads> = {}

// Reference to global metadata that is set on the repo and attached to each change
#globalMetadataRef?: ChangeMetadataRef
Contributor:

It might be worth allowing this to be a function (DocumentId) -> Record<..>?

Collaborator (author):

Yes I think that would be a better API
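A rough sketch of what that suggestion could look like. The `ChangeMetadataSource` and `resolveMetadata` names are hypothetical, not part of this PR: the idea is that the repo-level ref could hold either a static object or a function from document ID to metadata.

```typescript
// Hypothetical union type: static metadata, or a per-document function.
type ChangeMetadata = Record<string, string | number | boolean>

type ChangeMetadataSource =
  | ChangeMetadata
  | ((documentId: string) => ChangeMetadata)

// Resolve the source into concrete metadata for one document.
function resolveMetadata(
  source: ChangeMetadataSource,
  documentId: string
): ChangeMetadata {
  return typeof source === "function" ? source(documentId) : source
}

// The static form behaves like the PR as written; the function form can
// vary metadata per document.
const perDoc: ChangeMetadataSource = id => ({ author: "bob", docId: id })
```

The handle would call `resolveMetadata` with its own `documentId` at change time instead of reading `ref.current` directly.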


/** The URL of this document
*
* @remarks
@@ -54,10 +57,15 @@ export class DocHandle<T> //
/** @hidden */
constructor(
public documentId: DocumentId,
{ isNew = false, timeoutDelay = 60_000 }: DocHandleOptions = {}
{
isNew = false,
timeoutDelay = 60_000,
globalMetadataRef,
}: DocHandleOptions = {}
) {
super()
this.#timeoutDelay = timeoutDelay
this.#globalMetadataRef = globalMetadataRef
this.#log = debug(`automerge-repo:dochandle:${this.documentId.slice(0, 5)}`)

// initial doc
@@ -340,7 +348,7 @@ export class DocHandle<T> //
}

/** `change` is called by the repo when the document is changed locally */
change(callback: A.ChangeFn<T>, options: A.ChangeOptions<T> = {}) {
change(callback: A.ChangeFn<T>, options: DocHandleChangeOptions<T> = {}) {
if (!this.isReady()) {
throw new Error(
`DocHandle#${this.documentId} is not ready. Check \`handle.isReady()\` before accessing the document.`
@@ -349,7 +357,14 @@
this.#machine.send(UPDATE, {
payload: {
callback: (doc: A.Doc<T>) => {
return A.change(doc, options, callback)
return A.change(
doc,
optionsWithGlobalMetadata(
options,
this.#globalMetadataRef?.current ?? {}
),
callback
)
},
},
})
@@ -362,7 +377,7 @@
changeAt(
heads: A.Heads,
callback: A.ChangeFn<T>,
options: A.ChangeOptions<T> = {}
options: DocHandleChangeOptions<T> = {}
): string[] | undefined {
if (!this.isReady()) {
throw new Error(
@@ -373,7 +388,15 @@
this.#machine.send(UPDATE, {
payload: {
callback: (doc: A.Doc<T>) => {
const result = A.changeAt(doc, heads, options, callback)
const result = A.changeAt(
doc,
heads,
optionsWithGlobalMetadata(
options,
this.#globalMetadataRef?.current ?? {}
),
callback
)
resultHeads = result.newHeads
return result.newDoc
},
@@ -448,12 +471,48 @@
}
}

function optionsWithGlobalMetadata<T>(
options: DocHandleChangeOptions<T>,
globalMetadata: ChangeMetadata
): A.ChangeOptions<T> {
let metadata = { ...globalMetadata }

if (options.metadata) {
Object.assign(metadata, options.metadata)
Contributor:

Per https://github.com/automerge/automerge-repo/pull/267/files#r1449148679 I think we should check that the values here are only of the allowed types and throw if not.
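A standalone sketch of the validation suggested here; `assertValidMetadata` is a hypothetical helper, not code from this PR. It rejects any metadata value that is not a string, number, or boolean (note that `typeof null` is `"object"`, so null values are rejected as well, which matches the null discussion later in this thread).

```typescript
// Throw if any metadata value has a disallowed type. The `asserts` return
// type narrows the argument for the caller on success.
function assertValidMetadata(
  metadata: Record<string, unknown>
): asserts metadata is Record<string, string | number | boolean> {
  for (const [key, value] of Object.entries(metadata)) {
    const t = typeof value
    if (t !== "string" && t !== "number" && t !== "boolean") {
      throw new Error(
        `Invalid metadata value for key "${key}": expected string, number, or boolean, got ${t}`
      )
    }
  }
}
```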

}

return {
time: options.time,
Collaborator:

I'm assuming this is an epoch? I wouldn't use it unless there was some sort of indication if it's a client or server timestamp, a way to sign client time (trusted timestamp), and a spec for the format.

Collaborator (author) @paulsonnentag, Jan 11, 2024:

Being able to add timestamps to changes is not a new feature; it already existed before. The timestamps are purely advisory and are not used internally by any algorithms.

Collaborator:

Ok, sorry, that was a very convoluted way to explain myself. I don't understand what it's for; should it be in the metadata instead?

Collaborator (author):

I agree it would be more consistent not to make the timestamp special and instead just put it into the metadata. What do you think @alexjg?

Contributor:

Yeah I think the API should be just to have the timestamp in the metadata and then internally pull it out and separately pass it to automerge.

Collaborator:

Oh, is it used internally somehow? I would actually feel better if it was separate from the metadata then, so we're not hiding a potential complexity.

Member:

Timestamps are not used by the CRDT logic – they are only informational, to enable things like history visualisation. But the current Automerge data format already has a place for storing timestamps; removing it and moving it into the generic metadata would be a breaking change to the data format.

message:
Object.values(metadata).length > 0 ? JSON.stringify(metadata) : undefined,
patchCallback: options.patchCallback,
}
}
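The merging and encoding behaviour above can be illustrated in isolation. The two functions below are simplified stand-ins for the PR's `optionsWithGlobalMetadata` and its interim storage scheme, not the actual implementation: per-change metadata overrides repo-wide defaults via spread order, and the merged result travels as a JSON string in the change message.

```typescript
type ChangeMetadata = Record<string, string | number | boolean>

// Simplified stand-in: later spreads win, so per-change metadata overrides
// the repo-wide defaults; an empty result becomes undefined, not "{}".
function encodeMetadataMessage(
  globalMetadata: ChangeMetadata,
  localMetadata?: ChangeMetadata
): string | undefined {
  const merged = { ...globalMetadata, ...localMetadata }
  return Object.keys(merged).length > 0 ? JSON.stringify(merged) : undefined
}

// The interim format is plain JSON in the message field, so reading the
// metadata back out of a decoded change is just a parse.
function decodeMetadataMessage(message?: string): ChangeMetadata {
  return message ? (JSON.parse(message) as ChangeMetadata) : {}
}
```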

// WRAPPER CLASS TYPES

/** @hidden */
export interface DocHandleOptions {
isNew?: boolean
timeoutDelay?: number
globalMetadataRef?: ChangeMetadataRef
}

// todo: remove this type once we have real metadata on changes in automerge
// as an interim solution we use the message attribute to store the metadata as a JSON string
export interface DocHandleChangeOptions<T> {
metadata?: ChangeMetadata
time?: number
patchCallback?: A.PatchCallback<T>
}

export type ChangeMetadata = Record<string, number | string | boolean>
Contributor:

Looking ahead to how we compress this I am not sure this API will do everything we need. When compressing this metadata we don't store the names of the fields, but instead an integer column ID. This means that the application will need to provide some mapping from a column ID to the name of the field in the metadata object.

Member:


I’d suggest that for app-defined metadata columns we identify the columns by name+type, rather than by a numeric column ID. That would simplify the API and only cost a few bytes more space. The question is how the type should be identified in the API. With a non-null value we could check if the value is an integer, string, or byte array, and assign it to the appropriate typed column. With a null value we could just treat it as absent, and any metadata columns that exist because of non-null values on other changes will just be filled in with null anyway.
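A sketch of the name+type column identification described above, under loose assumptions: the comment mentions integer, string, and byte-array columns, while this illustration only distinguishes the types visible to JavaScript metadata values (splitting numbers into int and float). All names here are illustrative, not from the PR or the Automerge format.

```typescript
// Classify a metadata value into a hypothetical typed-column kind.
type ColumnType = "int" | "float" | "string" | "boolean"

function columnTypeFor(value: string | number | boolean): ColumnType {
  switch (typeof value) {
    case "string":
      return "string"
    case "boolean":
      return "boolean"
    default:
      // Remaining case is number; split into int/float columns.
      return Number.isInteger(value) ? "int" : "float"
  }
}

// A column identity is the pair (name, type), serialised here as one key.
function columnIdFor(name: string, value: string | number | boolean): string {
  return `${name}:${columnTypeFor(value)}`
}
```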

Contributor:

Ah, that makes sense. This would imply storing the union of every metadata key of every change in the document in a lookup table somewhere in the serialized document chunk, right?

Member:

Yeah I think so. The first time a change has a non-null value in its metadata, we create a column identified by its metadata key and the type of the value. The serialised document will have to store every metadata column that exists on any of the changes. Changes that don't mention a particular metadata column just fill it in with null, as is the behaviour for the Automerge-internal columns.

Contributor:

This does mean that if a user only ever writes null as the value for a metadata key, we would have to not write it to the document at all (because we don't know what column type to write). That means it would not be possible to distinguish between a null value and a not-present value. Maybe we should disallow writing null values to avoid the ambiguity?

Member:

Yup agree, let's disallow nulls.

Contributor @alexjg, Jan 13, 2024:

I've been thinking a little bit more about this. At some point we're going to want to have some kind of squash/rebase workflow I think. In such a workflow we would need to decide what to do with the metadata on each change. I think ideally we would just encode all the metadata into the squashed change. This suggests to me that we should actually treat the metadata as a multimap, somewhat like the query parameters in a URL. @ept @paulsonnentag what do you think?

Collaborator (author):

What would a squash look like? Could a user still override the metadata to set it to something custom for the squashed change, or would it be purely mechanical that the metadata of the squashed changes would always be the union of the metadata of the individual changes?

Member:

@alexjg Good point. I'd think that a squash commit would need to bring in custom logic for compacting the metadata: for example, we might not want to keep every single timestamp, but only the minimum and maximum among the timestamps in the squashed range. For authors we might want to keep the set of distinct users who have contributed at least one change, and for signatures we might want to keep the most recent signature per branch per signing key. This suggests to me that we can keep the data model for metadata on a single change simple (a single value per entry in the map), and figure out how to represent changes on squash commits once we get to that point.

I just realised something: if we do author attribution using metadata on changes, it would probably not be possible to do attribution on a squash commit or shallow clone, because the per-change information is not available. On the other hand, if we do attribution by mapping actor IDs to user IDs, attribution should still be possible, because the squash should preserve opIds, and the actor-to-user mapping can be included in the squash. That would be an argument for using the actorIds for attribution.
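One possible shape for the multimap idea floated in this thread, sketched purely as an assumption (nothing like this exists in the PR): each metadata key maps to a list of values, and a mechanical squash concatenates the lists key by key, before any domain-specific compaction like min/max timestamps or distinct authors.

```typescript
// Hypothetical multimap metadata: each key holds the values contributed by
// every squashed change, in order.
type MetadataMultimap = Record<string, Array<string | number | boolean>>

// Mechanical squash: concatenate value lists key by key. Custom compaction
// (e.g. keeping only min/max timestamps) could run over the result.
function squashMetadata(changes: MetadataMultimap[]): MetadataMultimap {
  const result: MetadataMultimap = {}
  for (const metadata of changes) {
    for (const [key, values] of Object.entries(metadata)) {
      result[key] = [...(result[key] ?? []), ...values]
    }
  }
  return result
}
```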

export interface ChangeMetadataRef {
current: ChangeMetadata
}

export interface DocHandleMessagePayload {
19 changes: 17 additions & 2 deletions packages/automerge-repo/src/Repo.ts
@@ -6,7 +6,12 @@ import {
interpretAsDocumentId,
parseAutomergeUrl,
} from "./AutomergeUrl.js"
import { DocHandle, DocHandleEncodedChangePayload } from "./DocHandle.js"
import {
DocHandle,
DocHandleEncodedChangePayload,
ChangeMetadata,
ChangeMetadataRef,
} from "./DocHandle.js"
import { RemoteHeadsSubscriptions } from "./RemoteHeadsSubscriptions.js"
import { headsAreSame } from "./helpers/headsAreSame.js"
import { throttle } from "./helpers/throttle.js"
@@ -55,6 +60,8 @@ export class Repo extends EventEmitter<RepoEvents> {
#remoteHeadsSubscriptions = new RemoteHeadsSubscriptions()
#remoteHeadsGossipingEnabled = false

#globalMetadataRef: ChangeMetadataRef = { current: {} }

constructor({
storage,
network,
@@ -331,7 +338,10 @@

// If not, create a new handle, cache it, and return it
if (!documentId) throw new Error(`Invalid documentId ${documentId}`)
const handle = new DocHandle<T>(documentId, { isNew })
const handle = new DocHandle<T>(documentId, {
isNew,
globalMetadataRef: this.#globalMetadataRef,
})
this.#handleCache[documentId] = handle
return handle
}
@@ -346,6 +356,11 @@
return this.#synchronizer.peers
}

  /** Set metadata that will be attached to each change created through a handle from this repo */
setGlobalMetadata(metadata: ChangeMetadata) {
this.#globalMetadataRef.current = metadata
}

getStorageIdOfPeer(peerId: PeerId): StorageId | undefined {
return this.peerMetadataByPeerId[peerId]?.storageId
}
Expand Down
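The ref indirection used by `setGlobalMetadata` can be demonstrated without a real repo. The `FakeRepo`/`FakeHandle` classes below are stand-ins invented for this sketch: both sides share one mutable ref object, so reassigning `current` on the repo side is observed by handles created earlier, exactly the property the `#globalMetadataRef` field relies on.

```typescript
interface ChangeMetadataRef {
  current: Record<string, string | number | boolean>
}

class FakeHandle {
  // The handle captures the ref object itself, not a snapshot of it.
  constructor(private metadataRef: ChangeMetadataRef) {}

  // Read the metadata lazily at change time, as DocHandle#change does.
  metadataForNextChange() {
    return { ...this.metadataRef.current }
  }
}

class FakeRepo {
  private metadataRef: ChangeMetadataRef = { current: {} }

  createHandle() {
    return new FakeHandle(this.metadataRef)
  }

  // Reassigning `current` is seen by every existing handle.
  setGlobalMetadata(metadata: Record<string, string | number | boolean>) {
    this.metadataRef.current = metadata
  }
}
```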
100 changes: 100 additions & 0 deletions packages/automerge-repo/test/DocHandle.test.ts
@@ -303,6 +303,106 @@ describe("DocHandle", () => {
assert(wasBar, "foo should have been bar as we changed at the old heads")
})

describe("metadata on changes", () => {
    it("should allow passing in a reference to global metadata", () => {
const handle = new DocHandle<TestDoc>(TEST_ID, {
isNew: true,
globalMetadataRef: { current: { author: "bob" } },
})

const doc1 = handle.docSync()

// ... with change
handle.change(doc => {
doc.foo = "bar"
})

// ... with change at
handle.changeAt(A.getHeads(doc1), doc => {
doc.foo = "baz"
})

const doc2 = handle.docSync()

const changes = A.getChanges(doc1, doc2).map(A.decodeChange)
assert.equal(changes.length, 2)
assert.equal(changes[0].message, JSON.stringify({ author: "bob" }))
assert.equal(changes[1].message, JSON.stringify({ author: "bob" }))
})

    it("should allow adding additional local metadata per change", () => {
const handle = new DocHandle<TestDoc>(TEST_ID, {
isNew: true,
globalMetadataRef: { current: { author: "bob" } },
})

const doc1 = handle.docSync()

// ... with change
handle.change(
doc => {
doc.foo = "bar"
},
{ metadata: { message: "with change" } }
)

// ... with change at
handle.changeAt(
A.getHeads(doc1),
doc => {
doc.foo = "baz"
},
{ metadata: { message: "with changeAt" } }
)

const doc2 = handle.docSync()

const changes = A.getChanges(doc1, doc2).map(A.decodeChange)
assert.equal(changes.length, 2)
assert.equal(
changes[0].message,
JSON.stringify({ author: "bob", message: "with change" })
)
assert.equal(
changes[1].message,
JSON.stringify({ author: "bob", message: "with changeAt" })
)
})

    it("should allow overriding global metadata per change", () => {
const handle = new DocHandle<TestDoc>(TEST_ID, {
isNew: true,
globalMetadataRef: { current: { author: "bob" } },
})

const doc1 = handle.docSync()

// ... with change
handle.change(
doc => {
doc.foo = "bar"
},
{ metadata: { author: "sandra" } }
)

// ... with change at
handle.changeAt(
A.getHeads(doc1),
doc => {
doc.foo = "baz"
},
{ metadata: { author: "frank" } }
)

const doc2 = handle.docSync()

const changes = A.getChanges(doc1, doc2).map(A.decodeChange)
assert.equal(changes.length, 2)
assert.equal(changes[0].message, JSON.stringify({ author: "sandra" }))
assert.equal(changes[1].message, JSON.stringify({ author: "frank" }))
})
})

describe("ephemeral messaging", () => {
it("can broadcast a message for the network to send out", async () => {
const handle = new DocHandle<TestDoc>(TEST_ID, { isNew: true })
21 changes: 21 additions & 0 deletions packages/automerge-repo/test/Repo.test.ts
@@ -5,6 +5,7 @@ import * as Uuid from "uuid"
import { describe, expect, it } from "vitest"
import { READY } from "../src/DocHandle.js"
import { parseAutomergeUrl } from "../src/AutomergeUrl.js"

import {
generateAutomergeUrl,
stringifyAutomergeUrl,
@@ -451,6 +452,26 @@
repo.import<TestDoc>(A.init<TestDoc> as unknown as Uint8Array)
}).toThrow()
})

it("can set global change metadata", () => {
const { repo } = setup()

const handle = repo.create<TestDoc>()

const doc1 = handle.docSync()

repo.setGlobalMetadata({ author: "bob" })
handle.change(doc => {
doc.foo = "bar"
})

const doc2 = handle.docSync()

const changes = A.getChanges(doc1, doc2).map(A.decodeChange)

expect(changes.length).toEqual(1)
expect(changes[0].message).toEqual(JSON.stringify({ author: "bob" }))
})
})

describe("with peers (linear network)", async () => {