-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: contract log query #1601
base: develop
Are you sure you want to change the base?
feat: contract log query #1601
Conversation
…ct/<principal>/events` endpoint
Codecov Report
@@ Coverage Diff @@
## develop #1601 +/- ##
============================================
- Coverage 77.41% 21.45% -55.96%
============================================
Files 78 78
Lines 11161 11302 +141
Branches 2487 2521 +34
============================================
- Hits 8640 2425 -6215
- Misses 2403 8198 +5795
- Partials 118 679 +561
... and 52 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome. Just left a couple comments and some discussion items.
|
||
/** @param { import("node-pg-migrate").MigrationBuilder } pgm */ | ||
exports.down = pgm => { | ||
pgm.dropIndex('contract_logs', 'value_json_path_ops_idx'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This index is never created, is this an old line?
@@ -1379,6 +1379,7 @@ export interface SmartContractEventInsertValues { | |||
contract_identifier: string; | |||
topic: string; | |||
value: PgBytea; | |||
value_json: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be best if we set this as string | null
IMO to avoid storing empty strings
/* | ||
test.skip('generate', () => { | ||
const result = clarityCompactJsonVectors.map(([, hexEncodedCV, compactJson]) => { | ||
const encodedJson = clarityValueToCompactJson(hexEncodedCV); | ||
const reprResult = decodeClarityValueToRepr(hexEncodedCV); | ||
return [reprResult, hexEncodedCV, JSON.stringify(encodedJson)]; | ||
}); | ||
console.log(result); | ||
}); | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
src/tests/jsonpath-tests.ts
Outdated
/* | ||
test('generate vector data', () => { | ||
const result = postgresExampleVectors.map((input) => { | ||
const complexity = containsDisallowedJsonPathOperation(input); | ||
return [input, complexity ? complexity.operation : null]; | ||
}); | ||
console.log(result); | ||
}); | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
} | ||
} | ||
|
||
async function complete_1680181889941_contract_log_json( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a pressing issue that would require us to ship this feature with this kind of migration? This code looks good to me (and I like this experimentation as you specified in the PR desc), but I worry that it could create some problems if we want to add newer migrations that rely on the new columns or something. IMO it would be easier and more maintainable to just ship this as a regular new migration and rely on replay, or to explicitly mark this code as a "patch" somehow which we could drop once we switch to a new v8 release.
What do you think?
…ty and ban some operations
I'd like to get more feedback on this feature to prioritize work on it. My sense is that if this were available, then A) contract authors would take more advantage of structured logs (i.e. print statements) and B) apps would leverage this API feature to make more straightforward and efficient queries. |
Closes #1598
This PR adds two new query params to the
/extended/v1/contract/<principal>/events
endpoint.?filter_path=<jsonpath expression>
:Optional
jsonpath
expression to select only results that contain items matching the expression.?contains=<json object>
:Optional stringified JSON to select only results that contain the given JSON.
Rationale
Clarity smart contracts are able to generate structured log outputs during the execution of a contract-call or contract-deploy transaction. For more context see the Clarity
print
docs. These contract logs can then be ingested by an event-observer (such as the API) to store and serve contract-specific data. Today we have several examples of this:bns
contract emits logs used by the API to process, track, and serve clients BNS information like ownership, zonefiles, etc.pox
contract emit logs used by the API to store and track stacking related data, which is used in account balance endpoints (i.e.locked
balance), Rosetta queries, endpoints for Stacking pool operators, etc.send-many-stx
contract, used by essentially all exchanges, performs batch stx-transfers and uses contract logs to emit a corresponding memo, which the API stores and returns in an endpoint also used by exchanges to track when their wallets have received a batch stx-transfer and the associated memo.These are some of the contracts and logs that this API and others currently have special handling for. Typically, for each of these contracts, the log events have unique code implemented to denormalize the log payload, store in a new table, and new endpoints + sql queries.
There are many more deployed contracts which have their own application-specific usage of contract logs, with their own unique payloads. The API doesn't and cannot reasonably implement custom handling for all of the current and future contracts in the same way it has for the contracts listed above. Currently, clients must use the
/extended/v1/contract/{contract_id}/events
endpoint. This simply returns the latest N logs for a given contract. Clients must paginate through these requests, and manually parse out and filter the logs they need. Even for the contracts that already receive special treatment (e.g.pox
,bns
), there are requests for new endpoints with different query criteria which would require us to implement new endpoints and sql queries.In the Ethereum world, the ability to emit and query structured logs is a critical element of their app ecosystem. Token standards that you may be familiar with (e.g. ERC20) specify the logs that contracts must emit when token operations happen. The
eth_getLogs
RPC endpoint provides a mechanism for efficiently querying those logs. For example "give me all transfers for a given FT associated with a given recipient address".In Stacks, we have have the building blocks for offering similar capabilities.
Contact log query implementation
Prior to this PR, contract logs (i.e. structured Clarity print outputs) were stored in a table in the
contract_logs
in their original consensus-serialized binary encoding. This encoding made it essentially impossible to perform queries against the contents of the log object. Postgres supports a storing arbitrary structured documents in ajsonb
column type. This column type also supports a few types of advanced indexes (e.g.GIN(jsonb_ops)
andGIN(jsonb_path_ops)
) that make it possible perform efficient querys against the arbitrary JSON.So what we can do is 1) decode the binary Clarity buffers into an object, 2) encode that into a JSON object, 3) store that in a new
jsonb
column in thecontracts_log
table, then 4) implement API endpoint(s) that allow more precise queries against these logs.This PR implements the above. One of the design decisions was to implement yet-another way to encode Clarity values into JSON. The existing schemas we have are lossless (you can reverse the JSON back into the original Clarity value), however, that comes with overhead from having to store Clarity type information (because there is not a 1-1 mapping between JSON primitives and Clarity primitives), so that JSON is less readability and takes up more space. In our use-case, we aren't necessarily concerned with preserving exact Clarity typing, rather we want the JSON to be easy-to-query, readable, space-efficient, index-efficient, and have good interoperability with JSON-based workflows (e.g.
jsonpath
expressions). Here is the spec for how the Clarity values are encoded into a JSON object in this PR:null
._error
which is set to the unwrapped error value.hex
with the hex-encoded string as the value, and the keyutf8
with the utf8-encoded string as the value. When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD�
will be used to represent those errors. Both of these are included because Clarity Buffers are used for storing both binary data and string data (e.g. BNS names).<address>
or<address>.<contract_name>
.Usage
Here's an example of how the subnets-nft-demo app can be refactored. The app fetches a list of all events, then on the client decodes of the Clarity values and filters for specific events. This can now be done in a single query leveraging the new
filter_path
query param. This accepts ajsonpath
expression.Previous code:
https://github.com/hirosystems/subnets-nft-demo/blob/34e433a2d2893f36e1767ce635ee280baf1acbf6/src/stacks/apiCalls.ts#L96-L111
Example of doing the same thing with a single fetch using a
jsonpath
expression filter:Example of using the new
contains
param to perform a similar query:Here's another example of using
jsonpath
expression filter to fetch recent BNSname-revoke
andname-renewal
events:Here's an example of using the new
contains
param. This is also a filter, but it takes a JSON object and returns only the contract log events that contain the given object. Here's an example of fetching recent BNSname-revoke
events:Considerations
The
jsonpath
expressions supported in Postgres can include relatively complex operations. For example, recursive key-value lookups, regex, math operations, type coercions, and more. Many of these cannot take advantage of the postgres JSON-specific indexes, which can result in unreasonably expensive queries. The vast majority of expected use-cases only need simple jsonpath expressions which my testing so far have shown to be reasonably efficient. So we need to determine when a given jsonpath expression is "too complex" and reject the request. This PR has a very simple regex-based version of that implemented, however, these regexes are easy to trick and bypass. We need to validate against the actual AST of a jsonpath expression. Existingjsonpath
parsing libraries in js do not support the postgres flavor of expressions, so I'm in the process of switching the regex-based validation tojsonpath-pg
. This is a library I've wrote which compiles the grammar and lexer C code from the postgres codebase into WASM and exposes the AST object. This can be used to reliable determine the complexity of an expression.Alternatives
The capabilities described above could potentially be implemented with other tools. For example, no-sql solutions like GraphQL or MongoDB. However, I'd like to explore the possibility of getting this approach to work before considering adding new systems/layers into the project's tech stack.
Upgrading + database migration
This PR changes the sql schema and adds a new non-nullable column to an existing table. Typically, we'd modify the existing migration tables (which breaks existing dbs), and release this as a major version that requires an event-replay. This is easy for API contributors, however, these kinds of breaking changes are difficult for deployment, both internal and external, because event-replays increasingly take more time and resources.
So this PR is experimenting with an more advanced sql schema migration which is compatible with existing deployments and does not require a major version bump or event-replay. Migration libraries (including the one we use) tend not to support advanced table manipulation like what this PR requires, so a new function is called directly after the regular migrations in order to perform the "advanced" migrations. In this case, it takes around half an hour to run this migration because it involves querying and updating every row in the
contract_logs
table (which internally postgres treats as a DELETE and INSERT, without the ability to batch).In a future release when a breaking change is strictly required, the "advanced migration" function for this case can be dropped.
TODO