Add Node Validator API #1771

Ayiga · 2024-07-24T00:01:17Z

This PR:

Provides a versioned implementation of the Node Validator API. This API is meant to be used in conjunction with the Node Validator UI that has been designed on the Block Explorer.

The crate name `node-metrics` was chosen over `node-validator` based on a recommendation from @clu8, as the crate and its systems may be used without the node-validator aspects.

Key places to review:

node-metrics/src/service/data_state/mod.rs
node-metrics/src/service/client_state/mod.rs

Usage example test is located in
node-metrics/src/api/node_validator/v0/create_node_validator_api.rs

The `node-metrics` crate is designed to provide information about the block chain and the nodes connected to the block chain network. The name of the crate was suggested by Charles so that it might be used by other systems. However, in general it is intended to serve information for the Node Validator Dashboard.

This crate defines various messages and logic for consuming and distributing information about the Espresso Block Chain and the nodes connected to the Espresso Sequencer network.

The current implementation is basic, in an effort to get the simplest setup working. At the moment, there are two issues that prevent anything from building.

- The first is that the `Sink` implementation of `Connection` from `tide-disco` requires the message being sent to be a borrowed value instead of the value itself. I feel that this is unexpected, and it makes it difficult to actually send messages as a passthrough.
- The second is that the `sequencer` crate is being consumed in an effort to extract the `SeqTypes` type. The idea is that this lets us use the actual types defined by the sequencer, so that we consume the exact information we will be given in a production environment. However, due to this import the tests that have been written cannot be evaluated. I believe that this means that this cannot work as expected, and will need to be rethought.

Remove bad test from main

- Add more types for Server and Client message connection
- Refactor thread based blocking processing to async runtime processing
- Add several tests to ensure correct behavior

Rename some local variables in tests for better clarity

We will need to be resilient and able to recover from certain classes of errors. As such it is helpful to be able to create a Stream from a given Client as needed. This will allow future changes to re-create the stream in the event of a disconnection (which will happen).

By default `BitVec`s are `usize` in their representation. This works great for systems that are able to decode `u64` types from a JSON representation, but for languages that have a unified number system, such as JavaScript, these are not representable accurately when decoded. In order to support these cases it is easier to just have the `BitVec` represent its data in a format that is supported. `u32` could be used here, but for maximum compatibility, I've chosen to use `u16` instead.
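The representability concern can be checked with the standard library alone. The sketch below is illustrative only: `survives_f64_round_trip` and `pack_bits_u16` are hypothetical helpers, not part of `bitvec` or of this PR, and the least-significant-bit-first ordering is an assumption made for the example.

```rust
/// Returns true if `value` is unchanged after a round trip through an
/// IEEE 754 double, the only numeric type a JavaScript JSON parser yields.
/// The comparison is done in `u128` so that a double equal to 2^64 is not
/// silently saturated back to `u64::MAX`.
fn survives_f64_round_trip(value: u64) -> bool {
    (value as f64) as u128 == value as u128
}

/// Packs bits into 16-bit words, least significant bit first
/// (hypothetical layout chosen for this sketch).
fn pack_bits_u16(bits: &[bool]) -> Vec<u16> {
    bits.chunks(16)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u16, |word, (i, &bit)| word | ((bit as u16) << i))
        })
        .collect()
}
```

Every `u16` and `u32` word survives the round trip, while 64-bit words above 2^53 may not, which is the motivation for a narrower storage type.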

In order to retrieve the StakeTable, it is beneficial to have a function that fetches it from a Sequencer.

The Node Identity stored in DataState was stored as a tuple of keys to `NodeIdentity`. However, `NodeIdentity` already stores the public key, so it is unnecessary, especially following the update where the `NodeIdentity` itself has every other field as being optional.

Due to the potential difficulty of knowing your standing IP address on cloud platforms, the requirement of knowing the IP address of a node has been replaced with a public URL instead. This public URL should be the base URL for an API endpoint for the sequencer.

We intend to be able to retrieve Node Identity information from the prometheus metrics. As such we need a function that can handle this interaction. Using `prometheus-parse` and `reqwest`, we can retrieve the data given a valid base URL to work with. From there we can parse the data that is available in the resulting `Scrape` object to fill in the missing pieces of Node Identity information.
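To make the label extraction concrete, here is a hand-rolled sketch that pulls the labels out of one text-format sample line. It does not use `prometheus-parse` or `reqwest`; `parse_labeled_sample` is a hypothetical helper, the exact line the sequencer exposes is an assumption, and label values containing `,` or `"` are not handled.

```rust
use std::collections::HashMap;

/// Parses a sample such as `name{key="value",other="x"} 1` into its metric
/// name and label map. Returns `None` when the line carries no label set or
/// a label is malformed. Simplification: values must not contain `,` or `"`.
fn parse_labeled_sample(line: &str) -> Option<(String, HashMap<String, String>)> {
    let open = line.find('{')?;
    let close = line.rfind('}')?;
    if close < open {
        return None;
    }
    let name = line[..open].trim().to_string();
    let mut labels = HashMap::new();
    for pair in line[open + 1..close].split(',') {
        let pair = pair.trim();
        if pair.is_empty() {
            continue;
        }
        let (key, value) = pair.split_once('=')?;
        // Label values are quoted in the text format; strip the quotes.
        let value = value.trim().strip_prefix('"')?.strip_suffix('"')?;
        labels.insert(key.trim().to_string(), value.to_string());
    }
    Some((name, labels))
}
```

With labels in hand, each optional Node Identity field can be filled from the matching key, for example `labels.get("country")`.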

A wallet address is able to be decoded from a string using the `FromStr` trait.

The key "latitude" is used for both latitude and longitude values. This corrects the mistake by using "longitude" for longitude. Fixes an off-by-one error.

The Wallet address is meant to be a 160 bit address, which is 20 bytes. So its hexadecimal representation should be 40 characters instead of the 32 that have been used up to this point.
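A minimal sketch of such a `FromStr` implementation, enforcing the 40 character length, might look like the following. The `WalletAddress` and `ParseWalletAddressError` types here are stand-ins written for illustration, and accepting an optional `0x` prefix is an assumption rather than something this PR specifies.

```rust
use std::str::FromStr;

/// Stand-in for a 160-bit (20-byte) wallet address.
#[derive(Debug, PartialEq, Eq)]
struct WalletAddress([u8; 20]);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum ParseWalletAddressError {
    InvalidLength(usize),
    InvalidHexDigit,
}

impl FromStr for WalletAddress {
    type Err = ParseWalletAddressError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: an optional "0x" prefix is tolerated.
        let hex = s.strip_prefix("0x").unwrap_or(s);
        // 20 bytes need exactly 40 hexadecimal characters.
        if hex.len() != 40 {
            return Err(ParseWalletAddressError::InvalidLength(hex.len()));
        }
        // Rejecting non-hex input up front also guarantees ASCII, so the
        // two-byte slices below always fall on character boundaries.
        if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(ParseWalletAddressError::InvalidHexDigit);
        }
        let mut bytes = [0u8; 20];
        for (i, byte) in bytes.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
                .map_err(|_| ParseWalletAddressError::InvalidHexDigit)?;
        }
        Ok(WalletAddress(bytes))
    }
}
```

A 32 character string is rejected with `InvalidLength(32)`, which is the mistake described above.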

The logic for verifying the node identity in a Scrape, and then the data's population is a little large and daunting to consume all at once. It's better to split these pieces into separate functions so that the related parsing pieces are together.

The test `test_process_client_handling_stream_subscribe_voters` never completes and just runs forever. The reason for this is due to a flaw in the creation of the test itself. The test is intended to check on the real-time submission of voters data, and the distribution of that data to the subscribers. However, the setup has the users subscribe to `NodeIdentity` updates instead of `Voters` updates. To fix the issue we just need to change the subscribe calls to refer to voters instead.

In order to verify the behavior of the ConnectedNetwork under various assumptions, specific scenario unit tests are desired to ensure that it behaves as expected.

Fix external event handler imports

node-metrics/src/api/node_validator/v0/node_validator.toml

node-metrics/src/lib.rs

node-metrics/src/service/client_message/mod.rs

jbearer · 2024-08-01T16:40:35Z

sequencer/src/lib.rs

+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NODE_NAME").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_WALLET_ADDRESS").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COMPANY_NAME").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COMPANY_WEBSITE").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_OPERATING_SYSTEM").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NODE_TYPE")
+                .unwrap_or(format!("espresso-sequencer {}", Ver::VERSION)),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NETWORK_TYPE").unwrap_or("".into()),
+        ]);
+
+    // Expose Node Identity Location via the status/metrics API
+    metrics
+        .text_family(
+            "node_identity_location".into(),
+            vec!["country".into(), "latitude".into(), "longitude".into()],
+        )
+        .create(vec![
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COUNTRY_CODE").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_LATITUDE").unwrap_or("".into()),
+            std::env::var("ESPRESSO_SEQUENCER_IDENTITY_LONGITUDE").unwrap_or("".into()),
+        ]);


Instead of reading these directly from the environment, it would be better to get them from the clap options, so that they show up in the auto-generated help. For example, we could create a struct like

    #[derive(clap::Parser)]
    struct Identity {
        #[clap(long = "identity-company-name", env = "ESPRESSO_SEQUENCER_IDENTITY_COMPANY_NAME")]
        company_name: Option<String>,
        // etc
    }

and then add to Options:

    struct Options {
        ...
        #[clap(flatten)]
        identity: Identity,
    }

Addressed in ba8048c.

Though, to take advantage of `std::env::consts::OS`, I ended up implementing `Default` for the `Identity` structure, and using `#[clap(skip)]` to customize how it gets populated.

Suggestion: instead of #[clap(skip)]/default, try #[clap(..., default_value = std::env::consts::OS)] on the operating_system field.

Resolved in e2a2180.

Also, apparently the Default::default call prevented the program from being able to parse all other derived arguments. I'm not certain as to the exact reason for this, but either way by populating using default_value, and reverting to flatten, the issue is resolved.
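For readers unfamiliar with how a clap argument carrying both `env` and `default_value` gets its value, the precedence can be sketched with the standard library: an explicit command line value wins, then the environment variable, then the default. This does not use clap itself; `resolve_field` and `operating_system` are hypothetical helpers written for illustration.

```rust
/// Picks the value for one identity field: command line value first, then
/// the environment variable, then the compiled-in default.
fn resolve_field(cli: Option<&str>, env_value: Option<&str>, default: &str) -> String {
    cli.or(env_value).unwrap_or(default).to_string()
}

/// Resolves the operating system field, falling back to the platform name
/// the binary was compiled for when nothing else is supplied.
fn operating_system(cli: Option<&str>) -> String {
    let env_value = std::env::var("ESPRESSO_SEQUENCER_IDENTITY_OPERATING_SYSTEM").ok();
    resolve_field(cli, env_value.as_deref(), std::env::consts::OS)
}
```

Because the fallback is an ordinary default rather than a skipped field, the field stays visible in the generated help and can still be overridden.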

node-metrics/src/api/node_validator/v0/create_node_validator_api.rs

@jbearer

The `InternalClientMessage` enum largely follows the `ClientMessage`, and as such it has many duplicate cases. As pointed out by @jbearer in this comment: #1771 (comment), it makes more sense to wrap the `ClientMessage`s instead of having duplicate cases for them. This also has the added benefit of cleaning up much of the boilerplate implementation details. Additionally, this relocates the `PartialEq` implementation into the test module so that it is only defined for unit tests.
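The shape of that refactor can be sketched as follows. The variant names, the `ClientId` type, and the `describe` helper are all illustrative assumptions and are not taken from the crate; the point is only that a single wrapping variant replaces one mirrored variant per `ClientMessage` case.

```rust
/// Messages a client can send (variant names are illustrative).
#[derive(Debug, Clone)]
enum ClientMessage {
    SubscribeLatestBlock,
    SubscribeNodeIdentity,
    SubscribeVoters,
}

/// Hypothetical identifier for a connected client.
#[derive(Debug, Clone, Copy)]
struct ClientId(u64);

/// Internal messages add connection lifecycle events and wrap every
/// `ClientMessage` in one `Request` variant instead of mirroring each case.
#[derive(Debug, Clone)]
enum InternalClientMessage {
    Connected(ClientId),
    Disconnected(ClientId),
    Request(ClientId, ClientMessage),
}

/// One match arm now covers all client requests.
fn describe(message: &InternalClientMessage) -> String {
    match message {
        InternalClientMessage::Connected(id) => format!("client {} connected", id.0),
        InternalClientMessage::Disconnected(id) => format!("client {} disconnected", id.0),
        InternalClientMessage::Request(id, request) => {
            format!("client {} sent {:?}", id.0, request)
        }
    }
}
```

Adding a new `ClientMessage` case then requires no change to `InternalClientMessage` at all.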

@jbearer

The `new` call for `DataState` takes a `stake_table` as part of its arguments. Currently, the `DataState` gets called with `Default::default()` arguments, and then the `stake_table` is replaced within it. This is unnecessary as one of the next steps of initialization is to retrieve the `stake_table` from the `stake_table_url_base` url. Instead this should just defer the creation of the `DataState` until after the `stake_table` is retrieved. Suggested by @jbearer in this discussion: #1771 (comment)

@jbearer

@jbearer has noted that the `leaf_chain` within the `Decide` hotshot event returns the highest `Leaf` first: #1771 (comment) The leaf sender expects the leaves to arrive in ascending order. To achieve this, the iterator for the `leaf_chain` has been reversed.
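As a small illustration of the ordering fix, assuming a simplified `Leaf` that carries only a height (the real type is far richer) and a hypothetical `ascending` helper:

```rust
/// Stand-in for a decided leaf; only the height matters for ordering here.
#[derive(Debug, Clone, PartialEq)]
struct Leaf {
    height: u64,
}

/// Given a chain that lists the highest leaf first, yields the leaves in
/// ascending height order by walking the slice in reverse.
fn ascending(leaf_chain: &[Leaf]) -> Vec<Leaf> {
    leaf_chain.iter().rev().cloned().collect()
}
```

Reversing relies on the chain already being sorted from highest to lowest; it does not sort arbitrary input.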

@jbearer

As pointed out by @jbearer, the documentation for the `node-validator` API was largely copied and then not modified to reflect the specific intentions and features of the `node-validator` endpoints: #1771 (comment) #1771 (comment). This modifies the documentation to reflect the purpose and intention behind the `node-validator` API.

@jbearer

As pointed out by @jbearer, the `drop` calls at the end of the block are not needed: #1771 (comment)

@jbearer

Proposed by @jbearer in comment: #1771 (comment)

@jbearer

It is not clear what the valid values for `network_type` and `node_type` are or should be. To add clarity, comments have been added to these two fields to provide some ideas for what could be valid values. Change suggested by @jbearer: #1771 (comment)

@jbearer

- Add `create_roll_call_response` to handle message serialization. Suggested by @jbearer: #1771 (comment)
- Replace dummy public key with actual public key. Suggested by @jbearer: #1771 (comment)
- Replace `Vec` of `JoinHandle` and `Drop` code with `TaskList`. Suggested by @jbearer: #1771 (comment)

@jbearer

Based on feedback provided by @jbearer: #1771 (comment) #1771 (comment) Instead of querying all of the environment variables separately, and dynamically after the program has launched, we can take advantage of clap's ability to automatically populate the desired data points from either command line arguments or environment variables. As an added benefit the fields can be grouped together into a single `Identity` structure.

@jbearer

Based on discussion with @jbearer: #1771 (comment) #1771 (comment) In the cases where we are no longer able to make meaningful progress, these early task returns have been replaced with panics. This gives a clearer failure indicator, rather than failing silently and letting our data slowly stagnate over time.

@jbearer

From discussion with @jbearer: #1771 (comment) Based on the conversation with Jeb, the default value can be used to populate the operating system value from the environment directly. While implementing this fix, it was also discovered that the previous way of populating the Identity using `Default::default` actually prevented the program from running at all due to bad initialization. I haven't determined the exact reason, but by switching back to `default_value` population, we can revert to using the `flatten` option, and the issue seems to be fixed as a result.

@jbearer

Discussion with @jbearer: #1771 (comment) Right now we don't have a meaningful definition of what an address for a Sequencer is. As a result it makes more sense to omit the wallet address field rather than to keep it in an effort to create a forward compatible definition. This commit removes the definition, population, and representation of the wallet address from both the sequencer, and from the node-validator api. This will require a change in the Node Validator UI, as it expects a wallet_address to be present.

- Add /v0/ to node-validator in docker-compose.yaml
- Fix `depends_on` conditions in docker-compose.yaml

The task joins were causing the tests to fail because the closed channels result in a panic. Since this panic is desired behavior, the task joins have been removed from the tests so that they do not cause failures.

rob-maron and others added 30 commits July 22, 2024 11:32

sequencer rollcall

5db2dc7

external message handler

a599c5d

address lints

8497354

don't respond if we didn't specify a URL

01ba5e9

remove patch

fe97c14

Merge remote-tracking branch 'origin/main' into rm/roll-call-changes

a9b95d3

Implement PartialEq for ServerMessage and ClientMessage

86f4049

Remove bad test from main

Expand implementation

424bd92

Rename and add comments to assumption tests

61e1363

Add specific errors for tide_disco::Error and surf_disco::Error

9c97ec7

Replace async_std::io::timeout with prelude FutureExt .timeout

605ddee

Rename some local variables in tests for better clarity

Add voters Sender and Receiver processing to process_leaf_stream

eccb8ae

Fix missing equality implementation for LatestVoters ServerMessage

6fb7fc2

Format and sort Cargo.toml dependencies

0b41463

Remove unused lib types

05014b8

Refactor NodeIdentity fields to be Optional

3d611dc

Add structure for decoding StakeTable from sequencer

ae8f041

Update Cargo.lock file for node-metrics

83f81c8

Add parsing of wallet address

7d398d2

Fix incorrect key used for longitude

607cd43

Fix badly formatted Wallet address

1a520b9

Fix LocationDetails Debug trait test

3707bc6

Ayiga requested review from ImJeremyHe, sveitser, jbearer, tbro and imabdulbasit as code owners July 31, 2024 14:21

Ayiga added 4 commits July 31, 2024 15:30

Add connected network task tests for node-validator

64507cf

Add Node validator api port to env file

f07c65d

Add node_validator to process compose

ecc922f

Add node-validator to docker scripts

cb132dd

jbearer reviewed Aug 1, 2024

Ayiga added 14 commits August 1, 2024 14:42

Replace early return in HotShotEventProcessingTask with continue

ea3d37a

Remove unnecessary drops

eebb3ce

Refactor define_api call to be more succinct

b46ff57

Add /v0/ to node_validator in process-compose.yaml

07c4722

jbearer approved these changes Aug 2, 2024

Ayiga added 3 commits August 2, 2024 12:22

Remove task joins from tests

6bbcb6e

Merge branch 'main' into ts/node-validator

ddf35aa

Add cancels to async tasks in tests to avoid panics

355093c

Ayiga merged commit 8c01b24 into main Aug 5, 2024
16 checks passed

Ayiga deleted the ts/node-validator branch August 5, 2024 17:21
