Add Node Validator API #1771

Merged
merged 75 commits into from
Aug 5, 2024


Ayiga
Member

@Ayiga Ayiga commented Jul 24, 2024

Closes #1778

This PR:

Provides a versioned implementation of the Node Validator API. This API is meant to be used in conjunction with the Node Validator UI that has been designed on the Block Explorer.

The crate name node-metrics was chosen over node-validator based on a recommendation from @clu8, as the crate and its systems may be used without the node-validator aspects.

Key places to review:

node-metrics/src/service/data_state/mod.rs
node-metrics/src/service/client_state/mod.rs

Usage example test is located in
node-metrics/src/api/node_validator/v0/create_node_validator_api.rs

rob-maron and others added 30 commits July 22, 2024 11:32
The `node-metrics` crate is designed to provide information about the block
chain and the nodes connected to the block chain network.  The name of
the crate was suggested by Charles so that it might be used by other
systems.  However, in general it is intended to be used to serve
information for the Node Validator Dashboard.

This crate defines various messages and logic for consuming and
distributing information about the Espresso Block Chain and the nodes
connected to the Espresso Sequencer network.

The current implementation is a basic implementation in an effort to get
the simplest setup working.  At the moment, there are two issues that
prevent anything from building.

- The first is that the `Sink` implementation of `Connection` from
  `tide-disco` requires the message being sent to be a borrowed value
  instead of the value itself.  I feel that this is unexpected, and it makes
  it difficult to actually send messages as a passthrough.

- The second is that the `sequencer` crate is being consumed in an effort
  to extract the `SeqTypes` type.  By doing this, the idea is that we should
  be able to use the actual types defined by the sequencer so that we are
  able to consume the exact information we will be given in a production
  environment.  However, due to this import the tests that have been
  written cannot be evaluated.  I believe that this means that this cannot
  work as expected, and will need to be rethought.
Add more types for Server and Client message connection

Refactor thread based blocking processing to async runtime processing

Add several tests to ensure correct behavior
Rename some local variables in tests for better clarity
The service needs to be resilient and able to recover from certain classes of errors.
As such it is helpful to be able to create a Stream from a given Client as needed.
This will allow future changes to re-create the stream in the event of a
disconnection (which will happen).
By default `BitVec`s are `usize` in their representation.  This works great for systems that
are able to decode `u64` types from a JSON representation, but for languages that have a
unified number system, such as JavaScript, these values cannot be represented accurately when
decoded.  In order to support these cases it is easier to have the `BitVec` represent its
data in a format that is supported.

`u32` could be used here, but for maximum compatibility, I've chosen to use `u16` instead.
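
The reasoning above can be sketched without the `bitvec` crate. This is a minimal illustration under stated assumptions, not the code from this PR: `pack_bits_u16` and `fits_in_js_number` are hypothetical helpers, and the packing order (least significant bit first) is chosen only for the example. It shows that every `u16` word stays far below the largest integer a JavaScript `number` holds exactly, while a full-width `u64` word does not.

```rust
/// Largest integer an IEEE-754 double (a JavaScript `number`) represents exactly.
const MAX_SAFE_INTEGER: u64 = (1 << 53) - 1;

/// Packs bits into `u16` words, least significant bit first.
/// Hypothetical stand-in for a `BitVec` that uses `u16` storage.
pub fn pack_bits_u16(bits: &[bool]) -> Vec<u16> {
    // Round up so a trailing partial word is still allocated.
    let mut words = vec![0u16; (bits.len() + 15) / 16];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            words[i / 16] |= 1u16 << (i % 16);
        }
    }
    words
}

/// Returns true when `value` can be decoded by JavaScript without losing precision.
pub fn fits_in_js_number(value: u64) -> bool {
    value <= MAX_SAFE_INTEGER
}
```

Any `u16` word passes `fits_in_js_number`, whereas a 64-bit word with its high bits set does not, which is the compatibility concern described above.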
In order to retrieve the StakeTable it is beneficial to have a function implementation
to retrieve the StakeTable from a Sequencer.
The Node Identity stored in DataState was stored as a tuple of keys to `NodeIdentity`.  However,
`NodeIdentity` already stores the public key, so it is unnecessary, especially following the update
where the `NodeIdentity` itself has every other field as being optional.
Due to the potential difficulty there can be in knowing your standing
IP Address on cloud platforms, the requirement of knowing the ip address
of a node has been replaced with a public url instead.  This public url
should be the base url for an API endpoint for the sequencer.
We intended to be able to retrieve Node Identity information from the prometheus
metrics.  As such we need a function that can handle this interaction. Using the
plugin for `prometheus-parse`, and `reqwest` we can retrieve the data given a
valid base URL to work with.  From there we can parse the data that is available
in the resulting `Scrape` object to fill in the missing pieces of Node Identity
information.
A wallet address is able to be decoded from a string using the `FromStr` trait.
The key "latitude" is used for both latitude and longitude values.  This corrects the
mistake by using "longitude" for longitude values.
The Wallet address is meant to be a 160 bit address, which is 20 bytes.
So its hexadecimal representation should be 40 characters instead of the 32
that have been set up to this point.
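
The size arithmetic (160 bits = 20 bytes = 40 hexadecimal characters) and the `FromStr` decoding mentioned above can be illustrated as follows. This is a sketch only: `WalletAddress`, `ParseWalletAddressError`, and the optional `0x` prefix handling are assumptions made for the example, not the types defined in this PR.

```rust
use std::str::FromStr;

/// Hypothetical 160-bit (20-byte) wallet address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WalletAddress(pub [u8; 20]);

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseWalletAddressError {
    /// The hex payload did not contain exactly 40 characters.
    InvalidLength(usize),
    /// The payload contained a character that is not a hex digit.
    InvalidHexDigit,
}

impl FromStr for WalletAddress {
    type Err = ParseWalletAddressError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: an optional "0x" prefix is tolerated.
        let hex = s.strip_prefix("0x").unwrap_or(s);
        // 20 bytes at two hex characters per byte is 40 characters.
        if hex.len() != 40 {
            return Err(ParseWalletAddressError::InvalidLength(hex.len()));
        }
        // Checking first guarantees ASCII, so the byte slicing below is safe.
        if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(ParseWalletAddressError::InvalidHexDigit);
        }
        let mut bytes = [0u8; 20];
        for (i, byte) in bytes.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
                .map_err(|_| ParseWalletAddressError::InvalidHexDigit)?;
        }
        Ok(WalletAddress(bytes))
    }
}
```

With this shape, a 32-character string is rejected with `InvalidLength(32)`, which is the mismatch the commit message describes.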

The logic for verifying the node identity in a Scrape, and then the data's population
is a little large and daunting to consume all at once.  It's better to split these pieces
into separate functions so that the related parsing pieces are together.
The test `test_process_client_handling_stream_subscribe_voters` never completes and just
runs forever.  The cause is a flaw in the creation of the test itself.
The test is intended to check the real-time submission of voters data, and the
distribution of that data to the subscribers.  However, the setup has the users subscribe
to `NodeIdentity` updates instead of `Voters` updates.  To fix the issue we just need to
change the subscribe calls to refer to voters instead.
Ayiga added 4 commits July 31, 2024 15:30
In order to verify the behavior of the ConnectedNetwork under various assumptions,
scenario-specific unit tests are needed to ensure that it behaves as expected.

Fix external event handler imports
Comment on lines 161 to 181
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NODE_NAME").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_WALLET_ADDRESS").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COMPANY_NAME").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COMPANY_WEBSITE").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_OPERATING_SYSTEM").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NODE_TYPE")
            .unwrap_or(format!("espresso-sequencer {}", Ver::VERSION)),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_NETWORK_TYPE").unwrap_or("".into()),
    ]);

// Expose Node Identity Location via the status/metrics API
metrics
    .text_family(
        "node_identity_location".into(),
        vec!["country".into(), "latitude".into(), "longitude".into()],
    )
    .create(vec![
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_COUNTRY_CODE").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_LATITUDE").unwrap_or("".into()),
        std::env::var("ESPRESSO_SEQUENCER_IDENTITY_LONGITUDE").unwrap_or("".into()),
    ]);
Member

Instead of reading these directly from the environment, it would be better to get them from the clap options, that way they show up in the auto-generated help. For example, we could create a struct like

#[derive(clap::Parser)]
struct Identity {
  #[clap(long = "identity-company-name", env = "ESPRESSO_SEQUENCER_IDENTITY_COMPANY_NAME")]
  company_name: Option<String>,
  // etc
}

and then add to Options:

struct Options {
  ...
  #[clap(flatten)]
  identity: Identity,
}

Member Author

Addressed in ba8048c.

Though, to take advantage of std::env::consts::OS, I ended up implementing Default for the Identity structure, and taking advantage of

#[clap(skip)]

to customize how it gets populated.

Member

Suggestion: instead of #[clap(skip)]/default, try #[clap(..., default_value = std::env::consts::OS)] on the operating_system field.

Member Author

Resolved in e2a2180.

Also, apparently the Default::default call prevented the program from being able to parse all other derived arguments. I'm not certain as to the exact reason for this, but either way by populating using default_value, and reverting to flatten, the issue is resolved.

Ayiga added 14 commits August 1, 2024 14:42
The `InternalClientMessage` enum largely follows the `ClientMessage`, and as such it has many
duplicate cases.  As pointed out by @jbearer in this comment:
#1771 (comment).

It makes more sense to wrap the `ClientMessage`s instead of having duplicate cases for them.
This also has the added benefit of cleaning up much of the boilerplate implementation.

Additionally, this relocates the `PartialEq` implementation into the test module so that it is
only defined for unit tests.
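
The wrapping approach described in this commit message could look roughly like the sketch below. The variant names, `ClientId`, and the `describe` helper are hypothetical and chosen for illustration. The point is only that one wrapping variant replaces a duplicate internal case for each `ClientMessage` case.

```rust
/// Hypothetical subset of the messages a client can send.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientMessage {
    SubscribeLatestBlock,
    SubscribeNodeIdentity,
    SubscribeVoters,
}

/// Hypothetical identifier for a connected client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClientId(pub u64);

/// Internal messages: connection lifecycle events plus a single variant
/// that wraps whatever `ClientMessage` the client sent.
#[derive(Debug)]
pub enum InternalClientMessage {
    Connected(ClientId),
    Disconnected(ClientId),
    Request(ClientId, ClientMessage),
}

/// Describes a message; new `ClientMessage` cases need no new arm here.
pub fn describe(message: &InternalClientMessage) -> String {
    match message {
        InternalClientMessage::Connected(id) => format!("client {} connected", id.0),
        InternalClientMessage::Disconnected(id) => format!("client {} disconnected", id.0),
        InternalClientMessage::Request(id, request) => {
            format!("client {} sent {:?}", id.0, request)
        }
    }
}
```

In this sketch `InternalClientMessage` deliberately does not derive `PartialEq`, mirroring the note above that equality is only needed by unit tests.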
The `new` call for `DataState` takes a `stake_table` as part of its arguments.  Currently, the
`DataState` gets called with `Default::default()` arguments, and then the `stake_table` is
replaced within it.  This is unnecessary as one of the next steps of initialization is to
retrieve the `stake_table` from the `stake_table_url_base` url.

Instead this should just defer the creation of the `DataState` until after the `stake_table`
is retrieved.

Suggested by @jbearer in this discussion:
#1771 (comment)
@jbearer has noted that the `leaf_chain` within the `Decide` hotshot event returns the highest
`Leaf` first:
#1771 (comment)

The leaf sender expects the leaves to arrive in ascending order.  In
order to achieve this, the iterator for the `leaf_chain` has been reversed.
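
The ordering fix can be illustrated with a small sketch. The `Leaf` struct here is a hypothetical stand-in that carries only a height, and `leaves_in_ascending_order` is an illustrative helper, not a function from this PR.

```rust
/// Hypothetical stand-in for a decided leaf; only the height matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Leaf {
    pub height: u64,
}

/// Given a chain that lists the highest leaf first, returns the leaves
/// in ascending height order by reversing the iterator.
pub fn leaves_in_ascending_order(leaf_chain: &[Leaf]) -> Vec<Leaf> {
    leaf_chain.iter().rev().cloned().collect()
}
```

A chain with heights 12, 11, 10 comes back as 10, 11, 12, which is the order a downstream consumer that processes leaves sequentially would expect.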
As pointed out by @jbearer the documentation for the `node-validator` API was largely copied
and then not modified to reflect the specific intentions and features of the `node-validator`
endpoints:
#1771 (comment)
#1771 (comment)

Modifies the documentation to reflect the purpose and intention behind the `node-validator` API.
As pointed out by @jbearer, the `drop` calls at the end of the block are not needed:
#1771 (comment)
It is not clear what the valid values for `network_type` and `node_type` are or should be. In order
to add clarity, comments have been added to these two fields to provide some ideas for what
could be valid values.

Change suggested by @jbearer:
#1771 (comment)
Add `create_roll_call_response` to handle message serialization

Suggested by @jbearer:
#1771 (comment)

Replace dummy public key with actual public key

Suggested by @jbearer:
#1771 (comment)

Replace `Vec` of `JoinHandle` and `Drop` code with `TaskList`

Suggested by @jbearer:
#1771 (comment)
Based on feedback provided by @jbearer:
#1771 (comment)
#1771 (comment)

Instead of querying all of the environment variables separately, and dynamically after the
program has launched, we can take advantage of clap's ability to automatically populate the
desired data points from either command line arguments or environment variables.  As an added
benefit the fields can be grouped together into a single `Identity` structure.
Based on feedback discussion with @jbearer:
#1771 (comment)
#1771 (comment)

In the cases where we are no longer able to make meaningful progress, and in an effort to be
better about not failing silently and then having our data slowly stagnate over time, these
early task returns have been replaced with panics for improved failure indicators.
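
The difference between a silent early return and a loud failure can be sketched with standard-library channels. This is an illustration under assumptions: `forward_all` is a hypothetical task body, and the real tasks in this PR run on an async runtime rather than `std::sync::mpsc`.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Forwards every value from `receiver` to `sender`.
///
/// If the downstream channel is closed, the task can no longer make
/// progress. Instead of returning quietly and letting the data go stale,
/// it panics so the failure is visible.
pub fn forward_all(receiver: Receiver<u64>, sender: Sender<u64>) {
    for value in receiver {
        if sender.send(value).is_err() {
            panic!("downstream channel closed; unable to make progress");
        }
    }
}
```

One consequence of this design is that anything joining such a task observes the panic, so a test that closes the channels on purpose sees an error from the join rather than a clean exit.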
From discussion with @jbearer
#1771 (comment)

Based on the conversation with Jeb the default value can be utilized to populate the operating
system value from the environment directly.  While implementing this fix, it was also discovered
that the previous way of populating the Identity using `Default::default` actually prevented the
program from running at all due to bad initialization.

I haven't determined the exact reason, but by switching back to default_value population, we
can revert to using the `flatten` option, and the issue seems to be fixed as a result.
Discussion with @jbearer:
#1771 (comment)

Right now we don't have a meaningful definition of what an address for a Sequencer is.  As a
result it makes more sense to omit the wallet address field rather than to keep it in an effort
to create a forward compatible definition.

This commit removes the definition, population, and representation of the wallet address from
both the sequencer, and from the node-validator api.

This will require a change in the Node Validator UI, as it expects a wallet_address to be
present.
Add /v0/ to node-validator in docker-compose.yaml

Fix `depends_on` conditions in docker-compose.yaml
Ayiga added 3 commits August 2, 2024 12:22
The task joins were causing the tests to fail because the closed channels
result in a panic.  Since the panic is the desired behavior, the task joins
have been removed from the tests so that they do not cause failures.
@Ayiga Ayiga merged commit 8c01b24 into main Aug 5, 2024
16 checks passed
@Ayiga Ayiga deleted the ts/node-validator branch August 5, 2024 17:21
Development

Successfully merging this pull request may close these issues.

Add Node Validator API service for use by Node Validator Dashboard
3 participants