Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving sync parachain performance for workers #2888

Merged
merged 15 commits into from
Jul 23, 2024

Conversation

silva-fj
Copy link
Contributor

@silva-fj silva-fj commented Jul 11, 2024

This PR improves the performance for sync with parachains a bit. I've tested it with litentry rococo.

Here are the results:

Current:

[2024-07-11T15:03:34.444Z INFO  litentry_worker::parentchain_handler] [Litentry] Syncing blocks from 0 to 24043
[2024-07-11T15:03:34.444Z DEBUG litentry_worker::parentchain_handler] >>>>>>>>>>>>>>>>>>>> Starting batch sync <<<<<<<<<<<<<<<<<<<<
[2024-07-11T15:03:34.909Z INFO  litentry_worker::parentchain_handler] [Litentry] syncing parentchain to 24043. already synced until block 0. immediate import=true. est. remaining: 3h 6m 3s 526ms 747us 150ns
[2024-07-11T15:03:34.909Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1001 blocks in 464ms 264us 889ns
[2024-07-11T15:03:38.867Z DEBUG litentry_worker::parentchain_handler] >>>>> [Litentry] Synced parentchain batch in 3s 958ms 550us 433ns
[2024-07-11T15:03:38.867Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 1001 out of 24043 finalized parentchain blocks
[2024-07-11T15:03:39.316Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1001 blocks in 447ms 820us 558ns
[2024-07-11T15:03:46.684Z DEBUG litentry_worker::parentchain_handler] >>>>> [Litentry] Synced parentchain batch in 7s 368ms 798us 150ns
[2024-07-11T15:03:46.684Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 2002 out of 24043 finalized parentchain blocks
[2024-07-11T15:03:47.124Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1001 blocks in 439ms 177us 262ns
[2024-07-11T15:03:54.511Z DEBUG litentry_worker::parentchain_handler] >>>>> [Litentry] Synced parentchain batch in 7s 386ms 474us 448ns
[2024-07-11T15:03:54.511Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 3003 out of 24043 finalized parentchain blocks
[2024-07-11T15:03:54.959Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1001 blocks in 447ms 534us 184ns
[2024-07-11T15:04:02.491Z DEBUG litentry_worker::parentchain_handler] >>>>> [Litentry] Synced parentchain batch in 7s 532ms 744us 764ns
[2024-07-11T15:04:02.492Z DEBUG litentry_worker::parentchain_handler] [Litentry] Synced 4004 out of 24043 finalized parentchain blocks

New:

[2024-07-11T14:41:33.548Z INFO  litentry_worker::parentchain_handler] [Litentry] Syncing blocks from 0 to 23933
[2024-07-11T14:41:33.548Z DEBUG litentry_worker::parentchain_handler] >>>>>>>>>>>>>>>>>>>> Starting batch sync <<<<<<<<<<<<<<<<<<<<
[2024-07-11T14:41:33.635Z INFO  litentry_worker::parentchain_handler] [Litentry] syncing parentchain to 23933. already synced until block 0. immediate import=true. est. remaining: 34m 23s 468ms 389us 619ns
[2024-07-11T14:41:33.635Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1000 blocks in 86ms 182us 830ns
[2024-07-11T14:41:33.635Z DEBUG litentry_worker::parentchain_handler] [Litentry] Found 1000 block(s) to sync in this chunk. immediate import=true
[2024-07-11T14:41:37.682Z DEBUG litentry_worker::parentchain_handler] >>>> [Litentry] Synced parentchain batch in 4s 47ms 236us 657ns
[2024-07-11T14:41:37.682Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 1000 out of 23933 finalized parentchain blocks
[2024-07-11T14:41:37.765Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1000 blocks in 82ms 435us 816ns
[2024-07-11T14:41:37.765Z DEBUG litentry_worker::parentchain_handler] [Litentry] Found 1000 block(s) to sync in this chunk. immediate import=true
[2024-07-11T14:41:45.298Z DEBUG litentry_worker::parentchain_handler] >>>> [Litentry] Synced parentchain batch in 7s 533ms 92us 133ns
[2024-07-11T14:41:45.298Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 2000 out of 23933 finalized parentchain blocks
[2024-07-11T14:41:45.384Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1000 blocks in 85ms 803us 899ns
[2024-07-11T14:41:45.384Z DEBUG litentry_worker::parentchain_handler] [Litentry] Found 1000 block(s) to sync in this chunk. immediate import=true
[2024-07-11T14:41:52.704Z DEBUG litentry_worker::parentchain_handler] >>>> [Litentry] Synced parentchain batch in 7s 319ms 210us 969ns
[2024-07-11T14:41:52.704Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 3000 out of 23933 finalized parentchain blocks
[2024-07-11T14:41:52.789Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1000 blocks in 84ms 480us 849ns
[2024-07-11T14:41:52.789Z DEBUG litentry_worker::parentchain_handler] [Litentry] Found 1000 block(s) to sync in this chunk. immediate import=true
[2024-07-11T14:42:00.305Z DEBUG litentry_worker::parentchain_handler] >>>> [Litentry] Synced parentchain batch in 7s 516ms 377us 787ns
[2024-07-11T14:42:00.305Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 4000 out of 23933 finalized parentchain blocks
[2024-07-11T14:42:00.391Z DEBUG litentry_worker::parentchain_handler] [Litentry] Fetched 1000 blocks in 85ms 436us 539ns
[2024-07-11T14:42:00.391Z DEBUG litentry_worker::parentchain_handler] [Litentry] Found 1000 block(s) to sync in this chunk. immediate import=true
[2024-07-11T14:42:07.858Z DEBUG litentry_worker::parentchain_handler] >>>> [Litentry] Synced parentchain batch in 7s 467ms 196us 715ns
[2024-07-11T14:42:07.858Z INFO  litentry_worker::parentchain_handler] [Litentry] Synced 5000 out of 23933 finalized parentchain blocks

Notice how the time to fetch the blocks is significantly reduced by calling the api in parallel. There may be some more room for improvement but we need to measure first to be able to find the bottlenecks, which is not easy task on development when we don't have the same data as in prod.

In the future (once we migrate to the new sgx sdk), we could use the async version of the substrate-api-client. More context in the issue.

@silva-fj silva-fj requested a review from a team July 11, 2024 16:53
Copy link

linear bot commented Jul 11, 2024

@silva-fj
Copy link
Contributor Author

Running lit_test_failed_parentchain_extrinsic.sh works for me locally, I'm not sure what the problem is with CI

Using client binary ./../bin/litentry-cli
Using node uri ws://127.0.0.1:10044
Using trusted-worker uri ws://127.0.0.1:2000

CLIENT is: ./../bin/litentry-cli -p 10044 -P 2000 -u ws://127.0.0.1 -U ws://127.0.0.1
New Account created: 0xd266a81c0d51c9b0ece2f85daa3d659e4bde596d2fd42d555109730e70255f0f
Linking identity to Bob
Finished Linking identity to Bob
Capturing IDGraph Hash of Bob
Initial ID Graph Hash of Bob: 0x8cffcda6d69f27ba44bf82adf4b9a244b0c1560c2c057280f84c15ea381c2818
New Account created: 0x8c4f7ca222e5e30323979918435d7e77899130ee0923a2d30f76510580b13155
Linking new identity to Bob with Eve as delegate signer
Finished Linking identity to Bob
Capturing IDGraph Hash of Bob
Final ID Graph Hash of Bob: 0x8cffcda6d69f27ba44bf82adf4b9a244b0c1560c2c057280f84c15ea381c2818
Failed Parentchain extrinsic not executed

Copy link
Collaborator

@BillyWooo BillyWooo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
good to know about into_par_iter

@Kailai-Wang
Copy link
Collaborator

Running lit_test_failed_parentchain_extrinsic.sh works for me locally, I'm not sure what the problem is with CI

Using client binary ./../bin/litentry-cli
Using node uri ws://127.0.0.1:10044
Using trusted-worker uri ws://127.0.0.1:2000

CLIENT is: ./../bin/litentry-cli -p 10044 -P 2000 -u ws://127.0.0.1 -U ws://127.0.0.1
New Account created: 0xd266a81c0d51c9b0ece2f85daa3d659e4bde596d2fd42d555109730e70255f0f
Linking identity to Bob
Finished Linking identity to Bob
Capturing IDGraph Hash of Bob
Initial ID Graph Hash of Bob: 0x8cffcda6d69f27ba44bf82adf4b9a244b0c1560c2c057280f84c15ea381c2818
New Account created: 0x8c4f7ca222e5e30323979918435d7e77899130ee0923a2d30f76510580b13155
Linking new identity to Bob with Eve as delegate signer
Finished Linking identity to Bob
Capturing IDGraph Hash of Bob
Final ID Graph Hash of Bob: 0x8cffcda6d69f27ba44bf82adf4b9a244b0c1560c2c057280f84c15ea381c2818
Failed Parentchain extrinsic not executed

Hmm if it only fails in this PR while working elsewhere, we need to look into it

@silva-fj
Copy link
Contributor Author

Hmm if it only fails in this PR while working elsewhere, we need to look into it

I can't reproduce it on azure-dev-2, I'll add some logs to see if I can find something on CI

@silva-fj silva-fj force-pushed the p-859-sync-live-parachain-is-too-slow-for-workers branch 3 times, most recently from 17d76f2 to b049a3c Compare July 15, 2024 15:22
Copy link
Member

@felixfaisal felixfaisal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @silva-fj
So I'm not convinced with the use of rayon and parallelism to speed up the syncing and I don't think there is enough testing evidence to suggest otherwise, So a few points

  1. There is no control on the amount or level of parallelism, that is we have par_iter which probably takes chunks of 1000 blocks at a time, Does that mean we spawn a 1000 threads each time?
  2. If we don't have a millisecond latency in response times from the Node then we potentially have a 1000 threads idling which is eating up resources. (even if there is a 60s timeout)
  3. So in the CI, the indirect call execution takes more time than anticipated, I increased the sleep time and it passed - https://github.com/litentry/litentry-parachain/actions/runs/9943514750 which would indicate that OS is having a tough time scheduling threads
  4. Need more testing to show that other parts of the system such as vc-task and link-identity (via stf) are not having to queue up tasks due to non-availability of threads.

Can we control the number of threads used by rayon and limit it?

@silva-fj
Copy link
Contributor Author

Thanks @silva-fj So I'm not convinced with the use of rayon and parallelism to speed up the syncing and I don't think there is enough testing evidence to suggest otherwise, So a few points

  1. There is no control on the amount or level of parallelism, that is we have par_iter which probably takes chunks of 1000 blocks at a time, Does that mean we spawn a 1000 threads each time?
  2. If we don't have a millisecond latency in response times from the Node then we potentially have a 1000 threads idling which is eating up resources. (even if there is a 60s timeout)
  3. So in the CI, the indirect call execution takes more time than anticipated, I increased the sleep time and it passed - https://github.com/litentry/litentry-parachain/actions/runs/9943514750 which would indicate that OS is having a tough time scheduling threads
  4. Need more testing to show that other parts of the system such as vc-task and link-identity (via stf) are not having to queue up tasks due to non-availability of threads.

Can we control the number of threads used by rayon and limit it?

By default, Rayon uses the same number of threads as the number of CPUs available. On systems with hyperthreading enabled this equals the number of logical cores and not the physical ones.

Yes, we can control the number of threads used but also please note that this is used during initialisation, so I think it should be fine if it takes some resources. If this theory is correct, it probably just means that the resources available on CI are a bit limited which is understandable. The test runs fine on dev-servers, so lack of resources on CI could explain it and if that's the case I think we should probably wait until the initialisation is done (maybe calling the /is_initialized endpoint) before calling the cli commands in the test.

I'll try to test it a bit, if your theory is correct it would explain a lot, thank you for looking into it 👍🏼

So I'm not convinced with the use of rayon and parallelism to speed up the syncing and I don't think there is enough testing evidence to suggest otherwise

Can you elaborate on this? Ideally we should be using the async api to make these requests but unfortunately we can't use it yet. Nevertheless, calling these blocking functions in parallel does help to improve the sync a bit

@felixfaisal
Copy link
Member

Can you elaborate on this? Ideally we should be using the async api to make these requests but unfortunately we can't use it yet. Nevertheless, calling these blocking functions in parallel does help to improve the sync a bit

Yeah I would prefer async over multi-threading since these are mostly I/O bound tasks, but I guess multi-threading is the only option we currently have. Something else related to infrastructure is if the node can support so many requests in prod without terminating the WS connection but this is out of scope for this PR.

please note that this is used during initialisation, so I think it should be fine if it takes some resources

I missed out on this, then in that case, i guess it's fine to allocate as much as possible

I think we should probably wait until the initialisation is done (maybe calling the /is_initialized endpoint) before calling the cli commands in the test.

Sounds like a good idea

@silva-fj silva-fj force-pushed the p-859-sync-live-parachain-is-too-slow-for-workers branch from 754d506 to 5e690c0 Compare July 16, 2024 12:21
@silva-fj
Copy link
Contributor Author

I've spent quite some time debugging on CI and trying to make the script to work by calling the /is_initialized endpoint and running the tests after the initialization but I could not make it work (on CI), maybe I was not using the right port. So, for now I've increased the sleep time before requesting the id graphs to make the test pass

@silva-fj silva-fj requested a review from a team July 18, 2024 07:12
Copy link
Member

@felixfaisal felixfaisal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, Still a bit skeptical about how this would work in prod or with a live chain. Would be nice if we have someone else to take a look.

@Kailai-Wang
Copy link
Collaborator

I'm ok-ish with it, although I'm not sure how many CPUs will be available within an enclave.

Can you test it a bit with sdk-v2.0 branch before merging it?
I ask this because we have a lower number of TCSNum there

@silva-fj
Copy link
Contributor Author

I'm ok-ish with it, although I'm not sure how many CPUs will be available within an enclave.

Can you test it a bit with sdk-v2.0 branch before merging it? I ask this because we have a lower number of TCSNum there

There is no change in how the enclave api is being called, it's still 1 call at a time on each loop. What is different now is how we collect the data to pass to the enclave api on each loop

@silva-fj
Copy link
Contributor Author

Still a bit skeptical about how this would work in prod or with a live chain

If our production nodes can't handle this load, we have bigger problems that this 😆

@Kailai-Wang
Copy link
Collaborator

Cool, let's get it merged

@Kailai-Wang Kailai-Wang enabled auto-merge (squash) July 23, 2024 14:12
@Kailai-Wang Kailai-Wang merged commit f789aad into dev Jul 23, 2024
32 checks passed
@Kailai-Wang Kailai-Wang deleted the p-859-sync-live-parachain-is-too-slow-for-workers branch July 23, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants