Improving sync parachain performance for workers #2888
Conversation
LGTM
Good to know about `into_par_iter`.
Hmm, if it only fails in this PR while working elsewhere, we need to look into it.
I can't reproduce it on …
Force-pushed from 17d76f2 to b049a3c.
Thanks @silva-fj
I'm not convinced by the use of rayon and parallelism to speed up the syncing, and I don't think there is enough testing evidence to suggest otherwise. A few points:
- There is no control on the amount or level of parallelism; we have `par_iter`, which probably takes chunks of 1000 blocks at a time. Does that mean we spawn 1000 threads each time?
- If we don't have millisecond latency in response times from the node, then we potentially have 1000 threads idling, which eats up resources (even with the 60s timeout).
- In CI, the indirect call execution takes more time than anticipated; I increased the sleep time and it passed (https://github.com/litentry/litentry-parachain/actions/runs/9943514750), which would indicate that the OS is having a tough time scheduling threads.
- We need more testing to show that other parts of the system, such as vc-task and link-identity (via stf), are not having to queue up tasks due to non-availability of threads.
Can we control the number of threads used by rayon and limit it?
By default, Rayon uses the same number of threads as the number of CPUs available. On systems with hyperthreading enabled this equals the number of logical cores, not the physical ones. Yes, we can control the number of threads used, but please note that this is used during initialisation, so I think it should be fine if it takes some resources. If this theory is correct, it probably just means that the resources available on CI are a bit limited, which is understandable. The test runs fine on dev servers, so lack of resources on CI could explain it, and if that's the case I think we should probably wait until the initialisation is done (maybe calling the …).

I'll try to test it a bit. If your theory is correct it would explain a lot, thank you for looking into it 👍🏼
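As a side note on the thread-count question above, here is a minimal sketch of how the global pool size could be capped. The builder call is standard Rayon; the value 4 is just an illustrative number, not what the worker uses:

```rust
use rayon::ThreadPoolBuilder;

fn init_rayon_pool() {
    // Cap the global Rayon pool instead of defaulting to one thread per logical core.
    // build_global() errors if a global pool already exists; ignored here for brevity.
    let _ = ThreadPoolBuilder::new()
        .num_threads(4) // illustrative value; would need tuning for CI / enclave hosts
        .build_global();
}
```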
Can you elaborate on this?

Ideally we should be using the async api to make these requests, but unfortunately we can't use it yet. Nevertheless, calling these blocking functions in parallel does help to improve the sync a bit.
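To make the pattern concrete, a minimal sketch of fanning blocking calls out over Rayon's pool; `fetch_block`, `SignedBlock` and the error type are placeholders, not the worker's actual API:

```rust
use rayon::prelude::*;

// Placeholders standing in for the real RPC types and calls.
type BlockNumber = u32;
#[derive(Debug)]
struct SignedBlock(BlockNumber);

fn fetch_block(n: BlockNumber) -> Result<SignedBlock, String> {
    // Imagine a blocking substrate-api-client request here.
    Ok(SignedBlock(n))
}

fn fetch_blocks(range: std::ops::RangeInclusive<BlockNumber>) -> Result<Vec<SignedBlock>, String> {
    // into_par_iter() spreads the blocking calls over Rayon's fixed-size worker
    // pool; it does not spawn one thread per block.
    range.into_par_iter().map(fetch_block).collect()
}
```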
Yeah, I would prefer async over multi-threading since these are mostly I/O-bound tasks, but I guess multi-threading is the only option we currently have. Something else, related to infrastructure, is whether the node can support so many requests in prod without terminating the WS connection, but that is out of scope for this PR.
I missed that; in that case, I guess it's fine to allocate as much as possible.
Sounds like a good idea.
Force-pushed from 754d506 to 5e690c0.
I've spent quite some time debugging on CI and trying to make the script work by calling the …
It looks good. I'm still a bit skeptical about how this would work in prod or with a live chain. It would be nice to have someone else take a look.
I'm ok-ish with it, although I'm not sure how many CPUs will be available within an enclave. Can you test it a bit with the sdk-v2.0 branch before merging it?
There is no change in how the enclave api is being called; it's still one call at a time on each loop. What is different now is how we collect the data to pass to the enclave api on each loop.
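Roughly, the shape is: fetch a chunk of blocks concurrently, then hand the whole chunk to the enclave in a single call, as before. A sketch under placeholder names (`Enclave`, `fetch_blocks`, and `CHUNK_SIZE` are illustrative, not the worker's real identifiers):

```rust
struct SignedBlock;
struct Enclave;

impl Enclave {
    fn sync_blocks(&self, _blocks: Vec<SignedBlock>) -> Result<(), String> {
        Ok(()) // stand-in for the real enclave API call
    }
}

// Stubbed here; in the PR, this is the part that fetches blocks in parallel.
fn fetch_blocks(range: std::ops::RangeInclusive<u32>) -> Result<Vec<SignedBlock>, String> {
    Ok(range.map(|_| SignedBlock).collect())
}

fn sync(enclave: &Enclave, mut last_synced: u32, target: u32) -> Result<(), String> {
    const CHUNK_SIZE: u32 = 1000; // illustrative chunk size
    while last_synced < target {
        let next = last_synced + 1;
        let end = (last_synced + CHUNK_SIZE).min(target);
        // Only the data collection is concurrent...
        let blocks = fetch_blocks(next..=end)?;
        // ...the enclave is still driven with one call per loop iteration.
        enclave.sync_blocks(blocks)?;
        last_synced = end;
    }
    Ok(())
}
```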
If our production nodes can't handle this load, we have bigger problems than this 😆
Cool, let's get it merged.
This PR improves the performance of syncing with parachains a bit. I've tested it with Litentry Rococo.
Here are the results:
Current:
New:
Notice how the time to fetch the blocks is significantly reduced by calling the api in parallel. There may be some more room for improvement, but we need to measure first to be able to find the bottlenecks, which is not an easy task in development when we don't have the same data as in prod.
In the future (once we migrate to the new sgx sdk), we could use the async version of the substrate-api-client. More context in the issue.
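Not part of this PR, but as a rough idea of that follow-up: with an async client the requests could be issued concurrently on a single runtime instead of parking OS threads. This sketch uses a hypothetical `get_block_async`, not the actual substrate-api-client interface:

```rust
use futures::future::try_join_all;

// Hypothetical async fetcher standing in for an async substrate-api-client call.
async fn get_block_async(n: u32) -> Result<Vec<u8>, String> {
    Ok(n.to_le_bytes().to_vec()) // placeholder payload
}

// All requests are in flight at once; no threads sit idle waiting on I/O,
// which is the main appeal over the Rayon approach for I/O-bound work.
async fn fetch_blocks(range: std::ops::RangeInclusive<u32>) -> Result<Vec<Vec<u8>>, String> {
    try_join_all(range.map(get_block_async)).await
}
```

In practice the concurrency would likely need to be bounded (for example with buffered streams), echoing the thread-count concern discussed above.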