Upon restart, p2p connect-back or sync is very hard, and RPC usually times out #836
Comments
We have 3 or 4 nodes hitting this. Even when reimporting from almost brand-new snapshots, it is still hard to connect. |
I am encountering the same issues |
See #829. The handshake messages are harmless, but they do make it hard to see what the real issue is if anything. Can you provide your complete log? |
In my experience this is what logs can look like when nodeos is exceptionally slow applying blocks due to low memory or poor disk performance. Can you run |
Indeed, it is always listed with status D. However, I definitely have more than enough memory: it uses 2GB, and I have 64GB. I will check the disk; however, it's hard to believe that this is the root cause, since the node worked perfectly until it was upgraded to the new version. |
Yes, it's D here as well, and it even appears stuck. |
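The command requested above is not shown in the thread. As a stand-in, here is a minimal sketch of one way to sample a process's state on Linux, assuming `/proc` is available. On Linux, `D` is the uninterruptible sleep state, which usually means the process is waiting on I/O. The helper names (`parse_state`, `read_state`, `sample_states`) are hypothetical and are not part of nodeos.

```python
import time
from collections import Counter
from pathlib import Path


def parse_state(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state field.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid: int) -> str:
    """Read the current state letter (R, S, D, ...) of a process."""
    return parse_state(Path(f"/proc/{pid}/stat").read_text())


def sample_states(pid: int, samples: int = 10, interval_s: float = 1.0) -> Counter:
    """Sample the state several times and count how often each letter appears."""
    counts = Counter()
    for _ in range(samples):
        counts[read_state(pid)] += 1
        time.sleep(interval_s)
    return counts
```

If most samples for the nodeos PID come back as `D`, that is consistent with the process spending its time blocked on disk rather than on CPU, which would support checking disk performance first.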
Here are the last 100000 lines of logs from one of the stuck nodes. They should include the periods where I restarted it a few times and tried various parameter changes to see if they helped. |
Thanks for the log. Unfortunately you are hitting: #830 which will be fixed in
etc. Until 1.0.2, just remove all but one, two, or at most three good peers. |
Even with 1-3 peers, it's still pretty easy to hit the same issues on our side after a few attempts. For example, using a snapshot and single peer combination that worked previously (and did not show up in the error logs before), we may still encounter something like the below:
It turns out the import->catchup flow never finishes, and if the node is turned off at this stage, it gets corrupted and the import flow has to be rerun. |
Our nodes have the same problem; they have become very unstable since the upgrade to Spring. |
Please DM me complete logs and config.ini. Would like to try and reproduce. |
we are having a similar issue. Nodes are struggling to process blocks, RPC API stops responding. Snapshots take a long time to process and, once processed, the node doesn't start syncing up blocks. We have noticed a huge increase in data per block on Friday and again today. What is the recommended configuration for running a node that can keep up? Looking for advice on modifications to config.ini like number of net/http threads etc... |
this corruption seems to happen because the process is so busy that it doesn't catch SIGTERM so it can't finish gracefully |
It will respond to SIGTERM, it just takes a while. When I tried stopping my syncing node it took 9 minutes to respond to SIGTERM. |
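Since shutdown can take many minutes, a stop script that escalates to SIGKILL after a short wait risks the corruption described above. Below is a minimal sketch of a patient stop, assuming a POSIX system: it sends SIGTERM once and then only waits. The function names and the default 30-minute timeout are assumptions for illustration, not values recommended in this thread.

```python
import os
import signal
import time


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID is still present."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_gracefully(pid: int, timeout_s: float = 1800.0, poll_s: float = 5.0) -> bool:
    """Send SIGTERM once, then wait. Never escalates to SIGKILL.

    Returns True if the process exited within timeout_s, False otherwise.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return True
        time.sleep(poll_s)
    return not process_exists(pid)
```

A caller would pass the nodeos PID, for example `stop_gracefully(nodeos_pid)`, and treat a `False` result as "still shutting down, keep waiting" rather than as a reason to force-kill.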
@heifner could you share a config.ini for a high-performing (e.g. z1d.6xlarge) RPC node? |
If not also being used as a P2P relay node for transactions: Since you have plenty of RAM you can run with: You also have plenty of cores, so: Keep your peer list small, just a small number of reliable peers. If you are not processing read-only-transactions and not running a public node: If not running a public node, and not using: |
We examined The RPC thread timeout may still occur after importing multiple days' data. We haven't fully examined catching up from an older snapshot, which I need to do to eliminate a data gap, but that is a much more minor issue.
I wonder if this graceful-stop time could be improved (thread priority/race?). In the old version, even on a moderately configured machine, it needed only 1 or at most 2 minutes, as I recall. |
Yes, we have a plan to fix it: #284 |
Fixed by #284 and released in 1.0.3. |
v1.0.1
The nodes were powerful enough to stay in sync before the reboot, but after the reboot everything seems to have changed.
It is hard to connect; the node keeps doing handshakes
for hours
(the p2p port is already open to the world)