-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Need better docs and examples encouraging the use of socketTimeout in AWS clients for V3. SDK should warn or error when invalid V2 style params are used. #6763
Comments
|
@kuhe Wow, thanks for mentioning this, we will update to requestTimeout. I was wondering why there were so many similar looking settings. Regarding the issue we encountered, in the afternoon of 12/16/2024, our lambda code which peforms a streaming read from S3, transform records, and streaming write to S3 using the @aws-sdk/lib-storage Upload to do a multi-part upload, started encountering RequestTimeouts in 30-40% of our executions. Looking at the logs, the stream would often make progress for several minutes and then just timeout. The time between the last successful S3 call till the one that failed was about 72s in many cases. In the previous several months, logs only showed it happening maybe 3 times, but now it was happening regularly 1000s of times every hour. This code has been in place for years and worked well even after upgrading to AWS SDK V3 earlier in the year. The problem got persistently worse failing up to 50% of the time on 12/17/2024. This problem persisted and we involved AWS support team (both lambda and S3 teams). After exhausting all ideas and suggestions, we enabled more logging and were able to zero in on the fact that it was a RequestTimeout that was the root cause. After realizing the timeout property had changed to socketTimeout (from V2 to V3), we set this to 15m and everything started succeeding properly. So it appears that for whatever reason, there is a default s3 client socketTimeout/requestTimeout that is less than 72s. If this was due to something else like a Node.js timeout then I don't believe setting socketTimeout would have resolved it. I will be happy to discuss further. |
Also note that another user also reported requestTimeout issue after they migrated to V3 (and did not have socketTimeout or requestTimeout set). So there appears to be some default timeout when it is not set. See #6762 |
We hit the RequestTimeout while using: |
Describe the issue
Recently we spent several days trying to debug mysterious RequestTimeout errors that started showing up in one of our AWS environments. This was occurring with the S3 client but after understanding the issue it could easily occur for any SDK client.
We had converted from V2 to V3 months ago and did not notice any problems till recently.
After close inspection, we had missed the conversion of the V2 httpOptions.timeout to the new requestTimeout.socketTimeout in V3. Although it is not specified in the docs, the default socketTimeout must be shorter than 72s since we were hitting it often at that interval.
So it should be made much more obvious that this is an important value to increase since the default is no longer sufficient and can just start failing.
Another PR was just created with the same problem, so I will also link it here. #6762 (I commented on the solution to help the user). I guess this evidence that more docs are needed.
I would expect that the typscript docs would explain the importance of the socketTimeout and what the default value is when unset. It might also provide the type of error one might expect if this is too low (RequestTimeout). It should be made clear that this value should be set since the current default is not sufficient (at least for S3). Currently the description for this item is completely blank.
I see that you recently updated the migrating.md notable-changes section which helps if people were coming from V2 to V3. However I think it may still need some improvement to explain the importance of the socketTimeout value and what may happen if not set.
I believe that the examples provided and especially the best practices docs or examples should include this value by default to encourage its use since you really should not run with in production with the default.
It also would have been really helpful to have warnings or error messages when we create an AWS client using the old V2 parameters like httpOptions.timeout. This would have prevented us from introducing this regression bug with our code. Currently there does not appear to be any indication that you are using the old values and that they will no longer do anything. I know it's difficult to check all settings, but the ones that have clearly moved, should be errored or at least warned about so the code can be corrected.
I don't know if the other API's are as sensitive to timeouts as S3 is, but I would suspect it would be good practice to increase the socketTimeout for all of them if the default remains so low. The main one that we hit was when using lib-storage to do a multi-part upload.
The code examples for the various commands also do not show any best practices for setting the options. Many of the examples don't even show the options object at all. So while maybe it is overkill to have these details in every example, maybe you should have a comment by the options object linking to a nice doc that would spell out all the important client settings that one should typically use.
I think it would be smart for best practices to show all the details one needs to make your code production ready.
Another thing that would have been helpful for us is that when using @aws-sdk/lib-storage to do a multi-part upload, the error generated by the Upload object is just a basic AbortError with no indication of why it was aborted. So it wasn't until we turned on AWS SDK client logging that we could find the real error the RequestTimeout. So you might mention in the docs for Upload object that AbortError could be caused by a RequestTimeout and that enabling logging for errors is a smart thing. I think this tip alone deserves it's own issue so I will add one for SDK logging too.
Links
The text was updated successfully, but these errors were encountered: