Problem description
Expected behaviour: uploads are not committed to S3 when the process exits.
Actual behaviour: uploads are committed when the process exits.
Steps/code to reproduce the problem
It seems that the __del__ finalizer is being called on two objects when they are garbage collected, e.g. when the process exits: on io.BufferedIOBase, which MultipartWriter extends, and which calls close on the writer directly; and on GzipFile, which in turn calls close on the MultipartWriter because GzipFile's close method is monkeypatched.
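A minimal, S3-free sketch of both paths, under stated assumptions: FakeMultipartWriter and ClosingGzipFile are hypothetical stand-ins, not smart_open classes, and ClosingGzipFile imitates the monkeypatched close() with a subclass rather than reproducing smart_open's actual patch. In both cases the "commit" is recorded without user code ever calling close().

```python
import gc
import gzip
import io


class FakeMultipartWriter(io.BufferedIOBase):
    """Hypothetical stand-in for MultipartWriter: close() is where a real
    writer would complete (commit) the multipart upload."""

    def __init__(self, log):
        super().__init__()
        self.log = log
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.data += b
        return len(b)

    def close(self):
        if not self.closed:
            self.log.append(("commit", bytes(self.data)))
        super().close()  # io.IOBase.close() marks the stream as closed


class ClosingGzipFile(gzip.GzipFile):
    """Hypothetical: closing the gzip stream also closes the wrapped file
    object, imitating the monkeypatched close() described above."""

    def close(self):
        inner = self.fileobj
        try:
            super().close()  # normally flushes compressed data and writes the trailer
        finally:
            if inner is not None:
                inner.close()


# Path 1: the writer itself is garbage collected mid-upload.
direct_log = []
writer = FakeMultipartWriter(direct_log)
writer.write(b"first part of the data")
del writer    # last reference dropped: the inherited IOBase finalizer calls close()
gc.collect()  # also covers the case where the object sits in a reference cycle

# Path 2: a GzipFile wrapping the writer is garbage collected.
gzip_log = []
inner = FakeMultipartWriter(gzip_log)  # kept alive, so only the gzip path can close it
gz = ClosingGzipFile(fileobj=inner, mode="wb")
gz.write(b"half of the records")
del gz
gc.collect()

print(direct_log)
print(gzip_log[0][0])
```

The bytes recorded at "commit" are whatever had been written when the finalizer ran, which is how partial data can end up as a completed S3 object.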
This does not seem to be the intended behaviour for multipart uploads. It is not desirable to complete a multipart upload midway through when the process exits for some unrelated reason, e.g. when an EC2 spot instance is terminated. It has led to us seeing incomplete data and corrupted gzip uploads written to S3 whenever other issues cause the process to exit.
For other uploads this behaviour is fine: it means the files are closed as expected, and those writes aren't intended to be atomic anyway.
Ideally, close should never be called on MultipartWriter unless a context manager exits without an exception or user code calls close explicitly.
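The proposed semantics could look roughly like the sketch below. ProposedWriter and its abort() method are hypothetical names, not smart_open's API; a real writer would presumably call S3's CompleteMultipartUpload on commit and AbortMultipartUpload on abort, whereas here both are only recorded in a log. Overriding __del__ replaces the finalizer inherited from io.IOBase, so an abandoned writer aborts instead of committing.

```python
import gc
import io


class ProposedWriter(io.BufferedIOBase):
    """Hypothetical writer that commits only on an explicit close() or a
    clean exit from a with-block; every other ending aborts the upload."""

    def __init__(self, log):
        super().__init__()
        self.log = log

    def writable(self):
        return True

    def write(self, b):
        return len(b)  # a real writer would buffer and upload parts here

    def close(self):
        """Explicit close: complete (commit) the upload."""
        if not self.closed:
            self.log.append("commit")
        super().close()

    def abort(self):
        """Discard the upload instead of completing it."""
        if not self.closed:
            self.log.append("abort")
        super().close()  # io.IOBase.close() only marks the stream closed

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.close()
        else:
            self.abort()
        return False  # never swallow the exception

    def __del__(self):
        # Replaces the inherited finalizer, which would otherwise call close().
        self.abort()


clean_log, failed_log, abandoned_log = [], [], []

with ProposedWriter(clean_log) as w:
    w.write(b"all the data")

try:
    with ProposedWriter(failed_log) as w:
        w.write(b"some of the data")
        raise RuntimeError("process is shutting down")
except RuntimeError:
    pass

w = ProposedWriter(abandoned_log)
w.write(b"some of the data")
del w  # garbage collected without close(): aborted, not committed
gc.collect()

print(clean_log, failed_log, abandoned_log)
```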
We tried to work around this in our own code by monkeypatching the __del__ method away on both the gzip and the smart_open S3 file objects. That had no effect, apparently because Python treats dunder methods specially, and the only thing that worked was patching the method on the gzip.GzipFile class itself, which is undesirable. There may be a way around this, though.
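The special behaviour is likely Python's implicit special-method lookup: the interpreter looks __del__ up on the object's type, not in the instance __dict__, so assigning or deleting it on one object changes nothing. The sketch below shows this with hypothetical plain classes (Writer and NoCommitOnGC are illustrative names), plus one per-object alternative to patching gzip.GzipFile globally: switching a single object's __class__ to a subclass that overrides __del__. Whether that swap is safe for gzip's or smart_open's objects is an untested assumption.

```python
import gc


class Writer:
    """Hypothetical class whose finalizer stands in for 'commit on GC'."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("class __del__ ran (would commit)")


class NoCommitOnGC(Writer):
    """Per-object escape hatch: a subclass whose finalizer skips the commit."""

    def __del__(self):
        self.log.append("subclass __del__ ran (no commit)")


# Attempt 1: patch __del__ on the instance. Ignored, because the finalizer
# is looked up on type(obj), not in the instance __dict__.
log1 = []
w = Writer(log1)
w.__del__ = lambda: log1.append("instance __del__ ran")
del w
gc.collect()

# Attempt 2: swap this one object's class instead of patching Writer itself.
log2 = []
w = Writer(log2)
w.__class__ = NoCommitOnGC
del w
gc.collect()

print(log1)
print(log2)
```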