-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomic UrlOperations.upload for ssh- and file-URLs #649
Conversation
0c248eb
to
8c7fc58
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #649 +/- ##
==========================================
+ Coverage 89.23% 92.47% +3.24%
==========================================
Files 195 195
Lines 14272 14301 +29
Branches 2164 2162 -2
==========================================
+ Hits 12735 13225 +490
+ Misses 1258 812 -446
+ Partials 279 264 -15 ☔ View full report in Codecov by Sentry. |
8c7fc58
to
407abfb
Compare
20c22ac
to
d0e28c1
Compare
d0e28c1
to
0cbdbbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this PR together. I hope we can get this done quickly now. Given the age of this PR, I'd like to ask for the first and the last commit of this series to be posted as two individual PRs. They are largely unrelated to the actual topic, valid and complete fixes that should not be hanging here. Thanks!
The main diff of the PR changes the URL operations upload from a straight copy to a copy-to-tmp-and-final-mv implementation. This is good IMHO.
However, the diff introduces two random number reinitializations. We cannot do this, and I do not understand why it would be needed either. Please add rationale on the necessity, and add the protective scaffolding (getstate(), setstate()) to limit the impact of the reinitialization to only this very function and without impairing the ability of any Python script importing DataLad to run reproducibly.
Thanks!
This commit adds atomic upload to `url_operations.ssh`. It uses `datalad_next.shell` to implement all ssh-operations. The commit also adds an explicit test for `url_operations.ssh.stat`.
This commit adds an atomic upload for files. This can prevent errors where two threads are writing to the same file concurrently. The commit also fixes type errors in `url_operations.file`.
This commit adds a description of the atomicity properties of upload to the doc-string of the method `url_operations.base.UrlOperations.upload`.
654e5b8
to
f07a613
Compare
I removed the reinitialization of random numbers and added a time-stamp modifier to the nonce-values. That should prevent name collisions, even if the random state is identically on calls to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new implementation adds a longer suffix (38chars) (like .transfer-8267388836_17682337760925293
that include a random number and a timestamp to make collisions unlikely. This is not cheap on filesystems with path length limits, but also an implementation detail that can be addressed, if and when there is a real need.
Thanks @christian-monch ! |
This PR addresses issue datalad-ria #102 and touches on issue #454 where atomic upload operations are discussed as well.
The PR also adds atomic upload to files. This can prevent data loss in the case where multiple threads write to the same file concurrently.
All ssh- and file-uploads should be atomic, which means an
atomic
-parameter that controls atomicity on a per-call basis is not necessary.