Fix large SFTP reads #638

Open
wants to merge 7 commits into
base: devel
Conversation

Jakuje
Contributor

@Jakuje Jakuje commented Aug 30, 2024

SUMMARY

Adds fixes and reproducers for large SFTP reads, which truncated the local file when the second chunk was written.

Fixes: #341
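
The symptom in the linked issue (only the last chunk survives) is consistent with reopening the local file in a truncating mode for every chunk. The sketch below illustrates that failure mode and the usual remedy. It is an assumption about the pattern, not a copy of the pylibssh code: `copy_truncating`, `copy_once`, and `demo` are hypothetical names, and an in-memory `io.BytesIO` stands in for the remote file.

```python
import io
import os
import tempfile

CHUNK = 1024  # illustrative chunk size


def copy_truncating(src, path, chunk=CHUNK):
    """Buggy pattern: opening with 'wb' per chunk discards earlier chunks."""
    while True:
        data = src.read(chunk)
        if not data:
            break
        with open(path, "wb") as f:  # truncates the file on every iteration
            f.write(data)


def copy_once(src, path, chunk=CHUNK):
    """Fixed pattern: open the destination once, then append each chunk."""
    with open(path, "wb") as f:
        while True:
            data = src.read(chunk)
            if not data:
                break
            f.write(data)


def demo():
    """Copy a 2500-byte payload both ways and return what ended up on disk."""
    payload = os.urandom(2500)  # spans three chunks: 1024 + 1024 + 452
    with tempfile.TemporaryDirectory() as tmp:
        bad_path = os.path.join(tmp, "bad.bin")
        good_path = os.path.join(tmp, "good.bin")
        copy_truncating(io.BytesIO(payload), bad_path)
        copy_once(io.BytesIO(payload), good_path)
        with open(bad_path, "rb") as f:
            bad = f.read()
        with open(good_path, "rb") as f:
            good = f.read()
    return payload, bad, good
```

With the truncating variant, only the final partial chunk remains in the file, while the single-open variant reproduces the whole payload.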

ISSUE TYPE
  • Bugfix Pull Request

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided label (There is a change note present in this PR) on Aug 30, 2024
@Jakuje Jakuje marked this pull request as ready for review August 30, 2024 11:23

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarCloud


Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/ansible-pylibssh-638
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@Jakuje Jakuje changed the title from "Sftp large" to "Fix large SFTP reads" on Aug 30, 2024
@Qalthos Qalthos self-requested a review September 9, 2024 12:53

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@kucharskim

This version works, but it is much slower compared to the SCP version.

$ time python3 test_sftp1.py
ssh.is_connected=1
<pylibsshext.sftp.SFTP object at 0xd2daf8eb910>
get
None
put
None
Closing sftp session...
Closing connection...
    1m19.09s real     0m00.16s user     0m00.21s system
$ time python3 test_scp1.py
ssh.is_connected=1
<pylibsshext.scp.SCP object at 0x7bec33ef820>
get
0
put
0
Closing connection...
    0m03.59s real     0m00.17s user     0m00.13s system

That is roughly 80 seconds for SFTP versus 4 seconds for SCP.

I've bumped all 1024 instances in sftp.pyx to 64 * 1024 like here (in put() and in get()):

-            buffer = f.read(1024)
+            buffer = f.read(1024 * 64)

and this improved time significantly:

$ time python3 test_sftp1.py
ssh.is_connected=1
<pylibsshext.sftp.SFTP object at 0x44fda57910>
get
None
put
None
Closing sftp session...
Closing connection...
    0m05.36s real     0m00.17s user     0m00.11s system
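
A rough way to see why the chunk size matters: with a blocking read loop, each chunk costs at least one request and reply, so the number of round trips scales inversely with the chunk size. The sketch below only counts requests. The 10 MiB file size is an assumed value (the size of the test file is not stated here), and `estimated_requests` is a hypothetical helper, not part of pylibssh.

```python
def estimated_requests(file_size, chunk_size):
    """Number of fixed-size read requests needed to cover file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division without floating point.
    return -(-file_size // chunk_size)


FILE_SIZE = 10 * 1024 * 1024  # assumed 10 MiB file, for illustration only

for chunk in (1024, 16 * 1024, 32 * 1024, 64 * 1024):
    print(f"{chunk:>6} B chunks -> {estimated_requests(FILE_SIZE, chunk)} requests")
```

Under these assumptions, moving from 1024-byte to 64 KiB chunks cuts the request count by a factor of 64, which is in the same direction as the speed-up measured above.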

@Jakuje
Contributor Author

Jakuje commented Nov 19, 2024

Yes, your observations are right. A buffer of 16k is recommended by the libssh documentation for reasonable performance:

https://api.libssh.org/master/libssh_tutor_sftp.html

I can change it as part of this PR or separately, but the PRs are starting to depend on each other, which makes the changes hard to manage.

@Jakuje
Contributor Author

Jakuje commented Nov 19, 2024

Using 64k chunks might be problematic, as the SSH specification suggests a maximum packet size. Most implementations work with larger packets, but when they do not, it causes hard-to-debug issues such as curl/curl#11804. I would go ahead with 32k, as libssh will split larger chunks to this size anyway.
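
To illustrate the splitting mentioned above, here is a minimal sketch of cutting a larger buffer into pieces of at most 32 KiB. It shows the idea only and is not libssh's internal code; `MAX_CHUNK` and `split_chunks` are hypothetical names.

```python
MAX_CHUNK = 32 * 1024  # 32 KiB, the chunk size proposed in this comment


def split_chunks(data, max_chunk=MAX_CHUNK):
    """Yield consecutive pieces of data, each at most max_chunk bytes long."""
    view = memoryview(data)  # avoids copying the whole buffer up front
    for offset in range(0, len(view), max_chunk):
        yield bytes(view[offset:offset + max_chunk])


# A 64 KiB + 10 byte buffer becomes two full pieces and a 10-byte remainder.
pieces = list(split_chunks(bytes(64 * 1024 + 10)))
print([len(p) for p in pieces])
```

If the library performs this split internally, a caller passing 64 KiB buffers would still produce 32 KiB requests on the wire, which is why choosing 32k directly loses little.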

@kucharskim
Copy link

Sure, whatever works. Just to make sftp faster.

@Jakuje
Contributor Author

Jakuje commented Nov 19, 2024

> Sure, whatever works. Just to make sftp faster.

If you want even better performance, you can try #641. It's still in draft as it requires libssh 0.11, and I am not sure how to also support older versions at the same time, but it should reach even higher speeds.

@Jakuje
Contributor Author

Jakuje commented Nov 19, 2024

Filed #664 for increasing the chunk size (on top of this PR).

Successfully merging this pull request may close these issues.

sftp.get downloads only the last chunk of file
2 participants