[v1.27.0] Refactor: IPFS download #480

Merged
merged 16 commits into from
Dec 27, 2022

Conversation

Karrenbelt

@Karrenbelt Karrenbelt commented Dec 9, 2022

Proposed changes

Refactor and clean up IPFSTool.download.
Partially addresses #501, which may be closed once this PR is accepted.

Fixes

If it fixes a bug or resolves a feature request, be sure to link to that issue.

Types of changes

What types of changes does your code introduce to agents-aea?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING doc
  • I am making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Lint and unit tests pass locally with my changes and CI passes too
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that code coverage does not decrease.
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@Karrenbelt Karrenbelt force-pushed the fix/ipfs_download_retries branch from e884054 to 91284fa Compare December 9, 2022 21:56
@codecov-commenter

codecov-commenter commented Dec 10, 2022

Codecov Report

Base: 90.30% // Head: 90.30% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (017afa2) compared to base (67ac9a1).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@                   Coverage Diff                   @@
##           tests/ipfs_download     #480      +/-   ##
=======================================================
- Coverage                90.30%   90.30%   -0.01%     
=======================================================
  Files                      352      352              
  Lines                    28702    28695       -7     
=======================================================
- Hits                     25920    25912       -8     
- Misses                    2782     2783       +1     
Flag Coverage Δ
unittests 90.30% <100.00%> (-0.01%) ⬇️


Impacted Files Coverage Δ
plugins/aea-cli-ipfs/aea_cli_ipfs/ipfs_utils.py 100.00% <100.00%> (ø)
aea/package_manager/v1.py 99.35% <0.00%> (-0.65%) ⬇️


@Karrenbelt Karrenbelt marked this pull request as draft December 12, 2022 17:14
Karrenbelt added a commit that referenced this pull request Dec 14, 2022
@Karrenbelt Karrenbelt marked this pull request as ready for review December 14, 2022 22:59
@Karrenbelt Karrenbelt force-pushed the fix/ipfs_download_retries branch from 724d2ce to a9f6b4c Compare December 17, 2022 17:09
@Karrenbelt Karrenbelt changed the base branch from main to tests/ipfs_download_retries December 17, 2022 17:09
@Karrenbelt Karrenbelt changed the title Fix: IPFS download retries Refactor: IPFS download retries Dec 17, 2022
@DavidMinarsch DavidMinarsch changed the title Refactor: IPFS download retries [v1.27.0] Refactor: IPFS download retries Dec 23, 2022
DavidMinarsch
DavidMinarsch previously approved these changes Dec 23, 2022
@Karrenbelt
Author

This will need to wait till the morning before it's ready. When 489 is in I'll sync this one up.

@DavidMinarsch DavidMinarsch changed the base branch from tests/ipfs_download_retries to main December 23, 2022 21:38
@DavidMinarsch DavidMinarsch dismissed their stale review December 23, 2022 21:38

The base branch was changed.

@Karrenbelt Karrenbelt mentioned this pull request Dec 24, 2022
10 tasks
@Karrenbelt Karrenbelt changed the base branch from main to tests/ipfs_download December 24, 2022 14:46
@Karrenbelt Karrenbelt changed the title [v1.27.0] Refactor: IPFS download retries [v1.27.0] Refactor: IPFS download Dec 24, 2022
Comment on lines +334 to +343
# else it is a directory containing a single package path
package_path = download_path
if fix_path:
    # assumption is it contains one nested directory: the package
    paths = list(download_path.glob("*"))
    assumption_is_valid = len(paths) == 1 and paths[0].is_dir()
    if not assumption_is_valid:  # pragma: no cover
        error_msg = f"Expected a single directory, found: {paths}"
        raise DownloadError(error_msg)
    package_path = paths.pop()
Author

@Karrenbelt Karrenbelt Dec 24, 2022

Added an explicit check for the apparent assumption. It relates to this original piece of code:

package_path = None
if os.path.isdir(downloaded_path):
    downloaded_files = os.listdir(downloaded_path)
    if len(downloaded_files) > 0:
        package_name, *_ = os.listdir(downloaded_path)
        package_path = str(Path(target_dir) / package_name)
if package_path is None:
    package_path = target_dir

I refactored and introduced move_to_target_dir to encapsulate this logic.
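
As a rough illustration of the logic described here, the following is a minimal standalone sketch, not the code merged in this PR. The signature of `move_to_target_dir`, the local `DownloadError` stand-in, and the use of `shutil.move` are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


class DownloadError(Exception):
    """Stand-in for the project's DownloadError (assumed for this sketch)."""


def move_to_target_dir(download_path: Path, target_dir: Path, fix_path: bool = True) -> Path:
    """Move downloaded content into target_dir and return its final path."""
    package_path = download_path
    if download_path.is_dir() and fix_path:
        # The assumption is made explicit: exactly one nested directory, the package.
        paths = list(download_path.iterdir())
        if len(paths) != 1 or not paths[0].is_dir():
            raise DownloadError(f"Expected a single directory, found: {paths}")
        package_path = paths[0]
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / package_path.name
    # shutil.move is used here (an assumption) because it also works across file systems.
    shutil.move(str(package_path), str(destination))
    return destination
```

With a download directory `<tmp>/<hash>/my_package`, the call returns `<target_dir>/my_package`; a download directory holding anything other than one nested directory raises `DownloadError`.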

Collaborator

Nice refactor.

But I don't like the mix of functionality in the download method.

I think we need a quite simple download-with-retry method, and some high-level method that uses the simple download and applies all the package-related things.

The download method should know nothing about packages, files, or how to handle the downloaded data.
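
One way to picture the suggested split is sketched below. This is only an illustration under assumptions: the names `download` and `download_package`, and the injected `fetch` callable (which is assumed to write content under `dest_dir / hash_id`), are hypothetical and not part of IPFSTool.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signature: fetch(hash_id, dest_dir) writes content under dest_dir / hash_id.
Fetch = Callable[[str, Path], None]


def download(fetch: Fetch, hash_id: str, dest_dir: Path) -> Path:
    """Low-level step: fetch raw content; knows nothing about packages."""
    fetch(hash_id, dest_dir)
    return dest_dir / hash_id


def download_package(fetch: Fetch, hash_id: str, target_dir: Path) -> Path:
    """High-level step: download, then apply the package-specific layout rules."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        raw_path = download(fetch, hash_id, Path(tmp_dir))
        # Unpacking raises ValueError unless there is exactly one nested directory.
        (package_dir,) = [p for p in raw_path.iterdir() if p.is_dir()]
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / package_dir.name
        shutil.move(str(package_dir), str(destination))
    return destination
```

Keeping the package handling in the outer function means the inner one could later gain retries without touching any file-layout code.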

Author

@Karrenbelt Karrenbelt Dec 26, 2022

Agreed (as per #501); it's mostly that I try to limit the number of changes in a single PR. Now, with the nested function, it should be easier to achieve in a future refactor. The issue with the refactor is that fix_path (which is ALWAYS true) would need to be removed from the function arguments, which would constitute a breaking change.

We could add a DeprecationWarning for version 2.0 for anyone who passes the kwarg. To do this, we would change the default to None (so that we can detect anyone passing a boolean explicitly) and raise the warning whenever the value is not None.
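
The sentinel-default idea could look roughly like this. It is a sketch only: the function signature, the warning text, and the placeholder return value are assumptions, and the real download logic is omitted.

```python
import warnings
from typing import Optional


def download(hash_id: str, target_dir: str, fix_path: Optional[bool] = None) -> str:
    """Hypothetical signature; only the fix_path handling matters here."""
    if fix_path is not None:
        # Any explicit value, True or False, means the caller still passes the kwarg.
        warnings.warn(
            "`fix_path` is deprecated and is planned for removal in version 2.0",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this function
        )
    # Placeholder for the real download logic (omitted in this sketch).
    return f"{target_dir}/{hash_id}"
```

Callers that never pass `fix_path` see no warning, so the default behaviour stays unchanged until the argument is actually removed.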

Comment on lines -329 to -330
if os.path.exists(os.path.join(target_dir, hash_id)):  # pragma: nocover
    raise DownloadError(f"{hash_id} was already downloaded to {target_dir}")
Author

No longer raising here. ipfshttpclient.Client.get also overwrites if the path already exists: if your local copy is different, your local copy is wrong.

Furthermore, if we fix_path and effectively move the content up one directory, this check becomes useless.


if os.path.exists(os.path.join(target_dir, hash_id)):  # pragma: nocover
    raise DownloadError(f"{hash_id} was already downloaded to {target_dir}")
package_path.rename(target_dir / package_path.name)
Author

Renaming a file or directory to one that already exists will throw an error on Windows; on Unix it raises only in some cases, such as a non-empty target directory.
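
Whether `Path.rename` raises depends on the platform and on what already exists at the destination: per the Python documentation, Windows always raises `FileExistsError`, while Unix silently replaces an existing file or an empty directory. If a uniform failure is wanted, an explicit check is one option. The helper name `rename_strict` below is hypothetical and not part of this PR.

```python
from pathlib import Path


def rename_strict(source: Path, destination: Path) -> Path:
    """Rename source to destination, refusing to overwrite anything (hypothetical helper)."""
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    source.rename(destination)
    return destination
```

The check and the rename are two separate steps, so another process could still create the destination in between; the sketch only makes the common case behave the same on every platform.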

Comment on lines -241 to -248
def test_ipfs_download_target_path_exists(self) -> None:
    """Test aea ipfs download target_path exists."""

    Path(self.target_dir, self.some_ipfs_hash).mkdir(parents=True)
    expected = f"{self.some_ipfs_hash} was already downloaded"
    with pytest.raises(click.ClickException, match=expected):
        self.run_cli(*self.args, catch_exceptions=False, standalone_mode=False)

Author

This test is no longer relevant.

Comment on lines -237 to -240

with pytest.raises(DownloadError, match="was already downloaded to"):
    os.mkdir(Path(tmp_dir) / "some")
    ipfs_tool.download("some", tmp_dir, attempts=5)
Author

and neither is this one

@Karrenbelt Karrenbelt force-pushed the fix/ipfs_download_retries branch from 13e12ef to 017afa2 Compare December 24, 2022 15:05
@Karrenbelt
Author

Karrenbelt commented Dec 24, 2022

failures

  • windows 3.7
    FAILED tests/test_cli/test_get_multiaddress.py::TestGetMultiAddressCommandConnectionNegative::test_run[fake-password]
    FAILED tests/test_helpers/test_dependency_tree.py::test_generation_of_dependency_tree_of_project
    FAILED tests/test_cli/test_test.py::TestVendorPackageTestByType::test_run[protocol]
    FAILED tests/test_cli/test_test.py::TestVendorPackageTestByType::test_run[connection]
    FAILED tests/test_cli/test_test.py::TestVendorPackageTestByType::test_run[skill]
    FAILED tests/test_cli/test_test.py::TestVendorPackageTestByType::test_run[contract]
    FAILED tests/test_cli/test_test.py::TestPackageTestByType::test_run[protocol]
    FAILED tests/test_cli/test_test.py::TestPackageTestByType::test_run[connection]
    FAILED tests/test_cli/test_test.py::TestPackageTestByType::test_run[skill] - ...
    FAILED tests/test_cli/test_test.py::TestPackageTestByType::test_run[contract]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathWithCov::test_run[connection]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathWithCov::test_run[protocol]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathWithCov::test_run[skill]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathWithCov::test_run[contract]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathEmptyTestSuite::test_run[protocol]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathEmptyTestSuite::test_run[contract]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathEmptyTestSuite::test_run[skill]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPathEmptyTestSuite::test_run[connection]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPath::test_run[protocol]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPath::test_run[contract]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPath::test_run[connection]
    FAILED tests/test_cli/test_test.py::TestPackageTestByPath::test_run[skill] - ...
    FAILED tests/test_aea_builder.py::TestExtraDeps::test_install_dependency - ae...
    FAILED tests/test_cli/test_generate_all_protocols.py::TestParentAsRootDir::test_root_dir_dont_match
    FAILED tests/test_cli/test_generate_all_protocols.py::TestParentAsRootDir::test_root_dir_parent
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_no_bump
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_check_clean_pass
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_check_bump
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_check_clean_fail
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_no_bump_failure
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_check_bump_fail
    FAILED tests/test_cli/test_generate_all_protocols.py::TestGenerateAllProtcols::test_run
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_get_dir_hash[1-1-bafybeies5n7nngqpzwzhety4ndktmzfjgwr5pk2ge3wbftqjwymhlzrc7e]
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_depth_0
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_depth_multi
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_hash_bytes
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_get_dir_hash[0-0-QmTGBxU5aqqpeiihQxcWr4xynhqWt23R73Btss2j8r9XcC]
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_get_dir_hash[1-0-QmYEAWM6jmjffDQNiyVjVWFcx7SRvutnoRuEP6AJKN9apY]
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_depth_1
    ERROR tests/test_helpers/test_ipfs/test_base.py::TestDirectoryHashing::test_get_dir_hash[0-1-bafybeicjexomh6l2rb3efmzojmsx2p2gynjzg3eztf4quu6zyepmnisn4e]
    = 32 failed, 2521 passed, 51 skipped, 51 deselected, 1 warning, 8 errors, 2 rerun in 5512.57s (1:31:52) =
    
    one observes 3221225477, and hex(3221225477) == "0xc0000005" (access violation)
  • windows 3.8 - rare error (see here)
              finally:
     >           runner.stop()
     ...
     self = <ProactorEventLoop running=False closed=False debug=False>
    
         def run_forever(self):
             try:
     >           assert self._self_reading_future is None
     E           AssertionError
     ...
     FAILED tests/test_launcher.py::TestAsyncLauncherMode::test_start_stop
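
The number quoted for the Windows 3.7 run can be decoded with a few lines of Python. This sketch assumes 3221225477 is a process exit code carrying an NTSTATUS value; the helper name and the two-entry lookup table are illustrative only, and 0xC0000005 is the code Windows defines as STATUS_ACCESS_VIOLATION.

```python
# A few NTSTATUS values that can show up as Windows process exit codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code: int) -> str:
    """Return the hex form of an exit code plus its NTSTATUS name, if known."""
    unsigned = code & 0xFFFFFFFF  # some tools report the code as a signed 32-bit value
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"{unsigned:#010x} ({name})"


print(describe_exit_code(3221225477))  # prints "0xc0000005 (STATUS_ACCESS_VIOLATION)"
```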
    

@Karrenbelt Karrenbelt requested a review from solarw December 25, 2022 14:56
solarw
solarw previously approved these changes Dec 26, 2022
Collaborator

@solarw solarw left a comment

LGTM, a couple of optional comments

else:
shutil.copy(download_path, downloaded_path)
break
return move_to_target_dir(Path(tmp_dir) / hash_id)
Collaborator

I suppose move_to_target_dir(Path(tmp_dir) / hash_id) should not fail and can be moved outside the attempts loop.

If a file system operation fails, it's probably not recoverable.
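
A sketch of that structure follows, with only the network call inside the loop. The function name, the injected `fetch` callable, the choice of `ConnectionError` as the retryable error, and the `delay` parameter are all assumptions for illustration and not the project's API.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def download_with_retries(
    fetch: Callable[[str, Path], None],
    hash_id: str,
    tmp_dir: Path,
    attempts: int = 5,
    delay: float = 0.0,
) -> Path:
    """Retry only the fetch; return the path of the downloaded content."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            fetch(hash_id, tmp_dir)
            break  # success: leave the retry loop
        except ConnectionError as error:  # assumed to be the retryable error type
            last_error = error
            time.sleep(delay)
    else:
        # The loop finished without a break, so every attempt failed.
        raise RuntimeError(f"Failed to download {hash_id} after {attempts} attempts") from last_error
    # File-system post-processing would go here, outside the loop, so that a
    # failure in it is raised immediately instead of triggering another download.
    return tmp_dir / hash_id
```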

DavidMinarsch
DavidMinarsch previously approved these changes Dec 27, 2022
@DavidMinarsch DavidMinarsch changed the base branch from tests/ipfs_download to main December 27, 2022 09:26
@DavidMinarsch DavidMinarsch dismissed stale reviews from solarw and themself December 27, 2022 09:26

The base branch was changed.

@DavidMinarsch DavidMinarsch merged commit 33ff36d into main Dec 27, 2022
@DavidMinarsch DavidMinarsch deleted the fix/ipfs_download_retries branch December 27, 2022 09:34