Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the class ZipArchiveOperations, which implements archive operations on zip-files #407

Closed

Conversation

christian-monch
Copy link
Contributor

This PR adds ZipArchiveOperations, a class that implements ArchiveOperations for zip files. It makes use of
datalad_next.iter_collections.zipfile.iter_zip to implement iteration.

The PR includes tests. The test code comes with a new pytest-fixture called sample_zip, which creates a sample zip-file with known content, much like the sample_tar_xz-fixture.

Remark: while working on this PR, I came across a type-annotation issue #406. I will create a PR to solve #406 later

@christian-monch christian-monch requested a review from mih as a code owner June 7, 2023 11:21
@christian-monch christian-monch force-pushed the zip-archive-operations branch 2 times, most recently from 177fccb to b258777 Compare June 7, 2023 19:14
@codecov
Copy link

codecov bot commented Jun 8, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (14f3563) 92.45% compared to head (e866e1b) 92.07%.

❗ Current head e866e1b differs from pull request most recent head 9bf90b6. Consider uploading reports for the commit 9bf90b6 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #407      +/-   ##
==========================================
- Coverage   92.45%   92.07%   -0.38%     
==========================================
  Files         128      124       -4     
  Lines        9641     9128     -513     
  Branches     1045        0    -1045     
==========================================
- Hits         8914     8405     -509     
- Misses        703      723      +20     
+ Partials       24        0      -24     
Files Coverage Δ
...alad_next/archive_operations/tests/test_zipfile.py 100.00% <100.00%> (ø)
datalad_next/conftest.py 100.00% <100.00%> (ø)
datalad_next/archive_operations/zipfile.py 97.72% <97.72%> (ø)

... and 53 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mih added a commit to mih/datalad-next that referenced this pull request Jun 10, 2023
Modelled after the `ZipArchiveOperations` tests in
datalad#407
mih added a commit to mih/datalad-next that referenced this pull request Jun 11, 2023
Modelled after the `ZipArchiveOperations` tests in
datalad#407
mih added a commit to mih/datalad-next that referenced this pull request Jun 11, 2023
- More sensible `__repr__` for `ArchiveOperations`
- Document methods
- More flexible handling of archive item names. `open()` and all other
  such methods can now accept `item.name` from `__iter__` results
  directly. Based on the assumption that all TAR archives use member
  names in POSIX notation.
- Add unit tests with comprehensive coverage. Modelled after the
  `ZipArchiveOperations` tests in
  datalad#407 but with more
  context manager use.
Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. It motivated me to implement #415 which addresses a number of insufficiencies that became apparent while reviewing this PR. Please have a look re documentation and flexibility re archive item identifiers. I think it would make sense to align zip and tar support in this regard.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we sorted out how this is supposed to be in various other PRs. I think mostly the item identifier flexibility needs to be aligned (allow for PurePosixPath) still.

@christian-monch
Copy link
Contributor Author

christian-monch commented Jun 21, 2023

Here is the plan:

@christian-monch christian-monch force-pushed the zip-archive-operations branch from f52ffc6 to c179406 Compare June 21, 2023 09:08
@christian-monch
Copy link
Contributor Author

The zip file format introduces another naming-related quirk. Directory names in zip-files end with "/". All driectory-items (and file-items) that are yielded by iter_zip and by ZipArchiveOperations.__iter__, i.e. ZipFileItem-instances, have a name-property with type PurePosixPath. PurePosixPath does not distinguish between a path referring to a directory and a path referring to a file. It therefore also cannot be used to transparently do the "right thing". PR #430 introduces ZipFileDirPath to fix that.

@christian-monch
Copy link
Contributor Author

FMPOV this is ready to go

@christian-monch christian-monch requested a review from mih June 22, 2023 10:12
This commit adds `ZipArchiveOperations`,
a class that implements `ArchiveOperations`
for zip files. It makes use of
`datalad_next.iter_collections.zipfile.iter_zip`
to implement iteration.

The commit comes with tests. The test
code includes a pytest-fixture called `sample_zip`,
which creates a sample zip-file with known content,
much like the `sample_tar_xz`-fixture.
This commit fixes a bug in the tests
where Unix path-separators were always
used to identify directory entries in
a zip-archive. That is wrong on
Windows, where Windows-path separators
have to be used.
This commit ensures that all zip-file names
that are used as parameters in methods of
ZipArchiveOperations use '/' in names
of zip-archive items.

That is due to the fact that zip enforces
the use '/' in the names of its items
(zip-APPNOTE.TXT 6.3.3 from September 1st,
2012).

Until PR datalad#409
is merged this will ensure that the proper
file names are used to check for items in
zip-archives. The commit can be reverted
once PR  datalad#409
is merged.
This commit removes duplicated code for the
sample_zip-fixture. It makes the sample_zip-
fixture avaiable through conftest.py and
re-uses it on the ZipArchiveOperations tests.
This commit uses PurePosixPath to
properly read the path of a zip-archive
on Windows in test_open()
This commit renames the file
`datalad_next/archive_operations/tests/test_zipfile_archive_operations.py`
to `datalad_next/archive_operations/tests/test_zipfile.py`. This
is better readable and in sync with the name of the tests for
`datalad_next/archive_operations/tarfile.py`
This commit adapts the ZipArchiveOperations
code to the current type of `ZipfileItem.name`,
i.e. to `PurePosixPath`.

The commit also adapts the tests for
ZipArchiveOperations to the updated types,
and aligns the tests to the test code
for TarArchiveOperations. This is in
preparation for a test code consolidation
@mih mih force-pushed the zip-archive-operations branch from 9bf90b6 to 5fbb2a0 Compare December 18, 2023 06:31
@mih
Copy link
Member

mih commented Dec 18, 2023

Looks like this PR is not longer up-to-speed with the other developments that happened since it was posted.

FAILED ../archive_operations/tests/test_zipfile.py::test_ziparchive_iterator - AssertionError: assert 'test-archive//' in ZipArchiveOperations(/home/appveyor/DLTMP/pytest-of-appveyor/pytest-0/zipfile0/sample.zip, cfg=None)

@mih
Copy link
Member

mih commented Dec 18, 2023

Started working on an update

@mih
Copy link
Member

mih commented Dec 18, 2023

Replaced by #578

@mih mih closed this Dec 18, 2023
@christian-monch christian-monch deleted the zip-archive-operations branch July 16, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants