Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TAR: Implement extraction and archiving of hardlinks. #286

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

wilx
Copy link

@wilx wilx commented May 25, 2023

This implements hardlinks in TAR. There is also a test to prove it.

@wilx
Copy link
Author

wilx commented May 25, 2023

My intention with this is to be able to create TAR archives with hardlinks with maven-assembly-plugin.

However, I am not sure if this is the best possible way to do this. I am not sure if it fits right into the PlexusIoResource framework.

@wilx wilx force-pushed the master-hardlinks branch from 1ffb0f5 to 913783b Compare May 25, 2023 19:28
Copy link
Member

@plamentotev plamentotev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing. Hard links is something we still missing to better support tar files.

This change does not cover some edge cases. Lets say file and link are file and hard link in a tar file. In the most straight forward case we are going to extract both files and everything will work fine. But there are other cases.

We can extract only link. If file is not present then I'm not sure the proper exception is caught in order to just create the file. But more importantly if file do exist on the file system, a hard link to it would be created instead of creating new file.

Plexus Archiver have a feature named file mappers. So after extracting file may be called someFile. What we should do in such cases?

Also does the order of the entries matter. Is it possible that link is extracted before file and in such case would file be hard link to link?

if (entry.getType() == ArchiveEntry.SYMLINK) {
final SymlinkDestinationSupplier plexusIoSymlinkResource =
(SymlinkDestinationSupplier) entry.getResource();
final SymlinkDestinationSupplier plexusIoSymlinkResource = (SymlinkDestinationSupplier) ioResource;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to hard links? It seems that it is just refactoring to avoid calling entry.getResource() multiple times. If this is the case I would rather have this in separate PR or at least separate commit as the change is already complex enough to understand on its own.

final Path file = fileResource.getFile().toPath();
if (Files.exists(file)) {
final BasicFileAttributeView fileAttributeView =
Files.getFileAttributeView(file, BasicFileAttributeView.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use LinkOption.NOFOLLOW_LINKS to be on the safe side, although it seems that symbolic links are already handled (haven't tested it).

@@ -233,18 +241,43 @@ protected void tarFile(ArchiveEntry entry, TarArchiveOutputStream tOut, String v
}
}

boolean doCopy = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this name confusing. isLink, or something else, probably would be more appropriate as it seems that the flag indicates whether the entry is link (symbolic or hard) to some other entry.

if (Files.exists(file)) {
final BasicFileAttributeView fileAttributeView =
Files.getFileAttributeView(file, BasicFileAttributeView.class);
if (fileAttributeView != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to extract this check to separate method to make this method easier to read.

}
}
}
if (te == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I don't follow why this is added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

te is assigned either from the symlink branch or the hardlink branch. But the hardlink branch can also not assign value at all. This handles that case plus all other cases of non-symlink&non-hardlink.

@@ -278,8 +280,9 @@ protected void extractFile(
String entryName,
final Date entryDate,
final boolean isDirectory,
final boolean isSymlink,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking change. I wonder if we can avoid it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how.

@plamentotev
Copy link
Member

plamentotev commented May 27, 2023

Also we need to be extra careful when extracting multiple copies instead of creating hard links. If a tar file contains single 1 MB file and thousands of hard links, then extracted (if files are copied) could be 1 GB, possibly resulting in DoS attack.

@wilx
Copy link
Author

wilx commented Aug 21, 2023

FYI, I will not be pursuing this pull request further. I was curious if it could be done. But it is unclear to me how to do this so that it matches all the requirements and design criteria of the library. Feel free to pick it up or close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants