ENH: containers-add-dhub - add multiple images/tags/repositories from docker hub #135
It's rough, but it might be at least somewhat functional. I let the first few entries of `echo repronim/ | python containers_add_dhub_tags.py` and `echo neurodebian | python containers_add_dhub_tags.py` complete. The lack of progress about the underlying `docker pull` is unfortunate (and I bet there's a -container issue open about it).
The main design decision here is to name the results (the image and downloaded manifest) based on the manifest's .config.digest value and then use exists() checks as an indication that we already have the result locally. Whether that's actually valid should be revisited.
Ref: ReproNim/containers#48
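The digest-keyed layout from this first pass could look roughly like the following. This is a minimal sketch rather than the script's actual code: the helper names `result_paths` and `needs_fetch` and the `images/` and `manifests/` directory names are hypothetical. Only the use of the manifest's `.config.digest` value and of `exists()` checks comes from the commit message.

```python
from pathlib import Path


def result_paths(manifest, base):
    """Derive (image, manifest) locations from the manifest's config digest."""
    digest = manifest["config"]["digest"]  # e.g. "sha256:abc..."
    # Drop the "<algorithm>:" prefix so the file name contains no ":".
    stem = digest.split(":", 1)[-1]
    # HYPOTHETICAL layout: images/<hex> and manifests/<hex>.json under base.
    return Path(base) / "images" / stem, Path(base) / "manifests" / f"{stem}.json"


def needs_fetch(manifest, base):
    """Treat existing files as "already have it"; anything missing means fetch."""
    image, saved_manifest = result_paths(manifest, base)
    return not (image.exists() and saved_manifest.exists())
```

The open question from the commit message still applies: an existing file only shows that something was written under that name, not that the download completed or is still valid.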
Use directory structure to make it easy to see which repo a digest belongs to. Ref: ReproNim/containers#48 (comment)
The digest was used in the first pass to avoid worrying about invalid characters, but as mentioned in the comment and on the ReproNim/containers issue tracker [*], this isn't good for recognizing the name. Instead, construct the name by combining the repository and tag, replacing any characters that containers-add doesn't allow with "--". This introduces ambiguity and the potential for conflicts but is probably good enough.
[*] ReproNim/containers#48 (comment)
Make the containers-list output less noisy for official images. Ref: ReproNim/containers#48 (comment)
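The naming scheme from the last two commits (repository plus tag, disallowed characters replaced with "--", and no "library/" prefix for official images) might be sketched as follows. The function name `container_name` is hypothetical. The allowed character set (ASCII letters, digits, and "-") and the "--" joiner between repository and tag are assumptions, not taken from the script.

```python
import re


def container_name(repository, tag):
    """Build a containers-add name from a Docker Hub repository and tag."""
    # Official images live under "library/"; drop it to keep names short.
    if repository.startswith("library/"):
        repository = repository[len("library/"):]
    # ASSUMPTION: repository and tag are joined with "--".
    combined = f"{repository}--{tag}"
    # ASSUMPTION: containers-add accepts only letters, digits, and "-".
    return re.sub(r"[^0-9a-zA-Z-]", "--", combined)
```

With these assumptions, `container_name("library/neurodebian", "artful")` gives `neurodebian--artful`. The ambiguity mentioned in the commit message shows up with tags such as `1.0` and `1_0`, which both map to the same name for a given repository.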
Subdirectories were added a few commits back (69e7fe5).
…tiple architectures etc
and also
- support multiple architectures
- request specific (based on digest) image, not just "first" one
- add bunch of TODO comments for what to do next
…datalad-container itself
I think this script would be at least a valuable tool within datalad-container or might even better become a proper command of the datalad-container extension
* local-dhub-tags/master:
  RF: de-dataset it and move file under tools/ so we could ingest into datalad-container itself
  ENH: make ugly short code into a pretty long spaggetti to support multiple architectures etc
  Update --help's description for directories
  Drop "library/" in containers-add name
  Use tag in containers-add name
  Store images and manifests under namespace/repo subdirectory
  Prototype of script to feed Docker Hub tags to containers-add
  [DATALAD] new dataset
no architecture if only one, and no last_pushed if None
Codecov Report
@@ Coverage Diff @@
## master #135 +/- ##
=======================================
Coverage 86.65% 86.65%
=======================================
Files 17 17
Lines 922 922
=======================================
Hits 799 799
Misses 123 123
Continue to review full report at Codecov.
Thanks for turning my initial sketch into something more useful. I think in terms of exploration this is good, and I don't have any objections to having this script or something like it live in
Here's an example to sync the neurodebian repo:
$ skopeo sync --src docker --scoped --dest dir docker.io/neurodebian images/
$ tree --charset=ascii images | head
images
`-- docker.io
    `-- library
        |-- neurodebian:artful
        |   |-- 261816990a775a30f88752a13a62a52bcde56bb65e4a55f197a2e9fb9bb5920e
        |   |-- 448bb314afa553bfb1578121328bbe92d2b5ca0f411967e7a0a200f672ade92f
        |   |-- 4ccdce43d1e00fd03ac5438d39e731c16db3dfcf03c68390884b8e8c814221ca
        |   |-- 518254c3dbad5ed8bf16b404277faae75f3ba8bd5fcd69a217de42fbed22f250
        |   |-- 78ff727be57a68299558bb40b737669ca5cb9a8db948411d852ec809c14e7a1f
        |   |-- 82656eee95ad054e0aa75486e7c55b7666c26abbd9bf19373dd349f6e172ce9d
Setting backing up/syncing a registry aside, the two missing pieces for working with skopeo are the adapter and support for the URLs in the datalad special remote. I think the mechanics of both these aspects are straightforward, but the trickier part about both is thinking through the design to leave room for other sources and targets. In the case of the blob downloader, that probably just boils down to including the specific registry as part of the URL. In the case of the adapter, the main thing I have in mind is execution with podman (gh-89, gh-106).
I'm working on getting an initial version of the downloader and skopeo adapter set up.
[*] It looks like support for that is coming: containers/skopeo#880
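To illustrate the point about keeping the registry in the URL: in the registry HTTP API (OCI Distribution Specification), a blob is served from `/v2/<name>/blobs/<digest>` on a given registry host. A minimal sketch of building such a URL with the registry as an explicit part is below. The function name `blob_url` and the host `registry.example.org` are hypothetical, and this is not the URL scheme of the datalad special remote, which isn't settled in this thread.

```python
def blob_url(registry, repository, digest):
    """Return the registry-API URL for a blob, keeping the registry explicit."""
    if ":" not in digest:
        raise ValueError(f"expected '<algorithm>:<hex>' digest, got {digest!r}")
    # The host is a parameter rather than a hard-coded default, so other
    # registries can be addressed with the same URL shape.
    return f"https://{registry}/v2/{repository}/blobs/{digest}"


# Placeholder host; the digest value is shortened for illustration only.
print(blob_url("registry.example.org", "library/neurodebian", "sha256:abc"))
```

Actually fetching from such a URL typically also needs an authentication token from the registry, which this sketch leaves out.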
Great, thanks @kyleam for looking into it!
Hmm, good point, but actually an update wouldn't work in general with a directory destination. It will fail refusing to overwrite the directory. I had assumed it would skip because I saw "Copying blob X skipped: already exists" in sync output posted to the issue tracker, but, looking at the code now, it seems like that'd only be for copying to a registry destination, not a local directory. So, it was too soon to say sync could mostly replace this script. We could still do a
I think we better just have this PR merged to provide a tool. All the TODOs might come later if decided to proceed (likely with the next docker hub announcement ;) )
It is the first truly "containers-" (as opposed to "container-") command intended to add multiple containers from the Docker hub organization, possibly for multiple repositories.
Since Docker Hub announced that its retention policy is about to change (well, the deadline has now moved to mid 2021, so it is not as urgent), we had better provide an easy means to "mirror" (or back up) all (or a desired subset) of docker containers within a datalad dataset. We are planning to do that within https://github.com/repronim/containers, which ATM contains only docker containers rebuilt into singularity containers. But IMHO it could be of great benefit to have a helper tool or even a command so that any user of datalad-container could establish a seamless backup of docker containers from their own (or just a favorite collection of) docker repositories.
ATM it is possible to do that with this helper script (not a `containers-` command yet), which was initially crafted by @kyleam and then tortured into spaghetti code by me, for an "official" Docker images repo (e.g. `busybox` or `neurodebian`), which is just a shortcut to `library/<repo>` on the hub, or for any other (non-`library`) collection of repositories (e.g. `repronim/`).
It already supports multiple architecture image collections (e.g. `busybox`) and annotates the architecture (if there are multiple archs for the tag) in the image name. See the header of the file for more information/conventions.
If eager to try (although you might want to uncomment the TEST definitions in the file), try running it in some new target dataset (probably created with `-c text2git`), e.g. `.../tools/containers_add_dhub_tags.py <(echo busybox)` or `tools/containers_add_dhub_tags.py <(echo repronim/)`.
TODOs
(`repo:tag`) which would use latest for the default arch
`containers-add-dhub`
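The target convention from the description (a bare name such as `busybox` standing for `library/busybox`, and a trailing slash such as `repronim/` standing for every repository in that namespace) can be sketched as below. The function name `parse_target` and its return shape are hypothetical, and the handling of a full `namespace/repo` line is an assumption; the header of the script remains the authoritative description of the conventions.

```python
def parse_target(line):
    """Split one input line into (namespace, repository or None for "all")."""
    target = line.strip()
    if not target:
        raise ValueError("empty target")
    if "/" not in target:
        # Bare name: an official image, i.e. a shortcut to library/<repo>.
        return "library", target
    namespace, _, repository = target.partition("/")
    # "repronim/" selects the whole namespace.
    # ASSUMPTION: "namespace/repo" selects that single repository.
    return namespace, repository or None
```

For example, `parse_target("busybox")` returns `("library", "busybox")` and `parse_target("repronim/")` returns `("repronim", None)`.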