Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Dockerfile remove redundant directives #3914

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

polarathene
Copy link

@polarathene polarathene commented Dec 28, 2024

  • Removes the /etc/nsswitch.conf workaround.
  • Removes the redundant usage of VOLUME directives.

Related issue(s)

The associated issues have been ignored for over a year. They've now been marked as stale, this PR attempts to address them for this repo.

See the issues for detailed justification of the summarized changes.

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change introduces a new feature.
  • I am following the contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got the approval (please contact
    [email protected]) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added or changed the documentation.

Further Comments

These changes should be rather straight-forward. The bulk of the changeset is just noise repeating the same lines across several Dockerfiles, but I did notice inconsistencies which hint that these files may need to be revisited by someone more familiar with them.

Especially with the HSM image which appears to have had the runner stage broken since this June 2023 PR. though only earlier stages are built with the Makefile:

hydra/Makefile

Lines 89 to 96 in 8e71f91

.PHONY: quicktest-hsm
quicktest-hsm:
DOCKER_BUILDKIT=1 DOCKER_CONTENT_TRUST=1 docker build --progress=plain -f .docker/Dockerfile-hsm --target test-hsm -t oryd/hydra:${IMAGE_TAG} --target test-hsm .
.PHONY: test-refresh
test-refresh:
UPDATE_SNAPSHOTS=true go test -failfast -short -tags sqlite,sqlite_omit_load_extension ./...
DOCKER_BUILDKIT=1 DOCKER_CONTENT_TRUST=1 docker build --progress=plain -f .docker/Dockerfile-hsm --target test-refresh-hsm -t oryd/hydra:${IMAGE_TAG} --target test-refresh-hsm .

The HSM quickstart compose example should attempt to build the runner stage and fail:

dockerfile: .docker/Dockerfile-hsm

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@polarathene
Copy link
Author

Given the past associated issues were neglected, I am not too keen to invest more time than necessary.

I'll give the CLA a look and the other checklist items if this PR actually gets acknowledged with interest to merge it.


The scratch image seems redundant, you've got a replacement image now with the rough equivalent via Google distroless base image. You might as well drop it? The main difference apart from noted issues in the scratch Dockerfile is also a lack of the sqlite support.

Your alpine vs sqlite (alpine) images are effectively the same too, except for the sqlite package and slightly different default CMD. Probably no need or benefit maintaining the two with such a low distinction?

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the past associated issues were neglected, I am not too keen to invest more time than necessary.

No problem!

The scratch image seems redundant, you've got a replacement image now with the rough equivalent via Google distroless base image. You might as well drop it? The main difference apart from noted issues in the scratch Dockerfile is also a lack of the sqlite support.

Some dockerfiles are only used in the repo itself (for dev purposes) and some are being pushed to our docker registry. You can find the files used for prod distribution in the goreleaser config. Generally not too keen to deprecate image variants because someone always complains about it.

Your alpine vs sqlite (alpine) images are effectively the same too, except for the sqlite package and slightly different default CMD. Probably no need or benefit maintaining the two with such a low distinction?

That makes sense but please in another pr

.docker/Dockerfile-alpine Show resolved Hide resolved
Comment on lines +56 to +59
# NOTE: This is broken already. Even though this image provides a shell, you'd need to configure it with
# `SHELL ["/busybox/sh", "-c"]`, however `apt-get` does not exist either in a distroless image.
# This was original an Alpine image, the refactoring was not verified properly in this commit:
# https://github.com/ory/hydra/commit/c1e1a569621d88365dceee7372ca49ecd119f939#diff-ae54bef08e3587b28ad8e93eb253a9a5cd9ea6f4251977e35b88dc6b42329e25L31
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HSM image is really just to run some e2e hsm tests. It's not being distributed and should not be used.

Copy link
Member

@aeneasr aeneasr Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, e2e tests are now failing. Probably just revert the changes here

https://github.com/ory/hydra/actions/runs/12523819546/job/34933781514?pr=3914

@aeneasr aeneasr requested a review from Demonsthere January 2, 2025 10:19
@polarathene
Copy link
Author

polarathene commented Jan 5, 2025

Generally not too keen to deprecate image variants because someone always complains about it.

Since they're the same though, just use the same Dockerfile and publish it as two separate images? That should avoid your concern about dropping the original source file for an image variant?

Your alpine vs sqlite (alpine) images are effectively the same too, except for the sqlite package and slightly different default CMD. Probably no need or benefit maintaining the two with such a low distinction?

That makes sense but please in another pr

👍


The HSM image is really just to run some e2e hsm tests. It's not being distributed and should not be used.

No worries, just pointing out that it's presently broken since the switch, just AFAIK your tests don't use this final stage of the image so it's not getting caught by them.

Yeah, e2e tests are now failing. Probably just revert the changes here

https://github.com/ory/hydra/actions/runs/12523819546/job/34933781514?pr=3914

That is an easy fix, it just wants apt-get instead of apt? Or has something changed that it can't find the package?


EDIT: Actually I don't see where I made a change to affect that? 🤔 It's complaining about this part of your CI:

- name: Setup HSM libs and packages
run: |
sudo apt install -y softhsm opensc

test-hsm:
name: Run HSM tests
needs:
- sdk-generate
runs-on: ubuntu-latest

I'm surprised the image has been building for you? The softhsm package hasn't been supported since Ubuntu 16.04 (Xenial), it's since been packaged as part of softhsm2:

image


UPDATE 2: I've split this concern out to a separate issue, it's specific to the CI runner (Ubuntu) vs the images (Debian, where it's a non-issue).

@polarathene
Copy link
Author

polarathene commented Jan 5, 2025

TL;DR: Spotted a few more bugs/issues while looking into responding to your feedback.

EDIT: Extracted to separate issues, feel free to ignore the rest of this comment:

Registries and tags (publishing bug)

Some dockerfiles are only used in the repo itself (for dev purposes) and some are being pushed to our docker registry.

Is the DockerHub registry oryd/hydra no longer being used? (EDIT: Yes it is, just no new hydra release for approx a year, redundant images are not actually published as implied)

  • It has some tags for images as SHA256 digests with .sig extension that don't seem to be actual images, those shouldn't be there.
  • The last release was about a year ago with latest and v2.2.0 tags published. That seems to be the latest release on GH releases page too, so it seems like DockerHub is still a valid registry being maintained.
  • While I do see these images published as multi-platform images, I'm also seeing individual images published with their platform as part of the tag (eg: v2.2.0-arm64 + v2.2.0-arm64-distroless) which seems redundant?

Perhaps something is a bit iffy with your release process?

You can find the files used for prod distribution in the goreleaser config.

I'm not too familiar with GoReleaser, but I am with Github Actions equivalent for building and publishing images to DockerHub.

I only see two variables defined here:

hydra/.goreleaser.yml

Lines 11 to 12 in 8e5fae4

dockerfile_alpine: ".docker/Dockerfile-alpine"
dockerfile_static: ".docker/Dockerfile-distroless-static"

Which seems to align with your two variants on DockerHub -distroless tag suffix for Debian image with glibc and no package manager, while the version only tags are Alpine with musl and it's package manager.

I haven't done a thorough search for where the other Dockerfile variants are potentially being used (which may be internal for you, or referenced externally from this repo like in docs?), but unless there's another registry you had in mind it doesn't seem like your earlier concern about deprecating variants is applicable if they were never published to registries?

Your install docs for Hydra with Docker reference oryd/hydra but not a specific image registry, so that often defaults to DockerHub:

image

No mention there of the individual platform variants or the .sig digest tags. I'm going to assume that's a publishing bug with your GoReleaser config?

oryd/hydra default Alpine image vs -sqlite variant

Both this repo's Get Started and Develop sections refer to a different Hydra docs page for Docker which uses the quickstart.yml, and that likewise refers to oryd/hydra:

image: oryd/hydra:v2.2.0

Although the Develop section instructs make docker to build an image of the same tag locally instead of pulling from the registry, and that uses the -sqlite image variant Dockerfile:

hydra/Makefile

Lines 71 to 74 in 8e5fae4

# Build local docker images
.PHONY: docker
docker:
DOCKER_BUILDKIT=1 DOCKER_CONTENT_TRUST=1 docker build --progress=plain -f .docker/Dockerfile-build -t oryd/hydra:${IMAGE_TAG}-sqlite .

Which as we have discussed earlier -sqlite variant is effectively the Alpine image with the sqlite package (2.5 MB added weight uncompressed) and with CMD set to serve which will just output help message as it expects serve all / serve public / serve admin, meanwhile the published images default to serve all and this -sqlite image if built locally as oryd/hydra has the container command modified in quickstart.yml... so there's really no compelling reason to keep this Dockerfile around either.

Technically though, the quickstart.yml has a specific release tag (good), while the make docker command defaults IMAGE_TAG to latest if not specified, and since the quickstart doesn't use that ENV, nor does the README instructions mention it in the Develop section, the image is built locally but the compose config still pulls from DockerHub 😅 So that's another "bug" observed.


Reference - Publishing image via GHA instead

For reference this is what publishing to image registries looks like with Github Actions:

# Tag an image with semver pattern (useful for tools that understand the tag convention),
# optionally with a tag rule that reflects your primary git branch builds for testing:
- name: 'Prepare tags'
  id: prep
  uses: docker/[email protected]
  with:
    images: |
      ${{ secrets.DOCKER_REPOSITORY }}
      ${{ secrets.GHCR_REPOSITORY }}
    tags: |
      type=edge,branch=master
      type=semver,pattern={{major}}
      type=semver,pattern={{major}}.{{minor}}
      type=semver,pattern={{major}}.{{minor}}.{{patch}}


# Two registries to publish to configured (DockerHub and GHCR):
- name: 'Login to DockerHub'
  uses: docker/login-action@v3
  with:
    username: ${{ secrets.DOCKER_USERNAME }}
    password: ${{ secrets.DOCKER_PASSWORD }}
- name: 'Login to GitHub Container Registry'
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}


# Build the image from a Dockerfile for the requested platforms,
# then publish a multi-platform image at the registries with the tags declared above
- name: 'Build and publish images'
  uses: docker/[email protected]
  with:
    context: .
    # Customize any Dockerfile ARG values here:
    build-args: |
      DMS_RELEASE=${{ github.ref_type == 'tag' && github.ref_name || 'edge' }}
      VCS_REVISION=${{ github.sha }}
    platforms: linux/amd64,linux/arm64
    push: true
    tags: ${{ steps.prep.outputs.tags }}
    # Disable provenance attestation: https://docs.docker.com/build/attestations/slsa-provenance/
    provenance: false

That's all there is to it really, works well! (I omitted a little bit from the link referenced for brevity)

Github Actions also has reusable workflows, allowing you to setup a template with the above steps and pass some input variables in to support the different Dockerfile and tag variants for example. You can see an example of that here.

Perhaps GoReleaser is simpler than that, but the reference I've provided with official Github Actions from Docker doesn't have that same publishing weirdness at DockerHub for mailserver/docker-mailserver tags pushed.

@polarathene polarathene requested a review from a team as a code owner January 5, 2025 22:25
@polarathene
Copy link
Author

Documenting some more feedback should it appeal to any maintainers that want to pursue it.

A Dockerfile with flexible sourcing of the Hydra binary

I only mention this since your currently maintaining both approaches via separate Dockerfiles.

Dockerfile-distroless-static is effectively the last stage of Dockerfile-build, but without the /var/lib/sqlite directory.

If you want to have a Dockerfile that can build an image with the hydra binary COPY supporting both the builder stage, or a copy from context (supplied externally), you could have this with in the same Dockerfile as different stages that you select by a build target option instead of specifying different files.

FROM runner AS provide-hydra
COPY hydra /usr/bin/hydra

FROM runner AS build-hydra
COPY --from=builder /usr/bin/hydra /usr/bin/hydra

As the last stage will be the default target, it would build Hydra with the image build by default. To use the external build of Hydra instead just use:

docker build --target provide-hydra --file Dockerfile-build --tag oryd/hydra:latest-distroless .

A Dockerfile that builds with configurable Alpine/Distroless base image

The FROM runner stage reference could be an ARG to switch between the Alpine an Debian supported image bases, which would also remove the separate Dockerfile-alpine.

Roughly, after the builder stage in Dockerfile-build, you could have the following:

# NOTE: ARG referenced via FROM must be defined at the top of the Dockerfile
# Place this above the builder stage
ARG BASE_IMAGE=alpine


# These stages setup the Alpine and Debian base images for the runtime stage:
FROM alpine:3.20 AS alpine
RUN apk add --no-cache ca-certificates
FROM gcr.io/distroless/static-debian12:nonroot AS distroless


# The runtime stage defaults to Alpine, to build the distroless variant instead use:
# `docker build --build-arg BASE_IMAGE=distroless`
FROM ${BASE_IMAGE} AS runtime
RUN --mount=from=busybox:latest,src=/bin/,dst=/bin/ mkdir -p /var/lib/sqlite
ENTRYPOINT ["hydra"]
CMD ["serve", "all"]


# Final stage decides where to retrieve the hydra binary from.
# Defaults to `build-hydra`, provide your own build via
# `docker build --target provide-hydra`
FROM runner AS provide-hydra
COPY hydra /usr/bin/hydra
FROM runner AS build-hydra
COPY --from=builder /usr/bin/hydra /usr/bin/hydra

All 4 image variants can then be built:

docker build --build-arg BASE_IMAGE=alpine --target build-hydra
docker build --build-arg BASE_IMAGE=alpine --target provide-hydra
docker build --build-arg BASE_IMAGE=distroless --target build-hydra
docker build --build-arg BASE_IMAGE=distroless --target provide-hydra

Personally though, if you're doing static builds of Hydra outside of the Dockerfile to copy into the image, if this is only for some runtime purpose like development, you could just use a volume mount instead?

If the builder stage were always used for copying the hydra binary, then the prior/existing /var/lib/sqlite dir creation approach is valid (instead of the busybox approach to support mkdir with distroless):

COPY --from=builder /var/lib/sqlite /var/lib/sqlite

You can also use a FROM scratch stage if you'd want the builder to output the binary to the host instead of an image via --output:

FROM scratch AS export-hydra
COPY --from=builder /usr/bin/hydra /hydra
# The image contents is exported to the local filesystem:
docker buildx build --output ./ --target export-hydra --file Dockerfile-build .

That'd give you the inverse of copying an external build into the image if you found it more convenient to build with Docker and export the binary for releases elsewhere.

Dockerfile-hsm improvements

Likewise for Dockerfile-hsm, the -tags for Go need to be adjusted/appended but otherwise differences are minimal:

# Use the same `builder` stage from `Dockerfile-build`
# NOTE: this might require `go test` to also include the arg `-ldflags="-extldflags=-static"`?
# Modify the `builder` stage likewise use the `GO_BUILD_TAGS` ARG too
# Define the ARG above the builder stage to assign a common default
ARG GO_BUILD_TAGS=sqlite,sqlite_omit_load_extension


FROM builder AS test-hsm-base
ENV HSM_ENABLED=true
ENV HSM_LIBRARY=/usr/lib/softhsm/libsofthsm2.so
ENV HSM_TOKEN_LABEL=hydra
ENV HSM_PIN=1234

RUN apt-get -y install softhsm2 opensc
RUN pkcs11-tool --module "$HSM_LIBRARY" --slot 0 --init-token --so-pin 0000 --init-pin --pin "$HSM_PIN" --label "$HSM_TOKEN_LABEL"

FROM test-hsm-base AS test-hsm
ARG GO_BUILD_TAGS
RUN go test -p 1 -failfast -short -tags=${GO_BUILD_TAGS},hsm ./...

FROM test-hsm-base AS test-hsm-refresh
ARG GO_BUILD_TAGS UPDATE_SNAPSHOTS=true
RUN go test -p 1 -failfast -short -tags=${GO_BUILD_TAGS},hsm,refresh ./...

Much smaller/simpler to maintain like that.

The GO_BUILD_TAGS ARG could be avoided, but it might help to centralize on it, plus you can easily adjust them at build time for all image variants this way.

# Now you can build via target:
docker build --target test-hsm
docker build --target test-hsm-refresh

# And for the final stage of `Dockerfile-hsm` (but actually works),
# just set the build arg to that HSM stage:
docker build --build-arg BASE_IMAGE=test-hsm-base

While this could allow for a single Dockerfile for every image this repo builds, I don't really encourage that. Since the HSM images are only for internal use, you could just perform an extra build command to get the builder stage built as a separate image, before the Dockerfile-hsm references that via FROM (or you could leverage Bake config file to keep it as a single build command).

Regardless, Dockerfile-hsm could be simplified with the above approach. You could also throw away the final stage since it doesn't actually build and apparently isn't used anywhere anyway?

Copy link
Author

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies about all the verbosity, I don't really want to think about all this again and move onto other things instead 😂

Here's some final feedback to consider. I understand all my input could be overwhelming so I'll wrap this up with a checklist summary and links in a follow-up comment that should help make this more digestible for anyone interested.

# NOTE: This is required for read/write by SQLite.
install --owner ory --group ory --directory /var/lib/sqlite

# NOTE: Presumably this was already created by the prior RUN directive
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# NOTE: Presumably this was already created by the prior RUN directive
# NOTE: This location was created by `softhsm2` package install with ownership `root:softhsm`,
# while the `pkcs11-tool` command generated a token dir + files with permissions of 600.
# To support running hydra in the container as the non-root ory user, grant ownership for RW:

/var/lib/softhsm/tokens is created when installing the softhsm / softhsm2 package. Ownership is root:softhsm. As is the token dir/files generated by the earlier pkcs11-tool command.

Rather than chown -R this directory and contents, it'd be wiser to just add ory to the softhsm group. However, the token generated only allows access to the file owner (root):

$ ls -l /var/lib/softhsm
drwxrws--- 2 root softhsm 4096 Apr  1  2024 tokens

$ ls -l /var/lib/softhsm/tokens
drwx--S--- 2 root softhsm 4096 Jan  5 22:34 c0e04acd-3e15-0266-6003-f9394edce34b

$ ls -l /var/lib/softhsm/tokens/c0e04acd-3e15-0266-6003-f9394edce34b
-rw------- 1 root softhsm   8 Jan  5 22:34 generation
-rw------- 1 root softhsm   0 Jan  5 22:34 token.lock
-rw------- 1 root softhsm 320 Jan  5 22:34 token.object

So in this case chown is required for the owner specifically. Probably shouldn't mess with the group ownership? 🤷‍♂️


Personally unless there's a clear reason for building an image to run with a non-root user, I'd suggest keeping it simple to maintain and just using root which avoids all this.

Anyone bind mounting a volume for local storage to the container is not going to have the ory UID/GID match their host user since regardless of the previous implicit UID/GID (100:101) or the static 500:500 proposed here, the typical non-root host user is 1000:1000.

That leaves the remaining benefit as a "best practice", but AFAIK is mostly moot if the user instead runs the container with Podman/Docker in rootless mode which uses /etc/subuid + /etc/subgid tables on the host to map the containers root UID + GID to the hosts user and any other container UID/GID values to the subuid / subgid range. Alternatively, when run as rootful, the container can be run with all capabilities dropped, which is implicit when running a process as a non-root user in the container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally unless there's a clear reason for building an image to run with a non-root user, I'd suggest keeping it simple to maintain and just using root which avoids all this.

That leaves the remaining benefit as a "best practice", but AFAIK is mostly moot if the user instead runs the container with Podman/Docker in rootless mode which uses /etc/subuid + /etc/subgid tables on the host to map the containers root UID + GID to the hosts user and any other container UID/GID values to the subuid / subgid range. Alternatively, when run as rootful, the container can be run with all capabilities dropped, which is implicit when running a process as a non-root user in the container.

[...] mostly moot if the user instead [...]

security is never moot if you assume that everyone is smart or experienced enough to do the right thing. Running as root will fail several CI checks, please revert the user changes made

Comment on lines +7 to +15
# Add a user/group for Ory with a stable UID + GID:
# NOTE: This only appears relevant for supporting hydra as non-root, otherwise unnecessary.
addgroup --system --gid 500 ory
adduser --system --uid 500 \
--gecos "Ory User" \
--home /home/ory \
--ingroup ory \
--shell /sbin/nologin \
ory
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal to consider removing this in future follow-up PR.

  • Only seems relevant as a common "best practice" that some users request. It has it's fair share of caveats though, especially when not clearly communicated/documented for the image.
  • The only known dependency on this is optional SQLite for a fixed DSN location at /var/lib/sqlite, but this becomes redundant if the container has a bind mount volume, which again is not documented for clarity, but a common practice for persistence vs named data volumes.
  • Without any other support needed in the container, a user that wants this could better run the container with the user option to set the user to switch to and gain the same benefits.
  • If Hydra ever needs to use some linux capabilities, you'd need to add a step with setcap to grant them for non-root users, and ideally at runtime in Go handle checking this to raise the capability rather than enforce it via setcap (especially if the feature using it is optional). With root (including rootless Docker/Podman), the need for setcap is avoided. Keep this simple to maintain, since historically in this repo the Docker support is already showing inconsistencies in maintenance, mostly due to redundant noise for running as a non-root user by default.

Comment on lines +17 to +22
# Create the sqlite directory with ownership to that user and group:
# NOTE: This is required for read/write by SQLite.
# - Path may be a default value somewhere, or only explicitly provided via DSN?
# - Owner/Group is only relevant to permissions allowing the hydra process to read/write to the location.
# - Bind mount volumes will replace the ownership with that of the host directory, requiring correction.
install --owner ory --group ory --directory /var/lib/sqlite
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added commentary here, but from what I understand this path is only relevant when it's part of the DSN configured like in quickstart.yml? While ownership is only adjusted to support the non-root ory user to have RW access.

Some platforms like OpenShift have a feature that uses the root group for access while randomizing the UID the container runs as. I'm not entirely sure how viable that approach is and raised some concerns at another repo (caddy-docker) where someone proposed supporting that.

However, group/other permissions need to be adjusted for the DSN path regardless so that the hydra process has write access to create the DB (the execute bit on the directory is also required for that operation), thus 770 (rwxrwx---) is necessary.

services:
  hydra:
    image: localhost/hydra:v2.2.0
    # No delay in shutdown (uses tini as PID 1), proper signal forwarding + reaping:
    init: true
    # Optional: Run as non-root user, but use the root group for /var/lib/sqlite to avoid needing a chown
    user: 500:0
    environment:
      # DSN to SQLite:
      # https://www.ory.sh/docs/self-hosted/deployment#sql-persistent
      DSN: sqlite:///var/lib/sqlite/db.sqlite?_fk=true
    # Unlike the current quickstart.yml example which relies upon the Docker Compose `depends_on`
    # feature to apply DB migrations through another container instance, just run both commands here:
    # NOTE: While the quickstart doesn't mention it, `migrate sql` should technically
    # have you manually create a backup copy prior to running it (take caution with restart policy if automating):
    # https://www.ory.sh/docs/hydra/self-hosted/dependencies-environment
    entrypoint: ["ash", "-c"]
    command:
      - |
        hydra --config /etc/hydra/config.yml migrate sql --read-from-env --yes
        hydra --config /etc/hydra/config.yml serve all --dev --sqa-opt-out
    # Build image locally (no need to try pull from a remote registry):
    pull_policy: build
    build:
      dockerfile_inline: |
        FROM alpine
        RUN install --mode 770 --directory /var/lib/sqlite
        COPY --from=oryd/hydra:v2.2.0 /usr/bin/hydra /usr/bin/hydra
        # `--chmod` only required when container user is not run as root,
        # Volume mounting the config instead is fine as `git clone` permissions use umask which should result in 644.
        ADD --chmod=640 https://raw.githubusercontent.com/ory/hydra/refs/heads/master/contrib/quickstart/5-min/hydra.yml /etc/hydra/config.yml

So that works well, and if the user bind mounts a host directory instead, they can just set the user field to match UID of their host user. This avoids needing to set permissions of /var/lib/sqlite to 770, or know the UID/GID for the containers ory user/group is so that they can chmod their host directory ownership to that for compatibility 👍 (both inconvenient)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The dockerfile_inline would probably need to build the binary inline as well, otherwise the binary type might mismatch (osx versus linux for example)


USER ory
COPY hydra /usr/bin/hydra
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context is lacking a bit for these COPY instructions.

Since the image itself doesn't have a build stage, it's rather vague what the hydra binary was built with. I doubt it matters for most, but since this is used for the official image published to DockerHub, some context might be worthwhile.

I assume going forward both Dockerfile-alpine + Dockerfile-distroless will take the binary built via Dockerfile-build, but perhaps you typically build Hydra outside of a container? 🤷‍♂️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of docker compose and make docker, the binaries are built by goreleaser and then injected into the containers. So this is correct here.


ENTRYPOINT ["hydra"]
CMD ["serve", "all"]
USER ory
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As covered by earlier review comments, this could be dropped in a follow-up PR.

Technically a breaking change (but so is the 500:500 UID/GID proposed in this PR), but all a user should need to do is adjust the ownership of the /var/lib/sqlite or run with --user 101:101 (might differ by image depending on implicit UID/GID assigned to ory).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this change, the hydra command should run as non-root in my view

Comment on lines +1 to +4
# TODO: Remove this file in favor of distroless-static variant:
# https://github.com/ory/hydra/blob/master/.docker/Dockerfile-distroless-static
# However if published to any registry, continue to publish the variant tag but as an alias to `-distroless` tags:
# https://github.com/ory/hydra/pull/3914#pullrequestreview-2527315326
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is redundant AFAIK, I don't know where it's being used/published, so proposed changes for consistency remain here and the file could be dropped in a follow-up PR, reverting the change if required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and we no longer distribute this scratch image but instead the distroless variant. It should be OK to be removed.

Comment on lines +1 to +4
# TODO: Remove this file in favor of the main/default Alpine image. The sqlite package is no longer required:
# https://github.com/ory/hydra/blob/master/.docker/Dockerfile-alpine
# However if published to any registry, continue to publish the variant tag but as an alias to standard Alpine image tags:
# https://github.com/ory/hydra/pull/3914#pullrequestreview-2527315326
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is redundant AFAIK, I don't know where it's being used/published, so proposed changes for consistency remain here and the file could be dropped in a follow-up PR, reverting the change if required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it, I also could not find any use for it.

@polarathene
Copy link
Author

Overview of everything above

This comment should summarize everything with links for more details if you need them.

Changes from this PR

This PR applies the following changes:

  • Refactor Alpine images to be consistent with directives and common shared logic.
    • RUN directive now uses HereDocs feature (requires Docker v23 or DOCKER_BUILDKIT=1 if older docker engine).
    • 😎 Install ca-certificates + create ory user+group (with UID/GID pinned for consistency)
    • 😎 /var/lib/sqlite directory created via install command (effectively mkdir + chown)
    • 🚀 Remove legacy nsswitch.conf workaround
    • 🚀 Remove legacy VOLUME directive
    • 👀 Realization that Dockefile-scratch + Dockerfile-sqlite are redundant and could be deleted in future.
    • ⚠️ Realization that final stage of Dockerfile-hsm is broken, but stage isn't used (removal in future proposed, see TODOs below for more info).
  • USER directive shifted to end of each Dockerfile. Preferring non-root likely causes more issues than benefits (removal in future proposed, see TODOs below for more info).
  • Additional commentary. Some of it could be dropped, but adding directly into the source should see that it gets addressed more promptly or future maintenance / readers have better context to reference.

Overall if we strip the commentary and ignore minor differences, Dockerfile-alpine, Dockerfile-scratch, Dockerfile-sqlite and Dockerfile-hsm are all normalized to roughly this same common snippet:

RUN <<HEREDOC
    apk upgrade --no-cache
    apk add --no-cache --upgrade ca-certificates

    addgroup --system --gid 500 ory
    adduser --system --uid 500 \
      --gecos "Ory User" \
      --home /home/ory \
      --ingroup ory \
      --shell /sbin/nologin \
      ory

    install --owner ory --group ory --directory /var/lib/sqlite
HEREDOC

COPY hydra /usr/bin/hydra
ENTRYPOINT ["hydra"]
CMD ["serve", "all"]
USER ory

While addressing the two referenced issues at the start of the PR description (removal of legacy nsswitch.conf and VOLUME directive).

In future if dropping the non-root ory user, the Alpine and Distroless images effectively become this:

FROM alpine:3.20
RUN apk add --no-cache ca-certificates && mkdir -p /var/lib/sqlite

COPY hydra /usr/bin/hydra
ENTRYPOINT ["hydra"]
CMD ["serve", "all"]
FROM gcr.io/distroless/static-debian12:nonroot
RUN --mount=from=busybox:latest,src=/bin/,dst=/bin/ mkdir -p /var/lib/sqlite

COPY hydra /usr/bin/hydra
ENTRYPOINT ["hydra"]
CMD ["serve", "all"]

All the verbosity I've provided is towards instilling confidence to pursue that change via a follow-up PR.

Additional context

  • The DockerHub registry only publishes two images, Alpine (default) + Debian (-distroless suffix). HSM variant is only for testing. Each Dockerfile has COPY for the hydra externally, except for Dockerfile-build (although this does not appear to be used for publishing the -distroless images?)
  • Dockerfile-alpine + Dockerfile-build are the only relevant images to retain / revise. The HSM support is minor changes, see TODOs below for future work.
  • I've not updated any other Dockerfile like the conformance test, which may warrant adopting the same changes for easier maintenance (it's effectively -distroless variant but with custom cert/CA added, there's better ways to approach that). Side-note other Dockerfiles for testing are over 5 years old with base images, those should probably get updated.

Future PRs / tasks

Related TODOs from observations while tackling this PR, not to be resolved by this PR:

  • E2E test in CI is failing for HSM. This PR doesn't influence that, it's a difference with the CI running Ubuntu while the Docker images use Debian which has the softhsm package (Issue).
  • DockerHub (and any other registry that might be published to?) has misconfigured publishing config (Issue, Ref).
  • Images presently published (v2.2.0) have a dynamically linked hydra binary for Alpine with sqlite support, while the Distroless variant lacks sqlite support (unless switching to Dockerfile-build like make docker uses to produce a -sqlite image tag). Both images should be published with the same static binary (Ref, see minor inconsistency section).
  • DX concerns with docs related to Docker and Development guidance for contributors (Issue, Issue 2).
  • Follow-up PR should consider dropping non-root, it doesn't seem necessary to maintain, nor properly documented to the image consumer (Ref 1, Ref 2, Ref 3, Ref 4).
  • Follow-up PR could delete a few redundant Dockerfile variants (Dockerfile-sqlite, Dockerfile-scratch, Dockerfile-distroless-static), which along with defaulting to root as the container user could potentially cover all images in a small single Dockerfile (or two if keeping HSM separate) (Ref 1. Ref 2, Ref 3, Ref 4, Ref 5, Ref 6).
  • The HSM image final stage is broken. Apparently not used, so could be dropped (Ref 1, Ref 2).
  • GO111MODULE=on is no longer relevant since Go 1.16 (Feb 2021), ignored since Go 1.17. You should probably remove GO111MODULE from the various files in this repo.

@aeneasr
Copy link
Member

aeneasr commented Jan 8, 2025

The HSM image final stage is broken. Apparently not used, so could be dropped (#3914 (comment), #3914 (comment)).

Unfortunately, that stage is used for CI tests and is causing CI failures

@aeneasr aeneasr mentioned this pull request Jan 10, 2025
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants