🔐 Reduce Docker size by half + improve security #465

Open: timoa wants to merge 7 commits into base: master

timoa
Contributor

@timoa timoa commented Nov 14, 2024

Details

This PR optimizes the Dockerfiles (frontend + backend), significantly reducing the image sizes and improving security by running the containers as a non-root user (node).

| Image | Current size | New size | Improvement |
| --- | --- | --- | --- |
| perplexica-frontend | 2.37GB | 846MB | -64% |
| perplexica-backend | 1.84GB | 640MB | -65% |

Here are the fundamental changes and explanations:

  1. Multi-stage build: I used a two-stage build process. The first stage (builder) installs dependencies and builds the application. The second stage only copies the necessary files for running the application.

  2. Removed ARG variables on the backend image: Since Docker Compose or Kubernetes will provide the environment variables, I removed them from the Dockerfile. You can set these variables in your Docker Compose file or Kubernetes deployment configuration.

  3. Optimized copying and building: I first copy only the package.json and yarn.lock files, then install dependencies. This allows better caching of the dependency installation step.

  4. Minimal production image: The final stage only copies the built assets, node_modules, and necessary files from the builder stage, resulting in a much smaller Docker image.

  5. Use a more standard folder for the app: I replaced /home/perplexica with /app and updated the docker-compose.yaml file to use this new path.

  6. Use a non-root user (node): Instead of using the root user by default, I changed the container user to node (the default user for official Node Docker images). I had to set this user's permissions in the Dockerfiles and in docker-compose.yaml to avoid permission issues on the SQLite DB file.

  7. Use an ARG variable for the backend image to select the user: by default it is the node user (for running on Kubernetes), and it can be set to the root user when running with Docker Compose. Docker Compose volumes are created by the root user, so the SQLite DB is accessible only as read-only when running as the node user.

  8. Update the Node version to Node v22 for the frontend and backend.

  9. I tried to move all the ENV vars to a shared .env in the root folder by providing the right syntax in the Docker Compose file, but I haven't figured out why the frontend app is still looking for its ./ui/.env file. If you have any insights, I will be happy to fix it. For now, it uses the same .env file for the backend and frontend (I have updated the README with additional instructions).
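As an illustration of points 1, 3, 4, 5, 6 and 8 above, here is a minimal sketch of a multi-stage backend Dockerfile. It is not the Dockerfile from this PR: the `node:22-slim` tag, the `yarn build` script, the `dist/` output directory and the `dist/app.js` entry point are all assumptions made for the example.

```dockerfile
# Stage 1 (builder): install all dependencies and compile the app.
FROM node:22-slim AS builder
WORKDIR /app

# Copy only the manifests first so the dependency layer stays cached
# until package.json or yarn.lock changes.
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the sources and build. "yarn build" and the dist/ output
# directory are assumptions about the project layout.
COPY . .
RUN yarn build

# Stage 2 (runtime): only what is needed to run the app.
FROM node:22-slim
WORKDIR /app

# --chown hands the copied files to the unprivileged "node" user that
# the official Node images ship with.
COPY --from=builder --chown=node:node /app/package.json ./
COPY --from=builder --chown=node:node /app/node_modules ./node_modules
COPY --from=builder --chown=node:node /app/dist ./dist

# Drop root privileges before starting the app.
# dist/app.js is a hypothetical entry point.
USER node
CMD ["node", "dist/app.js"]
```

Everything installed or generated in the builder stage that is not explicitly copied (sources, build caches, build tooling) is left out of the final image, which is where most of the size reduction in a layout like this would come from.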

Important

The downside of running the backend Docker image as a non-root user is that Docker Compose will mount the volume as root, so the node user will have only read-only access to the DB.
Docker Compose must be run with the --build flag to force a rebuild of the Docker image with the root user.
By default, the Docker images will be published to Docker Hub with the node user.
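One way to make the user selectable at build time is sketched below. `APP_USER` is a hypothetical argument name chosen for this example, not necessarily the one used in the PR.

```dockerfile
FROM node:22-slim
WORKDIR /app

# APP_USER is a hypothetical build argument name, not taken from the PR.
# It defaults to the unprivileged "node" user.
ARG APP_USER=node

# ... copy the built application here ...

# USER expands build arguments declared earlier in the same stage.
USER ${APP_USER}
```

With this layout, a plain `docker build` produces an image that runs as node, while `docker build --build-arg APP_USER=root .` (or the equivalent `build.args` entry in a Compose file) produces one that runs as root. Because the value is fixed when the image is built, switching users requires a rebuild.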

We will not have this issue with the SQLite DB permissions on Kubernetes because the volumes are managed differently.

Moving to a Postgres DB will fix this issue and help scale the project later. Docker Compose will be able to launch a Postgres image, and the same applies to Kubernetes, with either a dedicated pod or a managed database like AWS RDS.

You can keep the root user for the backend image if you think that is too much for simple use with the Docker Compose file, but using the non-root user (node) will be more secure.

@timoa
Contributor Author

timoa commented Nov 14, 2024

cc: @rrfaria: I closed the previous PR to provide a cleaner branch.

@timoa timoa marked this pull request as ready for review November 14, 2024 22:48
@ItzCrazyKns
Owner

You mentioned that you fixed the issue with the NEXT_PUBLIC_ENV vars. It cannot be fixed: since I now provide prebuilt images, the public env vars are hardcoded in the code generated by Next.js. It's not feasible to change them, and changing them via a script is not a practical approach. Your PR seems good for users who want to build images locally, but otherwise, even if I link the env variable, the vars would still be hardcoded.
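For context, Next.js inlines `NEXT_PUBLIC_`-prefixed variables into the client-side bundle when `next build` runs, so the values have to be present at image build time. A rough sketch of how a locally built frontend image could receive them is shown below; the `node:22-slim` tag and the `yarn build` script are assumptions, and this is not the project's actual Dockerfile.

```dockerfile
FROM node:22-slim AS builder
WORKDIR /app

COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Build arguments must be supplied at image build time, for example:
#   docker build --build-arg NEXT_PUBLIC_API_URL=<your-api-url> ...
ARG NEXT_PUBLIC_API_URL
ARG NEXT_PUBLIC_WS_URL

# Expose them as environment variables so that "next build" can read
# them and inline the values into the client-side JavaScript bundle.
ENV NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}
ENV NEXT_PUBLIC_WS_URL=${NEXT_PUBLIC_WS_URL}

COPY . .
RUN yarn build
```

Setting the same variables on a container started from an already built image does not change the browser-side values, which is the limitation described above for prebuilt images.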

@timoa
Contributor Author

timoa commented Nov 16, 2024

> You mentioned that you fixed the issue with the NEXT_PUBLIC_ENV vars. It cannot be fixed: since I now provide prebuilt images, the public env vars are hardcoded in the code generated by Next.js. It's not feasible to change them, and changing them via a script is not a practical approach. Your PR seems good for users who want to build images locally, but otherwise, even if I link the env variable, the vars would still be hardcoded.

There are a lot of projects on GitHub that use NEXT_PUBLIC_API_URL, but maybe you have a specific use case.
I'm not a Next.js expert, but since you're building a Docker image, maybe you don't need to build the static version of the frontend? It would run on the Node engine and be able to access the ENV vars.
In that case, it would use the .env file provided by Docker Compose, as in my PR.
It would also work when deployed on Kubernetes, where it would use the ENV vars provided by the K8s pod.

My last Helm chart was for the TypeBot project (PR in progress), a Next.js app with a backend and a frontend.
Its Dockerfile is a bit complex, but it gets the ENV vars from the .env file, and it works well on Kubernetes using public images. I will try to look at the frontend's build process and see if the same approach can work for Perplexica.

@Froggy232

Hi there,
First, thanks for your messages; it seems there is some hope for this to work!
Do you have any news on this? I am trying to run Perplexica in a Podman pod, and I think I have a problem that this would solve.
Thanks a lot. Of course, I would totally understand if you haven't had the time, or if you haven't found a solution.
Have a nice day,
Best regards

@timoa
Contributor Author

timoa commented Nov 25, 2024

I will try to take a closer look this week. I will fix the new conflicts first. Thanks!

@Froggy232

Hi,
Is there any news on the possibility of using custom NEXT_PUBLIC_API_URL and NEXT_PUBLIC_WS_URL values with the prebuilt Docker images?
Of course, I would totally understand if it was delayed, or if it turned out not to be possible at all.
Thanks for your work!

@mitchross

@timoa any chance to get these conflicts resolved?


5 participants