This document details the structure of the code in this repository. For information on how to set up a dev environment and run code, consult the readme.
Data Services follows a polylith approach to structuring the repository (using the Polylith Poetry Plugin):

- `components` contain all the application code, divided into modules based on entity types
- `bases` contains the glue code to bring the different components together into a single unit/api. The entrypoint of an application is usually a `main.py` in one of the bases
- `projects` contains the Dockerfiles and pyproject.toml's for each deployed service
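For orientation, a deliberately incomplete sketch of how this layout looks on disk (`<module>` and `<service>` are placeholders, not real names):

```
components/renku_data_services/<module>/    # application code, one module per entity type
bases/renku_data_services/data_api/main.py  # glue code; the application entrypoint
projects/<service>/                         # Dockerfile and pyproject.toml per deployed service
```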
There are three independent services/deployments (projects/bases):

- Renku Data Services (`data_api`): The main CRUD service for persisting data in Postgres
- Secrets Storage (`secrets_storage_api`): Handles loading user secrets securely when needed
- Background Jobs (`background_jobs`): Kubernetes cronjobs for recurring tasks
Within `components`, there are the following modules:
- app_config: Application configuration file for data services
- authn: Authentication code (Keycloak and Gitlab)
- authz: Authorization code using SpiceDB/Authzed
- base_api: Common functionality shared by different APIs
- base_models: Common functionality shared by all domain models
- base_orm: Common functionality shared by database object relational models
- connected_services: Code concerning third-party integrations (e.g. Gitlab/Github)
- crc: Compute resource controls code, dealing with resource classes and resource pools for interactive compute
- db_config: Database configuration
- errors: Common application error types shared by all APIs
- k8s: Kubernetes client code
- message_queue: Redis streams messaging code
- migrations: Database migrations
- namespace: Code for handling namespaces (user/groups)
- platform: Renku platform configuration code
- project: Code for Project entities
- repositories: Code for git repositories associated with projects
- secrets: Code for handling user secrets
- session: User session management
- storage: Cloud storage management
- users: User management
- utils: Common utilities for reuse across the code base
This repository follows a light-weight Hexagonal Architecture approach (also known as ports and adapters). Modules are usually split into:

- `apispec.yaml`, the OpenAPI specification of the endpoints, and `apispec.py`, which is autogenerated from it. Customizations are done in `apispec_base.py`
- `blueprints.py` contains the Sanic endpoints for the API, dealing with (de-)serialization and validation
- `models.py` contains domain models
- `core.py` contains business logic
- `orm.py` contains object-relational models for the database
- `db.py` contains the database repositories
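As a rough picture, a single component module then looks like this (the file names are the ones listed above; `project` is just an example module):

```
components/renku_data_services/project/
├── apispec.yaml       # OpenAPI specification
├── apispec.py         # autogenerated from apispec.yaml
├── apispec_base.py    # customizations for the generated code
├── blueprints.py      # Sanic endpoints
├── models.py          # domain models
├── core.py            # business logic
├── orm.py             # object-relational models
└── db.py              # database repositories
```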
The models and code in `models.py` and `core.py` form the inner circle of the architecture. This means they can be depended on by everything, but should not depend on the outer layers. Instead, they should depend on interfaces/protocols that are implemented in the outer layers.
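To make the dependency rule concrete, here is a minimal, self-contained sketch of the idea. All names in it (`Project`, `ProjectGetter`, `rename_project`) are hypothetical and not taken from this code base:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Project:
    """Domain model; would live in models.py."""

    id: str
    name: str


class ProjectGetter(Protocol):
    """The interface (port) the inner circle depends on."""

    async def get_project(self, project_id: str) -> Project: ...


async def rename_project(repo: ProjectGetter, project_id: str, new_name: str) -> Project:
    """Business logic; would live in core.py.

    Depends only on the protocol, never on the concrete database
    repository in db.py, which is what keeps the inner circle free of
    outer-layer dependencies.
    """
    project = await repo.get_project(project_id)
    return Project(id=project.id, name=new_name)
```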
The database repositories in `db.py` encapsulate database queries into a usable API. Their methods should match use cases and shouldn't leak ORMs to the outside.
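Continuing the hypothetical example above, a repository sketched with SQLAlchemy (the ORM toolkit that Alembic, used below for migrations, is built on) might look roughly like this. It reuses the `Project` domain model from the previous sketch, and `ProjectORM`, `dump`, and `ProjectRepository` are likewise illustrative names:

```python
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class BaseORM(DeclarativeBase):
    pass


class ProjectORM(BaseORM):
    """Object-relational model; would live in orm.py."""

    __tablename__ = "projects"
    id: Mapped[str] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()

    def dump(self) -> "Project":  # Project is the domain model from the previous sketch
        """Convert the ORM row to a domain model before it leaves db.py."""
        return Project(id=self.id, name=self.name)


class ProjectRepository:
    """Database repository; would live in db.py. Satisfies ProjectGetter."""

    def __init__(self, session: AsyncSession) -> None:
        self.session = session

    async def get_project(self, project_id: str) -> "Project":
        result = await self.session.execute(
            select(ProjectORM).where(ProjectORM.id == project_id)
        )
        # Callers receive a domain model; ProjectORM never escapes this module.
        return result.scalar_one().dump()
```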
The `blueprints.py` code should call business logic in `core.py` or, if there is no business logic, call the database repositories directly. It should always use domain models for communicating with other code.
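Rounding off the same hypothetical example, a handler in `blueprints.py` might look roughly like the sketch below. Real handlers also validate input against the apispec models and handle authentication, which is omitted here; the `project_repo` attribute on the app context is an assumed wiring detail, not this repo's actual setup:

```python
from dataclasses import asdict

from sanic import Blueprint, Request
from sanic.response import json

projects_blueprint = Blueprint("projects")


@projects_blueprint.get("/projects/<project_id>")
async def get_project(request: Request, project_id: str):
    # The repository would be wired up at startup in the base's main.py.
    repo: ProjectRepository = request.app.ctx.project_repo
    project = await repo.get_project(project_id)  # a domain model, not an ORM row
    return json(asdict(project))
```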
Data Services mirrors the users it gets from Keycloak; Keycloak is the source of truth for users. We use Authzed/SpiceDB for authorization, which enables complex transitive rules for who is permitted to perform which actions. Redis Streams is used for sending events to other services, mainly Renku Search, which is responsible for indexing and searching entities.
There is a Devcontainer available in the `.devcontainer` folder.
If you use VSCode, this should be picked up automatically.
Alternatively, you can run it with the devcontainer cli by running:
```shell
$ devcontainer up --workspace-folder .
$ devcontainer exec --container-id renku-data-services_devcontainer-data_service-1 -- bash
```
The devcontainer contains Postgres, SpiceDB, the correct Python environment and other useful development tools.
When using nix, a development environment can be created:

- Run `nix develop` in the source root to drop into the development environment.
- In another terminal, run `vm-run` (headless) to start a vm running the necessary external services, like the PostgreSQL database.
- Run `poetry install` to install the python venv.

Then `make run`, `make tests` etc. can be used as usual.
The environment also contains other useful tools, like ruff-lsp, pyright and more. Instead of a vm, a development environment using NixOS containers is also available. The dev shell runs bash; check out direnv and its `use flake` function if you prefer to keep your favorite shell.
You can run style checks using `make style_checks`.

To run the test suite, use `make tests` (you likely need to run this in the devcontainer, as it needs some surrounding services to run).
We use Alembic for migrations and we have a single version table for all schemas. This version table is used by Alembic to determine which migrations have been applied and it resides in the `common` schema. That is why all the Alembic commands include the `--name common` argument.
Our Alembic setup is such that we have multiple schemas. Most use cases will probably simply use the `common` schema. However, if you add a new schema, you have to make sure to add the metadata for it in the `components/renku_data_services/migrations/env.py` file.
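As a loose illustration of what "adding the metadata" means, consider the sketch below. The actual `env.py` in this repo is organized differently and should be consulted directly; `my_new_module` and the shape of `target_metadata` are made up for this example:

```python
# Alembic's --autogenerate diffs the database against the SQLAlchemy metadata
# it is given, so a schema whose metadata is not registered here is invisible
# to it and no migrations will be generated for it.
from renku_data_services.my_new_module import orm as my_new_orm  # hypothetical module

target_metadata = [
    # ...metadata of the existing schemas...
    my_new_orm.BaseORM.metadata,  # register the new schema's metadata
]
```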
To create a new migration:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name common revision -m "<message>" --autogenerate --version-path components/renku_data_services/migrations/versions
```
You can specify a different version path if you wish; just make sure it is listed in `alembic.ini` under `version_locations`.
To run all migrations:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name=common upgrade heads
```