This document details the structure of the code in this repository. For information on how to set up a dev environment and run code, consult the readme.
Data Services follows a polylith approach to structuring the repository (using the Polylith Poetry Plugin):

- `components` contain all the application code, divided into modules based on entity types
- `bases` contains the glue code to bring the different components together into a single unit/api. The entrypoint of an application is usually a `main.py` in one of the bases
- `projects` contains the Dockerfiles and pyproject.toml's for each deployed service
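For orientation, a deliberately incomplete sketch of how this layout looks on disk (`<module>` and `<service>` are placeholders, not real names):

```
components/renku_data_services/<module>/    # application code, one module per entity type
bases/renku_data_services/data_api/main.py  # glue code; the application entrypoint
projects/<service>/                         # Dockerfile and pyproject.toml per deployed service
```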
There are three independent services/deployments (projects/bases):

- Renku Data Services (`data_api`): The main CRUD service for persisting data in Postgres
- Secrets Storage (`secrets_storage_api`): Handles loading user secrets securely when needed
- Background Jobs (`background_jobs`): Kubernetes cronjobs for recurring tasks
Within `components`, there are the following modules:
- app_config: Application configuration file for data services
- authn: Authentication code (Keycloak and Gitlab)
- authz: Authorization code using SpiceDB/Authzed
- base_api: Common functionality shared by different APIs
- base_models: Common functionality shared by all domain models
- base_orm: Common functionality shared by database object relational models
- connected_services: Code concerning third-party integrations (e.g. Gitlab/Github)
- crc: Compute resource controls code, dealing with resource classes and resource pools for interactive compute
- db_config: Database configuration
- errors: Common application error types shared by all APIs
- k8s: Kubernetes client code
- message_queue: Redis streams messaging code
- migrations: Database migrations
- namespace: Code for handling namespaces (user/groups)
- platform: Renku platform configuration code
- project: Code for Project entities
- repositories: Code for git repositories associated with projects
- secrets: Code for handling user secrets
- session: User session management
- storage: Cloud storage management
- users: User management
- utils: Common utilities for reuse across the code base
This repository follows a light-weight Hexagonal Architecture approach (also known as ports and adapters). Modules are usually split into:

- `apispec.yaml`, the OpenAPI specification of the endpoints, and `apispec.py`, which is autogenerated from it. Customizations are done in `apispec_base.py`
- `blueprints.py` contains the Sanic endpoints for the API, dealing with (de-)serialization and validation
- `models.py` contains domain models
- `core.py` contains business logic
- `orm.py` contains object-relational models for the database
- `db.py` contains the database repositories
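As a rough picture, a single component module then looks like this (the file names are the ones listed above; `project` is just an example module):

```
components/renku_data_services/project/
├── apispec.yaml       # OpenAPI specification
├── apispec.py         # autogenerated from apispec.yaml
├── apispec_base.py    # customizations for the generated code
├── blueprints.py      # Sanic endpoints
├── models.py          # domain models
├── core.py            # business logic
├── orm.py             # object-relational models
└── db.py              # database repositories
```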
The models and code in `models.py` and `core.py` form the inner circle of the architecture. This means they can be depended on by everything, but should not depend on the outer layers. Instead, they should depend on interfaces/protocols that are implemented in the outer layers.
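To make the dependency rule concrete, here is a minimal, self-contained sketch of the idea. All names in it (`Project`, `ProjectGetter`, `rename_project`) are hypothetical and not taken from this code base:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Project:
    """Domain model; would live in models.py."""

    id: str
    name: str


class ProjectGetter(Protocol):
    """The interface (port) the inner circle depends on."""

    async def get_project(self, project_id: str) -> Project: ...


async def rename_project(repo: ProjectGetter, project_id: str, new_name: str) -> Project:
    """Business logic; would live in core.py.

    Depends only on the protocol, never on the concrete database
    repository in db.py, which is what keeps the inner circle free of
    outer-layer dependencies.
    """
    project = await repo.get_project(project_id)
    return Project(id=project.id, name=new_name)
```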
The database repositories in `db.py` encapsulate database queries into a usable API. Their methods should match use cases and shouldn't leak ORMs to the outside.
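Continuing the hypothetical example above, a repository sketched with SQLAlchemy (the ORM toolkit that Alembic, used below for migrations, is built on) might look roughly like this. It reuses the `Project` domain model from the previous sketch, and `ProjectORM`, `dump`, and `ProjectRepository` are likewise illustrative names:

```python
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class BaseORM(DeclarativeBase):
    pass


class ProjectORM(BaseORM):
    """Object-relational model; would live in orm.py."""

    __tablename__ = "projects"
    id: Mapped[str] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()

    def dump(self) -> "Project":  # Project is the domain model from the previous sketch
        """Convert the ORM row to a domain model before it leaves db.py."""
        return Project(id=self.id, name=self.name)


class ProjectRepository:
    """Database repository; would live in db.py. Satisfies ProjectGetter."""

    def __init__(self, session: AsyncSession) -> None:
        self.session = session

    async def get_project(self, project_id: str) -> "Project":
        result = await self.session.execute(
            select(ProjectORM).where(ProjectORM.id == project_id)
        )
        # Callers receive a domain model; ProjectORM never escapes this module.
        return result.scalar_one().dump()
```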
The `blueprints.py` code should call business logic in `core.py` or, if there is no business logic, call the database repositories directly. It should always use domain models for communicating with other code.
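Rounding off the same hypothetical example, a handler in `blueprints.py` might look roughly like the sketch below. Real handlers also validate input against the apispec models and handle authentication, which is omitted here; the `project_repo` attribute on the app context is an assumed wiring detail, not this repo's actual setup:

```python
from dataclasses import asdict

from sanic import Blueprint, Request
from sanic.response import json

projects_blueprint = Blueprint("projects")


@projects_blueprint.get("/projects/<project_id>")
async def get_project(request: Request, project_id: str):
    # The repository would be wired up at startup in the base's main.py.
    repo: ProjectRepository = request.app.ctx.project_repo
    project = await repo.get_project(project_id)  # a domain model, not an ORM row
    return json(asdict(project))
```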
Data Services mirrors the users it gets from Keycloak; Keycloak is the source of truth for users. We use Authzed/SpiceDB for authorization, which enables complex transitive rules for who is permitted to perform which actions. Redis Streams is used for sending events to other services, mainly Renku Search, which is responsible for indexing and searching entities.
There is a Devcontainer available in the `.devcontainer` folder.
If you use VSCode, this should be picked up automatically.
Alternatively, you can run it with the devcontainer cli by running:
```shell
$ devcontainer up --workspace-folder .
$ devcontainer exec --container-id renku-data-services_devcontainer-data_service-1 -- bash
```
The devcontainer contains Postgres, SpiceDB, the correct Python environment and other useful development tools.
When using nix, a development environment can be created:

- Run `nix develop` in the source root to drop into the development environment.
- In another terminal, run `vm-run` (headless) to start a vm running the necessary external services, like the PostgreSQL database.
- Run `poetry install` to install the python venv.

Then `make run`, `make tests` etc. can be used as usual.
The environment also contains other useful tools, like ruff-lsp, pyright and more. Instead of a vm, a development environment using NixOS containers is also available. The dev shell runs bash; check out direnv and its `use flake` function if you prefer to keep your favorite shell.
You can run style checks using `make style_checks`.

To run the test suite, use `make tests` (you likely need to run this in the devcontainer, as it needs some surrounding services to run).
We use Alembic for migrations and we have a single version table for all schemas. This version table is used by Alembic to determine which migrations have been applied and it resides in the `common` schema. That is why all the Alembic commands include the `--name common` argument.
Our Alembic setup is such that we have multiple schemas. Most use cases will probably simply use the `common` schema. However, if you add a new schema, you have to make sure to add the metadata for it in the `components/renku_data_services/migrations/env.py` file.
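As a loose illustration of what "adding the metadata" means, consider the sketch below. The actual `env.py` in this repo is organized differently and should be consulted directly; `my_new_module` and the shape of `target_metadata` are made up for this example:

```python
# Alembic's --autogenerate diffs the database against the SQLAlchemy metadata
# it is given, so a schema whose metadata is not registered here is invisible
# to it and no migrations will be generated for it.
from renku_data_services.my_new_module import orm as my_new_orm  # hypothetical module

target_metadata = [
    # ...metadata of the existing schemas...
    my_new_orm.BaseORM.metadata,  # register the new schema's metadata
]
```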
To create a new migration:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name common revision -m "<message>" --autogenerate --version-path components/renku_data_services/migrations/versions
```
You can specify a different version path if you wish; just make sure it is listed in `alembic.ini` under `version_locations`.
To run all migrations:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name=common upgrade heads
```