# Developing renku-data-services

This document describes the structure of the code in this repository. For information on how to set up a development environment and run the code, consult the README.

## Architecture

### Polylith

Data Services follows a polylith approach to structuring the repository (using the Polylith Poetry Plugin):

- `components` contains all the application code, divided into modules based on entity types
- `bases` contains the glue code that brings the different components together into a single unit/API; the entry point of an application is usually a `main.py` in one of the bases
- `projects` contains the Dockerfiles and `pyproject.toml` files for each deployed service
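
As an illustrative sketch of how the three top-level directories fit together (the directory names and nesting below are hypothetical and abbreviated; consult the repository itself for the actual layout):

```
components/
  renku_data_services/
    project/          # one module per entity type
    users/
    ...
bases/
  renku_data_services/
    data_api/
      main.py         # entry point wiring components into one app
projects/
  renku_data_services/
    Dockerfile        # per-service build
    pyproject.toml    # per-service dependency set
```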

### Bases/Projects

There are three independent services/deployments (projects/bases):

- Renku Data Services (`data_api`): the main CRUD service for persisting data in PostgreSQL
- Secrets Storage (`secrets_storage_api`): handles loading user secrets securely when needed
- Background Jobs (`background_jobs`): Kubernetes CronJobs for recurring tasks

### Components

Within `components`, there are the following modules:

- `app_config`: application configuration for data services
- `authn`: authentication code (Keycloak and GitLab)
- `authz`: authorization code using SpiceDB/Authzed
- `base_api`: common functionality shared by different APIs
- `base_models`: common functionality shared by all domain models
- `base_orm`: common functionality shared by database object-relational models
- `connected_services`: code concerning third-party integrations (e.g. GitLab/GitHub)
- `crc`: compute resource controls code, dealing with resource classes and resource pools for interactive compute
- `db_config`: database configuration
- `errors`: common application error types shared by all APIs
- `k8s`: Kubernetes client code
- `message_queue`: Redis Streams messaging code
- `migrations`: database migrations
- `namespace`: code for handling namespaces (users/groups)
- `platform`: Renku platform configuration code
- `project`: code for Project entities
- `repositories`: code for Git repositories associated with projects
- `secrets`: code for handling user secrets
- `session`: user session management
- `storage`: cloud storage management
- `users`: user management
- `utils`: common utilities for reuse across the code base

This repository follows a lightweight Hexagonal Architecture approach (also known as ports and adapters). Modules are usually split into:

- `apispec.yaml` and `apispec.py` (autogenerated from it): the OpenAPI specification for the endpoints; customizations are done in `apispec_base.py`
- `blueprints.py`: the Sanic endpoints for the API, dealing with (de-)serialization and validation
- `models.py`: domain models
- `core.py`: business logic
- `orm.py`: object-relational models for the database
- `db.py`: database repositories

The models and code in `models.py` and `core.py` form the inner circle of the architecture. This means they can be depended on by everything, but should not depend on the outer layers; instead, they should depend on interfaces/protocols that are implemented in the outer layers. The database repositories in `db.py` encapsulate database queries behind a usable API. Their methods should match use cases and should not leak ORM objects to the outside. The code in `blueprints.py` should call the business logic in `core.py` or, if there is no business logic, call the database repositories directly. It should always use domain models when communicating with other code.
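
The layering above can be sketched in a few lines of Python. Everything here is illustrative: `Project`, `ProjectStore`, and `InMemoryProjectStore` are hypothetical names, not the repository's actual API; a real `db.py` repository would wrap SQLAlchemy ORM objects rather than a dict.

```python
from dataclasses import dataclass, replace
from typing import Protocol


# models.py: a domain model in the inner circle (no ORM or framework imports)
@dataclass(frozen=True)
class Project:
    id: str
    name: str


# core.py: business logic depends on a port (a Protocol), not on a concrete DB
class ProjectStore(Protocol):
    async def get_project(self, project_id: str) -> Project: ...


async def rename_project(store: ProjectStore, project_id: str, new_name: str) -> Project:
    project = await store.get_project(project_id)
    return replace(project, name=new_name)


# db.py: an adapter implementing the port; it returns domain models,
# never ORM objects, so callers stay decoupled from the database layer
class InMemoryProjectStore:
    def __init__(self) -> None:
        self._rows = {"p1": Project(id="p1", name="old-name")}

    async def get_project(self, project_id: str) -> Project:
        return self._rows[project_id]
```

A blueprint handler would then call `rename_project` (or the store directly) and serialize the returned domain model, keeping Sanic-specific code out of the inner circle.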

## Supporting Services

Keycloak is the source of truth for users, and Data Services mirrors the users it gets from Keycloak. We use Authzed/SpiceDB for authorization, which enables complex transitive rules about who is permitted to perform which actions. Redis Streams is used for sending events to other services, mainly Renku Search, which is responsible for indexing and searching entities.

## Setting up a development environment

### Devcontainer

There is a devcontainer available in the `.devcontainer` folder. If you use VS Code, it should be picked up automatically. Alternatively, you can run it with the devcontainer CLI:

```shell
$ devcontainer up --workspace-folder .
$ devcontainer exec --container-id renku-data-services_devcontainer-data_service-1 -- bash
```

The devcontainer contains PostgreSQL, SpiceDB, the correct Python environment and other useful development tools.

### Developing with Nix

When using Nix, a development environment can be created as follows:

1. Run `nix develop` in the source root to drop into the development environment.
2. In another terminal, run `vm-run` (headless) to start a VM running the necessary external services, such as the PostgreSQL database.
3. Run `poetry install` to install the Python venv.

Then `make run`, `make tests`, etc. can be used as usual.

The environment also contains other useful tools, such as `ruff-lsp`, `pyright` and more. Instead of a VM, a development environment using NixOS containers is also available.

`nix develop` runs a bash shell; check out direnv and its `use flake` function if you prefer to keep your favorite shell.

## Running Tests

You can run style checks with `make style_checks`. To run the test suite, use `make tests` (you likely need to run this inside the devcontainer, as the tests need some surrounding services to be running).

## Migrations

We use Alembic for migrations, with a single version table for all schemas. This version table is used by Alembic to determine which migrations have or have not been applied, and it resides in the `common` schema. That is why all the Alembic commands include the `--name common` argument.

Our Alembic setup uses multiple schemas. Most use cases will probably simply use the `common` schema. However, if you add a new schema, you have to make sure to add its metadata in the `components/renku_data_services/migrations/env.py` file.
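
As a rough sketch of what registering another schema's metadata could look like (the actual structure of `env.py` in this repository may differ, and `events` is a purely hypothetical schema name; in the real file the `MetaData` objects would be imported from the relevant components' ORM modules):

```python
from sqlalchemy import MetaData

# Each schema contributes its own MetaData; Alembic's autogenerate
# compares all of them against the live database to produce a revision.
common_metadata = MetaData(schema="common")
events_metadata = MetaData(schema="events")  # hypothetical new schema

# Alembic accepts a list of MetaData objects as target_metadata
target_metadata = [common_metadata, events_metadata]
```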

To create a new migration:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name common revision -m "<message>" --autogenerate --version-path components/renku_data_services/migrations/versions
```

You can specify a different version path if you wish; just make sure it is listed in `alembic.ini` under `version_locations`.
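
For example, `version_locations` takes multiple paths (separated according to `version_path_separator`, by default whitespace); the second path below is hypothetical:

```ini
[alembic]
version_locations = components/renku_data_services/migrations/versions components/renku_data_services/migrations/custom_versions
```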

To run all migrations:

```shell
DUMMY_STORES=true poetry run alembic -c components/renku_data_services/migrations/alembic.ini --name=common upgrade heads
```