Merge pull request #126 from getindata/release-0.6.3
Release 0.6.3
szczeles authored May 10, 2022
2 parents be5f0b1 + e5a7e98 commit 5e57a04
Showing 21 changed files with 85 additions and 1,185 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,12 @@

## [Unreleased]

## [0.6.3] - 2022-05-10

- KFP SDK version bumped to 1.8.11 in order to fix the misbehaving TTL issue
- Dropped support for Vertex AI; please use [kedro-vertexai](https://kedro-vertexai.readthedocs.io/en/latest/) instead
- Added Kedro environment name to the pipeline name during upload

## [0.6.2] - 2022-03-10

- Added support for defining retry policy for the Kubeflow Pipelines nodes
@@ -127,7 +133,9 @@
- Method to schedule runs for most recent version of given pipeline `kedro kubeflow schedule`
- Shortcut to open UI for pipelines using `kedro kubeflow ui`

[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.6.2...HEAD
[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.6.3...HEAD

[0.6.3]: https://github.com/getindata/kedro-kubeflow/compare/0.6.2...0.6.3

[0.6.2]: https://github.com/getindata/kedro-kubeflow/compare/0.6.1...0.6.2

3 changes: 1 addition & 2 deletions README.md
@@ -4,12 +4,11 @@
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![SemVer](https://img.shields.io/badge/semver-2.0.0-green)](https://semver.org/)
[![PyPI version](https://badge.fury.io/py/kedro-kubeflow.svg)](https://pypi.org/project/kedro-kubeflow/)
[![Downloads](https://pepy.tech/badge/kedro-kubeflow)](https://pepy.tech/project/kedro-kubeflow)
[![Downloads](https://img.shields.io/pypi/dm/kedro-kubeflow)](https://img.shields.io/pypi/dm/kedro-kubeflow)

[![Maintainability](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/maintainability)](https://codeclimate.com/github/getindata/kedro-kubeflow/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/test_coverage)](https://codeclimate.com/github/getindata/kedro-kubeflow/test_coverage)
[![Documentation Status](https://readthedocs.org/projects/kedro-kubeflow/badge/?version=latest)](https://kedro-kubeflow.readthedocs.io/en/latest/?badge=latest)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow?ref=badge_shield)

## About

5 changes: 1 addition & 4 deletions docs/source/02_installation/02_configuration.md
@@ -16,9 +16,6 @@ run_config:
# on the same tag, or Never if you use only local images
image_pull_policy: IfNotPresent

# Location of Vertex AI GCS root, required only for vertex ai pipelines configuration
root: bucket_name/gcs_suffix

# Name of the kubeflow experiment to be created
experiment_name: Kubeflow Plugin Demo [${branch_name|local}]

@@ -66,7 +63,7 @@ run_config:
# is collapsed to one node.
#node_merge_strategy: none

# Optional volume specification (only for non vertex-ai)
# Optional volume specification
volume:

# Storage class - use null (or no value) to use the default storage
7 changes: 3 additions & 4 deletions docs/source/03_getting_started/01_quickstart.md
@@ -21,13 +21,14 @@ $ source venv-demo/bin/activate
Then, `kedro` must be present to enable cloning the starter project, along with the latest versions of the `kedro-kubeflow` and `kedro-docker` plugins (the latter is required to build Docker images with the Kedro pipeline nodes):

```
$ pip install 'kedro<0.17' kedro-kubeflow kedro-docker
$ pip install 'kedro<0.18' kedro-kubeflow kedro-docker
```

With the dependencies in place, let's create a new project:

```
$ kedro new --starter=git+https://github.com/getindata/kedro-starter-spaceflights.git --checkout allow_nodes_with_commas
$ kedro new --starter=spaceflights
Project Name:
=============
Please enter a human readable name for your new project.
@@ -53,8 +54,6 @@ Change directory to the project generated in /home/mario/kedro/kubeflow-plugin-d
A best-practice setup includes initialising git and creating a virtual environment before running `kedro install` to install project-specific dependencies. Refer to the Kedro documentation: https://kedro.readthedocs.io/
```

> TODO: switch to the official `spaceflights` starter after https://github.com/quantumblacklabs/kedro-starter-spaceflights/pull/10 is merged
Finally, go to the demo project directory and ensure that the `kedro-kubeflow` plugin is activated:

```console
60 changes: 3 additions & 57 deletions docs/source/03_getting_started/02_gcp.md
@@ -54,61 +54,7 @@ The above will work if you are connecting from within GCP VM or locally with sp
service account credentials. It will *NOT* work for credentials obtained with `gcloud auth
application-default login`.
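
For illustration only (this is not part of the plugin's API; the key path and client ID below are placeholders), a minimal sketch of minting an identity token from an explicit service account key file with the standard `google-auth` library, i.e. the kind of credentials the note above refers to:

```python
from google.auth.transport.requests import Request
from google.oauth2 import service_account

# Placeholders: point these at your own key file and OAuth client ID.
KEY_PATH = "/path/to/service-account.json"
CLIENT_ID = "123456789-abcdef.apps.googleusercontent.com"

# Build identity-token credentials from the service account key and refresh
# them to obtain a short-lived token usable against an IAP-protected endpoint.
credentials = service_account.IDTokenCredentials.from_service_account_file(
    KEY_PATH, target_audience=CLIENT_ID
)
credentials.refresh(Request())
print(credentials.token)
```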

### Using `kedro-kubeflow` with Vertex AI Pipelines (EXPERIMENTAL)

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines)
is a fully managed service that allows you to easily deploy
[Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/)
on a serverless Google service. [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines)
was still in Preview mode when this plugin version was released; therefore, the plugin's
capabilities are also limited.

##### 1. Preparing configuration

For the plugin to pick Vertex AI Pipelines as the target infrastructure, this has to be indicated
in the configuration. As the solution is serverless, no URL needs to be provided. Instead, a special set
of parameters has to be passed so that a connection is established with the proper GCP service.

```yaml
host: vertex-ai-pipelines
project_id: hosting-project
region: europe-west4
run_config:
root: vertex-ai-pipelines-accessible-gcs-bucket/pipelines-specific-path
```
If the pipeline requires access to services that are not exposed to the public internet, you need to configure [VPC peering between the Vertex internal network and the VPC that hosts the internal service](https://cloud.google.com/vertex-ai/docs/general/vpc-peering) and then set the VPC identifier in the configuration. Optionally, you can add custom host aliases:
```yaml
run_config:
vertex_ai_networking:
vpc: projects/12345/global/networks/name-of-vpc
host_aliases:
- ip: 10.10.10.10
hostnames: ['mlflow.internal']
- ip: 10.10.20.20
hostnames: ['featurestore.internal']
```
##### 2. Preparing environment variables
The following environment variables are required for the pipeline to run correctly:
* SERVICE_ACCOUNT - full email of the service account that the job will use to run the pipeline. The account
has to have access to the `run_config.root` path. The variable is optional; if not given, the project's compute account is used
* MLFLOW_TRACKING_TOKEN - identity token required if MLflow is used inside the project and MLflow access
is protected. The token is passed as-is to the Kedro nodes in order to authenticate against the MLflow service (see the sketch below).
It can be generated via the `gcloud auth print-identity-token` command.
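
For illustration only (this subsection is deprecated, and `mlflow` reading `MLFLOW_TRACKING_TOKEN` from the environment is an assumption of this sketch), a node-side view of how the injected token might be consumed:

```python
import os

import mlflow

# MLFLOW_TRACKING_TOKEN is injected into the node's environment by the
# pipeline as described above; mlflow can use it as a bearer token when
# the tracking server requires authentication.
assert "MLFLOW_TRACKING_TOKEN" in os.environ, "token was not provided"

mlflow.set_tracking_uri("https://mlflow.internal")  # placeholder URI
with mlflow.start_run():
    mlflow.log_metric("example_metric", 1.0)
```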

##### 3. Supported commands

The following commands are supported:

```bash
kedro kubeflow compile
kedro kubeflow run-once
kedro kubeflow schedule
kedro kubeflow list-pipelines
```

![Vertex_AI_Pipelines](vertex_ai_pipelines.png)
### Using `kedro-kubeflow` with Vertex AI Pipelines (DEPRECATED)

Vertex AI Pipelines support in `kedro-kubeflow` has been deprecated in favour of the
new plugin [kedro-vertexai](https://kedro-vertexai.readthedocs.io/en/latest/).
2 changes: 1 addition & 1 deletion kedro_kubeflow/__init__.py
@@ -1,3 +1,3 @@
"""kedro_kubeflow."""

version = "0.6.2"
version = "0.6.3"
2 changes: 2 additions & 0 deletions kedro_kubeflow/cli.py
@@ -173,6 +173,7 @@ def upload_pipeline(ctx, image, pipeline) -> None:
pipeline_name=pipeline,
image=image if image else config.image,
image_pull_policy=config.image_pull_policy,
env=ctx.obj["context_helper"].env,
)


@@ -236,6 +237,7 @@ def schedule(
cron_expression,
run_name=config.scheduled_run_name,
parameters=format_params(params),
env=ctx.obj["context_helper"].env,
)


26 changes: 1 addition & 25 deletions kedro_kubeflow/config.py
@@ -16,9 +16,6 @@
# on the same tag, or Never if you use only local images
image_pull_policy: IfNotPresent
# Location of Vertex AI GCS root, required only for vertex ai pipelines configuration
#root: bucket_name/gcs_suffix
# Name of the kubeflow experiment to be created
experiment_name: {project}
@@ -66,7 +63,7 @@
# is collapsed to one node.
#node_merge_strategy: none
# Optional volume specification (only for non vertex-ai)
# Optional volume specification
volume:
# Storage class - use null (or no value) to use the default storage
@@ -156,17 +153,6 @@ def __eq__(self, other):
return self._raw == other._raw


class VertexAiNetworkingConfig(Config):
@property
def vpc(self):
return self._get_or_default("vpc", None)

@property
def host_aliases(self):
aliases = self._get_or_default("host_aliases", [])
return {alias["ip"]: alias["hostnames"] for alias in aliases}


class VolumeConfig(Config):
@property
def storageclass(self):
@@ -299,12 +285,6 @@ def ttl(self):
def on_exit_pipeline(self):
return self._get_or_default("on_exit_pipeline", None)

@property
def vertex_ai_networking(self):
return VertexAiNetworkingConfig(
self._get_or_default("vertex_ai_networking", {})
)

@property
def node_merge_strategy(self):
strategy = str(self._get_or_default("node_merge_strategy", "none"))
@@ -341,10 +321,6 @@ def project_id(self):
def region(self):
return self._get_or_fail("region")

@property
def is_vertex_ai_pipelines(self):
return self.host == "vertex-ai-pipelines"

@staticmethod
def initialize_github_actions(project_name, where, templates_dir):
os.makedirs(where / ".github/workflows", exist_ok=True)
23 changes: 10 additions & 13 deletions kedro_kubeflow/context_helper.py
@@ -53,6 +53,10 @@ def session(self):

return KedroSession.create(self._metadata.package_name, env=self._env)

@property
def env(self):
return self._env

@property
def context(self):
return self.session.load_context()
@@ -68,20 +72,13 @@ def config(self) -> PluginConfig:
@property
@lru_cache()
def kfp_client(self):
if self.config.is_vertex_ai_pipelines:
from .vertex_ai.client import VertexAIPipelinesClient
from .kfpclient import KubeflowClient

return VertexAIPipelinesClient(
self.config, self.project_name, self.context
)
else:
from .kfpclient import KubeflowClient

return KubeflowClient(
self.config,
self.project_name,
self.context,
)
return KubeflowClient(
self.config,
self.project_name,
self.context,
)

@staticmethod
def init(metadata, env):
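
As a side note, a minimal self-contained sketch (simplified from the hunks above, not the plugin's actual module) of the two changes in this file: the new `env` property that the CLI commands forward to the client, and the `kfp_client` property that now always builds a `KubeflowClient`, memoized via `lru_cache`:

```python
from functools import lru_cache


class ContextHelperSketch:
    """Illustrative stand-in for kedro_kubeflow.context_helper.ContextHelper."""

    def __init__(self, config, project_name, context, env):
        self.config = config
        self.project_name = project_name
        self.context = context
        self._env = env

    @property
    def env(self):
        # New in 0.6.3: exposes the Kedro environment (e.g. "local") so the
        # CLI can pass it to KubeflowClient.upload() and .schedule().
        return self._env

    @property
    @lru_cache()
    def kfp_client(self):
        # The Vertex AI branch was removed in this release; only the Kubeflow
        # client remains, and lru_cache keeps a single instance per helper.
        from kedro_kubeflow.kfpclient import KubeflowClient

        return KubeflowClient(self.config, self.project_name, self.context)
```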
13 changes: 7 additions & 6 deletions kedro_kubeflow/kfpclient.py
@@ -94,15 +94,15 @@ def compile(
)
self.log.info("Generated pipeline definition was saved to %s" % output)

def get_full_pipeline_name(self, pipeline_name):
return f"[{self.project_name}] {pipeline_name}"
def get_full_pipeline_name(self, pipeline_name, env):
return f"[{self.project_name}] {pipeline_name} (env: {env})"[:100]

def upload(self, pipeline_name, image, image_pull_policy="IfNotPresent"):
def upload(self, pipeline_name, image, image_pull_policy, env):
pipeline = self.generator.generate_pipeline(
pipeline_name, image, image_pull_policy
)

full_pipeline_name = self.get_full_pipeline_name(pipeline_name)
full_pipeline_name = self.get_full_pipeline_name(pipeline_name, env)
if self._pipeline_exists(full_pipeline_name):
pipeline_id = self.client.get_pipeline_id(full_pipeline_name)
version_id = self._upload_pipeline_version(pipeline, pipeline_id)
@@ -169,13 +169,14 @@ def schedule(
experiment_namespace,
cron_expression,
run_name,
parameters={},
parameters,
env,
):
experiment_id = self._ensure_experiment_exists(
experiment_name, experiment_namespace
)
pipeline_id = self.client.get_pipeline_id(
self.get_full_pipeline_name(pipeline)
self.get_full_pipeline_name(pipeline, env)
)
formatted_run_name = run_name.format(**parameters)
self._disable_runs(experiment_id, formatted_run_name)
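
To make the naming change above concrete, a standalone sketch (not the plugin's actual method; the project and pipeline names in the example are hypothetical) of what `get_full_pipeline_name` now produces: the Kedro environment is appended to the pipeline name and the result is truncated to 100 characters.

```python
def get_full_pipeline_name(project_name: str, pipeline_name: str, env: str) -> str:
    # Mirrors the logic added in kfpclient.py above: include the Kedro
    # environment in the name and cap the whole string at 100 characters.
    return f"[{project_name}] {pipeline_name} (env: {env})"[:100]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(get_full_pipeline_name("kubeflow-plugin-demo", "__default__", "local"))
    # [kubeflow-plugin-demo] __default__ (env: local)
```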
1 change: 0 additions & 1 deletion kedro_kubeflow/vertex_ai/__init__.py

This file was deleted.

