Merge branch 'release/0.1.8'
raethlein committed Nov 13, 2019
2 parents b3a169a + e34d931 commit ea91cf9
Showing 14 changed files with 978 additions and 92 deletions.
9 changes: 7 additions & 2 deletions Dockerfile
@@ -93,7 +93,7 @@ COPY resources/mlhubspawner /mlhubspawner

RUN \
pip install --no-cache git+https://github.com/jupyterhub/dockerspawner@d1f27e2855d2cefbdb25b29cc069b9ca69d564e3 && \
pip install --no-cache git+https://github.com/ml-tooling/nativeauthenticator@983b203069ca797ff5c595f985075c11ae17656c && \
pip install --no-cache git+https://github.com/ml-tooling/nativeauthenticator@9859a69dcc9d2ae8d827f192a1580d86f897e9f1 && \
pip install --no-cache git+https://github.com/ryanlovett/imagespawner && \
pip install --no-cache /mlhubspawner && \
rm -r /mlhubspawner && \
@@ -133,6 +133,10 @@ COPY resources/logo.png /usr/local/share/jupyterhub/static/images/jupyter.png
COPY resources/jupyterhub_config.py $_RESOURCES_PATH/jupyterhub_config.py
COPY resources/jupyterhub-mod/template-home.html /usr/local/share/jupyterhub/templates/home.html
COPY resources/jupyterhub-mod/template-admin.html /usr/local/share/jupyterhub/templates/admin.html
COPY resources/jupyterhub-mod/ssh-dialog-snippet.html /usr/local/share/jupyterhub/templates/ssh-dialog-snippet.html
COPY resources/jupyterhub-mod/info-dialog-snippet.html /usr/local/share/jupyterhub/templates/info-dialog-snippet.html
COPY resources/jupyterhub-mod/jsonpresenter /usr/local/share/jupyterhub/static/components/jsonpresenter/
COPY resources/jupyterhub-mod/cleanup-service.py /resources/cleanup-service.py

RUN \
touch $_RESOURCES_PATH/jupyterhub_user_config.py && \
@@ -154,7 +158,8 @@ ENV \
START_JHUB=true \
START_CHP=false \
EXECUTION_MODE="local" \
HUB_NAME="mlhub"
HUB_NAME="mlhub" \
CLEANUP_INTERVAL_SECONDS=3600

### END CONFIGURATION ###

31 changes: 29 additions & 2 deletions README.md
@@ -25,14 +25,15 @@
<a href="#contribution">Contribution</a>
</p>

MLHub is based on [Jupyterhub](https://github.com/jupyterhub/jupyterhub). MLHub allows to create and manage multiple [workspaces](https://github.com/ml-tooling/ml-workspace), for example to distribute them to a group of people or within a team.
MLHub is based on [JupyterHub](https://github.com/jupyterhub/jupyterhub) with a complete focus on Docker and Kubernetes. MLHub allows you to create and manage multiple [workspaces](https://github.com/ml-tooling/ml-workspace), for example to distribute them to a group of people or within a team.

## Highlights

- 💫 Create, manage, and access Jupyter notebooks. Use it as an admin to distribute workspaces to other users, use it in self-service mode, or both.
- 🖊️ Set configuration parameters such as CPU-limits for started workspaces.
- 🖥 Access additional tools within the started workspaces by having secured routes.
- 🎛 Tunnel SSH connections to workspace containers.
- 🐳 Focused on Docker and Kubernetes with enhanced functionality.

## Getting Started

@@ -61,7 +62,10 @@ For Kubernetes deployment, we forked and modified [zero-to-jupyterhub-k8s](https

### Configuration

In the default config, a user named `admin` can register and access the hub. If you use a different authenticator, you might want to set a different user as initial admin user as well.
#### Default Login

When using the default config, i.e. leaving the JupyterHub config `c.Authenticator.admin_users` as it is, a user named `admin` can access the hub with admin rights. If you use the default `NativeAuthenticator` as authenticator, you must first register the user `admin` with a password of your choice before logging in.
If you use a different authenticator, you might want to set a different user as the initial admin user as well. For example, when using OAuth, set `c.Authenticator.admin_users` to a username returned by the OAuth login.
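
As a rough illustration of the last point, a user config could set the initial admin as shown here. This is only a sketch: the username `my-oauth-username` is a placeholder, and it assumes the file is loaded like a regular JupyterHub config file, where the `c` config object is already available.

```python
# /resources/jupyterhub_user_config.py (sketch, not a complete config)
# Assumption: `c` is the config object JupyterHub provides when loading a config file.
# "my-oauth-username" is a placeholder for the name your OAuth provider returns.
c.Authenticator.admin_users = {"my-oauth-username"}
```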

#### Environment Variables

@@ -81,6 +85,18 @@ Here are the additional environment variables for the hub:
</td>
<td>mlhub</td>
</tr>
<tr>
<td>EXECUTION_MODE</td>
<td>Defines the execution mode the hub is running in. Value is one of [local | k8s]</td>
<td>local</td>
</tr>
<tr>
<td>CLEANUP_INTERVAL_SECONDS</td>
<td>
Interval in seconds at which expired and unused resources are deleted. Set to -1 to disable the automatic cleanup. For more information, see Section <a href="https://github.com/ml-tooling/ml-hub#cleanup-service">Cleanup Service</a>.
</td>
<td>3600</td>
</tr>
<tr>
<td>SSL_ENABLED</td>
<td>Enable SSL. If you don't provide an ssl certificate as described in <a href="https://github.com/ml-tooling/ml-hub#enable-sslhttps">Section "Enable SSL/HTTPS"</a>, certificates will be generated automatically. As this auto-generated certificate is not signed, you have to trust it in the browser. Without ssl enabled, ssh access won't work as the container uses a single port and has to tell https and ssh traffic apart.</td>
@@ -114,6 +130,8 @@ Here are the additional environment variables for the hub:
#### Jupyterhub Config

##### Docker-local

Jupyterhub itself is configured via a `config.py` file. In case of MLHub, a default config file is stored under `/resources/jupyterhub_config.py`. If you want to override settings or set extra ones, you can put another config file under `/resources/jupyterhub_user_config.py`. The following settings should probably not be overridden:
- `c.Spawner.environment` - we set default variables there. Instead of overriding it, you can add extra variables to the existing dict, e.g. via `c.Spawner.environment["myvar"] = "myvalue"`.
- `c.DockerSpawner.prefix` and `c.DockerSpawner.name_template` - if you change those, check whether your SSH environment variables permit those names as a target. Also, think about setting `c.Authenticator.username_pattern` to prevent a user from having a username that is also a valid container name.
@@ -199,6 +217,15 @@ The "Days to live" flag is purely informational currently and can be seen in the

<img width=100% alt="Picture of admin panel" src="https://github.com/ml-tooling/ml-hub/raw/master/docs/images/create-workspace-options.png">

### Cleanup Service

JupyterHub was not originally designed with Docker or Kubernetes in mind, which can lead to undesirable situations, for example containers that are stopped but not deleted on the host. Furthermore, our custom spawners might create artifacts that should be cleaned up as well. MLHub contains a cleanup service that is started as a [JupyterHub service](https://jupyterhub.readthedocs.io/en/stable/reference/services.html) inside the hub container. An admin can call it as a REST API, and unless it is disabled, it is also triggered automatically every `CLEANUP_INTERVAL_SECONDS` seconds (3600 by default, -1 disables it). The service extends JupyterHub's functionality for Docker and Kubernetes. The term "containers" is used here for both Docker containers and Kubernetes pods.
The service has two endpoints, which can be reached under the hub service URL `/services/cleanup-service/*` with admin permissions.

- `GET /services/cleanup-service/users`: This endpoint currently only has an effect in Docker-local mode. There, it checks for resources of deleted users, i.e. users who are no longer in the JupyterHub database, and deletes them. This includes containers, networks, and volumes. The resources are found by looking for labeled Docker resources that were started by the hub and belong to the respective users.

- `GET /services/cleanup-service/expired`: When starting a named workspace, an expiration date can be assigned to it. This endpoint deletes all containers that have expired. The respective named server is deleted from the JupyterHub database, and the Docker/Kubernetes resource is deleted as well.
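
A call to one of these endpoints could look roughly like the following sketch. It assumes that the service accepts the JupyterHub API token of an admin user in the `Authorization` header, which is the usual mechanism for Hub-authenticated services. The hub URL and token are placeholders, and `build_cleanup_request` is a hypothetical helper written for this example, not part of MLHub.

```python
import urllib.request


def build_cleanup_request(hub_url: str, endpoint: str, api_token: str) -> urllib.request.Request:
    """Build a GET request for a cleanup-service endpoint ("users" or "expired")."""
    if endpoint not in ("users", "expired"):
        raise ValueError("Unknown cleanup endpoint: " + endpoint)

    # Avoid a double slash if the hub URL already ends with "/"
    url = "{}/services/cleanup-service/{}".format(hub_url.rstrip("/"), endpoint)
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": "token " + api_token},
    )


# Placeholder values; replace them with your hub address and an admin's API token
request = build_cleanup_request("https://mlhub.example.com", "expired", "<admin-api-token>")
print(request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the URL and header handling easy to check without a running hub.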

## Contribution

- Pull requests are encouraged and always welcome. Read [`CONTRIBUTING.md`](https://github.com/ml-tooling/ml-hub/tree/master/CONTRIBUTING.md) and check out [help-wanted](https://github.com/ml-tooling/ml-hub/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3A"help+wanted"+sort%3Areactions-%2B1-desc+) issues.
266 changes: 266 additions & 0 deletions resources/jupyterhub-mod/cleanup-service.py
@@ -0,0 +1,266 @@
"""
Web service that is supposed to be started via JupyterHub.
By this, the service has access to some information passed
by JupyterHub. For more information check out https://jupyterhub.readthedocs.io/en/stable/reference/services.html
Note: Logs probably don't appear in stdout, as the service is started as a subprocess by JupyterHub
"""

import os
import urllib3
import json
import time
import math
from threading import Thread
import logging

from tornado import web, ioloop
from jupyterhub.services.auth import HubAuthenticated

import docker.errors
from kubernetes import client, config, stream

from mlhubspawner import utils

# Environment variables passed by JupyterHub to the service
prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/')
service_url = os.getenv('JUPYTERHUB_SERVICE_URL')
jupyterhub_api_url = os.getenv('JUPYTERHUB_API_URL')
jupyterhub_api_token = os.getenv('JUPYTERHUB_API_TOKEN')

auth_header = {"Authorization": "token " + jupyterhub_api_token}

execution_mode = os.environ[utils.ENV_NAME_EXECUTION_MODE]

http = urllib3.PoolManager()

if execution_mode == utils.EXECUTION_MODE_LOCAL:
    docker_client_kwargs = json.loads(os.getenv("DOCKER_CLIENT_KWARGS"))
    docker_tls_kwargs = json.loads(os.getenv("DOCKER_TLS_CONFIG"))
    docker_client = utils.init_docker_client(docker_client_kwargs, docker_tls_kwargs)
elif execution_mode == utils.EXECUTION_MODE_KUBERNETES:
    # in-cluster config is the config given by a service account and its role permissions
    config.load_incluster_config()
    kubernetes_client = client.CoreV1Api()

hub_name = utils.ENV_HUB_NAME
origin_label = "{}={}".format(utils.LABEL_MLHUB_ORIGIN, hub_name)
origin_label_filter = {"label": origin_label}

class UnifiedContainer():

    def __init__(self, resource):
        self.remove = lambda: logging.info("Remove property is not defined")
        self.resource = resource

    def with_id(self, id):
        self.id = id
        return self

    def with_name(self, name):
        self.name = name
        return self

    def with_labels(self, labels):
        self.labels = labels
        return self

    def with_remove(self, func):
        self.remove = lambda: func(self.resource)
        return self

def extract_container(resource):
    # Stays None if the execution mode matches neither branch below
    unified_container = None
    if execution_mode == utils.EXECUTION_MODE_LOCAL:
        unified_container = UnifiedContainer(resource) \
            .with_id(resource.id) \
            .with_name(resource.name) \
            .with_labels(resource.labels) \
            .with_remove(lambda container: container.remove(v=True, force=True))
    elif execution_mode == utils.EXECUTION_MODE_KUBERNETES:
        # No remove function is registered for pods, so remove() only logs a message
        unified_container = UnifiedContainer(resource) \
            .with_id(resource.metadata.uid) \
            .with_name(resource.metadata.name) \
            .with_labels(resource.metadata.labels)

    if unified_container is None:
        raise UserWarning("The execution mode environment variable is not set correctly")

    return unified_container

def get_hub_docker_resources(docker_client_obj):
    return docker_client_obj.list(filters=origin_label_filter)

def get_hub_kubernetes_resources(namespaced_list_command, **kwargs):
    return namespaced_list_command(hub_name, **kwargs).items

def get_hub_containers():
    if execution_mode == utils.EXECUTION_MODE_LOCAL:
        hub_containers = get_hub_docker_resources(docker_client.containers)
    elif execution_mode == utils.EXECUTION_MODE_KUBERNETES:
        hub_containers = get_hub_kubernetes_resources(kubernetes_client.list_namespaced_pod, field_selector="status.phase=Running", label_selector=origin_label)

    return hub_containers

def remove_deleted_user_resources(existing_user_names: list):
    """Remove resources for which no user exists anymore by checking whether the user name label occurs in the
    list of existing users.
    Args:
        existing_user_names: list of user names that exist in the JupyterHub database
    Raises:
        UserWarning: in Kubernetes mode, the function does not work
    """

    if execution_mode == utils.EXECUTION_MODE_KUBERNETES:
        raise UserWarning("This method cannot be used in the following hub execution mode: " + execution_mode)

    def try_to_remove(remove_callback, resource) -> bool:
        """Call the remove callback until the call succeeds or until the number of tries is exceeded.
        Returns:
            bool: True if it could be removed, False if it was not removable within the number of tries
        """

        for i in range(3):
            try:
                remove_callback()
                return True
            except docker.errors.APIError:
                time.sleep(3)

        logging.info("Could not remove " + resource.name)
        return False


    def find_and_remove(docker_client_obj, get_labels, action_callback) -> None:
        """List all resources belonging to `docker_client_obj` which were created by MLHub.
        Then check the list of resources for resources that belong to a user who does not exist anymore
        and call the action callback on them.
        Args:
            docker_client_obj: A Python docker client object, such as docker_client.containers, docker_client.networks,... It must implement a .list() function (check https://docker-py.readthedocs.io/en/stable/containers.html)
            get_labels (func): function to call on the docker resource to get the labels
            action_callback (func): function to call on the docker resource to remove it
        """

        resources = get_hub_docker_resources(docker_client_obj)
        for resource in resources:
            user_label = get_labels(resource)[utils.LABEL_MLHUB_USER]
            if user_label not in existing_user_names:
                action_callback(resource)

    def container_action(container):
        try_to_remove(
            lambda: container.remove(v=True, force=True),
            container
        )

    find_and_remove(
        docker_client.containers,
        lambda res: res.labels,
        container_action
    )

    def network_action(network):
        # Disconnect the hub from the network first; ignore the error if it is not connected
        try:
            network.disconnect(hub_name)
        except docker.errors.APIError:
            pass

        try_to_remove(network.remove, network)

    find_and_remove(
        docker_client.networks,
        lambda res: res.attrs["Labels"],
        network_action
    )

    find_and_remove(
        docker_client.volumes,
        lambda res: res.attrs["Labels"],
        lambda res: try_to_remove(res.remove, res)
    )

def get_hub_usernames() -> list:
    r = http.request('GET', jupyterhub_api_url + "/users",
        headers = {**auth_header}
    )

    data = json.loads(r.data.decode("utf-8"))
    existing_user_names = []
    for user in data:
        existing_user_names.append(user["name"])

    return existing_user_names

def remove_expired_workspaces():
    hub_containers = get_hub_containers()
    for container in hub_containers:
        unified_container = extract_container(container)
        lifetime_timestamp = utils.get_lifetime_timestamp(unified_container.labels)
        if lifetime_timestamp != 0:
            difference = math.ceil(lifetime_timestamp - time.time())
            # container lifetime is exceeded (remaining lifetime is negative)
            if difference < 0:
                user_name = unified_container.labels[utils.LABEL_MLHUB_USER]
                server_name = unified_container.labels[utils.LABEL_MLHUB_SERVER_NAME]
                url = jupyterhub_api_url + "/users/{user_name}/servers/{server_name}".format(user_name=user_name, server_name=server_name)
                r = http.request('DELETE', url,
                    body = json.dumps({"remove": True}).encode('utf-8'),
                    headers = {**auth_header}
                )

                if r.status == 202 or r.status == 204:
                    logging.info("Delete expired container " + unified_container.name)
                    unified_container.remove()

class CleanupUserResources(HubAuthenticated, web.RequestHandler):

    @web.authenticated
    def get(self):
        current_user = self.get_current_user()
        if current_user["admin"] is False:
            self.set_status(401)
            self.finish()
            return

        try:
            remove_deleted_user_resources(get_hub_usernames())
        except UserWarning as e:
            self.finish(str(e))

class CleanupExpiredContainers(HubAuthenticated, web.RequestHandler):

    @web.authenticated
    def get(self):
        current_user = self.get_current_user()
        if current_user["admin"] is False:
            self.set_status(401)
            self.finish()
            return

        remove_expired_workspaces()

app = web.Application([
    (r"{}users".format(prefix), CleanupUserResources),
    (r"{}expired".format(prefix), CleanupExpiredContainers)
])

service_port = int(service_url.split(":")[-1])
app.listen(service_port)

def internal_service_caller():
    clean_interval_seconds = int(os.getenv(utils.ENV_NAME_CLEANUP_INTERVAL_SECONDS))
    # An interval of -1 disables the automatic cleanup
    while clean_interval_seconds != -1:
        time.sleep(clean_interval_seconds)
        try:
            remove_deleted_user_resources(get_hub_usernames())
        except UserWarning:
            pass
        remove_expired_workspaces()

Thread(target=internal_service_caller).start()

ioloop.IOLoop.current().start()
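
The expiration check in `remove_expired_workspaces` can be illustrated in isolation with the following sketch. `is_expired` is a hypothetical helper written for this example, not a function of the service. It mirrors the comparison used there, where a lifetime timestamp of `0` means that no expiration date is set.

```python
import math
import time
from typing import Optional


def is_expired(lifetime_timestamp: float, now: Optional[float] = None) -> bool:
    """Return True if the remaining lifetime, rounded up to whole seconds, is negative."""
    if lifetime_timestamp == 0:
        # 0 is treated as "no expiration date assigned"
        return False
    if now is None:
        now = time.time()
    remaining_seconds = math.ceil(lifetime_timestamp - now)
    return remaining_seconds < 0


print(is_expired(0, now=1000.0))       # False: no expiration date
print(is_expired(900.0, now=1000.0))   # True: expired 100 seconds ago
print(is_expired(1100.0, now=1000.0))  # False: 100 seconds remaining
```

Because the difference is rounded up before the comparison, a workspace that expired less than one second ago is not yet treated as expired and would only be picked up by a later cleanup run.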