Test kubernetes in CI #3482

Draft: wants to merge 100 commits into base: master
8e3d7cf
Added options to set annotations and a service account in the Kuberne…
shishichen Jun 7, 2024
45269ed
Correct punctuation in debug message. hack out tests that won't fail-…
benclifford Jun 7, 2024
7ceec7a
Fix a couple of docstrings
benclifford Jun 7, 2024
0cdece2
a bit of name sanitization for default pod names
benclifford Jun 7, 2024
87d3454
fiddle with markings to deal with no shared fs and no staging
benclifford Jun 8, 2024
954bad7
add config file i've been using
benclifford Jun 10, 2024
d3e3828
Merge remote-tracking branch 'shishichen/add-k8s-pod-options'
benclifford Jun 10, 2024
17a00dd
add the dockerfile i've been using
benclifford Jun 10, 2024
62e0e36
beginning of kubernetes-in-CI
benclifford Jun 10, 2024
9086e19
push docker image? upgrade ubuntu
benclifford Jun 10, 2024
26869e6
fiddle with default name
benclifford Jun 10, 2024
16c0a49
Add kubernetes, needed for submitting from inside a cluster
benclifford Jun 10, 2024
b03615a
Add more bits for running everything in a kubernetes cluster
benclifford Jun 10, 2024
e122d19
fix syntax error in github workflow definition
benclifford Jun 10, 2024
ee14f6e
Tighten timeout, add some debugging info at the end
benclifford Jun 10, 2024
120cf78
Correct pod name from my test
benclifford Jun 10, 2024
5c55fe6
try to stop Job from recreating pod on failure, but instead abort fast
benclifford Jun 10, 2024
b56dfd9
Randomise test order to see if a test failure is specific to a partic…
benclifford Jun 10, 2024
dd0f66c
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 10, 2024
f8f5a27
Add some memory logging
benclifford Jun 10, 2024
f4a7300
Allocate more memory to workers
benclifford Jun 10, 2024
ffdb021
Add a staging_required marker that apparently wasn't breaking things …
benclifford Jun 10, 2024
21711e9
messing with backoff limits and restart policy
benclifford Jun 10, 2024
fb1733e
remove apparently invalid restart policy
benclifford Jun 10, 2024
73b3e1d
Flush out some more staging_required tests (by setting storage_access…
benclifford Jun 10, 2024
31bc958
Switch the Kubernetes client call to read_namespaced_pod_status() to …
shishichen Jun 12, 2024
8b39024
Fixed Kubernetes worker container launch command to remove trailing s…
shishichen Jun 13, 2024
4538763
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jun 14, 2024
5f43aeb
Merge remote-tracking branch 'shishichen/fix-k8s-launch-cmd' into ben…
benclifford Jun 14, 2024
299de99
Merge remote-tracking branch 'shishichen/swap-k8s-pod-status' into be…
benclifford Jun 14, 2024
68e3a5d
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 18, 2024
fe3c55e
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 24, 2024
064b833
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jul 2, 2024
9c6a04e
Merge remote-tracking branch 'origin/benc-k8s-kind-ci' into benc-k8s-…
benclifford Jul 2, 2024
ba5f047
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jul 2, 2024
69fbf03
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jul 7, 2024
75b7c02
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jul 31, 2024
780fbb0
Merge remote-tracking branch 'origin/benc-k8s-kind-ci' into benc-k8s-…
benclifford Jul 31, 2024
2324744
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 5, 2024
b75a3ae
function data in temp
colinthomas-z80 Aug 19, 2024
2c18d6c
use getpass for username
colinthomas-z80 Aug 19, 2024
c201ec1
use tempfile module
colinthomas-z80 Aug 20, 2024
9f6b037
flake etc
colinthomas-z80 Aug 20, 2024
5ec7cdb
Merge branch 'master' into tmp_function_data
benclifford Aug 21, 2024
c3f6d45
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 22, 2024
edf870f
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 26, 2024
811b8e5
Merge branch 'master' into benc-k8s-kind-ci
benclifford Sep 3, 2024
7347f64
Merge remote-tracking branch 'refs/remotes/origin/master' into benc-k…
benclifford Sep 4, 2024
cd7229f
Merge branch 'master' into tmp_function_data
benclifford Sep 4, 2024
08f8ce9
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Sep 5, 2024
5967f01
Merge remote-tracking branch 'refs/remotes/origin/benc-k8s-kind-ci' i…
benclifford Sep 5, 2024
4938dbf
Build cctools and run a probably-broken taskvine vs kubernetes test c…
benclifford Sep 5, 2024
98d7693
fix repr in taskvine
benclifford Sep 5, 2024
dfc94a8
install cloudpickle explicitly for taskvine
benclifford Sep 5, 2024
47378f3
Add more time onto job timeout, because more is happening in job with…
benclifford Sep 5, 2024
43af8ef
revert to 180s test time
benclifford Sep 5, 2024
6a32f0f
Log more to the console, kubernetes style
benclifford Sep 5, 2024
d4fab6a
Note a (documentation?) bug in taskvine address selection
benclifford Sep 5, 2024
21dcae6
force hostname based address config, in line with comment in previous…
benclifford Sep 5, 2024
e1cce03
now we're starting taskvine test successfully, give it time to complete
benclifford Sep 5, 2024
4d4b4ba
Make taskvine shutdown scale-in more like htex shutdown scale-in
benclifford Sep 5, 2024
2e42e5c
enable staging_required tests in taskvine, because taskvine might be …
benclifford Sep 5, 2024
3ba7e12
Output timestamps in kubernetes log to help diagnose hangs
benclifford Sep 5, 2024
084d797
failed to get non-staging tests working, made a note in comments
benclifford Sep 5, 2024
f34f2b8
correct duplicated 'and' in pytest -k option
benclifford Sep 5, 2024
60a8611
Merge remote-tracking branch 'colinthomas-z80/tmp_function_data' into…
benclifford Sep 6, 2024
1f09e5c
Add utils to sanitize strings for DNS compliance
rjmello Oct 14, 2024
83278c2
Ensure k8s pod names/labels are RFC 1123 compliant
rjmello Oct 15, 2024
86ade32
Use hex value for k8s job ID instead of pod name
rjmello Oct 15, 2024
0c4d541
Add tests for KubernetesProvider submit
rjmello Oct 17, 2024
08693ab
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 21, 2024
415f780
Merge remote-tracking branch 'origin/rjmello-kube-pod-names' into ben…
benclifford Oct 21, 2024
c78defa
Fix some bad merge
benclifford Oct 21, 2024
54ea143
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 21, 2024
535289f
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 31, 2024
fd26ddd
Merge branch 'master' into benc-k8s-kind-ci
benclifford Nov 1, 2024
419ba64
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jan 6, 2025
61e65fa
remove spurious whitespace add
benclifford Jan 6, 2025
0d089db
Merge remote-tracking branch 'refs/remotes/origin/benc-k8s-kind-ci' i…
benclifford Jan 6, 2025
803ccc7
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jan 6, 2025
f418ebf
Merge remote-tracking branch 'refs/remotes/origin/benc-k8s-kind-ci' i…
benclifford Jan 6, 2025
b688f3b
remove channels
benclifford Jan 6, 2025
bb76cac
Remove repr fix that isn't needed
benclifford Jan 6, 2025
0f0d5b9
Remove spurious paste
benclifford Jan 6, 2025
8966056
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jan 7, 2025
2dc45ab
remove commented out override that has been fixed elsewhere
benclifford Jan 7, 2025
78b7f2d
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jan 15, 2025
a9d147e
move to bookworm (debian stable) with a single python version
benclifford Jan 15, 2025
17d5115
restore partial python version
benclifford Jan 15, 2025
722a828
fix swig install
benclifford Jan 15, 2025
eebdedc
restore python version in path
benclifford Jan 15, 2025
4a497f6
move some files around
benclifford Jan 15, 2025
4cae154
move taskvine config
benclifford Jan 15, 2025
d98c7d8
move htex config
benclifford Jan 15, 2025
8fb5f30
making dockerfile build faster, hopefully no significant change in wh…
benclifford Jan 15, 2025
f74037b
move swig install alongside other apt. fix git clone cli
benclifford Jan 15, 2025
5854b9e
isort
benclifford Jan 15, 2025
bd076fe
add mandatory init
benclifford Jan 15, 2025
230cee0
fix linting
benclifford Jan 15, 2025
88c5f17
remove a note to self about taskvine vs non-shared fs
benclifford Jan 15, 2025
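Commits 1f09e5c and 83278c2 mention sanitizing strings so that pod names and labels are RFC 1123 compliant, but the helper itself is not part of the diff shown here. The following is only a minimal sketch of what such sanitization could look like, not the code from those commits: the function name `sanitize_dns_label` and its `max_length` parameter are hypothetical, and it targets the DNS label form (lowercase alphanumerics and hyphens, alphanumeric at both ends, at most 63 characters).

```python
import re


def sanitize_dns_label(raw: str, max_length: int = 63) -> str:
    """Hypothetical helper: coerce a string into an RFC 1123 DNS label."""
    # Lowercase, then replace every run of disallowed characters with one hyphen.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw.lower())
    # Truncate to the length limit, then strip hyphens from both ends,
    # because a label must start and end with an alphanumeric character.
    cleaned = cleaned[:max_length].strip("-")
    if not cleaned:
        raise ValueError(f"cannot derive a DNS label from {raw!r}")
    return cleaned


if __name__ == "__main__":
    print(sanitize_dns_label("Parsl_Site.Block 0"))
```

Raising on an empty result, rather than silently substituting a default name, is a design choice made for this sketch only; a provider might prefer to fall back to a generated identifier.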
54 changes: 54 additions & 0 deletions .github/workflows/ci-k8s.yaml
@@ -0,0 +1,54 @@
name: Parsl

on:
  pull_request:
    types:
      - opened
      - synchronize

jobs:
  k8s-kind-suite:
    runs-on: ubuntu-24.04
    timeout-minutes: 60

    steps:
    - uses: actions/checkout@master

    - name: Create k8s Kind Cluster
      uses: helm/kind-action@v1
      with:
        # kind tooling uses this name by default, but kind-action uses
        # a different default name
        cluster_name: kind

    - name: Build docker image
      uses: docker/build-push-action@v5
      with:
        context: .
        file: parsl/tests/ci_k8s/Dockerfile
        tags: parsl:ci

    - name: Push docker image into kubernetes cluster
      run: |
        kind load docker-image parsl:ci

    - name: set liberal permissions
      run: |
        kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts

    - name: launch pytest Job
      run: |
        free -h
        kubectl create -f parsl/tests/ci_k8s/pytest-task.yaml

    - name: wait for pytest Job
      run: |
        kubectl wait --timeout=600s --for=condition=Complete Job pytest

    - name: report some info
      if: ${{ always() }}
      run: |
        free -h
        kubectl describe pods
        kubectl describe jobs
        kubectl logs --timestamps Job/pytest
23 changes: 23 additions & 0 deletions parsl/tests/ci_k8s/Dockerfile
@@ -0,0 +1,23 @@
FROM debian:bookworm

RUN apt-get update && apt-get upgrade -y

# git is needed for parsl to figure out its own repo-specific
# version string, from the github-checked-out repo

# gcc and similar are needed for building taskvine

RUN apt-get update && apt-get install -y git procps python3 python3-dev python3-venv gcc build-essential make pkg-config mpich swig

RUN python3 -m venv /venv

ADD . /parsl
WORKDIR /
RUN git clone https://github.com/cooperative-computing-lab/cctools --depth 1 -b release/7.8.0

WORKDIR /cctools
RUN . /venv/bin/activate && ./configure --prefix=/ && make && make install

WORKDIR /parsl
RUN . /venv/bin/activate && pip3 install '.[kubernetes]' cloudpickle -r test-requirements.txt

Empty file added parsl/tests/ci_k8s/__init__.py
Empty file.
24 changes: 24 additions & 0 deletions parsl/tests/ci_k8s/htex_k8s_kind.py
@@ -0,0 +1,24 @@
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import KubernetesProvider


def fresh_config():
    return Config(
        executors=[
            HighThroughputExecutor(
                label="executorname",
                storage_access=[],
                worker_debug=True,
                cores_per_worker=1,
                encrypted=False,  # needs certificate fs to be mounted in same place...
                provider=KubernetesProvider(worker_init=". /venv/bin/activate",
                                            image="parsl:ci",
                                            max_mem="2048Gi"
                                            # was getting OOM-killing of workers with default... this helps
                                            ),
            )
        ],
        strategy='none',
    )
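For context on `max_mem="2048Gi"` above: Kubernetes binary suffixes are powers of 1024, so this value is a limit of 2 TiB per worker pod, which is effectively unbounded on a CI runner. If something closer to 2 GiB was intended, `2048Mi` would be the spelling; the diff does not say which was meant. The sketch below just makes the arithmetic concrete. The helper name `quantity_to_bytes` is hypothetical, and it handles only integer quantities with the `Ki`/`Mi`/`Gi`/`Ti` suffixes, not the full Kubernetes quantity grammar.

```python
# Binary (power-of-1024) suffixes used in Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Hypothetical helper: convert e.g. '2048Gi' to a byte count."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No recognised suffix: treat the value as a plain byte count.
    return int(quantity)


if __name__ == "__main__":
    for q in ("2048Gi", "2048Mi"):
        print(q, "=", quantity_to_bytes(q), "bytes")
```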
15 changes: 15 additions & 0 deletions parsl/tests/ci_k8s/pytest-task.yaml
@@ -0,0 +1,15 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: pytest
spec:
  activeDeadlineSeconds: 600
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pytest
        image: parsl:ci
        command: ["bash", "/parsl/parsl/tests/ci_k8s/runme.sh"]

7 changes: 7 additions & 0 deletions parsl/tests/ci_k8s/runme.sh
@@ -0,0 +1,7 @@
#!/bin/bash -e

source /venv/bin/activate

pytest parsl/tests/ --config parsl/tests/ci_k8s/htex_k8s_kind.py -k 'not issue3328 and not staging_required and not shared_fs' -x --random-order

PYTHONPATH=/usr/lib/python3.11/site-packages/ pytest parsl/tests/ --config parsl/tests/ci_k8s/taskvine_k8s_kind.py -k 'not issue3328 and not staging_required and not shared_fs' -x --random-order --log-cli-level=DEBUG
16 changes: 16 additions & 0 deletions parsl/tests/ci_k8s/taskvine_k8s_kind.py
@@ -0,0 +1,16 @@
from parsl.addresses import address_by_hostname
from parsl.config import Config
from parsl.executors.taskvine import TaskVineExecutor, TaskVineManagerConfig
from parsl.launchers import SimpleLauncher
from parsl.providers import KubernetesProvider


def fresh_config():
    return Config(executors=[TaskVineExecutor(manager_config=TaskVineManagerConfig(address=address_by_hostname(), port=9000),
                                              worker_launch_method='provider',
                                              provider=KubernetesProvider(worker_init=". /venv/bin/activate",
                                                                          image="parsl:ci",
                                                                          max_mem="2048Gi"
                                                                          ),

                                              )])