Merge remote-tracking branch 'origin/main' into fpetrini-bls-timeout
fpetrini15 committed Oct 26, 2023
2 parents 06224a8 + 3dfa18f commit 1e8f311
Showing 2 changed files with 24 additions and 1 deletion.
12 changes: 11 additions & 1 deletion deploy/k8s-onprem/README.md
@@ -234,6 +234,16 @@ EOF
$ helm install example -f config.yaml .
```

## Probe Configuration

The file `templates/deployment.yaml` configures the `livenessProbe`, `readinessProbe`, and `startupProbe` for the Triton server container.
By default, Triton loads all models before starting the HTTP server that responds to the probes. Loading can take several minutes, depending on the model sizes.
If it does not complete within `startupProbe.failureThreshold * startupProbe.periodSeconds` seconds, Kubernetes treats the container as failed and restarts it,
which can leave the pod in an endless restart loop, so set these values high enough for your use case.
The liveness and readiness probes are sent only after the startup probe first succeeds.

For more details, see the [Kubernetes probe documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) and the [feature page of the startup probe](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md).
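
As a rough illustration of sizing the startup window, the sketch below assumes a hypothetical deployment whose models take up to about 15 minutes to load; the `failureThreshold` of 90 is an assumed value chosen for that case, not a recommended default. The probe path and port name are the ones already used in `templates/deployment.yaml`.

```yaml
# Hypothetical sizing: assumes model loading takes up to ~15 minutes.
startupProbe:
  periodSeconds: 10
  # 90 * 10 = 900 sec = 15 min before Kubernetes restarts the container
  failureThreshold: 90
  httpGet:
    path: /v2/health/ready
    port: http
```

Raising `failureThreshold` rather than `periodSeconds` keeps the polling interval short, so a server that finishes loading early is still detected promptly.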

## Using Triton Inference Server

Now that the inference server is running, you can send HTTP or gRPC
@@ -316,4 +326,4 @@ CRDs](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-

```
$ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com
```
```
13 changes: 13 additions & 0 deletions deploy/k8s-onprem/templates/deployment.yaml
@@ -79,12 +79,25 @@ spec:
  - containerPort: 8002
    name: metrics
livenessProbe:
  initialDelaySeconds: 15
  failureThreshold: 3
  periodSeconds: 10
  httpGet:
    path: /v2/health/live
    port: http
readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  httpGet:
    path: /v2/health/ready
    port: http
startupProbe:
  # allows Triton to load the models during 30*10 = 300 sec = 5 min
  # starts checking the other probes only after the success of this one
  # for details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
  periodSeconds: 10
  failureThreshold: 30
  httpGet:
    path: /v2/health/ready
    port: http