feat(helm): update gpu-operator ( v24.6.1 → v24.9.1 ) #7912
Merged
| datasource | package | from | to |
| ---------- | ------------ | ------- | ------- |
| helm | gpu-operator | v24.6.1 | v24.9.1 |
```diff
--- HelmRelease: kube-system/nvidia-gpu-operator ClusterRole: kube-system/gpu-operator
+++ HelmRelease: kube-system/nvidia-gpu-operator ClusterRole: kube-system/gpu-operator
@@ -52,27 +52,12 @@
- update
- patch
- delete
- apiGroups:
- ''
resources:
- - events
- - pods
- - pods/eviction
- - services
- verbs:
- - create
- - get
- - list
- - watch
- - update
- - patch
- - delete
-- apiGroups:
- - ''
- resources:
- nodes
verbs:
- get
- list
- watch
- update
@@ -86,39 +71,33 @@
- list
- create
- watch
- update
- patch
- apiGroups:
+ - ''
+ resources:
+ - events
+ - pods
+ - pods/eviction
+ verbs:
+ - create
+ - get
+ - list
+ - watch
+ - update
+ - patch
+ - delete
+- apiGroups:
- apps
resources:
- daemonsets
verbs:
- get
- list
- watch
-- apiGroups:
- - apps
- resources:
- - controllerrevisions
- verbs:
- - get
- - list
- - watch
-- apiGroups:
- - monitoring.coreos.com
- resources:
- - servicemonitors
- - prometheusrules
- verbs:
- - get
- - list
- - create
- - watch
- - update
- - delete
- apiGroups:
- nvidia.com
resources:
- clusterpolicies
- clusterpolicies/finalizers
- clusterpolicies/status
@@ -141,24 +120,12 @@
verbs:
- get
- list
- watch
- create
- apiGroups:
- - coordination.k8s.io
- resources:
- - leases
- verbs:
- - get
- - list
- - watch
- - create
- - update
- - patch
- - delete
-- apiGroups:
- node.k8s.io
resources:
- runtimeclasses
verbs:
- get
- list
--- HelmRelease: kube-system/nvidia-gpu-operator Role: kube-system/gpu-operator
+++ HelmRelease: kube-system/nvidia-gpu-operator Role: kube-system/gpu-operator
@@ -22,12 +22,20 @@
- update
- patch
- delete
- apiGroups:
- apps
resources:
+ - controllerrevisions
+ verbs:
+ - get
+ - list
+ - watch
+- apiGroups:
+ - apps
+ resources:
- daemonsets
verbs:
- create
- get
- list
- watch
@@ -35,17 +43,46 @@
- patch
- delete
- apiGroups:
- ''
resources:
- configmaps
+ - endpoints
+ - pods
+ - pods/eviction
- secrets
+ - services
+ - services/finalizers
- serviceaccounts
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
+- apiGroups:
+ - coordination.k8s.io
+ resources:
+ - leases
+ verbs:
+ - get
+ - list
+ - watch
+ - create
+ - update
+ - patch
+ - delete
+- apiGroups:
+ - monitoring.coreos.com
+ resources:
+ - servicemonitors
+ - prometheusrules
+ verbs:
+ - get
+ - list
+ - create
+ - watch
+ - update
+ - delete
--- HelmRelease: kube-system/nvidia-gpu-operator Deployment: kube-system/gpu-operator
+++ HelmRelease: kube-system/nvidia-gpu-operator Deployment: kube-system/gpu-operator
@@ -28,13 +28,13 @@
openshift.io/scc: restricted-readonly
spec:
serviceAccountName: gpu-operator
priorityClassName: system-node-critical
containers:
- name: gpu-operator
- image: nvcr.io/nvidia/gpu-operator:v24.6.1
+ image: nvcr.io/nvidia/gpu-operator:v24.9.1
imagePullPolicy: IfNotPresent
command:
- gpu-operator
args:
- --leader-elect
- --zap-time-encoding=epoch
@@ -44,13 +44,13 @@
value: ''
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: DRIVER_MANAGER_IMAGE
- value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.10
+ value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
volumeMounts:
- name: host-os-release
mountPath: /host-etc/os-release
readOnly: true
livenessProbe:
httpGet:
--- HelmRelease: kube-system/nvidia-gpu-operator ClusterPolicy: kube-system/cluster-policy
+++ HelmRelease: kube-system/nvidia-gpu-operator ClusterPolicy: kube-system/cluster-policy
@@ -17,30 +17,30 @@
operator:
defaultRuntime: docker
runtimeClass: nvidia
initContainer:
repository: nvcr.io/nvidia
image: cuda
- version: 12.5.1-base-ubi8
+ version: 12.6.3-base-ubi9
imagePullPolicy: IfNotPresent
daemonsets:
labels:
- helm.sh/chart: gpu-operator-v24.6.1
+ helm.sh/chart: gpu-operator-v24.9.1
app.kubernetes.io/managed-by: gpu-operator
tolerations:
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
priorityClassName: system-node-critical
updateStrategy: RollingUpdate
rollingUpdate:
maxUnavailable: '1'
validator:
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.1
+ version: v24.9.1
imagePullPolicy: IfNotPresent
plugin:
env:
- name: WITH_WORKLOAD
value: 'false'
mig:
@@ -54,26 +54,26 @@
enabled: false
useNvidiaDriverCRD: false
useOpenKernelModules: false
usePrecompiled: false
repository: nvcr.io/nvidia
image: driver
- version: 550.90.07
+ version: 550.127.08
imagePullPolicy: IfNotPresent
startupProbe:
failureThreshold: 120
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 60
rdma:
enabled: false
useHostMofed: false
manager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'true'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -115,13 +115,13 @@
enabled: false
image: vgpu-manager
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -140,35 +140,35 @@
url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
name: kata-nvidia-gpu-snp
nodeSelector:
nvidia.com/cc.capable: 'true'
repository: nvcr.io/nvidia/cloud-native
image: k8s-kata-manager
- version: v0.2.1
+ version: v0.2.2
imagePullPolicy: IfNotPresent
vfioManager:
enabled: true
repository: nvcr.io/nvidia
image: cuda
- version: 12.5.1-base-ubi8
+ version: 12.6.3-base-ubi9
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
vgpuDeviceManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: vgpu-device-manager
- version: v0.2.7
+ version: v0.2.8
imagePullPolicy: IfNotPresent
config:
default: default
name: ''
ccManager:
enabled: false
@@ -186,13 +186,13 @@
imagePullPolicy: IfNotPresent
installDir: /usr/local/nvidia
devicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0
imagePullPolicy: IfNotPresent
env:
- name: PASS_DEVICE_SPECS
value: 'true'
- name: FAIL_ON_INIT_ERROR
value: 'true'
@@ -208,19 +208,19 @@
name: time-slicing-config
default: any
dcgm:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: dcgm
- version: 3.3.7-1-ubuntu22.04
+ version: 3.3.9-1-ubuntu22.04
imagePullPolicy: IfNotPresent
dcgmExporter:
enabled: false
repository: nvcr.io/nvidia/k8s
image: dcgm-exporter
- version: 3.3.7-3.5.0-ubuntu22.04
+ version: 3.3.9-3.6.1-ubuntu22.04
imagePullPolicy: IfNotPresent
env:
- name: DCGM_EXPORTER_LISTEN
value: :9400
- name: DCGM_EXPORTER_KUBERNETES
value: 'true'
@@ -233,24 +233,24 @@
interval: 15s
relabelings: []
gfd:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0
imagePullPolicy: IfNotPresent
env:
- name: GFD_SLEEP_INTERVAL
value: 60s
- name: GFD_FAIL_ON_INIT_ERROR
value: 'true'
migManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: k8s-mig-manager
- version: v0.8.0-ubuntu20.04
+ version: v0.10.0-ubuntu20.04
imagePullPolicy: IfNotPresent
env:
- name: WITH_REBOOT
value: 'false'
config:
name: null
@@ -258,24 +258,24 @@
gpuClientsConfig:
name: ''
nodeStatusExporter:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.1
+ version: v24.9.1
imagePullPolicy: IfNotPresent
gdrcopy:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gdrdrv
- version: v2.4.1-1
+ version: v2.4.1-2
imagePullPolicy: IfNotPresent
sandboxWorkloads:
enabled: false
defaultWorkload: container
sandboxDevicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: kubevirt-gpu-device-plugin
- version: v1.2.9
+ version: v1.2.10
imagePullPolicy: IfNotPresent
--- HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-cleanup-crd
+++ HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-cleanup-crd
@@ -21,15 +21,24 @@
app.kubernetes.io/name: gpu-operator
app.kubernetes.io/instance: nvidia-gpu-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: gpu-operator
spec:
serviceAccountName: gpu-operator
+ tolerations:
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/master
+ operator: Equal
+ value: ''
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/control-plane
+ operator: Equal
+ value: ''
containers:
- name: cleanup-crd
- image: nvcr.io/nvidia/gpu-operator:v24.6.1
+ image: nvcr.io/nvidia/gpu-operator:v24.9.1
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- |
kubectl delete clusterpolicy cluster-policy; kubectl delete crd clusterpolicies.nvidia.com;
--- HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-upgrade-crd
+++ HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-upgrade-crd
@@ -21,17 +21,26 @@
app.kubernetes.io/name: gpu-operator
app.kubernetes.io/instance: nvidia-gpu-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: gpu-operator
spec:
serviceAccountName: gpu-operator-upgrade-crd-hook-sa
+ tolerations:
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/master
+ operator: Equal
+ value: ''
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/control-plane
+ operator: Equal
+ value: ''
containers:
- name: upgrade-crd
- image: nvcr.io/nvidia/gpu-operator:v24.6.1
+ image: nvcr.io/nvidia/gpu-operator:v24.9.1
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- |
- kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies_crd.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
+ kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
restartPolicy: OnFailure
--- kubernetes/main/apps/kube-system/nvidia-gpu-operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: kube-system/nvidia-gpu-operator
+++ kubernetes/main/apps/kube-system/nvidia-gpu-operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: kube-system/nvidia-gpu-operator
@@ -13,13 +13,13 @@
spec:
chart: gpu-operator
sourceRef:
kind: HelmRepository
name: nvidia-operator
namespace: flux-system
- version: v24.6.1
+ version: v24.9.1
install:
crds: CreateReplace
createNamespace: true
remediation:
retries: 3
interval: 30m
```
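After merging, a quick way to confirm the rollout picked up the new versions is to query the Deployment and HelmRelease shown in the diff above. A minimal sketch; the names and namespace come from the rendered manifests, but the jsonpath assumes the operator container is listed first:

```sh
# Check that the operator Deployment is running the v24.9.1 image
# (assumes the gpu-operator container is the first in the pod spec)
kubectl -n kube-system get deployment gpu-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
# expected: nvcr.io/nvidia/gpu-operator:v24.9.1

# Check that Flux reconciled the HelmRelease to chart version v24.9.1
flux get helmreleases -n kube-system nvidia-gpu-operator
```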
This PR contains the following updates: v24.6.1 -> v24.9.1.
Warning: Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
Release Notes
NVIDIA/gpu-operator (gpu-operator)
- v24.9.1: GPU Operator 24.9.1 Release (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.1/release-notes.html)
- v24.9.0: GPU Operator 24.9.0 Release (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.0/release-notes.html)
- v24.6.2: GPU Operator 24.6.2 Release (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.6.2/release-notes.html)
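Since this bump spans three upstream releases (24.6.2, 24.9.0, 24.9.1), it can also help to compare the chart's default values directly, beyond the rendered diff above. A sketch, assuming the standard NVIDIA NGC Helm repository from the GPU Operator install docs; adjust the repo alias or URL if your cluster mirrors charts elsewhere:

```sh
# Fetch default values for both chart versions and diff them
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm show values nvidia/gpu-operator --version v24.6.1 > values-v24.6.1.yaml
helm show values nvidia/gpu-operator --version v24.9.1 > values-v24.9.1.yaml
diff -u values-v24.6.1.yaml values-v24.9.1.yaml
```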
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.