
feat(helm): update gpu-operator (v24.6.1 → v24.9.1) #7912

Merged: 1 commit merged into main from renovate/gpu-operator-24.x on Dec 31, 2024

Conversation

renovate[bot] (Contributor) commented on Dec 31, 2024

This PR contains the following updates:

| Package | Update | Change |
| ------- | ------ | ------ |
| gpu-operator (source) | minor | v24.6.1 -> v24.9.1 |

Warning: Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

NVIDIA/gpu-operator (gpu-operator)

v24.9.1: GPU Operator 24.9.1 Release

Compare Source

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.1/release-notes.html

v24.9.0: GPU Operator 24.9.0 Release

Compare Source

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.0/release-notes.html

v24.6.2: GPU Operator 24.6.2 Release

Compare Source

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.6.2/release-notes.html


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever the PR becomes conflicted, or when you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

| datasource | package      | from    | to      |
| ---------- | ------------ | ------- | ------- |
| helm       | gpu-operator | v24.6.1 | v24.9.1 |
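
For context on how this bump lands in the cluster: Renovate rewrites the chart version pinned in the Flux HelmRelease, as the Kustomization diff at the end of this PR shows. A minimal sketch of the fragment it edits, assembled from that diff (the apiVersion is not visible in the diff and is assumed to be Flux's helm.toolkit.fluxcd.io/v2):

```yaml
# Sketch of the HelmRelease fragment Renovate rewrites; names and fields are
# taken from the diffs in this PR, the apiVersion is an assumption.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nvidia-gpu-operator
  namespace: kube-system
spec:
  interval: 30m
  chart:
    spec:
      chart: gpu-operator
      version: v24.9.1  # the only field this PR changes in the HelmRelease
      sourceRef:
        kind: HelmRepository
        name: nvidia-operator
        namespace: flux-system
```
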
@rosey-the-renovator-bot (Contributor)

--- HelmRelease: kube-system/nvidia-gpu-operator ClusterRole: kube-system/gpu-operator

+++ HelmRelease: kube-system/nvidia-gpu-operator ClusterRole: kube-system/gpu-operator

@@ -52,27 +52,12 @@

   - update
   - patch
   - delete
 - apiGroups:
   - ''
   resources:
-  - events
-  - pods
-  - pods/eviction
-  - services
-  verbs:
-  - create
-  - get
-  - list
-  - watch
-  - update
-  - patch
-  - delete
-- apiGroups:
-  - ''
-  resources:
   - nodes
   verbs:
   - get
   - list
   - watch
   - update
@@ -86,39 +71,33 @@

   - list
   - create
   - watch
   - update
   - patch
 - apiGroups:
+  - ''
+  resources:
+  - events
+  - pods
+  - pods/eviction
+  verbs:
+  - create
+  - get
+  - list
+  - watch
+  - update
+  - patch
+  - delete
+- apiGroups:
   - apps
   resources:
   - daemonsets
   verbs:
   - get
   - list
   - watch
-- apiGroups:
-  - apps
-  resources:
-  - controllerrevisions
-  verbs:
-  - get
-  - list
-  - watch
-- apiGroups:
-  - monitoring.coreos.com
-  resources:
-  - servicemonitors
-  - prometheusrules
-  verbs:
-  - get
-  - list
-  - create
-  - watch
-  - update
-  - delete
 - apiGroups:
   - nvidia.com
   resources:
   - clusterpolicies
   - clusterpolicies/finalizers
   - clusterpolicies/status
@@ -141,24 +120,12 @@

   verbs:
   - get
   - list
   - watch
   - create
 - apiGroups:
-  - coordination.k8s.io
-  resources:
-  - leases
-  verbs:
-  - get
-  - list
-  - watch
-  - create
-  - update
-  - patch
-  - delete
-- apiGroups:
   - node.k8s.io
   resources:
   - runtimeclasses
   verbs:
   - get
   - list
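
The hunks above narrow the cluster-scoped permissions: the services rule, the apps/controllerrevisions rule, the monitoring.coreos.com rule, and the coordination.k8s.io leases rule are dropped from the ClusterRole (events, pods, and pods/eviction remain, only reordered), while the Role diff that follows re-grants them inside the release namespace. Leader election keeps working because the operator still runs with --leader-elect (see the Deployment diff further down) and the lease permission is now namespaced. As an illustration, the rendered namespaced lease rule should come out roughly like this (an excerpt reconstructed from the added lines in the Role diff below, not a complete manifest):

```yaml
# Excerpt of the kube-system/gpu-operator Role after the upgrade,
# reconstructed from the '+' lines of the Role diff in this PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gpu-operator
  namespace: kube-system
rules:
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs: [get, list, watch, create, update, patch, delete]
```
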
--- HelmRelease: kube-system/nvidia-gpu-operator Role: kube-system/gpu-operator

+++ HelmRelease: kube-system/nvidia-gpu-operator Role: kube-system/gpu-operator

@@ -22,12 +22,20 @@

   - update
   - patch
   - delete
 - apiGroups:
   - apps
   resources:
+  - controllerrevisions
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - apps
+  resources:
   - daemonsets
   verbs:
   - create
   - get
   - list
   - watch
@@ -35,17 +43,46 @@

   - patch
   - delete
 - apiGroups:
   - ''
   resources:
   - configmaps
+  - endpoints
+  - pods
+  - pods/eviction
   - secrets
+  - services
+  - services/finalizers
   - serviceaccounts
   verbs:
   - create
   - get
   - list
   - watch
   - update
   - patch
   - delete
+- apiGroups:
+  - coordination.k8s.io
+  resources:
+  - leases
+  verbs:
+  - get
+  - list
+  - watch
+  - create
+  - update
+  - patch
+  - delete
+- apiGroups:
+  - monitoring.coreos.com
+  resources:
+  - servicemonitors
+  - prometheusrules
+  verbs:
+  - get
+  - list
+  - create
+  - watch
+  - update
+  - delete
 
--- HelmRelease: kube-system/nvidia-gpu-operator Deployment: kube-system/gpu-operator

+++ HelmRelease: kube-system/nvidia-gpu-operator Deployment: kube-system/gpu-operator

@@ -28,13 +28,13 @@

         openshift.io/scc: restricted-readonly
     spec:
       serviceAccountName: gpu-operator
       priorityClassName: system-node-critical
       containers:
       - name: gpu-operator
-        image: nvcr.io/nvidia/gpu-operator:v24.6.1
+        image: nvcr.io/nvidia/gpu-operator:v24.9.1
         imagePullPolicy: IfNotPresent
         command:
         - gpu-operator
         args:
         - --leader-elect
         - --zap-time-encoding=epoch
@@ -44,13 +44,13 @@

           value: ''
         - name: OPERATOR_NAMESPACE
           valueFrom:
             fieldRef:
               fieldPath: metadata.namespace
         - name: DRIVER_MANAGER_IMAGE
-          value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.10
+          value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
         volumeMounts:
         - name: host-os-release
           mountPath: /host-etc/os-release
           readOnly: true
         livenessProbe:
           httpGet:
--- HelmRelease: kube-system/nvidia-gpu-operator ClusterPolicy: kube-system/cluster-policy

+++ HelmRelease: kube-system/nvidia-gpu-operator ClusterPolicy: kube-system/cluster-policy

@@ -17,30 +17,30 @@

   operator:
     defaultRuntime: docker
     runtimeClass: nvidia
     initContainer:
       repository: nvcr.io/nvidia
       image: cuda
-      version: 12.5.1-base-ubi8
+      version: 12.6.3-base-ubi9
       imagePullPolicy: IfNotPresent
   daemonsets:
     labels:
-      helm.sh/chart: gpu-operator-v24.6.1
+      helm.sh/chart: gpu-operator-v24.9.1
       app.kubernetes.io/managed-by: gpu-operator
     tolerations:
     - effect: NoSchedule
       key: nvidia.com/gpu
       operator: Exists
     priorityClassName: system-node-critical
     updateStrategy: RollingUpdate
     rollingUpdate:
       maxUnavailable: '1'
   validator:
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.6.1
+    version: v24.9.1
     imagePullPolicy: IfNotPresent
     plugin:
       env:
       - name: WITH_WORKLOAD
         value: 'false'
   mig:
@@ -54,26 +54,26 @@

     enabled: false
     useNvidiaDriverCRD: false
     useOpenKernelModules: false
     usePrecompiled: false
     repository: nvcr.io/nvidia
     image: driver
-    version: 550.90.07
+    version: 550.127.08
     imagePullPolicy: IfNotPresent
     startupProbe:
       failureThreshold: 120
       initialDelaySeconds: 60
       periodSeconds: 10
       timeoutSeconds: 60
     rdma:
       enabled: false
       useHostMofed: false
     manager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'true'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
@@ -115,13 +115,13 @@

     enabled: false
     image: vgpu-manager
     imagePullPolicy: IfNotPresent
     driverManager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'false'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
@@ -140,35 +140,35 @@

           url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
         name: kata-nvidia-gpu-snp
         nodeSelector:
           nvidia.com/cc.capable: 'true'
     repository: nvcr.io/nvidia/cloud-native
     image: k8s-kata-manager
-    version: v0.2.1
+    version: v0.2.2
     imagePullPolicy: IfNotPresent
   vfioManager:
     enabled: true
     repository: nvcr.io/nvidia
     image: cuda
-    version: 12.5.1-base-ubi8
+    version: 12.6.3-base-ubi9
     imagePullPolicy: IfNotPresent
     driverManager:
       repository: nvcr.io/nvidia/cloud-native
       image: k8s-driver-manager
-      version: v0.6.10
+      version: v0.7.0
       imagePullPolicy: IfNotPresent
       env:
       - name: ENABLE_GPU_POD_EVICTION
         value: 'false'
       - name: ENABLE_AUTO_DRAIN
         value: 'false'
   vgpuDeviceManager:
     enabled: true
     repository: nvcr.io/nvidia/cloud-native
     image: vgpu-device-manager
-    version: v0.2.7
+    version: v0.2.8
     imagePullPolicy: IfNotPresent
     config:
       default: default
       name: ''
   ccManager:
     enabled: false
@@ -186,13 +186,13 @@

     imagePullPolicy: IfNotPresent
     installDir: /usr/local/nvidia
   devicePlugin:
     enabled: true
     repository: nvcr.io/nvidia
     image: k8s-device-plugin
-    version: v0.16.2-ubi8
+    version: v0.17.0
     imagePullPolicy: IfNotPresent
     env:
     - name: PASS_DEVICE_SPECS
       value: 'true'
     - name: FAIL_ON_INIT_ERROR
       value: 'true'
@@ -208,19 +208,19 @@

       name: time-slicing-config
       default: any
   dcgm:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: dcgm
-    version: 3.3.7-1-ubuntu22.04
+    version: 3.3.9-1-ubuntu22.04
     imagePullPolicy: IfNotPresent
   dcgmExporter:
     enabled: false
     repository: nvcr.io/nvidia/k8s
     image: dcgm-exporter
-    version: 3.3.7-3.5.0-ubuntu22.04
+    version: 3.3.9-3.6.1-ubuntu22.04
     imagePullPolicy: IfNotPresent
     env:
     - name: DCGM_EXPORTER_LISTEN
       value: :9400
     - name: DCGM_EXPORTER_KUBERNETES
       value: 'true'
@@ -233,24 +233,24 @@

       interval: 15s
       relabelings: []
   gfd:
     enabled: true
     repository: nvcr.io/nvidia
     image: k8s-device-plugin
-    version: v0.16.2-ubi8
+    version: v0.17.0
     imagePullPolicy: IfNotPresent
     env:
     - name: GFD_SLEEP_INTERVAL
       value: 60s
     - name: GFD_FAIL_ON_INIT_ERROR
       value: 'true'
   migManager:
     enabled: true
     repository: nvcr.io/nvidia/cloud-native
     image: k8s-mig-manager
-    version: v0.8.0-ubuntu20.04
+    version: v0.10.0-ubuntu20.04
     imagePullPolicy: IfNotPresent
     env:
     - name: WITH_REBOOT
       value: 'false'
     config:
       name: null
@@ -258,24 +258,24 @@

     gpuClientsConfig:
       name: ''
   nodeStatusExporter:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gpu-operator-validator
-    version: v24.6.1
+    version: v24.9.1
     imagePullPolicy: IfNotPresent
   gdrcopy:
     enabled: false
     repository: nvcr.io/nvidia/cloud-native
     image: gdrdrv
-    version: v2.4.1-1
+    version: v2.4.1-2
     imagePullPolicy: IfNotPresent
   sandboxWorkloads:
     enabled: false
     defaultWorkload: container
   sandboxDevicePlugin:
     enabled: true
     repository: nvcr.io/nvidia
     image: kubevirt-gpu-device-plugin
-    version: v1.2.9
+    version: v1.2.10
     imagePullPolicy: IfNotPresent
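
Beyond the operator image, this ClusterPolicy diff bumps most component defaults: driver 550.90.07 → 550.127.08, device plugin and GFD v0.16.2-ubi8 → v0.17.0, DCGM exporter 3.3.7-3.5.0 → 3.3.9-3.6.1, MIG manager v0.8.0 → v0.10.0, CUDA base image 12.5.1-base-ubi8 → 12.6.3-base-ubi9, and k8s-driver-manager v0.6.10 → v0.7.0. If a particular component has to stay pinned across chart upgrades, it can be overridden through HelmRelease values; a minimal sketch, assuming the upstream chart exposes a driver.version value for the field shown above (the key name is not visible in this diff):

```yaml
# Hypothetical values override to hold the driver on the previous branch;
# 'driver.version' is assumed to be the chart value behind the ClusterPolicy field above.
spec:
  values:
    driver:
      version: "550.90.07"
```
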
 
--- HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-cleanup-crd

+++ HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-cleanup-crd

@@ -21,15 +21,24 @@

         app.kubernetes.io/name: gpu-operator
         app.kubernetes.io/instance: nvidia-gpu-operator
         app.kubernetes.io/managed-by: Helm
         app.kubernetes.io/component: gpu-operator
     spec:
       serviceAccountName: gpu-operator
+      tolerations:
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/master
+        operator: Equal
+        value: ''
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/control-plane
+        operator: Equal
+        value: ''
       containers:
       - name: cleanup-crd
-        image: nvcr.io/nvidia/gpu-operator:v24.6.1
+        image: nvcr.io/nvidia/gpu-operator:v24.9.1
         imagePullPolicy: IfNotPresent
         command:
         - /bin/sh
         - -c
         - |
           kubectl delete clusterpolicy cluster-policy; kubectl delete crd clusterpolicies.nvidia.com;
--- HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-upgrade-crd

+++ HelmRelease: kube-system/nvidia-gpu-operator Job: kube-system/gpu-operator-upgrade-crd

@@ -21,17 +21,26 @@

         app.kubernetes.io/name: gpu-operator
         app.kubernetes.io/instance: nvidia-gpu-operator
         app.kubernetes.io/managed-by: Helm
         app.kubernetes.io/component: gpu-operator
     spec:
       serviceAccountName: gpu-operator-upgrade-crd-hook-sa
+      tolerations:
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/master
+        operator: Equal
+        value: ''
+      - effect: NoSchedule
+        key: node-role.kubernetes.io/control-plane
+        operator: Equal
+        value: ''
       containers:
       - name: upgrade-crd
-        image: nvcr.io/nvidia/gpu-operator:v24.6.1
+        image: nvcr.io/nvidia/gpu-operator:v24.9.1
         imagePullPolicy: IfNotPresent
         command:
         - /bin/sh
         - -c
         - |
-          kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies_crd.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
+          kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
       restartPolicy: OnFailure
 

@rosey-the-renovator-bot (Contributor)

--- kubernetes/main/apps/kube-system/nvidia-gpu-operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: kube-system/nvidia-gpu-operator

+++ kubernetes/main/apps/kube-system/nvidia-gpu-operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: kube-system/nvidia-gpu-operator

@@ -13,13 +13,13 @@

     spec:
       chart: gpu-operator
       sourceRef:
         kind: HelmRepository
         name: nvidia-operator
         namespace: flux-system
-      version: v24.6.1
+      version: v24.9.1
   install:
     crds: CreateReplace
     createNamespace: true
     remediation:
       retries: 3
   interval: 30m
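
A note on CRD handling: the chart's upgrade-crd hook Job (diffed above) now applies nvidia.com_clusterpolicies.yaml, and this HelmRelease only declares install.crds: CreateReplace. Flux can also replace CRDs during upgrades via upgrade.crds; whether this repository sets that is not visible here, so the following is only a sketch under that assumption:

```yaml
# install.crds comes from the diff above; upgrade.crds is an illustrative
# assumption about Flux's CRD policy on helm upgrades, not part of this PR.
spec:
  install:
    crds: CreateReplace
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
```
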

rosey-the-renovator-bot merged commit b40fc9c into main on Dec 31, 2024 (16 checks passed).
renovate[bot] deleted the renovate/gpu-operator-24.x branch on December 31, 2024 at 21:52.