GPU-enabled pipeline behaviour when the cluster doesn't have enough GPUs #11497
rajendra-avesha asked this question in Q&A
I have a DAG pipeline with two components: one requires 1 CPU, and the other requires 1 CPU and 1 GPU. When I ran the pipeline from the Kubeflow UI, I noticed that it did not create any gpu-dag-pipeline-5nfmx-container-impl-xxxxx pods for either component. Here is a snapshot of the pods:
```
kubectl get pods -n kubeflow-user-example-com --watch
NAME                                                        READY   STATUS            RESTARTS   AGE
ml-pipeline-ui-artifact-7779f6ddc8-xdsjb                    2/2     Running           0          6d23h
ml-pipeline-visualizationserver-777747b89b-pzl2r            2/2     Running           0          6d23h
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     PodInitializing   0          2s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         2/2     Running           0          3s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Pending           0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Pending           0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          11s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     PodInitializing   0          2s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   2/2     Running           0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Init:0/1          0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          11s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Init:0/1          0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     PodInitializing   0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   2/2     Running           0          2s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   1/2     NotReady          0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   1/2     NotReady          0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          10s
```
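The following is a minimal sketch for checking the pod snapshot mechanically. It assumes the `--watch` output was saved as plain text with one event per line. The sketch keeps the last event seen for each pod and lists any pod whose name contains `container-impl`, the substring the question looks for. `last_status_per_pod` and `executor_pods` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple


def last_status_per_pod(watch_lines: Iterable[str]) -> Dict[str, Tuple[str, str, str]]:
    """Return {pod name: (READY, STATUS, AGE)} from each pod's last watch event."""
    latest: Dict[str, Tuple[str, str, str]] = {}
    for line in watch_lines:
        fields = line.split()
        # Skip blank lines, the header row and the kubectl command line itself:
        # only 5-column rows (NAME READY STATUS RESTARTS AGE) describe a pod.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, ready, status, _restarts, age = fields
        latest[name] = (ready, status, age)  # later events overwrite earlier ones
    return latest


def executor_pods(latest: Dict[str, Tuple[str, str, str]], marker: str = "container-impl") -> List[str]:
    """List the pods whose name contains the (assumed) executor marker."""
    return [name for name in latest if marker in name]


if __name__ == "__main__":
    sample = """\
NAME READY STATUS RESTARTS AGE
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877 0/2 Pending 0 0s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877 0/2 Completed 0 11s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141 0/2 Completed 0 11s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792 0/2 Completed 0 10s
""".splitlines()
    latest = last_status_per_pod(sample)
    for pod, (ready, status, age) in latest.items():
        print(f"{pod}: {status} ({ready}, {age})")
    print("executor pods:", executor_pods(latest))
```

On the snapshot above, every pod that reaches `Completed` is a driver pod, and the executor list comes back empty.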
Please find the Workflow CR details below:
`kubectl describe workflow gpu-dag-pipeline-5nfmx -n kubeflow-user-example-com
Name: gpu-dag-pipeline-5nfmx
Namespace: kubeflow-user-example-com
Labels: pipeline/persistedFinalState=true
pipeline/runid=7f190382-7cfc-49da-b4e3-9d3b90580e71
workflows.argoproj.io/completed=true
workflows.argoproj.io/phase=Succeeded
Annotations: pipelines.kubeflow.org/components-comp-preprocess-gpu-stage:
{"executorLabel":"exec-preprocess-gpu-stage","outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
pipelines.kubeflow.org/components-comp-process-gpu-stage:
{"executorLabel":"exec-process-gpu-stage","inputDefinitions":{"parameters":{"input_data":{"parameterType":"STRING"}}},"outputDefinitions":...
pipelines.kubeflow.org/components-root:
{"dag":{"tasks":{"preprocess-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskIn...
pipelines.kubeflow.org/implementations-comp-preprocess-gpu-stage:
{"args":["--executor_input","{{$}}","--function_to_execute","preprocess_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)...
pipelines.kubeflow.org/implementations-comp-process-gpu-stage:
{"args":["--executor_input","{{$}}","--function_to_execute","process_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]...
pipelines.kubeflow.org/run_name: Run of new-gpu-dag (fa3b9)
workflows.argoproj.io/pod-name-format: v2
API Version: argoproj.io/v1alpha1
Kind: Workflow
Metadata:
Creation Timestamp: 2025-01-02T11:53:52Z
Generate Name: gpu-dag-pipeline-
Generation: 5
Resource Version: 7028722
UID: c772cafa-97ec-4784-ae13-f1a5a9f649bd
Spec:
Arguments:
Entrypoint: entrypoint
Pod Metadata:
Annotations:
pipelines.kubeflow.org/v2_component: true
Labels:
pipeline/runid: 7f190382-7cfc-49da-b4e3-9d3b90580e71
pipelines.kubeflow.org/v2_component: true
Service Account Name: default-editor
Templates:
Container:
Args:
--type
CONTAINER
--pipeline_name
gpu-dag-pipeline
--run_id
7f190382-7cfc-49da-b4e3-9d3b90580e71
--dag_execution_id
{{inputs.parameters.parent-dag-id}}
--component
{{inputs.parameters.component}}
--task
{{inputs.parameters.task}}
--container
{{inputs.parameters.container}}
--iteration_index
{{inputs.parameters.iteration-index}}
--cached_decision_path
{{outputs.parameters.cached-decision.path}}
--pod_spec_patch_path
{{outputs.parameters.pod-spec-patch.path}}
--condition_path
{{outputs.parameters.condition.path}}
--kubernetes_config
{{inputs.parameters.kubernetes-config}}
Command:
driver
Image: gcr.io/ml-pipeline/kfp-driver@sha256:3c0665cd36aa87e4359a4c8b6271dcba5bdd817815cd0496ed12eb5dde5fd2ec
Name:
Resources:
Limits:
Cpu: 500m
Memory: 512Mi
Requests:
Cpu: 100m
Memory: 64Mi
Inputs:
Parameters:
Name: component
Name: task
Name: container
Name: parent-dag-id
Default: -1
Name: iteration-index
Default:
Name: kubernetes-config
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-driver
Outputs:
Parameters:
Name: pod-spec-patch
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value From:
Default: true
Path: /tmp/outputs/condition
Dag:
Tasks:
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{inputs.parameters.pod-spec-patch}}
Name: executor
Template: system-container-impl
When: {{inputs.parameters.cached-decision}} != true
Inputs:
Parameters:
Name: pod-spec-patch
Default: false
Name: cached-decision
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-executor
Outputs:
Container:
Command:
should-be-overridden-during-runtime
Env:
Name: KFP_POD_NAME
Value From:
Field Ref:
Field Path: metadata.name
Name: KFP_POD_UID
Value From:
Field Ref:
Field Path: metadata.uid
Env From:
Config Map Ref:
Name: metadata-grpc-configmap
Optional: true
Image: gcr.io/ml-pipeline/should-be-overridden-during-runtime
Name:
Resources:
Volume Mounts:
Mount Path: /kfp-launcher
Name: kfp-launcher
Init Containers:
Command:
launcher-v2
--copy
/kfp-launcher/launch
Image: gcr.io/ml-pipeline/kfp-launcher@sha256:8fe5e6e4718f20b021736022ad3741ddf2abd82aa58c86ae13e89736fdc3f08f
Name: kfp-launcher
Resources:
Limits:
Cpu: 500m
Memory: 128Mi
Requests:
Cpu: 100m
Volume Mounts:
Mount Path: /kfp-launcher
Name: kfp-launcher
Inputs:
Parameters:
Name: pod-spec-patch
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-impl
Outputs:
Pod Spec Patch: {{inputs.parameters.pod-spec-patch}}
Volumes:
Empty Dir:
Name: kfp-launcher
Dag:
Tasks:
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-comp-preprocess-gpu-stage}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}}
Name: container
Value: {{workflow.annotations.pipelines.kubeflow.org/implementations-comp-preprocess-gpu-stage}}
Name: parent-dag-id
Value: {{inputs.parameters.parent-dag-id}}
Name: preprocess-gpu-stage-driver
Template: system-container-driver
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{tasks.preprocess-gpu-stage-driver.outputs.parameters.pod-spec-patch}}
Default: false
Name: cached-decision
Value: {{tasks.preprocess-gpu-stage-driver.outputs.parameters.cached-decision}}
Depends: preprocess-gpu-stage-driver.Succeeded
Name: preprocess-gpu-stage
Template: system-container-executor
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-comp-process-gpu-stage}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}
Name: container
Value: {{workflow.annotations.pipelines.kubeflow.org/implementations-comp-process-gpu-stage}}
Name: parent-dag-id
Value: {{inputs.parameters.parent-dag-id}}
Depends: preprocess-gpu-stage.Succeeded
Name: process-gpu-stage-driver
Template: system-container-driver
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{tasks.process-gpu-stage-driver.outputs.parameters.pod-spec-patch}}
Default: false
Name: cached-decision
Value: {{tasks.process-gpu-stage-driver.outputs.parameters.cached-decision}}
Depends: process-gpu-stage-driver.Succeeded
Name: process-gpu-stage
Template: system-container-executor
Inputs:
Parameters:
Name: parent-dag-id
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: root
Outputs:
Container:
Args:
--type
{{inputs.parameters.driver-type}}
--pipeline_name
gpu-dag-pipeline
--run_id
7f190382-7cfc-49da-b4e3-9d3b90580e71
--dag_execution_id
{{inputs.parameters.parent-dag-id}}
--component
{{inputs.parameters.component}}
--task
{{inputs.parameters.task}}
--runtime_config
{{inputs.parameters.runtime-config}}
--iteration_index
{{inputs.parameters.iteration-index}}
--execution_id_path
{{outputs.parameters.execution-id.path}}
--iteration_count_path
{{outputs.parameters.iteration-count.path}}
--condition_path
{{outputs.parameters.condition.path}}
Command:
driver
Image: gcr.io/ml-pipeline/kfp-driver@sha256:3c0665cd36aa87e4359a4c8b6271dcba5bdd817815cd0496ed12eb5dde5fd2ec
Name:
Resources:
Limits:
Cpu: 500m
Memory: 512Mi
Requests:
Cpu: 100m
Memory: 64Mi
Inputs:
Parameters:
Name: component
Default:
Name: runtime-config
Default:
Name: task
Default: 0
Name: parent-dag-id
Default: -1
Name: iteration-index
Default: DAG
Name: driver-type
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-dag-driver
Outputs:
Parameters:
Name: execution-id
Value From:
Path: /tmp/outputs/execution-id
Name: iteration-count
Value From:
Default: 0
Path: /tmp/outputs/iteration-count
Name: condition
Value From:
Default: true
Path: /tmp/outputs/condition
Dag:
Tasks:
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-root}}
Name: runtime-config
Value: {}
Name: driver-type
Value: ROOT_DAG
Name: root-driver
Template: system-dag-driver
Arguments:
Parameters:
Name: parent-dag-id
Value: {{tasks.root-driver.outputs.parameters.execution-id}}
Name: condition
Value:
Depends: root-driver.Succeeded
Name: root
Template: root
Inputs:
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: entrypoint
Outputs:
Status:
Artifact GC Status:
Not Specified: true
Artifact Repository Ref:
Artifact Repository:
Archive Logs: true
s3:
Access Key Secret:
Key: accesskey
Name: mlpipeline-minio-artifact
Bucket: mlpipeline
Endpoint: minio-service.kubeflow:9000
Insecure: true
Key Format: artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
Secret Key Secret:
Key: secretkey
Name: mlpipeline-minio-artifact
Default: true
Conditions:
Status: False
Type: PodRunning
Status: True
Type: Completed
Finished At: 2025-01-02T11:54:23Z
Nodes:
gpu-dag-pipeline-5nfmx:
Children:
gpu-dag-pipeline-5nfmx-2521779877
Display Name: gpu-dag-pipeline-5nfmx
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx
Name: gpu-dag-pipeline-5nfmx
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Progress: 3/3
Resources Duration:
Cpu: 9
Memory: 6
Started At: 2025-01-02T11:53:52Z
Template Name: entrypoint
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-1052282141:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-2860418848
Display Name: preprocess-gpu-stage-driver
Finished At: 2025-01-02T11:54:06Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-1052282141
Inputs:
Parameters:
Name: component
Value: {"executorLabel":"exec-preprocess-gpu-stage","outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}}
Name: container
Value: {"args":["--executor_input","{{$}}","--function_to_execute","preprocess_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.11.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c"3.9"' \u0026\u0026 "$0" "$@"\n","sh","-ec","program_path=$(mktemp -d)\n\nprintf "%s" "$0" \u003e "$program_path/ephemeral_component.py"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"\n","\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef preprocess_gpu_stage() -\u003e str:\n """\n Simulates data preprocessing. Returns a string representing processed data.\n """\n print("Preprocessing data...")\n return "processed_data"\n\n"],"image":"python:3.10","resources":{"cpuLimit":1}}
Name: parent-dag-id
Value: 68
Default: -1
Name: iteration-index
Value: -1
Default:
Name: kubernetes-config
Value:
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-container-driver-1052282141/main.log
Exit Code: 0
Parameters:
Name: pod-spec-patch
Value:
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value: true
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:02Z
Template Name: system-container-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-1363621287:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-411402406
Display Name: process-gpu-stage
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-1363621287
Inputs:
Parameters:
Name: pod-spec-patch
Value:
Default: false
Name: cached-decision
Value: true
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Started At: 2025-01-02T11:54:23Z
Template Name: system-container-executor
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-2521779877:
Boundary ID: gpu-dag-pipeline-5nfmx
Children:
gpu-dag-pipeline-5nfmx-3955520920
Display Name: root-driver
Finished At: 2025-01-02T11:53:56Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-2521779877
Inputs:
Parameters:
Name: component
Value: {"dag":{"tasks":{"preprocess-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}},"process-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}}}}
Default:
Name: runtime-config
Value: {}
Default:
Name: task
Value:
Default: 0
Name: parent-dag-id
Value: 0
Default: -1
Name: iteration-index
Value: -1
Default: DAG
Name: driver-type
Value: ROOT_DAG
Name: gpu-dag-pipeline-5nfmx.root-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877/main.log
Exit Code: 0
Parameters:
Name: execution-id
Value: 68
Value From:
Path: /tmp/outputs/execution-id
Name: iteration-count
Value: 0
Value From:
Default: 0
Path: /tmp/outputs/iteration-count
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:53:52Z
Template Name: system-dag-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-2860418848:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-4250179659
Display Name: preprocess-gpu-stage
Finished At: 2025-01-02T11:54:13Z
Id: gpu-dag-pipeline-5nfmx-2860418848
Inputs:
Parameters:
Name: pod-spec-patch
Value:
Default: false
Name: cached-decision
Value: true
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Outbound Nodes:
gpu-dag-pipeline-5nfmx-4250179659
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-executor
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-3955520920:
Boundary ID: gpu-dag-pipeline-5nfmx
Children:
gpu-dag-pipeline-5nfmx-1052282141
Display Name: root
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-3955520920
Inputs:
Parameters:
Name: parent-dag-id
Value: 68
Name: gpu-dag-pipeline-5nfmx.root
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Progress: 2/2
Resources Duration:
Cpu: 6
Memory: 4
Started At: 2025-01-02T11:54:02Z
Template Name: root
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-411402406:
Boundary ID: gpu-dag-pipeline-5nfmx-1363621287
Display Name: executor
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-411402406
Message: when 'true != true' evaluated false
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage.executor
Phase: Skipped
Started At: 2025-01-02T11:54:23Z
Template Name: system-container-impl
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Skipped
gpu-dag-pipeline-5nfmx-4177297792:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-1363621287
Display Name: process-gpu-stage-driver
Finished At: 2025-01-02T11:54:16Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-4177297792
Inputs:
Parameters:
Name: component
Value: {"executorLabel":"exec-process-gpu-stage","inputDefinitions":{"parameters":{"input_data":{"parameterType":"STRING"}}},"outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}
Name: container
Value: {"args":["--executor_input","{{$}}","--function_to_execute","process_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.11.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c"3.9"' \u0026\u0026 "$0" "$@"\n","sh","-ec","program_path=$(mktemp -d)\n\nprintf "%s" "$0" \u003e "$program_path/ephemeral_component.py"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"\n","\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef process_gpu_stage(input_data: str) -\u003e str:\n """\n Simulates data processing. Consumes preprocessed data and returns the result.\n """\n print(f"Processing data: {input_data}")\n return f"result_from_{input_data}"\n\n"],"image":"python:3.10","resources":{"accelerator":{"count":"1","type":"nvidia.com/gpu"},"cpuLimit":1}}
Name: parent-dag-id
Value: 68
Default: -1
Name: iteration-index
Value: -1
Default:
Name: kubernetes-config
Value:
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-container-driver-4177297792/main.log
Exit Code: 0
Parameters:
Name: pod-spec-patch
Value:
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value: true
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-4250179659:
Boundary ID: gpu-dag-pipeline-5nfmx-2860418848
Children:
gpu-dag-pipeline-5nfmx-4177297792
Display Name: executor
Finished At: 2025-01-02T11:54:13Z
Id: gpu-dag-pipeline-5nfmx-4250179659
Message: when 'true != true' evaluated false
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage.executor
Phase: Skipped
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-impl
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Skipped
Phase: Succeeded
Progress: 3/3
Resources Duration:
Cpu: 9
Memory: 6
Started At: 2025-01-02T11:53:52Z
Events:
Type Reason Age From Message
Normal WorkflowRunning 110s workflow-controller Workflow Running
Normal WorkflowNodeRunning 109s workflow-controller Running node gpu-dag-pipeline-5nfmx
Normal WorkflowNodeRunning 99s workflow-controller Running node gpu-dag-pipeline-5nfmx.root-driver
Normal WorkflowNodeSucceeded 99s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root-driver
Normal WorkflowNodeRunning 99s workflow-controller Running node gpu-dag-pipeline-5nfmx.root
Normal WorkflowNodeRunning 89s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Normal WorkflowNodeSucceeded 89s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Normal WorkflowNodeRunning 89s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Normal WorkflowNodeSucceeded 89s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Normal WorkflowSucceeded 79s workflow-controller Workflow completed
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root
Normal WorkflowNodeRunning 79s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Normal WorkflowNodeRunning 79s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.process-gpu-stage
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.process-gpu-stage
`
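In the dump above, two `executor` nodes have `Phase: Skipped` and `Message: when 'true != true' evaluated false`. The following is a minimal sketch for extracting such nodes from `kubectl get workflow gpu-dag-pipeline-5nfmx -n kubeflow-user-example-com -o json`. It assumes the Argo `status.nodes` map uses the JSON field names `name`, `phase` and `message`, and `skipped_nodes` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def skipped_nodes(workflow_json: str) -> List[Tuple[str, str]]:
    """Return sorted (node name, message) pairs for every Skipped node in a Workflow."""
    workflow = json.loads(workflow_json)
    nodes = workflow.get("status", {}).get("nodes", {})
    return sorted(
        (node.get("name", node_id), node.get("message", ""))
        for node_id, node in nodes.items()
        if node.get("phase") == "Skipped"
    )


if __name__ == "__main__":
    # Trimmed-down example shaped like two status.nodes entries from the dump above.
    sample = json.dumps({"status": {"nodes": {
        "gpu-dag-pipeline-5nfmx-411402406": {
            "name": "gpu-dag-pipeline-5nfmx.root.process-gpu-stage.executor",
            "phase": "Skipped",
            "message": "when 'true != true' evaluated false",
        },
        "gpu-dag-pipeline-5nfmx-2521779877": {
            "name": "gpu-dag-pipeline-5nfmx.root-driver",
            "phase": "Succeeded",
        },
    }}})
    for name, message in skipped_nodes(sample):
        print(f"{name}: {message}")
```

To run this against a live cluster, pass the `-o json` output as the string argument instead of the sample.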
If the CPU/GPU requirement of any component cannot be met, the pipeline does not run any of the components, including those that do have enough resources. Is this expected behaviour?
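In the Workflow dump, each task's `executor` step (template `system-container-impl`) carries the condition `{{inputs.parameters.cached-decision}} != true`, and the value comes from the preceding `system-container-driver` pod. Both drivers output `cached-decision: true`, and both executors were skipped with `when 'true != true' evaluated false`. The following is a minimal model of that guard. It is not Argo's or KFP's actual implementation, and the helper names are hypothetical.

```python
from typing import Dict, List


def render_when(template: str, params: Dict[str, str]) -> str:
    """Substitute {{inputs.parameters.<name>}} placeholders in a `when` expression."""
    for key, value in params.items():
        template = template.replace("{{inputs.parameters." + key + "}}", value)
    return template


def executor_runs(cached_decision: str) -> bool:
    """Model of the executor guard: run only if the driver did not report a cache hit."""
    condition = render_when("{{inputs.parameters.cached-decision}} != true",
                            {"cached-decision": cached_decision})
    left, right = (side.strip() for side in condition.split("!="))
    return left != right


def executor_pods_created(driver_outputs: Dict[str, str]) -> List[str]:
    """Given {task: cached-decision} from each driver, list tasks whose executor would run."""
    return [task for task, decision in driver_outputs.items() if executor_runs(decision)]


if __name__ == "__main__":
    # cached-decision values reported by the two driver nodes in the dump above.
    observed = {"preprocess-gpu-stage": "true", "process-gpu-stage": "true"}
    print(executor_pods_created(observed))  # [] : no container-impl pod is started
```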
Here is the sample source I used to simulate the scenario:
```python
from kfp import compiler
from kfp.dsl import component, pipeline


# Stage 1: Preprocessing Component
@component(base_image="python:3.10")
def preprocess_gpu_stage() -> str:
    """
    Simulates data preprocessing. Returns a string representing processed data.
    """
    print("Preprocessing data...")
    return "processed_data"


# Stage 2: Processing Component
@component(base_image="python:3.10")
def process_gpu_stage(input_data: str) -> str:
    """
    Simulates data processing. Consumes preprocessed data and returns the result.
    """
    print(f"Processing data: {input_data}")
    return f"result_from_{input_data}"


# Define the DAG pipeline
@pipeline(
    name="GPU DAG Pipeline",
    description="A sample pipeline with two stages using a DAG structure.",
)
def gpu_dag_sample_pipeline():
    # Stage 1: Preprocessing (1 CPU)
    preprocess_task = preprocess_gpu_stage().set_cpu_limit("1")
    # Stage 2: Processing (1 CPU + 1 NVIDIA GPU), consumes the Stage 1 output
    (
        process_gpu_stage(input_data=preprocess_task.output)
        .set_cpu_limit("1")
        .set_accelerator_type("nvidia.com/gpu")
        .set_accelerator_limit(1)
    )


# Compile the pipeline to a YAML file
if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=gpu_dag_sample_pipeline,
        package_path="gpu_dag_sample_pipeline.yaml",
    )
```
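In the Workflow dump, the compiled container spec of `process-gpu-stage` requests `"resources":{"accelerator":{"count":"1","type":"nvidia.com/gpu"},"cpuLimit":1}`, while `preprocess-gpu-stage` only has `"cpuLimit":1`. The sketch below is a deliberately simplified model of a per-pod resource-fit check, not kube-scheduler's implementation. It makes the following assumptions:

- `cpuLimit` maps to `cpu`, and the accelerator maps to a resource named by its `type`.
- Limits are treated as requests. Kubernetes defaults a container's requests to its limits when only limits are set.
- Taints, affinity and all other scheduling rules are ignored.
- Node names and free capacities are hypothetical.

```python
import json
from typing import Dict, List


def requests_from_kfp_resources(resources_json: str) -> Dict[str, float]:
    """Translate a compiled KFP `resources` JSON object into resource amounts.

    Assumed mapping: cpuLimit -> "cpu"; accelerator.type -> accelerator.count.
    """
    spec = json.loads(resources_json)
    wanted: Dict[str, float] = {}
    if "cpuLimit" in spec:
        wanted["cpu"] = float(spec["cpuLimit"])
    accel = spec.get("accelerator")
    if accel:
        wanted[accel["type"]] = float(accel["count"])
    return wanted


def feasible_nodes(wanted: Dict[str, float], free: Dict[str, Dict[str, float]]) -> List[str]:
    """Nodes whose free capacity covers every requested resource (simplified fit check)."""
    return [
        node for node, capacity in free.items()
        if all(capacity.get(name, 0.0) >= amount for name, amount in wanted.items())
    ]


if __name__ == "__main__":
    gpu_stage = requests_from_kfp_resources(
        '{"accelerator":{"count":"1","type":"nvidia.com/gpu"},"cpuLimit":1}')
    cpu_stage = requests_from_kfp_resources('{"cpuLimit":1}')
    # Hypothetical cluster: a single CPU-only node with no GPUs.
    free = {"cpu-node-1": {"cpu": 2.0}}
    print(feasible_nodes(cpu_stage, free))  # ['cpu-node-1']
    print(feasible_nodes(gpu_stage, free))  # [] : no node fits this pod
```

Each pod is checked on its own in this model, so the CPU-only stage still finds a node even when no node has a free GPU. In a real cluster, a pod that fits nowhere is left `Pending` with a `FailedScheduling` event.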
gpu_dag_sample_pipeline.yaml.zip
Please let me know if I am missing anything.
Thanks