ROCm support for Dockerfile and build_image.sh #3381

Open
wants to merge 1 commit into base: master

Conversation

jakki-amd (Contributor) commented Jan 8, 2025

Description

The Dockerfile and build_image.sh do not currently support ROCm. This PR adds ROCm support to the official Dockerfile and build_image.sh, and reduces the Docker image size for TorchServe with ROCm.

Fixes #3380
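
As a rough sketch of the intended workflow (the build flag and image tag below are illustrative placeholders, not taken verbatim from this PR), building and running a ROCm-enabled image could look like:

# Build a ROCm variant of the TorchServe image (flag name and value are hypothetical)
$ ./build_image.sh --rocm rocm6.2 -t torch-serve-rocm-test-production-stage

# Run it with the AMD GPU devices mapped into the container
$ docker run -it --device=/dev/kfd --device=/dev/dri torch-serve-rocm-test-production-stage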

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

  • Running CI image on AMD64 Ubuntu

test_huggingface_bert_model_parallel_inference was already failing on ROCm before this change; no new failing tests are introduced.

====================================== short test summary info =======================================
FAILED test_handler.py::test_huggingface_bert_model_parallel_inference - assert 'Bloomberg has decided to publish a new report on the global economy' in '{\n  "code": 503...
================ 1 failed, 155 passed, 47 skipped, 11 warnings in 8275.01s (2:17:55) =================
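
For reference, the regression run summarized above can be reproduced roughly as follows inside the CI image (this assumes the TorchServe repository layout with tests under test/pytest; the exact CI invocation may differ):

# Run the pytest-based regression suite (invocation is an approximation)
$ python -m pytest -v test/pytest
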
  • Running production image on AMD64 Ubuntu
$ docker run -it --device=/dev/kfd --device=/dev/dri  torch-serve-rocm-test-production-stage
<--- logs --->
Torchserve version: 0.12.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 8
Number of CPUs: 128
Max heap size: 30688 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 8
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /home/model-server/wf-store
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: false
2025-01-07T15:49:46,355 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2025-01-07T15:49:46,367 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2025-01-07T15:49:46,422 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2025-01-07T15:49:46,423 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
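
To sanity-check the running container (assuming the API ports are also published, e.g. by adding -p 8080:8080 to the docker run command above), the standard TorchServe health endpoint can be queried, and GPU visibility can be confirmed with rocm-smi if it is available inside the image:

# TorchServe health check; the expected response is {"status": "Healthy"}
$ curl http://localhost:8080/ping

# Confirm the AMD GPUs are visible inside the container (rocm-smi availability is assumed)
$ docker exec <container-id> rocm-smi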

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Have you made corresponding changes to the documentation?
  • Has code been commented, particularly in hard-to-understand areas?
