ROCm support for Dockerfile and build_image.sh #3381

Open
wants to merge 1 commit into base: master

Conversation

jakki-amd (Contributor) commented Jan 8, 2025

Description

The Dockerfile and build_image.sh do not currently support ROCm. This PR adds ROCm support to the official Dockerfile and build_image.sh, and reduces the Docker image size for TorchServe with ROCm.

Fixes #3380
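
As a rough sketch of the intended workflow (the build flag and image tag below are illustrative placeholders, not taken verbatim from this PR), building and running a ROCm-enabled image could look like:

# Build a ROCm variant of the TorchServe image (flag name and value are hypothetical)
$ ./build_image.sh --rocm rocm6.2 -t torch-serve-rocm-test-production-stage

# Run it with the AMD GPU devices mapped into the container
$ docker run -it --device=/dev/kfd --device=/dev/dri torch-serve-rocm-test-production-stage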

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

  • Running CI image on AMD64 Ubuntu

test_huggingface_bert_model_parallel_inference was already failing on ROCm before this change; no new failing tests are introduced.

====================================== short test summary info =======================================
FAILED test_handler.py::test_huggingface_bert_model_parallel_inference - assert 'Bloomberg has decided to publish a new report on the global economy' in '{\n  "code": 503...
================ 1 failed, 155 passed, 47 skipped, 11 warnings in 8275.01s (2:17:55) =================
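
For reference, the regression run summarized above can be reproduced roughly as follows inside the CI image (this assumes the TorchServe repository layout with tests under test/pytest; the exact CI invocation may differ):

# Run the pytest-based regression suite (invocation is an approximation)
$ python -m pytest -v test/pytest
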
  • Running production image on AMD64 Ubuntu
$ docker run -it --device=/dev/kfd --device=/dev/dri  torch-serve-rocm-test-production-stage
<--- logs --->
Torchserve version: 0.12.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 8
Number of CPUs: 128
Max heap size: 30688 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 8
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /home/model-server/wf-store
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: false
2025-01-07T15:49:46,355 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2025-01-07T15:49:46,367 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2025-01-07T15:49:46,422 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2025-01-07T15:49:46,423 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2025-01-07T15:49:46,424 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
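
To sanity-check the running container (assuming the API ports are also published, e.g. by adding -p 8080:8080 to the docker run command above), the standard TorchServe health endpoint can be queried, and GPU visibility can be confirmed with rocm-smi if it is available inside the image:

# TorchServe health check; the expected response is {"status": "Healthy"}
$ curl http://localhost:8080/ping

# Confirm the AMD GPUs are visible inside the container (rocm-smi availability is assumed)
$ docker exec <container-id> rocm-smi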

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Have you made corresponding changes to the documentation?
  • Has code been commented, particularly in hard-to-understand areas?
