Multiple Model Configurations (#7185)
Add `--model-config-name` option when starting Triton server. Allow users to create multiple configurations and select a custom configuration other than the default `model/config.pbtxt`.
yinggeh authored May 16, 2024
1 parent 58d3396 commit 6d9849d
Showing 7 changed files with 414 additions and 6 deletions.
80 changes: 79 additions & 1 deletion docs/user_guide/model_configuration.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -321,6 +321,84 @@ configuration](#minimal-model-configuration). You must still provide
the optional portions of the model configuration by editing the
config.pbtxt file.

## Custom Model Configuration

When multiple devices running Triton instances share one model repository,
it is sometimes necessary to configure the models differently on each
platform in order to achieve the best performance. Triton allows users to
select a custom model configuration name by setting the `--model-config-name` option.

For example, when running `./tritonserver --model-repository=</path/to/model/repository> --model-config-name=h100`,
the server will search for the custom configuration file `h100.pbtxt` in the
`/path/to/model/repository/<model-name>/configs` directory for each model
that is loaded. If `h100.pbtxt` exists, it is used as the configuration
for that model. Otherwise, the default configuration `/path/to/model/repository/<model-name>/config.pbtxt`
or the [auto-generated model configuration](#auto-generated-model-configuration)
is selected based on the settings.

Custom model configurations also work with the `Explicit` and `Poll` model
control modes. Users may add or delete custom configurations, and the
server will pick the configuration file for each loaded model dynamically.
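
For example, when the server runs with explicit model control, a client can
re-load a model after adding or removing a custom configuration so that the
newly selected configuration takes effect. A minimal sketch, assuming the HTTP
endpoint is on `localhost:8000` and a model named `model_a`:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")
# Re-loading the model makes Triton re-select its configuration file
# (a custom configs/<name>.pbtxt if present, otherwise config.pbtxt).
client.load_model("model_a")
print(client.get_model_config("model_a"))
```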

Note: the custom model configuration name must not contain any space characters.

Example 1: `--model-config-name=h100`
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── **h100.pbtxt**
│ └── config.pbtxt
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── config.pbtxt
└── **config.pbtxt**
```

Example 2: `--model-config-name=config`
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── h100.pbtxt
│ └── **config.pbtxt**
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── **config.pbtxt**
└── config.pbtxt
```

Example 3: `--model-config-name` not set
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── h100.pbtxt
│ └── **config.pbtxt**
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── config.pbtxt
└── **config.pbtxt**
```
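
To confirm which configuration was selected for a loaded model, the model
configuration endpoint can be queried. A brief sketch using the `requests`
library; the model name `model_a` is a placeholder:

```python
import requests

# /v2/models/<model-name>/config returns the configuration that Triton
# actually loaded, which reflects the selected .pbtxt file.
response = requests.get("http://localhost:8000/v2/models/model_a/config")
response.raise_for_status()
print(response.json())
```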

### Default Max Batch Size and Dynamic Batcher

When a model is using the auto-complete feature, a default maximum
15 changes: 12 additions & 3 deletions docs/user_guide/model_repository.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -57,6 +57,8 @@ The corresponding repository layout must be:
<model-name>/
[config.pbtxt]
[<output-labels-file> ...]
[configs]/
[<custom-config-file> ...]
<version>/
<model-definition-file>
<version>/
@@ -65,6 +67,8 @@ The corresponding repository layout must be:
<model-name>/
[config.pbtxt]
[<output-labels-file> ...]
[configs]/
[<custom-config-file> ...]
<version>/
<model-definition-file>
<version>/
@@ -83,10 +87,15 @@ config.pbtxt is required while for others it is optional. See
Configuration](model_configuration.md#auto-generated-model-configuration)
for more information.

Each <model-name> directory may include an optional sub-directory named configs.
Within the configs directory there may be zero or more <custom-config-file> files
with the .pbtxt file extension. For more information about how custom model
configurations are handled by Triton, see [Custom Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#custom-model-configuration).
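
As a small illustrative sketch (the repository path, model name, and
configuration name below are hypothetical), a configs sub-directory holding a
custom configuration can be added next to the default config.pbtxt like this:

```python
from pathlib import Path
import shutil

model_dir = Path("/path/to/model/repository/model_a")
configs_dir = model_dir / "configs"
configs_dir.mkdir(exist_ok=True)
# Start the custom configuration from the default one, then edit it as needed.
shutil.copyfile(model_dir / "config.pbtxt", configs_dir / "h100.pbtxt")
```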

Each <model-name> directory must have at least one numeric
sub-directory representing a version of the model. For more
sub-directory representing a version of the model. For more
information about how the model versions are handled by Triton see
[Model Versions](#model-versions). Each model is executed by a
[Model Versions](#model-versions). Each model is executed by a
specific
[backend](https://github.com/triton-inference-server/backend/blob/main/README.md).
Within each version sub-directory there must be the files required by
145 changes: 145 additions & 0 deletions qa/L0_custom_model_config/test.sh
@@ -0,0 +1,145 @@
#!/bin/bash
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

REPO_VERSION=${NVIDIA_TRITON_SERVER_VERSION}
if [ "$#" -ge 1 ]; then
REPO_VERSION=$1
fi
if [ -z "$REPO_VERSION" ]; then
echo -e "Repository version must be specified"
echo -e "\n***\n*** Test Failed\n***"
exit 1
fi
if [ ! -z "$TEST_REPO_ARCH" ]; then
REPO_VERSION=${REPO_VERSION}_${TEST_REPO_ARCH}
fi

export CUDA_VISIBLE_DEVICES=0

DATADIR="/data/inferenceserver/${REPO_VERSION}"
CLIENT_LOG="./client.log"
SERVER_LOG="./inference_server.log"

SERVER=/opt/tritonserver/bin/tritonserver
source ../common/util.sh

RET=0
rm -fr *.log

rm -fr models && mkdir models
cp -r $DATADIR/qa_model_repository/savedmodel_nobatch_float32_float32_float32 models/.
mkdir models/savedmodel_nobatch_float32_float32_float32/configs

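# test_custom_config starts the server with the current SERVER_ARGS, queries the
# model configuration endpoint, and checks that the reported version_policy
# lists exactly the versions expected for the selected configuration file.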
test_custom_config()
{
VERSION=$@

run_server
if [ "$SERVER_PID" == "0" ]; then
echo -e "\n***\n*** Failed to start $SERVER\n***"
cat $SERVER_LOG
exit 1
fi

set +e
code=`curl -s -w %{http_code} -o ./curl.out localhost:8000/v2/models/savedmodel_nobatch_float32_float32_float32/config`
set -e
if [ "$code" != "200" ]; then
cat ./curl.out
echo -e "\n***\n*** Test Failed to GET model configuration\n***"
RET=1
fi

matches=`grep -o "\"version_policy\":{\"specific\":{\"versions\":\[$VERSION\]}}" curl.out | wc -l`
if [ $matches -ne 1 ]; then
cat curl.out
echo -e "\n***\n*** Expected 1 version_policy:specific:versions, got $matches\n***"
RET=1
fi

kill $SERVER_PID
wait $SERVER_PID
}

# Prepare the file structure
VERSION_DEFAULT="1,3"
VERSION_H100="1"
VERSION_V100="2"
VERSION_CUSTOM="3"

# Distinguish configs with different model versions
(cd models/savedmodel_nobatch_float32_float32_float32 && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_DEFAULT] }}/" config.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/h100.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_H100] }}/" configs/h100.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/v100.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_V100] }}/" configs/v100.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/config.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_CUSTOM] }}/" configs/config.pbtxt)

# Test default model config
SERVER_ARGS="--model-repository=`pwd`/models"
test_custom_config $VERSION_DEFAULT

# Test model-config-name=h100
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=h100"
test_custom_config $VERSION_H100

# Test model-config-name=v100
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=v100"
test_custom_config $VERSION_V100

# Test model-config-name=config
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=config"
test_custom_config $VERSION_CUSTOM

# Test model-config-name=h200. Expect fallback to the default config since the h200 config does not exist.
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=h200"
test_custom_config $VERSION_DEFAULT

# Test model-config-name= (empty name). Expect the server to fail to start.
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name="
run_server
if [ "$SERVER_PID" != "0" ]; then
echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
cat $SERVER_LOG
RET=1

kill $SERVER_PID
wait $SERVER_PID
fi

if [ $RET -eq 0 ]; then
echo -e "\n***\n*** Test Passed\n***"
else
echo -e "\n***\n*** Test Failed\n***"
fi

exit $RET
90 changes: 89 additions & 1 deletion qa/L0_lifecycle/lifecycle_test.py
@@ -1259,7 +1259,7 @@ def test_dynamic_file_delete(self):
model_version=1,
)
self.assertTrue(
False, "expected error for unavailable model " + graphdef_name
False, "expected error for unavailable model " + model_name
)
except Exception as ex:
self.assertIn("Request for unknown model", ex.message())
@@ -3405,6 +3405,94 @@ def test_shutdown_with_live_connection(self):
"exit timeout countdown restart detected",
)

def test_add_custom_config(self):
models_base = ("savedmodel",)
models = list()
for m in models_base:
models.append(tu.get_model_name(m, np.float32, np.float32, np.float32))

# Make sure the savedmodel model is ready with the default config versions (1 and 3)
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertTrue(triton_client.is_model_ready(model_name, "1"))
self.assertFalse(triton_client.is_model_ready(model_name, "2"))
self.assertTrue(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

# Add a custom model configuration, which causes the model to be
# re-loaded using the custom config inside the configs folder. This
# means that the version policy will change and only version 2 will
# be available.
for base_name, model_name in zip(models_base, models):
shutil.copyfile(
"config.pbtxt.custom." + base_name,
"models/" + model_name + "/configs/custom.pbtxt",
)

time.sleep(5) # wait for models to reload
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertFalse(triton_client.is_model_ready(model_name, "1"))
self.assertTrue(triton_client.is_model_ready(model_name, "2"))
self.assertFalse(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

def test_delete_custom_config(self):
models_base = ("savedmodel",)
models = list()
for m in models_base:
models.append(tu.get_model_name(m, np.float32, np.float32, np.float32))

# Make sure the savedmodel model is ready with the custom config version (2 only)
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertFalse(triton_client.is_model_ready(model_name, "1"))
self.assertTrue(triton_client.is_model_ready(model_name, "2"))
self.assertFalse(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

# Delete the custom model configuration, which causes the model to be
# re-loaded using the default config. This means that the version
# policy will change and only versions 1 and 3 will be available
for model_name in models:
os.remove("models/" + model_name + "/configs/custom.pbtxt")

time.sleep(5) # wait for models to reload
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertTrue(triton_client.is_model_ready(model_name, "1"))
self.assertFalse(triton_client.is_model_ready(model_name, "2"))
self.assertTrue(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))


if __name__ == "__main__":
unittest.main()