Multiple Model Configurations (#7185)
Add `--model-config-name` option when starting Triton server. Allow users to create multiple configurations and select a custom configuration other than the default `model/config.pbtxt`.
yinggeh authored May 16, 2024
1 parent 58d3396 commit 6d9849d
Showing 7 changed files with 414 additions and 6 deletions.
80 changes: 79 additions & 1 deletion docs/user_guide/model_configuration.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -321,6 +321,84 @@ configuration](#minimal-model-configuration). You must still provide
the optional portions of the model configuration by editing the
config.pbtxt file.

## Custom Model Configuration

When multiple devices running Triton instances share one model repository,
it is sometimes necessary to configure the models differently on each
platform in order to achieve the best performance. Triton allows users to
select a custom model configuration name by setting the `--model-config-name` option.

For example, when running `./tritonserver --model-repository=</path/to/model/repository> --model-config-name=h100`,
the server will search for the custom configuration file `h100.pbtxt` in the
`/path/to/model/repository/<model-name>/configs` directory for each model
that is loaded. If `h100.pbtxt` exists, it is used as the configuration
for that model. Otherwise, the default configuration `/path/to/model/repository/<model-name>/config.pbtxt`
or the [auto-generated model configuration](#auto-generated-model-configuration)
is selected based on the settings.

Custom model configurations also work with the `Explicit` and `Poll` model
control modes. Users may add or delete custom configurations, and the
server will pick the configuration file for each loaded model dynamically.
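
For example, when the server runs with explicit model control, a client can
re-load a model after adding or removing a custom configuration so that the
newly selected configuration takes effect. A minimal sketch, assuming the HTTP
endpoint is on `localhost:8000` and a model named `model_a`:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")
# Re-loading the model makes Triton re-select its configuration file
# (a custom configs/<name>.pbtxt if present, otherwise config.pbtxt).
client.load_model("model_a")
print(client.get_model_config("model_a"))
```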

Note: the custom model configuration name must not contain any space characters.

Example 1: `--model-config-name=h100`
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── **h100.pbtxt**
│ └── config.pbtxt
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── config.pbtxt
└── **config.pbtxt**
```

Example 2: `--model-config-name=config`
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── h100.pbtxt
│ └── **config.pbtxt**
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── **config.pbtxt**
└── config.pbtxt
```

Example 3: `--model-config-name` not set
```
.
└── model_repository/
├── model_a/
│ ├── configs/
│ │ ├── v100.pbtxt
│ │ └── h100.pbtxt
│ └── **config.pbtxt**
├── model_b/
│ ├── configs/
│ │ └── v100.pbtxt
│ └── **config.pbtxt**
└── model_c/
├── configs/
│ └── config.pbtxt
└── **config.pbtxt**
```
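
To confirm which configuration was selected for a loaded model, the model
configuration endpoint can be queried. A brief sketch using the `requests`
library; the model name `model_a` is a placeholder:

```python
import requests

# /v2/models/<model-name>/config returns the configuration that Triton
# actually loaded, which reflects the selected .pbtxt file.
response = requests.get("http://localhost:8000/v2/models/model_a/config")
response.raise_for_status()
print(response.json())
```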

### Default Max Batch Size and Dynamic Batcher

When a model is using the auto-complete feature, a default maximum
15 changes: 12 additions & 3 deletions docs/user_guide/model_repository.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -57,6 +57,8 @@ The corresponding repository layout must be:
<model-name>/
[config.pbtxt]
[<output-labels-file> ...]
[configs]/
[<custom-config-file> ...]
<version>/
<model-definition-file>
<version>/
@@ -65,6 +67,8 @@ The corresponding repository layout must be:
<model-name>/
[config.pbtxt]
[<output-labels-file> ...]
[configs]/
[<custom-config-file> ...]
<version>/
<model-definition-file>
<version>/
@@ -83,10 +87,15 @@ config.pbtxt is required while for others it is optional. See
Configuration](model_configuration.md#auto-generated-model-configuration)
for more information.

Each <model-name> directory may include an optional sub-directory named configs.
Within the configs directory there may be zero or more <custom-config-file> files
with the .pbtxt file extension. For more information about how custom model
configurations are handled by Triton, see [Custom Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#custom-model-configuration).
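
As a small illustrative sketch (the repository path, model name, and
configuration name below are hypothetical), a configs sub-directory holding a
custom configuration can be added next to the default config.pbtxt like this:

```python
from pathlib import Path
import shutil

model_dir = Path("/path/to/model/repository/model_a")
configs_dir = model_dir / "configs"
configs_dir.mkdir(exist_ok=True)
# Start the custom configuration from the default one, then edit it as needed.
shutil.copyfile(model_dir / "config.pbtxt", configs_dir / "h100.pbtxt")
```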

Each <model-name> directory must have at least one numeric
sub-directory representing a version of the model. For more
sub-directory representing a version of the model. For more
information about how the model versions are handled by Triton see
[Model Versions](#model-versions). Each model is executed by a
[Model Versions](#model-versions). Each model is executed by a
specific
[backend](https://github.com/triton-inference-server/backend/blob/main/README.md).
Within each version sub-directory there must be the files required by
145 changes: 145 additions & 0 deletions qa/L0_custom_model_config/test.sh
@@ -0,0 +1,145 @@
#!/bin/bash
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

REPO_VERSION=${NVIDIA_TRITON_SERVER_VERSION}
if [ "$#" -ge 1 ]; then
REPO_VERSION=$1
fi
if [ -z "$REPO_VERSION" ]; then
echo -e "Repository version must be specified"
echo -e "\n***\n*** Test Failed\n***"
exit 1
fi
if [ ! -z "$TEST_REPO_ARCH" ]; then
REPO_VERSION=${REPO_VERSION}_${TEST_REPO_ARCH}
fi

export CUDA_VISIBLE_DEVICES=0

DATADIR="/data/inferenceserver/${REPO_VERSION}"
CLIENT_LOG="./client.log"
SERVER_LOG="./inference_server.log"

SERVER=/opt/tritonserver/bin/tritonserver
source ../common/util.sh

RET=0
rm -fr *.log

rm -fr models && mkdir models
cp -r $DATADIR/qa_model_repository/savedmodel_nobatch_float32_float32_float32 models/.
mkdir models/savedmodel_nobatch_float32_float32_float32/configs

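# test_custom_config starts the server with the current SERVER_ARGS, queries the
# model configuration endpoint, and checks that the reported version_policy
# lists exactly the versions expected for the selected configuration file.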
test_custom_config()
{
VERSION=$@

run_server
if [ "$SERVER_PID" == "0" ]; then
echo -e "\n***\n*** Failed to start $SERVER\n***"
cat $SERVER_LOG
exit 1
fi

set +e
code=`curl -s -w %{http_code} -o ./curl.out localhost:8000/v2/models/savedmodel_nobatch_float32_float32_float32/config`
set -e
if [ "$code" != "200" ]; then
cat ./curl.out
echo -e "\n***\n*** Test Failed to GET model configuration\n***"
RET=1
fi

matches=`grep -o "\"version_policy\":{\"specific\":{\"versions\":\[$VERSION\]}}" curl.out | wc -l`
if [ $matches -ne 1 ]; then
cat curl.out
echo -e "\n***\n*** Expected 1 version_policy:specific:versions, got $matches\n***"
RET=1
fi

kill $SERVER_PID
wait $SERVER_PID
}

# Prepare the file structure
VERSION_DEFAULT="1,3"
VERSION_H100="1"
VERSION_V100="2"
VERSION_CUSTOM="3"

# Distinguish configs with different model versions
(cd models/savedmodel_nobatch_float32_float32_float32 && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_DEFAULT] }}/" config.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/h100.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_H100] }}/" configs/h100.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/v100.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_V100] }}/" configs/v100.pbtxt)
(cd models/savedmodel_nobatch_float32_float32_float32 && \
cp config.pbtxt configs/config.pbtxt && \
sed -i "s/^version_policy:.*/version_policy: { specific: { versions: [$VERSION_CUSTOM] }}/" configs/config.pbtxt)

# Test default model config
SERVER_ARGS="--model-repository=`pwd`/models"
test_custom_config $VERSION_DEFAULT

# Test model-config-name=h100
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=h100"
test_custom_config $VERSION_H100

# Test model-config-name=v100
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=v100"
test_custom_config $VERSION_V100

# Test model-config-name=config
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=config"
test_custom_config $VERSION_CUSTOM

# Test model-config-name=h200. Expect fallback to the default config since the h200 config does not exist.
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name=h200"
test_custom_config $VERSION_DEFAULT

# Test model-config-name= (empty name). Expect the server to fail to start.
SERVER_ARGS="--model-repository=`pwd`/models --model-config-name="
run_server
if [ "$SERVER_PID" != "0" ]; then
echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
cat $SERVER_LOG
RET=1

kill $SERVER_PID
wait $SERVER_PID
fi

if [ $RET -eq 0 ]; then
echo -e "\n***\n*** Test Passed\n***"
else
echo -e "\n***\n*** Test Failed\n***"
fi

exit $RET
90 changes: 89 additions & 1 deletion qa/L0_lifecycle/lifecycle_test.py
@@ -1259,7 +1259,7 @@ def test_dynamic_file_delete(self):
model_version=1,
)
self.assertTrue(
False, "expected error for unavailable model " + graphdef_name
False, "expected error for unavailable model " + model_name
)
except Exception as ex:
self.assertIn("Request for unknown model", ex.message())
@@ -3405,6 +3405,94 @@ def test_shutdown_with_live_connection(self):
"exit timeout countdown restart detected",
)

def test_add_custom_config(self):
models_base = ("savedmodel",)
models = list()
for m in models_base:
models.append(tu.get_model_name(m, np.float32, np.float32, np.float32))

# Make sure the savedmodel model is ready with the default config versions (1 and 3)
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertTrue(triton_client.is_model_ready(model_name, "1"))
self.assertFalse(triton_client.is_model_ready(model_name, "2"))
self.assertTrue(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

# Add a custom model configuration, which causes the model to be
# re-loaded using the custom config inside the configs folder. This
# means that the version policy will change and only version 2 will
# be available.
for base_name, model_name in zip(models_base, models):
shutil.copyfile(
"config.pbtxt.custom." + base_name,
"models/" + model_name + "/configs/custom.pbtxt",
)

time.sleep(5) # wait for models to reload
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertFalse(triton_client.is_model_ready(model_name, "1"))
self.assertTrue(triton_client.is_model_ready(model_name, "2"))
self.assertFalse(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

def test_delete_custom_config(self):
models_base = ("savedmodel",)
models = list()
for m in models_base:
models.append(tu.get_model_name(m, np.float32, np.float32, np.float32))

# Make sure the savedmodel model is ready with the custom config version (2 only)
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertFalse(triton_client.is_model_ready(model_name, "1"))
self.assertTrue(triton_client.is_model_ready(model_name, "2"))
self.assertFalse(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))

# Delete the custom model configuration, which causes the model to be
# re-loaded using the default config. This means that the version
# policy will change and only versions 1 and 3 will be available
for model_name in models:
os.remove("models/" + model_name + "/configs/custom.pbtxt")

time.sleep(5) # wait for models to reload
for model_name in models:
try:
for triton_client in (
httpclient.InferenceServerClient("localhost:8000", verbose=True),
grpcclient.InferenceServerClient("localhost:8001", verbose=True),
):
self.assertTrue(triton_client.is_server_live())
self.assertTrue(triton_client.is_server_ready())
self.assertTrue(triton_client.is_model_ready(model_name, "1"))
self.assertFalse(triton_client.is_model_ready(model_name, "2"))
self.assertTrue(triton_client.is_model_ready(model_name, "3"))
except Exception as ex:
self.assertTrue(False, "unexpected error {}".format(ex))


if __name__ == "__main__":
unittest.main()