secure deployment considerations guide #6533

Merged
merged 36 commits into from
Nov 10, 2023
bce3200
draft document
nnshah1 Nov 7, 2023
7b03d0a
updates
nnshah1 Nov 7, 2023
1825eff
updates
nnshah1 Nov 8, 2023
e2f0807
updated
nnshah1 Nov 8, 2023
a4f998f
updates
nnshah1 Nov 8, 2023
a68fc68
updates
nnshah1 Nov 8, 2023
553a138
updates
nnshah1 Nov 8, 2023
1d5bc30
updates
nnshah1 Nov 8, 2023
a8850ca
updates
nnshah1 Nov 8, 2023
355fbe4
updates
nnshah1 Nov 8, 2023
af8201d
updates
nnshah1 Nov 8, 2023
eef0980
updates
nnshah1 Nov 8, 2023
a7f1b5c
updates
nnshah1 Nov 8, 2023
d82e738
updates
nnshah1 Nov 8, 2023
3e12352
updates
nnshah1 Nov 8, 2023
64b3a83
updates
nnshah1 Nov 8, 2023
4e754c2
updates
nnshah1 Nov 8, 2023
1a605f2
updates
nnshah1 Nov 8, 2023
d6b283c
updates
nnshah1 Nov 8, 2023
0b33246
updates
nnshah1 Nov 8, 2023
e42d18a
updates
nnshah1 Nov 8, 2023
1cc6123
updates
nnshah1 Nov 8, 2023
d8a86f4
updates
nnshah1 Nov 8, 2023
39fdad8
updates
nnshah1 Nov 8, 2023
2247487
update
nnshah1 Nov 8, 2023
a9d2b8c
Merge branch 'main' into nnshah1-deployment-guide
nnshah1 Nov 8, 2023
449976f
updates
nnshah1 Nov 8, 2023
e0c20fd
Merge branch 'nnshah1-deployment-guide' of https://github.com/triton-…
nnshah1 Nov 8, 2023
03ff098
updates
nnshah1 Nov 8, 2023
a9f9b0f
Update docs/customization_guide/deploy.md
nnshah1 Nov 8, 2023
b054c39
Update docs/customization_guide/deploy.md
nnshah1 Nov 8, 2023
7828ceb
fixing typos
nnshah1 Nov 9, 2023
db6bb39
Merge branch 'nnshah1-deployment-guide' of https://github.com/triton-…
nnshah1 Nov 9, 2023
0d8869b
updated with clearer warnings
nnshah1 Nov 10, 2023
b36287b
updates to readme and toc
nnshah1 Nov 10, 2023
8a03269
Merge branch 'main' into nnshah1-deployment-guide
nnshah1 Nov 10, 2023
269 changes: 269 additions & 0 deletions docs/customization_guide/deploy.md
@@ -0,0 +1,269 @@
<!--
# Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Secure Deployment Considerations

The Triton Inference Server project is designed for flexibility and
allows developers to create and deploy inferencing solutions in a
variety of ways. Developers can deploy Triton as an HTTP server, a
gRPC server, a server supporting both, or embed a Triton server into
their own application. Developers can deploy Triton locally or in the
cloud, within a Kubernetes cluster, behind an API gateway or as a
standalone process. This guide is intended to provide some key points
and best practices that users deploying Triton based solutions should
consider.

| [Deploying Behind a Secure Gateway or Proxy](#deploying-behind-a-secure-proxy-or-gateway) | [Running with Least Privilege](#running-with-least-privilege) |

> [!IMPORTANT]
> Ultimately the security of a solution based on Triton
> is the responsibility of the developer building and deploying that
> solution. When deploying in production settings please have security
> experts review any potential risks and threats.

> [!WARNING]
> Allowing dynamic updates to the model repository can lead
> to arbitrary code execution. Model repository access control is
> critical in production deployments. Unless dynamic updates are required
> for operation, we recommend disabling them. If they are required, please
> ensure only trusted entities
> can add or remove models from a model repository.


## Deploying Behind a Secure Proxy or Gateway

The Triton Inference Server is designed primarily as a microservice to
be deployed as part of a larger solution within an application
framework or service mesh.

In such deployments it is typical to utilize dedicated gateway or
proxy servers to handle authorization, access control, resource
management, encryption, load balancing, redundancy and many other
security and availability features.

The full design of such systems is outside the scope of this
deployment guide but in such scenarios dedicated ingress controllers
handle access from outside the trusted network while Triton Inference
Server handles only trusted, validated requests.

In such scenarios Triton Inference Server is not exposed directly to
an untrusted network.

### References on Secure Deployments

In the following references, Triton Inference Server would be deployed
as an "Application" or "Service" within the trusted internal network.

* <https://www.nginx.com/blog/architecting-zero-trust-security-for-kubernetes-apps-with-nginx/>
* <https://istio.io/latest/docs/concepts/security/>
* <https://konghq.com/blog/enterprise/envoy-service-mesh>
* <https://www.solo.io/topics/envoy-proxy/>

## Running with Least Privilege

The security principle of least privilege advocates that a process be
granted the minimum permissions required to do its job.

For an inference solution based on Triton Inference Server there are a
number of ways to reduce security risks by limiting the permissions
and capabilities of the server to the minimum required for correct
operation.

### 1. Follow Best Practices for Launching Docker Containers

When Triton is deployed as a containerized service, standard Docker
security practices apply. These include limiting the resources that a
container has access to as well as limiting network access to the
container. For details, see <https://docs.docker.com/engine/security/>.

### 2. Run as a Non-Root User

Triton's pre-built containers contain a non-root user that can be used
to launch the tritonserver application with limited permissions. This
user, `triton-server`, is created with user ID `1000`. When launching
the container using Docker, the user can be set with the `--user`
command line option.

##### Example Launch Command

```
docker run --rm --user triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:YY.MM-py3 tritonserver --model-repository=/models
```
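
Building on the launch command above and the container practices from the previous section, the following is a minimal sketch of a more restrictive launch. The resource limits, the loopback-only publishing of port `8000` (Triton's default HTTP port) and the read-only repository mount are illustrative assumptions rather than settings prescribed by this guide, and they should be validated against your own workload. The script only prints the assembled command so it can be reviewed before use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder tag: substitute a real release for YY.MM.
IMAGE="nvcr.io/nvidia/tritonserver:YY.MM-py3"

docker_cmd=(
  docker run --rm
  --user triton-server                            # non-root user from the pre-built image
  --cap-drop=ALL                                  # drop all Linux capabilities (assumed sufficient)
  --security-opt=no-new-privileges                # block privilege escalation via setuid binaries
  --memory=8g --cpus=4 --pids-limit=512           # illustrative resource limits
  --publish 127.0.0.1:8000:8000                   # HTTP reachable from the local host only
  --volume "${PWD}/model_repository:/models:ro"   # model repository mounted read-only
  "${IMAGE}"
  tritonserver --model-repository=/models
)

# Print the command, shell-quoted, for review.
printf '%q ' "${docker_cmd[@]}"; printf '\n'
```

After review, the command can be executed with `"${docker_cmd[@]}"`. Mounting the repository read-only also prevents the server process itself from modifying model files.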

### 3. Restrict or Disable Access to Protocols and APIs

The pre-built Triton Inference Server application enables a full set
of features including health checks, server metadata, inference APIs,
shared memory APIs, model and model repository configuration,
statistics, tracing and logging. Care should be taken to only expose
those capabilities that are required for your solution.

#### Disabling Features at Compile Time

When building a custom inference server application, features can be
selectively enabled or disabled using the `build.py` script. As an
example, a developer can use the flags `--endpoint http` and
`--endpoint grpc` to compile support for `http`, `grpc` or
both. Support for individual backends can be enabled as well. For more
details please see the [documentation](build.md) on building a custom
inference server application.
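
As a rough sketch, a build limited to the gRPC endpoint and a single backend might be assembled as follows. The `--endpoint` flag is the one described above; `--enable-logging` and `--backend onnxruntime` are assumptions chosen for illustration, so confirm the exact flags against `./build.py --help` for your release. The script only prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

build_cmd=(
  ./build.py
  --enable-logging        # assumed flag: keep server logging available
  --endpoint grpc         # compile only the gRPC endpoint; HTTP is left out
  --backend onnxruntime   # assumed backend: include only what the solution needs
)

# Print the command, shell-quoted, for review.
printf '%q ' "${build_cmd[@]}"; printf '\n'
```

A feature that is never compiled in cannot be enabled by mistake at run time, which is the main advantage over relying on command line options alone.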

#### Disabling / Restricting Features at Run Time

The `tritonserver` application provides a number of command line
options to enable and disable features when launched. For a full list
of options please see `tritonserver --help`. The following subset is
described here with basic recommendations.

##### `--exit-on-error <boolean>, default True`

Exits the inference server if any error occurs during
initialization. Recommended to set to `True` to catch any
unanticipated errors.

##### `--disable-auto-complete-config, default enabled`

Prevents backends from auto-completing model configurations
(auto-completion is enabled by default). If auto-completion is not
required for your solution, disabling it is recommended to ensure model
configurations are defined statically.

##### `--strict-readiness <boolean>, default True`

If set to `True`, `/v2/health/ready` will only report ready when all
selected models are loaded. Recommended to set to `True` to provide a
signal to other services and orchestration frameworks when full
initialization is complete and the server is healthy.

##### `--model-control-mode <string>, default "none"`

Specifies the mode for model management.

> [!WARNING]
> Allowing dynamic updates to the model repository can lead
> to arbitrary code execution. Model repository access control is
> critical in production deployments. Unless dynamic updates are required
> for operation, we recommend disabling them. If they are required, please
> ensure only trusted entities
> can add or remove models from a model repository.

Options:

* `none` - Models are loaded at startup and cannot be modified.
* `poll` - The server process polls the model repository for changes.
* `explicit` - Models can be loaded and unloaded via the model control APIs.

Recommended to set to `none` unless dynamic updates are required. If
dynamic updates are required, care must be taken to control access to
the model repository files and to the load and unload APIs.

##### `--allow-http <boolean>, default True`

Enable HTTP request handling. Recommended to set to `False` if not required.

##### `--allow-grpc <boolean>, default True`

Enable gRPC request handling. Recommended to set to `False` if not required.

##### `--grpc-use-ssl <boolean> default False`

Use SSL authentication for gRPC requests. Recommended to set to `True` if service is not protected by a gateway or proxy.

##### `--grpc-use-ssl-mutual <boolean> default False`

Use mutual SSL authentication for gRPC requests. Recommended to set to `True` if service is not protected by a gateway or proxy.
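
A minimal sketch of enabling mutual TLS for gRPC is shown below. The certificate options `--grpc-server-cert`, `--grpc-server-key` and `--grpc-root-cert` are not described in this guide and are assumed here, as are the file names under the hypothetical `/certs` directory; check `tritonserver --help` for the options available in your release. The script only prints the resulting command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location of PEM-encoded files; override via the environment.
CERT_DIR="${CERT_DIR:-/certs}"

grpc_tls_args=(
  --allow-grpc=true
  --grpc-use-ssl=true
  --grpc-use-ssl-mutual=true                     # also require a client certificate
  --grpc-server-cert="${CERT_DIR}/server.crt"    # assumed option and file name
  --grpc-server-key="${CERT_DIR}/server.key"     # assumed option and file name
  --grpc-root-cert="${CERT_DIR}/ca.crt"          # assumed: CA used to verify clients
)

# Print the full command, shell-quoted, for review.
printf 'tritonserver --model-repository=/models'
printf ' %q' "${grpc_tls_args[@]}"; printf '\n'
```

The private key file should be readable only by the user that runs `tritonserver`.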

##### `--grpc-restricted-protocol <<string>:<string>=<string>>`

Restrict access to specific gRPC protocol categories to users who
present a specific key-value pair as a shared secret. See
[limit-endpoint-access](inference_protocols.md#limit-endpoint-access-beta)
for more information.

> [!Note]
> Restricting access can be used to limit exposure to model
> control APIs to trusted users.

##### `--http-restricted-api <<string>:<string>=<string>>`

Restrict access to specific HTTP API categories to users who
present a specific key-value pair as a shared secret. See
[limit-endpoint-access](inference_protocols.md#limit-endpoint-access-beta)
for more information.

> [!Note]
> Restricting access can be used to limit exposure to model
> control APIs to trusted users.
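
The two restriction options above could be combined along the following lines. This is a sketch under stated assumptions: the category names `model-repository`, `model-config` and `shared-memory` and the key name `admin-key` are illustrative and should be checked against the limit-endpoint-access documentation linked above. The shared secret is generated at launch time and redacted when the options are printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical key name; the secret is 16 random bytes rendered as hex.
ADMIN_KEY="admin-key"
ADMIN_SECRET="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Assumed category names covering model control and shared memory APIs.
ADMIN_CATEGORIES="model-repository,model-config,shared-memory"

restricted_args=(
  "--http-restricted-api=${ADMIN_CATEGORIES}:${ADMIN_KEY}=${ADMIN_SECRET}"
  "--grpc-restricted-protocol=${ADMIN_CATEGORIES}:${ADMIN_KEY}=${ADMIN_SECRET}"
)

# Print the options with the secret redacted so it does not end up in logs.
printf '%s\n' "${restricted_args[@]//${ADMIN_SECRET}/<redacted>}"
```

The options would be passed to the server as `tritonserver ... "${restricted_args[@]}"`, and the secret distributed only to the clients that are allowed to manage models.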

##### `--allow-sagemaker <boolean> default False`

Enable SageMaker request handling. Recommended to set to `False` unless required.

##### `--allow-vertex-ai <boolean> default depends on environment variable`

Enable Vertex AI request handling. Default is `True` if
`AIP_MODE=PREDICTION`, `False` otherwise. Recommended to set to
`False` unless required.

##### `--allow-metrics <boolean> default True`

Allow the server to publish Prometheus-style metrics. Recommended to set
to `False` if not required to avoid capturing or exposing any sensitive information.

##### `--trace-config level=<string> default "off"`

Configures tracing. Trace mode supports `triton` and `opentelemetry`. Unless tracing is required, `--trace-config level=off` should be set to avoid capturing or exposing any sensitive information.
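
Taken together, the recommendations above could translate into a launch command like the one sketched below. It assumes a deployment whose clients use gRPC only and that needs neither dynamic model updates, metrics nor tracing; adjust each flag to what your solution actually requires. The script only prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

triton_cmd=(
  tritonserver
  --model-repository=/models
  --exit-on-error=true
  --disable-auto-complete-config   # model configurations are defined statically
  --strict-readiness=true
  --model-control-mode=none        # no dynamic model updates
  --allow-http=false               # assumption: clients use gRPC only
  --allow-grpc=true
  --allow-sagemaker=false
  --allow-vertex-ai=false
  --allow-metrics=false
  --trace-config level=off
)

# Print the command, shell-quoted, for review.
printf '%q ' "${triton_cmd[@]}"; printf '\n'
```

Spelling out options that already match the defaults keeps the launch configuration explicit and makes any later change visible in review.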


##### `--backend-directory <string> default /opt/tritonserver/backends`

Directory where backend shared libraries are found.

> [!Warning]
> The ability to add or remove files in the backends directory
> must be access controlled. Adding untrusted files
> can lead to arbitrary code execution.

##### `--repoagent-directory <string> default /opt/tritonserver/repoagents`

Directory where repository agent shared libraries are found.

> [!Warning]
> The ability to add or remove files in the repoagents directory
> must be access controlled. Adding untrusted files
> can lead to arbitrary code execution.

##### `--cache-directory <string> default /opt/tritonserver/caches`

Directory where cache shared libraries are found.

> [!Warning]
> The ability to add or remove files in the caches directory
> must be access controlled. Adding untrusted files
> can lead to arbitrary code execution.
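
One way to spot overly permissive settings on these three directories is to look for entries that are writable by group or others. The helper below is a hypothetical sketch (the function name is not part of Triton); it inspects permission bits only and does not check ownership, ACLs or the permissions of parent directories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every entry under the given directories that is writable by
# group (mode bit 0020) or others (mode bit 0002). Symbolic links are
# skipped because their own mode bits are not meaningful.
find_loose_permissions() {
  local dir
  for dir in "$@"; do
    [ -d "${dir}" ] || continue   # ignore directories that do not exist
    find "${dir}" ! -type l \( -perm -0020 -o -perm -0002 \) -print
  done
}

# Default locations listed above; no output means nothing was flagged.
find_loose_permissions \
  /opt/tritonserver/backends \
  /opt/tritonserver/repoagents \
  /opt/tritonserver/caches
```

Any path that is printed deserves a closer look, since a shared library placed there would be loaded into the server process.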




