diff --git a/README.md b/README.md
index d29be6b94a..b62e049b3d 100644
--- a/README.md
+++ b/README.md
@@ -141,6 +141,7 @@ images.
 - [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
 - Examples for deploying Triton Inference Server with Kubernetes and Helm on [GCP](deploy/gcp/README.md),
   [AWS](deploy/aws/README.md), and [NVIDIA FleetCommand](deploy/fleetcommand/README.md)
+- [Secure Deployment Considerations](docs/customization_guide/deploy.md)
 
 ### Using Triton
diff --git a/docs/contents.md b/docs/contents.md
index 986cb0c9b2..d48007b621 100644
--- a/docs/contents.md
+++ b/docs/contents.md
@@ -55,6 +55,7 @@ user_guide/metrics
 user_guide/trace
 user_guide/jetson
 user_guide/v1_to_v2
+customization_guide/deploy
 ```
 
 ```{toctree}
@@ -97,4 +98,4 @@ customization_guide/test
 examples/jetson/README
 examples/jetson/concurrency_and_dynamic_batching/README
-```
\ No newline at end of file
+```
diff --git a/docs/customization_guide/deploy.md b/docs/customization_guide/deploy.md
new file mode 100644
index 0000000000..112a2cebcf
--- /dev/null
+++ b/docs/customization_guide/deploy.md
@@ -0,0 +1,279 @@
+
+
+# Secure Deployment Considerations
+
+The Triton Inference Server project is designed for flexibility and
+allows developers to create and deploy inferencing solutions in a
+variety of ways. Developers can deploy Triton as an HTTP server, a
+gRPC server, a server supporting both, or embed a Triton server into
+their own application. Developers can deploy Triton locally or in the
+cloud, within a Kubernetes cluster behind an API gateway, or as a
+standalone process. This guide is intended to provide key points
+and best practices that users deploying Triton-based solutions should
+consider.
+
+| [Deploying Behind a Secure Gateway or Proxy](#deploying-behind-a-secure-proxy-or-gateway) | [Running with Least Privilege](#running-with-least-privilege) |
+
+> [!IMPORTANT]
+> Ultimately the security of a solution based on Triton
+> is the responsibility of the developer building and deploying that
+> solution. When deploying in production settings please have security
+> experts review any potential risks and threats.
+
+> [!WARNING]
+> Dynamic updates to model repositories are disabled by
+> default. Enabling dynamic updates to model repositories either
+> through model loading APIs or through directory polling can lead to
+> arbitrary code execution. Model repository access control is
+> critical in production deployments. If dynamic updates are required,
+> ensure only trusted entities have access to model loading APIs and
+> model repository directories.
+
+## Deploying Behind a Secure Proxy or Gateway
+
+The Triton Inference Server is designed primarily as a microservice to
+be deployed as part of a larger solution within an application
+framework or service mesh.
+
+In such deployments it is typical to utilize dedicated gateway or
+proxy servers to handle authorization, access control, resource
+management, encryption, load balancing, redundancy, and many other
+security and availability features.
+
+The full design of such systems is outside the scope of this
+deployment guide, but in such scenarios dedicated ingress controllers
+handle access from outside the trusted network while Triton Inference
+Server handles only trusted, validated requests.
+
+In such scenarios Triton Inference Server is not exposed directly to
+an untrusted network.
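+
+For example, when a gateway or proxy runs alongside Triton on the same
+host or in the same Kubernetes pod, one option is to bind Triton's
+endpoints to a loopback or cluster-internal address so that only the
+co-located gateway can reach them. The sketch below assumes the
+`--http-address` and `--grpc-address` options; consult
+`tritonserver --help` for the options available in your version.
+
+```
+# Serve only on the loopback interface; external clients must go
+# through the gateway or proxy in front of Triton.
+tritonserver --model-repository=/models \
+    --http-address 127.0.0.1 \
+    --grpc-address 127.0.0.1
+```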
+
+### References on Secure Deployments
+
+In the following references, Triton Inference Server would be deployed
+as an "Application" or "Service" within the trusted internal network.
+
+* <https://www.nginx.com/blog/architecting-zero-trust-security-for-kubernetes-apps-with-nginx/>
+* <https://istio.io/latest/docs/concepts/security/>
+* <https://konghq.com/blog/enterprise/envoy-service-mesh>
+* <https://www.solo.io/topics/envoy-proxy/>
+
+## Running with Least Privilege
+
+The security principle of least privilege advocates that a process be
+granted the minimum permissions required to do its job.
+
+For an inference solution based on Triton Inference Server there are a
+number of ways to reduce security risks by limiting the permissions
+and capabilities of the server to the minimum required for correct
+operation.
+
+### 1. Follow Best Practices for Securing Kubernetes Deployments
+
+When deploying Triton within a Kubernetes pod, ensure that it is
+running with a service account with the fewest possible
+permissions. Ensure that you have configured [role-based access
+control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/)
+to limit access to resources and capabilities as required by your
+application.
+
+### 2. Follow Best Practices for Launching Standalone Docker Containers
+
+When Triton is deployed as a containerized service, standard Docker
+security practices apply. This includes limiting the resources that a
+container has access to as well as limiting network access to the
+container. See <https://docs.docker.com/engine/security/> for details.
+
+### 3. Run as a Non-Root User
+
+Triton's pre-built containers contain a non-root user that can be used
+to launch the tritonserver application with limited permissions. This
+user, `triton-server`, is created with user id `1000`. When launching
+the container using Docker the user can be set with the `--user`
+command line option.
+
+##### Example Launch Command
+
+```
+docker run --rm --user triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:YY.MM-py3 tritonserver --model-repository=/models
+```
+
+### 4. Restrict or Disable Access to Protocols and APIs
+
+The pre-built Triton Inference Server application enables a full set
+of features including health checks, server metadata, inference APIs,
+shared memory APIs, model and model repository configuration,
+statistics, tracing and logging. Care should be taken to only expose
+those capabilities that are required for your solution.
+
+#### Disabling Features at Compile Time
+
+When building a custom inference server application, features can be
+selectively enabled or disabled using the `build.py` script. As an
+example, a developer can use the flags `--endpoint http` and
+`--endpoint grpc` to compile support for `http`, `grpc`, or
+both. Support for individual backends can be enabled as well. For more
+details please see the [documentation](build.md) on building a custom
+inference server application.
+
+#### Disabling / Restricting Features at Run Time
+
+The `tritonserver` application provides a number of command line
+options to enable and disable features when launched. For a full list
+of options please see `tritonserver --help`. The following subset is
+described here with basic recommendations.
+
+##### `--exit-on-error, default True`
+
+Exits the inference server if any error occurs during
+initialization. Recommended to set to `True` to catch any
+unanticipated errors.
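+
+For example, a launch script or init container can rely on the process
+exit status to detect a failed initialization. The boolean flag syntax
+below is illustrative; see `tritonserver --help` for the exact form
+supported by your version.
+
+```
+# With --exit-on-error=true (the default) the server terminates with a
+# non-zero exit code if any model or backend fails to initialize.
+tritonserver --model-repository=/models --exit-on-error=true \
+    || echo "tritonserver failed to initialize" >&2
+```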
+
+##### `--disable-auto-complete-config, default enabled`
+
+Disables backends from autocompleting model configuration. If not
+required for your solution, it is recommended to disable
+auto-completion to ensure model configurations are defined statically.
+
+##### `--strict-readiness, default True`
+
+If set to `True`, `/v2/health/ready` will only report ready when all
+selected models are loaded. Recommended to set to `True` to provide a
+signal to other services and orchestration frameworks when full
+initialization is complete and the server is healthy.
+
+##### `--model-control-mode, default "none"`
+
+Specifies the mode for model management.
+
+> [!WARNING]
+> Allowing dynamic updates to the model repository can lead
+> to arbitrary code execution. Model repository access control is
+> critical in production deployments. Unless required for operation, it's recommended
+> to disable dynamic updates. If required, please ensure only trusted entities
+> can add or remove models from a model repository.
+
+Options:
+
+ * `none` - Models are loaded at startup and cannot be modified.
+ * `poll` - The server process will poll the model repository for changes.
+ * `explicit` - Models can be loaded and unloaded via the model control APIs.
+
+Recommended to set to `none` unless dynamic updates are required. If
+dynamic updates are required, care must be taken to control access to
+the model repository files and the load and unload APIs.
+
+##### `--allow-http, default True`
+
+Enable HTTP request handling. Recommended to set to `False` if not required.
+
+##### `--allow-grpc, default True`
+
+Enable gRPC request handling. Recommended to set to `False` if not required.
+
+##### `--grpc-use-ssl, default False`
+
+Use SSL authentication for gRPC requests. Recommended to set to `True` if the service is not protected by a gateway or proxy.
+
+##### `--grpc-use-ssl-mutual, default False`
+
+Use mutual SSL authentication for gRPC requests. Recommended to set to `True` if the service is not protected by a gateway or proxy.
+
+##### `--grpc-restricted-protocol <<string>:<string>=<string>>`
+
+Restrict access to specific gRPC protocol categories to users with a
+specific key, value pair shared secret. See
+[limit-endpoint-access](inference_protocols.md#limit-endpoint-access-beta)
+for more information.
+
+> [!NOTE]
+> Restricting access can be used to limit exposure of model
+> control APIs to trusted users.
+
+##### `--http-restricted-api <<string>:<string>=<string>>`
+
+Restrict access to specific HTTP API categories to users with a
+specific key, value pair shared secret. See
+[limit-endpoint-access](inference_protocols.md#limit-endpoint-access-beta)
+for more information.
+
+> [!NOTE]
+> Restricting access can be used to limit exposure of model
+> control APIs to trusted users.
+
+##### `--allow-sagemaker, default False`
+
+Enable SageMaker request handling. Recommended to set to `False` unless required.
+
+##### `--allow-vertex-ai, default depends on environment variable`
+
+Enable Vertex AI request handling. Default is `True` if
+`AIP_MODE=PREDICTION`, `False` otherwise. Recommended to set to
+`False` unless required.
+
+##### `--allow-metrics, default True`
+
+Allow the server to publish Prometheus-style metrics. Recommended to set
+to `False` if not required to avoid capturing or exposing any sensitive information.
+
+##### `--trace-config level=<string>, default "off"`
+
+Tracing mode. Trace mode supports `triton` and `opentelemetry`. Unless
+required, `--trace-config level=off` should be set to avoid capturing
+or exposing any sensitive information.
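+
+Putting the run time options above together, a hardened launch might
+look like the following sketch. The exact set of flags is illustrative
+and depends on which protocols and features your solution actually
+needs; verify each option against `tritonserver --help` for your
+version before relying on it.
+
+```
+# Illustrative hardened launch: static model loading, HTTP disabled,
+# metrics and tracing disabled, and model control protocols restricted
+# to callers presenting the admin-key header (placeholder values).
+tritonserver --model-repository=/models \
+    --model-control-mode=none \
+    --allow-http=false \
+    --allow-metrics=false \
+    --trace-config level=off \
+    --grpc-restricted-protocol=model-repository,model-config:admin-key=admin-value
+```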
+
+
+##### `--backend-directory, default /opt/tritonserver/backends`
+
+Directory where backend shared libraries are found.
+
+> [!WARNING]
+> Access to add or remove files from the backend directory
+> must be access controlled. Adding untrusted files
+> can lead to arbitrary code execution.
+
+##### `--repoagent-directory, default /opt/tritonserver/repoagents`
+
+Directory where repository agent shared libraries are found.
+
+> [!WARNING]
+> Access to add or remove files from the repoagent directory
+> must be access controlled. Adding untrusted files
+> can lead to arbitrary code execution.
+
+##### `--cache-directory, default /opt/tritonserver/caches`
+
+Directory where cache shared libraries are found.
+
+> [!WARNING]
+> Access to add or remove files from the cache directory
+> must be access controlled. Adding untrusted files
+> can lead to arbitrary code execution.
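+
+One way to reduce the risk of untrusted files reaching the model
+repository or these shared library directories is to keep them
+read-only at run time and to reuse the non-root user described
+earlier. The following Docker sketch is illustrative; whether the
+server still has everything it needs to run under these restrictions
+depends on your models and environment.
+
+```
+# Run as the non-root triton-server user, mount the model repository
+# read-only, and drop Linux capabilities the server does not need.
+docker run --rm \
+    --user triton-server \
+    --cap-drop=ALL --security-opt no-new-privileges \
+    -v ${PWD}/model_repository:/models:ro \
+    nvcr.io/nvidia/tritonserver:YY.MM-py3 \
+    tritonserver --model-repository=/models
+```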