-
Solved. Actually, the argument responsible for this logic is
-
Hi,
I'm using Triton Server to serve an ONNX model that takes a pretty big batch. My goal is to run it on CPU only, no matter how long the batch takes to execute.
Now I'm stuck on the default request timeout (30 seconds), after which Triton aborts the request:
I've only found this doc: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_schedule_policy.html
But I don't need the dynamic batcher; is there any model configuration argument for the default batcher?
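For reference, the queue policy described in that doc only takes effect when dynamic_batching is enabled in the model configuration; a minimal sketch of what it looks like in config.pbtxt (the timeout value here is purely illustrative, not what I actually used) would be roughly:

```
dynamic_batching {
  default_queue_policy {
    # How long a request may wait in the scheduler queue before it is
    # timed out, in microseconds (300 s here is just an example value).
    default_timeout_microseconds: 300000000
    # REJECT returns an error when the timeout expires instead of
    # delaying the request to a later batch.
    timeout_action: REJECT
    # Let individual requests override this timeout via the
    # schedule policy extension.
    allow_timeout_override: true
  }
}
```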
My model_configuration:
Triton server container version: "22.10"
UPD: Adding the lines below to the model config doesn't change the timeout (it's still 30 seconds).