Batching is the most common parameter in AI benchmarking, and it applies to virtually every model.
We currently support batching via the `--script-args` argument, which allows a batching parameter to be passed through to the input script, and therefore to the model. We also support `--script-args="--batch_size N"` as the "official" semantics for passing batch size to a turnkey model.
However, we have a major flaw: batch size is never reflected in our results. We report throughput as "invocations per second", not "inferences per second". The latter would be far more useful.
To truly report "inferences per second" we need to somehow parse the batch size and then pass it into the benchmarking software, so that we can report `inferences_per_second = invocations_per_second * batch_size`.
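As a minimal sketch of that computation (not TurnkeyML's actual benchmarking code; `run_invocation` and `iterations` are hypothetical names):

```python
import time


def measure_ips(run_invocation, batch_size: int, iterations: int = 100) -> float:
    """Time repeated invocations and scale by batch size to get inferences/second.

    run_invocation: hypothetical callable that executes one batched forward pass.
    batch_size: must match the batch dimension of the inputs the model was built with.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        run_invocation()
    elapsed = time.perf_counter() - start

    invocations_per_second = iterations / elapsed
    # The change proposed in this issue: scale by batch size.
    return invocations_per_second * batch_size
```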
There may not be any perfect way to solve this, but we should still do something. Some issues with potential solutions:
- Models/applications that were not created by the TurnkeyML maintainers may use other arg names for batching (e.g., `--batch`, `--batching`, etc.); see the hypothetical script after this list.
- Batch size is usually the outermost dimension of the input tensor, but not always (e.g., in an LSTM it is the second dimension).
- Batch size may be hardcoded in the application (not configurable as a script arg at all).
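For example, a hypothetical user script (not one of ours) might expose batching under a different flag name, or bake the batch dimension directly into its input shapes:

```python
# Hypothetical user script, not a TurnkeyML example.
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--batch", type=int, default=8)  # named --batch, not --batch_size
args = parser.parse_args()

# Batch is the outermost dimension here, but a sequence-first LSTM would take
# inputs of shape (seq_len, batch, features) instead.
inputs = torch.randn(args.batch, 3, 224, 224)

# A different script might simply hardcode the batch size:
# inputs = torch.randn(8, 3, 224, 224)
```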
A bulletproof (if verbose) solution could look like this (a rough sketch of the resolution logic follows the list):
- `batch_size` is a reserved `--script-args` name that indicates the batch size and will be used in IPS computations.
- A new CLI arg `--batch-arg-name` can override the reserved term `batch_size` to some other name, such as `batching`, in corner cases where the model/app developer has named their arg something else.
- A new CLI arg `--batch-size=N` can set the batch size both in the inputs (by setting `--script-args="<batch_arg_name>=N"`) and in the IPS calculation. This is needed in the case where batch size is hardcoded in the application.
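A rough sketch of how these flags could be resolved into the single batch size used for IPS (function and variable names are assumptions, not TurnkeyML internals):

```python
from typing import Dict, Optional


def resolve_batch_size(
    script_args: Dict[str, str],
    batch_arg_name: str = "batch_size",    # overridable via --batch-arg-name
    batch_size_flag: Optional[int] = None,  # value of --batch-size, if given
) -> int:
    """Determine the batch size to use in inferences-per-second computations."""
    if batch_size_flag is not None:
        # --batch-size=N: forward the value into the script args so the model
        # and the IPS calculation agree, then use it directly.
        script_args[batch_arg_name] = str(batch_size_flag)
        return batch_size_flag
    if batch_arg_name in script_args:
        # The reserved (or overridden) script-arg name was provided by the user.
        return int(script_args[batch_arg_name])
    # No batching information: fall back to 1, so IPS degrades to
    # invocations per second.
    return 1
```

For example, `resolve_batch_size({"batch": "16"}, batch_arg_name="batch")` would return 16, and that same value would feed both the input script and the reported IPS.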
cc @danielholanda @viradhak-amd @ramkrishna2910