Customized flags for backendRuntimes #140
Labels
feature
Categorizes issue or PR as related to a new feature.
needs-priority
Indicates a PR lacks a label and requires one.
needs-triage
Indicates an issue or PR lacks a label and requires one.
What would you like to be added:
Right now, we have at most two inferenceModes in backendRuntime: Default and SpeculativeDecoding. What if people want to define their own customized flags for easy usage and refer to them by mode in the backendRuntimeConfig? Flags in the inference engine are really complex.
Some of our users have little knowledge of the inference engine, so they have no idea how to set the flags to make it perform better. This feature can help them.
Generally looks like:
Why is this needed:
It makes the flags easier to manage and lets us provide best practices to users.
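The idea of named flag presets can be pictured with a small Python sketch. It is only an illustration under assumptions: the `BackendRuntime` class, its `flag_presets` field, the `render` method, the `engine serve` command, and every flag string below are hypothetical and are not the project's API or any real engine's flags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackendRuntime:
    """Hypothetical stand-in for a backendRuntime: a command plus named flag presets."""

    command: list[str]
    # Maps a preset (mode) name to a list of flag templates.
    flag_presets: dict[str, list[str]] = field(default_factory=dict)

    def render(self, mode: str, **values: str) -> list[str]:
        """Return the full command line for the preset called `mode`."""
        if mode not in self.flag_presets:
            known = ", ".join(sorted(self.flag_presets))
            raise KeyError(f"unknown mode {mode!r}; known modes: {known}")
        # Fill placeholders such as {model} with the supplied values.
        flags = [flag.format(**values) for flag in self.flag_presets[mode]]
        return self.command + flags


# Illustrative presets only; the flag names are made up for this sketch.
runtime = BackendRuntime(
    command=["engine", "serve"],
    flag_presets={
        "Default": ["--model={model}"],
        "SpeculativeDecoding": ["--model={model}", "--draft-model={draft}"],
        # A user-defined preset that a config could reference by name.
        "LowLatency": ["--model={model}", "--max-batch-size=1"],
    },
)

args = runtime.render("LowLatency", model="my-model")
```

With presets like these, a user would only need to pick a mode name, while maintainers keep the tuned flag combinations in one place.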
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.