Customized flags for backendRuntimes #140

kerthcet · 2024-09-11T02:46:26Z

What would you like to be added:

Right now, we have at most two inferenceModes in backendRuntime: one is Default, the other is SpeculativeDecoding. What if people want to define their own customized flags as a named mode for easy reuse, and then refer to that mode in the backendRuntimeConfig? Flags in inference engines are really complex.

Some of our users have little knowledge of the inference engine, so they have no idea how to set the flags to make it perform better. This is where customized modes can help.

It would generally look like this:

  backendRuntimeConfig:
    mode: CustomizedOne
    resources:
      limits:
        cpu: 8
        memory: "16Gi"

apiVersion: inference.llmaz.io/v1alpha1
kind: BackendRuntime
metadata:
  labels:
    app.kubernetes.io/name: backendruntime
    app.kubernetes.io/part-of: llmaz
    app.kubernetes.io/created-by: llmaz
  name: vllm
spec:
  args:
    - mode: Default
      flags:
        - --model
        - "{{ .ModelPath }}"
        - --served-model-name
        - "{{ .ModelName }}"
        - --host
        - "0.0.0.0"
        - --port
        - "8080"
    - mode: CustomizedOne # newly added.
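
To make the idea concrete, here is a rough sketch of what a filled-in custom mode might look like. This is only an illustration, not the proposed API: the shape simply mirrors the existing Default entry, and the two tuning flags (`--gpu-memory-utilization`, `--max-model-len`) and their values are assumptions picked as examples of options a user may not know how to set.

```yaml
# Hypothetical sketch: the extra flags and their values are illustrative only.
spec:
  args:
    - mode: CustomizedOne
      flags:
        - --model
        - "{{ .ModelPath }}"
        - --served-model-name
        - "{{ .ModelName }}"
        - --host
        - "0.0.0.0"
        - --port
        - "8080"
        # Assumed tuning flags; the values are placeholders, not recommendations.
        - --gpu-memory-utilization
        - "0.9"
        - --max-model-len
        - "4096"
```

A user would then only need to set `mode: CustomizedOne` in the backendRuntimeConfig, without knowing the individual flags.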

Why is this needed:

It makes the flags easier to manage and lets us provide some best practices to users.

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

The artifacts should be linked in subsequent comments.


kerthcet · 2024-09-11T02:46:48Z

/kind feature

kerthcet · 2024-09-11T02:47:01Z

Waiting for feedback.

InftyAI-Agent added the needs-triage, needs-kind, and needs-priority labels · Sep 11, 2024

InftyAI-Agent added the feature label and removed the needs-kind label · Sep 11, 2024
