Customized flags for backendRuntimes #140

kerthcet opened this issue Sep 11, 2024 · 2 comments
Labels
feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a label and requires one. needs-triage Indicates an issue or PR lacks a label and requires one.

Comments

@kerthcet
Member

What would you like to be added:

Right now, a backendRuntime supports at most two inferenceModes: Default and SpeculativeDecoding. What if users could define their own customized sets of flags and refer to them by mode in the backendRuntimeConfig? Flags in inference engines are really complex, so this would make them much easier to use.

Some of our users have little knowledge of inference engines, so they have no idea how to set the flags to make the engine perform better. This is where customized modes can help.

Generally, it would look like:

  backendRuntimeConfig:
    mode: CustomizedOne
    resources:
      limits:
        cpu: 8
        memory: "16Gi"
apiVersion: inference.llmaz.io/v1alpha1
kind: BackendRuntime
metadata:
  labels:
    app.kubernetes.io/name: backendruntime
    app.kubernetes.io/part-of: llmaz
    app.kubernetes.io/created-by: llmaz
  name: vllm
spec:
  args:
    - mode: Default
      flags:
        - --model
        - "{{ .ModelPath }}"
        - --served-model-name
        - "{{ .ModelName }}"
        - --host
        - "0.0.0.0"
        - --port
        - "8080"
    - mode: CustomizedOne # newly added
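
To make the proposal more concrete, here is a minimal Python sketch of how a mode named in backendRuntimeConfig might be resolved against the args list of a BackendRuntime. It is not llmaz's implementation: the function name `resolve_flags`, the in-memory dict layout, the regex-based placeholder substitution, and the flags listed under `CustomizedOne` are all assumptions made for illustration (the issue does not specify any flags for the new mode).

```python
import re

# Hypothetical in-memory form of the BackendRuntime `spec.args` list above.
BACKEND_RUNTIME_ARGS = [
    {
        "mode": "Default",
        "flags": [
            "--model", "{{ .ModelPath }}",
            "--served-model-name", "{{ .ModelName }}",
            "--host", "0.0.0.0",
            "--port", "8080",
        ],
    },
    {
        "mode": "CustomizedOne",
        # Illustrative flags only; the issue leaves this mode's flags unspecified.
        "flags": ["--model", "{{ .ModelPath }}", "--port", "8080"],
    },
]

# Matches placeholders such as "{{ .ModelPath }}" and captures the field name.
_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def resolve_flags(args, mode, values):
    """Return the flags of the entry whose mode matches, with placeholders filled in.

    Raises ValueError if no entry defines the requested mode, and KeyError if a
    placeholder refers to a field that is missing from `values`.
    """
    for entry in args:
        if entry["mode"] == mode:
            return [
                _PLACEHOLDER.sub(lambda m: values[m.group(1)], flag)
                for flag in entry["flags"]
            ]
    known = ", ".join(entry["mode"] for entry in args)
    raise ValueError(f"unknown mode {mode!r}; available modes: {known}")


if __name__ == "__main__":
    # Example model values; both are made up for this sketch.
    model = {"ModelPath": "/models/example", "ModelName": "example"}
    print(resolve_flags(BACKEND_RUNTIME_ARGS, "CustomizedOne", model))
```

Rejecting an unknown mode with the list of available ones is a design choice of this sketch, so that a misspelled mode in backendRuntimeConfig fails early rather than silently falling back to Default.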

Why is this needed:

To manage the flags better and to provide users with some best practices.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@InftyAI-Agent added the needs-triage, needs-kind, and needs-priority labels on Sep 11, 2024
@kerthcet
Member Author

/kind feature

@kerthcet
Member Author

Waiting for feedback.

@InftyAI-Agent added the feature label and removed the needs-kind label on Sep 11, 2024