[DO NOT MERGE] Upstream codebase diff #470
Draft
kzawora-intel wants to merge 1504 commits into main from habana_main
+104,289 −36,649
Commits
This pull request is big! We're only showing the most recent 250 commits.
Commits on Jan 4, 2025
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233)
Commits on Jan 6, 2025
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685)
Commits on Jan 8, 2025
[Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698)
Commits on Jan 10, 2025
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939)
Commits on Jan 11, 2025
[Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672)
Commits on Jan 16, 2025
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (vllm-project#8651)
Commits on Jan 17, 2025
[V1] Move more control of kv cache initialization from model_executor to EngineCore (vllm-project#11960)