Upstream merge 25 1 6 #350
Merged
gshtras merged 202 commits into main from upstream_merge_25_1_6 on Jan 8, 2025
+31,168 −16,735
Commits
Commits on Dec 16, 2024
Commits on Dec 17, 2024
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (vllm-project#10935)
Commits on Dec 18, 2024
Commits on Dec 19, 2024
Commits on Dec 20, 2024
Commits on Dec 21, 2024
Commits on Dec 22, 2024
Commits on Dec 23, 2024
Commits on Dec 24, 2024
[Bugfix][Hardware][CPU] Fix CPU input_positions creation for text-only inputs with mrope (vllm-project#11434)
Commits on Dec 25, 2024
Commits on Dec 26, 2024
Commits on Dec 27, 2024
[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (vllm-project#11566)
Commits on Dec 28, 2024
Commits on Dec 29, 2024
Commits on Dec 30, 2024
Commits on Dec 31, 2024
[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (vllm-project#11565)
Commits on Jan 1, 2025
Commits on Jan 2, 2025
[Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (vllm-project#10013)
Commits on Jan 3, 2025
Commits on Jan 4, 2025
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233)
Commits on Jan 5, 2025
Commits on Jan 6, 2025