Pull requests: vllm-project/vllm
[Bugfix] Remove comments re: pytorch for outlines + compressed-tensors dependencies
#12260 · opened Jan 21, 2025 by tdoublep · labels: ci/build, ready
[V1][Bugfix] Fix data item ordering in mixed-modality inference
#12259 · opened Jan 21, 2025 by ywang96 · labels: ready
[core] separate builder init and builder prepare for each batch
#12253 · opened Jan 21, 2025 by youkaichao
[Model] Enable Inference Support for the New Baichuan-M1 Model
#12251 · opened Jan 21, 2025 by rainkert · labels: documentation
[torch.compile] decouple compile sizes and cudagraph sizes
#12243 · opened Jan 21, 2025 by youkaichao
[Frontend] Set server's maximum number of generated tokens using generation_config.json
#12242 · opened Jan 21, 2025 by mhendrey · labels: frontend
[Kernel] fix moe_align_block_size error condition
#12239 · opened Jan 21, 2025 by jinzhen-lin
[Docs] Update FP8 KV Cache documentation
#12238 · opened Jan 21, 2025 by mgoin · labels: documentation
[Misc] Set default backend to SDPA for get_vit_attn_backend
#12235 · opened Jan 21, 2025 by wangxiyuan · labels: ready
[Misc] Move find_loaded_library to platform_aware_utils.py
#12231 · opened Jan 20, 2025 by houseroad
[V1][Spec Decode] Ngram Spec Decode
#12193 · opened Jan 19, 2025 by LiuXiaoxuanPKU · Draft · 1 of 7 tasks
[Bugfix] fix race condition that leads to wrong order of tokens returned
#12192 · opened Jan 19, 2025 by joennlae
[Kernel] add triton fused moe kernel for gptq/awq
#12185 · opened Jan 18, 2025 by jinzhen-lin
[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor
#12167 · opened Jan 17, 2025 by kzawora-intel
[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
#12158 · opened Jan 17, 2025 by cennn
[Core] Optimize topp/topk calculation in sampler
#12156 · opened Jan 17, 2025 by afierka-intel
[WIP][Hardware][CPU] testing branch for mlperf
#12141 · opened Jan 17, 2025 by bigPYJ1151 · Draft · labels: ci/build, documentation, needs-rebase
[Misc] Update to Transformers 4.48
#12120 · opened Jan 16, 2025 by tlrmchlsmth · labels: ci/build, ready
[BUILD] Add VLLM_BUILD_EXT to control custom op build
#12116 · opened Jan 16, 2025 by MengqingCao · labels: ci/build
[Misc] add modules_to_not_convert attribute to gptq series
#12103 · opened Jan 16, 2025 by 1096125073