In the current implementation of MambaCacheManager._assign_seq_id_to_cache_index, if cur_id is not among the finished requests, it will try to pop a free_cache_index.
However, there appears to be an edge case where _assign_seq_id_to_cache_index aggressively pops free indices before _release_finished_requests has had a chance to return them.
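For illustration, here is a minimal sketch of the allocation pattern we are describing (a paraphrase, not the actual mamba_cache.py source; PAD_SLOT_ID and the exact method signatures are simplifications):

```python
# A simplified sketch of the allocation pattern described above; this is a
# paraphrase for illustration, NOT the actual vLLM mamba_cache.py source.
PAD_SLOT_ID = -1  # placeholder value for finished requests (assumption)


class SimplifiedMambaCacheManager:
    def __init__(self, max_batch_size: int):
        # One SSM-cache slot per possible in-flight sequence.
        self.free_cache_indices = list(range(max_batch_size))
        # request_id -> {seq_id: cache slot index}
        self.mamba_cache_indices_mapping: dict[str, dict[int, int]] = {}

    def _assign_seq_id_to_cache_index(self, cur_id: str, seq_id: int,
                                      finished_requests_ids: list[str]) -> int:
        if cur_id in finished_requests_ids:
            # Finished requests get a pad slot; no allocation happens.
            return PAD_SLOT_ID
        if cur_id not in self.mamba_cache_indices_mapping:
            # New request: take a slot from the free pool.  If slots of
            # finished requests were not released back in time, this pop
            # raises "IndexError: pop from empty list" (see traceback below).
            destination_index = self.free_cache_indices.pop()
            self.mamba_cache_indices_mapping[cur_id] = {seq_id: destination_index}
            return destination_index
        return self.mamba_cache_indices_mapping[cur_id][seq_id]

    def _release_finished_requests(self, finished_requests_ids: list[str]) -> None:
        # Return the slots of finished requests to the free pool.
        for req_id in finished_requests_ids:
            for index in self.mamba_cache_indices_mapping.pop(req_id, {}).values():
                self.free_cache_indices.append(index)
```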
We have some private experiments involving Mamba that reuse the above MambaCacheManager implementation, and we have observed errors like the one below:
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
state_indices = self._prepare_current_run_mamba_cache(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
return [
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
self._assign_seq_id_to_cache_index(req_id, seq_id,
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list
which is consistent with the issue described above.
We have made sure that MambaCacheManager is initialized with max_batch_size equal to scheduler_config.max_num_seqs, which we have set to 10 times our batch size. We use around 8 scheduler steps.
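For reference, our setup roughly corresponds to engine arguments like the following (the model name and numbers are illustrative, not our exact private configuration; max_num_seqs and num_scheduler_steps follow the vLLM version we are testing against and may differ in other releases):

```python
from vllm import LLM, SamplingParams

# Illustrative reproduction setup: max_num_seqs, which MambaCacheManager
# receives as its max_batch_size, is ~10x the client-side batch size, and
# multistep scheduling is enabled with 8 steps.
llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",  # example Mamba/Jamba model
    max_num_seqs=80,          # 10x a client batch size of 8
    num_scheduler_steps=8,    # multistep scheduling
)

outputs = llm.generate(["Hello"] * 8, SamplingParams(max_tokens=64))
```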
Question: how can we be sure that the cache occupancy will never exceed max_batch_size?

CC: @nelsonspbr
Hi @fabianlim,

Thank you for finding this bug; we haven't had the chance to test Mamba-like models with the multistep scheduler setup yet.
Since the SSM cache of Mamba models in vLLM is managed inside the modeling file, this can cause problems with the release and allocation of cache slots for incoming/finished requests, especially when new features like multistep scheduling are introduced.
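To make the ordering problem concrete, here is a toy sketch in plain Python (made-up names, not the actual vLLM code and not a description of what the fix below does): if a step allocates slots before the slots of already-finished requests have been returned to the pool, allocation can fail even though the pool is large enough for the live sequences.

```python
# Toy illustration of the ordering problem: the pool is large enough for the
# live sequences, but slots of finished requests are handed back too late.
free_slots = list(range(2))        # pool sized for 2 concurrent sequences
in_use: dict[str, int] = {}
pending_release: list[str] = []    # finished, but not yet released

def allocate(req_id: str) -> int:
    in_use[req_id] = free_slots.pop()   # fails once the pool runs dry
    return in_use[req_id]

def release_pending() -> None:
    while pending_release:
        free_slots.append(in_use.pop(pending_release.pop()))

allocate("a")
allocate("b")
pending_release += ["a", "b"]      # both requests finish

# Next step: allocating before releasing fails, even though two slots
# are logically free at this point.
try:
    allocate("c")
except IndexError:
    print("pop from empty list: finished slots were not released in time")

release_pending()                  # releasing first would have avoided this
```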
Please take a look at #10705, which I just opened to fix this bug.
Thanks!