In the current implementation of MambaCacheManager._assign_seq_id_to_cache_index, if cur_id is not among the finished requests, it will try to pop a free_cache_index.
However, there appears to be an edge case where _assign_seq_id_to_cache_index aggressively pops free indices before _release_finished_requests has had a chance to return them.
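For illustration, here is a minimal sketch of the allocation pattern we are describing (a paraphrase, not the actual mamba_cache.py source; PAD_SLOT_ID and the exact method signatures are simplifications):

```python
# A simplified sketch of the allocation pattern described above; this is a
# paraphrase for illustration, NOT the actual vLLM mamba_cache.py source.
PAD_SLOT_ID = -1  # placeholder value for finished requests (assumption)


class SimplifiedMambaCacheManager:
    def __init__(self, max_batch_size: int):
        # One SSM-cache slot per possible in-flight sequence.
        self.free_cache_indices = list(range(max_batch_size))
        # request_id -> {seq_id: cache slot index}
        self.mamba_cache_indices_mapping: dict[str, dict[int, int]] = {}

    def _assign_seq_id_to_cache_index(self, cur_id: str, seq_id: int,
                                      finished_requests_ids: list[str]) -> int:
        if cur_id in finished_requests_ids:
            # Finished requests get a pad slot; no allocation happens.
            return PAD_SLOT_ID
        if cur_id not in self.mamba_cache_indices_mapping:
            # New request: take a slot from the free pool.  If slots of
            # finished requests were not released back in time, this pop
            # raises "IndexError: pop from empty list" (see traceback below).
            destination_index = self.free_cache_indices.pop()
            self.mamba_cache_indices_mapping[cur_id] = {seq_id: destination_index}
            return destination_index
        return self.mamba_cache_indices_mapping[cur_id][seq_id]

    def _release_finished_requests(self, finished_requests_ids: list[str]) -> None:
        # Return the slots of finished requests to the free pool.
        for req_id in finished_requests_ids:
            for index in self.mamba_cache_indices_mapping.pop(req_id, {}).values():
                self.free_cache_indices.append(index)
```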
We have some private experiments involving Mamba that reuse the above MambaCacheManager implementation, and we have observed errors like the one below:
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
state_indices = self._prepare_current_run_mamba_cache(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
return [
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
self._assign_seq_id_to_cache_index(req_id, seq_id,
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list
which is consistent with the issue described above.
We have made sure that MambaCacheManager is initialized with max_batch_size equal to scheduler_config.max_num_seqs, which we have set to 10 times our batch size. We use around 8 scheduler steps.
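For reference, our setup roughly corresponds to engine arguments like the following (the model name and numbers are illustrative, not our exact private configuration; max_num_seqs and num_scheduler_steps follow the vLLM version we are testing against and may differ in other releases):

```python
from vllm import LLM, SamplingParams

# Illustrative reproduction setup: max_num_seqs, which MambaCacheManager
# receives as its max_batch_size, is ~10x the client-side batch size, and
# multistep scheduling is enabled with 8 steps.
llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",  # example Mamba/Jamba model
    max_num_seqs=80,          # 10x a client batch size of 8
    num_scheduler_steps=8,    # multistep scheduling
)

outputs = llm.generate(["Hello"] * 8, SamplingParams(max_tokens=64))
```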
Question: how can we be sure that the cache occupancy will never exceed max_batch_size?

CC: @nelsonspbr
Hi @fabianlim,

Thank you for finding this bug; we haven't had the chance to test Mamba-like models with the multistep scheduler setup yet.
Since the SSM cache of Mamba models in vLLM is managed inside the modeling file, this can cause problems with the release and allocation of cache slots for incoming/finished requests, especially when new features like multistep scheduling are introduced.
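To make the ordering problem concrete, here is a toy sketch in plain Python (made-up names, not the actual vLLM code and not a description of what the fix below does): if a step allocates slots before the slots of already-finished requests have been returned to the pool, allocation can fail even though the pool is large enough for the live sequences.

```python
# Toy illustration of the ordering problem: the pool is large enough for the
# live sequences, but slots of finished requests are handed back too late.
free_slots = list(range(2))        # pool sized for 2 concurrent sequences
in_use: dict[str, int] = {}
pending_release: list[str] = []    # finished, but not yet released

def allocate(req_id: str) -> int:
    in_use[req_id] = free_slots.pop()   # fails once the pool runs dry
    return in_use[req_id]

def release_pending() -> None:
    while pending_release:
        free_slots.append(in_use.pop(pending_release.pop()))

allocate("a")
allocate("b")
pending_release += ["a", "b"]      # both requests finish

# Next step: allocating before releasing fails, even though two slots
# are logically free at this point.
try:
    allocate("c")
except IndexError:
    print("pop from empty list: finished slots were not released in time")

release_pending()                  # releasing first would have avoided this
```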
Please take a look at #10705, which I just opened to fix this bug.
Thanks!