[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots #10693

Closed
fabianlim opened this issue Nov 27, 2024 · 2 comments · Fixed by #10705
Labels
bug Something isn't working

Comments

@fabianlim

fabianlim commented Nov 27, 2024

Your current environment

No response

Model Input Dumps

No response

🐛 Describe the bug

In the current implementation of MambaCacheManager._assign_seq_id_to_cache_index, if cur_id is not among the finished requests, it will try to pop an index from free_cache_indices.

  • However, there seems to be an edge case where _assign_seq_id_to_cache_index aggressively pops free indices before _release_finished_requests has a chance to return them (see the simplified sketch below).
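For reference, here is a condensed, hypothetical sketch of the allocation/release pair as we understand it; the names mirror mamba_cache.py, but the bodies are simplified and are not the exact vLLM source:

```python
# Hypothetical, condensed sketch of the slot allocation/release path.
# Names mirror mamba_cache.py; bodies are simplified, NOT the vLLM source.
class MambaCacheManagerSketch:
    def __init__(self, max_batch_size: int):
        # one cache slot per potential sequence in a batch
        self.free_cache_indices = list(range(max_batch_size))
        self.mamba_cache_indices_mapping: dict[str, dict[int, int]] = {}

    def _assign_seq_id_to_cache_index(self, cur_id, seq_id,
                                      finished_requests_ids):
        if cur_id in finished_requests_ids:
            # finished request: no new slot is taken (details omitted)
            return -1
        seqs = self.mamba_cache_indices_mapping.get(cur_id, {})
        if seq_id in seqs:
            # the sequence already owns a slot: reuse it
            return seqs[seq_id]
        # New sequence: take a slot from the free pool. This raises
        # IndexError once the pool is empty, i.e. when slots of finished
        # requests have not yet been returned.
        destination_index = self.free_cache_indices.pop()
        self.mamba_cache_indices_mapping.setdefault(
            cur_id, {})[seq_id] = destination_index
        return destination_index

    def _release_finished_requests(self, finished_request_ids):
        # return the slots of finished requests to the free pool
        for req_id in finished_request_ids:
            released = self.mamba_cache_indices_mapping.pop(req_id, {})
            for index in released.values():
                self.free_cache_indices.append(index)
```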

We have some private experiments involving Mamba in which we reuse the above MambaCacheManager implementation, and we have observed errors like the one below:

  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
    ) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
    state_indices = self._prepare_current_run_mamba_cache(
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
    return [
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
    self._assign_seq_id_to_cache_index(req_id, seq_id,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
    destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list

which suggests the issue diagnosed above.

We have made sure that MambaCacheManager is initialized with max_batch_size equal to scheduler_config.max_num_seqs, which we set to 10 times our batch size. We use around 8 scheduler steps.

Question: how can we be sure that the cache occupancy will never exceed max_batch_size?
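To make the concern concrete, here is a toy model we can reason about (our own minimal simulation under the assumption that, within one step, allocations happen before releases; it is not vLLM code). A pool sized exactly to the number of concurrent sequences is then not enough, because the slot of a finished request is still occupied at the moment its replacement is allocated:

```python
# Toy model of the suspected ordering (not vLLM code): within one step,
# new sequences take slots BEFORE finished sequences give theirs back.
free = list(range(4))    # pool sized to the true concurrency: 4 slots
allocated = {}           # seq_id -> slot

def step(new_seqs, finished_seqs):
    for s in new_seqs:                 # allocations happen first ...
        allocated[s] = free.pop()      # IndexError once the pool is empty
    for s in finished_seqs:            # ... releases only afterwards
        free.append(allocated.pop(s))

step(["a", "b", "c", "d"], [])         # fills all 4 slots
step(["e"], ["a"])                     # "a" has finished, but its slot is
                                       # returned too late -> IndexError
```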

CC: @nelsonspbr

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
fabianlim added the bug label on Nov 27, 2024
fabianlim changed the title from "[Bug]:" to "[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots" on Nov 27, 2024
@mzusman
Contributor

mzusman commented Nov 27, 2024

Hi @fabianlim,
Thank you for finding this bug; we haven't had the chance to test Mamba-like models with the multi-step scheduler setup.
Since the SSM cache of Mamba models in vLLM is managed inside the modeling file, it can cause problems with the release and allocation of cache slots for incoming/finished requests, especially when new features like multi-step scheduling are introduced.
Please take a look at #10705, which I just opened to fix this bug.
Thanks!

@fabianlim
Author

Thank you very much! We will try it as soon as we get a chance! @mzusman
