Unable to Run Phi 3 and 3.5 Vision on Mac [M1 & M3 Chips]: "Placeholder storage has not been allocated on MPS device!" #244

Harsha0056 opened this issue Jan 10, 2025 · 0 comments

Harsha0056 commented Jan 10, 2025

I initially used the example code from Hugging Face and tested it successfully on Google Colab with GPU compute. However, when I ran the same code on my local system, I kept device_map="auto" for the model and moved the inputs (input IDs) to "mps". This resulted in the following error:

"Placeholder storage has not been allocated on MPS device!"

Interestingly, I tested the same setup with the Qwen vision model, also on "mps", and it used the GPU without any issues.

Could this error indicate that the Phi 3 or Phi 3.5 Vision models are not supported on macOS GPUs? Any suggestions for fixing this issue?
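For reference, a quick check like the one below (a hypothetical diagnostic snippet, not part of the Hugging Face example, meant to be run after loading the model and building the inputs in the script further down) would show where the weights and the inputs actually end up:

print(next(model.parameters()).device)   # where device_map="auto" placed (some of) the weights
print(inputs["input_ids"].device)        # mps:0 after the .to("mps") call
if hasattr(model, "hf_device_map"):
    print(model.hf_device_map)           # per-module placement chosen by accelerate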

Below is the code from Hugging Face itself.

################Code#################

from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype="auto",
    _attn_implementation='eager'
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    num_crops=4
)

images = []

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
images.append(Image.open(requests.get(url, stream=True).raw))

messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize what you see in this image."}
]

prompt = processor.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(prompt, images, return_tensors="pt").to("mps")

generation_args = {
    "max_new_tokens": 1000,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(
    **inputs,
    eos_token_id=processor.tokenizer.eos_token_id,
    **generation_args
)

generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]

print(response)
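For comparison, the variant below is an untested sketch (not from the Hugging Face page) that skips device_map="auto" and places the whole model on MPS explicitly; the torch.float16 dtype is an assumption, since the dtype picked by "auto" may not be supported on MPS:

import torch

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,       # assumed dtype; "auto" may select one MPS cannot handle
    _attn_implementation='eager'     # flash-attention is unavailable on MPS
)
model = model.to("mps")              # put all weights on the Apple GPU instead of letting accelerate shard/offload them

inputs = processor(prompt, images, return_tensors="pt").to("mps")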
