Add fake HPU mode to Habana components #180
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
name: cpu-test | ||
|
||
on: | ||
# Trigger the workflow on push or pull request, | ||
# but only for the habana_main branch | ||
push: | ||
branches: | ||
- habana_main | ||
pull_request: | ||
branches: | ||
- habana_main | ||
|
||
|
||
jobs: | ||
cputest: | ||
runs-on: ubuntu-latest | ||
Review comment: Wouldn't it be safer to use a hardcoded Ubuntu version? |
||
strategy: | ||
matrix: | ||
python-version: ["3.10"] | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Set up Python ${{ matrix.python-version }} | ||
uses: actions/setup-python@v2 | ||
with: | ||
python-version: ${{ matrix.python-version }} | ||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
pip install torch --extra-index-url https://download.pytorch.org/whl/cpu | ||
pip install -r requirements-hpu.txt | ||
VLLM_TARGET_DEVICE=hpu python setup.py develop | ||
- name: cpu-test | ||
run: | | ||
VLLM_SKIP_WARMUP=true VLLM_PROMPT_SEQ_BUCKET_MAX=128 python examples/offline_inference_fakehpu.py | ||
Review comment: Running with warmup would be a bonus validation, don't you think? It would probably be better to limit the number of buckets, so that warmup does not take too long, instead of disabling it. |
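One way to act on this suggestion is to keep warmup enabled while keeping the bucket range small. The sketch below only builds the environment for the test command. `build_test_env` is a hypothetical helper that is not part of the PR; the two variable names are the ones already used in the workflow above, and the idea that leaving `VLLM_SKIP_WARMUP` unset re-enables warmup is an assumption the diff does not confirm.

```python
import os
from typing import Dict, Mapping, Optional


def build_test_env(base: Optional[Mapping[str, str]] = None,
                   seq_bucket_max: int = 128,
                   skip_warmup: bool = False) -> Dict[str, str]:
    """Return a copy of the environment prepared for the fake-HPU test.

    Hypothetical helper for illustration only. It sets the prompt sequence
    bucket limit and either sets or removes the skip-warmup switch.
    """
    env = dict(os.environ if base is None else base)
    env["VLLM_PROMPT_SEQ_BUCKET_MAX"] = str(seq_bucket_max)
    if skip_warmup:
        env["VLLM_SKIP_WARMUP"] = "true"
    else:
        # Assumption: warmup runs when the variable is absent.
        env.pop("VLLM_SKIP_WARMUP", None)
    return env


# Start from an environment that still disables warmup and re-enable it.
env = build_test_env({"PATH": "/usr/bin", "VLLM_SKIP_WARMUP": "true"})
print(sorted(env))
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when launching `examples/offline_inference_fakehpu.py`.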
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
from vllm import LLM, SamplingParams | ||
|
||
# Sample prompts. | ||
prompts = [ | ||
"Berlin is the capital city of ", | ||
"Louvre is located in the city called ", | ||
"Barack Obama was the 44th president of ", | ||
"Warsaw is the capital city of ", | ||
"Gniezno is a city in ", | ||
"Hebrew is an official state language of ", | ||
"San Francisco is located in the state of ", | ||
"Llanfairpwllgwyngyll is located in country of ", | ||
] | ||
ref_answers = [ | ||
"Germany", "Paris", "United States", "Poland", "Poland", "Israel", | ||
"California", "Wales" | ||
] | ||
# Create a sampling params object. | ||
sampling_params = SamplingParams(temperature=0, n=1, use_beam_search=False) | ||
|
||
# Create an LLM. | ||
llm = LLM(model="facebook/opt-125m", max_model_len=32, max_num_seqs=4) | ||
# Generate texts from the prompts. The output is a list of RequestOutput objects | ||
# that contain the prompt, generated text, and other information. | ||
outputs = llm.generate(prompts, sampling_params) | ||
# Print the outputs. | ||
for output, answer in zip(outputs, ref_answers): | ||
prompt = output.prompt | ||
generated_text = output.outputs[0].text | ||
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") | ||
assert answer in generated_text, ( | ||
f"The generated text does not contain the correct answer: {answer}") | ||
print('PASSED') |
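The check at the end of this script can be separated from the vLLM call so that it is testable without a model. A minimal sketch, assuming a hypothetical helper `find_mismatches` (not part of the PR) that applies the same substring test to plain strings and reports every failure instead of stopping at the first one:

```python
from typing import List, Tuple


def find_mismatches(generated: List[str],
                    expected: List[str]) -> List[Tuple[int, str]]:
    """Return (index, expected_answer) for every output lacking its answer.

    Hypothetical helper, not part of the PR. It mirrors the script's
    `assert answer in generated_text` check on plain strings.
    """
    if len(generated) != len(expected):
        # zip() would silently drop the extras, so fail loudly instead.
        raise ValueError("generated and expected must have the same length")
    return [(i, answer)
            for i, (text, answer) in enumerate(zip(generated, expected))
            if answer not in text]


# Made-up strings standing in for model output.
sample_outputs = ["Germany. It is", "the Louvre", "Poland and"]
sample_answers = ["Germany", "Paris", "Poland"]
print(find_mismatches(sample_outputs, sample_answers))
```

Collecting all mismatches makes a CI log show every failing prompt in one run, whereas the bare `assert` in the script stops at the first failure.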
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -100,6 +100,7 @@ def forward( | |
kv_cache: torch.Tensor, | ||
attn_metadata: AttentionMetadata, | ||
) -> torch.Tensor: | ||
# import pdb; pdb.set_trace() | ||
Review comment: I guess this comment is not needed. |
||
qkv, _ = self.qkv_proj(hidden_states) | ||
q, k, v = qkv.chunk(chunks=3, dim=-1) | ||
attn_output = self.attn(q, k, v, kv_cache, attn_metadata) | ||
|
@@ -254,7 +255,6 @@ def forward( | |
if self.project_in is not None: | ||
inputs_embeds, _ = self.project_in(inputs_embeds) | ||
hidden_states = inputs_embeds + pos_embeds | ||
|
||
for i in range(len(self.layers)): | ||
layer = self.layers[i] | ||
hidden_states = layer(hidden_states, kv_caches[i], attn_metadata) | ||
|
Review comment: What do you think about also adding habana_next? Just temporarily, while we maintain two branches.