[DO NOT MERGE] Upstream codebase diff #470

Draft

kzawora-intel wants to merge 1,504 commits into main from habana_main

kzawora-intel commented Nov 6, 2024 (edited)

Scope of changes:

Contiguous PA
Multi-step scheduling
Automatic prefix caching
Padding-aware scheduling/max_num_prefill_seqs
Guided decoding fixes
FP8 support (INC/w8a8/weights_load_device)
ApplyToppTopkScalar sampler optimization
LoRA/MultiLoRA support
FusedMoE support
Model changes (adding mark_steps)
Tests
FakeHPU mode
CI stuff (.jenkins, .github)
Lots of minor stuff (RNG, FSDPA flag, reduced block fragmentation)

github-advanced-security bot found potential problems

.github/workflows/cpu-test.yml

		@@ -0,0 +1,35 @@
		name: cpu-test

Check failure

Code scanning / Scorecard

Token-Permissions High

score is 0: no topLevel permission defined
Remediation tip: Visit https://app.stepsecurity.io/secureworkflow.
Tick the 'Restrict permissions for GITHUB_TOKEN'
Untick other options
NOTE: If you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click Remediation section below for further remediation help

kzawora-intel marked this pull request as draft

November 6, 2024 13:49

kzawora-intel added the habana label

github-advanced-security bot found potential problems

.github/workflows/codespell.yml

		@@ -0,0 +1,45 @@
		name: codespell

Check failure

Code scanning / Scorecard

Token-Permissions High

score is 0: no topLevel permission defined
Remediation tip: Visit https://app.stepsecurity.io/secureworkflow.
Tick the 'Restrict permissions for GITHUB_TOKEN'
Untick other options
NOTE: If you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click Remediation section below for further remediation help

github-advanced-security bot found potential problems

tests/distributed/test_utils.py

+def test_stateless_process_group(worker):
+    port1 = get_open_port()
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(("", port1))

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium test

'' binds a socket to all interfaces.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to bind the socket to a specific interface instead of all interfaces. In this case, we can bind it to the loopback interface 127.0.0.1, which is commonly used for local testing and development. This change will limit the socket to accept connections only from the local machine, reducing the security risks.

Suggested changeset 1

tests/distributed/test_utils.py

@@ -124,3 +124,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
         port2 = get_open_port()

Copilot is powered by AI and may make mistakes. Always verify output.
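
A minimal sketch of the loopback binding described in the suggestion above, independent of the test code in this PR. The helper name `bind_loopback` is hypothetical, and passing port `0` (letting the OS pick a free port) is an assumption made only to keep the example self-contained.

```python
import socket


def bind_loopback(port: int = 0) -> socket.socket:
    """Return a TCP socket bound to 127.0.0.1 only (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # "127.0.0.1" restricts the socket to the local machine;
        # "" or "0.0.0.0" would bind every IPv4 interface instead.
        s.bind(("127.0.0.1", port))
    except OSError:
        s.close()
        raise
    return s


with bind_loopback() as s:
    host, port = s.getsockname()
    print(host, port)
```

For a test that only talks to itself on one machine, the loopback address is usually sufficient; a multi-node test would need a routable address instead.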

github-advanced-security bot found potential problems

vllm/entrypoints/openai/api_server.py

    
    sock = socket.socket(family=family, type=socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(addr)

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium

'' binds a socket to all interfaces.

Copilot Autofix AI 7 days ago

To fix the problem, we need to ensure that the socket is not bound to all network interfaces. Instead, we should bind it to a specific interface. This can be achieved by modifying the create_server_socket function to check if the provided address is empty or 0.0.0.0 and replace it with a specific interface address.

Modify the create_server_socket function to check if the address is empty or 0.0.0.0.
If the address is empty or 0.0.0.0, replace it with a specific interface address (e.g., 127.0.0.1 for localhost).
Update the sock.bind(addr) call to use the modified address.

Suggested changeset 1

vllm/entrypoints/openai/api_server.py

@@ -759,2 +759,6 @@
+    # Bind to a specific interface if the address is empty or 0.0.0.0
+    if addr[0] in ("", "0.0.0.0"):
+        addr = ("127.0.0.1", addr[1])
     sock = socket.socket(family=family, type=socket.SOCK_STREAM)

Copilot is powered by AI and may make mistakes. Always verify output.
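
The address check in the suggested changeset can be isolated as a small pure function, which makes it easy to test. This is only a sketch: `normalize_bind_addr` and `WILDCARD_HOSTS` are hypothetical names, not part of vLLM, and the code covers the IPv4 wildcard only.

```python
from typing import Tuple

# Hosts that mean "all IPv4 interfaces" when passed to socket.bind().
WILDCARD_HOSTS = ("", "0.0.0.0")


def normalize_bind_addr(addr: Tuple[str, int]) -> Tuple[str, int]:
    """Replace an IPv4 wildcard host with loopback (hypothetical helper)."""
    host, port = addr
    if host in WILDCARD_HOSTS:
        return ("127.0.0.1", port)
    return (host, port)


print(normalize_bind_addr(("", 8000)))
print(normalize_bind_addr(("10.0.0.5", 8000)))
```

Note that rewriting a wildcard to loopback makes the server unreachable from other hosts, so whether this change is appropriate depends on how the API server is meant to be deployed. The IPv6 wildcard `::` is not handled here.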

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[' and containing many repetitions of 'AA(),'.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(' and containing many repetitions of 'AA=,'.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(A=' and containing many repetitions of ',A='.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(' and containing many repetitions of 'AA= ),A('.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[' and containing many repetitions of 'AA()'.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(' and containing many repetitions of 'AA=,'.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(A=' and containing many repetitions of ',A='.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(' and containing many repetitions of 'AA=)A('.

vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py

    
    # Llama3.2 models more reliable.
    TOOL_CALL_REGEX = re.compile(
        r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[A(A=' and containing many repetitions of ')A(A='.
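
The alerts above all stem from nested quantifiers combined with `.*`, which let the regex engine try many ways of splitting the same input. One way to sidestep that class of problem is to validate the text with a real parser rather than a single regex. The sketch below is a swapped-in technique, not the parser in this PR: it uses Python's standard `ast` module, `parse_tool_calls` is a hypothetical name, and it assumes tool calls use keyword arguments with literal values only.

```python
import ast
from typing import Any, Dict, List, Tuple


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Parse "[f(a=1), g(b='x')]" into [(name, kwargs), ...] (hypothetical helper)."""
    # ast.parse raises SyntaxError if the text is not a valid Python expression.
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of calls")

    calls = []
    for node in tree.body.elts:
        # Accept only plain calls such as name(...), not obj.method(...).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if node.args:
            raise ValueError("positional arguments are not supported")
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:
                raise ValueError("**kwargs unpacking is not supported")
            # literal_eval accepts only literals, so no code is executed.
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        calls.append((node.func.id, kwargs))
    return calls


print(parse_tool_calls("[get_weather(city='Paris', days=3)]"))
```

Malformed input fails with `SyntaxError` or `ValueError` instead of triggering backtracking. Very long or deeply nested input can still be expensive to parse, so a length limit on untrusted text remains advisable.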

github-advanced-security bot found potential problems

.github/workflows/sphinx-lint.yml (Fixed)

github-advanced-security bot found potential problems

benchmarks/disagg_benchmarks/round_robin_proxy.py

+            return resp
+    except Exception as e:
+        return web.Response(text=f"Error: {str(e)}", status=500)

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium test

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that detailed exception messages are not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block to log the exception and return a generic error message.

Import the logging module to enable logging of exceptions.
Configure the logging settings if not already configured.
Modify the exception handling block to log the exception and return a generic error message.

Suggested changeset 1

benchmarks/disagg_benchmarks/round_robin_proxy.py

@@ -2,3 +2,3 @@
 import itertools
+import logging
 import aiohttp
@@ -6,2 +6,3 @@
+logging.basicConfig(level=logging.ERROR)
@@ -39,3 +40,4 @@
     except Exception as e:
-        return web.Response(text=f"Error: {str(e)}", status=500)
+        logging.error("An error occurred while handling the request", exc_info=True)
+        return web.Response(text="An internal error has occurred!", status=500)

Copilot is powered by AI and may make mistakes. Always verify output.
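
The log-then-return-generic pattern from the suggestion can be shown without a web framework. The following is a minimal sketch under that assumption: `handle`, `GENERIC_ERROR`, and the logger name are hypothetical, and the `(status, body)` tuple stands in for an HTTP response object.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("proxy_sketch")

GENERIC_ERROR = "An internal error has occurred."


def handle(request_fn: Callable[[], str]) -> Tuple[int, str]:
    """Run request_fn; on failure, log the details and return a generic body."""
    try:
        return 200, request_fn()
    except Exception:
        # logger.exception records the full traceback on the server side only.
        logger.exception("Unhandled error while handling the request")
        return 500, GENERIC_ERROR


def failing_request() -> str:
    raise RuntimeError("upstream at 10.0.0.5:8100 refused connection")


print(handle(lambda: "ok"))
print(handle(failing_request))
```

The exception text, which may contain internal hosts or paths, stays in the server log and never reaches the client.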

github-advanced-security bot found potential problems

.github/workflows/lint-and-deploy.yaml (Fixed)

github-advanced-security bot found potential problems

.github/workflows/sphinx-lint.yml (Fixed)

aurickq and others added 20 commits

January 3, 2025 16:39


          [Model] Whisper model implementation (vllm-project#11280)

e1a5c2f

Co-authored-by: Aurick Qiao <[email protected]>


          [V1] Simplify Shutdown (vllm-project#11659)

80c751e


          [Bugfix] Fix ColumnParallelLinearWithLoRA slice (vllm-project#11708)

61fed92

Signed-off-by: ZincCat <[email protected]>


          [V1] Improve TP>1 Error Handling + Stack Trace (vllm-project#11721)

Co-authored-by: Tyler Michael Smith <[email protected]>


          [Misc]Add BNB quantization for Qwen2VL (vllm-project#11719)

a655eb3

Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>


          Update requirements-tpu.txt to support python 3.9 and 3.11 (vllm-project#11695)

bf0d97d

Signed-off-by: mgoin <[email protected]>


          [V1] Chore: cruft removal (vllm-project#11724)

ad0d567


          [V1] log GPU blocks num for MultiprocExecutor (vllm-project#11656)

e5d7ed0


          Update tool_calling.md (vllm-project#11701)

9c93636


          Update bnb.md with example for OpenAI (vllm-project#11718)

d1d4939


          [V1] Add RayExecutor support for AsyncLLM (api server) (vllm-project#11712)

fbf2564


          [V1] Add kv cache utils tests. (vllm-project#11513)

d91457d

Signed-off-by: xcnick <[email protected]>


          [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233)

300acb8

Signed-off-by: Yan Burman <[email protected]>
Signed-off-by: Ido Asraff <[email protected]>


          [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (vllm-project#11717)

eed11eb

Signed-off-by: DarkLight1337 <[email protected]>


          [Bugfix] Fix precision error in LLaVA-NeXT (vllm-project#11735)

ba214df

Signed-off-by: DarkLight1337 <[email protected]>


          [Model] Remove unnecessary weight initialization logic (vllm-project#11736)

65c0892

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>


          [Bugfix][V1] Fix test_kv_cache_utils.py (vllm-project#11738)

Signed-off-by: Jee Jee Li <[email protected]>


          [MISC] Replace c10::optional with std::optional (vllm-project#11730)

4068f4b

Signed-off-by: Lu Fang <[email protected]>


          [distributed] remove pynccl's redundant stream (vllm-project#11744)

635b897


          fix: [doc] fix typo (vllm-project#11751)

eba1717

Co-authored-by: Lancer <[email protected]>

jikunshang and others added 30 commits

January 17, 2025 04:15


          [CI]add genai-perf benchmark in nightly benchmark (vllm-project#10704)

fead53b

Signed-off-by: Kunshang Ji <[email protected]>


          [Doc] Add instructions on using Podman when SELinux is active (vllm-project#12136)

Signed-off-by: Yuan Tang <[email protected]>


          [Bugfix] Fix issues in CPU build Dockerfile (vllm-project#12135)

b8bfa46

Signed-off-by: Yuan Tang <[email protected]>


          [BugFix] add more is not None check in VllmConfig.__post_init__ (vllm-project#12138)

d1adb9b

Signed-off-by: Chen Zhang <[email protected]>


          [Misc] Add deepseek_vl2 chat template (vllm-project#12143)

d75ab55

Signed-off-by: Isotr0py <[email protected]>


          [ROCm][MoE] moe tuning support for rocm (vllm-project#12049)

8027a72

Signed-off-by: Divakar Verma <[email protected]>


          [V1] Move more control of kv cache initialization from model_executor to EngineCore (vllm-project#11960)

69d765f

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: Cody Yu <[email protected]>


          Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava

2d85682


          Check if kv_cache is tuple before calling split_kv_cache (#697)

a685225


          Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava

a293e2e


          [Misc][LoRA] Improve the readability of LoRA error messages (vllm-project#12102)

07934cc

Signed-off-by: Jee Jee Li <[email protected]>


          [CI/Build][CPU][Bugfix] Fix CPU CI (vllm-project#12150)

d4e6194

Signed-off-by: jiang1.li <[email protected]>


          [core] allow callable in collective_rpc (vllm-project#12151)

87a0c07

Signed-off-by: youkaichao <[email protected]>


          [CI] Cleanup run_tests.sh logs (#700)

7eea2df


          Merge remote-tracking branch 'upstream/main' into private/kzawora/rebase-2025-01-17

ce50b1a


          fix TP crashes

a128878


          make mypy happy

2e53e75


          ¿what the heck is incquark?

21f5fb2


          i forgot brackets again

f1e911d


          Multimodality fix for llava (#641)

ae67e4d

Multimodality fix for llava after rebase

Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```


          Rebase 2025-01-17 (#701)

018ce62


          Fix LoRA tests (#696)

b10992b

This PR updates `test/lora/utils.py` based on the latest rebase.


          Updating README_GAUDI in habana_main (#690)

1. This PR updates the habana_main README_GAUDI to the Technical Writer
reviewed version as seen in v1.19.0.
(The habana_main README_GAUDI and the v1.19.0 README_GAUDI had diverged.)
2. It also fixes broken URLs caused by the recent restructuring of the upstream
vllm examples folder.
3. It adds notes in the examples folder that redirect new users to
the Gaudi-specific examples in README_GAUDI.md.


          Change vllm-hpu-extension revision to ae726d4

293bd87


          Change vllm-hpu-extension revision to ae726d4 (#707)

cc069cb

Change vllm-hpu-extension revision to ae726d4


          Capabilities overhaul (#692)

fedf706

Supporting PR for HabanaAI/vllm-hpu-extension#76


          [SW-216156] Fix mixtral Fused MoE issues after rebase (#708)

37eb4fc


          Disable enforcing eager mode for mllama and deepseek_v3 on hpu (#713)

1df1c2c


          Fix for random sampler recompilations for incomplete batches (#663)

e977f2a

Changes the sampler used by dummy sequences to greedy if any
sequence in the batch is using greedy sampling. This prevents sampler recompilations.


          [SW-216413] - Fix new executors shutdown and shutdown_inc flow (#716)

a64571c

Co-authored-by: Michał Kuligowski <[email protected]>

Labels

habana

99 participants