[DO NOT MERGE] Upstream codebase diff #470

Draft
wants to merge 1,504 commits into
base: main

@kzawora-intel commented Nov 6, 2024

Scope of changes:

  • Contiguous PA
  • Multi-step scheduling
  • Automatic prefix caching
  • Padding-aware scheduling/max_num_prefill_seqs
  • Guided decoding fixes
  • FP8 support (INC/w8a8/weights_load_device)
  • ApplyToppTopkScalar sampler optimization
  • LoRA/MultiLoRA support
  • FusedMoE support
  • Model changes (adding mark_steps)
  • Tests
  • FakeHPU mode
  • CI stuff (.jenkins, .github)
  • Lots of minor stuff (RNG, FSDPA flag, reduced block fragmentation)

@@ -0,0 +1,35 @@
name: cpu-test

Check failure

Code scanning / Scorecard

Token-Permissions High

score is 0: no topLevel permission defined
Remediation tip: Visit https://app.stepsecurity.io/secureworkflow.
Tick the 'Restrict permissions for GITHUB_TOKEN'
Untick other options
NOTE: If you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click Remediation section below for further remediation help
@kzawora-intel marked this pull request as draft November 6, 2024 13:49
@kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) Nov 8, 2024
@@ -0,0 +1,45 @@
name: codespell

Check failure

Code scanning / Scorecard

Token-Permissions High

score is 0: no topLevel permission defined
def test_stateless_process_group(worker):
    port1 = get_open_port()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", port1))

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium test

'' binds a socket to all interfaces.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to bind the socket to a specific interface instead of all interfaces. In this case, we can bind it to the loopback interface 127.0.0.1, which is commonly used for local testing and development. This change will limit the socket to accept connections only from the local machine, reducing the security risks.

Suggested changeset 1
tests/distributed/test_utils.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/distributed/test_utils.py b/tests/distributed/test_utils.py
--- a/tests/distributed/test_utils.py
+++ b/tests/distributed/test_utils.py
@@ -124,3 +124,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
         port2 = get_open_port()
EOF
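
A minimal sketch of the loopback binding that the patch suggests, shown outside the test itself. `bind_loopback` is a hypothetical helper and does not exist in `tests/distributed/test_utils.py`; passing port 0 asks the OS to pick a free port.

```python
import socket


def bind_loopback(port: int = 0) -> socket.socket:
    """Return a TCP socket bound to 127.0.0.1 only (hypothetical helper).

    With port=0 the operating system chooses a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    return sock


# The socket is reachable from the local machine only.
with bind_loopback() as s:
    host, port = s.getsockname()
    print(f"bound to {host}:{port}")
```

Binding to `""` instead would make the same port reachable on every interface of the host, which is what the CodeQL warning is about.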

sock = socket.socket(family=family, type=socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(addr)

Check warning

Code scanning / CodeQL

Binding a socket to all network interfaces Medium

'' binds a socket to all interfaces.

Copilot Autofix AI 7 days ago

To fix the problem, we need to ensure that the socket is not bound to all network interfaces. Instead, we should bind it to a specific interface. This can be achieved by modifying the create_server_socket function to check if the provided address is empty or 0.0.0.0 and replace it with a specific interface address.

  1. Modify the create_server_socket function to check if the address is empty or 0.0.0.0.
  2. If the address is empty or 0.0.0.0, replace it with a specific interface address (e.g., 127.0.0.1 for localhost).
  3. Update the sock.bind(addr) call to use the modified address.
Suggested changeset 1
vllm/entrypoints/openai/api_server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/vllm/entrypoints/openai/api_server.py b/vllm/entrypoints/openai/api_server.py
--- a/vllm/entrypoints/openai/api_server.py
+++ b/vllm/entrypoints/openai/api_server.py
@@ -759,2 +759,6 @@
 
+    # Bind to a specific interface if the address is empty or 0.0.0.0
+    if addr[0] in ("", "0.0.0.0"):
+        addr = ("127.0.0.1", addr[1])
+
     sock = socket.socket(family=family, type=socket.SOCK_STREAM)
EOF
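
The three steps above can be sketched as a small pure function. `normalize_bind_addr` and `WILDCARD_HOSTS` are hypothetical names used for illustration only; they are not part of `vllm/entrypoints/openai/api_server.py`.

```python
from typing import Tuple

# Hosts that mean "listen on every IPv4 interface" (assumed set for this sketch).
WILDCARD_HOSTS = ("", "0.0.0.0")


def normalize_bind_addr(
    addr: Tuple[str, int], fallback_host: str = "127.0.0.1"
) -> Tuple[str, int]:
    """Replace a wildcard host with a specific interface address."""
    host, port = addr
    if host in WILDCARD_HOSTS:
        return (fallback_host, port)
    return (host, port)


print(normalize_bind_addr(("", 8000)))
print(normalize_bind_addr(("10.0.0.5", 8000)))
```

Note that a rewrite like this also changes behavior for callers who pass `0.0.0.0` on purpose: the server would no longer accept connections from other machines, so the suggestion needs review before it is applied.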
# Llama3.2 models more reliable.

TOOL_CALL_REGEX = re.compile(
    r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '[' and containing many repetitions of 'AA(),'.
CodeQL reports the same "Inefficient regular expression" failure (High) eight more times against other parts of this TOOL_CALL_REGEX pattern. Each part may cause exponential backtracking on strings:

  • starting with '[A(' and containing many repetitions of 'AA=,' (reported twice)
  • starting with '[A(A=' and containing many repetitions of ',A=' (reported twice)
  • starting with '[A(' and containing many repetitions of 'AA= ),A('
  • starting with '[' and containing many repetitions of 'AA()'
  • starting with '[A(' and containing many repetitions of 'AA=)A('
  • starting with '[A(A=' and containing many repetitions of ')A(A='
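
One way to sidestep the nested quantifiers that CodeQL flags is to let Python's own parser validate the call list instead of a regular expression. The sketch below is an assumption about how that could look, not what this PR does; `parse_tool_calls` is a hypothetical name, and it accepts only keyword arguments with literal values.

```python
import ast
from typing import Any, Dict, List, Tuple


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Parse "[f(a=1), g(b='x')]" into [(name, kwargs), ...].

    Raises ValueError for anything that is not a non-empty list of
    keyword-only calls with literal argument values.
    """
    try:
        tree = ast.parse(text.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid call list") from exc

    body = tree.body
    if not isinstance(body, ast.List) or not body.elts:
        raise ValueError("expected a non-empty list of calls")

    calls = []
    for node in body.elts:
        is_plain_call = isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        if not is_plain_call or node.args:
            raise ValueError("expected name(key=value, ...) entries")
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:  # rejects **kwargs unpacking
                raise ValueError("unpacking is not allowed")
            # literal_eval raises ValueError for non-literal values.
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        calls.append((node.func.id, kwargs))
    return calls


print(parse_tool_calls("[get_weather(city='Paris', days=3), ping()]"))
```

Malformed input such as a long run of `AA(),` with no closing bracket is rejected with `ValueError` rather than being matched by a backtracking pattern.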
    return resp

except Exception as e:
    return web.Response(text=f"Error: {str(e)}", status=500)

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium test

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that detailed exception messages are not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block to log the exception and return a generic error message.

  1. Import the logging module to enable logging of exceptions.
  2. Configure the logging settings if not already configured.
  3. Modify the exception handling block to log the exception and return a generic error message.
Suggested changeset 1
benchmarks/disagg_benchmarks/round_robin_proxy.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/benchmarks/disagg_benchmarks/round_robin_proxy.py b/benchmarks/disagg_benchmarks/round_robin_proxy.py
--- a/benchmarks/disagg_benchmarks/round_robin_proxy.py
+++ b/benchmarks/disagg_benchmarks/round_robin_proxy.py
@@ -2,3 +2,3 @@
 import itertools
-
+import logging
 import aiohttp
@@ -6,2 +6,3 @@
 
+logging.basicConfig(level=logging.ERROR)
 
@@ -39,3 +40,4 @@
             except Exception as e:
-                return web.Response(text=f"Error: {str(e)}", status=500)
+                logging.error("An error occurred while handling the request", exc_info=True)
+                return web.Response(text="An internal error has occurred!", status=500)
 
EOF
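
The same pattern, log the details on the server and return a generic message, can be sketched without aiohttp. This is an illustration under stated assumptions: `handle` and `GENERIC_ERROR` are hypothetical names, and a `(status, text)` tuple stands in for `web.Response`.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("proxy_sketch")

GENERIC_ERROR = "An internal error has occurred!"


def handle(request_fn: Callable[[], str]) -> Tuple[int, str]:
    """Run request_fn; on failure, log the traceback and hide the details."""
    try:
        return 200, request_fn()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("An error occurred while handling the request")
        return 500, GENERIC_ERROR


def failing_backend() -> str:
    raise RuntimeError("upstream at 10.0.0.7:8100 refused the connection")


print(handle(lambda: "ok"))
print(handle(failing_backend))
```

The exception text, which may contain internal hosts or paths, ends up only in the server log and never in the response body.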
aurickq and others added 20 commits January 3, 2025 16:39
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
jikunshang and others added 30 commits January 17, 2025 04:15
Multimodality fix for llava after rebase

Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```

This PR updates `test/lora/utils.py` based on the latest rebase.

1. This PR updates the habana_main README_GAUDI to the technical-writer-reviewed version from v1.19.0 (the habana_main and v1.19.0 versions of README_GAUDI had diverged).
2. It also fixes URLs broken by the recent restructuring of the upstream vllm examples folder.
3. It adds notes in the examples folder that redirect new users to the Gaudi-specific examples in README_GAUDI.md.

Change vllm-hpu-extension revision to ae726d4

Changes the sampler used by dummy sequences to greedy if any sequence is using it. Prevents sampler recompilations.