[V1][Frontend] Coalesce bunched RequestOutputs
Under high load, the frontend's per-request asyncio queues can back up, with the next token(s) arriving before the existing ones have been streamed back to the user. In that case there is no reason to emit them as separate outputs in subsequent iterations. Concatenating them into a single output reduces the number of tasks, context switches, and response messages, and means those additional "ready" tokens should reach the user sooner.

Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
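The idea can be illustrated with a minimal sketch. This is not the code from this commit: the `Output` class, its fields, and the `next_coalesced` helper are hypothetical stand-ins, and only the standard-library `asyncio.Queue` calls are real. The consumer waits for one output, then drains whatever else is already queued and merges it in before yielding to the client.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Output:
    """Hypothetical stand-in for one per-request output chunk."""
    text: str = ""
    token_ids: list = field(default_factory=list)
    finished: bool = False

    def merge(self, later: "Output") -> None:
        # Concatenate a later chunk onto this one, preserving order.
        self.text += later.text
        self.token_ids.extend(later.token_ids)
        self.finished = self.finished or later.finished


async def next_coalesced(queue: asyncio.Queue) -> Output:
    """Wait for one output, then fold in any that are already queued."""
    out = await queue.get()
    # Drain without awaiting: only items that are ready right now are merged.
    while not out.finished and not queue.empty():
        out.merge(queue.get_nowait())
    return out


async def demo() -> list:
    # Simulate a backed-up queue: three chunks arrive before any is consumed.
    queue: asyncio.Queue = asyncio.Queue()
    pieces = ["Hel", "lo", " world"]
    for i, piece in enumerate(pieces):
        last = i == len(pieces) - 1
        queue.put_nowait(Output(text=piece, token_ids=[i], finished=last))

    chunks = []
    while True:
        out = await next_coalesced(queue)
        chunks.append(out.text)
        if out.finished:
            break
    return chunks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

When the queue is not backed up, the drain loop finds nothing extra and the helper behaves like a plain `queue.get()`, so the sketch only changes behavior in the bunched case the message describes.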
1 parent 9c485d9, commit 0431ddf
Showing 2 changed files with 33 additions and 8 deletions.