Add fastapi support #2370

rainyfly · 2024-02-18T02:16:02Z

PR types

Other

Description

New Feature:
Use FastAPI for serving.

* Support chatglm-6b * Update README.md * support dynamic batching * support dynamic batching * fix dybatch * Disable dynamic batching for chatglm --------- Co-authored-by: root <[email protected]>

* add prefix cache for chatglm * support chatglm

* Support multicards * fix ptuning diff * Update engine.py

* support bloom prefix * support_bloom_prefix * support bloom prefix

* support bloom prefix * support_bloom_prefix * support bloom prefix * Update code for bloom prefix * update code

fix serving

* support bloom prefix * support_bloom_prefix * support bloom prefix * Update code for bloom prefix * update code * support bloom prefix

* [LLM] Support dynamic batching for chatglm * fix bug in triton model * fix bug * fix bug

* test * test FastDeploy * test --------- Co-authored-by: root <[email protected]>

* test * test FastDeploy * test * delete run.sh * delete run.sh * update run.sh * update run.sh ci.py * update ci.py * update ci.py --------- Co-authored-by: root <[email protected]>

[LLM] add ci test script

[LLM] add ci script

[LLM]add ci script

[LLM] add ci script

* add inference load balancer for fastdeploy llm * add inference load balance controller for llm * add ic for llm * add ic for llm * add fastdeploy ic for llm * add fastdeploy ic to llm * Fix asyncio.CancelError exception * Improve robust for llm service * Improve robust for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service

* add inference load balancer for fastdeploy llm * add inference load balance controller for llm * add ic for llm * add ic for llm * add fastdeploy ic for llm * add fastdeploy ic to llm * Fix asyncio.CancelError exception * Improve robust for llm service * Improve robust for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * add detailed log * add detailed log

* add inference load balancer for fastdeploy llm * add inference load balance controller for llm * add ic for llm * add ic for llm * add fastdeploy ic for llm * add fastdeploy ic to llm * Fix asyncio.CancelError exception * Improve robust for llm service * Improve robust for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * Add detailed log for llm service * add detailed log * add detailed log * add detailed log

* Add warning for server hangs * Add http nonstream server support * bump fastdeploy_llm to v1.0.0 * add time log for each request * baidu-fastdeploy-fastdeploy-3 fix time format

paddle-bot · 2024-02-18T02:16:07Z

Thanks for your contribution!

CLAassistant · 2024-02-18T02:16:11Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ jiangjiajun
✅ Zeref996
✅ karagg
✅ rainyfly
❌ root

root does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

root and others added 30 commits September 25, 2023 10:37

add codee

9e7ef07

add copyright

4aa21bd

fix some bugs

6fd06f7

Update prefix_utils.py

9f569a7

Update triton_model.py

bdf2748

Update triton_model.py

52aaffb

fix tokenizer

0199cac

Add check for prefix len

fa151a7

Create README.md

800c6a9

Create test_client.py

91ea8bb

Update task.py

5e8221e

add debug log and fix ptuning

9897924

update version

8d1e691

Update triton_model.py

388eb9b

Update README.md

68c15b6

Support chatglm-6b (PaddlePaddle#2223)

30a3beb

* Support chatglm-6b * Update README.md * support dynamic batching * support dynamic batching * fix dybatch * Disable dynamic batching for chatglm --------- Co-authored-by: root <[email protected]>

Support bloom (PaddlePaddle#2232)

b96a92b

Support multicards (PaddlePaddle#2234)

80bb8ed

[LLM] Add prefix for chatglm (PaddlePaddle#2233)

986b233

* add prefix cache for chatglm * support chatglm

Update engine.py

9fa04c3

[LLM] Fix P-Tuning difference (PaddlePaddle#2240)

e6a7d4e

* Support multicards * fix ptuning diff * Update engine.py

[LLM] Support prefix for bloom (PaddlePaddle#2237)

51d8697

* support bloom prefix * support_bloom_prefix * support bloom prefix

Support bloom prefix (PaddlePaddle#2245)

73c1507

* support bloom prefix * support_bloom_prefix * support bloom prefix * Update code for bloom prefix * update code

[LLM] Fix serving (PaddlePaddle#2246)

528e976

fix serving

fix chatglm

1cbbaee

Update config.py

2f2c824

[LLM] Support bloom prefix (PaddlePaddle#2248)

66a4897

* support bloom prefix * support_bloom_prefix * support bloom prefix * Update code for bloom prefix * update code * support bloom prefix

[LLM] Add simple client

4d956d3

add requirements

a5a261b

[LLM] Support dynamic batching for chatglm (PaddlePaddle#2251)

4c21588

* [LLM] Support dynamic batching for chatglm * fix bug in triton model * fix bug * fix bug

karagg and others added 27 commits November 9, 2023 09:57

[LLM] Add ci test scripts (PaddlePaddle#2272)

2d2274c

* test * test FastDeploy * test --------- Co-authored-by: root <[email protected]>

delete run.sh

a55837e

Merge branch 'PaddlePaddle:llm' into llm

f9c8581

delete run.sh

1f76abf

update run.sh

9c6b2de

update run.sh ci.py

ceb49a4

update ci.py

9499199

update ci.py

8bf70a1

[LLM]update ci test script (PaddlePaddle#2285)

6e15209

* test * test FastDeploy * test * delete run.sh * delete run.sh * update run.sh * update run.sh ci.py * update ci.py * update ci.py --------- Co-authored-by: root <[email protected]>

debug

be12232

debug

f884c1a

Merge pull request PaddlePaddle#2286 from karagg/llm

57e7608

[LLM] add ci test script

debug

7b80d70

Merge pull request PaddlePaddle#2288 from karagg/llm

bb68a7e

[LLM] add ci script

debug

6cb1474

Merge pull request PaddlePaddle#2289 from karagg/llm

71652e3

[LLM]add ci script

update run.sh

261e519

add comment

836d21f

do not merge

87f53ea

Rename test_max_batch_size.sh to test_max_batch_size.py

66c4563

update

79e6a1e

Merge pull request PaddlePaddle#2291 from karagg/llm

3376284

[LLM] add ci script

Add warning for server hangs (PaddlePaddle#2333)

7bddc67

* Add warning for server hangs * Add http nonstream server support * bump fastdeploy_llm to v1.0.0 * add time log for each request * baidu-fastdeploy-fastdeploy-3 fix time format

Add fastapi support

23d877f

rainyfly closed this Feb 18, 2024
