Add Developer Applications Demo using Transformers Library #10
Conversation
Removed app_config.json; instructions to create the .json are added in Readme.md. Signed-off-by: Himanshu Upreti <[email protected]>
Remove cert.pem and key.pem and update Readme.md with instructions to generate them. Signed-off-by: Himanshu Upreti <[email protected]>
The review is not yet complete. Will review the rest later.
from typing import Dict, List, Optional, Union
from threading import Thread

from QEfficient.generation.cloud_infer import QAICInferenceSession
This import should be below the transformers import.
Read more here: https://peps.python.org/pep-0008/#imports
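For reference, a minimal sketch of the grouping PEP 8 suggests: standard library first, third-party (transformers) next, local QEfficient imports last. The transformers names shown are only illustrative of what the app might import.

# Standard library imports
from threading import Thread
from typing import Dict, List, Optional, Union

# Third-party imports
from transformers import AutoTokenizer, TextIteratorStreamer

# Local application imports
from QEfficient.generation.cloud_infer import QAICInferenceSession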
self.session = None
self.tokenizer = None
self.is_first_prompt = False
self.model_name = ""
Can this be replaced with line 66? Is there a purpose to assigning an empty string here?
self.tokenizer = None
self.is_first_prompt = False
self.model_name = ""
self.qpc_path = ""
Replace with lines 61-63?
app/app.py (Outdated)
def infer_prompt(msg, chat_history, task, model):
    global last_prompt, previous_current_ctx_len, last_state_generation_ids

    qeff_generator_model.curr_cache_index = 0
Is this required? The default is 0 anyway.
qeff_generator_model.generated_ids = []

if qeff_generator_model.curr_cache_index >= qeff_generator_model.ctx_len - 1:
    qeff_generator_model.curr_cache_index = 0
Why is this required?
app/app.py (Outdated)
if qeff_generator_model.curr_cache_index >= qeff_generator_model.ctx_len - 1:
    qeff_generator_model.curr_cache_index = 0

qeff_generator_model.curr_cache_index = 0
Why is curr_cache_index being reset to 0 again here?
except Exception as err:
    raise RuntimeError(f"Unable to load tokenizer, {err}")

if streamer:
Handle the else case and raise an error.
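A minimal sketch of the suggested else branch; the error type and message are placeholders, not the app's actual handling.

if streamer:
    ...  # existing streaming generation path
else:
    # Fail loudly instead of silently skipping generation when no streamer was created
    raise RuntimeError("Streamer is not initialized; cannot stream generated tokens.")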
hf_token = None
if os.getenv("HF_TOKEN") is not None:
    hf_token = os.getenv('HF_TOKEN')
tokenizer = AutoTokenizer.from_pretrained(
Have you tested this on the models present in the tests? I have seen it fail with a config-file-not-found error from Hugging Face; the workaround is to use hf_download (check the code in QEfficient/cloud/infer.py::80-87).
You can allow only the tokenizer file patterns, since you don't need the other model files for the tokenizer.
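As a rough illustration of restricting the download to tokenizer files, a sketch using huggingface_hub.snapshot_download with an allow_patterns filter; the model card name and the patterns are assumptions, and the project's own hf_download helper may wrap this differently.

from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

model_card_name = "gpt2"  # example model card, assumed for illustration

# Fetch only tokenizer-related files, skipping the model weights
model_hf_path = snapshot_download(
    repo_id=model_card_name,
    allow_patterns=["*.json", "*.txt", "*.model", "tokenizer*"],
)
tokenizer = AutoTokenizer.from_pretrained(model_hf_path, padding_side="left")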
Yes, hf_download is used unnecessarily; even in places where we only need the tokenizer, we still download all the files. @ochougul please raise an issue.
Here we download all the files while only needing the tokenizer, on the assumption that the user already has all the files in cache_dir, in which case the files won't be downloaded again; that is how this API is supposed to be used.
If you want to change it to download only the tokenizer files when only the tokenizer is needed, you are welcome to update it. That is definitely a better way to do this.
You can raise the issue internally yourself, no need to ask anybody. Thanks.
QEfficient/cloud/infer.py (Outdated)
@@ -60,6 +60,7 @@ def main(
    device_group: List[int] = [
        0,
    ],
    execute : bool = True
This functionality exists in QEfficient/cloud/execute.py::main; it is not required to add it here.
time.sleep(0.07)

# calling infer api directly to get qpc_path
app_config[task][model_name]['qpc_path'] = infer_api(
Please use the QEfficient/cloud/execute.py::main function, or directly use the latency_kv_stats function.
As I understand it, you want to compile and then execute. You can use from QEfficient.cloud.compile import main as compile and use it the same way it is used in infer.py. Let's not update the infer.py file for this.
There is no way a customer will use an execute option inserted into the infer API via the command line, and infer.py is supposed to be a CLI API.
If you want, you can create a utils function called compile_and_execute, put it inside QEfficient/cloud/utils.py, and then change the infer API to use that same function.
It's your choice.
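A rough sketch of what such a compile_and_execute helper might look like; the entry points imported below and their keyword arguments are assumptions about the QEfficient CLI modules, used only to illustrate the suggested structure.

from QEfficient.cloud.compile import main as compile_main
from QEfficient.cloud.execute import main as execute_main


def compile_and_execute(model_name, onnx_path, qpc_dir_path, device_group, prompt):
    # Compile the exported ONNX model into a QPC (argument names are placeholders)
    compile_main(onnx_path=onnx_path, qpc_path=qpc_dir_path, device_group=device_group)
    # Run the compiled QPC on the Cloud AI 100 device
    execute_main(model_name=model_name, qpc_path=qpc_dir_path, device_group=device_group, prompt=prompt)
    return qpc_dir_path

Both the app and the infer API could then call this one helper instead of duplicating the compile-then-execute flow.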
@anujgupt-github @vbaddi I believe we need to discuss the requirements for high-level and low-level APIs and their use cases. Here the use case is not compile-and-execute. We can take https://github.com/daquexian/onnx-simplifier as an example to understand a top-level CLI API versus a top-level programmable API.
Please come up with a design if you have a better solution; happy to discuss it.
Thanks for sharing the onnx-simplifier link.
I agree that we need an API that goes end-to-end from the HF model through execution, but that doesn't mean we should destroy code scalability for it.
    ignore_patterns=["*.txt", "*.onnx", "*.ot", "*.md", "*.tflite", "*.pdf"],
)
tokenizer = AutoTokenizer.from_pretrained(model_hf_path, use_cache=True, padding_side="left")
if hf_token is not None:
This code (lines 83-90) can also be moved under the main function; pass model_hf_path as an input to infer_api, and return qpc_path as the output.
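A minimal sketch of that restructuring; the download helper and the infer_api argument list are assumptions about the app code, shown only to illustrate the suggested flow.

from transformers import AutoTokenizer


def main(model_name: str):
    # Resolve the local HF snapshot path once in the entry point (download_model_files is a hypothetical helper)
    model_hf_path = download_model_files(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_hf_path, use_cache=True, padding_side="left")
    # infer_api then only compiles the model and hands back the qpc path
    qpc_path = infer_api(model_hf_path)
    return tokenizer, qpc_path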
Why should the user download the hf_model files from the model card and provide you the model_hf_path? Please keep in mind that this is a high-level API for the user.
Not planned to be revived at the moment; will revisit if really needed.
Developer Applications on Cloud AI 100 using Transformers Library