Benchmarks v2: Merging dev to main (#184)
* AutoGPTQ Mistral, memory profiling support and empirical quality checks (#163)
  * Added info about Mistral support
  * AutoGPTQ now uses the base class, with Mistral support and memory profiling
  * Minor changes following the change of CLI args in bench.sh
  * Changed requirements with the latest update of AutoGPTQ
  * Support for Mistral Instruct and Llama 2 Chat, and latest AutoGPTQ installation from source
  * Added another common utility to build chat templates for the model
  * Fixed bugs causing multiple duplicated log entries
  * PyTorchBenchmark supports Mistral and memory profiling, and uses the base class
  * Changes in instructions, using the latest model for benchmarking, removed Logs
  * Fixed dependencies with proper versions
  * Added Mistral and Llama, plus a table for precision-wise quality comparison
  * Added new docs and a template for Mistral, and started new benchmark performance logs in templates
  * Improved logging strategies to record the quality-check output in a README
  * Integrated the utility for logging improvements
  * Using better logging strategies in bench pytorch
  * questions.json now holds the ground-truth answer set from the fp32 responses
  * AutoGPTQ README improvements, and added quality-check examples for Llama and Mistral
  * Using the latest logging utilities
  * Removed creation of the Logs folder and unnecessary arguments
  * Added fsspec
  * Added Llama 2 and Mistral performance logs
  * Pinned version of huggingface_hub
  * Latest info under the 'some points to note' section
  * Update bench_autogptq/bench.sh

  Co-authored-by: Nicola Sosio <[email protected]>

  ---------

  Co-authored-by: Anindyadeep Sannigrahi <[email protected]>
  Co-authored-by: Nicola Sosio <[email protected]>

* DeepSpeed Mistral, memory profiling support and empirical quality checks (#168)
  * Added another common utility to build chat templates for the model
  * Fixed bugs causing multiple duplicated log entries
  * PyTorchBenchmark supports Mistral and memory profiling, and uses the base class
  * Changes in instructions, using the latest model for benchmarking, removed Logs
  * Fixed dependencies with proper versions
  * Added Mistral and Llama, plus a table for precision-wise quality comparison
  * Added new docs and a template for Mistral, and started new benchmark performance logs in templates
  * Improved logging strategies to record the quality-check output in a README
  * Integrated the utility for logging improvements
  * Using better logging strategies in bench pytorch
  * questions.json now holds the ground-truth answer set from the fp32 responses
  * DeepSpeed now uses the base class, with Mistral support and memory profiling
  * Removed unused imports
  * Removed Logs, and latest improvements w.r.t. the base class
  * README now has a quality comparison for DeepSpeed
  * Using the latest version of DeepSpeed
  * Added the latest performance logs for Llama 2 and Mistral
  * Added docs for Llama and Mistral with the latest scores
  * Updated the README with correct model info

  ---------

  Co-authored-by: Anindyadeep Sannigrahi <[email protected]>
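Several commits above reference a shared utility for building chat templates for Llama 2 Chat and Mistral Instruct. The helper's actual name and signature are not shown in this message, so the ones below are assumptions; this is only a minimal sketch of the pattern:

```python
# Illustrative sketch of a chat-template builder; build_chat_prompt and its
# signature are hypothetical, not the repo's actual API. The prompt strings
# follow the published Llama 2 Chat and Mistral Instruct formats.

def build_chat_prompt(model_name: str, user_prompt: str,
                      system_prompt: str = "You are a helpful assistant.") -> str:
    """Wrap a raw question in the chat format the model was tuned on."""
    name = model_name.lower()
    if "llama" in name:
        # Llama 2 Chat: system prompt goes inside <<SYS>> tags.
        return (f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
                f"{user_prompt} [/INST]")
    if "mistral" in name:
        # Mistral Instruct has no dedicated system slot; prepend it instead.
        return f"<s>[INST] {system_prompt} {user_prompt} [/INST]"
    raise ValueError(f"No chat template known for {model_name}")
```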
* CTransformers Mistral and memory profiling support (#165)
  * CTransformers supports Mistral and uses the base class, along with memory profiling
  * Uses the latest bench.py arguments, removed creation of log folders, and improvements
  * Supporting Mistral and Llama chat models, and installation improvements
  * Added additional requirements that are not supported by CTransformers by default
  * Added another common utility to build chat templates for the model
  * Fixed bugs causing multiple duplicated log entries
  * PyTorchBenchmark supports Mistral and memory profiling, and uses the base class
  * Changes in instructions, using the latest model for benchmarking, removed Logs
  * Fixed dependencies with proper versions
  * Added Mistral and Llama, plus a table for precision-wise quality comparison
  * Added new docs and a template for Mistral, and started new benchmark performance logs in templates
  * Improved logging strategies to record the quality-check output in a README
  * Integrated the utility for logging improvements
  * Using better logging strategies in bench pytorch
  * questions.json now holds the ground-truth answer set from the fp32 responses
  * CTransformers uses the latest logging utilities
  * Removed unnecessary arguments and creation of the Logs folder
  * Added precision-wise quality comparison to the AutoGPTQ README
  * Added performance scores for Llama 2 and Mistral
  * Latest info under the 'some points to note' section
  * Added CTransformers performance logs for Mistral and Llama
  * Update bench_ctransformers/bench.sh

  Co-authored-by: Nicola Sosio <[email protected]>

  ---------

  Co-authored-by: Anindyadeep Sannigrahi <[email protected]>
  Co-authored-by: Nicola Sosio <[email protected]>

* CTranslate2 benchmark with Mistral support (#170)
  * Added support for the base class and Mistral with memory profiling
  * Removed Docker support with the latest CTranslate2 release
  * Added the latest CTranslate2 version
  * Removed runs with Docker and added Mistral model support
  * Removed Docker support and added Mistral support
  * Added performance logs for Mistral and Llama
  * Engine-specific README with qualitative comparison

* Llama.cpp Mistral (#171)
  * Fixed a bug: handle temperature when None
  * Added the llama.cpp engine README with quality comparison
  * Using the base class with Mistral support and memory profiling
  * Shell script CLI improvements
  * Added newer requirements with versions pinned
  * Small improvements
  * Removed MODEL_NAME while running setup
  * Added performance logs for Llama and Mistral
  * Fixed performance metrics of Llama for PyTorch transformers
  * Fixed performance metrics of Mistral for PyTorch transformers
  * Fixed the names of the models and their links

* ExLlamaV2 Mistral, memory support, qualitative comparison and improvements (#175)
  * Added performance logs for Mistral and Llama for ExLlamaV2, along with qualitative comparisons
  * ExLlamaV2 uses the base class, along with support for Mistral and memory profiling
  * Removed old CLI args and small improvements
  * Deleted the convert.py script
  * Pinned the latest version and added transformers
  * Added the Mistral model, along with usage of the latest ExLlamaV2 repo
  * Update bench_exllamav2/bench.sh

  Co-authored-by: Nicola Sosio <[email protected]>

  ---------

  Co-authored-by: Nicola Sosio <[email protected]>

* vLLM Mistral, memory support, qualitative comparison and improvements (#172)
  * Added the base class with Mistral support and memory profiling
  * Small improvements, removing unnecessary CLI args
  * Download support for Mistral
  * Added an on_exit function on get_answers
  * Added precision-wise qualitative checks to the vLLM README
  * Added performance logs in docs for Mistral and Llama
  * Update bench_vllm/bench.sh

  Co-authored-by: Nicola Sosio <[email protected]>

  ---------

  Co-authored-by: Nicola Sosio <[email protected]>
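Almost every engine PR in this merge adds memory profiling via the shared base class. The exact mechanism isn't visible from this message, so the following is only a minimal sketch of one common approach (a context manager over PyTorch's CUDA peak-memory counters), not the repo's actual implementation:

```python
# Illustrative only: one common way to capture peak GPU memory per run.
# The benchmark base class may implement profiling differently.
import torch

class GPUMemoryProfiler:
    """Context manager recording peak GPU memory used inside the block."""

    def __enter__(self):
        torch.cuda.reset_peak_memory_stats()  # zero the peak counter
        return self

    def __exit__(self, exc_type, exc, tb):
        # Peak bytes allocated since the reset, converted to MiB.
        self.peak_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
        return False  # never swallow exceptions

# Hypothetical usage inside a benchmark run:
# with GPUMemoryProfiler() as prof:
#     model.generate(...)
# print(f"peak memory: {prof.peak_mib:.1f} MiB")
```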
* NVIDIA TensorRT-LLM Mistral, memory support, qualitative comparison and improvements (#178)
  * Added a README with Mistral support and qualitative comparison
  * TensorRT-LLM uses the base class, with Mistral and memory profiling support
  * Removed old CLI args, and some improvements
  * Added support for Mistral with the latest TensorRT-LLM
  * Added support for a root dir, to handle runs inside and outside Docker
  * Added performance logs for both Mistral and Llama
  * Added float32 to docs and performance logs
  * Added support for float32 precision
  * Added support for float32
  * Revised to int4 for Mistral

* Optimum-NVIDIA Mistral, memory support, qualitative comparison and improvements (#177)
  * Added performance logs for Mistral and Llama for ExLlamaV2, along with qualitative comparisons
  * ExLlamaV2 uses the base class, along with support for Mistral and memory profiling
  * Removed old CLI args and small improvements
  * Deleted the convert.py script
  * Pinned the latest version and added transformers
  * Added the Mistral model, along with usage of the latest ExLlamaV2 repo
  * Using the base benchmark class with memory profiling support and Mistral model support
  * Added a new constructor argument root_dir to handle paths inside or outside Docker
  * Created a converter script to convert to a TensorRT engine file
  * Added the latest usage of Optimum-NVIDIA, and added qualitative comparison
  * CLI improvements, and removed older CLI args
  * Added the latest conversion-script logic to convert HF weights to an engine, and Mistral support
  * Added the latest performance logs for both Mistral and Llama
  * Removed the conflict with ExLlamaV2
  * Removed changes from ExLlamaV2

* ONNX Runtime with Mistral support and memory profiling (#182)
  * Added comparative quality analysis for Mistral and Llama, and added nuances related to ONNX
  * Using the base class with memory profiling and Mistral support
  * Removed old CLI arguments and some improvements
  * Removed requirements, since ONNX runs using a custom Docker container
  * Added a new setup .sh file with Mistral and Llama ONNX conversion through Docker
  * Added ONNX performance logs for Llama and Mistral

* Lightning AI Mistral and memory integration (#174)
  * Added a qualitative quality comparison for LitGPT
  * Using the base class with Mistral support and memory support
  * Small CLI improvements, removed old arguments
  * Removed convert logic with the latest LitGPT
  * Added the latest inference logic code
  * Pinned versions for dependencies
  * Added the latest method of installation and model conversions with LitGPT
  * Added performance benchmark info in LitGPT
  * Updated the memory usage and tokens per second
  * chore: minor improvements and added the latest info about int4

* Changes in engine READMEs (#183)
  * Deleted the files related to Llama 2 in docs

---------

Co-authored-by: Anindyadeep Sannigrahi <[email protected]>
Co-authored-by: Nicola Sosio <[email protected]>
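The common thread across all of these PRs is moving every engine benchmark onto one shared base class that owns memory profiling, the root_dir handling for Docker vs. host runs, and the empirical quality checks against the fp32 ground-truth answers in questions.json. That class isn't reproduced in this message, so the outline below is a hypothetical sketch of the described architecture; every name and hook in it is an assumption:

```python
# Hypothetical outline of the shared benchmark base class described above;
# class, method, and field names are illustrative, not the repo's code.
import json
from abc import ABC, abstractmethod

class BaseBenchmark(ABC):
    def __init__(self, model_path: str, precision: str, root_dir: str = "."):
        self.model_path = model_path
        self.precision = precision  # e.g. "float32", "float16", "int4"
        self.root_dir = root_dir    # resolves paths inside or outside Docker

    @abstractmethod
    def load_model(self) -> None:
        """Engine-specific model loading (vLLM, ExLlamaV2, ...)."""

    @abstractmethod
    def run_inference(self, prompt: str) -> str:
        """Engine-specific text generation for a single prompt."""

    def quality_check(self, questions_file: str = "questions.json") -> list:
        """Compare each answer against the fp32 ground-truth answer set."""
        with open(questions_file) as f:
            questions = json.load(f)
        return [(q["question"], self.run_inference(q["question"]), q["answer"])
                for q in questions]
```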