Stable Diffusion 3.x and Flux Optimization #22986
Merged
Conversation
ACinfr reviewed on Dec 16, 2024
- onnxruntime/python/tools/transformers/models/stable_diffusion/optimize_pipeline.py (outdated, resolved)
You can commit the suggested changes from lintrunner.
- onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark.py (5 review threads, outdated, resolved)
tianleiwu changed the title from "[WIP] Stable Diffusion 3.x and Flux Optimization" to "Stable Diffusion 3.x and Flux Optimization" on Jan 12, 2025
jiafatom reviewed on Jan 12, 2025
- onnxruntime/python/tools/transformers/models/stable_diffusion/optimize_pipeline.py (resolved)
kunal-vaishnavi approved these changes on Jan 14, 2025
tianleiwu added a commit that referenced this pull request on Jan 16, 2025:
Add a tool to generate the `node_block_list` used in the [float16 conversion tool](https://github.com/microsoft/onnxruntime/blob/04030f64be10e020d3ac9aa5ba7d0f2917cbd14e/onnxruntime/python/tools/transformers/float16.py#L175).

Previously, we had a feature to dump statistics (like min and max) of each node's inputs and outputs. However, it is time consuming to derive a list of nodes that need to be kept in float32 when the model is large. This tool speeds up the process by outputting a list of nodes that could overflow in float-to-half conversion.

Usage: build onnxruntime from source with `--cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1`, then set some environment variables before running the float32 optimized onnx model, for example:

```
export ORT_DEBUG_NODE_IO_DUMP_HALF_CONVERSION_OVERFLOW=1
export ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD=50000
python benchmark.py -e optimum --height 1024 --width 1024 --steps 3 -b 1 -v Flux.1D -p flux1_dev_onnx/fp32_opt --skip_warmup
```

The threshold `ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD` shall be <= 65504. The default value is 50000 if the environment variable is not set. It is better to leave some margin if the number of samples in the test is not large enough.

As a demo, we add a `--skip_warmup` option to benchmark.py for Flux, so that we can reduce the time spent dumping warm-up runs.

Example snippet of stdout (each inference session prints such a summary when the session ends):

```
Total counter in node dumping: 141
Found 2 nodes cannot be converted to half precision due to potential input/output overflow.
Operator frequencies for these nodes:
Softmax : 1
MatMul : 1
# -------
# Example python script for float16 conversion
# For details, search `node_block_list` in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py
# -------
import onnx
from onnxruntime.transformers.onnx_model import OnnxModel
m = OnnxModel(onnx.load('flux1_dev_onnx/fp32_opt/vae_decoder/model.onnx'))
node_block_list = [
    '/decoder/mid_block/attentions.0/Softmax',
    '/decoder/mid_block/attentions.0/MatMul',
]
m.convert_float_to_float16(keep_io_types=False, node_block_list=node_block_list)
m.save_model_to_file('fp16/optimized.onnx', use_external_data_format=False)
```

Then you can use this python script to convert the corresponding model to float16.

### Motivation and Context
This tool generates the `node_block_list` used in float16 conversion of the Stable Diffusion 3.x and Flux models in #22986. In a Stable Diffusion or Flux pipeline, there are multiple models, and there can be multiple session runs for each model. Without a proper tool, it is time consuming to get the `node_block_list` for each model.
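The ceiling on the threshold comes from the IEEE 754 half-precision format, whose largest finite value is 65504: float32 values beyond that become infinity when cast down, which is exactly the overflow the blocked nodes would hit. A minimal sketch (using numpy, which is not part of the tool itself) illustrates the effect:

```python
import numpy as np

# float16 (IEEE 754 binary16) has a maximum finite value of 65504.
# Casting larger float32 values down overflows to infinity, which is
# why ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD must be <= 65504.
values = np.array([60000.0, 70000.0], dtype=np.float32)
halves = values.astype(np.float16)

print(np.finfo(np.float16).max)  # 65504.0
print(halves[0])                 # 60000 is still representable in float16
print(np.isinf(halves[1]))       # True: 70000 overflows to inf
```

Nodes whose activations exceed this limit in the float32 run are the ones the tool reports for the `node_block_list`.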
carzh pushed a commit that referenced this pull request on Jan 16, 2025 (same commit message as above).
Description
It depends on the following PRs:
Optimize the ONNX pipeline for Stable Diffusion 3.x and Flux 1.0 models (fp32 or fp16):
H100 Benchmark Results
A100 Benchmark Results
Future Work
Motivation and Context
SD 3.5 Architecture:
https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/mmdit-x.png