diff --git a/README.md b/README.md
index ebaa5b75..c361ca3d 100644
--- a/README.md
+++ b/README.md
@@ -24,28 +24,68 @@
+
-# Showcase
+## 📸 Showcase
https://github.com/fudan-generative-vision/hallo/assets/17402682/294e78ef-c60d-4c32-8e3c-7f8d6934c6bd
+### 🎬 Honoring Classic Films
-# Framework
+
+Featured clips: *Devil Wears Prada*, *Green Book*, *Infernal Affairs*, *Patch Adams*, *Tough Love*, and *Shawshank Redemption*.
+
-![abstract](assets/framework_1.jpg)
-![framework](assets/framework_2.jpg)
+Explore [more examples](https://fudan-generative-vision.github.io/hallo).
+
+## 📰 News
+
+- **`2024/06/15`**: ✨✨✨ Released sample images and audio clips for inference testing on [🤗Huggingface](https://huggingface.co/datasets/fudan-generative-ai/hallo_inference_samples).
+- **`2024/06/15`**: 🎉🎉🎉 Launched the first version on 🫡[GitHub](https://github.com/fudan-generative-vision/hallo).
+
+## 🤝 Community Resources
+
+Explore the resources developed by our community to enhance your experience with Hallo:
+
+- [Demo on Huggingface](https://huggingface.co/spaces/multimodalart/hallo) - Check out this easy-to-use Gradio demo by [@multimodalart](https://huggingface.co/multimodalart).
+- [hallo-webui](https://github.com/daswer123/hallo-webui) - Explore the WebUI created by [@daswer123](https://github.com/daswer123).
+- [hallo-for-windows](https://github.com/sdbds/hallo-for-windows) - Utilize Hallo on Windows with the guide by [@sdbds](https://github.com/sdbds).
+- [ComfyUI-Hallo](https://github.com/AIFSH/ComfyUI-Hallo) - Integrate Hallo with the ComfyUI tool by [@AIFSH](https://github.com/AIFSH).
+
+Thanks to all of them.
+
+Join our community and explore these amazing resources to make the most of Hallo. Enjoy, and elevate your creative projects!
-# News
+## 🔧 Framework
-- **`2024/06/15`**: 🎉🎉🎉 Release the first version on [GitHub](https://github.com/fudan-generative-vision/hallo).
-- **`2024/06/15`**: ✨✨✨ Release some images and audios for inference testing on [Huggingface](https://huggingface.co/datasets/fudan-generative-ai/hallo_inference_samples).
+![abstract](assets/framework_1.jpg)
+![framework](assets/framework_2.jpg)
-# Installation
+## ⚙️ Installation
- System requirements: Ubuntu 20.04/Ubuntu 22.04, CUDA 12.1
- Tested GPUs: A100
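
For reference, a minimal environment setup might look like the following sketch (assuming conda is available and using the repo's `requirements.txt`):

```bash
git clone https://github.com/fudan-generative-vision/hallo
cd hallo
conda create -n hallo python=3.10
conda activate hallo
pip install -r requirements.txt
```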
@@ -69,7 +109,7 @@ Besides, ffmpeg is also needed:
apt-get install ffmpeg
```
-# Inference
+## 🗝️ Usage
The inference entrypoint script is `scripts/inference.py`. Before testing your cases, there are a few preparations that need to be completed:
@@ -77,7 +117,7 @@ The inference entrypoint script is `scripts/inference.py`. Before testing your c
2. [Prepare source image and driving audio pairs](#prepare-inference-data).
3. [Run inference](#run-inference).
-## Download pretrained models
+### 📥 Download Pretrained Models
You can easily get all pretrained models required by inference from our [HuggingFace repo](https://huggingface.co/fudan-generative-ai/hallo).
@@ -91,12 +131,12 @@ git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models
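
Alternatively, a sketch of the same download via the Hugging Face CLI (assuming a recent `huggingface_hub` release that ships the `huggingface-cli download` command):

```bash
pip install "huggingface_hub[cli]"
huggingface-cli download fudan-generative-ai/hallo --local-dir pretrained_models
```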
Or you can download them separately from their source repo:
- [hallo](https://huggingface.co/fudan-generative-ai/hallo/tree/main/hallo): Our checkpoints consist of denoising UNet, face locator, image & audio proj.
-- [audio_separator](https://huggingface.co/huangjackson/Kim_Vocal_2): Kim\_Vocal\_2 MDX-Net vocal removal model by [KimberleyJensen](https://github.com/KimberleyJensen). (_Thanks to runwayml_)
+- [audio_separator](https://huggingface.co/huangjackson/Kim_Vocal_2): Kim\_Vocal\_2 MDX-Net vocal removal model. (_Thanks to [KimberleyJensen](https://github.com/KimberleyJensen)_)
- [insightface](https://github.com/deepinsight/insightface/tree/master/python-package#model-zoo): 2D and 3D Face Analysis placed into `pretrained_models/face_analysis/models/`. (_Thanks to deepinsight_)
- [face landmarker](https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task): Face detection & mesh model from [mediapipe](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker#models) placed into `pretrained_models/face_analysis/models`.
-- [motion module](https://github.com/guoyww/AnimateDiff/blob/main/README.md#202309-animatediff-v2): motion module from [AnimateDiff](https://github.com/guoyww/AnimateDiff). (_Thanks to guoyww_).
-- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to stablilityai_)
-- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Initialized and fine-tuned from Stable-Diffusion-v1-2. (_Thanks to runwayml_)
+- [motion module](https://github.com/guoyww/AnimateDiff/blob/main/README.md#202309-animatediff-v2): motion module from [AnimateDiff](https://github.com/guoyww/AnimateDiff). (_Thanks to [guoyww](https://github.com/guoyww)_)
+- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
+- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Initialized and fine-tuned from Stable-Diffusion-v1-2. (_Thanks to [runwayml](https://huggingface.co/runwayml)_)
- [wav2vec](https://huggingface.co/facebook/wav2vec2-base-960h): wav audio to vector model from [Facebook](https://huggingface.co/facebook/wav2vec2-base-960h).
Finally, these pretrained models should be organized as follows:
@@ -137,7 +177,7 @@ Finally, these pretrained models should be organized as follows:
| `-- vocab.json
```
-## Prepare Inference Data
+### 🛠️ Prepare Inference Data
Hallo has a few simple requirements for input data:
@@ -153,9 +193,9 @@ For the driving audio:
2. It must be in English since our training datasets are only in this language.
3. Ensure the vocals are clear; background music is acceptable.
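
If a recording does not meet these requirements, ffmpeg (installed above) can usually convert it. A sketch, assuming a mono 16 kHz WAV as the target (a common input format for wav2vec-style audio encoders; the filenames are placeholders):

```bash
ffmpeg -i my_recording.mp3 -ar 16000 -ac 1 driving_audio.wav
```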
-We have provided some samples for your reference.
+We have provided [some samples](examples/) for your reference.
-## Run inference
+### 🎮 Run Inference
Simply run `scripts/inference.py` and pass `source_image` and `driving_audio` as inputs:
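
For example, a minimal invocation might look like this (the paths are illustrative, assuming the sample files under `examples/`):

```bash
python scripts/inference.py \
  --source_image examples/reference_images/1.jpg \
  --driving_audio examples/driving_audios/1.wav
```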
@@ -189,31 +229,45 @@ options:
face region
```
-# Roadmap
+## 📅 Roadmap
| Status | Milestone | ETA |
| :----: | :---------------------------------------------------------------------------------------------------- | :--------: |
| ✅ | **[Inference source code meets everyone on GitHub](https://github.com/fudan-generative-vision/hallo)** | 2024-06-15 |
| ✅ | **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/hallo)** | 2024-06-15 |
-| 🚀🚀🚀 | **[Training: data preparation and training scripts]()** | 2024-06-25 |
-| 🚀🚀🚀 | **[Optimize inference performance in Mandarin]()** | TBD |
+| 🚧 | **[Optimizing inference performance]()** | 2024-06-23 |
+| 🚧 | **[Optimizing performance on images with a resolution of 256x256]()** | 2024-06-23 |
+| 🚀 | **[Improving the model's performance on Mandarin Chinese]()** | 2024-06-25 |
+| 🚀 | **[Releasing data preparation and training scripts]()** | 2024-06-28 |
+
+
+
+Other Enhancements
+
+- [ ] Enhancement: Test and ensure compatibility with Windows operating system. [#39](https://github.com/fudan-generative-vision/hallo/issues/39)
+- [ ] Bug: Output video may lose several frames. [#41](https://github.com/fudan-generative-vision/hallo/issues/41)
+- [ ] Bug: Sound volume affecting inference results (audio normalization).
+- [ ] Enhancement: Inference code logic optimization.
+- [ ] Enhancement: Improve performance at low resolutions (256x256) to support more efficient usage.
+
+
-# Citation
+
+## 📝 Citation
If you find our work useful for your research, please consider citing the paper:
```
@misc{xu2024hallo,
title={Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation},
- author={Mingwang Xu and Hui Li and Qingkun Su and Hanlin Shang and Liwei Zhang and Ce Liu and Jingdong Wang and Yao Yao and Siyu zhu},
- year={2024},
- eprint={2406.08801},
- archivePrefix={arXiv},
- primaryClass={cs.CV}
+ author={Mingwang Xu and Hui Li and Qingkun Su and Hanlin Shang and Liwei Zhang and Ce Liu and Jingdong Wang and Yao Yao and Siyu Zhu},
+ year={2024},
+ eprint={2406.08801},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
}
```
-# Opportunities available
+## 🌟 Opportunities Available
Multiple research positions are open at the **Generative Vision Lab, Fudan University**! These include:
@@ -224,6 +278,14 @@ Multiple research positions are open at the **Generative Vision Lab, Fudan Unive
Interested individuals are encouraged to contact us at [siyuzhu@fudan.edu.cn](mailto:siyuzhu@fudan.edu.cn) for further information.
-# Social Risks and Mitigations
+## ⚠️ Social Risks and Mitigations
The development of portrait image animation technologies driven by audio inputs poses social risks, such as the ethical implications of creating realistic portraits that could be misused for deepfakes. To mitigate these risks, it is crucial to establish ethical guidelines and responsible use practices. Privacy and consent concerns also arise from using individuals' images and voices. Addressing these involves transparent data usage policies, informed consent, and safeguarding privacy rights. By addressing these risks and implementing mitigations, the research aims to ensure the responsible and ethical development of this technology.
+
+## 👏 Community Contributors
+
+Thank you to all the contributors who have helped to make this project better!
+
+
+
+