Showing 34 changed files with 4,365 additions and 2 deletions.

@@ -0,0 +1,2 @@
workspace/
asserts/

@@ -0,0 +1,164 @@
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

asserts/
workspace/

@@ -0,0 +1,38 @@
FROM ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y ffmpeg git python3 python3-pip unzip && \
    pip3 install --upgrade pip

COPY requirements.txt .

RUN pip3 install -r requirements.txt && rm requirements.txt

RUN mkdir /app
COPY . /app

COPY asserts.zip /tmp/
RUN unzip /tmp/asserts.zip -d /app/ && rm -r /app/asserts/examples /tmp/asserts.zip

WORKDIR /app

# training stage
FROM ubuntu:20.04 AS training

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y ffmpeg git python3 python3-pip unzip && \
    pip3 install --upgrade pip

COPY requirements_training.txt .

RUN pip3 install -r requirements_training.txt && rm requirements_training.txt

RUN mkdir /app
COPY . /app

COPY asserts.zip /tmp/
RUN unzip /tmp/asserts.zip -d /app/ && rm -r /app/asserts/examples /tmp/asserts.zip

WORKDIR /app

@@ -1,2 +1,143 @@
# DINet_optimized
An optimized pipeline for DINet.
# DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video (AAAI2023)
![overview figure](https://img-blog.csdnimg.cn/178c6b3ec0074af7a2dcc9ef26450e75.png)
[Paper](https://fuxivirtualhuman.github.io/pdf/AAAI2023_FaceDubbing.pdf) [demo video](https://www.youtube.com/watch?v=UU344T-9h7M&t=6s) Supplementary materials

# 🤔 How to achieve this boost in inference latency?

To achieve this, several changes were implemented:
- Removed DeepSpeech and switched to wav2vec for on-the-fly audio feature extraction, leveraging the speed and power of torch.
- Trained a lightweight model that maps wav2vec features to the DeepSpeech feature space, so the rest of the pipeline stays unchanged (a minimal sketch of such a mapping module is shown below).
- Sped up frame extraction.

These adjustments reduce inference latency by up to 60% compared to the original implementation, all while maintaining quality.
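
For illustration only, here is a minimal sketch of such a wav2vec-to-DeepSpeech mapping module. It is not the network shipped in this repository: the 768-d wav2vec input, the 29-d DeepSpeech-style output, the hidden size, and all names are assumptions chosen for the example.

```python
# Illustrative sketch: a tiny projection network that maps wav2vec hidden
# states to DeepSpeech-like features so downstream code can stay unchanged.
# Feature sizes and names are assumptions, not the repository's actual model.
import torch
import torch.nn as nn


class Wav2VecToDeepSpeech(nn.Module):
    def __init__(self, in_dim: int = 768, hidden_dim: int = 256, out_dim: int = 29):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, wav2vec_features: torch.Tensor) -> torch.Tensor:
        # wav2vec_features: (batch, time, in_dim) -> (batch, time, out_dim)
        return self.net(wav2vec_features)


if __name__ == "__main__":
    mapper = Wav2VecToDeepSpeech()
    dummy = torch.randn(1, 100, 768)   # 1 clip, 100 audio frames
    print(mapper(dummy).shape)         # torch.Size([1, 100, 29])
```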

Additionally, Docker has been introduced to make facial landmark extraction faster, simpler, and more automated.

Tested on:
- Ubuntu (18 and 20)
- Python version >= 3.9

# 📖 Prerequisites
To get started, follow these steps:

- Download the resources (asserts.zip) from Google Drive. Unzip the file and place the directory in the current directory (./). Alternatively, you can run the command `bash ./download_asserts.zip` in your terminal.

- For running inference or training, you have two options:

## Option 1: Docker (preferred)

Build the Docker image with the following command:

```
docker build -t dinet .
```

## Option 2: Conda Environment

Set up a Conda environment by executing the following commands:

```
# Create the virtual environment
conda create -n dinet python=3.9
# Activate the environment
conda activate dinet
# Install the requirements
pip install -r requirements.txt
```

# 🚀 Inference

## Run inference with example videos

```
docker run --rm --gpus 'device=0' -v $PWD:/mnt dinet python3 inference.py --mouth_region_size=256 --source_video_path=./asserts/examples/testxxx.mp4 --source_openface_landmark_path=./asserts/examples/testxxx.csv --driving_audio_path=./asserts/examples/driving_audio_xxx.wav --pretrained_clip_DINet_path=./asserts/clip_training_DINet_256mouth.pth
```

The results are saved in ./asserts/inference_result.

## Run inference with custom videos
First, use the following Docker command to extract facial landmarks for your video. Replace `input_video` with the name of your video. The output file will be saved in the root directory.

```
docker run --rm -v "$PWD:/mnt" -it algebr/openface -c "cp /mnt/input_video.mp4 /tmp/video.mp4 && build/bin/FeatureExtraction -f /tmp/video.mp4 -2Dfp -out_dir /tmp && cp /tmp/video.csv /mnt/input_video.csv"
```
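
As an optional sanity check (not part of the repository's pipeline), you can confirm that the exported CSV actually contains the 2D landmark columns before running inference. This is only a sketch: the file name `input_video.csv` and the typical OpenFace `x_0…x_67` / `y_0…y_67` column layout are assumptions.

```python
# Quick, optional check of the OpenFace output CSV (illustrative only).
import pandas as pd

df = pd.read_csv("input_video.csv", skipinitialspace=True)
landmark_cols = [c for c in df.columns if c.startswith(("x_", "y_"))]

print(f"{len(df)} frames, {len(landmark_cols)} landmark columns")
if len(landmark_cols) < 136:  # 68 points * (x, y)
    print("Warning: 2D landmarks look incomplete; re-run FeatureExtraction with -2Dfp.")
```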

Then run inference with the following command:

```
docker run --rm --gpus 'device=0' -v $PWD:/mnt dinet python3 inference.py --mouth_region_size=256 --source_video_path=<path_to_your_video>.mp4 --source_openface_landmark_path=<path_to_openface_output>.csv --driving_audio_path=<path_to_your_audio_file>.wav --pretrained_clip_DINet_path=./asserts/clip_training_DINet_256mouth.pth
```

# Training
First, build the Docker image that contains the training dependencies:

```
docker build --target training -t dinet-training .
```
This creates the `dinet-training` image used by all the training commands below.

## Data Processing
We release the code for video processing on the [HDTF dataset](https://github.com/MRzzm/HDTF). You can also use this code to process custom videos.

1. Download videos from the [HDTF dataset](https://github.com/MRzzm/HDTF). Split the videos according to xx_annotation_time.txt and **do not** crop or resize them.
2. Resample all split videos to **25 fps** and put them into "./asserts/split_video_25fps". You can see the two example videos in "./asserts/split_video_25fps". We use [this software](http://www.pcfreetime.com/formatfactory/cn/index.html) to resample videos (an ffmpeg-based alternative is sketched after this list). We provide the name list of the training videos used in our experiment (see "./asserts/training_video_name.txt").
3. Use [OpenFace](https://github.com/TadasBaltrusaitis/OpenFace) to detect smooth facial landmarks for all videos. Put all ".csv" results into "./asserts/split_video_25fps_landmark_openface". You can see the two example csv files in "./asserts/split_video_25fps_landmark_openface".

4. Extract frames from all videos and save them in "./asserts/split_video_25fps_frame":
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 data_processing.py --extract_video_frame --source_video_dir <PATH_TO_DATASET>
```
5. Extract audio from all videos and save it in "./asserts/split_video_25fps_audio":
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 data_processing.py --extract_audio --source_video_dir <PATH_TO_DATASET>
```
6. Extract DeepSpeech features from all audio files and save them in "./asserts/split_video_25fps_deepspeech":
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 data_processing.py --extract_deep_speech
```
7. Crop faces from all videos and save the images in "./asserts/split_video_25fps_crop_face":
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 data_processing.py --crop_face
```
8. Generate the training json file "./asserts/training_json.json":
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 data_processing.py --generate_training_json
```
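
If you prefer a scriptable alternative to the resampling tool mentioned in step 2, a small script like the one below can batch-resample videos to 25 fps with ffmpeg. This is only a sketch: the source directory name is an assumption, so adjust the paths to your own layout.

```python
# Sketch of an ffmpeg-based alternative to the resampling tool in step 2.
# Directory names are assumptions; ffmpeg must be on PATH (it is installed
# in the training image).
import subprocess
from pathlib import Path

src_dir = Path("./raw_split_videos")            # videos split in step 1 (assumed path)
dst_dir = Path("./asserts/split_video_25fps")   # target directory expected by step 2
dst_dir.mkdir(parents=True, exist_ok=True)

for video in sorted(src_dir.glob("*.mp4")):
    out_path = dst_dir / video.name
    # Re-encode with a fixed 25 fps output frame rate.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-r", "25", str(out_path)],
        check=True,
    )
    print(f"resampled {video.name} -> 25 fps")
```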

### Training models
We split the training process into a **frame training stage** and a **clip training stage**. In the frame training stage, we use a coarse-to-fine strategy, **so you can train the model at an arbitrary resolution**.

#### Frame training stage
In the frame training stage, we only use the perception loss and the GAN loss.

1. First, train DINet at 104x80 resolution (the mouth region is 64x64):
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 train_DINet_frame.py --augment_num=32 --mouth_region_size=64 --batch_size=24 --result_path=./asserts/training_model_weight/frame_training_64
```
You can stop the training when the loss converges (we stopped at about epoch 270).

2. Load the pretrained model (face: 104x80 & mouth: 64x64) and train DINet at a higher resolution (face: 208x160 & mouth: 128x128):
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 train_DINet_frame.py --augment_num=100 --mouth_region_size=128 --batch_size=80 --coarse2fine --coarse_model_path=./asserts/training_model_weight/frame_training_64/xxxxxx.pth --result_path=./asserts/training_model_weight/frame_training_128
```
You can stop the training when the loss converges (we stopped at about epoch 200).

3. Load the pretrained model (face: 208x160 & mouth: 128x128) and train DINet at a higher resolution (face: 416x320 & mouth: 256x256):
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 train_DINet_frame.py --augment_num=20 --mouth_region_size=256 --batch_size=12 --coarse2fine --coarse_model_path=./asserts/training_model_weight/frame_training_128/xxxxxx.pth --result_path=./asserts/training_model_weight/frame_training_256
```
You can stop the training when the loss converges (we stopped at about epoch 200). Keep in mind that you may need to adjust the batch size to be able to start training.

#### Clip training stage
In the clip training stage, we use the perception loss, frame/clip GAN loss, and sync loss. Load the pretrained frame model (face: 416x320 & mouth: 256x256) and the pretrained syncnet model (mouth: 256x256), and train DINet in the clip setting:
```
docker run --rm --gpus 'device=0' -it -v $PWD:/app dinet-training python3 train_DINet_clip.py --augment_num=3 --mouth_region_size=256 --batch_size=3 --pretrained_syncnet_path=./asserts/syncnet_256mouth.pth --pretrained_frame_DINet_path=./asserts/training_model_weight/frame_training_256/xxxxxx.pth --result_path=./asserts/training_model_weight/clip_training_256
```
You can stop the training when the loss converges and select the best model (our best model is at epoch 160).

## Acknowledgements
The AdaAT module is borrowed from [AdaAT](https://github.com/MRzzm/AdaAT). The DeepSpeech feature extraction is borrowed from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF). The basic module is borrowed from [first-order](https://github.com/AliaksandrSiarohin/first-order-model). Thanks to the authors for releasing their code.