GitHub - MarcoVitella/autovc: AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

This repository is a fork of AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.

This repository provides a PyTorch implementation of AUTOVC, modified so that style transfer also works well on languages other than English.

Dependencies

Python 3
jupyter
Numpy
PyTorch >= v0.4.1
wavenet_vocoder (install with: pip install wavenet_vocoder; for more information, please refer to https://github.com/r9y9/wavenet_vocoder)
librosa
soundfile
scipy
tqdm
matplotlib
wavio
spleeter (for more information, please refer to https://github.com/deezer/spleeter)
ffmpeg
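
Before running the notebooks, it can help to confirm that the dependencies listed above are present. The following is a minimal sketch, not part of the repository. The import names are assumptions, since a project name does not always match its module name (for example, PyTorch is imported as torch), and jupyter is left out because it is normally launched as a command rather than imported.

```python
import importlib.util
import shutil

# Import names are assumptions: the list above gives project names,
# which do not always match the importable module name (PyTorch -> torch).
MODULES = ["numpy", "torch", "wavenet_vocoder", "librosa", "soundfile",
           "scipy", "tqdm", "matplotlib", "wavio", "spleeter"]


def check_environment(modules=MODULES, executables=("ffmpeg",)):
    """Return {name: True/False} for each module and command-line tool."""
    status = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the tool is not on PATH.
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'MISSING'}")
```

This only checks that each package can be found; it does not verify versions such as PyTorch >= v0.4.1.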

Pre-trained models

AUTOVC	Speaker Encoder	WaveNet Vocoder
link	link	link

0. Convert Mel-Spectrograms

Download the pre-trained AUTOVC model and run conversion.ipynb in the same directory.
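
The training command later in this README passes --freq 32. Assuming, as in the upstream AUTOVC notebook, that the model expects the number of mel frames to be a multiple of that value, a spectrogram is zero-padded before conversion and the padding is trimmed afterwards. The sketch below shows only that arithmetic on plain Python lists. The function names are hypothetical, and the notebook itself may handle this differently.

```python
import math


def padded_length(n_frames, base=32):
    """Smallest multiple of `base` that is >= n_frames."""
    return base * math.ceil(n_frames / base)


def pad_frames(frames, base=32):
    """Zero-pad a list of mel frames (each a list of floats) at the end.

    Returns the padded list and the number of frames added, so the
    padding can be trimmed from the converted output afterwards.
    """
    n_pad = padded_length(len(frames), base) - len(frames)
    n_mels = len(frames[0]) if frames else 0
    return frames + [[0.0] * n_mels for _ in range(n_pad)], n_pad
```

For example, a 100-frame spectrogram would be padded to 128 frames, and the last 28 output frames would then be discarded.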

A fast, high-quality pre-trained hifi-gan v1 model (https://github.com/jik876/hifi-gan) is now available here.

1. Mel-Spectrograms to waveform

Download the pre-trained WaveNet Vocoder model and run vocoder.ipynb in the same directory.

Please note the training metadata and testing metadata have different formats.
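
To tell the two formats apart, one option is to inspect the entries of each pickle file. The sketch below assumes the layout used by the upstream AUTOVC code, which this README does not spell out: a training entry is [speaker name, speaker embedding, utterance path, utterance path, ...], while a testing entry is [speaker name, speaker embedding, spectrogram]. The helper names are hypothetical.

```python
import pickle


def describe_entry(entry):
    """Classify one metadata entry by the type of its trailing items."""
    name, embedding, *rest = entry
    # Assumed training layout: every trailing item is a file path string.
    if rest and all(isinstance(item, str) for item in rest):
        return f"{name}: training-style, {len(rest)} utterance path(s)"
    # Assumed testing layout: a single trailing item holding the spectrogram.
    if len(rest) == 1:
        return f"{name}: testing-style, inline spectrogram"
    return f"{name}: unrecognized layout"


def describe_metadata(path):
    """Load a metadata pickle and describe each of its entries."""
    with open(path, "rb") as f:
        entries = pickle.load(f)
    return [describe_entry(entry) for entry in entries]
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.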

2. Train model

We have included a small set of training audio files in the wav folder. However, this dataset is very small and is intended for code verification only. Please prepare your own dataset for training.

1. Generate spectrogram data from the wav files: python make_spect.py

2. Generate training metadata, including the GE2E speaker embedding (please use one-hot embeddings if you are not doing zero-shot conversion): python make_metadata.py

3. Run the main training script: python main.py
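
The three steps above can be chained so that a failure in one stops the rest. This is a minimal sketch that runs the scripts with their default arguments from the repository root. The helper names are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script names are taken from the three steps above, in order.
STEPS = ["make_spect.py", "make_metadata.py", "main.py"]


def build_commands(python=sys.executable, steps=STEPS):
    """Return one command (as an argument list) per training step."""
    return [[python, script] for script in steps]


def run_pipeline(commands, dry_run=False):
    """Run each command in order; with dry_run=True, only print them."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so a failed step
            # prevents the later steps from running on incomplete data.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline(build_commands(), dry_run=True) prints the commands without executing them, which is a cheap way to confirm the order first.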

Training converges when the reconstruction loss is around 0.0001.
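
Because a single batch loss is noisy, one way to apply this criterion is to average the reconstruction loss over a window of recent iterations. The sketch below is an illustration only: the class, the window size, and the use of a moving average are assumptions, not something the training script is known to do.

```python
from collections import deque


class ConvergenceMonitor:
    """Track a moving average of the reconstruction loss."""

    def __init__(self, threshold=1e-4, window=100):
        self.threshold = threshold          # 0.0001, as stated above
        self.losses = deque(maxlen=window)  # window size is an assumption

    def update(self, loss):
        """Record one loss value.

        Returns True once the window is full and its mean is at or
        below the threshold.
        """
        self.losses.append(float(loss))
        full = len(self.losses) == self.losses.maxlen
        return full and sum(self.losses) / len(self.losses) <= self.threshold
```

In a training loop, update would be called once per iteration with the scalar reconstruction loss, and training could stop the first time it returns True.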

Train with new vocoder

python3.8 make_spect.py # create folder spmel
python3.8 make_spect_other_vocoder.py # create the folder spmel_other
CUDA_VISIBLE_DEVICES="0" python3.8 make_metadata.py --root-dir="./spmel" # create spmel/train.pkl by running the speaker encoder on ./spmel
cp spmel/train.pkl spmel_other # copy the spmel/train.pkl into spmel_other/train.pkl
CUDA_VISIBLE_DEVICES="0" python3.8 main.py --data_dir="spmel_other" \
    --outfile-path="/home/super/Models/autovc_simple/generator.pth" \
    --num_iters 10000 --batch_size=6 --dim_neck 32 --dim_emb 256 --dim_pre 512 --freq 32
CUDA_VISIBLE_DEVICES="0" python3.8 test_audio.py
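
Since train.pkl is generated from spmel and then copied into spmel_other, training on spmel_other presumably relies on both folders containing the same relative file paths. The following sketch checks this before training starts. It assumes the spectrograms are stored as .npy files, and the function name is hypothetical.

```python
from pathlib import Path


def missing_counterparts(src_dir, dst_dir, pattern="*.npy"):
    """List files under src_dir that have no counterpart under dst_dir.

    Paths are returned relative to src_dir. The "*.npy" pattern is an
    assumption about how the spectrograms are saved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    return sorted(
        str(p.relative_to(src))
        for p in src.rglob(pattern)
        if not (dst / p.relative_to(src)).is_file()
    )
```

An empty result from missing_counterparts("spmel", "spmel_other") suggests that every utterance in the copied metadata has a matching spectrogram in the new folder.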

