Implement missing speed functions along with durable speech rate / speed changer function. #4115

Closed
318 commits
adbcba0
refactor(dataset): get audio length with torchaudio
eginhard Mar 14, 2024
571f065
Merge pull request #21 from eginhard/audio-length
eginhard Mar 14, 2024
7630abb
refactor(bin.find_unique_chars): use existing function
eginhard Nov 18, 2023
d76d0ef
ci(tests.yml): run apt-get update before installing espeak
eginhard Mar 30, 2024
018daa0
Merge pull request #22 from eginhard/unique-chars
eginhard Mar 30, 2024
d772724
fix: update repository links, package names, metadata
eginhard Apr 3, 2024
7fe6a01
ci(pypi-release): update actions, use trusted publishing
eginhard Apr 3, 2024
dd3768d
chore: update version to v0.22.1
eginhard Apr 3, 2024
00f8d47
ci: switch back from uv to pip
eginhard Apr 2, 2024
a4ca02b
Merge pull request #24 from idiap/coqui-refs
eginhard Apr 3, 2024
b6ab85a
fix: use logging instead of print statements
eginhard Nov 13, 2023
b711e19
refactor: remove verbose arguments
eginhard Nov 18, 2023
9b2d48f
feat(utils.generic_utils): improve setup_logger() arguments and output
eginhard Apr 2, 2024
ab64844
feat(utils.generic_utils): add custom formatter for logging to console
eginhard Apr 2, 2024
7dc5d1e
fix: logging in executables
eginhard Nov 20, 2023
e689fd1
fix(utils.manage): remove bare except, improve messages
eginhard Mar 31, 2024
aa40fd2
docs: update links
eginhard Apr 4, 2024
107e22c
ci(workflows): update actions
eginhard Apr 4, 2024
31f1c8b
ci(workflows.docker): update image namespace
eginhard Apr 4, 2024
e626a29
Merge pull request #1 from idiap/update-docs
eginhard Apr 5, 2024
d416865
feat(xtts): support hindi for sentence-splitting and fine-tuning
eginhard Apr 8, 2024
dfbe016
Merge pull request #3 from idiap/logging
eginhard Apr 11, 2024
2ad790d
Merge pull request #4 from idiap/hindi
eginhard Apr 11, 2024
b3c9685
fix(tokenizer): add debug logging
eginhard Apr 11, 2024
794eecb
docs(README): update badges to new pypi package
eginhard Apr 11, 2024
f7d69cc
chore: update version to 0.23.0
eginhard Apr 11, 2024
5527f70
Merge pull request #5 from idiap/tokenizer-logging
eginhard Apr 18, 2024
52a52b5
fix(LanguageManager): allow initialisation from config with language …
eginhard Apr 19, 2024
8b1ed02
build: add python 3.12 support
eginhard Apr 22, 2024
f636fab
build: switch to forked trainer package
eginhard Apr 22, 2024
697d4ef
Revert "ci: switch back from uv to pip"
eginhard Apr 22, 2024
2675e74
chore: update version to 0.23.1
eginhard Apr 23, 2024
d65ba4a
Merge pull request #9 from idiap/fix-language-manager
eginhard Apr 23, 2024
0630450
Merge pull request #11 from idiap/py312
eginhard Apr 23, 2024
7b2289a
fix(espeak_wrapper): capture stderr separately
eginhard May 1, 2024
962f9bb
refactor(espeak_wrapper): fix ruff lint suggestions
eginhard May 1, 2024
98e21d0
test(losses): change assertEqual to assertAlmostEqual
eginhard May 1, 2024
63bfb9f
Merge pull request #17 from idiap/espeak-stderr
eginhard May 7, 2024
f4cacd7
build: move metadata from setup.py to pyproject.toml
eginhard May 7, 2024
259d8fc
build: store version in pyproject.toml
eginhard May 7, 2024
fb92e13
build: remove unused/obsolete code
eginhard May 7, 2024
8d2a562
build: move dependencies into pyproject.toml
eginhard May 7, 2024
5cf1d41
chore: enable commented pre-commit rules
eginhard May 7, 2024
ec50006
style: run pre-commit
eginhard May 7, 2024
0504ae3
ci: add script to automatically generate requirements.dev.txt
eginhard May 7, 2024
129b488
build: update pip and setuptools in dockerfile
eginhard May 7, 2024
e3fed5c
build: create separate makefile target for development install
eginhard May 8, 2024
20e82bc
build: update development dockerfile and test it in ci
eginhard May 8, 2024
4f2eff4
chore: enable ruff rules that already pass
eginhard May 6, 2024
ea893c3
fix: make bangla g2p deps optional
eginhard May 8, 2024
55ed162
fix: make chinese g2p deps optional
eginhard May 8, 2024
865a481
fix: make korean g2p deps optional
eginhard May 8, 2024
6d563af
chore: remove obsolete code for torch<2
eginhard May 8, 2024
59a6c9f
fix(bark): add missing argument for load_voice()
eginhard May 15, 2024
018f1e6
docs(bark): update docstrings and type hints
eginhard May 15, 2024
d73c9cc
Merge pull request #22 from idiap/bark
eginhard May 16, 2024
924f42e
ci: update release workflow
eginhard May 15, 2024
6023250
chore: update version to 0.24.0
eginhard May 16, 2024
70bd848
fix(server): ensure logging output gets actually shown
eginhard May 20, 2024
8503500
chore(server): remove duplicate code
eginhard May 20, 2024
ab7d84b
refactor(server): address linter issues
eginhard May 20, 2024
7bf9033
chore: update repo info
eginhard May 25, 2024
642cbd4
Merge pull request #26 from idiap/server-output
eginhard May 26, 2024
df088e9
Merge pull request #19 from idiap/toml
eginhard May 27, 2024
7df4c2f
fix: restore TTS.__version__ attribute
eginhard May 28, 2024
dc629f8
build: set upper version limit for transformers
eginhard May 28, 2024
df4a1f5
docs: update readme
eginhard May 28, 2024
203f60f
refactor(espeak_wrapper): remove sync argument
eginhard May 28, 2024
49fcbd9
fix(espeak_wrapper): avoid stuck process on windows
eginhard May 28, 2024
03430de
chore: bump version to 0.24.1
eginhard May 29, 2024
07cbcf8
fix(espeak_wrapper): read phonemize() input from file
eginhard May 29, 2024
c5f3d63
Merge pull request #34 from idiap/espeak
eginhard May 29, 2024
a682fa8
Merge pull request #33 from idiap/versions
eginhard May 29, 2024
77722cb
fix(bin.synthesize): correctly handle boolean arguments
eginhard May 30, 2024
29e91f2
fix(utils.generic_utils): correctly call now()
eginhard May 30, 2024
bdd44cf
docs: update readme
eginhard May 30, 2024
03de4b8
docs: fix readthedocs links
eginhard Jun 13, 2024
063e9e9
Merge pull request #38 from idiap/cli
eginhard Jun 14, 2024
e5c208d
feat(cleaners): add multilingual phoneme cleaner
eginhard Jun 14, 2024
a1495d4
fix(recipes): use multilingual phoneme cleaner in non-english recipes
eginhard Jun 14, 2024
9cfcc0a
chore(cleaners): add type hints
eginhard Jun 14, 2024
3a20f47
fix(freevc): use the specified device for pretrained speaker encoder …
ChristianRomberg Jun 16, 2024
4bc0e75
build: add numpy2 support
eginhard Jun 16, 2024
bd9b21d
Merge pull request #44 from idiap/phoneme-cleaners
eginhard Jun 17, 2024
81ac7ab
Merge pull request #47 from idiap/numpy2
eginhard Jun 17, 2024
4b6da4e
refactor(stream_generator): update special tokens for transformers>=4…
eginhard Jun 15, 2024
2a28123
refactor(stream_generator): update code for transformers>=4.41.1
eginhard Jun 15, 2024
4d9e18e
chore(stream_generator): address lint issues
eginhard Jun 15, 2024
98c0f86
Merge pull request #46 from idiap/fix-xtts-streaming
eginhard Jun 18, 2024
c9f7197
test(helpers): add test_ prefix so tests actually run
eginhard Jun 20, 2024
857cd55
test(helpers): fix test_rand_segment, test_generate_path
eginhard Jun 20, 2024
9f80e04
refactor(freevc): use existing layernorm
eginhard Jun 24, 2024
d65bcf6
chore(freevc): remove duplicate DDSConv and ElementwiseAffine
eginhard Jun 24, 2024
cd7b6da
fix: clarify types, fix missing functions
eginhard Jun 25, 2024
f8df19a
refactor: remove duplicate convert_pad_shape
eginhard Jun 20, 2024
a755328
refactor(freevc): remove duplicate sequence_mask
eginhard Jun 20, 2024
c5241d7
chore: address pytorch deprecations
eginhard Jun 25, 2024
c30fb0f
chore: remove duplicate init_weights
eginhard Jun 26, 2024
4bd3df2
refactor: remove duplicate get_padding
eginhard Jun 26, 2024
ff2cd5c
Merge pull request #49 from idiap/vc-refactors
eginhard Jun 26, 2024
59ef28d
build: move umap-learn into optional notebook dependencies
eginhard Jun 26, 2024
c693b08
build: update trainer to 0.1.4
eginhard Jun 27, 2024
28296c6
refactor: use get_git_branch from trainer
eginhard Jun 27, 2024
0fb26f9
refactor: use get_user_data_dir from trainer
eginhard Jun 27, 2024
da82d55
refactor: use load_fsspec from trainer
eginhard Jun 27, 2024
e869b9b
refactor: use load_checkpoint from trainer
eginhard Jun 27, 2024
2d06aeb
chore: remove unused TTS.utils.io module
eginhard Jun 27, 2024
808a938
build: specify minimum versions for dependencies
eginhard Jun 29, 2024
8cab2e3
ci: test lowest and highest compatible versions of dependencies
eginhard Jun 29, 2024
c1a929b
Merge pull request #51 from idiap/update-trainer
eginhard Jul 2, 2024
6ea3b75
Update xtts.py (#53)
abrahammathews2000 Jul 2, 2024
9192ef1
fix(xtts): load tokenizer file based on config as last resort
eginhard Jul 5, 2024
de35920
Merge pull request #50 from idiap/umap
eginhard Jul 25, 2024
20583a4
Merge pull request #57 from idiap/xtts-vocab
eginhard Jul 25, 2024
20bbb41
fix(xtts): update streaming for transformers>=4.42.0 (#59)
gravityrail Jul 25, 2024
8c460d0
fix(dataset): skip files where audio length can't be computed
eginhard Jul 31, 2024
9c604c1
chore(dataset): address lint issues
eginhard Jul 31, 2024
19fce2c
Merge pull request #66 from idiap/skip-broken-audio
eginhard Jul 31, 2024
d304ab2
build: update gruut version for numpy2 support
eginhard Jul 3, 2024
b1558b0
build: require numpy<2 because spacy/thinc lack support
eginhard Jul 3, 2024
7014782
build: add upper bound for transformers
eginhard Aug 5, 2024
204588f
Merge pull request #56 from idiap/update-gruut
eginhard Aug 5, 2024
233dfb5
docs(tacotron): fix wrong paper links (#74)
hykilpikonna Aug 25, 2024
1920328
feat(xtts): support hindi in tokenizer (#64)
eginhard Sep 12, 2024
17ca24c
fix: load weights only in torch.load
shavit Aug 31, 2024
86b58fb
fix: define torch safe globals for torch.load
eginhard Sep 12, 2024
659b485
chore(bark): remove manual download of hubert model
eginhard Sep 12, 2024
f5e2148
ci: explicitly upload hidden files for coverage
eginhard Sep 12, 2024
e5dd06b
Merge pull request #77 from shavit/71-torch-load
eginhard Sep 12, 2024
0a18418
build: allow numpy2, which should be supported in spacy 3.8 now (#81)
eginhard Sep 13, 2024
3e8125c
ci: switch to cibuildwheel
eginhard Sep 17, 2024
36611a7
feat: normalize unicode characters in text cleaners (#85)
shavit Oct 2, 2024
f75d095
fix(build): restrict spacy version to unbreak installation (#92)
KoljaB Oct 4, 2024
6c2e0be
chore: bump version to 0.24.2
eginhard Oct 4, 2024
073f8de
Merge pull request #95 from idiap/cibuildwheel
eginhard Oct 4, 2024
018d4ba
fix(xtts): support transformers>=4.43.0 in streaming inference
JohnnyStreet Oct 5, 2024
a510ec3
build(uv): add constraint on numba to avoid resolution error
eginhard Oct 20, 2024
ad435b5
build: again restrict to numpy<2
eginhard Oct 20, 2024
b66c782
Merge pull request #109 from idiap/transformers
eginhard Oct 21, 2024
964b813
fix(gpt): set attention mask and address other warnings
eginhard Oct 25, 2024
88de5c4
Merge pull request #114 from idiap/gpt-warnings
eginhard Oct 26, 2024
47ad0bf
fix(text.characters): add nasal diacritic (#127)
eginhard Nov 4, 2024
8e66be2
fix: only enable load with weights_only in pytorch>=2.4
eginhard Oct 25, 2024
ce5c492
ci: simplify ci by using uv where possible
eginhard Oct 20, 2024
f6a4d5e
chore: bump version to 0.24.3
eginhard Nov 4, 2024
6314032
Merge pull request #113 from idiap/pytorch
eginhard Nov 4, 2024
45b8b5b
build: set upper version limit for trainer (#130)
eginhard Nov 5, 2024
ef8158d
build: use group not extra for docs dependencies
eginhard Nov 6, 2024
020a724
ci(readthedocs): build docs with uv
eginhard Nov 6, 2024
59996ff
Merge pull request #133 from idiap/docs
eginhard Nov 6, 2024
0971bc2
refactor: use external package for monotonic alignment
eginhard Nov 6, 2024
9dd7ae6
build: switch to hatch
eginhard Nov 7, 2024
d30eba5
chore: remove obsolete code owners file
eginhard Nov 7, 2024
683ee66
ci: simplify release, cibuildwheel not needed anymore
eginhard Nov 7, 2024
e18f7da
Merge pull request #135 from idiap/mas
eginhard Nov 8, 2024
540e8d6
fix(bin.synthesize): return speakers names only (#147)
shavit Nov 9, 2024
2df9bfa
refactor: handle deprecation of torch.cuda.amp.autocast (#144)
eginhard Nov 9, 2024
21172ec
ci: update uv and move into composite action
eginhard Nov 10, 2024
993da77
chore: use original instead of scarf urls
eginhard Nov 10, 2024
5de47e9
ci: run integration tests only on lowest and highest python
eginhard Nov 10, 2024
d3c3ba3
build: set upper limit on transformers
eginhard Nov 10, 2024
b5bd995
Merge pull request #149 from idiap/cache-models
eginhard Nov 11, 2024
75d0825
fix(docker): add Support for building Docker on Mac/arm64 (#159)
hongkongkiwi Nov 14, 2024
e81f8d0
fix: more helpful error message when formatter is not found
eginhard Nov 16, 2024
627bbe4
fix(xtts): more helpful error message when vocab.json not found
eginhard Nov 16, 2024
48f5be2
feat(audio): automatically convert audio to mono
eginhard Nov 17, 2024
5784f67
refactor(audio): improve type hints, address lint issues
eginhard Nov 17, 2024
8ba3233
refactor(audio): remove duplicate save_wav code
eginhard Nov 17, 2024
fbbae5a
refactor(audio): remove duplicate rms_volume_norm function
eginhard Nov 17, 2024
312593e
Merge pull request #166 from idiap/error-messages
eginhard Nov 20, 2024
9035e36
ci: allow testing out trainer/coqpit branches before release (#168)
eginhard Nov 20, 2024
1b6d3eb
refactor(xtts): remove duplicate hifigan generator
eginhard Nov 20, 2024
1f27f99
refactor(utils): remove duplicate set_partial_state_dict
eginhard Nov 20, 2024
66701e1
refactor(xtts): reuse functions/classes from tortoise
eginhard Nov 21, 2024
4ba83f4
chore(tortoise): remove unused AudioMiniEncoder
eginhard Nov 21, 2024
705551c
refactor(tortoise): remove unused do_checkpoint arguments
eginhard Nov 21, 2024
5ffc054
refactor(bark): remove custom layer norm
eginhard Nov 21, 2024
490c973
refactor(xtts): use position embedding from tortoise
eginhard Nov 21, 2024
33ac0d6
refactor(xtts): use build_hf_gpt_transformer from tortoise
eginhard Nov 21, 2024
7cdfde2
refactor: move amp_to_db/db_to_amp into torch_transforms
eginhard Nov 22, 2024
6f25c2b
refactor(delightful_tts): remove unused classes
eginhard Nov 21, 2024
e63962c
refactor(losses): move shared losses into losses.py
eginhard Nov 21, 2024
2e5f68d
refactor(wavernn): remove duplicate Stretch2d
eginhard Nov 22, 2024
69a599d
refactor(freevc): remove duplicate code
eginhard Nov 22, 2024
6ecf473
refactor(xtts): use tortoise conditioning encoder
eginhard Nov 22, 2024
0f69d31
refactor(vocoder): remove duplicate function
eginhard Nov 22, 2024
fa844e0
refactor(tacotron): remove duplicate function
eginhard Nov 22, 2024
b45a7a4
refactor: move exists() and default() into generic_utils
eginhard Nov 22, 2024
54f4228
refactor(xtts): use existing cleaners
eginhard Nov 22, 2024
b1ac884
refactor: move shared function into dataset.py
eginhard Nov 22, 2024
2c82477
ci: merge integration tests back into unit tests
eginhard Nov 22, 2024
76df642
refactor: move more audio processing into torch_transforms
eginhard Nov 23, 2024
8bf288e
test: move test_helpers.py to fast unit tests
eginhard Nov 24, 2024
7330ad8
refactor: move duplicate alignment functions into helpers
eginhard Nov 24, 2024
170d3da
refactor: remove duplicate to_camel
eginhard Nov 24, 2024
63625e7
refactor: import get_last_checkpoint from trainer.io
eginhard Nov 27, 2024
98a372b
Merge pull request #172 from idiap/deduplicate
eginhard Dec 2, 2024
ce20253
fix(xtts): clearer error message when file given to checkpoint_dir
eginhard Dec 2, 2024
6de98ff
feat(openvoice): initial integration
ajk1402 Jun 13, 2024
4124b9d
feat(vits): add tau parameter to posterior encoder
eginhard Jun 25, 2024
b97d537
refactor(openvoice): remove duplicate and unused code
eginhard Jun 20, 2024
9599837
feat(openvoice): add config classes
eginhard Jun 20, 2024
ca02d03
feat(openvoice): add to .models.json
eginhard Nov 13, 2024
1a21853
ci: validate .models.json file
eginhard Nov 13, 2024
fce3137
feat: add openvoice vc model
eginhard Jun 25, 2024
d488441
test(freevc): remove unused code
eginhard Nov 13, 2024
6927e0b
fix(api): clearer error message when model doesn't support VC
eginhard Nov 29, 2024
546f43c
refactor: only use keyword args in Synthesizer
eginhard Nov 29, 2024
9ef2c7e
test(freevc): fix output length check
eginhard Dec 1, 2024
5f8ad4c
test(openvoice): add sanity check
eginhard Nov 29, 2024
32c99e8
docs(readme): mention openvoice vc
eginhard Jun 13, 2024
7d0416f
refactor(vc): rename TTS.vc.modules to TTS.vc.layers for consistency
eginhard Dec 1, 2024
3539e65
refactor(synthesizer): set sample rate in loading methods
eginhard Dec 2, 2024
834d41b
build: switch to forked coqpit
eginhard Oct 20, 2024
d4ffff4
chore: bump version to 0.25.0
eginhard Dec 3, 2024
9ae0b27
Merge pull request #183 from idiap/openvoice
eginhard Dec 3, 2024
48ad1da
Merge pull request #110 from idiap/uv
eginhard Dec 3, 2024
8241d55
fix(pypi-release): fix publishing workflow (#191)
eginhard Dec 4, 2024
fe14ca6
refactor(xtts): remove duplicate xtts audio config
eginhard Dec 5, 2024
8c381e3
docs: use .to("cuda") instead of deprecated gpu=True
eginhard Dec 3, 2024
5cfb4ec
refactor(api): require keyword arguments except for model_name
eginhard Dec 3, 2024
42ad9b0
feat(api): support specifying vocoders by name
eginhard Dec 4, 2024
5daed87
chore(bin.synthesize): remove unused argument
eginhard Dec 4, 2024
1a4e58d
feat(api): support passing a custom speaker encoder by path
eginhard Dec 4, 2024
e8d99aa
Merge pull request #184 from idiap/xtts-error
eginhard Dec 6, 2024
85dbb3b
feat(api): allow mixing TTS and vocoder model name and path
eginhard Dec 4, 2024
a05177c
chore(api): add type hints
eginhard Dec 4, 2024
89abd98
feat(api): support passing speaker/language id file paths
eginhard Dec 5, 2024
806af96
refactor(api): use save_wav() from Synthesizer instance
eginhard Dec 6, 2024
e0f6211
refactor(bin.synthesize): use Python API for CLI
eginhard Dec 2, 2024
b545ab8
Merge pull request #197 from idiap/api
eginhard Dec 6, 2024
c0d9ed3
fix: handle difference in xtts/tortoise attention (#199)
eginhard Dec 9, 2024
f329072
chore: bump version to 0.25.1 (#202)
eginhard Dec 9, 2024
236e490
build(docs): update dependencies, fix makefile
eginhard Dec 11, 2024
849e75e
docs: improve documentation
eginhard Dec 11, 2024
e23766d
docs: move project structure from readme into documentation
eginhard Dec 12, 2024
ae2f8d2
docs: use nested contents for easier overview
eginhard Dec 12, 2024
e38dcbe
docs: streamline readme and reuse content in other docs pages
eginhard Dec 12, 2024
cd52907
Merge pull request #207 from idiap/docs
eginhard Dec 12, 2024
a425ba5
feat: allow both Path and strings where possible and add type hints
eginhard Dec 13, 2024
0df04cc
docs: add notes about xtts fine-tuning
eginhard Dec 14, 2024
5165e71
Merge pull request #210 from idiap/manager
eginhard Dec 16, 2024
9d5fc60
feat(manager): print download location when listing models (#213)
eginhard Dec 16, 2024
1f9dda6
docs(xtts): show manual inference with default speakers
eginhard Dec 17, 2024
6a52c8a
fix(bin): log to stdout in cli tools, unless pipe_out is set
eginhard Dec 17, 2024
370fb1d
Merge pull request #217 from idiap/stdout
eginhard Dec 17, 2024
f89ce41
fix(xtts): voice_dir should remain None if not specified (#224)
eginhard Dec 19, 2024
98080e2
fix(xtts): use correct language code for Czech num2words call (#237)
SkaceKamen Dec 28, 2024
26128be
feat: add adjust_speech_rate function to modify speech speed with mor…
isikhi Dec 28, 2024
ed1563b
Merge branch 'dev' into fix-improvements/adjust-speech-rate-or-speed
isikhi Dec 29, 2024
fix: use logging instead of print statements
Fixes #1691
eginhard committed Apr 3, 2024
commit b6ab85a05028a54c268e102e2d3ce3701efaa16e
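A minimal sketch of the pattern this commit applies, assuming a stand-in logger name (`TTS.encoder.losses`) in place of `logging.getLogger(__name__)`. The functions `init_loss` and `capture_info_output` are hypothetical and not part of the repository. Since the library code no longer prints, the calling application has to attach a handler and choose a level before INFO messages become visible:

```python
import io
import logging

# Stand-in for the module-level `logger = logging.getLogger(__name__)`
# that the commit adds to each file (the name here is an assumption).
logger = logging.getLogger("TTS.encoder.losses")


def init_loss():
    # Library code only emits a record; it does not decide where it goes.
    logger.info("Initialized Generalized End-to-End loss")


def capture_info_output():
    """Hypothetical helper: configure the parent "TTS" logger as an application would."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    package_logger = logging.getLogger("TTS")
    package_logger.addHandler(handler)
    package_logger.setLevel(logging.INFO)  # child loggers inherit this level
    try:
        init_loss()
    finally:
        # Restore the previous state so the sketch has no side effects.
        package_logger.removeHandler(handler)
        package_logger.setLevel(logging.NOTSET)
    return stream.getvalue()
```

Configuring the parent logger once covers every module below it, because records propagate up the dotted logger hierarchy.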
3 changes: 3 additions & 0 deletions TTS/api.py
@@ -1,3 +1,4 @@
+import logging
import tempfile
import warnings
from pathlib import Path
@@ -9,6 +10,8 @@
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

+logger = logging.getLogger(__name__)
+

class TTS(nn.Module):
"""TODO: Add voice conversion and Capacitron support."""
15 changes: 9 additions & 6 deletions TTS/encoder/dataset.py
@@ -1,10 +1,13 @@
+import logging
import random

import torch
from torch.utils.data import Dataset

from TTS.encoder.utils.generic_utils import AugmentWAV

+logger = logging.getLogger(__name__)
+

class EncoderDataset(Dataset):
def __init__(
@@ -51,12 +54,12 @@ def __init__(
self.gaussian_augmentation_config = augmentation_config["gaussian"]

if self.verbose:
-print("\n > DataLoader initialization")
-print(f" | > Classes per Batch: {num_classes_in_batch}")
-print(f" | > Number of instances : {len(self.items)}")
-print(f" | > Sequence length: {self.seq_len}")
-print(f" | > Num Classes: {len(self.classes)}")
-print(f" | > Classes: {self.classes}")
+logger.info("DataLoader initialization")
+logger.info(" | Classes per batch: %d", num_classes_in_batch)
+logger.info(" | Number of instances: %d", len(self.items))
+logger.info(" | Sequence length: %d", self.seq_len)
+logger.info(" | Number of classes: %d", len(self.classes))
+logger.info(" | Classes: %s", self.classes)

def load_wav(self, filename):
audio = self.ap.load_wav(filename, sr=self.ap.sample_rate)
12 changes: 8 additions & 4 deletions TTS/encoder/losses.py
@@ -1,7 +1,11 @@
+import logging
+
import torch
import torch.nn.functional as F
from torch import nn

+logger = logging.getLogger(__name__)
+

# adapted from https://github.com/cvqluu/GE2E-Loss
class GE2ELoss(nn.Module):
@@ -23,7 +27,7 @@ def __init__(self, init_w=10.0, init_b=-5.0, loss_method="softmax"):
self.b = nn.Parameter(torch.tensor(init_b))
self.loss_method = loss_method

-print(" > Initialized Generalized End-to-End loss")
+logger.info("Initialized Generalized End-to-End loss")

assert self.loss_method in ["softmax", "contrast"]

@@ -139,7 +143,7 @@ def __init__(self, init_w=10.0, init_b=-5.0):
self.b = nn.Parameter(torch.tensor(init_b))
self.criterion = torch.nn.CrossEntropyLoss()

-print(" > Initialized Angular Prototypical loss")
+logger.info("Initialized Angular Prototypical loss")

def forward(self, x, _label=None):
"""
@@ -177,7 +181,7 @@ def __init__(self, embedding_dim, n_speakers):
self.criterion = torch.nn.CrossEntropyLoss()
self.fc = nn.Linear(embedding_dim, n_speakers)

-print("Initialised Softmax Loss")
+logger.info("Initialised Softmax Loss")

def forward(self, x, label=None):
# reshape for compatibility
@@ -212,7 +216,7 @@ def __init__(self, embedding_dim, n_speakers, init_w=10.0, init_b=-5.0):
self.softmax = SoftmaxLoss(embedding_dim, n_speakers)
self.angleproto = AngleProtoLoss(init_w, init_b)

-print("Initialised SoftmaxAnglePrototypical Loss")
+logger.info("Initialised SoftmaxAnglePrototypical Loss")

def forward(self, x, label=None):
"""
10 changes: 7 additions & 3 deletions TTS/encoder/models/base_encoder.py
@@ -1,3 +1,5 @@
+import logging
+
import numpy as np
import torch
import torchaudio
@@ -8,6 +10,8 @@
from TTS.utils.generic_utils import set_init_dict
from TTS.utils.io import load_fsspec

+logger = logging.getLogger(__name__)
+

class PreEmphasis(nn.Module):
def __init__(self, coefficient=0.97):
@@ -118,13 +122,13 @@ def load_checkpoint(
state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
try:
self.load_state_dict(state["model"])
-print(" > Model fully restored. ")
+logger.info("Model fully restored. ")
except (KeyError, RuntimeError) as error:
# If eval raise the error
if eval:
raise error

-print(" > Partial model initialization.")
+logger.info("Partial model initialization.")
model_dict = self.state_dict()
model_dict = set_init_dict(model_dict, state["model"], c)
self.load_state_dict(model_dict)
@@ -135,7 +139,7 @@ def load_checkpoint(
try:
criterion.load_state_dict(state["criterion"])
except (KeyError, RuntimeError) as error:
-print(" > Criterion load ignored because of:", error)
+logger.exception("Criterion load ignored because of: %s", error)

# instance and load the criterion for the encoder classifier in inference time
if (
11 changes: 8 additions & 3 deletions TTS/encoder/utils/generic_utils.py
@@ -1,4 +1,5 @@
import glob
+import logging
import os
import random

@@ -8,6 +9,8 @@
from TTS.encoder.models.lstm import LSTMSpeakerEncoder
from TTS.encoder.models.resnet import ResNetSpeakerEncoder

+logger = logging.getLogger(__name__)
+

class AugmentWAV(object):
def __init__(self, ap, augmentation_config):
@@ -38,8 +41,10 @@ def __init__(self, ap, augmentation_config):
self.noise_list[noise_dir] = []
self.noise_list[noise_dir].append(wav_file)

-print(
-f" | > Using Additive Noise Augmentation: with {len(additive_files)} audios instances from {self.additive_noise_types}"
+logger.info(
+"Using Additive Noise Augmentation: with %d audios instances from %s",
+len(additive_files),
+self.additive_noise_types,
)

self.use_rir = False
@@ -50,7 +55,7 @@ def __init__(self, ap, augmentation_config):
self.rir_files = glob.glob(os.path.join(self.rir_config["rir_path"], "**/*.wav"), recursive=True)
self.use_rir = True

-print(f" | > Using RIR Noise Augmentation: with {len(self.rir_files)} audios instances")
+logger.info("Using RIR Noise Augmentation: with %d audios instances", len(self.rir_files))

self.create_augmentation_global_list()

30 changes: 16 additions & 14 deletions TTS/encoder/utils/prepare_voxceleb.py
@@ -21,13 +21,15 @@

import csv
import hashlib
+import logging
import os
import subprocess
import sys
import zipfile

import soundfile as sf
-from absl import logging
+
+logger = logging.getLogger(__name__)

SUBSETS = {
"vox1_dev_wav": [
@@ -77,14 +79,14 @@ def download_and_extract(directory, subset, urls):
zip_filepath = os.path.join(directory, url.split("/")[-1])
if os.path.exists(zip_filepath):
continue
-logging.info("Downloading %s to %s" % (url, zip_filepath))
+logger.info("Downloading %s to %s" % (url, zip_filepath))
subprocess.call(
"wget %s --user %s --password %s -O %s" % (url, USER["user"], USER["password"], zip_filepath),
shell=True,
)

statinfo = os.stat(zip_filepath)
-logging.info("Successfully downloaded %s, size(bytes): %d" % (url, statinfo.st_size))
+logger.info("Successfully downloaded %s, size(bytes): %d" % (url, statinfo.st_size))

# concatenate all parts into zip files
if ".zip" not in zip_filepath:
@@ -118,9 +120,9 @@ def exec_cmd(cmd):
try:
retcode = subprocess.call(cmd, shell=True)
if retcode < 0:
-logging.info(f"Child was terminated by signal {retcode}")
+logger.info(f"Child was terminated by signal {retcode}")
except OSError as e:
-logging.info(f"Execution failed: {e}")
+logger.info(f"Execution failed: {e}")
retcode = -999
return retcode

@@ -134,11 +136,11 @@ def decode_aac_with_ffmpeg(aac_file, wav_file):
bool, True if success.
"""
cmd = f"ffmpeg -i {aac_file} {wav_file}"
-logging.info(f"Decoding aac file using command line: {cmd}")
+logger.info(f"Decoding aac file using command line: {cmd}")
ret = exec_cmd(cmd)
if ret != 0:
-logging.error(f"Failed to decode aac file with retcode {ret}")
-logging.error("Please check your ffmpeg installation.")
+logger.error(f"Failed to decode aac file with retcode {ret}")
+logger.error("Please check your ffmpeg installation.")
return False
return True

@@ -152,7 +154,7 @@ def convert_audio_and_make_label(input_dir, subset, output_dir, output_file):
output_file: the name of the newly generated csv file. e.g. vox1_dev_wav.csv
"""

-logging.info("Preprocessing audio and label for subset %s" % subset)
+logger.info("Preprocessing audio and label for subset %s" % subset)
source_dir = os.path.join(input_dir, subset)

files = []
@@ -190,7 +192,7 @@ def convert_audio_and_make_label(input_dir, subset, output_dir, output_file):
writer.writerow(["wav_filename", "wav_length_ms", "speaker_id", "speaker_name"])
for wav_file in files:
writer.writerow(wav_file)
-logging.info("Successfully generated csv file {}".format(csv_file_path))
+logger.info("Successfully generated csv file {}".format(csv_file_path))


def processor(directory, subset, force_process):
@@ -203,16 +205,16 @@ def processor(directory, subset, force_process):
if not force_process and os.path.exists(subset_csv):
return subset_csv

-logging.info("Downloading and process the voxceleb in %s", directory)
-logging.info("Preparing subset %s", subset)
+logger.info("Downloading and process the voxceleb in %s", directory)
+logger.info("Preparing subset %s", subset)
download_and_extract(directory, subset, urls[subset])
convert_audio_and_make_label(directory, subset, directory, subset + ".csv")
-logging.info("Finished downloading and processing")
+logger.info("Finished downloading and processing")
return subset_csv


if __name__ == "__main__":
-logging.set_verbosity(logging.INFO)
+logging.getLogger("TTS").setLevel(logging.INFO)
if len(sys.argv) != 4:
print("Usage: python prepare_data.py save_directory user password")
sys.exit()
11 changes: 7 additions & 4 deletions TTS/server/server.py
@@ -2,6 +2,7 @@
import argparse
import io
import json
+import logging
import os
import sys
from pathlib import Path
@@ -18,6 +19,8 @@
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

+logger = logging.getLogger(__name__)
+

def create_argparser():
def convert_boolean(x):
@@ -200,9 +203,9 @@ def tts():
style_wav = request.headers.get("style-wav") or request.values.get("style_wav", "")
style_wav = style_wav_uri_to_dict(style_wav)

-print(f" > Model input: {text}")
-print(f" > Speaker Idx: {speaker_idx}")
-print(f" > Language Idx: {language_idx}")
+logger.info("Model input: %s", text)
+logger.info("Speaker idx: %s", speaker_idx)
+logger.info("Language idx: %s", language_idx)
wavs = synthesizer.tts(text, speaker_name=speaker_idx, language_name=language_idx, style_wav=style_wav)
out = io.BytesIO()
synthesizer.save_wav(wavs, out)
@@ -246,7 +249,7 @@ def mary_tts_api_process():
text = data.get("INPUT_TEXT", [""])[0]
else:
text = request.args.get("INPUT_TEXT", "")
-print(f" > Model input: {text}")
+logger.info("Model input: %s", text)
wavs = synthesizer.tts(text)
out = io.BytesIO()
synthesizer.save_wav(wavs, out)
16 changes: 9 additions & 7 deletions TTS/tts/datasets/__init__.py
@@ -1,3 +1,4 @@
+import logging
import os
import sys
from collections import Counter
@@ -9,6 +10,8 @@
from TTS.tts.datasets.dataset import *
from TTS.tts.datasets.formatters import *

+logger = logging.getLogger(__name__)
+

def split_dataset(items, eval_split_max_size=None, eval_split_size=0.01):
"""Split a dataset into train and eval. Consider speaker distribution in multi-speaker training.
@@ -122,7 +125,7 @@ def load_tts_samples(

meta_data_train = add_extra_keys(meta_data_train, language, dataset_name)

-print(f" | > Found {len(meta_data_train)} files in {Path(root_path).resolve()}")
+logger.info("Found %d files in %s", len(meta_data_train), Path(root_path).resolve())
# load evaluation split if set
if eval_split:
if meta_file_val:
@@ -166,16 +169,15 @@ def _get_formatter_by_name(name):
return getattr(thismodule, name.lower())


-def find_unique_chars(data_samples, verbose=True):
+def find_unique_chars(data_samples):
texts = "".join(item["text"] for item in data_samples)
chars = set(texts)
lower_chars = filter(lambda c: c.islower(), chars)
chars_force_lower = [c.lower() for c in chars]
chars_force_lower = set(chars_force_lower)

-if verbose:
-print(f" > Number of unique characters: {len(chars)}")
-print(f" > Unique characters: {''.join(sorted(chars))}")
-print(f" > Unique lower characters: {''.join(sorted(lower_chars))}")
-print(f" > Unique all forced to lower characters: {''.join(sorted(chars_force_lower))}")
+logger.info("Number of unique characters: %d", len(chars))
+logger.info("Unique characters: %s", "".join(sorted(chars)))
+logger.info("Unique lower characters: %s", "".join(sorted(lower_chars)))
+logger.info("Unique all forced to lower characters: %s", "".join(sorted(chars_force_lower)))
return chars_force_lower
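
With the `verbose` argument gone from `find_unique_chars`, a caller who wants the old `verbose=False` behaviour would presumably control it through the logger level instead. The sketch below is self-contained and does not import the package: it copies the new function body from the diff, assumes the logger name `TTS.tts.datasets` (what `__name__` resolves to in `TTS/tts/datasets/__init__.py`), and uses a hypothetical helper `run_with_level` for illustration only.

```python
import io
import logging

# Assumed logger name; in the package this is logging.getLogger(__name__).
logger = logging.getLogger("TTS.tts.datasets")


def find_unique_chars(data_samples):
    # Copy of the post-commit function body, reproduced for the sketch.
    texts = "".join(item["text"] for item in data_samples)
    chars = set(texts)
    lower_chars = filter(lambda c: c.islower(), chars)
    chars_force_lower = [c.lower() for c in chars]
    chars_force_lower = set(chars_force_lower)

    logger.info("Number of unique characters: %d", len(chars))
    logger.info("Unique characters: %s", "".join(sorted(chars)))
    logger.info("Unique lower characters: %s", "".join(sorted(lower_chars)))
    logger.info("Unique all forced to lower characters: %s", "".join(sorted(chars_force_lower)))
    return chars_force_lower


def run_with_level(level):
    """Hypothetical helper: call the function with the logger set to `level`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        result = find_unique_chars([{"text": "Ab"}, {"text": "bc"}])
    finally:
        # Undo the temporary configuration.
        logger.removeHandler(handler)
        logger.setLevel(logging.NOTSET)
    return result, stream.getvalue()
```

Calling `run_with_level(logging.INFO)` captures the four summary lines, while `run_with_level(logging.WARNING)` captures nothing; the return value is the same in both cases.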