Implement missing speed functions along with durable speech rate / speed changer function. #4115

Closed
Changes from 7 commits
Commits
318 commits
adbcba0
refactor(dataset): get audio length with torchaudio
eginhard Mar 14, 2024
571f065
Merge pull request #21 from eginhard/audio-length
eginhard Mar 14, 2024
7630abb
refactor(bin.find_unique_chars): use existing function
eginhard Nov 18, 2023
d76d0ef
ci(tests.yml): run apt-get update before installing espeak
eginhard Mar 30, 2024
018daa0
Merge pull request #22 from eginhard/unique-chars
eginhard Mar 30, 2024
d772724
fix: update repository links, package names, metadata
eginhard Apr 3, 2024
7fe6a01
ci(pypi-release): update actions, use trusted publishing
eginhard Apr 3, 2024
dd3768d
chore: update version to v0.22.1
eginhard Apr 3, 2024
00f8d47
ci: switch back from uv to pip
eginhard Apr 2, 2024
a4ca02b
Merge pull request #24 from idiap/coqui-refs
eginhard Apr 3, 2024
b6ab85a
fix: use logging instead of print statements
eginhard Nov 13, 2023
b711e19
refactor: remove verbose arguments
eginhard Nov 18, 2023
9b2d48f
feat(utils.generic_utils): improve setup_logger() arguments and output
eginhard Apr 2, 2024
ab64844
feat(utils.generic_utils): add custom formatter for logging to console
eginhard Apr 2, 2024
7dc5d1e
fix: logging in executables
eginhard Nov 20, 2023
e689fd1
fix(utils.manage): remove bare except, improve messages
eginhard Mar 31, 2024
aa40fd2
docs: update links
eginhard Apr 4, 2024
107e22c
ci(workflows): update actions
eginhard Apr 4, 2024
31f1c8b
ci(workflows.docker): update image namespace
eginhard Apr 4, 2024
e626a29
Merge pull request #1 from idiap/update-docs
eginhard Apr 5, 2024
d416865
feat(xtts): support hindi for sentence-splitting and fine-tuning
eginhard Apr 8, 2024
dfbe016
Merge pull request #3 from idiap/logging
eginhard Apr 11, 2024
2ad790d
Merge pull request #4 from idiap/hindi
eginhard Apr 11, 2024
b3c9685
fix(tokenizer): add debug logging
eginhard Apr 11, 2024
794eecb
docs(README): update badges to new pypi package
eginhard Apr 11, 2024
f7d69cc
chore: update version to 0.23.0
eginhard Apr 11, 2024
5527f70
Merge pull request #5 from idiap/tokenizer-logging
eginhard Apr 18, 2024
52a52b5
fix(LanguageManager): allow initialisation from config with language …
eginhard Apr 19, 2024
8b1ed02
build: add python 3.12 support
eginhard Apr 22, 2024
f636fab
build: switch to forked trainer package
eginhard Apr 22, 2024
697d4ef
Revert "ci: switch back from uv to pip"
eginhard Apr 22, 2024
2675e74
chore: update version to 0.23.1
eginhard Apr 23, 2024
d65ba4a
Merge pull request #9 from idiap/fix-language-manager
eginhard Apr 23, 2024
0630450
Merge pull request #11 from idiap/py312
eginhard Apr 23, 2024
7b2289a
fix(espeak_wrapper): capture stderr separately
eginhard May 1, 2024
962f9bb
refactor(espeak_wrapper): fix ruff lint suggestions
eginhard May 1, 2024
98e21d0
test(losses): change assertEqual to assertAlmostEqual
eginhard May 1, 2024
63bfb9f
Merge pull request #17 from idiap/espeak-stderr
eginhard May 7, 2024
f4cacd7
build: move metadata from setup.py to pyproject.toml
eginhard May 7, 2024
259d8fc
build: store version in pyproject.toml
eginhard May 7, 2024
fb92e13
build: remove unused/obsolete code
eginhard May 7, 2024
8d2a562
build: move dependencies into pyproject.toml
eginhard May 7, 2024
5cf1d41
chore: enable commented pre-commit rules
eginhard May 7, 2024
ec50006
style: run pre-commit
eginhard May 7, 2024
0504ae3
ci: add script to automatically generate requirements.dev.txt
eginhard May 7, 2024
129b488
build: update pip and setuptools in dockerfile
eginhard May 7, 2024
e3fed5c
build: create separate makefile target for development install
eginhard May 8, 2024
20e82bc
build: update development dockerfile and test it in ci
eginhard May 8, 2024
4f2eff4
chore: enable ruff rules that already pass
eginhard May 6, 2024
ea893c3
fix: make bangla g2p deps optional
eginhard May 8, 2024
55ed162
fix: make chinese g2p deps optional
eginhard May 8, 2024
865a481
fix: make korean g2p deps optional
eginhard May 8, 2024
6d563af
chore: remove obsolete code for torch<2
eginhard May 8, 2024
59a6c9f
fix(bark): add missing argument for load_voice()
eginhard May 15, 2024
018f1e6
docs(bark): update docstrings and type hints
eginhard May 15, 2024
d73c9cc
Merge pull request #22 from idiap/bark
eginhard May 16, 2024
924f42e
ci: update release workflow
eginhard May 15, 2024
6023250
chore: update version to 0.24.0
eginhard May 16, 2024
70bd848
fix(server): ensure logging output gets actually shown
eginhard May 20, 2024
8503500
chore(server): remove duplicate code
eginhard May 20, 2024
ab7d84b
refactor(server): address linter issues
eginhard May 20, 2024
7bf9033
chore: update repo info
eginhard May 25, 2024
642cbd4
Merge pull request #26 from idiap/server-output
eginhard May 26, 2024
df088e9
Merge pull request #19 from idiap/toml
eginhard May 27, 2024
7df4c2f
fix: restore TTS.__version__ attribute
eginhard May 28, 2024
dc629f8
build: set upper version limit for transformers
eginhard May 28, 2024
df4a1f5
docs: update readme
eginhard May 28, 2024
203f60f
refactor(espeak_wrapper): remove sync argument
eginhard May 28, 2024
49fcbd9
fix(espeak_wrapper): avoid stuck process on windows
eginhard May 28, 2024
03430de
chore: bump version to 0.24.1
eginhard May 29, 2024
07cbcf8
fix(espeak_wrapper): read phonemize() input from file
eginhard May 29, 2024
c5f3d63
Merge pull request #34 from idiap/espeak
eginhard May 29, 2024
a682fa8
Merge pull request #33 from idiap/versions
eginhard May 29, 2024
77722cb
fix(bin.synthesize): correctly handle boolean arguments
eginhard May 30, 2024
29e91f2
fix(utils.generic_utils): correctly call now()
eginhard May 30, 2024
bdd44cf
docs: update readme
eginhard May 30, 2024
03de4b8
docs: fix readthedocs links
eginhard Jun 13, 2024
063e9e9
Merge pull request #38 from idiap/cli
eginhard Jun 14, 2024
e5c208d
feat(cleaners): add multilingual phoneme cleaner
eginhard Jun 14, 2024
a1495d4
fix(recipes): use multilingual phoneme cleaner in non-english recipes
eginhard Jun 14, 2024
9cfcc0a
chore(cleaners): add type hints
eginhard Jun 14, 2024
3a20f47
fix(freevc): use the specified device for pretrained speaker encoder …
ChristianRomberg Jun 16, 2024
4bc0e75
build: add numpy2 support
eginhard Jun 16, 2024
bd9b21d
Merge pull request #44 from idiap/phoneme-cleaners
eginhard Jun 17, 2024
81ac7ab
Merge pull request #47 from idiap/numpy2
eginhard Jun 17, 2024
4b6da4e
refactor(stream_generator): update special tokens for transformers>=4…
eginhard Jun 15, 2024
2a28123
refactor(stream_generator): update code for transformers>=4.41.1
eginhard Jun 15, 2024
4d9e18e
chore(stream_generator): address lint issues
eginhard Jun 15, 2024
98c0f86
Merge pull request #46 from idiap/fix-xtts-streaming
eginhard Jun 18, 2024
c9f7197
test(helpers): add test_ prefix so tests actually run
eginhard Jun 20, 2024
857cd55
test(helpers): fix test_rand_segment, test_generate_path
eginhard Jun 20, 2024
9f80e04
refactor(freevc): use existing layernorm
eginhard Jun 24, 2024
d65bcf6
chore(freevc): remove duplicate DDSConv and ElementwiseAffine
eginhard Jun 24, 2024
cd7b6da
fix: clarify types, fix missing functions
eginhard Jun 25, 2024
f8df19a
refactor: remove duplicate convert_pad_shape
eginhard Jun 20, 2024
a755328
refactor(freevc): remove duplicate sequence_mask
eginhard Jun 20, 2024
c5241d7
chore: address pytorch deprecations
eginhard Jun 25, 2024
c30fb0f
chore: remove duplicate init_weights
eginhard Jun 26, 2024
4bd3df2
refactor: remove duplicate get_padding
eginhard Jun 26, 2024
ff2cd5c
Merge pull request #49 from idiap/vc-refactors
eginhard Jun 26, 2024
59ef28d
build: move umap-learn into optional notebook dependencies
eginhard Jun 26, 2024
c693b08
build: update trainer to 0.1.4
eginhard Jun 27, 2024
28296c6
refactor: use get_git_branch from trainer
eginhard Jun 27, 2024
0fb26f9
refactor: use get_user_data_dir from trainer
eginhard Jun 27, 2024
da82d55
refactor: use load_fsspec from trainer
eginhard Jun 27, 2024
e869b9b
refactor: use load_checkpoint from trainer
eginhard Jun 27, 2024
2d06aeb
chore: remove unused TTS.utils.io module
eginhard Jun 27, 2024
808a938
build: specify minimum versions for dependencies
eginhard Jun 29, 2024
8cab2e3
ci: test lowest and highest compatible versions of dependencies
eginhard Jun 29, 2024
c1a929b
Merge pull request #51 from idiap/update-trainer
eginhard Jul 2, 2024
6ea3b75
Update xtts.py (#53)
abrahammathews2000 Jul 2, 2024
9192ef1
fix(xtts): load tokenizer file based on config as last resort
eginhard Jul 5, 2024
de35920
Merge pull request #50 from idiap/umap
eginhard Jul 25, 2024
20583a4
Merge pull request #57 from idiap/xtts-vocab
eginhard Jul 25, 2024
20bbb41
fix(xtts): update streaming for transformers>=4.42.0 (#59)
gravityrail Jul 25, 2024
8c460d0
fix(dataset): skip files where audio length can't be computed
eginhard Jul 31, 2024
9c604c1
chore(dataset): address lint issues
eginhard Jul 31, 2024
19fce2c
Merge pull request #66 from idiap/skip-broken-audio
eginhard Jul 31, 2024
d304ab2
build: update gruut version for numpy2 support
eginhard Jul 3, 2024
b1558b0
build: require numpy<2 because spacy/thinc lack support
eginhard Jul 3, 2024
7014782
build: add upper bound for transformers
eginhard Aug 5, 2024
204588f
Merge pull request #56 from idiap/update-gruut
eginhard Aug 5, 2024
233dfb5
docs(tacotron): fix wrong paper links (#74)
hykilpikonna Aug 25, 2024
1920328
feat(xtts): support hindi in tokenizer (#64)
eginhard Sep 12, 2024
17ca24c
fix: load weights only in torch.load
shavit Aug 31, 2024
86b58fb
fix: define torch safe globals for torch.load
eginhard Sep 12, 2024
659b485
chore(bark): remove manual download of hubert model
eginhard Sep 12, 2024
f5e2148
ci: explicitly upload hidden files for coverage
eginhard Sep 12, 2024
e5dd06b
Merge pull request #77 from shavit/71-torch-load
eginhard Sep 12, 2024
0a18418
build: allow numpy2, which should be supported in spacy 3.8 now (#81)
eginhard Sep 13, 2024
3e8125c
ci: switch to cibuildwheel
eginhard Sep 17, 2024
36611a7
feat: normalize unicode characters in text cleaners (#85)
shavit Oct 2, 2024
f75d095
fix(build): restrict spacy version to unbreak installation (#92)
KoljaB Oct 4, 2024
6c2e0be
chore: bump version to 0.24.2
eginhard Oct 4, 2024
073f8de
Merge pull request #95 from idiap/cibuildwheel
eginhard Oct 4, 2024
018d4ba
fix(xtts): support transformers>=4.43.0 in streaming inference
JohnnyStreet Oct 5, 2024
a510ec3
build(uv): add constraint on numba to avoid resolution error
eginhard Oct 20, 2024
ad435b5
build: again restrict to numpy<2
eginhard Oct 20, 2024
b66c782
Merge pull request #109 from idiap/transformers
eginhard Oct 21, 2024
964b813
fix(gpt): set attention mask and address other warnings
eginhard Oct 25, 2024
88de5c4
Merge pull request #114 from idiap/gpt-warnings
eginhard Oct 26, 2024
47ad0bf
fix(text.characters): add nasal diacritic (#127)
eginhard Nov 4, 2024
8e66be2
fix: only enable load with weights_only in pytorch>=2.4
eginhard Oct 25, 2024
ce5c492
ci: simplify ci by using uv where possible
eginhard Oct 20, 2024
f6a4d5e
chore: bump version to 0.24.3
eginhard Nov 4, 2024
6314032
Merge pull request #113 from idiap/pytorch
eginhard Nov 4, 2024
45b8b5b
build: set upper version limit for trainer (#130)
eginhard Nov 5, 2024
ef8158d
build: use group not extra for docs dependencies
eginhard Nov 6, 2024
020a724
ci(readthedocs): build docs with uv
eginhard Nov 6, 2024
59996ff
Merge pull request #133 from idiap/docs
eginhard Nov 6, 2024
0971bc2
refactor: use external package for monotonic alignment
eginhard Nov 6, 2024
9dd7ae6
build: switch to hatch
eginhard Nov 7, 2024
d30eba5
chore: remove obsolete code owners file
eginhard Nov 7, 2024
683ee66
ci: simplify release, cibuildwheel not needed anymore
eginhard Nov 7, 2024
e18f7da
Merge pull request #135 from idiap/mas
eginhard Nov 8, 2024
540e8d6
fix(bin.synthesize): return speakers names only (#147)
shavit Nov 9, 2024
2df9bfa
refactor: handle deprecation of torch.cuda.amp.autocast (#144)
eginhard Nov 9, 2024
21172ec
ci: update uv and move into composite action
eginhard Nov 10, 2024
993da77
chore: use original instead of scarf urls
eginhard Nov 10, 2024
5de47e9
ci: run integration tests only on lowest and highest python
eginhard Nov 10, 2024
d3c3ba3
build: set upper limit on transformers
eginhard Nov 10, 2024
b5bd995
Merge pull request #149 from idiap/cache-models
eginhard Nov 11, 2024
75d0825
fix(docker): add Support for building Docker on Mac/arm64 (#159)
hongkongkiwi Nov 14, 2024
e81f8d0
fix: more helpful error message when formatter is not found
eginhard Nov 16, 2024
627bbe4
fix(xtts): more helpful error message when vocab.json not found
eginhard Nov 16, 2024
48f5be2
feat(audio): automatically convert audio to mono
eginhard Nov 17, 2024
5784f67
refactor(audio): improve type hints, address lint issues
eginhard Nov 17, 2024
8ba3233
refactor(audio): remove duplicate save_wav code
eginhard Nov 17, 2024
fbbae5a
refactor(audio): remove duplicate rms_volume_norm function
eginhard Nov 17, 2024
312593e
Merge pull request #166 from idiap/error-messages
eginhard Nov 20, 2024
9035e36
ci: allow testing out trainer/coqpit branches before release (#168)
eginhard Nov 20, 2024
1b6d3eb
refactor(xtts): remove duplicate hifigan generator
eginhard Nov 20, 2024
1f27f99
refactor(utils): remove duplicate set_partial_state_dict
eginhard Nov 20, 2024
66701e1
refactor(xtts): reuse functions/classes from tortoise
eginhard Nov 21, 2024
4ba83f4
chore(tortoise): remove unused AudioMiniEncoder
eginhard Nov 21, 2024
705551c
refactor(tortoise): remove unused do_checkpoint arguments
eginhard Nov 21, 2024
5ffc054
refactor(bark): remove custom layer norm
eginhard Nov 21, 2024
490c973
refactor(xtts): use position embedding from tortoise
eginhard Nov 21, 2024
33ac0d6
refactor(xtts): use build_hf_gpt_transformer from tortoise
eginhard Nov 21, 2024
7cdfde2
refactor: move amp_to_db/db_to_amp into torch_transforms
eginhard Nov 22, 2024
6f25c2b
refactor(delightful_tts): remove unused classes
eginhard Nov 21, 2024
e63962c
refactor(losses): move shared losses into losses.py
eginhard Nov 21, 2024
2e5f68d
refactor(wavernn): remove duplicate Stretch2d
eginhard Nov 22, 2024
69a599d
refactor(freevc): remove duplicate code
eginhard Nov 22, 2024
6ecf473
refactor(xtts): use tortoise conditioning encoder
eginhard Nov 22, 2024
0f69d31
refactor(vocoder): remove duplicate function
eginhard Nov 22, 2024
fa844e0
refactor(tacotron): remove duplicate function
eginhard Nov 22, 2024
b45a7a4
refactor: move exists() and default() into generic_utils
eginhard Nov 22, 2024
54f4228
refactor(xtts): use existing cleaners
eginhard Nov 22, 2024
b1ac884
refactor: move shared function into dataset.py
eginhard Nov 22, 2024
2c82477
ci: merge integration tests back into unit tests
eginhard Nov 22, 2024
76df642
refactor: move more audio processing into torch_transforms
eginhard Nov 23, 2024
8bf288e
test: move test_helpers.py to fast unit tests
eginhard Nov 24, 2024
7330ad8
refactor: move duplicate alignment functions into helpers
eginhard Nov 24, 2024
170d3da
refactor: remove duplicate to_camel
eginhard Nov 24, 2024
63625e7
refactor: import get_last_checkpoint from trainer.io
eginhard Nov 27, 2024
98a372b
Merge pull request #172 from idiap/deduplicate
eginhard Dec 2, 2024
ce20253
fix(xtts): clearer error message when file given to checkpoint_dir
eginhard Dec 2, 2024
6de98ff
feat(openvoice): initial integration
ajk1402 Jun 13, 2024
4124b9d
feat(vits): add tau parameter to posterior encoder
eginhard Jun 25, 2024
b97d537
refactor(openvoice): remove duplicate and unused code
eginhard Jun 20, 2024
9599837
feat(openvoice): add config classes
eginhard Jun 20, 2024
ca02d03
feat(openvoice): add to .models.json
eginhard Nov 13, 2024
1a21853
ci: validate .models.json file
eginhard Nov 13, 2024
fce3137
feat: add openvoice vc model
eginhard Jun 25, 2024
d488441
test(freevc): remove unused code
eginhard Nov 13, 2024
6927e0b
fix(api): clearer error message when model doesn't support VC
eginhard Nov 29, 2024
546f43c
refactor: only use keyword args in Synthesizer
eginhard Nov 29, 2024
9ef2c7e
test(freevc): fix output length check
eginhard Dec 1, 2024
5f8ad4c
test(openvoice): add sanity check
eginhard Nov 29, 2024
32c99e8
docs(readme): mention openvoice vc
eginhard Jun 13, 2024
7d0416f
refactor(vc): rename TTS.vc.modules to TTS.vc.layers for consistency
eginhard Dec 1, 2024
3539e65
refactor(synthesizer): set sample rate in loading methods
eginhard Dec 2, 2024
834d41b
build: switch to forked coqpit
eginhard Oct 20, 2024
d4ffff4
chore: bump version to 0.25.0
eginhard Dec 3, 2024
9ae0b27
Merge pull request #183 from idiap/openvoice
eginhard Dec 3, 2024
48ad1da
Merge pull request #110 from idiap/uv
eginhard Dec 3, 2024
8241d55
fix(pypi-release): fix publishing workflow (#191)
eginhard Dec 4, 2024
fe14ca6
refactor(xtts): remove duplicate xtts audio config
eginhard Dec 5, 2024
8c381e3
docs: use .to("cuda") instead of deprecated gpu=True
eginhard Dec 3, 2024
5cfb4ec
refactor(api): require keyword arguments except for model_name
eginhard Dec 3, 2024
42ad9b0
feat(api): support specifying vocoders by name
eginhard Dec 4, 2024
5daed87
chore(bin.synthesize): remove unused argument
eginhard Dec 4, 2024
1a4e58d
feat(api): support passing a custom speaker encoder by path
eginhard Dec 4, 2024
e8d99aa
Merge pull request #184 from idiap/xtts-error
eginhard Dec 6, 2024
85dbb3b
feat(api): allow mixing TTS and vocoder model name and path
eginhard Dec 4, 2024
a05177c
chore(api): add type hints
eginhard Dec 4, 2024
89abd98
feat(api): support passing speaker/language id file paths
eginhard Dec 5, 2024
806af96
refactor(api): use save_wav() from Synthesizer instance
eginhard Dec 6, 2024
e0f6211
refactor(bin.synthesize): use Python API for CLI
eginhard Dec 2, 2024
b545ab8
Merge pull request #197 from idiap/api
eginhard Dec 6, 2024
c0d9ed3
fix: handle difference in xtts/tortoise attention (#199)
eginhard Dec 9, 2024
f329072
chore: bump version to 0.25.1 (#202)
eginhard Dec 9, 2024
236e490
build(docs): update dependencies, fix makefile
eginhard Dec 11, 2024
849e75e
docs: improve documentation
eginhard Dec 11, 2024
e23766d
docs: move project structure from readme into documentation
eginhard Dec 12, 2024
ae2f8d2
docs: use nested contents for easier overview
eginhard Dec 12, 2024
e38dcbe
docs: streamline readme and reuse content in other docs pages
eginhard Dec 12, 2024
cd52907
Merge pull request #207 from idiap/docs
eginhard Dec 12, 2024
a425ba5
feat: allow both Path and strings where possible and add type hints
eginhard Dec 13, 2024
0df04cc
docs: add notes about xtts fine-tuning
eginhard Dec 14, 2024
5165e71
Merge pull request #210 from idiap/manager
eginhard Dec 16, 2024
9d5fc60
feat(manager): print download location when listing models (#213)
eginhard Dec 16, 2024
1f9dda6
docs(xtts): show manual inference with default speakers
eginhard Dec 17, 2024
6a52c8a
fix(bin): log to stdout in cli tools, unless pipe_out is set
eginhard Dec 17, 2024
370fb1d
Merge pull request #217 from idiap/stdout
eginhard Dec 17, 2024
f89ce41
fix(xtts): voice_dir should remain None if not specified (#224)
eginhard Dec 19, 2024
98080e2
fix(xtts): use correct language code for Czech num2words call (#237)
SkaceKamen Dec 28, 2024
26128be
feat: add adjust_speech_rate function to modify speech speed with mor…
isikhi Dec 28, 2024
ed1563b
Merge branch 'dev' into fix-improvements/adjust-speech-rate-or-speed
isikhi Dec 29, 2024
7 changes: 5 additions & 2 deletions TTS/api.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
+ import logging
import tempfile
import warnings
from pathlib import Path
@@ -9,6 +10,8 @@
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

+ logger = logging.getLogger(__name__)


class TTS(nn.Module):
"""TODO: Add voice conversion and Capacitron support."""
@@ -59,7 +62,7 @@ def __init__(
gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
"""
super().__init__()
- self.manager = ModelManager(models_file=self.get_models_file_path(), progress_bar=progress_bar, verbose=False)
+ self.manager = ModelManager(models_file=self.get_models_file_path(), progress_bar=progress_bar)
self.config = load_config(config_path) if config_path else None
self.synthesizer = None
self.voice_converter = None
@@ -122,7 +125,7 @@ def get_models_file_path():

@staticmethod
def list_models():
- return ModelManager(models_file=TTS.get_models_file_path(), progress_bar=False, verbose=False).list_models()
+ return ModelManager(models_file=TTS.get_models_file_path(), progress_bar=False).list_models()

def download_model_by_name(self, model_name: str):
model_path, config_path, model_item = self.manager.download_model(model_name)
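
The change above drops the `verbose` argument in favour of a module-level logger. A minimal sketch of that pattern, using only the standard library, might look like the following. `ModelLister` and its methods are hypothetical stand-ins for illustration and are not part of the TTS API.

```python
import logging

# Module-level logger, mirroring the line added to TTS/api.py in this diff.
logger = logging.getLogger(__name__)


class ModelLister:
    """Hypothetical stand-in for a class that used to accept verbose=..."""

    def __init__(self, progress_bar=False):
        # No verbose flag: callers control output through logging levels.
        self.progress_bar = progress_bar

    def list_models(self, names):
        # Each message goes through the logger instead of print().
        for name in names:
            logger.info("Found model: %s", name)
        return list(names)
```

With this shape, a caller who wants less output raises the logger level (for example to `logging.WARNING`) rather than passing a flag through every constructor.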
4 changes: 4 additions & 0 deletions TTS/bin/compute_attention_masks.py
@@ -1,5 +1,6 @@
import argparse
import importlib
+ import logging
import os
from argparse import RawTextHelpFormatter

@@ -13,9 +14,12 @@
from TTS.tts.models import setup_model
from TTS.tts.utils.text.characters import make_symbols, phonemes, symbols
from TTS.utils.audio import AudioProcessor
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger
from TTS.utils.io import load_checkpoint

if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

# pylint: disable=bad-option-value
parser = argparse.ArgumentParser(
description="""Extract attention masks from trained Tacotron/Tacotron2 models.
4 changes: 4 additions & 0 deletions TTS/bin/compute_embeddings.py
@@ -1,4 +1,5 @@
import argparse
+ import logging
import os
from argparse import RawTextHelpFormatter

@@ -10,6 +11,7 @@
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.managers import save_file
from TTS.tts.utils.speakers import SpeakerManager
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


def compute_embeddings(
@@ -100,6 +102,8 @@ def compute_embeddings(


if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description="""Compute embedding vectors for each audio file in a dataset and store them keyed by `{dataset_name}#{file_path}` in a .pth file\n\n"""
"""
4 changes: 4 additions & 0 deletions TTS/bin/compute_statistics.py
@@ -3,6 +3,7 @@

import argparse
import glob
+ import logging
import os

import numpy as np
@@ -12,10 +13,13 @@
from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.utils.audio import AudioProcessor
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


def main():
"""Run preprocessing process."""
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(description="Compute mean and variance of spectrogtram features.")
parser.add_argument("config_path", type=str, help="TTS config file path to define audio processin parameters.")
parser.add_argument("out_path", type=str, help="save path (directory and filename).")
4 changes: 4 additions & 0 deletions TTS/bin/eval_encoder.py
@@ -1,4 +1,5 @@
import argparse
+ import logging
from argparse import RawTextHelpFormatter

import torch
@@ -7,6 +8,7 @@
from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.speakers import SpeakerManager
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


def compute_encoder_accuracy(dataset_items, encoder_manager):
@@ -51,6 +53,8 @@ def compute_encoder_accuracy(dataset_items, encoder_manager):


if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description="""Compute the accuracy of the encoder.\n\n"""
"""
9 changes: 6 additions & 3 deletions TTS/bin/extract_tts_spectrograms.py
@@ -2,6 +2,7 @@
"""Extract Mel spectrograms with teacher forcing."""

import argparse
+ import logging
import os

import numpy as np
@@ -17,11 +18,12 @@
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.utils.audio.numpy_transforms import quantize
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger

use_cuda = torch.cuda.is_available()


- def setup_loader(ap, r, verbose=False):
+ def setup_loader(ap, r):
tokenizer, _ = TTSTokenizer.init_from_config(c)
dataset = TTSDataset(
outputs_per_step=r,
@@ -37,7 +39,6 @@ def setup_loader(ap, r, verbose=False):
phoneme_cache_path=c.phoneme_cache_path,
precompute_num_workers=0,
use_noise_augment=False,
- verbose=verbose,
speaker_id_mapping=speaker_manager.name_to_id if c.use_speaker_embedding else None,
d_vector_mapping=speaker_manager.embeddings if c.use_d_vector_file else None,
)
@@ -257,7 +258,7 @@ def main(args): # pylint: disable=redefined-outer-name
print("\n > Model has {} parameters".format(num_params), flush=True)
# set r
r = 1 if c.model.lower() == "glow_tts" else model.decoder.r
- own_loader = setup_loader(ap, r, verbose=True)
+ own_loader = setup_loader(ap, r)

extract_spectrograms(
own_loader,
@@ -272,6 +273,8 @@ def main(args): # pylint: disable=redefined-outer-name


if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser()
parser.add_argument("--config_path", type=str, help="Path to config file for training.", required=True)
parser.add_argument("--checkpoint_path", type=str, help="Model file to be restored.", required=True)
4 changes: 4 additions & 0 deletions TTS/bin/find_unique_chars.py
@@ -1,13 +1,17 @@
"""Find all the unique characters in a dataset"""

import argparse
+ import logging
from argparse import RawTextHelpFormatter

from TTS.config import load_config
from TTS.tts.datasets import find_unique_chars, load_tts_samples
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


def main():
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

# pylint: disable=bad-option-value
parser = argparse.ArgumentParser(
description="""Find all the unique characters or phonemes in a dataset.\n\n"""
4 changes: 4 additions & 0 deletions TTS/bin/find_unique_phonemes.py
@@ -1,6 +1,7 @@
"""Find all the unique characters in a dataset"""

import argparse
+ import logging
import multiprocessing
from argparse import RawTextHelpFormatter

@@ -9,6 +10,7 @@
from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.text.phonemizers import Gruut
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


def compute_phonemes(item):
@@ -18,6 +20,8 @@ def compute_phonemes(item):


def main():
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

# pylint: disable=W0601
global c, phonemizer
# pylint: disable=bad-option-value
4 changes: 4 additions & 0 deletions TTS/bin/remove_silence_using_vad.py
@@ -1,12 +1,14 @@
import argparse
import glob
+ import logging
import multiprocessing
import os
import pathlib

import torch
from tqdm import tqdm

+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger
from TTS.utils.vad import get_vad_model_and_utils, remove_silence

torch.set_num_threads(1)
@@ -75,6 +77,8 @@ def preprocess_audios():


if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end True"
)
33 changes: 23 additions & 10 deletions TTS/bin/synthesize.py
@@ -3,12 +3,17 @@

import argparse
import contextlib
+ import logging
import sys
from argparse import RawTextHelpFormatter

# pylint: disable=redefined-outer-name, unused-argument
from pathlib import Path

+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger

+ logger = logging.getLogger(__name__)

description = """
Synthesize speech on command line.

@@ -142,6 +147,8 @@ def str2bool(v):


def main():
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description=description.replace(" ```\n", ""),
formatter_class=RawTextHelpFormatter,
@@ -435,31 +442,37 @@ def main():

# query speaker ids of a multi-speaker model.
if args.list_speaker_idxs:
- print(
- " > Available speaker ids: (Set --speaker_idx flag to one of these values to use the multi-speaker model."
+ if synthesizer.tts_model.speaker_manager is None:
+ logger.info("Model only has a single speaker.")
+ return
+ logger.info(
+ "Available speaker ids: (Set --speaker_idx flag to one of these values to use the multi-speaker model."
)
- print(synthesizer.tts_model.speaker_manager.name_to_id)
+ logger.info(synthesizer.tts_model.speaker_manager.name_to_id)
return

# query langauge ids of a multi-lingual model.
if args.list_language_idxs:
- print(
- " > Available language ids: (Set --language_idx flag to one of these values to use the multi-lingual model."
+ if synthesizer.tts_model.language_manager is None:
+ logger.info("Monolingual model.")
+ return
+ logger.info(
+ "Available language ids: (Set --language_idx flag to one of these values to use the multi-lingual model."
)
- print(synthesizer.tts_model.language_manager.name_to_id)
+ logger.info(synthesizer.tts_model.language_manager.name_to_id)
return

# check the arguments against a multi-speaker model.
if synthesizer.tts_speakers_file and (not args.speaker_idx and not args.speaker_wav):
- print(
- " [!] Looks like you use a multi-speaker model. Define `--speaker_idx` to "
+ logger.error(
+ "Looks like you use a multi-speaker model. Define `--speaker_idx` to "
"select the target speaker. You can list the available speakers for this model by `--list_speaker_idxs`."
)
return

# RUN THE SYNTHESIS
if args.text:
- print(" > Text: {}".format(args.text))
+ logger.info("Text: %s", args.text)

# kick it
if tts_path is not None:
@@ -484,8 +497,8 @@ def main():
)

# save the results
- print(" > Saving output to {}".format(args.out_path))
synthesizer.save_wav(wav, args.out_path, pipe_out=pipe_out)
+ logger.info("Saved output to %s", args.out_path)


if __name__ == "__main__":
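
The `--list_speaker_idxs` change in this file combines two things: a guard for models without a speaker manager, and `%s`-style arguments passed to the logger instead of a pre-formatted string. A rough standalone sketch of that flow is shown below. The function name `list_speaker_ids`, the logger name, and the plain `dict` argument are assumptions made for illustration; the real code reads `synthesizer.tts_model.speaker_manager`.

```python
import logging
from typing import Optional

logger = logging.getLogger("synthesize_sketch")  # hypothetical logger name


def list_speaker_ids(name_to_id: Optional[dict]) -> bool:
    """Log the speaker ids; return False when there is nothing to list."""
    if name_to_id is None:
        # Single-speaker model: report it and stop instead of failing on None.
        logger.info("Model only has a single speaker.")
        return False
    # The argument is only formatted if a handler actually emits the record.
    logger.info("Available speaker ids: %s", name_to_id)
    return True
```

Passing `name_to_id` as a logging argument rather than calling `.format()` up front means the string is never built when the INFO level is disabled.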
11 changes: 7 additions & 4 deletions TTS/bin/train_encoder.py
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

+ import logging
import os
import sys
import time
@@ -19,6 +20,7 @@
from TTS.encoder.utils.visual import plot_embeddings
from TTS.tts.datasets import load_tts_samples
from TTS.utils.audio import AudioProcessor
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger
from TTS.utils.samplers import PerfectBatchSampler
from TTS.utils.training import check_update

@@ -31,7 +33,7 @@
print(" > Number of GPUs: ", num_gpus)


- def setup_loader(ap: AudioProcessor, is_val: bool = False, verbose: bool = False):
+ def setup_loader(ap: AudioProcessor, is_val: bool = False):
num_utter_per_class = c.num_utter_per_class if not is_val else c.eval_num_utter_per_class
num_classes_in_batch = c.num_classes_in_batch if not is_val else c.eval_num_classes_in_batch

@@ -42,7 +44,6 @@ def setup_loader(ap: AudioProcessor, is_val: bool = False, verbose: bool = False
voice_len=c.voice_len,
num_utter_per_class=num_utter_per_class,
num_classes_in_batch=num_classes_in_batch,
- verbose=verbose,
augmentation_config=c.audio_augmentation if not is_val else None,
use_torch_spec=c.model_params.get("use_torch_spec", False),
)
@@ -278,9 +279,9 @@ def main(args): # pylint: disable=redefined-outer-name
# pylint: disable=redefined-outer-name
meta_data_train, meta_data_eval = load_tts_samples(c.datasets, eval_split=True)

- train_data_loader, train_classes, map_classid_to_classname = setup_loader(ap, is_val=False, verbose=True)
+ train_data_loader, train_classes, map_classid_to_classname = setup_loader(ap, is_val=False)
if c.run_eval:
- eval_data_loader, _, _ = setup_loader(ap, is_val=True, verbose=True)
+ eval_data_loader, _, _ = setup_loader(ap, is_val=True)
else:
eval_data_loader = None

@@ -316,6 +317,8 @@ def main(args): # pylint: disable=redefined-outer-name


if __name__ == "__main__":
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

args, c, OUT_PATH, AUDIO_PATH, c_logger, dashboard_logger = init_training()

try:
4 changes: 4 additions & 0 deletions TTS/bin/train_tts.py
@@ -1,3 +1,4 @@
+ import logging
import os
from dataclasses import dataclass, field

@@ -6,6 +7,7 @@
from TTS.config import load_config, register_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models import setup_model
+ from TTS.utils.generic_utils import ConsoleFormatter, setup_logger


@dataclass
@@ -15,6 +17,8 @@ class TrainTTSArgs(TrainerArgs):

def main():
"""Run `tts` model training directly by a `config.json` file."""
+ setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

# init trainer args
train_args = TrainTTSArgs()
parser = train_args.init_argparse(arg_prefix="")
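
Every executable in this diff configures logging once at its entry point by calling `setup_logger("TTS", ...)`. The internals of `setup_logger` and `ConsoleFormatter` are not shown here, so the sketch below only approximates the idea with the standard library. `configure_console_logging` and the format string are assumptions, not the project's implementation.

```python
import logging
import sys


def configure_console_logging(name: str, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: attach a single console handler to a named logger."""
    log = logging.getLogger(name)
    log.setLevel(level)
    if not log.handlers:  # avoid duplicate output when called more than once
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        log.addHandler(handler)
    return log


def main() -> None:
    # Configure once at the entry point, before any other work.
    configure_console_logging("TTS")
    # Loggers named "TTS.<something>" inherit the level and handler from "TTS".
    logging.getLogger("TTS.example").info("Logging is configured.")


if __name__ == "__main__":
    main()
```

Configuring only the top-level `"TTS"` logger is enough because the standard logging hierarchy propagates records from child loggers such as `"TTS.example"` to their parent's handlers.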