Introduction

OpenModal is a foundational library for training multi-modal models, based on PyTorch. Our goal is to provide a centralized platform for multi-modal tasks and models, and to better standardize and fuse models across different modalities. The highlights are:

  • Modular Design: The library is designed in a modular way, allowing you to easily add and customize components simply by registering them in the module registry.
  • Configuration: The library supports configuration files in YAML and JSON formats, so you can easily assemble different models and manage hyperparameters during your experiments.
  • Flexibility: Because multi-modal tasks vary widely, the library provides a flexible way to define the components and the flow of a project.
  • Standardized: We recommend a standardized way to define formal object-oriented model inputs and outputs, which helps others easily understand and use the model.

Supported Models:

  • MeloTTS: TTS model (config/tts.yaml).
  • OpenVoice: instant voice cloning by MIT and MyShell (config/tts_voice_converter.yaml).
  • GPT-SoVITS: a powerful few-shot text-to-speech tool with WebUI (config/voice_clone_GPT_SoVITS.yaml).

Supported Customizable Modules:

Flow > Model > Component > Block

Each level can be customized by registering a new class in the corresponding registry (a registration sketch follows the list below).

  • flow: Here you can define the flow of the project, including data loading, preprocessing, augmentation, inference, etc.
  • model: Here you can define the model architecture, e.g. how many encoders or decoders it has.
  • component: Here you can define the model components, e.g. the details of the encoders or decoders.
  • block: Here you can define the building blocks of the components, e.g. customized layers, residual blocks, etc.
  • metric: Here you can define the evaluation metrics.
  • dataset: Here you can define the dataset loaders.
  • process: Here you can define additional processes, e.g. screening, augmentation, etc.
  • view_object: Here you can define the formal object-oriented models for inputs and outputs, e.g. enums.
  • visualization: Here you can define the visualization functions.
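
As a rough illustration of how registration might look, here is a minimal sketch assuming a decorator-based registry in the style of mmengine (the registry name BLOCKS and the import path openmodal.registry are assumptions, not confirmed API; check the actual registries in the repository):

import torch.nn as nn
from openmodal.registry import BLOCKS  # assumed location and name of the block registry

@BLOCKS.register_module()
class MyResidualBlock(nn.Module):
    # A custom block that could then be referenced from a config via type: "MyResidualBlock".
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv(x)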

Run

For model users, you only need to look through the configs and our registries (which list all the modules supported on the platform), then customize your own config. You can then run your config with:

cd openmodal   # the package will be installable in the future
python openmodal/run.py --config config/xxx.yaml

Configs

The configuration files are stored in the config directory, and both YAML and JSON formats are supported.

Type

In a config, you can define both variables and modules. When defining a module, use the type key to specify the module type, which must be registered in the module registry. For example, you can define a ResNet model as follows:

num_classes: 10
model:
  type: "ResNet"
  num_classes: "{{num_classes}}"
  depth: 18

Variables

When using variables, use the {{variable_name}} format and make sure the variable is defined in the config before it is used. To control the loading order of config items, use the order key; every item has a default order of 0. For example:

flow:
  type: "BaseFlow"
  model: "{{model}}"
model:
  order: -1
  type: "ResNet"
  num_classes: "{{num_classes}}"
  depth: 18

Basic Config

The configuration usage is similar to, and partially copied from, mmengine. You can structure your configurations by providing _base_ to inherit from a base configuration, and _delete_: True to overwrite the base configuration. For example:

_base_: "base.yaml"
model:
  _delete_: True
  order: -1
  type: "ResNet"
  num_classes: "{{num_classes}}"
  depth: 18

OpenModal first looks for the base config in the same folder as the current config, then resolves it relative to the project root, and finally treats the _base_ value as a full path.
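
For instance, under the lookup order above, a config stored in a subfolder could reference a shared base by a path from the project root (a hypothetical layout; the file names below are illustrative only):

# config/experiments/my_exp.yaml (hypothetical)
_base_: "config/base.yaml"   # no sibling base.yaml exists, so this resolves from the project root
model:
  depth: 50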

External Objects

When you need to import external or non-registered libraries, use the {library} format, e.g.:

model:
  type: "{torchvision.models.resnet}"
  num_classes: "{openmodal.view_object.ResNetEnum.num_classes}"
  depth: 18

Flow

A flow is required for every project; it defines the project's pipeline, including data loading, preprocessing, augmentation, inference, etc. A correct configuration should define the necessary modules first, then include a flow item to control the data flow between the modules. For example:

depth: 18
num_classes: 10
model:
  type: "ResNet"
  num_classes: "{{num_classes}}"
  depth: "{{depth}}"
image_loader:
  type: "ImageLoader"
  root: "data\image"
flow:
  type: "BaseFlow"
  model: "{{model}}"
  audio_loader:
    type: "AudioLoader"
    root: "data\audio"
  image_loader: "{{image_loader}}"

Guide for Specific Models

GPT-SoVITS: A Powerful Few-shot Text-to-Speech Tool with WebUI (config/voice_clone_GPT_SoVITS.yaml)

The symbol set in GPT-SoVITS differs from ours and from other common voice projects, so you need to export its symbols together with the checkpoint by adding the code below, or you can download our provided checkpoint from OneDrive.

import torch

# `sovits_path` points to the original GPT-SoVITS checkpoint; `symbols` is the
# symbol list defined in the GPT-SoVITS codebase.
dict_s2 = torch.load(sovits_path, map_location="cpu")
hps = dict_s2["config"]
hps["symbols"] = symbols
# Save a new checkpoint that carries the symbol table alongside the weights.
sovits_path_temp = sovits_path.replace(".pth", "_symbols.pth")
torch.save(dict_s2, sovits_path_temp)

The original pretrained weights of GPT-SoVITS are provided by the original authors on Hugging Face.

You can then clone a voice with GPT-SoVITS by running:

python openmodal/run.py --config config/voice_clone_GPT_SoVITS.yaml
