[recipe] fix errors in voxceleb/v1/Whisper-PMFA #357

Merged 2 commits on Aug 31, 2024
1 change: 1 addition & 0 deletions README.md
@@ -60,6 +60,7 @@ pre-commit install # for clean and tidy code
```

## 🔥 News
+* 2024.08.30: We support whisper_encoder based frontend and propose the [Whisper-PMFA](https://arxiv.org/pdf/2408.15585) framework, check [#356](https://github.com/wenet-e2e/wespeaker/pull/356).
* 2024.08.20: Update diarization recipe for VoxConverse dataset by leveraging umap dimensionality reduction and hdbscan clustering, see [#347](https://github.com/wenet-e2e/wespeaker/pull/347) and [#352](https://github.com/wenet-e2e/wespeaker/pull/352).
* 2024.08.18: Support using ssl pre-trained models as the frontend. The [WavLM recipe](https://github.com/wenet-e2e/wespeaker/blob/master/examples/voxceleb/v2/run_wavlm.sh) is also provided, see [#344](https://github.com/wenet-e2e/wespeaker/pull/344).
* 2024.05.15: Add support for [quality-aware score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320).
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/README.md
@@ -20,5 +20,5 @@
| | √ | 6.63M | 1.88 |
| Whisper-PMFA | × | 478.7M | 1.62 |
| | √ | 478.7M | **1.42** |
-| Whisper-PMFA with LoRa (Coming soon) | √ | 10.9M | 1.62 |
+| Whisper-PMFA with LoRA (Coming soon) | √ | 10.9M | 1.62 |

@@ -1,8 +1,8 @@
### train configuraton

-exp_dir: exp/test
+exp_dir: exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s
gpus: "[0,1]"
-num_avg: 10
+num_avg: 1
enable_amp: False # whether enable automatic mixed precision training

seed: 42
@@ -57,7 +57,7 @@ margin_update:
initial_margin: 0.2
final_margin: 0.2
increase_start_epoch: 0
-fix_start_epoch: 30
+fix_start_epoch: 4
update_margin: True
increase_type: "exp" # exp, linear

@@ -1,8 +1,8 @@
### train configuraton

-exp_dir: exp/test
+exp_dir: exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s
gpus: "[0,1]"
-num_avg: 10
+num_avg: 1
enable_amp: False # whether enable automatic mixed precision training

seed: 42
@@ -56,7 +56,7 @@ margin_update:
initial_margin: 0.2
final_margin: 0.2
increase_start_epoch: 0
-fix_start_epoch: 30
+fix_start_epoch: 8
update_margin: True
increase_type: "exp" # exp, linear
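
For readers unfamiliar with the `margin_update` block, the sketch below shows one plausible way such a schedule could be evaluated per epoch. It is an illustration only, not WeSpeaker's margin scheduler: the function name `margin_at_epoch` and the exact shape of the `"exp"` curve are assumptions. The parameter names mirror the config keys shown in the diff.

```python
import math


def margin_at_epoch(epoch, initial_margin=0.2, final_margin=0.2,
                    increase_start_epoch=0, fix_start_epoch=4,
                    increase_type="exp"):
    """Hypothetical margin schedule; NOT WeSpeaker's actual scheduler."""
    if epoch < increase_start_epoch:
        return initial_margin
    if epoch >= fix_start_epoch:
        return final_margin
    # Fraction of the ramp completed so far, in [0, 1).
    span = fix_start_epoch - increase_start_epoch
    progress = (epoch - increase_start_epoch) / span
    if increase_type == "exp":
        # Assumed exponential shape: fast early growth that saturates at 1.
        ratio = (1.0 - math.exp(-5.0 * progress)) / (1.0 - math.exp(-5.0))
    else:
        # "linear": the margin grows proportionally to progress.
        ratio = progress
    return initial_margin + (final_margin - initial_margin) * ratio


if __name__ == "__main__":
    for epoch in range(0, 10, 2):
        print(epoch, margin_at_epoch(epoch, fix_start_epoch=8))
```

Under this sketch, equal `initial_margin` and `final_margin` (both 0.2 in these configs) give a constant margin, so `fix_start_epoch` only moves the point at which the ramp is considered finished.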

2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/local/score.sh
@@ -44,7 +44,7 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
scores_dir=${exp_dir}/scores
for x in $trials; do
python wespeaker/bin/compute_metrics.py \
---p_target 0.01 \
+--p_target 0.05 \
--c_fa 1 \
--c_miss 1 \
${scores_dir}/${x}.score \
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/local/score_norm.sh
@@ -57,7 +57,7 @@ if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
for x in ${trials}; do
scores_dir=${exp_dir}/scores
python wespeaker/bin/compute_metrics.py \
---p_target 0.01 \
+--p_target 0.05 \
--c_fa 1 \
--c_miss 1 \
${scores_dir}/${output_name}_${x}.score \
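
To make the effect of `--p_target`, `--c_fa` and `--c_miss` concrete, here is a minimal sketch of a normalized minimum detection cost (minDCF) computation. It is a generic textbook-style version and is not taken from `wespeaker/bin/compute_metrics.py`; the function name `min_dcf` and its signature are assumptions made for illustration.

```python
def min_dcf(target_scores, nontarget_scores,
            p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all accept thresholds."""
    # Label every score, then sort descending so that walking the list
    # corresponds to lowering the accept threshold step by step.
    labeled = [(s, 1) for s in target_scores]
    labeled += [(s, 0) for s in nontarget_scores]
    labeled.sort(key=lambda pair: pair[0], reverse=True)
    n_tar, n_non = len(target_scores), len(nontarget_scores)

    # Threshold above every score: all trials rejected (p_miss=1, p_fa=0).
    best = c_miss * p_target
    accepted_tar = accepted_non = 0
    i = 0
    while i < len(labeled):
        # Accept all trials tied at the current score together.
        score = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == score:
            if labeled[i][1]:
                accepted_tar += 1
            else:
                accepted_non += 1
            i += 1
        p_miss = 1.0 - accepted_tar / n_tar
        p_fa = accepted_non / n_non
        cost = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, cost)

    # Normalize by the cost of the better trivial system
    # (always reject or always accept).
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    print(min_dcf([0.9, 0.4], [0.5, 0.1], p_target=0.05))
```

A larger `p_target` weights misses more heavily relative to false alarms, so minDCF values computed at 0.01 and at 0.05 are not directly comparable.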
17 changes: 9 additions & 8 deletions examples/voxceleb/v1/Whisper-PMFA/run.sh
@@ -1,22 +1,20 @@
#!/bin/bash

-# Copyright 2022 Hongji Wang ([email protected])
-# 2022 Chengdong Liang ([email protected])
-# 2022 Zhengyang Chen ([email protected])
+# Copyright 2024 Yiyang Zhao ([email protected])
+# 2024 Hongji Wang ([email protected])

. ./path.sh || exit 1

-stage=3
-stop_stage=3
+stage=-1
+stop_stage=-1

data=data
data_type="raw" # shard/raw
model=whisper_PMFA_large_v2

exp_dir=exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s

-gpus="[0]"
-num_avg=10
+gpus="[0,1]"
+num_avg=1
checkpoint=

trials="vox1_O_cleaned.kaldi"
@@ -25,6 +23,9 @@ score_norm_method="asnorm" # asnorm/snorm
top_n=300

. tools/parse_options.sh || exit 1
+if ! pip show openai-whisper > /dev/null 2>&1; then
+pip install openai-whisper==20231117
+fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
echo "Preparing datasets ..."
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/tools
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/wespeaker
2 changes: 0 additions & 2 deletions wespeaker/bin/train.py
@@ -107,7 +107,6 @@ def train(config='conf/config.yaml', **kwargs):

# model: frontend (optional) => speaker model => projection layer
logger.info("<== Model ==>")
-# frontend: fbank or s3prl
frontend_type = configs['dataset_args'].get('frontend', 'fbank')
if frontend_type != "fbank":
frontend_args = frontend_type + "_args"
@@ -119,7 +118,6 @@ def train(config='conf/config.yaml', **kwargs):
model.add_module("frontend", frontend)
else:
model = get_speaker_model(configs['model'])(**configs['model_args'])

if rank == 0:
num_params = sum(param.numel() for param in model.parameters())
logger.info('speaker_model size: {}'.format(num_params))
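
The frontend lookup in this hunk follows a simple naming convention: the options key is the frontend name plus `_args`. The sketch below isolates that convention in plain Python. The helper name `resolve_frontend`, the assumption that the options live inside `dataset_args`, and the `frozen` option in the usage example are all hypothetical and are not confirmed by the diff.

```python
def resolve_frontend(configs):
    """Return (frontend_type, frontend_options) from a config dict.

    Hypothetical helper for illustration; not part of wespeaker.
    """
    dataset_args = configs["dataset_args"]
    # Same default as in the hunk: fall back to "fbank" when unset.
    frontend_type = dataset_args.get("frontend", "fbank")
    if frontend_type == "fbank":
        # fbank features need no separate frontend module.
        return frontend_type, None
    # Naming convention from the hunk: "<frontend>_args".
    args_key = frontend_type + "_args"
    # Assumption: the options sit next to "frontend" in dataset_args.
    return frontend_type, dataset_args.get(args_key, {})


if __name__ == "__main__":
    example = {
        "dataset_args": {
            "frontend": "whisper_encoder",
            # "frozen" is a made-up option name used only for this demo.
            "whisper_encoder_args": {"frozen": True},
        }
    }
    print(resolve_frontend(example))
```

Because the key is derived from the frontend name, a new frontend type does not require a new branch in this lookup, which may be why the removed comment listing only `fbank or s3prl` had become stale.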