Official implementation of TalkNCE. [Paper]
This repository contains a LoCoNet model trained with the TalkNCE loss. The TalkNCE loss is implemented in the Loconet() class of loconet.py.
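For orientation, the sketch below shows an InfoNCE-style formulation in the spirit of TalkNCE: temporally aligned audio-visual embedding pairs on actively-speaking frames serve as positives, and the other speaking frames of the same track serve as negatives. This is a minimal illustration under those assumptions, not the repository's code; the function name, tensor shapes, and temperature value are hypothetical, and the official implementation is the one in the Loconet() class of loconet.py.

```python
# Minimal sketch of a TalkNCE-style contrastive loss (illustrative only;
# refer to the Loconet() class in loconet.py for the official version).
import torch
import torch.nn.functional as F

def talknce_style_loss(audio_emb, visual_emb, speaking_mask, temperature=0.07):
    """audio_emb, visual_emb: (T, D) frame-level embeddings of one face track.
    speaking_mask: (T,) bool tensor, True on actively-speaking frames."""
    # Keep only the speaking frames; aligned audio-visual pairs are the positives.
    a = F.normalize(audio_emb[speaking_mask], dim=-1)
    v = F.normalize(visual_emb[speaking_mask], dim=-1)
    if a.shape[0] < 2:                      # need at least one negative pair
        return audio_emb.new_zeros(())
    logits = a @ v.t() / temperature        # (N, N) cosine-similarity matrix
    targets = torch.arange(a.shape[0], device=a.device)
    # Symmetric cross-entropy over the audio-to-visual and visual-to-audio views.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```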
A pretrained checkpoint can be downloaded here. It is a LoCoNet model trained with the TalkNCE loss on the AVA training set and achieves 95.5% mAP on the AVA validation set.
Start by building the environment:
conda create -n {env_name} python=3.7.9
conda activate {env_name}
pip install -r requirements.txt
We follow TalkNet's data preparation script to download and prepare the AVA dataset.
python train.py --dataPathAVA {AVADataPath} --download
AVADataPath is the folder where the AVA dataset and its preprocessing outputs will be saved; the details can be found here. Please read them carefully.
After the AVA dataset is downloaded, please update the DATA.dataPathAVA entry in the config file (configs/multi.yaml).
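For reference, the entry might look like the sketch below; the nesting is inferred from the DATA.dataPathAVA key name, and the surrounding keys in configs/multi.yaml may differ.

```yaml
DATA:
  dataPathAVA: /path/to/AVA   # the same folder passed as {AVADataPath} above
```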
If you have already downloaded the AVA-ActiveSpeaker dataset, you can fetch only the CSV files by running the following commands:
gdown 1h8DISV9sYHGi2CsDI7PXK4kXmS2_kjEg
unzip talknce_csv.zip
rm talknce_csv.zip
python -W ignore::UserWarning train.py --cfg configs/multi.yaml OUTPUT_DIR {output directory}
python -W ignore::UserWarning test_multicard.py --cfg configs/test.yaml RESUME_PATH {pretrained ckpt path}
If you find this code useful, please consider citing the paper below:
@inproceedings{jung2024talknce,
  title={TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning},
  author={Jung, Chaeyoung and Lee, Suyeon and Nam, Kihyun and Rho, Kyeongha and Kim, You Jin and Jang, Youngjoon and Chung, Joon Son},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2024}
}
Our baseline code is based on the following repositories: