This is a Python re-implementation of the spectral clustering algorithm in the paper Speaker Diarization with LSTM.
This is not the original implementation used by the paper.
Specifically, this implementation uses the K-Means from scikit-learn, which does NOT support custom distance measures such as cosine distance.
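As a general observation about this limitation (not something this package is stated to do), Euclidean distance between L2-normalized vectors is a monotonic function of their cosine distance, so Euclidean K-Means on unit-length rows is a commonly used stand-in for cosine-based clustering. The sketch below checks that identity with numpy only; the helper name `l2_normalize` and the random data are hypothetical and exist only for illustration.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row of X to unit Euclidean length (hypothetical helper)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return X / np.maximum(norms, 1e-12)


# Random stand-in data; real inputs would be embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
U = l2_normalize(X)

# Pairwise cosine distance between the rows: 1 - cos(u, v).
cosine_distance = 1.0 - U @ U.T

# Pairwise squared Euclidean distance between the normalized rows.
diff = U[:, None, :] - U[None, :, :]
squared_euclidean = np.sum(diff ** 2, axis=-1)

# For unit vectors: ||u - v||^2 = 2 * (1 - cos(u, v)).
assert np.allclose(squared_euclidean, 2.0 * cosine_distance)
```

Because the two distances rank pairs identically after normalization, nearest-centroid assignments under one largely agree with the other, although the centroids K-Means computes are not themselves renormalized to unit length.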
- numpy
- scipy
- scikit-learn
Install the package by:
pip3 install spectralcluster
or
python3 -m pip install spectralcluster
Simply use the predict() method of class SpectralClusterer to perform spectral clustering:
from spectralcluster import SpectralClusterer
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
    p_percentile=0.95,
    gaussian_blur_sigma=1)
labels = clusterer.predict(X)
The input X is a numpy array of shape (n_samples, n_features), and the returned labels is a numpy array of shape (n_samples,).
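To make the expected shapes concrete, here is a minimal sketch that builds a synthetic input with numpy only. The data, the cluster centers, and the names `centers` and `true_labels` are hypothetical; real inputs would typically be embeddings, and this snippet does not call the clusterer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_cluster, n_features = 50, 16

# Two hypothetical, well-separated groups of points around opposite centers.
centers = np.stack([np.ones(n_features), -np.ones(n_features)])
X = np.concatenate([
    center + 0.1 * rng.normal(size=(n_per_cluster, n_features))
    for center in centers
])

# Reference labels with one entry per row of X, in the same layout
# that the returned labels are described as having.
true_labels = np.repeat(np.arange(len(centers)), n_per_cluster)

print(X.shape)            # (100, 16), i.e. (n_samples, n_features)
print(true_labels.shape)  # (100,), i.e. (n_samples,)
```

An array built this way can be passed as X to predict(). Cluster indices are arbitrary, so predicted labels should be compared with reference labels only up to a permutation.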
For the complete list of parameters of the clusterer, see spectralcluster/spectral_clusterer.py.
Our paper is cited as:
@inproceedings{wang2018speaker,
title={Speaker Diarization with {LSTM}},
author={Wang, Quan and Downey, Carlton and Wan, Li and Mansfield, Philip Andrew and Moreno, Ignacio Lopez},
booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5239--5243},
year={2018},
organization={IEEE}
}
Our new speaker diarization systems are now fully supervised, powered by uis-rnn. Check this Google AI Blog.
To learn more about speaker diarization, here is a curated list of resources: awesome-diarization.