Slow performance on a relatively small dataset (numpy arrays) #12
I'd love it if we could make this happen! Could you share what reverb settings you're using? sox is pretty fast for me, so I have a hard time finding out why it's so terribly slow in your pipeline.

```python
import multiprocessing

from librosa.util import example_audio_file
from pysndfx.dsp import AudioEffectsChain

apply_audio_effects = AudioEffectsChain()\
    .highshelf()\
    .reverb()\
    .phaser()\
    .delay()\
    .lowshelf()
```

```python
%%timeit
with multiprocessing.Pool() as pool:
    pool.map(apply_audio_effects, [example_audio_file()] * 300)
```

```
21.3 s ± 266 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

If I got this right, this is roughly 5 hours of 44.1 kHz / 16-bit stereo audio processed in roughly 20 seconds.
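As a sanity check on that arithmetic (assuming librosa's bundled example clip is about 61 seconds long, which depends on the librosa version in use):

```python
# Rough sanity check of the throughput claim above.
# Assumption: librosa's example clip is ~61 seconds long.
n_clips = 300
clip_seconds = 61
total_hours = n_clips * clip_seconds / 3600   # audio processed, in hours
wall_seconds = 21.3                           # measured %%timeit wall time
speedup = n_clips * clip_seconds / wall_seconds
print(round(total_hours, 2), round(speedup))  # ~5.08 hours, ~859x realtime
```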
I am also having problems with speed. I am training a speech recognition model (using pairs of [sound, text]). Since your code works well in Python (and particularly on numpy arrays), I started using it for data augmentation. So, every time a sound is about to be fed to my network, I pass it through a function like this one:

```python
import random

from pysndfx.dsp import AudioEffectsChain

def perform_aug(tsound, sr):
    aug_fx = AudioEffectsChain()
    if random.random() < 0.5:
        aug_fx.speed(random.uniform(0.9, 1.1))
    if random.random() < 0.9:
        aug_fx.tempo(random.uniform(0.8, 1.2))
    if random.random() < 0.9:
        aug_fx.pitch(random.uniform(-200, 200))
    if random.random() < 0.2:
        aug_fx.highshelf()
    if random.random() < 0.2:
        aug_fx.lowshelf()
    if random.random() < 0.2:
        aug_fx.highpass(random.uniform(200, 400))
    if random.random() < 0.2:
        aug_fx.lowpass(random.uniform(200, 400))
    if random.random() < 0.5:
        aug_fx.reverb(random.uniform(10, 50))
    out = aug_fx(tsound)
    return out, aug_fx
```

This works pretty well, except that the training runs about 10x slower. How do you think I could make it faster?
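One way to hide most of that slowdown is to overlap augmentation with training: augment batch k+1 in a worker while the network trains on batch k. A minimal sketch of the pattern (the `augment` function here is a cheap stand-in for `perform_aug` above, and the batch generator is purely illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def augment(batch):
    # Cheap stand-in for the sox-backed perform_aug above. Threads can be
    # enough in the real case too, since sox runs as an external process
    # and the Python interpreter mostly just waits on it.
    return [0.9 * clip for clip in batch]

def batches(n_batches, batch_size=4, n_samples=8000):
    # Illustrative batch source; in practice this would load audio files.
    for _ in range(n_batches):
        yield [np.random.randn(n_samples).astype(np.float32)
               for _ in range(batch_size)]

trained_clips = 0
with ThreadPoolExecutor(max_workers=1) as pool:
    it = batches(5)
    pending = pool.submit(augment, next(it))   # augment batch 0 up front
    for nxt in it:
        upcoming = pool.submit(augment, nxt)   # start batch k+1 early
        batch = pending.result()               # wait for batch k
        trained_clips += len(batch)            # "train" on batch k
        pending = upcoming
    trained_clips += len(pending.result())     # final batch
```

Raising `max_workers` lets several batches be augmented concurrently, which is essentially what the `num_parallel_calls` plus `prefetch` combination in a `tf.data` pipeline does.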
The key to getting good performance is to augment the next batch while training on the current batch. You could do this, for example, with a `tf.data` input pipeline:

```python
from pysndfx import AudioEffectsChain
import tensorflow as tf

def load(x):
    f = lambda x, i: AudioEffectsChain().pitch(i[0])(x.decode())
    i = tf.random_uniform([1], -50.0, 50.0)
    x = tf.py_func(f, [x, i], tf.float32)
    return x

dataset = (tf.data.Dataset
           .list_files('*.mp3')
           .map(load, num_parallel_calls=batch_size)
           .padded_batch(batch_size, [None, None])
           .prefetch(1))
```

Without knowing more about your particular program it's hard to give good advice, though!
I am actually using DeepSpeech from Mozilla (https://github.com/mozilla/DeepSpeech), which is implemented on top of TensorFlow. Right now, I am just performing the data augmentation right before DeepSpeech extracts the features from the audio; to be specific, on this line (https://github.com/mozilla/DeepSpeech/blob/master/util/audio.py#L67). Considering how it is implemented (loading the audio when constructing the batch), I don't know whether I'll be able to apply your suggestion. PS: I do believe DeepSpeech uses parallelized units to overlap loading with training.
Yes, it looks like they're doing asynchronous prefetching up to the next minibatch. Have you tried raising this?
I’ve tried that. I do not believe the augmentations themselves are so computationally expensive; I figured they would be cheaper than reading the audio file from disk. Maybe the random number generator, or the way SoX works, is not optimal; I don't know. But thanks for the help: even though it's a little slower, the augmentations improved our model's accuracy 😄
When trying to apply 10 reverb presets to 30 two-second waveform samples (represented as numpy arrays), the 300 sox calls take an unreasonable amount of time (close to an hour).

Is there any way to make batch effect application faster, other than by calling `AudioEffectsChain()` on each numpy array separately? Ideally, it would be great if this could scale to significantly more data than this.
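For the 10-presets-by-30-clips case specifically, a worker pool amortizes the per-call cost across cores, as in the `multiprocessing` benchmark earlier in the thread. A sketch with a placeholder effect function standing in for pysndfx (the names and the dummy effect are illustrative; the real chain shells out to sox once per call):

```python
import itertools
from multiprocessing.pool import ThreadPool

import numpy as np

def apply_preset(args):
    clip, reverberance = args
    # Placeholder for something like AudioEffectsChain().reverb(r)(clip).
    # Threads can parallelize this because the real work would happen in
    # an external sox process, not in the Python interpreter.
    return clip * (reverberance / 100.0)

clips = [np.random.randn(2 * 44100).astype(np.float32) for _ in range(30)]
presets = range(10, 101, 10)                    # 10 reverb settings
jobs = list(itertools.product(clips, presets))  # 300 (clip, preset) pairs
with ThreadPool() as pool:
    processed = pool.map(apply_preset, jobs)
```

Each of the 300 sox calls is independent, so they scale close to linearly with the number of workers.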