This repository has been archived by the owner on Feb 17, 2022. It is now read-only.

Slow performance on a relatively small dataset (numpy arrays) #12

Open
straygar opened this issue Mar 20, 2018 · 6 comments


@straygar

When trying to apply 10 reverb presets to 30 2-second waveform samples (represented as numpy arrays), the 300 sox calls take an unreasonable amount of time (close to an hour).

Is there any way to make batch effect application faster, other than by calling AudioEffectsChain() on each numpy array separately? Ideally, it would be great if this could scale to significantly more data than this.

@straygar straygar changed the title Slow performance on a relatively small dataset Slow performance on a relatively small dataset (numpy arrays) Mar 20, 2018
@carlthome
Owner

carlthome commented Mar 21, 2018

I'd love it if we could make this happen! Could you share what reverb settings you're using? sox is pretty fast for me, so I'm having a hard time figuring out why it's so slow in your pipeline.

import multiprocessing

from librosa.util import example_audio_file
from pysndfx.dsp import AudioEffectsChain

apply_audio_effects = (AudioEffectsChain()
    .highshelf()
    .reverb()
    .phaser()
    .delay()
    .lowshelf())

%%timeit
with multiprocessing.Pool() as pool:
    pool.map(apply_audio_effects, [example_audio_file()] * 300)

21.3 s +- 266 ms per loop (mean +- std. dev. of 7 runs, 1 loop each)

If I got this right, this is roughly 5 hours of 44.1 kHz / 16-bit stereo audio processed in roughly 20 seconds. I don't think sox is smart enough to cache and reuse the processed audio when it's started in the multiprocessing pool, but we should double-check that.

Finally, sox is single-threaded by default. I tried setting --multi-threaded and running the tests, but that made it a lot slower (weirdly?).
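Since each sox call is an independent external process, process-level parallelism usually beats sox's own --multi-threaded flag. A stdlib-only sketch of the scaling (with a hypothetical CPU-bound `fake_effect` standing in for one sox invocation, so no audio tooling is needed to run it):

```python
import multiprocessing
import time

def fake_effect(samples):
    # Stand-in for one sox invocation: a CPU-bound transform on a clip.
    return [s * 0.5 for s in samples]

if __name__ == "__main__":
    # 32 fake "clips" of 20k samples each.
    clips = [[float(i) for i in range(20000)]] * 32

    start = time.perf_counter()
    serial = [fake_effect(c) for c in clips]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with multiprocessing.Pool() as pool:
        parallel = pool.map(fake_effect, clips)
    t_parallel = time.perf_counter() - start

    assert serial == parallel  # same results, different scheduling
    print(f"serial {t_serial:.3f}s, pool {t_parallel:.3f}s")
```

On a multi-core box the pool version should approach a core-count speedup once each work item is large enough to amortize the pickling overhead.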

@bernardohenz

bernardohenz commented Apr 3, 2018

I am also having problems with speed. I am training a speech recognition model (using pairs of [sound, text]). Since your code works well in Python (and in particular with numpy arrays), I started using it for data augmentation. So, every time a sound is about to be fed to my network, I pass it through a function like this one:

import random

from pysndfx import AudioEffectsChain

def perform_aug(tsound, sr):
    # Build a random effects chain, adding each effect with some probability.
    aug_fx = AudioEffectsChain()

    if random.random() < 0.5:
        aug_fx.speed(random.uniform(0.9, 1.1))
    if random.random() < 0.9:
        aug_fx.tempo(random.uniform(0.8, 1.2))
    if random.random() < 0.9:
        aug_fx.pitch(random.uniform(-200, 200))
    if random.random() < 0.2:
        aug_fx.highshelf()
    if random.random() < 0.2:
        aug_fx.lowshelf()
    if random.random() < 0.2:
        aug_fx.highpass(random.uniform(200, 400))
    if random.random() < 0.2:
        aug_fx.lowpass(random.uniform(200, 400))
    if random.random() < 0.5:
        aug_fx.reverb(random.uniform(10, 50))

    out = aug_fx(tsound)
    return out, aug_fx

This works pretty well, except that the training runs like 10x slower. How do you think I could make it faster?

@carlthome
Owner

The key to good performance is to augment the next batch while training on the current one.

You could do this with either multiprocessing.Queue or async/await. If you're working in TensorFlow, I'd look into tf.data:

from pysndfx import AudioEffectsChain
import tensorflow as tf

batch_size = 32  # for example


def load(x):
    # Decode the filename tensor and apply a randomly pitched effects chain.
    f = lambda x, i: AudioEffectsChain().pitch(i[0])(x.decode())
    i = tf.random_uniform([1], -50.0, 50.0)
    x = tf.py_func(f, [x, i], tf.float32)
    return x


dataset = (tf.data.Dataset
    .list_files('*.mp3')
    .map(load, num_parallel_calls=batch_size)
    .padded_batch(batch_size, [None, None])
    .prefetch(1))
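The multiprocessing.Queue route mentioned above could be sketched like this (`augment` is a hypothetical stand-in for applying the pysndfx chain; the point is only the producer/consumer shape, where a worker process fills a small buffer ahead of the training loop):

```python
import multiprocessing

def augment(clip):
    # Hypothetical stand-in for AudioEffectsChain()(clip).
    return [s * 2 for s in clip]

def producer(clips, queue):
    # Augment upcoming items in a worker process while the
    # main process trains on the current one.
    for clip in clips:
        queue.put(augment(clip))
    queue.put(None)  # sentinel: no more items

def batches(clips):
    queue = multiprocessing.Queue(maxsize=2)  # small prefetch buffer
    worker = multiprocessing.Process(target=producer, args=(clips, queue))
    worker.start()
    while (item := queue.get()) is not None:
        yield item
    worker.join()

if __name__ == "__main__":
    for batch in batches([[1, 2], [3, 4]]):
        pass  # train_step(batch) would go here
```

The bounded queue is what gives the overlap: the producer stays at most two items ahead, so augmentation and training run concurrently without unbounded memory growth.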

Without knowing more about your particular program it's hard to give good advice though!

@bernardohenz

bernardohenz commented Apr 5, 2018

I am actually using DeepSpeech from Mozilla (https://github.com/mozilla/DeepSpeech), which is implemented over TensorFlow.

Right now, I am just performing data augmentation right before DeepSpeech extracts the features from the audio; to be specific, on this line (https://github.com/mozilla/DeepSpeech/blob/master/util/audio.py#L67).

Considering how it is implemented (loading the audio when constructing the batch), I don't know if I am going to be able to apply your suggestion.

PS: I do believe DeepSpeech uses parallel workers so that loading overlaps with training.

@carlthome
Owner

Yes, it looks like they're doing asynchronous prefetching of the next minibatch. Have you tried raising the number of prefetch threads?

@bernardohenz

I've tried threads_per_queue=8, and it improved speed by around 20-25%. Nonetheless, it is still much slower than running without augmentation.

I don't believe the augmentations themselves are that computationally expensive; I'd expect them to cost less than reading the audio file from disk. Maybe the random number generator, or the way SoX is invoked, is suboptimal. I don't know.
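One way to test that hunch is to compare a bare process launch against a plain file read. The sketch below uses a Python subprocess as a stand-in for a sox invocation (sox itself isn't required), so the numbers only bound the per-call startup overhead, not the DSP cost:

```python
import os
import subprocess
import sys
import tempfile
import time

# Write a dummy file roughly the size of 2 s of 16-bit mono audio at 16 kHz.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 64000)
    path = f.name

start = time.perf_counter()
for _ in range(10):
    with open(path, "rb") as fh:
        fh.read()
t_read = (time.perf_counter() - start) / 10

start = time.perf_counter()
for _ in range(10):
    # Stand-in for launching one external process (no actual DSP work).
    subprocess.run([sys.executable, "-c", "pass"], check=True)
t_spawn = (time.perf_counter() - start) / 10

os.unlink(path)
print(f"file read: {t_read * 1e3:.3f} ms, process spawn: {t_spawn * 1e3:.3f} ms")
```

If the spawn time dominates the read time by orders of magnitude, the per-sample sox process launch (not the effects) is the likely bottleneck.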

But thanks for the help! Even though training is a bit slower, the augmentations improved our model's accuracy 😄
