This repository has been archived by the owner on Feb 17, 2022. It is now read-only.

Slow performance on a relatively small dataset (numpy arrays) #12

Open
straygar opened this issue Mar 20, 2018 · 6 comments


@straygar

When trying to apply 10 reverb presets to 30 2-second waveform samples (represented as numpy arrays), the 300 sox calls take an unreasonable amount of time (close to an hour).

Is there any way to make batch effect application faster, other than by calling AudioEffectsChain() on each numpy array separately? Ideally, it would be great if this could scale to significantly more data than this.

@straygar straygar changed the title Slow performance on a relatively small dataset Slow performance on a relatively small dataset (numpy arrays) Mar 20, 2018
@carlthome
Owner

carlthome commented Mar 21, 2018

I'd love it if we could make this happen! Could you share what reverb settings you're using? sox is pretty fast for me, so I'm having a hard time figuring out why it's so slow in your pipeline.

import multiprocessing

from librosa.util import example_audio_file
from pysndfx.dsp import AudioEffectsChain

apply_audio_effects = (AudioEffectsChain()
    .highshelf()
    .reverb()
    .phaser()
    .delay()
    .lowshelf())

%%timeit
with multiprocessing.Pool() as pool:
    pool.map(apply_audio_effects, [example_audio_file()] * 300)

21.3 s +- 266 ms per loop (mean +- std. dev. of 7 runs, 1 loop each)

If I got this right, this is roughly 5 hours of 44.1 kHz / 16-bit stereo audio processed in roughly 20 seconds. I don't think sox is smart enough to cache and reuse the processed audio when it's started in the multiprocessing pool, but we should double-check that.

Finally, sox is single-threaded by default. I tried setting --multi-threaded and running the tests, but that made it a lot slower (weirdly?).
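Since each sox call is an independent external process, process-level parallelism usually beats sox's own --multi-threaded flag. A stdlib-only sketch of the scaling (with a hypothetical CPU-bound `fake_effect` standing in for one sox invocation, so no audio tooling is needed to run it):

```python
import multiprocessing
import time

def fake_effect(samples):
    # Stand-in for one sox invocation: a CPU-bound transform on a clip.
    return [s * 0.5 for s in samples]

if __name__ == "__main__":
    # 32 fake "clips" of 20k samples each.
    clips = [[float(i) for i in range(20000)]] * 32

    start = time.perf_counter()
    serial = [fake_effect(c) for c in clips]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with multiprocessing.Pool() as pool:
        parallel = pool.map(fake_effect, clips)
    t_parallel = time.perf_counter() - start

    assert serial == parallel  # same results, different scheduling
    print(f"serial {t_serial:.3f}s, pool {t_parallel:.3f}s")
```

On a multi-core box the pool version should approach a core-count speedup once each work item is large enough to amortize the pickling overhead.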

@bernardohenz

bernardohenz commented Apr 3, 2018

I am also having problems with speed. I am training a speech recognition model (using pairs of [sound, text]). Since your code works well in Python (and in particular with numpy arrays), I started using it for data augmentation. So, every time a sound is about to be fed to my network, I pass it through a function like this one:

import random

from pysndfx import AudioEffectsChain

def perform_aug(tsound, sr):
    # Build a random effects chain, adding each effect with some probability.
    aug_fx = AudioEffectsChain()

    if random.random() < 0.5:
        aug_fx.speed(random.uniform(0.9, 1.1))
    if random.random() < 0.9:
        aug_fx.tempo(random.uniform(0.8, 1.2))
    if random.random() < 0.9:
        aug_fx.pitch(random.uniform(-200, 200))
    if random.random() < 0.2:
        aug_fx.highshelf()
    if random.random() < 0.2:
        aug_fx.lowshelf()
    if random.random() < 0.2:
        aug_fx.highpass(random.uniform(200, 400))
    if random.random() < 0.2:
        aug_fx.lowpass(random.uniform(200, 400))
    if random.random() < 0.5:
        aug_fx.reverb(random.uniform(10, 50))

    out = aug_fx(tsound)
    return out, aug_fx

This works pretty well, except that the training runs like 10x slower. How do you think I could make it faster?

@carlthome
Owner

The key to good performance is to augment the next batch while training on the current one.

You could do this with either multiprocessing.Queue or async/await. If you're working in TensorFlow, I'd look into tf.data:

from pysndfx import AudioEffectsChain
import tensorflow as tf

batch_size = 32  # for example


def load(x):
    # Decode the filename tensor and apply a randomly pitched effects chain.
    f = lambda x, i: AudioEffectsChain().pitch(i[0])(x.decode())
    i = tf.random_uniform([1], -50.0, 50.0)
    x = tf.py_func(f, [x, i], tf.float32)
    return x


dataset = (tf.data.Dataset
    .list_files('*.mp3')
    .map(load, num_parallel_calls=batch_size)
    .padded_batch(batch_size, [None, None])
    .prefetch(1))
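The multiprocessing.Queue route mentioned above could be sketched like this (`augment` is a hypothetical stand-in for applying the pysndfx chain; the point is only the producer/consumer shape, where a worker process fills a small buffer ahead of the training loop):

```python
import multiprocessing

def augment(clip):
    # Hypothetical stand-in for AudioEffectsChain()(clip).
    return [s * 2 for s in clip]

def producer(clips, queue):
    # Augment upcoming items in a worker process while the
    # main process trains on the current one.
    for clip in clips:
        queue.put(augment(clip))
    queue.put(None)  # sentinel: no more items

def batches(clips):
    queue = multiprocessing.Queue(maxsize=2)  # small prefetch buffer
    worker = multiprocessing.Process(target=producer, args=(clips, queue))
    worker.start()
    while (item := queue.get()) is not None:
        yield item
    worker.join()

if __name__ == "__main__":
    for batch in batches([[1, 2], [3, 4]]):
        pass  # train_step(batch) would go here
```

The bounded queue is what gives the overlap: the producer stays at most two items ahead, so augmentation and training run concurrently without unbounded memory growth.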

Without knowing more about your particular program it's hard to give good advice though!

@bernardohenz

bernardohenz commented Apr 5, 2018

I am actually using DeepSpeech from Mozilla (https://github.com/mozilla/DeepSpeech), which is implemented over TensorFlow.

Right now, I am just performing data augmentation right before DeepSpeech extracts the features from the audio; to be specific, on this line (https://github.com/mozilla/DeepSpeech/blob/master/util/audio.py#L67).

Considering how it is implemented (loading the audio when constructing the batch), I don't know if I am going to be able to apply your suggestion.

PS: I do believe DeepSpeech uses parallel workers so that loading overlaps with training.

@carlthome
Owner

Yes, it looks like they're doing asynchronous prefetching of the next minibatch. Have you tried raising the number of prefetch threads?

@bernardohenz

I've tried threads_per_queue=8, and it improved speed by around 20-25%. Nonetheless, it is still much slower than running without augmentation.

I don't believe the augmentations themselves are that computationally expensive; I'd expect them to cost less than reading the audio file from disk. Maybe the random number generator, or the way SoX is invoked, is suboptimal. I don't know.
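One way to test that hunch is to compare a bare process launch against a plain file read. The sketch below uses a Python subprocess as a stand-in for a sox invocation (sox itself isn't required), so the numbers only bound the per-call startup overhead, not the DSP cost:

```python
import os
import subprocess
import sys
import tempfile
import time

# Write a dummy file roughly the size of 2 s of 16-bit mono audio at 16 kHz.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 64000)
    path = f.name

start = time.perf_counter()
for _ in range(10):
    with open(path, "rb") as fh:
        fh.read()
t_read = (time.perf_counter() - start) / 10

start = time.perf_counter()
for _ in range(10):
    # Stand-in for launching one external process (no actual DSP work).
    subprocess.run([sys.executable, "-c", "pass"], check=True)
t_spawn = (time.perf_counter() - start) / 10

os.unlink(path)
print(f"file read: {t_read * 1e3:.3f} ms, process spawn: {t_spawn * 1e3:.3f} ms")
```

If the spawn time dominates the read time by orders of magnitude, the per-sample sox process launch (not the effects) is the likely bottleneck.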

But thanks for the help! Even though training is a bit slower, the augmentations improved our model's accuracy 😄
