
Determining time indexes for embeddings? #64

Open
chrisspen opened this issue Oct 16, 2019 · 2 comments


@chrisspen

In dvector_create.py, an audio file is converted to a sequence of dvectors. However, the time index of each dvector is lost, so if a classification is performed using a dvector, you can't really do much with it, since you don't know where in the original file that classification applies. How would you determine the time index for each dvector returned by align_embeddings()?


lagidigu commented Oct 28, 2019

I had a similar problem and I fixed it by changing the method concat_segs(times, segs) to the following:

import numpy as np  # already imported at the top of dvector_create.py

def concat_segs(times, segs):
    # Concatenate continuous voiced segments and keep their (start, end) times.
    concat_seg = []
    seg_concat = segs[0]

    continuous_times = []
    start = times[0][0]
    end = times[0][1]

    for i in range(0, len(times) - 1):
        if times[i][1] == times[i + 1][0]:
            # Current segment ends exactly where the next one starts: merge them.
            seg_concat = np.concatenate((seg_concat, segs[i + 1]))
            end = times[i + 1][1]
        else:
            # Gap between segments: close the current one and start a new one.
            concat_seg.append(seg_concat)
            seg_concat = segs[i + 1]

            continuous_times.append((start, end))
            start = times[i + 1][0]
            end = times[i + 1][1]

    # Flush the last segment after the loop.
    concat_seg.append(seg_concat)
    continuous_times.append((start, end))

    return concat_seg, continuous_times

continuous_times is a list of tuples containing the start and end time of each concatenated segment, and each embedding in aligned_embeddings corresponds to a 400 ms slice of a segment, without overlap.
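If that interpretation holds, the time span of each embedding can be recovered by cutting every segment in continuous_times into non-overlapping 400 ms windows. A minimal sketch follows; the window length, the dropping of trailing remainders shorter than 400 ms, and the helper name embedding_time_spans are assumptions, not taken from dvector_create.py, so verify against align_embeddings() before relying on it:

def embedding_time_spans(continuous_times, window=0.4):
    # Cut each (start, end) segment into non-overlapping windows of `window` seconds.
    # Trailing remainders shorter than one window are dropped (assumption).
    spans = []
    for seg_start, seg_end in continuous_times:
        t = seg_start
        while t + window <= seg_end:
            spans.append((t, t + window))
            t += window
    return spans

# Usage sketch: pair the spans with the embeddings returned by align_embeddings().
# spans = embedding_time_spans(continuous_times)
# for (start, end), dvec in zip(spans, aligned_embeddings):
#     print(f"{start:.2f}s - {end:.2f}s")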

@abhilashnayak

Hi @lagidigu,

Thanks for this suggestion.
Just had a small issue. I passed an audio file to dvector_create.py and got 11 d-vector embeddings,
with continuous_times = [(0.12, 1.68), (2.7, 4.28), (5.26, 6.7)].
Converting to milliseconds:
[(120, 1680), (2700, 4280), (5260, 6700)]

Total duration of each segment in ms:
[1560, 1580, 1440]

Counting the number of 400 ms embedding windows that fit in each segment:
[3.9, 3.95, 3.6]

If I take 3 embeddings per segment, I should have got 9 d-vector embeddings.
If I round up and take 4 embeddings per segment, I should have got 12 embeddings.

But I actually got 11 d-vector embeddings, which matches neither of the numbers above.

I have experimented with longer audio files as well and found a mismatch of around 2 to 4 embeddings.
Could you please help me understand how to tell which d-vector embedding corresponds to which timeframe?
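For reference, the counting described above can be reproduced as follows. This only restates the arithmetic in the comment; it does not explain the off-by-two mismatch, which likely depends on how align_embeddings() handles segment boundaries:

import math

# Segment durations from continuous_times = [(0.12, 1.68), (2.7, 4.28), (5.26, 6.7)]
durations = [1.68 - 0.12, 4.28 - 2.7, 6.7 - 5.26]        # about [1.56, 1.58, 1.44] seconds

floor_counts = [math.floor(d / 0.4) for d in durations]  # [3, 3, 3] -> 9 embeddings
ceil_counts = [math.ceil(d / 0.4) for d in durations]    # [4, 4, 4] -> 12 embeddings

print(sum(floor_counts), sum(ceil_counts))               # 9 12, neither matches the observed 11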
