
Determining time indexes for embeddings? #64

Open
chrisspen opened this issue Oct 16, 2019 · 2 comments


@chrisspen

In dvector_create.py, an audio file is converted to a sequence of dvectors. However, the time index of each dvector is lost, so if a classification is performed using a dvector, you can't really do much with it, since you don't know where in the original file that classification applies. How would you determine the time index for each dvector returned by align_embeddings()?


lagidigu commented Oct 28, 2019

I had a similar problem and I fixed it by changing the method concat_segs(times, segs) to the following:

import numpy as np  # already imported at the top of dvector_create.py

def concat_segs(times, segs):
    # Concatenate continuous voiced segments and keep their (start, end) times.
    concat_seg = []
    seg_concat = segs[0]

    continuous_times = []
    start = times[0][0]
    end = times[0][1]

    for i in range(0, len(times) - 1):
        if times[i][1] == times[i + 1][0]:
            # Current segment ends exactly where the next one starts: merge them.
            seg_concat = np.concatenate((seg_concat, segs[i + 1]))
            end = times[i + 1][1]
        else:
            # Gap between segments: close the current one and start a new one.
            concat_seg.append(seg_concat)
            seg_concat = segs[i + 1]

            continuous_times.append((start, end))
            start = times[i + 1][0]
            end = times[i + 1][1]

    # Flush the last segment after the loop.
    concat_seg.append(seg_concat)
    continuous_times.append((start, end))

    return concat_seg, continuous_times

continuous_times is a list of tuples containing the start and end time of each concatenated segment, and each embedding in aligned_embeddings corresponds to a 400 ms slice of a segment, without overlap.
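If that interpretation holds, the time span of each embedding can be recovered by cutting every segment in continuous_times into non-overlapping 400 ms windows. A minimal sketch follows; the window length, the dropping of trailing remainders shorter than 400 ms, and the helper name embedding_time_spans are assumptions, not taken from dvector_create.py, so verify against align_embeddings() before relying on it:

def embedding_time_spans(continuous_times, window=0.4):
    # Cut each (start, end) segment into non-overlapping windows of `window` seconds.
    # Trailing remainders shorter than one window are dropped (assumption).
    spans = []
    for seg_start, seg_end in continuous_times:
        t = seg_start
        while t + window <= seg_end:
            spans.append((t, t + window))
            t += window
    return spans

# Usage sketch: pair the spans with the embeddings returned by align_embeddings().
# spans = embedding_time_spans(continuous_times)
# for (start, end), dvec in zip(spans, aligned_embeddings):
#     print(f"{start:.2f}s - {end:.2f}s")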

@abhilashnayak

Hi @lagidigu,

Thanks for this suggestion.
Just had a small issue. I passed an audio file to dvector_create.py and got 11 d-vector embeddings,
with continuous_times = [(0.12, 1.68), (2.7, 4.28), (5.26, 6.7)].
Converting to milliseconds:
[(120, 1680), (2700, 4280), (5260, 6700)]

Total duration of each segment in ms:
[1560, 1580, 1440]

Counting the number of 400 ms embedding windows that fit in each segment:
[3.9, 3.95, 3.6]

If I take 3 embeddings per segment, I should have got 9 d-vector embeddings.
If I round up and take 4 embeddings per segment, I should have got 12 embeddings.

But I actually got 11 d-vector embeddings, which matches neither of the numbers above.

I have experimented with longer audio files as well and found a mismatch of around 2 to 4 embeddings.
Could you please help me understand how to tell which d-vector embedding corresponds to which timeframe?
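For reference, the counting described above can be reproduced as follows. This only restates the arithmetic in the comment; it does not explain the off-by-two mismatch, which likely depends on how align_embeddings() handles segment boundaries:

import math

# Segment durations from continuous_times = [(0.12, 1.68), (2.7, 4.28), (5.26, 6.7)]
durations = [1.68 - 0.12, 4.28 - 2.7, 6.7 - 5.26]        # about [1.56, 1.58, 1.44] seconds

floor_counts = [math.floor(d / 0.4) for d in durations]  # [3, 3, 3] -> 9 embeddings
ceil_counts = [math.ceil(d / 0.4) for d in durations]    # [4, 4, 4] -> 12 embeddings

print(sum(floor_counts), sum(ceil_counts))               # 9 12, neither matches the observed 11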
