Real-time processing for FRCRN_SE_16K? #23
Comments
Hi, thank you for your feedback. This release currently focuses on performance refined with large training data, and we have changed the FRCRN structure for better performance on 16K audio. For 48K audio, please try the MossFormer2_SE_48K model.
And for real-time processing?
Related to modelscope#23

Add real-time processing support for the FRCRN_SE_16K model.

* **clearvoice/models/frcrn_se/frcrn.py**
  - Add a new method `real_time_process` to the `FRCRN_SE_16K` class for real-time processing.
  - Modify the `forward` method to support both offline and real-time processing.
  - Update the `DCCRN` class to handle real-time processing.
* **clearvoice/config/inference/FRCRN_SE_16K.yaml**
  - Change `win_len` to 320 to use 20 ms input windows.
  - Change `win_inc` to 160 (a 10 ms hop at 16 kHz) to go with the 20 ms input windows.
* **clearvoice/demo.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.
* **clearvoice/demo_with_more_comments.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.
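For reference, the sample counts in the config change above convert to durations as sketched below. This is plain arithmetic at a 16 kHz sample rate; the helper names `samples_to_ms` and `ms_to_samples` are made up for illustration and are not part of ClearVoice.

```python
SAMPLE_RATE_HZ = 16_000  # FRCRN_SE_16K operates on 16 kHz audio


def samples_to_ms(num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of `num_samples` samples (hypothetical helper)."""
    return 1000.0 * num_samples / sample_rate_hz


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples covering `duration_ms` milliseconds (hypothetical helper)."""
    return round(duration_ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    print(samples_to_ms(320))  # win_len = 320 samples
    print(samples_to_ms(160))  # win_inc = 160 samples
    print(ms_to_samples(40))   # a 40 ms window expressed in samples
```

So `win_len: 320` is a 20 ms window and `win_inc: 160` is a 10 ms hop, while a 40 ms window at 16 kHz would be 640 samples.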
Thanks to vishwamartur for the new additions. In the meantime, we have released a 48K real-time model on ModelScope. Please check it out: https://modelscope.cn/models/iic/speech_dfsmn_ans_psm_48k_causal
@alibabasglab thank you for sharing the causal 48k model! This works! Too bad it uses double the window size (hence double the latency) of the model described in the paper, but it is still very interesting to see a frame-in-frame-out implementation. @vishwamartur your implementation does not work.
Hi, thanks for sharing this great piece of work! Regarding ClearVoice, the documentation seems to suggest that the trained model FRCRN_SE_16K is the same one used for the IEEE ICASSP 2022 DNS Challenge. Is this so? How is that possible? First, the code running FRCRN_SE_16K is an offline process (big chunks of audio are processed at once), while the referenced DNS Challenge would involve real-time processing. Also, the code here uses input windows of 40 ms, while the paper that you reference in the documentation uses 20 ms. Do you also plan to add the model actually used for the DNS Challenge, and possibly an example running it in a real-time framework (i.e., frame-by-frame processing instead of big chunks of audio)? Thanks!
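To make "frame-by-frame processing" concrete, here is a minimal sketch of a frame-in-frame-out loop using windowed overlap-add. It is not the FRCRN code or the ClearVoice API: the class `StreamingOverlapAdd` and the `process_frame` callback are hypothetical, and the callback defaults to an identity function standing in for a model. The 320-sample window and 160-sample hop are taken from the config change proposed in this thread. A real causal model would also need to carry its own internal state (e.g. recurrent state) from frame to frame, which this sketch does not cover.

```python
import math


class StreamingOverlapAdd:
    """Frame-in-frame-out processing with 50% overlap (illustrative sketch)."""

    def __init__(self, win_len=320, hop=160, process_frame=None):
        if win_len != 2 * hop:
            raise ValueError("this sketch assumes 50% overlap (win_len == 2 * hop)")
        self.win_len = win_len
        self.hop = hop
        # Placeholder for a per-frame enhancement model (assumption: identity).
        self.process_frame = process_frame or (lambda frame: frame)
        # Square-root periodic Hann window, used for both analysis and synthesis,
        # so that the overlapping squared windows sum to one.
        self.window = [
            math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / win_len))
            for n in range(win_len)
        ]
        self.in_buf = [0.0] * win_len   # the most recent win_len input samples
        self.out_buf = [0.0] * win_len  # overlap-add accumulator

    def push(self, chunk):
        """Consume `hop` new samples and return `hop` output samples."""
        if len(chunk) != self.hop:
            raise ValueError("chunk must contain exactly `hop` samples")
        # Slide the input buffer forward by one hop.
        self.in_buf = self.in_buf[self.hop:] + list(chunk)
        # Analysis window, per-frame processing, synthesis window.
        frame = [x * w for x, w in zip(self.in_buf, self.window)]
        frame = self.process_frame(frame)
        self.out_buf = [
            acc + y * w for acc, y, w in zip(self.out_buf, frame, self.window)
        ]
        # The first hop of the accumulator is now complete and can be emitted.
        out = self.out_buf[:self.hop]
        self.out_buf = self.out_buf[self.hop:] + [0.0] * self.hop
        return out


if __name__ == "__main__":
    signal = [math.sin(0.01 * i) for i in range(640)]
    ola = StreamingOverlapAdd()
    enhanced = []
    for start in range(0, len(signal), ola.hop):
        enhanced.extend(ola.push(signal[start:start + ola.hop]))
    # With the identity callback, the output is the input delayed by one hop.
    print(max(abs(a - b) for a, b in zip(enhanced[160:], signal[:480])))
```

With the identity callback, each call returns the previous hop of input, so the output lags the input by one hop (10 ms at 16 kHz), and the newest sample in each frame is emitted a full window length (20 ms) after the oldest. This is why the window size discussed in this thread bounds the achievable latency.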