less data preparation time #1
Sorry, I found a bug affecting SS; let me look into it further.
Hi @ftshijt and @HuangZiliAndy,
I made some updates, and now both scripts for generating mixtures work.
Please help me proof-read the change if you have time. Thanks!
--freqs 16k \
--modes min \
--types mix_clean
@HuangZiliAndy Please help me proof-read this! Thanks!
print("[Warning] - train-clean-360 is ignored in create_librimix_from_metadata.py for less data preparation time."\
" Please note that in S3PRL we only use the train-clean-100 for downstream tasks.")
md_filename_list = [file for file in os.listdir(metadata_dir)
if 'info' not in file]
if 'info' not in file and '360' not in file]
This skips the mixture creation for train-clean-360, which would take much more time.
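For reference, here is a minimal standalone sketch of the filter in the diff above. The helper name `select_metadata_files` and the example file names are hypothetical, chosen only to illustrate the behavior; they are not taken from the repository.

```python
def select_metadata_files(filenames):
    """Keep metadata files, skipping '*info*' files and anything containing '360'."""
    return [name for name in filenames
            if 'info' not in name and '360' not in name]


# Hypothetical directory listing, for illustration only.
example_listing = [
    "libri2mix_train-clean-100.csv",
    "libri2mix_train-clean-100_info.csv",
    "libri2mix_train-clean-360.csv",
    "libri2mix_train-clean-360_info.csv",
    "libri2mix_dev.csv",
    "libri2mix_test.csv",
]

selected = select_metadata_files(example_listing)
print(selected)
```

Note that this is a plain substring check, so any other file whose name happens to contain `360` would be skipped as well.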
@@ -79,5 +78,5 @@ for n_src in 2; do
--n_src $n_src \
--freqs 16k \
--modes max \
--types mix_clean mix_both
--types mix_both
Can @ftshijt please help me proof-read this?
Thanks!
LGTM! Thanks
Hey @ftshijt and @HuangZiliAndy!
As we discussed a long time ago, it would be nice to have a centralized LibriMix repo instead of one repo per task, and to simply use a different data preparation script for each task.
Here I created one repo based on Jiatong's version, since his version has many more modifications relative to the official release, including the RTTM label files for SD. So there are two data preparation scripts:
I also made some changes to decrease the data preparation time, including ignoring train-clean-360 for both tasks and ignoring WHAM noise augmentation for SS. Furthermore, SD and SS now each have a specific setting in terms of the min/max condition and the mix_clean/mix_both condition. Hence I think each data preparation script can now prepare only the specific setting we use for benchmarking. I believe these changes can save users some time waiting for the data to be ready.
Could you please take a look and check whether my changes fit your needs?
Thanks!!
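To make the per-task settings concrete, here is a rough sketch. The task-to-setting mapping is an assumption inferred from the diff hunks in this PR and from the note that WHAM noise is ignored for SS; the helper `build_mixture_args` is hypothetical and not part of either script.

```python
# Assumed per-task settings, inferred from the diff hunks and the description
# in this PR; not copied from the actual data preparation scripts.
TASK_SETTINGS = {
    "SS": {"freqs": "16k", "modes": "min", "types": ["mix_clean"]},
    "SD": {"freqs": "16k", "modes": "max", "types": ["mix_both"]},
}


def build_mixture_args(task, n_src=2):
    """Return the command-line flags for one task's setting (hypothetical helper)."""
    setting = TASK_SETTINGS[task]
    return [
        "--n_src", str(n_src),
        "--freqs", setting["freqs"],
        "--modes", setting["modes"],
        "--types", *setting["types"],
    ]


for task in ("SS", "SD"):
    print(task, " ".join(build_mixture_args(task)))
```

Preparing only one mode and one mixture type per task is what avoids generating the other combinations.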