Add adapter for HiSanta data #47
@farzadab It wasn't clear to me whether VoiceDatasetArgs are optional customizations to be used by some datasets or whether there are some that Datasets are required to respect. (I imagine at least max_audio_duration_secs is required?) Should be pretty easy to add support for the required ones once I know which those are. Note to self: Need to set up new service account.
I believe
"""List of references to conversation metadata JSON files in the bucket.
These all look like {conversation_id}/metadata.json."""
Nit: Why use strings for comments?
    f"{conversation_id}/{message['speech']}"
).download_as_bytes()
yield VoiceSample(
    messages=[*history, {"role": "user", "content": "<|audio|>"}],
Current assumption is that the last message should be the assistant message.
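A minimal sketch of what that assumption could look like in practice: each sample ends with the assistant reply that follows a spoken user turn. The message layout (`role`, `content`, and a `speech` key, as in the snippet above), the plain-dict sample shape, and the helper name `iter_samples` are assumptions for illustration, not the actual VoiceSample API.

```python
from typing import Any, Dict, Iterator, List

AUDIO_PLACEHOLDER = "<|audio|>"


def iter_samples(conversation: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one sample per spoken user turn that has an assistant reply.

    Hypothetical helper: the last message of every sample is the assistant
    message, and earlier turns are kept as plain-text history.
    """
    history: List[Dict[str, str]] = []
    for i, message in enumerate(conversation):
        is_spoken_user_turn = message["role"] == "user" and "speech" in message
        has_reply = (
            i + 1 < len(conversation)
            and conversation[i + 1]["role"] == "assistant"
        )
        if is_spoken_user_turn and has_reply:
            reply = conversation[i + 1]
            yield {
                "messages": [
                    *history,
                    {"role": "user", "content": AUDIO_PLACEHOLDER},
                    {"role": "assistant", "content": reply["content"]},
                ],
                # Reference to the audio for the placeholder turn.
                "speech": message["speech"],
            }
        # Keep the text form of every turn as context for later samples.
        history.append({"role": message["role"], "content": message["content"]})
```

A trailing user turn with no reply produces no sample under this scheme, which may or may not be the desired behavior.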
for i in range(start, len(self._conversations), increment):
    yield from self._from_conversation(i)
We'll probably have to experiment with how to form our samples here.
There are multiple issues to consider:
- How to do shuffle
- The length of each sample should be regulated: max_audio_duration_secs was an attempt at this, but generally the bottleneck is GPU memory
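One possible starting point for those experiments, as a rough sketch: a bounded shuffle buffer for streaming samples, plus a duration filter. The helper names `shuffle_buffer` and `drop_long_audio` and the `(duration_secs, sample)` tuple layout are assumptions for illustration and are not part of this PR. The duration cap is only a proxy, since the real bottleneck is GPU memory.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def shuffle_buffer(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer


def drop_long_audio(
    samples: Iterable[Tuple[float, T]], max_audio_duration_secs: float
) -> Iterator[T]:
    """Skip samples whose audio is longer than the configured maximum."""
    for duration_secs, sample in samples:
        if duration_secs <= max_audio_duration_secs:
            yield sample
```

A larger `buffer_size` gives a better shuffle at the cost of holding more samples in memory, so it would need tuning alongside the duration cap.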
Putting this on ice for now.
No description provided.