Create back-translation.md #463
base: master
Thanks so much for your hard work, @liashahnazaryan!
I hope my comments make sense. Let me know what you think. :)
title: Back-translation
description:
description: Back-translating or back-copying target language sentences to augment parallel data
Perhaps this definition is not clear. Should we say something like "Translation of the machine translation output back into its input language"?
Another option would be something like "Creating parallel data by translating monolingual data from the target language to the source language".
---

**Back-translation** is the process of translating the monolingual data in the target language into the source language and then back into the target language.
I'd simplify "the monolingual data in the target language" to make it less wordy.
What do you think of something like this?
Back-translation is the process of using machine translation to translate again a machine translation output to generate synthetic data.
Steps
1- First, the machine translation system translates a text from one language to another language.
2- Then, the system uses the machine translation output as input, and translates it back into the original language.
3- Finally, the system translates the resulting synthetic input text again into the output language.
Do you think creating an image would make it easier to understand?
I disagree with this description, because it makes it sound as if the data is then translated from source to target. But the training data is only translated from target to source.
Translation from source to target only happens at inference time, but that's a given.
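To make the direction concrete, here is a minimal sketch of what I mean, not a definitive implementation. It assumes a target-to-source system is available as a plain function; `back_translate`, `translate_target_to_source` and `toy_reverse_model` are hypothetical names for illustration, and the toy lexicon only stands in for a real model.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    translate_target_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text.

    Only the source side is machine-generated; the target side stays
    the original, human-written sentence.
    """
    pairs = []
    for target in target_sentences:
        synthetic_source = translate_target_to_source(target)
        pairs.append((synthetic_source, target))
    return pairs


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a real target-to-source MT system."""
    lexicon = {"hallo": "hello", "welt": "world"}
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


# Monolingual target-language data (German here, purely as an example).
synthetic_pairs = back_translate(["Hallo Welt"], toy_reverse_model)
print(synthetic_pairs)
```

The resulting pairs would then be added to the real parallel data when training the source-to-target system.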
---

**Back-translation** is the process of translating the monolingual data in the target language into the source language and then back into the target language.
The goal is to generate synthetic [parallel data](/customisation/parallel-data.md) that can be used to train machine translation systems.
This is a great place to introduce the "augment" term!
Perhaps we can say that synthetic data is necessary to augment parallel data so that there is more data to train machine translation systems and improve quality.
"Necessary" is not always true; we can be less opinionated.
The goal is to generate synthetic [parallel data](/customisation/parallel-data.md) that can be used to train machine translation systems.

**Back-copying** is a similar technique to back-translation.
The process involves using an existing translation to create a new parallel sentence pair in the opposite direction.
I don't think it's clear how it differs from back-translation. Is the "existing" translation human-generated?
I don't want to make articles longer, but I see a few things missing:
I'll make this a draft to work on the changes and add the suggested parts.
Description
Fixes # 81
Type of PR
Checklist: