GitHub - FAhtisham/neural-machine-translation-using-gan: Application of rnn-gan to machine translation

Neural Machine Translation using Adversarial Training

This project investigates how a GAN-NMT can be used to predict mutations in SARS-CoV-2.

Dataset for Mutation Prediction GAN-NMT

Sequences were downloaded from GISAID.
Sequence pairs were formed by applying ML techniques, i.e. K-means clustering, nearest neighbors, and Euclidean distance.
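
The pairing step can be illustrated with a minimal sketch. It is not the code used in this repository: the k-mer count featurisation, the `ACGT` alphabet, and the function names are all assumptions made for illustration, and the K-means stage is left out. It only shows how each source sequence could be paired with its nearest neighbor under Euclidean distance.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Return a fixed-length vector of k-mer counts for `seq` (hypothetical featurisation)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_pairs(sources, targets, k=2):
    """Pair each source sequence with the closest target sequence."""
    target_vecs = [kmer_vector(t, k) for t in targets]
    pairs = []
    for s in sources:
        s_vec = kmer_vector(s, k)
        # Index of the target whose k-mer vector is closest to the source.
        best = min(range(len(targets)), key=lambda j: euclidean(s_vec, target_vecs[j]))
        pairs.append((s, targets[best]))
    return pairs
```

In a pipeline like the one described above, clustering would typically narrow the candidate targets first, so that the nearest-neighbor search runs within a cluster rather than over all sequences.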

Dataset for original GAN-NMT

This part uses the freely available IWSLT'14 dataset, which is downloaded and preprocessed with Facebook's fairseq toolkit.

Follow the steps below to download and preprocess the dataset:

git clone https://github.com/pytorch/fairseq
cd fairseq/examples/translation/; bash prepare-iwslt14.sh; cd ../..
(make sure to make the relevant dataset name changes in the bash script before running it)
TEXT=examples/translation/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir data-bin/iwslt14.tokenized.de-en

Usage

The main program for the NMT-GAN is joint_train.py.

The file train_generator.py trains a traditional NMT model, which is similar to the NMT-GAN generator network.

To train the NMT-GAN model, use the following command:

python joint_train.py --data data-bin/iwslt14.tokenized.de-en/  --src_lang de --trg_lang en --learning_rate 1e-3 --joint-batch-size 64 --gpuid 0 --clip-norm 1.0 --epochs 10

This saves the model in the checkpoints folder (make sure a GPU is enabled).

Generate predictions as follows:

python generate.py --data data-bin/iwslt14.tokenized.de-en/ --src_lang de --trg_lang en --batch-size 64 --gpuid 0

This generates predictions.txt and real.txt.
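
Before scoring, it can help to confirm that the two files are line-aligned, since BLEU compares them line by line. The helper below is a small sketch and not part of this repository. The function name is hypothetical, and the default paths assume the file names mentioned above.

```python
def count_parallel_lines(pred_path="predictions.txt", real_path="real.txt"):
    """Return the shared line count of both files, or raise if they differ."""
    with open(pred_path, encoding="utf-8") as pred, open(real_path, encoding="utf-8") as real:
        n_pred = sum(1 for _ in pred)
        n_real = sum(1 for _ in real)
    if n_pred != n_real:
        raise ValueError(
            f"line count mismatch: {pred_path} has {n_pred}, {real_path} has {n_real}"
        )
    return n_pred
```

A mismatch usually means generation was interrupted or one file was edited by hand, and the BLEU score would not be meaningful in that case.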

As is common for NMT models, BLEU is used as the evaluation metric. It is computed with the freely available mosesdecoder toolkit.

Follow the steps below for evaluation:

Postprocess both the real and the predicted text files:

bash postprocess.sh < real.txt > real_processed.txt
bash postprocess.sh < predictions.txt > predictions_processed.txt

Run the BLEU evaluation:

perl scripts/multi-bleu.perl real_processed.txt < predictions_processed.txt
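
For intuition about what the score measures, here is a simplified sketch of corpus-level BLEU in Python: clipped n-gram precision up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It is an illustration only, not a replacement for multi-bleu.perl. It assumes one reference per hypothesis, whitespace tokenisation, and no smoothing, so its numbers may differ from the script's output. The function names are hypothetical.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Count all n-grams of length `n` in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref_tok, n)
            hyp_counts = ngrams(hyp_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalise hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * exp(log_precision)
```

Called as `corpus_bleu(real_lines, predicted_lines)` on the postprocessed files read line by line, this gives a rough cross-check of the reported score.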

References

Code adapted from: https://github.com/wangyirui/Adversarial-NMT
https://github.com/pytorch/fairseq
https://github.com/moses-smt/mosesdecoder/tree/master/scripts
Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. Adversarial Neural Machine Translation. arXiv, 2017.

