All the training scripts are located in the `verisci/training` folder.
- Common Arguments
- Training on SciFact
- Training on FEVER
- Training on FEVER + SciFact
- Training on Snopes
## Common Arguments

- `--model`: Selects which pretrained model to use. Options include `roberta-base`, `roberta-large`, `allenai/scibert_scivocab_uncased`, `allenai/biomed_roberta_base`, and a path to a saved model checkpoint. Note that if you select a non-RoBERTa model, you may need to edit the script where indicated by the comments.
- `--dest`: The folder in which to save model weights. The script saves a checkpoint after every epoch.
- `--batch-size-gpu`: The batch size passed through the GPU at a time. You may need to lower this if your GPU has limited memory.
- `--batch-size-accumulated`: The accumulated batch size for each optimizer update.
- `--lr-base`: The learning rate for the transformer base.
- `--lr-linear`: The learning rate for the linear classifier layer.
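As a sketch of how these arguments fit together, the command below runs SciFact rationale selection with the optional batch-size and learning-rate flags set explicitly; the numeric values are illustrative examples, not recommended settings:

```bash
# Illustrative only: flag values below are examples, not tuned settings.
python verisci/training/rationale_selection/transformer_scifact.py \
    --corpus "data/corpus.jsonl" \
    --claim-train "data/claims_train.jsonl" \
    --claim-dev "data/claims_dev.jsonl" \
    --model "roberta-large" \
    --dest "model/rationale_roberta_large_scifact" \
    --batch-size-gpu 8 \
    --batch-size-accumulated 256 \
    --lr-base 1e-5 \
    --lr-linear 1e-3
```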
## Training on SciFact

### Train rationale selection module
```bash
python verisci/training/rationale_selection/transformer_scifact.py \
    --corpus "PATH TO corpus.jsonl" \
    --claim-train "PATH TO claims_train.jsonl" \
    --claim-dev "PATH TO claims_dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```
For example, after downloading the data using `script/download-data.sh`, you could do:
```bash
python verisci/training/rationale_selection/transformer_scifact.py \
    --corpus "data/corpus.jsonl" \
    --claim-train "data/claims_train.jsonl" \
    --claim-dev "data/claims_dev.jsonl" \
    --model "roberta-large" \
    --dest "model/rationale_roberta_large_scifact"
```
### Train label prediction module
```bash
python verisci/training/label_prediction/transformer_scifact.py \
    --corpus "PATH TO corpus.jsonl" \
    --claim-train "PATH TO claims_train.jsonl" \
    --claim-dev "PATH TO claims_dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```
## Training on FEVER

You will need to download the FEVER dataset and the Wikipedia dump manually:
```bash
# Download FEVER dataset:
wget https://fever.ai/download/fever/train.jsonl
wget https://fever.ai/download/fever/paper_dev.jsonl
wget https://fever.ai/download/fever/paper_test.jsonl

# Download the Wikipedia dump and unzip it:
wget https://fever.ai/download/fever/wiki-pages.zip
unzip wiki-pages.zip
```
Then use the preprocessing script to convert them to our format. Run it on both `train.jsonl` and `paper_dev.jsonl` to produce `fever_train.jsonl` and `fever_dev.jsonl`:
```bash
python verisci/training/preprocess_fever.py \
    --wiki-folder "PATH TO wiki-pages folder" \
    --input "PATH TO train.jsonl or paper_dev.jsonl" \
    --output "PATH TO CONVERTED FILE fever_train.jsonl or fever_dev.jsonl"
```
### Train rationale selection module
```bash
python verisci/training/rationale_selection/transformer_fever.py \
    --train "PATH TO fever_train.jsonl" \
    --dev "PATH TO fever_dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```
### Train label prediction module
```bash
python verisci/training/label_prediction/transformer_fever.py \
    --train "PATH TO fever_train.jsonl" \
    --dev "PATH TO fever_dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```
## Training on FEVER + SciFact

Simply train the model using the FEVER script first, then continue with the SciFact script. When running the SciFact script, change the `--model` argument to point at the saved weights produced by the FEVER script; a sketch of the sequence is shown below.
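A minimal sketch of this two-step sequence for the label prediction module, assuming the FEVER weights were saved to `model/label_roberta_large_fever` (all folder names are illustrative; depending on how checkpoints are saved, you may need to point `--model` at a specific epoch's checkpoint inside that folder):

```bash
# Step 1: train on FEVER.
python verisci/training/label_prediction/transformer_fever.py \
    --train "data/fever_train.jsonl" \
    --dev "data/fever_dev.jsonl" \
    --model "roberta-large" \
    --dest "model/label_roberta_large_fever"

# Step 2: continue training on SciFact, starting from the FEVER weights.
python verisci/training/label_prediction/transformer_scifact.py \
    --corpus "data/corpus.jsonl" \
    --claim-train "data/claims_train.jsonl" \
    --claim-dev "data/claims_dev.jsonl" \
    --model "model/label_roberta_large_fever" \
    --dest "model/label_roberta_large_fever_scifact"
```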
## Training on Snopes

You will need to acquire the Snopes dataset yourself.
### Train rationale selection module
```bash
python verisci/training/rationale_selection/transformer_snopes.py \
    --corpus "PATH TO snopes.pages.json" \
    --evidence-train "PATH TO snopes.evidence.train.jsonl" \
    --evidence-dev "PATH TO snopes.evidence.dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```
### Train label prediction module
```bash
python verisci/training/label_prediction/transformer_snopes.py \
    --corpus "PATH TO snopes.pages.json" \
    --evidence-train "PATH TO snopes.stance.train.jsonl" \
    --evidence-dev "PATH TO snopes.stance.dev.jsonl" \
    --model "roberta-large" \
    --dest "PATH TO WEIGHT SAVING FOLDER"
```