Welcome to the SKIE project's pretraining guide. This document walks you through the steps needed to preprocess your datasets and run pretraining. Before starting, install the dependencies listed in requirements.txt.
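Assuming a standard pip environment, that is:

pip install -r requirements.txt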
To preprocess your dataset for the model, each sentence must be converted into a JSON object containing the fields sentence and tokens. Run the following script:
python preprocess_pretrained_data.py
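For reference, each preprocessed line is a JSON object of the form below (the sentence and tokens field names come from the description above; the example values are purely illustrative):

{"sentence": "The boy wants to go.", "tokens": ["The", "boy", "wants", "to", "go", "."]}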
We use the transformer-based AMR parser from 'Transition-based Parsing with Stack-Transformers' in our model.
Run the following script to process the AMR subgraphs:
nohup python get_sub_counts.py --dataset {dataset} > {dataset}subcounts.log 2>&1 &
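The script's internals are not documented here, but as a rough sketch of what counting AMR subgraph statistics can look like, the following uses the penman library (an assumption on our part; get_sub_counts.py may represent graphs and count subgraphs differently):

from collections import Counter

import penman

def count_edge_subgraphs(amr_strings):
    # Count (head concept, relation, tail concept) triples, the smallest
    # non-trivial AMR subgraphs, across a list of PENMAN-encoded graphs.
    counts = Counter()
    for s in amr_strings:
        graph = penman.decode(s)
        # Map each variable to its concept, e.g. 'b' -> 'boy'.
        concept = {src: tgt for src, _, tgt in graph.instances()}
        for src, role, tgt in graph.edges():
            counts[(concept[src], role, concept.get(tgt, tgt))] += 1
    return counts

example = '(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))'
print(count_edge_subgraphs([example]).most_common())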
For the initial pretraining of the model, use the following command. This saves the model in a directory named '{dataset}_global_complete_graph_finetuned_bert_model'.
nohup python pretrain.py --dataset {dataset} > {dataset}pretrain.log 2>&1 &
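For example, with a hypothetical dataset name ace05 (substitute your own dataset identifier), the call and the log file name become:

nohup python pretrain.py --dataset ace05 > ace05pretrain.log 2>&1 &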
To continue pretraining on a new dataset from a previously saved model, pass the saved model directory via the --bert_model_name flag:
nohup python pretrain.py --dataset {dataset} --bert_model_name {dataset}_global_complete_graph_finetuned_bert_model > {dataset}pretrain.log 2>&1 &
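The --bert_model_name flag suggests, though the repository does not confirm, that the saved directory is a standard Hugging Face transformers checkpoint. Under that assumption, a minimal sketch of reloading it directly for inspection or reuse:

from transformers import BertModel, BertTokenizer

# Hypothetical directory name, as produced by the initial pretraining run
# above with --dataset ace05.
model_dir = "ace05_global_complete_graph_finetuned_bert_model"

# If the directory follows the standard transformers checkpoint layout,
# it loads like any named model.
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)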
Due to privacy concerns, our pre-trained models will be made available after the paper is accepted.