
Codebase for the paper XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages (WWW 2023)

anupampatil44/XWikiGen

XWikiGen

This repository contains the code for the experiments we performed on our dataset, XWikiRef.

Updated dataset link: XWikiRef

The repository contains three top-level directories:

1. extractive:
	- salience: Experiments with the salience-based extractive stage
	- HipoRank: Experiments with the HipoRank extractive stage
2. abstractive:
	- combined_model: Experiments with the combined model in the abstractive stage
	- multidomain: Experiments with the multidomain model in the abstractive stage
	- multilingual: Experiments with the multilingual model in the abstractive stage
3. evaluation:
	- evaluate_multidomain: Evaluation script for the multidomain experiment
	- evaluate_multilingual: Evaluation script for the multilingual experiment
	- evaluate_multilingual_multidomain: Evaluation script for the multilingual-multidomain experiment
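
The stage names above suggest a two-stage, extract-then-abstract pipeline: an extractive stage selects salient sentences from the reference text, and an abstractive model then generates the section from them. The minimal sketch below only illustrates that data flow under stated assumptions. The word-overlap scorer is a toy stand-in, not the salience or HipoRank method implemented in `extractive/`, and `extract`, `overlap_salience` and `build_abstractive_input` (with its `title | sentences` input format) are hypothetical names.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ".", "!", "?" and the danda "।" (assumption: the real
    # pipeline uses a proper, language-aware sentence tokenizer).
    parts = re.split(r"(?<=[.!?।])\s+", text.strip())
    return [p for p in parts if p]


def overlap_salience(sentence: str, title: str) -> float:
    # Toy stand-in scorer: fraction of title words that appear in the sentence.
    # Words are split on whitespace only, so attached punctuation is not stripped.
    title_words = set(title.lower().split())
    if not title_words:
        return 0.0
    sentence_words = set(sentence.lower().split())
    return len(title_words & sentence_words) / len(title_words)


def extract(references: List[str], title: str, k: int = 3,
            scorer: Callable[[str, str], float] = overlap_salience) -> List[str]:
    # Extractive stage: score every reference sentence, keep the top k,
    # and return them in their original order.
    sentences = [s for ref in references for s in split_sentences(ref)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scorer(sentences[i], title), reverse=True)
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]


def build_abstractive_input(title: str, extracted: List[str]) -> str:
    # Hypothetical input format for the abstractive (seq2seq) stage.
    return f"{title} | {' '.join(extracted)}"
```

For example, `extract(["The river flows north. Tourists visit in summer.", "The river is 300 km long."], "river", k=2)` keeps the two sentences that mention the title, in their original order. In the actual experiments, the scorer is replaced by the methods in `extractive/salience` or `extractive/HipoRank`, and the string is consumed by a trained abstractive model.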

The commands for each experiment are in the bash script inside the corresponding directory. To set up the environment:

    conda create -n xwikigen python=3.8
    conda activate xwikigen
    cd XWikiGen/
    pip install -r requirements.txt
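
After installation, a quick check can confirm that the active interpreter is the Python 3.8 created above and that every package listed in `requirements.txt` is installed. This is a hypothetical helper, not part of the repository; it assumes `requirements.txt` uses plain `name==version`-style lines and it skips pip options and URL requirements.

```python
import re
import sys
from importlib import metadata  # standard library since Python 3.8
from pathlib import Path
from typing import Iterable, List


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract bare distribution names from requirements.txt lines."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # Skip blank lines, pip options (-r, -e, --index-url, ...) and URL requirements
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name stops at extras "[...]", version specifiers, markers or spaces
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_from(names: Iterable[str]) -> List[str]:
    """Return the names that have no installed distribution in this interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def missing_requirements(path: str = "requirements.txt") -> List[str]:
    return missing_from(requirement_names(Path(path).read_text().splitlines()))


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Warning: expected Python 3.8, found {sys.version.split()[0]}")
    if Path("requirements.txt").exists():
        print("Missing packages:", missing_requirements() or "none")
```

Run it from the `XWikiGen/` directory inside the activated environment; an empty result from `missing_requirements()` means every listed package was found.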

To run the salience extractive stage:

    cd extractive/salience/
    bash run_extractive.sh

To run the hiporank extractive stage:

    cd extractive/HipoRank/
    bash run_extractive.sh

To run the salience abstractive stage:

    cd abstractive/
    # Go to the desired experiment directory (multilingual/, multidomain/ or combined_model/)
    bash saliency_run.sh

To run the hiporank abstractive stage:

    cd abstractive/
    # Go to the desired experiment directory (multilingual/, multidomain/ or combined_model/)
    bash hiporank_run.sh
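
To drive both stages from one place, the hypothetical helper below chains these commands: it runs `run_extractive.sh` in the chosen extractive directory and then the matching script in one abstractive experiment directory. It assumes the directory and script names shown in the directory tree of this README (`extractive/HipoRank`, `saliency_run.sh`, `hiporank_run.sh`) and that the abstractive scripts read the extractive output from paths you have already configured. `plan_runs` and `run` are illustrative names, and `run` defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Names taken from the directory tree of this README (assumption: unchanged in your checkout).
EXTRACTIVE_DIRS = {"salience": "extractive/salience", "hiporank": "extractive/HipoRank"}
ABSTRACTIVE_SCRIPTS = {"salience": "saliency_run.sh", "hiporank": "hiporank_run.sh"}
EXPERIMENTS = ("multilingual", "multidomain", "combined_model")


def plan_runs(root: str, method: str, experiment: str) -> List[Tuple[Path, List[str]]]:
    """Return (working directory, command) pairs: extractive stage first, then abstractive."""
    if method not in EXTRACTIVE_DIRS:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(EXTRACTIVE_DIRS)}")
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment {experiment!r}; choose from {list(EXPERIMENTS)}")
    base = Path(root)
    return [
        (base / EXTRACTIVE_DIRS[method], ["bash", "run_extractive.sh"]),
        (base / "abstractive" / experiment, ["bash", ABSTRACTIVE_SCRIPTS[method]]),
    ]


def run(root: str, method: str, experiment: str, dry_run: bool = True) -> None:
    for cwd, cmd in plan_runs(root, method, experiment):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError, so a failed stage stops the chain
            subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `run("XWikiGen", "hiporank", "multilingual")` prints the two commands; pass `dry_run=False` to execute them in order.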

Note: Before running the scripts, update the file paths in them to match your machine.
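
To locate the paths that need editing, a heuristic scan for absolute paths in the shell and Python scripts can help. This is a hypothetical sketch, not part of the repository: the regular expression is an assumption that flags POSIX-style absolute paths (so it also reports system paths and misses relative or Windows paths), and `find_hardcoded_paths` is an illustrative name.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Heuristic: a "/" not preceded by a word character, ".", ":" or "/" (which rules out
# URLs and divisions like a/b), followed by at least one "segment/" component.
ABS_PATH = re.compile(r"(?<![\w.:/])/(?:[\w.\-]+/)+[\w.\-]*")


def find_hardcoded_paths(root: str,
                         suffixes: Tuple[str, ...] = (".sh", ".py")) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, path) for every absolute path found in scripts under root."""
    for file in sorted(Path(root).rglob("*")):
        if not file.is_file() or file.suffix not in suffixes:
            continue
        lines = file.read_text(encoding="utf-8", errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if line.startswith("#!"):
                continue  # skip shebangs such as #!/usr/bin/env bash
            for match in ABS_PATH.finditer(line):
                yield file, lineno, match.group(0)
```

From the repository root, `for f, n, p in find_hardcoded_paths("."): print(f"{f}:{n}: {p}")` lists each candidate path with its file and line number.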

Below is the directory structure of this repo.

├── extractive
│   ├── HipoRank
│   │   ├── modified_codes
│   │   ├── exp8_run.py
│   │   ├── exp5_run.py
│   │   ├── exp10_run.py
│   │   ├── human_eval_data.jsonl
│   │   ├── ROUGE-1.5.5
│   │   ├── exp2_run.py
│   │   ├── run_extractive.sh
│   │   ├── readme.txt
│   │   ├── exp3_run.py
│   │   ├── .gitignore
│   │   ├── exp6_run.py
│   │   ├── exp4_run.py
│   │   ├── exp9_run.py
│   │   ├── human_eval_sample.ipynb
│   │   ├── plot_ablation.ipynb
│   │   ├── exp11_run.py
│   │   ├── convert_to_pubmed_like.py
│   │   ├── LICENSE
│   │   ├── human_eval_samples.jsonl
│   │   ├── plot_sentence_positions.ipynb
│   │   ├── hipo_rank
│   │   ├── human_eval_results.ipynb
│   │   ├── test.txt
│   │   ├── exp7_run.py
│   │   ├── op_indiv2.txt
│   │   ├── exp_ours.py
│   │   └── dataset_format_sentence_tokenization_individual_sectionwise.py
│   └── salience
│       ├── run_extractive.sh
│       └── extractive.py
├── evaluation
│   ├── evaludate_multidomain.py
│   ├── evaluate_multilingual.py
│   └── evaluate_multilingual_multidomain.py
├── requirements.txt
├── readme.md
└── abstractive
    ├── multilingual
    │   ├── hiporank_run.sh
    │   ├── readme.txt
    │   ├── model
    │   │   ├── dataloader.py
    │   │   └── model.py
    │   ├── saliency_run.sh
    │   ├── testing
    │   │   ├── testing.py
    │   │   └── test.sh
    │   └── train.py
    ├── multidomain
    │   ├── hiporank_run.sh
    │   ├── readme.txt
    │   ├── model
    │   │   ├── dataloader.py
    │   │   └── model.py
    │   ├── saliency_run.sh
    │   ├── testing
    │   │   ├── testing.py
    │   │   └── test.sh
    │   └── train.py
    └── combined_model
        ├── hiporank_run.sh
        ├── model
        │   ├── dataloader.py
        │   └── model.py
        ├── saliency_run.sh
        ├── testing
        │   ├── testing.py
        │   └── test.sh
        └── train.py
