Support VLBart with ReFT #119

Open · wants to merge 20 commits into base: main
3 changes: 3 additions & 0 deletions examples/vlbart/README.md
@@ -0,0 +1,3 @@
# ReFT + VLBart Experiment

Try out ReFT with vision-language models! See `ReftDora/image_video_text_understanding/VL-T5/src/Reft_Injection.ipynb` for instructions.
1 change: 1 addition & 0 deletions examples/vlbart/ReftDora/.gitattributes
@@ -0,0 +1 @@
*.json filter=lfs diff=lfs merge=lfs -text
6 changes: 6 additions & 0 deletions examples/vlbart/ReftDora/.gitignore
@@ -0,0 +1,6 @@
instruction_tuning/instruct/*
instruction_tuning/answers/*
instruction_tuning/peft/*
instruction_tuning/COPYRIGHT.txt
instruction_tuning/get_avg_score.py
instruction_tuning/Software Evaluation License.pdf
83 changes: 83 additions & 0 deletions examples/vlbart/ReftDora/LICENSE
@@ -0,0 +1,83 @@
Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

NVIDIA Source Code License for DoRA

=======================================================================

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under
this license, which may include software, documentation, or other files,
and (b) any additions to or derivative works thereof that are made
available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution”
have the meaning as provided under U.S. copyright law; provided, however,
that for the purposes of this license, derivative works shall not include works
that remain separable from, or merely link (or bind by name) to the
interfaces of, the Work.

Works are “made available” under this license by including in or with the Work
either (a) a copyright notice referencing the applicability of
this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each
Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
copyright license to use, reproduce, prepare derivative works of, publicly display,
publicly perform, sublicense and distribute its Work and any resulting derivative
works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under
this license, (b) you include a complete copy of this license with your distribution,
and (c) you retain without modification any copyright, patent, trademark, or
attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use,
reproduction, and distribution of your derivative works of the Work (“Your Terms”) only
if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative
works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or
intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation
and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor
(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that
you allege are infringed by any Work, then your rights under this license from
such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
affiliates’ names, logos, or trademarks, except as necessary to reproduce
the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under
this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY,
WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS
INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY
OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

=======================================================================
168 changes: 168 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/.gitignore
@@ -0,0 +1,168 @@
# Initially taken from GitHub's Python gitignore file

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# tests and logs
tests/fixtures/*
!tests/fixtures/sample_text_no_unicode.txt
logs/
lightning_logs/
lang_code_data/
**/slurm*
**/wandb
**/snap
datasets

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# vscode
.vs
.vscode

# Pycharm
.idea

# TF code
tensorflow_code

# Models
proc_data

# examples
runs
/runs_old
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep

# data
/data
serialization_dir

# emacs
*.*~
debug.env

# vim
.*.swp

#ctags
tags

# pre-commit
.pre-commit*

# .lock
*.lock
21 changes: 21 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 YI-LIN SUNG

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
99 changes: 99 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/README.md
@@ -0,0 +1,99 @@
# Finetuning VL-BART on image/video-text understanding tasks using DoRA

This directory includes the DoRA implementation and guidelines for reproducing the results in our paper.
We evaluate DoRA in a unified multi-task
setup on both image-text and video-text benchmarks following the settings of VL-Adapter. For the image-text tasks, we use four diverse V&L datasets: VQAv2, GQA, NLVR2, and MSCOCO image captioning. For video-text tasks, we use TVQA, How2QA, TVC, and YC2C.

## Setup
```
# Create python environment
conda create -n vlt5 python=3.8
source activate vlt5

# Install python dependencies
pip install -r requirements.txt

# Download T5/BART backbone checkpoint
python download_backbones.py

# For MSCOCO captioning evaluation (optional)
python -c "import language_evaluation; language_evaluation.download('coco')"
```
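After setup, a quick import check can confirm the environment is usable. This is a sketch, not part of the repo; it assumes `torch` and `transformers` are among the packages pulled in by `requirements.txt`.

```bash
# Sanity check (assumes torch and transformers come from requirements.txt)
python - <<'EOF'
import torch, transformers
print("torch", torch.__version__)
print("transformers", transformers.__version__)
EOF
```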

## Data
```bash
# Store images, features, and annotations
./datasets
    COCO/
        images/
        clip_features/
    VG/
        images/
        clip_features/
    GQA/
        images/
        clip_features/
    nlvr/
        images/
        clip_features/
    vqa/
    lxmert/

    video/
        ann/
        vis_features

# Train VL-T5 with adapters
./VL-T5/
    src/
        multitask.py    <= multitask learning on 7 downstream tasks
        trainer_base.py <= DoRA implementation
```

### Image-text dataset
Please go to [this link](https://drive.google.com/file/d/1O_RU1iFh_sbItZCTkOHUrbVIQQ_89Djj/view?usp=sharing) to download the processed CLIP features. We suggest using [gdrive](https://github.com/prasmussen/gdrive) to download it. Unzip the downloaded file and arrange the folders in the format shown above.

If you would like to use gdrive to download the data, run the following command:

```
gdrive download 1O_RU1iFh_sbItZCTkOHUrbVIQQ_89Djj
```
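Once the archive downloads, the following sketch unzips it and checks that the layout matches the tree above. The archive filename is assumed, not taken from the repo; use whatever name gdrive actually saved.

```bash
# Hypothetical archive name; substitute the file gdrive downloaded.
unzip -q clip_features.zip -d ./datasets
# Verify the expected feature folders exist.
for d in COCO/clip_features VG/clip_features GQA/clip_features nlvr/clip_features; do
  [ -d "./datasets/$d" ] || echo "missing: ./datasets/$d"
done
```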

### Extract your own CLIP features (optional)
Please refer to `feature_extraction` for more details.

### Video-text dataset
Please go to [VALUE](https://github.com/VALUE-Leaderboard/DataRelease) to download the ViT processed data.

## Finetuning and Evaluation
### Finetuning VL-BART on Image-text datasets with DoRA (Evaluation included)
```
bash ./VL-T5/scripts/image/dora.sh 1
```
### Finetuning VL-BART on Video-text datasets with DoRA
```
bash ./VL-T5/scripts/video/dora.sh 1
```
### Evaluation of video-text tasks
Submit the generated test-split predictions to the [VALUE benchmark website](https://value-benchmark.github.io/#:~:text=What%20is%20VALUE%3F,understanding%20both%20video%20and%20subtitles.) for evaluation, strictly following the submission format (including directory layout and file names) specified [here](https://github.com/VALUE-Leaderboard/EvaluationTools).

## DoRA Result

### The multi-task evaluation results on VQA, GQA, NLVR2, and COCO Caption with the VL-BART backbone
| Method | # Params (%) | VQAv2 | GQA | NLVR2 | COCO Cap | Avg |
|-----------------------|---------|--------|--------|-------------|--------------|---------|
| FT | 100 | 66.9 |56.7 | 73.7 |112.0| 77.3|
| LoRA | 5.93 |65.2 |53.6| 71.9| 115.3| 76.5|
| DoRA | 5.96 | 65.8 |54.7 |73.1 |115.9 | **77.4** |


### The multi-task evaluation results on TVQA, How2QA, TVC, and YC2C with the VL-BART backbone.
| Method | # Params (%) | TVQA | How2QA| TVC| YC2C | Avg |
|-----------------------|---------|--------|--------|-------------|--------------|---------|
| FT | 100 | 76.3 | 73.9| 45.7| 154.0 | 87.5|
| LoRA | 5.17 | 75.5 | 72.9 | 44.6 | 140.9 | 83.5|
| DoRA | 5.19 | 76.3 | 74.1 | 45.8 | 145.4 | **85.4** |


## Acknowledgement
We greatly appreciate the contributions of [VL-Adapter](https://github.com/ylsung/VL_adapter), which significantly benefited our work.