Support VLBart with ReFT #119

Open · wants to merge 20 commits into base: main
3 changes: 3 additions & 0 deletions examples/vlbart/README.md
@@ -0,0 +1,3 @@
# ReFT + VLBart Experiment

Try out ReFT with vision-language models! See `ReftDora/image_video_text_understanding/VL-T5/src/Reft_Injection.ipynb` for instructions.
1 change: 1 addition & 0 deletions examples/vlbart/ReftDora/.gitattributes
@@ -0,0 +1 @@
*.json filter=lfs diff=lfs merge=lfs -text
6 changes: 6 additions & 0 deletions examples/vlbart/ReftDora/.gitignore
@@ -0,0 +1,6 @@
instruction_tuning/instruct/*
instruction_tuning/answers/*
instruction_tuning/peft/*
instruction_tuning/COPYRIGHT.txt
instruction_tuning/get_avg_score.py
instruction_tuning/Software Evaluation License.pdf
83 changes: 83 additions & 0 deletions examples/vlbart/ReftDora/LICENSE
@@ -0,0 +1,83 @@
Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

NVIDIA Source Code License for DoRA

=======================================================================

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under
this license, which may include software, documentation, or other files,
and (b) any additions to or derivative works thereof that are made
available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution”
have the meaning as provided under U.S. copyright law; provided, however,
that for the purposes of this license, derivative works shall not include works
that remain separable from, or merely link (or bind by name) to the
interfaces of, the Work.

Works are “made available” under this license by including in or with the Work
either (a) a copyright notice referencing the applicability of
this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each
Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
copyright license to use, reproduce, prepare derivative works of, publicly display,
publicly perform, sublicense and distribute its Work and any resulting derivative
works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under
this license, (b) you include a complete copy of this license with your distribution,
and (c) you retain without modification any copyright, patent, trademark, or
attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use,
reproduction, and distribution of your derivative works of the Work (“Your Terms”) only
if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative
works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or
intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation
and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor
(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that
you allege are infringed by any Work, then your rights under this license from
such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
affiliates’ names, logos, or trademarks, except as necessary to reproduce
the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under
this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY,
WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS
INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY
OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

=======================================================================
168 changes: 168 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/.gitignore
@@ -0,0 +1,168 @@
# Initially taken from GitHub's Python gitignore file

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# tests and logs
tests/fixtures/*
!tests/fixtures/sample_text_no_unicode.txt
logs/
lightning_logs/
lang_code_data/
**/slurm*
**/wandb
**/snap
datasets

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# vscode
.vs
.vscode

# Pycharm
.idea

# TF code
tensorflow_code

# Models
proc_data

# examples
runs
/runs_old
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep

# data
/data
serialization_dir

# emacs
*.*~
debug.env

# vim
.*.swp

#ctags
tags

# pre-commit
.pre-commit*

# .lock
*.lock
21 changes: 21 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 YI-LIN SUNG

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
99 changes: 99 additions & 0 deletions examples/vlbart/ReftDora/image_video_text_understanding/README.md
@@ -0,0 +1,99 @@
# Finetuning VL-BART on image/video-text understanding tasks using DoRA

This directory includes the DoRA implementation and guidelines for reproducing the results in our paper.
We evaluate DoRA in a unified multi-task
setup on both image-text and video-text benchmarks following the settings of VL-Adapter. For the image-text tasks, we use four diverse V&L datasets: VQAv2, GQA, NLVR2, and MSCOCO image captioning. For video-text tasks, we use TVQA, How2QA, TVC, and YC2C.

## Setup
```
# Create python environment
conda create -n vlt5 python=3.8
source activate vlt5

# Install python dependencies
pip install -r requirements.txt

# Download T5/BART backbone checkpoint
python download_backbones.py

# For MSCOCO captioning evaluation (optional)
python -c "import language_evaluation; language_evaluation.download('coco')"
```
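After setup, a quick import check can confirm the environment is usable. This is a sketch, not part of the repo; it assumes `torch` and `transformers` are among the packages pulled in by `requirements.txt`.

```bash
# Sanity check (assumes torch and transformers come from requirements.txt)
python - <<'EOF'
import torch, transformers
print("torch", torch.__version__)
print("transformers", transformers.__version__)
EOF
```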

## Data
```bash
# Store images, features, and annotations
./datasets
    COCO/
        images/
        clip_features/
    VG/
        images/
        clip_features/
    GQA/
        images/
        clip_features/
    nlvr/
        images/
        clip_features/
    vqa/
    lxmert/

    video/
        ann/
        vis_features

# Train VL-T5 with adapters
./VL-T5/
    src/
        multitask.py    <= multitask learning on 7 downstream tasks
        trainer_base.py <= DoRA implementation
```

### Image-text dataset
Please go to [this link](https://drive.google.com/file/d/1O_RU1iFh_sbItZCTkOHUrbVIQQ_89Djj/view?usp=sharing) to download the processed CLIP features. We suggest using [gdrive](https://github.com/prasmussen/gdrive) to download it. Unzip the downloaded file and arrange the folders in the format shown above.

If you would like to use gdrive to download the data, run the following command:

```
gdrive download 1O_RU1iFh_sbItZCTkOHUrbVIQQ_89Djj
```
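Once the archive downloads, the following sketch unzips it and checks that the layout matches the tree above. The archive filename is assumed, not taken from the repo; use whatever name gdrive actually saved.

```bash
# Hypothetical archive name; substitute the file gdrive downloaded.
unzip -q clip_features.zip -d ./datasets
# Verify the expected feature folders exist.
for d in COCO/clip_features VG/clip_features GQA/clip_features nlvr/clip_features; do
  [ -d "./datasets/$d" ] || echo "missing: ./datasets/$d"
done
```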

### Extract your own CLIP features (optional)
Please refer to `feature_extraction` for more details.

### Video-text dataset
Please go to [VALUE](https://github.com/VALUE-Leaderboard/DataRelease) to download the ViT processed data.

## Finetuning and Evaluation
### Finetuning VL-BART on Image-text datasets with DoRA (Evaluation included)
```
bash ./VL-T5/scripts/image/dora.sh 1
```
### Finetuning VL-BART on Video-text datasets with DoRA
```
bash ./VL-T5/scripts/video/dora.sh 1
```
### Evaluation of video-text tasks
Submit the generated test-split predictions to the [VALUE benchmark website](https://value-benchmark.github.io/#:~:text=What%20is%20VALUE%3F,understanding%20both%20video%20and%20subtitles.) for evaluation, strictly following the submission format (including directory layout and file names) specified [here](https://github.com/VALUE-Leaderboard/EvaluationTools).

## DoRA Result

### The multi-task evaluation results on VQA, GQA, NLVR2, and COCO Caption with the VL-BART backbone
| Method | # Params (%) | VQAv2 | GQA | NLVR2 | COCO Cap | Avg |
|-----------------------|---------|--------|--------|-------------|--------------|---------|
| FT | 100 | 66.9 |56.7 | 73.7 |112.0| 77.3|
| LoRA | 5.93 |65.2 |53.6| 71.9| 115.3| 76.5|
| DoRA | 5.96 | 65.8 |54.7 |73.1 |115.9 | **77.4** |


### The multi-task evaluation results on TVQA, How2QA, TVC, and YC2C with the VL-BART backbone.
| Method | # Params (%) | TVQA | How2QA| TVC| YC2C | Avg |
|-----------------------|---------|--------|--------|-------------|--------------|---------|
| FT | 100 | 76.3 | 73.9| 45.7| 154.0 | 87.5|
| LoRA | 5.17 | 75.5 | 72.9 | 44.6 | 140.9 | 83.5|
| DoRA | 5.19 | 76.3 | 74.1 | 45.8 | 145.4 | **85.4** |


## Acknowledgement
We greatly appreciate the contributions of [VL-Adapter](https://github.com/ylsung/VL_adapter), which significantly benefited our work.