From 0c86c8fcb84f32db23a4c6916a57eeff201a45cf Mon Sep 17 00:00:00 2001
From: Ferdinand Schlatt
Date: Fri, 15 Nov 2024 16:11:31 +0100
Subject: [PATCH] minor doc updates

---
 docs/howto/dataset.rst | 14 ++++++++------
 docs/model-zoo.rst     |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/howto/dataset.rst b/docs/howto/dataset.rst
index 27f2d1a..f1a328d 100644
--- a/docs/howto/dataset.rst
+++ b/docs/howto/dataset.rst
@@ -1,10 +1,12 @@
 .. _howto-dataset:
 
+.. _ir_datasets: https://ir-datasets.com/
+
 ====================
 Use a Custom Dataset
 ====================
 
-Lightning IR currently supports all datasets registered with the `ir_datasets `_ library. However, it is also possible to use custom datasets with Lightning IR. ``ir_datasets`` supports five different data types:
+Lightning IR currently supports all datasets registered with the `ir_datasets`_ library. However, it is also possible to use custom datasets with Lightning IR. `ir_datasets`_ supports five different data types:
 
 - Documents (a collection of documents)
 - Queries (a collection of queries)
@@ -12,15 +14,15 @@ Lightning IR currently supports all datasets registered with the `ir_datasets `_. Lightning IR provides a :py:class:`~lightning_ir.lightning_utils.callbacks.RegisterLocalDatasetCallback` class to make registering datasets easy. This function takes a dataset id, and optional paths to local files or already valid ``ir_datasets`` dataset ids.
+To integrate a custom dataset it needs to be locally registered with the `ir_datasets`_. Lightning IR provides a :py:class:`~lightning_ir.lightning_utils.callbacks.RegisterLocalDatasetCallback` class to make registering datasets easy. This function takes a dataset id, and optional paths to local files or already valid `ir_datasets`_ dataset ids.
 Let's look at an example. Say we wanted to register a new set of training triples for the MS MARCO passage dataset.
 Our triples file is named ``msmarco-passage-train-triples.tsv`` and has the following format:
diff --git a/docs/model-zoo.rst b/docs/model-zoo.rst
index 59c0c6f..68c782a 100644
--- a/docs/model-zoo.rst
+++ b/docs/model-zoo.rst
@@ -18,6 +18,7 @@ The following command and configuration can be used to reproduce the results:
     trainer:
       logger: false
+      enable_checkpointing: false
     model:
       class_path: CrossEncoderModule # for cross-encoders
       # class_path: BiEncoderModule # for bi-encoders