[Docs] task distribution #5246

Merged
26 changes: 10 additions & 16 deletions argilla/docs/how_to_guides/annotate.md
@@ -72,7 +72,13 @@ If you are starting an annotation effort, all the records are initially kept in
- **Pending**: The records without a response.
- **Draft**: The records with partial responses. They can be submitted or discarded later. You can’t move them back to the pending queue.
- **Discarded**: The records may or may not have responses. They can be edited but you can’t move them back to the pending queue.
- **Submitted**: The records have been fully annotated and have already been submitted.
- **Submitted**: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue.
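
The queue rules in the list above behave like a small state machine. As an illustration only, the sketch below encodes them in plain Python. `ALLOWED_MOVES` and `move` are hypothetical names, not part of the Argilla SDK, and the moves out of `Discarded` are an assumption based on the statement that discarded records can still be edited.

```python
# Hypothetical model of the annotation queues; not part of the Argilla SDK.
ALLOWED_MOVES = {
    "pending": {"draft", "submitted", "discarded"},
    "draft": {"submitted", "discarded"},
    # Assumption: editing a discarded record lets you save it as a draft or submit it.
    "discarded": {"draft", "submitted"},
    "submitted": {"draft", "discarded"},
}


def move(current: str, target: str) -> str:
    """Return the target queue, or raise if the move is not allowed."""
    if target not in ALLOWED_MOVES.get(current, set()):
        raise ValueError(f"cannot move a record from {current!r} to {target!r}")
    return target


# A record can move between queues, but no queue leads back to "pending".
state = move("pending", "draft")
state = move(state, "submitted")
state = move(state, "discarded")
print(state)  # discarded
```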

!!! note
    If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed.

!!! tip
    If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response.

### Suggestions

@@ -115,9 +121,9 @@ The bulk view displays the records in a vertical list. Once this view is active,

### Annotation progress

The global progress of the annotation task from all users is displayed in the dataset list. This is indicated in the `Global progress` column, which shows the number of records still to be annotated, along with a progress bar. The progress bar displays the percentage and number of records submitted, conflicting (i.e., those with both submitted and discarded responses), discarded and pending by hovering your mouse over it.
You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed.

You can track your annotation progress in real time from the righ-bottom panel inside the dataset page. This means that, while you are annotating, the progress bar updates as you submit or discard a record. Expanding the panel, the distribution of `Pending`, `Draft`, `Submitted` and `Discarded` responses is displayed in a donut chart.
You can also track your own progress in real time by expanding the bottom-right panel inside the dataset page. There you can see the number of records for which you have `Pending`, `Draft`, `Submitted` and `Discarded` responses.
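
To make the completion rule concrete, here is a hypothetical sketch (plain Python, not Argilla code) of how completed and remaining counts could be derived from the number of submitted responses per record. It assumes a record counts as completed once it reaches the dataset's minimum number of submitted responses.

```python
# Hypothetical sketch of the progress figures; not Argilla's implementation.
def progress(submitted_per_record: list[int], min_submitted: int) -> dict:
    """Count completed vs. remaining records for a given min_submitted value."""
    total = len(submitted_per_record)
    completed = sum(1 for n in submitted_per_record if n >= min_submitted)
    percent = round(100 * completed / total, 1) if total else 0.0
    return {"completed": completed, "left": total - completed, "percent": percent}


# Five records with 2, 0, 1, 3 and 2 submitted responses; two are required.
summary = progress([2, 0, 1, 3, 2], min_submitted=2)
print(summary)  # {'completed': 3, 'left': 2, 'percent': 60.0}
```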

## Use search, filters, and sort

@@ -173,16 +179,4 @@ You can sort your records according to one or several attributes.

The insertion time and last update are general to all records.

The suggestion scores, response, and suggestion values for rating questions and metadata properties are available only when they were provided.

## Annotate in teams

!!! note
Argilla 2.1 will come with automatic task distribution, which will allow you to distribute the work across several users more efficiently.

### Edit guidelines in the settings

As an `owner` or `admin`, you can edit the guidelines as much as you need from the icon settings on the header. Markdown format is enabled.

!!! tip
If you want further guidance on good practices for guidelines during the project development, check this [blog post](https://argilla.io/blog/annotation-guidelines-practices/).
The suggestion scores, response, and suggestion values for rating questions and metadata properties are available only when they were provided.
15 changes: 15 additions & 0 deletions argilla/docs/how_to_guides/dataset.md
@@ -42,6 +42,7 @@ A **dataset** is a collection of records that you can configure for labelers to
    vectors=[rg.VectorField(name="vector", dimensions=10)],
    guidelines="guidelines",
    allow_extra_metadata=True,
    distribution=rg.TaskDistribution(min_submitted=2),
)
```

@@ -96,6 +97,7 @@ settings = rg.Settings(
    guidelines="Select the sentiment of the prompt.",
    fields=[rg.TextField(name="prompt", use_markdown=True)],
    questions=[rg.LabelQuestion(name="sentiment", labels=["positive", "negative"])],
    distribution=rg.TaskDistribution(min_submitted=3)
)

dataset1 = rg.Dataset(name="sentiment_analysis_1", settings=settings)
@@ -395,6 +397,19 @@ It is good practice to use at least the dataset guidelines if not both methods.
!!! tip
    If you want further guidance on good practices for guidelines during the project development, check our [blog post](https://argilla.io/blog/annotation-guidelines-practices/).

### Distribution

When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the `TaskDistribution` settings to configure the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues.

```python
rg.TaskDistribution(
    min_submitted=2
)
```

> To learn more about how to distribute the task among team members, see the [Distribute the annotation guide](../how_to_guides/distribution.md).


## List datasets

You can list all the datasets available in a workspace using the `datasets` attribute of the `Workspace` class. You can also use `len(workspace.datasets)` to get the number of datasets in a workspace.
76 changes: 76 additions & 0 deletions argilla/docs/how_to_guides/distribution.md
@@ -0,0 +1,76 @@
---
description: In this section, we will provide a step-by-step guide to show how to distribute the annotation task among team members.
---

# Distribute the annotation task among the team

This guide explains how you can use Argilla’s **automatic task distribution** to efficiently divide the task of annotating a dataset among multiple team members.

Owners and admins can define the minimum number of submitted responses expected for each record, depending on whether the dataset should have annotation overlap and how much. Argilla will use this setting to automatically handle the records shown in the pending queues of all users with access to the dataset.

When a record has met the minimum number of submissions, the status of the record will change to `completed` and the record will be removed from the `Pending` queue of all team members, so they can focus on providing responses where they are most needed. The dataset’s annotation task will be fully completed once all records have the `completed` status.

![Task Distribution diagram](../assets/images/how_to_guides/distribution/taskdistribution.svg)

!!! note
    The status of a record can be either `completed`, when it has the required number of responses with `submitted` status, or `pending`, when it doesn’t meet this requirement.

    Each record can have multiple responses, and each of those can have the status `submitted`, `discarded`, or `draft`.
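
The rules above can be simulated in a few lines of plain Python. This is a hypothetical sketch, not Argilla's implementation: `Record`, `record_status`, and `pending_queue` are illustrative names, and it assumes a user's `Pending` queue holds the records that are not yet `completed` and that the user has not responded to.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a record: maps each user to their response status."""
    id: str
    responses: dict[str, str] = field(default_factory=dict)


def record_status(record: Record, min_submitted: int) -> str:
    # Only responses with status "submitted" count towards completion.
    submitted = sum(1 for s in record.responses.values() if s == "submitted")
    return "completed" if submitted >= min_submitted else "pending"


def pending_queue(records: list[Record], user: str, min_submitted: int) -> list[str]:
    # Assumption: a record stays in a user's Pending queue while it is not
    # completed and that user has not responded to it yet.
    return [
        r.id
        for r in records
        if record_status(r, min_submitted) == "pending" and user not in r.responses
    ]


records = [
    Record("r1", {"alice": "submitted", "bob": "submitted"}),
    Record("r2", {"alice": "submitted"}),
    Record("r3"),
    Record("r4", {"bob": "discarded"}),
]
print(pending_queue(records, "carol", min_submitted=2))  # ['r2', 'r3', 'r4']
```

With `min_submitted=2`, `r1` drops out of everyone's queue, while `r4` stays pending because a discarded response does not count as a submission.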

!!! info "Main Class"

    ```python
    rg.TaskDistribution(
        min_submitted=2
    )
    ```
    > Check the [Task Distribution - Python Reference](../reference/argilla/settings/task_distribution.md) to see the attributes, arguments, and methods of the `TaskDistribution` class in detail.

## Configure task distribution settings

By default, Argilla sets the required minimum number of submitted responses to 1. This means that whenever a record has at least 1 response with the status `submitted`, the status of the record will be `completed` and the record will be removed from the `Pending` queue of other team members.

!!! tip
    Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

If you wish to set a different number, you can do so through the `distribution` setting in your dataset settings:

```python
settings = rg.Settings(
guidelines="These are some guidelines.",
fields=[
rg.TextField(
name="text",
),
],
questions=[
rg.LabelQuestion(
name="label",
labels=["label_1", "label_2", "label_3"]
),
],
distribution=rg.TaskDistribution(min_submitted=3)
)
```

> Learn more about configuring dataset settings in the [Dataset management guide](../how_to_guides/dataset.md).

!!! tip
    Increase the minimum number of submissions if you’d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.
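
The trade-off in the tip above can be checked before creating a dataset. The sketch below is hypothetical (the helper names are illustrative, not Argilla APIs) and assumes each team member submits at most one response per record and that the work is spread evenly across the team.

```python
import math


def check_min_submitted(min_submitted: int, team_size: int) -> None:
    """Reject values that could leave records permanently pending."""
    if min_submitted < 1:
        raise ValueError("min_submitted must be at least 1")
    if min_submitted > team_size:
        # With one response per member, a record could never collect enough submissions.
        raise ValueError(
            f"min_submitted={min_submitted} is higher than the team size ({team_size})"
        )


def submissions_per_member(num_records: int, min_submitted: int, team_size: int) -> int:
    """Approximate submissions each member must make, assuming an even split."""
    check_min_submitted(min_submitted, team_size)
    return math.ceil(num_records * min_submitted / team_size)


print(submissions_per_member(1000, 1, team_size=5))  # 200
print(submissions_per_member(1000, 3, team_size=5))  # 600
```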

!!! note
    Some records may have more responses than expected if multiple team members submit responses on the same record simultaneously.
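
A hypothetical timeline shows why this can happen: if two annotators both load a record while it is still pending, both submissions are kept when they arrive. The sketch is plain Python and only models the behaviour described in the note; it does not reflect how Argilla schedules records internally.

```python
# Hypothetical timeline for a dataset with min_submitted=1; not Argilla code.
min_submitted = 1
submitted = []  # annotators who have submitted a response for the record


def is_pending() -> bool:
    return len(submitted) < min_submitted


# Both annotators open the record before anyone has submitted.
alice_sees_record = is_pending()
bob_sees_record = is_pending()

# Each one then submits; the record was already in their view, so both responses land.
if alice_sees_record:
    submitted.append("alice")
if bob_sees_record:
    submitted.append("bob")

print(len(submitted), is_pending())  # 2 False
```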

## Change task distribution settings

If you wish to change the minimum number of submitted responses required in a dataset, you can do so as long as the annotation hasn’t started, i.e., the dataset has no responses for any records.

Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

```python
dataset = client.datasets(...)

dataset.settings.distribution.min_submitted = 4

dataset.update()
```
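
The rule that distribution settings can only change before annotation starts can be expressed as a guard. This is a hypothetical plain-Python sketch: `DistributionSettings` and `change_min_submitted` are illustrative stand-ins, not the `rg.TaskDistribution` class or an SDK method, and the total response count would have to come from your own data.

```python
from dataclasses import dataclass


@dataclass
class DistributionSettings:
    """Hypothetical stand-in for a dataset's distribution settings."""
    min_submitted: int = 1


def change_min_submitted(
    settings: DistributionSettings, new_value: int, total_responses: int
) -> DistributionSettings:
    # Changes are only allowed while no record has any response.
    if total_responses > 0:
        raise RuntimeError("annotation has started; min_submitted can no longer change")
    if new_value < 1:
        raise ValueError("min_submitted must be at least 1")
    settings.min_submitted = new_value
    return settings


settings = change_min_submitted(DistributionSettings(), 4, total_responses=0)
print(settings.min_submitted)  # 4
```
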
16 changes: 16 additions & 0 deletions argilla/docs/how_to_guides/index.md
@@ -59,6 +59,22 @@ These guides provide step-by-step instructions for common scenarios, including d

[:octicons-arrow-right-24: How-to guide](import_export.md)

- __Annotate a dataset__

---

Learn how to use the Argilla UI to navigate datasets and submit responses.

[:octicons-arrow-right-24: How-to guide](annotate.md)

- __Distribute the annotation__

---

Learn how to use Argilla's automatic task distribution to annotate as a team efficiently.

[:octicons-arrow-right-24: How-to guide](distribution.md)

</div>

## Advanced
9 changes: 7 additions & 2 deletions argilla/docs/how_to_guides/query.md
@@ -122,7 +122,7 @@ You can use the `Filter` class to define the conditions and pass them to the `Da

## Filter by status

You can filter records based on their status. The status can be `pending`, `draft`, `submitted`, or `discarded`.
You can filter records based on record or response status. Record status can be `pending` or `completed`, and response status can be `pending`, `draft`, `submitted`, or `discarded`.
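
To illustrate what the combined status filter in the example selects, here is a plain-Python emulation over in-memory records. It is a hypothetical sketch, not the SDK: it assumes that the conditions in a list are combined with a logical AND and that a `response.status` condition matches when any response on the record has that status.

```python
def matches(record: dict, conditions: list[tuple[str, str, str]]) -> bool:
    """Return True if the record satisfies every (field, "==", value) condition."""
    for field_name, op, value in conditions:
        if op != "==":
            raise ValueError(f"only '==' is supported in this sketch, got {op!r}")
        if field_name == "response.status":
            # Assumption: any response with the given status is enough.
            ok = any(r["status"] == value for r in record["responses"])
        else:
            ok = record[field_name] == value
        if not ok:
            return False  # Assumption: conditions are combined with AND.
    return True


records = [
    {"id": 1, "status": "completed", "responses": [{"status": "submitted"}, {"status": "discarded"}]},
    {"id": 2, "status": "completed", "responses": [{"status": "submitted"}]},
    {"id": 3, "status": "pending", "responses": [{"status": "discarded"}]},
]
conditions = [("status", "==", "completed"), ("response.status", "==", "discarded")]
print([r["id"] for r in records if matches(r, conditions)])  # [1]
```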

```python
import argilla as rg
@@ -134,7 +134,12 @@ workspace = client.workspaces("my_workspace")
dataset = client.datasets(name="my_dataset", workspace=workspace)

status_filter = rg.Query(
    filter=rg.Filter(("response.status", "==", "submitted"))
    filter=rg.Filter(
        [
            ("status", "==", "completed"),
            ("response.status", "==", "discarded")
        ]
    )
)

filtered_records = list(dataset.records(status_filter))
1 change: 1 addition & 0 deletions argilla/docs/reference/argilla/SUMMARY.md
@@ -8,6 +8,7 @@
* [Questions](settings/questions.md)
* [Metadata](settings/metadata_property.md)
* [Vectors](settings/vectors.md)
* [Distribution](settings/task_distribution.md)
* [rg.Record](records/records.md)
* [rg.Response](records/responses.md)
* [rg.Suggestion](records/suggestions.md)
2 changes: 1 addition & 1 deletion argilla/docs/reference/argilla/settings/settings.md
@@ -30,7 +30,7 @@ dataset.create()

```

> To define the settings for fields, questions, metadata, or vectors, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), and [`rg.VectorField`](vectors.md) class documentation.
> To define the settings for fields, questions, metadata, vectors, or distribution, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), [`rg.VectorField`](vectors.md), and [`rg.TaskDistribution`](task_distribution.md) class documentation.

---

42 changes: 42 additions & 0 deletions argilla/docs/reference/argilla/settings/task_distribution.md
@@ -0,0 +1,42 @@
---
hide: footer
---
# Distribution

Distribution settings define the criteria Argilla uses to automatically manage records in the dataset, based on the expected number of submitted responses per record.

## Usage Examples

The default minimum submitted responses per record is 1. If you wish to increase this value, you can define it through the `TaskDistribution` class and pass it to the `Settings` class.

```python
settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)

dataset = rg.Dataset(
    name="my_dataset",
    settings=settings
)
```

---

## `rg.TaskDistribution`

::: src.argilla.settings._task_distribution.OverlapTaskDistribution
    options:
        heading_level: 3
        show_root_toc_entry: false
2 changes: 1 addition & 1 deletion argilla/mkdocs.yml
@@ -142,7 +142,7 @@ nav:
- Query and filter records: how_to_guides/query.md
- Importing and exporting datasets: how_to_guides/import_export.md
- Annotate a dataset: how_to_guides/annotate.md
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
- Distribute the annotation task: how_to_guides/distribution.md
- Advanced:
- Use Markdown to format rich content: how_to_guides/use_markdown_to_format_rich_content.md
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
3 changes: 1 addition & 2 deletions argilla/src/argilla/settings/_task_distribution.py
@@ -21,8 +21,7 @@
class OverlapTaskDistribution:
"""The task distribution settings class.

This task distribution defines a number of submitted record required to complete a record.
We could support multiple task distribution strategies in the future
This task distribution defines a number of submitted responses required to complete a record.

Args:
min_submitted (int): The number of min. submitted responses to complete the record