docs: 5411 docs update migrating to 20 flow 2 (#5430)
# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->
Extended user and workspace merge v2.

Closes #5411 

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
3 people authored Aug 28, 2024
1 parent c03edfb commit 78575e6
Showing 3 changed files with 90 additions and 21 deletions.
2 changes: 1 addition & 1 deletion argilla/docs/how_to_guides/index.md
@@ -94,7 +94,7 @@ These guides provide step-by-step instructions for common scenarios, including d

---

Learn how to migrate your legacy datasets from Argilla 1.x to 2.x.
Learn how to migrate users, workspaces and datasets from Argilla V1 to V2.

[:octicons-arrow-right-24: How-to guide](migrate_from_legacy_datasets.md)

107 changes: 88 additions & 19 deletions argilla/docs/how_to_guides/migrate_from_legacy_datasets.md
@@ -1,21 +1,101 @@
# Migrate your legacy datasets to Argilla V2
# Migrate users, workspaces and datasets to Argilla 2.x

This guide will help you migrate task specific datasets to Argilla V2. These do not include the `FeedbackDataset` which is just an interim naming convention for the latest extensible dataset. Task specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of SDK this migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/).
This guide will help you migrate task-specific datasets to Argilla V2. These do not include the `FeedbackDataset`, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/). Additionally, we will provide guidance on how to maintain your `User`s and `Workspace`s within the new Argilla V2 format.

!!! note
Legacy Datasets include: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
Legacy datasets include: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.

`FeedbackDataset`s do not need to be migrated as they are already in the Argilla V2 format. However, since the 2.x version includes changes to the search index structure, you should reindex the datasets by enabling the Docker environment variable `REINDEX_DATASET` (this step is executed automatically if you're running Argilla in an HF Space). See the [server configuration docs](../reference/argilla-server/configuration.md#docker-images-only) section for more details.

`FeedbackDataset`'s do not need to be migrated as they are already in the Argilla V2 format.

To follow this guide, you will need to have the following prerequisites:

- An Argilla 1.* server instance running with legacy datasets.
- An Argilla >=1.29 server instance running. If you don't have one, you can create one by following this [Argilla guide](../getting_started/quickstart.md).
- The `argilla` SDK package installed in your environment.

!!! warning
This guide will recreate all `User`s and `Workspace`s on a new server. Hence, they will be created with new passwords and IDs. If you want to keep the same passwords and IDs, you can copy the datasets to a temporary V2 instance, then upgrade your current instance to V2.0 and copy the datasets back to your original instance afterwards.

If your current legacy datasets are on a server running an Argilla release later than 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.

## Steps
To follow the migration steps in this guide, you will need to install the new `argilla` package. This includes a new `v1` module that allows you to connect to the Argilla V1 server.

```bash
pip install "argilla>=2.0.0"
```

## Migrate Users and Workspaces

The guide will take you through two steps:

1. **Retrieve the old users and workspaces** from the Argilla V1 server using the new `argilla` package.
2. **Recreate the users and workspaces** on the Argilla V2 server, using `name` as the unique identifier.

### Step 1: Retrieve the old users and workspaces

You can use the `v1` module to connect to the Argilla V1 server.

```python
import argilla.v1 as rg_v1

# Initialize the API with an Argilla server less than 2.0
api_url = "<your-url>"
api_key = "<your-api-key>"
rg_v1.init(api_url, api_key)
```

Next, load the `User`s and `Workspace`s from the Argilla V1 server:

```python
users_v1 = rg_v1.User.list()
workspaces_v1 = rg_v1.Workspace.list()
```

### Step 2: Recreate the users and workspaces

To recreate the users and workspaces on the Argilla V2 server, you can use the `argilla` package.

First, instantiate the `Argilla` class to connect to the Argilla V2 server:

```python
import argilla as rg

client = rg.Argilla()
```

Next, recreate the users and workspaces on the Argilla V2 server:

```python
for workspace in workspaces_v1:
    rg.Workspace(
        name=workspace.name
    ).create()
```
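
If the V2 server already contains some of the workspaces, it may be worth filtering the names before calling `create()`, since creating a workspace whose name is already taken is likely to fail. The following is a minimal sketch under that assumption; `workspaces_to_create` is a hypothetical helper, not part of the `argilla` package, and the names used are stand-in data.

```python
from typing import Iterable, List


def workspaces_to_create(v1_names: Iterable[str], v2_names: Iterable[str]) -> List[str]:
    """Return the V1 workspace names that are not yet present on V2.

    Order is preserved and duplicate names are returned only once.
    """
    existing = set(v2_names)
    pending = []
    for name in v1_names:
        if name not in existing:
            pending.append(name)
            existing.add(name)
    return pending


# Stand-in data: "admin" already exists on the V2 server.
print(workspaces_to_create(["admin", "team-a", "team-b"], ["admin"]))
# → ['team-a', 'team-b']
```

In the migration itself, `v1_names` would come from `[w.name for w in workspaces_v1]`, and `v2_names` from the workspaces already on the V2 server (assuming the client lets you list them).
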

```python
for user in users_v1:
    # Keep the V2 user in its own variable so the V1 `user` is not shadowed
    user_v2 = rg.User(
        username=user.username,
        first_name=user.first_name,
        last_name=user.last_name,
        role=user.role,
        password="<your_chosen_password>" # (1)
    ).create()
    if user.role == "owner":
        continue

    for workspace_name in user.workspaces:
        if workspace_name != user.name:
            workspace = client.workspaces(name=workspace_name)
            user_v2.add_to_workspace(workspace)
```

1. You need to choose a new password for the user. To do this programmatically, you can use the `uuid` package to generate a random password. Take care to keep track of the passwords you choose, since you will not be able to retrieve them later.
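
As a rough illustration of the `uuid` approach, the sketch below generates one random password per username and keeps them in a dictionary so they can be handed over to the users afterwards. `generate_passwords` and the usernames are hypothetical and only serve as an example; how you store or distribute the passwords is up to you.

```python
import uuid
from typing import Dict, Iterable


def generate_passwords(usernames: Iterable[str]) -> Dict[str, str]:
    """Map each username to a fresh random password (32 hexadecimal characters)."""
    return {username: uuid.uuid4().hex for username in usernames}


# Hypothetical usernames; in the migration these would come from `users_v1`.
passwords = generate_passwords(["alice", "bob"])

for username, password in passwords.items():
    # Store these somewhere safe: they cannot be retrieved from the server later.
    print(username, password)
```

The generated value can then be passed as `password=passwords[user.username]` when creating each user.
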

Now you have successfully migrated your users and workspaces to Argilla V2 and can continue with the next steps.
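
One way to sanity-check the result is to compute the workspace memberships you expect from the V1 data and compare them with what you see on the V2 server. The sketch below only covers the first half and runs on stand-in objects; `expected_memberships` is a hypothetical helper, and the attribute names (`username`, `name`, `role`, `workspaces`) simply mirror the ones used in the loop above rather than a verified V1 API.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List


def expected_memberships(users_v1: Iterable) -> Dict[str, List[str]]:
    """Return {username: [workspace names]} following the same rules as the migration loop."""
    memberships = {}
    for user in users_v1:
        # Owners are skipped, as in the migration loop.
        if user.role == "owner":
            continue
        # Skip the workspace that shares the user's own name, as in the migration loop.
        memberships[user.username] = [
            name for name in (user.workspaces or []) if name != user.name
        ]
    return memberships


# Stand-in objects that imitate the V1 users; replace with `users_v1` in practice.
users = [
    SimpleNamespace(username="root", name="root", role="owner", workspaces=["team-a"]),
    SimpleNamespace(username="alice", name="alice", role="annotator", workspaces=["alice", "team-a"]),
]

print(expected_memberships(users))
# → {'alice': ['team-a']}
```
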

## Migrate datasets

The guide will take you through three steps:

@@ -25,12 +105,7 @@ The guide will take you through three steps:

### Step 1: Retrieve the legacy dataset

Connect to the Argilla V1 server via the new `argilla` package. First, you should install an extra dependency:
```bash
pip install "argilla[legacy]"
```

Now, you can use the `v1` module to connect to the Argilla V1 server.
You can use the `v1` module to connect to the Argilla V1 server.

```python
import argilla.v1 as rg_v1
@@ -88,9 +163,7 @@ Next, define the new dataset settings:
```

1. The default field in `DatasetForTextClassification` is `text`, but make sure you provide all fields included in `record.inputs`.

2. Make sure you provide all relevant metadata fields available in the dataset.

3. Make sure you provide all relevant vectors available in the dataset.

=== "For multi-label classification"
@@ -113,9 +186,7 @@ Next, define the new dataset settings:
```

1. The default field in `DatasetForTextClassification` is `text`, but we should provide all fields included in `record.inputs`.

2. Make sure you provide all relevant metadata fields available in the dataset.

3. Make sure you provide all relevant vectors available in the dataset.

=== "For token classification"
@@ -138,7 +209,6 @@ Next, define the new dataset settings:
```

1. Make sure you provide all relevant metadata fields available in the dataset.

2. Make sure you provide all relevant vectors available in the dataset.

=== "For text generation"
@@ -161,21 +231,20 @@ Next, define the new dataset settings:
```

1. We should provide all relevant metadata fields available in the dataset.

2. We should provide all relevant vectors available in the dataset.

Finally, create the new dataset on the Argilla V2 server:

```python
dataset = rg.Dataset(name=dataset_name, settings=settings)
dataset = rg.Dataset(name=dataset_name, workspace=workspace, settings=settings)
dataset.create()
```

!!! note
If a dataset with the same name already exists, the `create` method will raise an exception. You can check if the dataset exists and delete it before creating a new one.

```python
dataset = client.datasets(name=dataset_name)
dataset = client.datasets(name=dataset_name, workspace=workspace)

if dataset is not None:
    dataset.delete()
2 changes: 1 addition & 1 deletion argilla/mkdocs.yml
@@ -171,7 +171,7 @@ nav:
- Import and export datasets: how_to_guides/import_export.md
- Advanced:
- Use Markdown to format rich content: how_to_guides/use_markdown_to_format_rich_content.md
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
- Migrate users, workspaces and datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
- Tutorials:
- tutorials/index.md
- Text classification: tutorials/text_classification.ipynb