Update phrasing for documentationlayout import export
davidberenstein1957 committed Jul 22, 2024
1 parent af1e488 commit a02accc
Showing 2 changed files with 74 additions and 15 deletions.
87 changes: 73 additions & 14 deletions argilla/docs/how_to_guides/import_export.md
@@ -7,18 +7,73 @@ description: In this section, we will provide a step-by-step guide to show how t
This guide provides an overview of how to import and export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

In Argilla, you can import/export two main components of a dataset:
- The dataset's complete configuration defined in `rg.Settings`. This is useful if your want to share your feedback task or restore it later in Argilla.

- The dataset's complete configuration is defined in `rg.Settings`. This is useful if you want to share your feedback task or restore it later in Argilla.
- The records stored in the dataset, including `Metadata`, `Vectors`, `Suggestions`, and `Responses`. This is useful if you want to use your dataset's records outside of Argilla.

Check the [Dataset - Python Reference](../reference/argilla/datasets/dataset.md) to see the attributes, arguments, and methods of the export `Dataset` class in detail.
!!! info "Main Classes"
=== "`rg.Dataset.to_hub`"

```python
rg.Dataset.to_hub(
repo_id="<my_org>/<my_dataset>",
generate_card=True
)
```

=== "`rg.Dataset.from_hub`"

```python
rg.Dataset.from_hub(
repo_id="<my_org>/<my_dataset>",
workspace="<my_argilla_workspace>",
client=rg.Client(),
with_records=True
)
```
=== "`rg.Dataset.to_disk`"

```python
rg.Dataset.to_disk(
path="<directory>/<file>"
)
```
=== "`rg.Dataset.from_disk`"

```python
rg.Dataset.from_disk(
path="<directory>/<file>",
target_workspace=None,
target_name=None,
)
```
=== "`rg.Dataset.records.to_datasets()`"

```python
rg.Dataset.records.to_datasets()
```
=== "`rg.Dataset.records.to_dict()`"

```python
rg.Dataset.records.to_dict()
```
=== "`rg.Dataset.records.to_list()`"

To import records to a dataset, used the `rg.Datasets.records.log` method. Their is a guide on how to do this in the [Record - Python Reference](record.md).
```python
rg.Dataset.records.to_list()
```

## Import and Export an `rg.Dataset` from Argilla
> Check the [Dataset - Python Reference](../reference/argilla/datasets/dataset.md) to see the attributes, arguments, and methods of the `Dataset` class in detail.
> Check the [Record - Python Reference](../reference/argilla/records/records.md) to see the attributes, arguments, and methods of the `Record` class in detail.

First, we will go through exporting a complete dataset from Argilla. This includes the dataset's setting and records. All of these methods use the `rg.Dataset.from_*` and `rg.Dataset.to_*` methods.

### Push an Argilla dataset to the Hugging Face Hub
## Importing and exporting datasets

First, we will go through exporting a complete dataset from Argilla. This includes the dataset's settings and records. All of these methods use the `rg.Dataset.from_*` and `rg.Dataset.to_*` methods.

### Hugging Face Hub

#### Export to Hub

You can push a dataset from Argilla to the Hugging Face Hub. This is useful if you want to share your dataset with the community or version control it. You can push the dataset to the Hugging Face Hub using the `rg.Dataset.to_hub` method.

@@ -37,8 +92,7 @@ dataset.to_hub(repo_id="<repo_id>")
dataset.to_hub(repo_id="<repo_id>", with_records=False)
```


### Pull an Argilla dataset from the Hugging Face Hub
#### Import from Hub

You can pull a dataset from the Hugging Face Hub to Argilla. This is useful if you want to restore a dataset and its configuration. You can pull the dataset from the Hugging Face Hub using the `rg.Dataset.from_hub` method.

@@ -60,7 +114,7 @@ The `rg.Dataset.from_hub` method loads the configuration and records from the da
dataset = rg.Dataset.from_hub(repo_id="<repo_id>", with_records=False)
```

With the dataset's configuration you could then make changes to the dataset. For example, you could adapt the dataset's settings for a different task:
With the dataset's configuration, you could then make changes to the dataset. For example, you could adapt the dataset's settings for a different task:

```python
dataset.settings.questions = [rg.TextQuestion(name="answer")]
@@ -73,9 +127,9 @@ The `rg.Dataset.from_hub` method loads the configuration and records from the da
dataset.records.log(hf_dataset)
```

### Local Disk


### Saving an Argilla dataset to local disk
#### Export to Disk

You can save a dataset from Argilla to your local disk. This is useful if you want to back up your dataset. You can use the `rg.Dataset.to_disk` method.

@@ -94,7 +148,7 @@ This will save the dataset's configuration and records to the specified path. If
dataset.to_disk(path="path/to/dataset", with_records=False)
```
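Since `to_disk` is described here as a way to back up a dataset, it can help to give every export its own path. The following is a minimal sketch using only the standard library. The helper `backup_path` and its naming scheme are hypothetical and not part of Argilla, and the `to_disk` call is left as a comment because it needs a dataset retrieved through a connected client.

```python
from datetime import datetime, timezone
from pathlib import Path


def backup_path(root: str, dataset_name: str, when: datetime) -> Path:
    """Return a timestamped path such as <root>/<dataset_name>-20240722T101500Z.

    Hypothetical helper: the naming scheme is an illustration, not an Argilla convention.
    """
    # Normalize to UTC so backups taken in different time zones sort consistently.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(root) / f"{dataset_name}-{stamp}"


path = backup_path(
    "backups",
    "my_dataset",
    datetime(2024, 7, 22, 10, 15, 0, tzinfo=timezone.utc),
)
print(path.as_posix())
# dataset.to_disk(path=str(path))  # assumption: `dataset` comes from a connected Argilla client
```

Passing `with_records=False`, as in the example above, would keep such a backup limited to the dataset's configuration.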

### Loading an Argilla dataset from local disk
#### Import from Disk

You can load a dataset from your local disk to Argilla. This is useful if you want to restore a dataset's configuration. You can use the `rg.Dataset.from_disk` method.

@@ -111,12 +165,13 @@ dataset = rg.Dataset.from_disk(path="path/to/dataset")
dataset = rg.Dataset.from_disk(path="path/to/dataset", target_workspace=workspace, target_name="my_dataset")
```

## Export only records from Argilla Datasets
## Exporting and importing records

The records alone can be exported from a dataset in Argilla. This is useful if you want to process the records in Python, export them to a different platform, or use them in model training. All of these methods use the `rg.Dataset.records` attribute.

The records can be exported as a dictionary, a list of dictionaries, or to a `Dataset` of the `datasets` package.
### Export records

The records can be exported as a dictionary, a list of dictionaries, or a `Dataset` of the `datasets` package.

=== "To the `datasets` package"

@@ -177,3 +232,7 @@ The records can be exported as a dictionary, a list of dictionaries, or to a `Da
exported_records = dataset.records.to_list(flatten=True)
# [{"text": "Hello", "label": "greeting"}, {"text": "World", "label": "greeting"}]
```
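Once the records are a plain list of dictionaries, they can be handed to other tools without Argilla. As a rough illustration, the sketch below writes such a list to a JSON Lines file with the standard library. The `exported_records` list is a stand-in that mirrors the flattened output shown above, and `write_jsonl` is a hypothetical helper, not an Argilla function.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the list returned by dataset.records.to_list(flatten=True);
# the values mirror the example output shown above.
exported_records = [
    {"text": "Hello", "label": "greeting"},
    {"text": "World", "label": "greeting"},
]


def write_jsonl(records: list[dict], path: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


# A temporary directory keeps the sketch from touching the working directory.
out_file = Path(tempfile.mkdtemp()) / "records.jsonl"
written = write_jsonl(exported_records, out_file)
lines = out_file.read_text(encoding="utf-8").splitlines()
```

JSON Lines is only one possible target; the same list could be passed to any library that accepts a list of dictionaries.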

### Import records

To import records to a dataset, use the `rg.Dataset.records.log` method. There is a guide on how to do this in [How-to guides - Record](./record.md), or you can check the [Record - Python Reference](../reference/argilla/records/records.md).
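As a hedged sketch of preparing data for the `log` method mentioned above, the snippet below turns CSV text into a list of dictionaries with the standard library. It assumes that `log` accepts a list of dictionaries whose keys match the dataset's field and question names, and the CSV content and column names are invented for illustration. The final call is left as a comment because it needs a dataset retrieved through a connected client.

```python
import csv
import io

# Hypothetical CSV content; the column names are assumed to match
# the field and question names defined in the dataset's settings.
raw_csv = "text,label\nHello,greeting\nWorld,greeting\n"

# DictReader yields one mapping per row, keyed by the header line.
records = [dict(row) for row in csv.DictReader(io.StringIO(raw_csv))]
print(records)

# dataset.records.log(records)  # assumption: `dataset` comes from a connected Argilla client
```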
2 changes: 1 addition & 1 deletion argilla/mkdocs.yml
@@ -140,7 +140,7 @@ nav:
- Manage and create datasets: how_to_guides/dataset.md
- Add, update, and delete records: how_to_guides/record.md
- Query and filter records: how_to_guides/query.md
- Importing and exporting datasets: how_to_guides/import_export.md
- Importing and exporting datasets and records: how_to_guides/import_export.md
- Annotate a dataset: how_to_guides/annotate.md
- Distribute the annotation task: how_to_guides/distribution.md
- Advanced: