Import from hub docs (#5631)
# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

Closes #<issue_number>

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Leire Aguirre <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Damián Pumar <[email protected]>
Co-authored-by: Francisco Aranda <[email protected]>
Co-authored-by: burtenshaw <[email protected]>
9 people authored Oct 30, 2024
1 parent 2247283 commit e8d1d22
Showing 2 changed files with 52 additions and 53 deletions.
105 changes: 52 additions & 53 deletions argilla/docs/getting_started/quickstart.md
@@ -64,7 +64,7 @@ Argilla is a free, open-source, self-hosted tool. This means you need to deploy

If you want to **run Argilla locally on your machine or a server**, or tune the server configuration, choose this option. To use this option, [check this guide](how-to-deploy-argilla-with-docker.md).

## Sign in into the Argilla UI
## Sign in to the Argilla UI

If everything went well, you should see the Argilla sign in page that looks like this:

@@ -82,21 +82,47 @@ In the sign in page:
!!! info "Unauthorized error"
Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button again will solve this issue.

Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.
Congrats! Your Argilla server is ready to start your first project.

## Install the Python SDK
## Create your first dataset

To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:
The quickest way to start exploring the tool and create your first dataset is by importing an existing one from the Hugging Face Hub.

```console
pip install argilla
```
To do this, log in to the Argilla UI and, on the Home page, click "Import from Hub". You can choose one of the sample datasets or paste a repo id in the input. A repo id looks something like `stanfordnlp/imdb`.

## Create your first dataset
Argilla will automatically interpret the columns in the dataset to map them to Fields and Questions.

**Fields** include the data that you want feedback on, like text, chats, or images. If you want to exclude any of the Fields that Argilla identified for you, simply select the "No mapping" option.

**Questions** are the feedback you want to collect, like labels, ratings, rankings, or text. If Argilla identified questions in your dataset that you don't want, you can eliminate them. You can also add questions of your own.

![Screenshot of the dataset configuration page](../assets/images/getting_started/dataset_configurator.png)

Note that, after the dataset has been created, you can still modify some elements of its configuration from the Dataset Settings page, e.g., the titles of fields and questions. Check all the settings you can modify in the [Update a dataset](../how_to_guides/dataset.md#update-a-dataset) section.

When you're happy with the result, you'll need to give a name to your dataset, select a workspace and choose a split, if applicable. Then, Argilla will start importing the dataset in the background. Now you're all set up to start annotating!

!!! info "Importing long datasets"
Argilla will only import the first 10k rows of a dataset. If your dataset is larger, you can import the rest of the records at any point using the Python SDK.

To do that, open your dataset and copy the code snippet provided under "Import data". Now, open a Jupyter or Google Colab notebook and install argilla:

```python
!pip install argilla
```
Then, paste and run your code snippet. This will import the remaining records to your dataset.
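
The snippet copied from the UI remains the recommended route. Purely as an illustration of the idea, the sketch below shows one way to skip the rows the UI import already covered and group the rest into batches. The helper name `remaining_batches`, the batch size, and the assumption that the first 10,000 rows are exactly the ones already imported are hypothetical and not taken from Argilla.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Number of rows the UI import is stated to cover (from the note above).
UI_IMPORT_LIMIT = 10_000


def remaining_batches(
    rows: Iterable[dict],
    already_imported: int = UI_IMPORT_LIMIT,
    batch_size: int = 1_000,
) -> Iterator[List[dict]]:
    """Yield lists of rows that come after the first `already_imported` rows.

    Hypothetical helper: it only slices and groups rows, it does not talk
    to an Argilla server.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Skip the rows assumed to be in the dataset already.
    leftover = islice(rows, already_imported, None)
    while True:
        batch = list(islice(leftover, batch_size))
        if not batch:
            return
        yield batch
```

Each batch could then be passed to the logging call from your copied snippet, for example `dataset.records.log(records=batch, mapping=...)`, with the mapping that matches your dataset.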

## Install and connect the Python SDK

For getting started with Argilla and its SDK, we recommend to use Jupyter Notebook or Google Colab.
To get started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab. You will need the SDK to manage users, workspaces and datasets in Argilla.

To start interacting with your Argilla server, you need to create a instantiate a client with an API key and API URL:
In your notebook, you can install the Argilla SDK with pip as follows:

```python
!pip install argilla
```

To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

- The `<api_key>` is on the `My Settings` page of your Argilla Space, but make sure you are logged in with the `owner` account you used to create the Space.

@@ -112,65 +138,38 @@ client = rg.Argilla(
```

!!! info "You can't find your API URL"
If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options:
1. Click on the three points menu at the top of the Space, select "Embed this Space", and open the direct URL.
2. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, you have several options:

1. In the Home page of Argilla, click on "Import from the SDK". You will find your API URL and key in the code snippet provided.
2. Click on the three-dot menu at the top of the Space, select "Embed this Space", and open the direct URL.
3. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
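
As a minimal sketch of the URL pattern, the helper below simply fills in the two placeholders. The function name `space_api_url` is hypothetical, and the sketch assumes the owner and Space names can be used verbatim. If your names contain unusual characters, confirm the result against the direct URL shown under "Embed this Space".

```python
def space_api_url(owner: str, space: str) -> str:
    """Build a direct Space URL following the pattern quoted above.

    Hypothetical helper: it performs plain string substitution and does
    not validate that the Space exists.
    """
    owner, space = owner.strip(), space.strip()
    if not owner or not space:
        raise ValueError("both the owner name and the Space name are required")
    return f"https://{owner}-{space}.hf.space"
```

The returned string can be used as the API URL when instantiating the client.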

To create a dataset with a simple text classification task, first, you need to **define the dataset settings**.
To check that everything is running correctly, you can call `me`. This should return your user information:

```python
settings = rg.Settings(
guidelines="Classify the reviews as positive or negative.",
fields=[
rg.TextField(
name="review",
title="Text from the review",
use_markdown=False,
),
],
questions=[
rg.LabelQuestion(
name="my_label",
title="In which category does this article fit?",
labels=["positive", "negative"],
)
],
)
client.me
```

Now you can **create the dataset with these settings**. Publish the dataset to make it available in the UI and add the records.
From here, you can manage all of your assets in Argilla, including updating the dataset we created earlier and adding advanced information, such as vectors, metadata or suggestions. To learn how to do this, check our [how-to guides](../how_to_guides/index.md).

!!! info "About workspaces"
Workspaces in Argilla group datasets and user access rights. The `workspace` parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace `argilla`.
## Export your dataset to the Hub

By default, **this workspace will be visible to users joining with the Sign in with Hugging Face button**. You can create other workspaces and decide to grant access to users either with the SDK or the [changing the OAuth configuration](how-to-configure-argilla-on-huggingface.md).
Once you've spent some time annotating your dataset in Argilla, you can upload it back to the Hugging Face Hub to share it with others or keep it under version control.

```python
dataset = rg.Dataset(
name=f"my_first_dataset",
settings=settings,
client=client,
#workspace="argilla"
)
dataset.create()
```

Now you can **add records to your dataset**. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The `mapping` parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.
To do that, first follow the steps in the previous section to connect to your Argilla server using the SDK. Then, you can load your dataset and export it to the Hub like this:

```python
from datasets import load_dataset

data = load_dataset("imdb", split="train[:100]").to_list()
dataset = client.datasets(name="my_dataset")

dataset.records.log(records=data, mapping={"text": "review"})
dataset.to_hub(repo_id="<my_org>/<my_dataset>")
```

🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.
For more info on exporting datasets to the Hub, read our guide on [exporting datasets](../how_to_guides/import_export.md#export-to-hub).

## Next steps

- To learn how to create your datasets, workspace, and manage users, check the [how-to guides](../how_to_guides/index.md).
- To learn how to create your own datasets and workspaces, and how to manage users, check the [how-to guides](../how_to_guides/index.md).

- To learn Argilla with hands-on examples, check the [Tutorials section](../tutorials/index.md).

- To further configure your Argilla Space, check the [Hugging Face Spaces settings guide](how-to-configure-argilla-on-huggingface.md).
- To further configure your Argilla Space, check the [Hugging Face Spaces settings guide](how-to-configure-argilla-on-huggingface.md).
