From e3213efd971af131c1e5938c74d2885bcf928825 Mon Sep 17 00:00:00 2001
From: davidberenstein1957
Date: Mon, 7 Oct 2024 13:51:18 +0000
Subject: [PATCH] Deployed 446a18d to docs_update-advanced-custom-field-example
 with MkDocs 1.6.0 and mike 2.1.3

---
 .../how_to_guides/custom_fields/index.html | 59 +++++++++++--------
 .../search/search_index.json               |  2 +-
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/docs_update-advanced-custom-field-example/how_to_guides/custom_fields/index.html b/docs_update-advanced-custom-field-example/how_to_guides/custom_fields/index.html
index 08250106c9..2ffd49d045 100644
--- a/docs_update-advanced-custom-field-example/how_to_guides/custom_fields/index.html
+++ b/docs_update-advanced-custom-field-example/how_to_guides/custom_fields/index.html
@@ -2586,34 +2586,43 @@

Advanced Mode

Usage example

Let's reproduce the example from the Without advanced mode section, but this time we will wire the Handlebars templating engine into the template ourselves.

template = """
-<div id="custom-field-container"></div>
+<div id="content"></div>
 <script id="template" type="text/x-handlebars-template">
-    <div id="container">
-        <div class="column">
-            <h3>Original</h3>
-            <img src="{{record.fields.image.original}}" />
-        </div>
-        <div class="column">
-            <h3>Revision</h3>
-            <img src="{{record.fields.image.revision}}" />
-        </div>
-    </div>
-</script>
-""" # (1)
-
-script = """
-<script src="https://cdn.jsdelivr.net/npm/handlebars@latest/dist/handlebars.js"></script>
-<script>
-    const template = document.getElementById("template").innerHTML;
-    const compiledTemplate = Handlebars.compile(template);
-    const html = compiledTemplate({ record });
-    document.getElementById("custom-field-container").innerHTML = html;
-</script>
-""" # (2)
+    <style>
+    #container {
+        display: flex;
+        gap: 10px;
+    }
+    .column {
+        flex: 1;
+    }
+    </style>
+    <div id="container">
+        <div class="column">
+            <h3>Original</h3>
+            <img src="{{record.fields.image.original}}" />
+        </div>
+        <div class="column">
+            <h3>Revision</h3>
+            <img src="{{record.fields.image.revision}}" />
+        </div>
+    </div>
+</script>
+""" # (1)
+
+script = """
+<script src="https://cdn.jsdelivr.net/npm/handlebars@latest/dist/handlebars.js"></script>
+<script>
+    const template = document.getElementById("template").innerHTML;
+    const compiledTemplate = Handlebars.compile(template);
+    const html = compiledTemplate({ record });
+    document.getElementById("content").innerHTML = html;
+</script>
+""" # (2)
 
-  1. This is the Handlebars template script. We set its id to template so we can use it later in our JavaScript code, and its type to text/x-handlebars-template to indicate that this is a Handlebars template. Note that we also added a div with id custom-field-container to render the template into.
-  2. This is the JavaScript rendering script. We load the Handlebars library, then use it to compile the template and render the record. Finally, we render the result into the div with id custom-field-container.
+  1. This is the Handlebars template script. We set its id to template so we can use it later in our JavaScript code, and its type to text/x-handlebars-template to indicate that this is a Handlebars template. Note that we also added a div with id content to render the template into.
+  2. This is the JavaScript rendering script. We load the Handlebars library, then use it to compile the template and render the record. Finally, we render the result into the div with id content.

We can now pass these templates to the CustomField class, ensuring that advanced_mode is set to True.

import argilla as rg
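The hunk ends at the import, so the rest of the snippet is cut off in this diff. One plausible continuation, as a rough sketch (the credentials, dataset name, and question are illustrative, and it assumes CustomField accepts a single template string, so the two snippets above are concatenated):

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings(
    fields=[
        rg.CustomField(
            name="image",
            # The Handlebars template plus the script that compiles and renders it.
            template=template + script,
            advanced_mode=True,  # we supply our own rendering logic
        ),
    ],
    questions=[rg.TextQuestion(name="comment")],  # illustrative question
)

dataset = rg.Dataset(name="custom_field_advanced", settings=settings, client=client)
dataset.create()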
diff --git a/docs_update-advanced-custom-field-example/search/search_index.json b/docs_update-advanced-custom-field-example/search/search_index.json
index e30224a0a2..c16e75092e 100644
--- a/docs_update-advanced-custom-field-example/search/search_index.json
+++ b/docs_update-advanced-custom-field-example/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to Argilla","text":"

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.

To get started:

  • Get started in 5 minutes!

    Deploy Argilla for free on the Hugging Face Hub or with Docker. Install the Python SDK with pip and create your first project.

    Quickstart

  • How-to guides

    Get familiar with the basic workflows of Argilla. Learn how to manage Users, Workspaces, Datasets, and Records to set up your data annotation projects.

    Learn more

Or play with the Argilla UI by signing in with your Hugging Face account.

Looking for Argilla 1.x?

Looking for documentation for Argilla 1.x? Visit the latest release.

Migrate to Argilla 2.x

Want to learn how to migrate from Argilla 1.x to 2.x? Take a look at our dedicated Migration Guide.

"},{"location":"#why-use-argilla","title":"Why use Argilla?","text":"

Argilla can be used for collecting human feedback for a wide variety of AI projects like traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text to image, etc.).

Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

Improve your AI output quality through data quality

Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both of these problems at once. Argilla helps you to achieve and keep high-quality standards for your data. This means you can improve the quality of your AI outputs.

Take control of your data and models

Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

Improve efficiency by quickly iterating on the right data and models

Gathering data is a time-consuming process. Argilla helps by providing a tool that allows you to interact with your data in a more engaging way. This means you can quickly and easily label your data with filters, AI feedback suggestions and semantic search. So you can focus on training your models and monitoring their performance.

"},{"location":"#what-do-people-build-with-argilla","title":"What do people build with Argilla?","text":"

Datasets and models

Argilla is a tool that can be used to achieve and keep high-quality data standards with a focus on NLP and LLMs. The community uses Argilla to create amazing open-source datasets and models, and we love contributions to open-source too.

  • cleaned UltraFeedback dataset and the Notus and Notux models, where we improved benchmark and empirical human judgment for the Mistral and Mixtral models with cleaner data using human feedback.
  • distilabeled Intel Orca DPO dataset and the improved OpenHermes model, which show how we improved model performance by filtering out 50% of the original dataset through human and AI feedback.

Projects and pipelines

AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in the AI community meetup.

  • AI for good: the Red Cross presentation showcases how their experts and AI team collaborate by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
  • Customer support: during the Loris meetup they showed how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a large number of multi-label classifiers.
  • Research studies: the showcase from Prolific announced their integration with Argilla. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.
"},{"location":"community/","title":"Community","text":"

We are an open-source community-driven project not only focused on building a great product but also on building a great community, where you can get support, share your experiences, and contribute to the project! We would love to hear from you and help you get started with Argilla.

  • Discord

    In our Discord channels (#argilla-distilabel-general and #argilla-distilabel-help), you can get direct support from the community.

    Discord ↗

  • Community Meetup

    We host bi-weekly community meetups where you can listen in or present your work.

    Community Meetup ↗

  • Changelog

    The changelog is where you can find the latest updates and changes to the Argilla project.

    Changelog ↗

  • Roadmap

    We love to discuss our plans with the community. Feel encouraged to participate in our roadmap discussions.

    Roadmap ↗

"},{"location":"community/changelog/","title":"Changelog","text":"

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

"},{"location":"community/changelog/#unreleased","title":"Unreleased","text":""},{"location":"community/changelog/#230","title":"2.3.0","text":""},{"location":"community/changelog/#added","title":"Added","text":"
  • Added support for CustomField. (#5422)
  • Added inserted_at and updated_at to Resource model as properties. (#5540)
  • Added limit argument when fetching records (see the sketch after this list). (#5525)
  • Added similarity search support. (#5546)
  • Added filter support for id, _server_id, inserted_at and updated_at record attributes. (#5545)
  • Added support to read Argilla credentials from Colab secrets. (#5541)
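Taken together, these 2.3.0 additions change how records are fetched with the Python SDK. A minimal sketch, assuming the 2.3 SDK surface (rg.Argilla, rg.Query, rg.Filter); the dataset name and date value are illustrative:

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")
dataset = client.datasets(name="my_dataset")  # illustrative dataset name

# Fetch at most 100 records (#5525), filtered on the updated_at attribute (#5545).
records = dataset.records(
    query=rg.Query(filter=rg.Filter([("updated_at", ">=", "2024-10-01")])),
    limit=100,
)
for record in records:
    print(record.id)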
"},{"location":"community/changelog/#changed","title":"Changed","text":"
  • Changed the repr method for SettingsProperties to display the details of all the properties in the Settings object. (#5380)
  • Changed error messages when creating datasets with insufficient permissions. (#5540)
"},{"location":"community/changelog/#fixed","title":"Fixed","text":"
  • Fixed serialization of ChatField when collecting records from the hub and exporting to datasets. (#5554)
"},{"location":"community/changelog/#222","title":"2.2.2","text":""},{"location":"community/changelog/#fixed_1","title":"Fixed","text":"
  • Fixed from_hub with unsupported column names. (#5524)
  • Fixed from_hub with missing dataset subset configuration value. (#5524)
"},{"location":"community/changelog/#changed_1","title":"Changed","text":"
  • Changed from_hub to generate only fields, not questions, for string columns in the dataset. (#5524)
"},{"location":"community/changelog/#221","title":"2.2.1","text":""},{"location":"community/changelog/#fixed_2","title":"Fixed","text":"
  • Fixed from_hub errors when column names contain uppercase letters. (#5523)
  • Fixed from_hub errors when class feature values contain unlabelled values. (#5523)
  • Fixed from_hub errors when loading cached datasets. (#5523)
"},{"location":"community/changelog/#220","title":"2.2.0","text":"
  • Added new ChatField supporting chat messages. (#5376)
  • Added template settings to rg.Settings for classification, rating, and ranking questions. (#5426)
  • Added rg.Settings definition based on datasets.Features within rg.Dataset.from_hub. (#5426)
  • Added persistent record mapping to rg.Settings to be used in rg.Dataset.records.log. (#5466)
  • Added multiple error handling methods to the rg.Dataset.records.log method to warn, ignore, or raise errors. (#5466)
  • Changed dataset import and export of rg.LabelQuestion to use datasets.ClassLabel not datasets.Value. (#5474)
"},{"location":"community/changelog/#210","title":"2.1.0","text":""},{"location":"community/changelog/#added_1","title":"Added","text":"
  • Added new ImageField supporting URLs and Data URLs (see the sketch after this list). (#5279)
  • Added dark mode. (#5412)
  • Added settings parameter to rg.Dataset.from_hub to define the dataset settings before ingesting a dataset from the hub. (#5418)
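As a rough illustration of the 2.1.0 additions (not part of this patch; the repo id, labels, and question are placeholders), the new ImageField can be combined with the settings parameter of from_hub:

import argilla as rg

# Settings using the new ImageField.
settings = rg.Settings(
    fields=[rg.ImageField(name="image")],
    questions=[rg.LabelQuestion(name="label", labels=["ok", "not_ok"])],
)

# Define the settings before ingesting a dataset from the hub (#5418).
dataset = rg.Dataset.from_hub("<repo_id>", settings=settings)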
"},{"location":"community/changelog/#201","title":"2.0.1","text":""},{"location":"community/changelog/#fixed_3","title":"Fixed","text":"
  • Fixed error when creating optional fields. (#5362)
  • Fixed error creating integer and float metadata with visible_for_annotators. (#5364)
  • Fixed error when logging records with suggestions or responses for non-existent questions. (#5396 by @maxserras)
  • Fixed error from conflicts in testing suite when running tests in parallel. (#5349)
  • Fixed error in response model when creating a response with a None value. (#5343)
"},{"location":"community/changelog/#changed_2","title":"Changed","text":"
  • Changed from_hub method to raise an error when a dataset with the same name exists. (#5258)
  • Changed log method when ingesting records with no known keys to raise a descriptive error. (#5356)
  • Changed code snippets to add new datasets. (#5395)
"},{"location":"community/changelog/#added_2","title":"Added","text":"
  • Added Google Analytics to the documentation site. (#5366)
  • Added frontend skeletons to progress metrics to optimise load time and improve user experience. (#5391)
  • Added documentation in methods in API references for the Python SDK. (#5400)
"},{"location":"community/changelog/#fixed_4","title":"Fixed","text":"
  • Fix bug when submit the latest record, sometimes you navigate to non existing page #5419
"},{"location":"community/changelog/#200","title":"2.0.0","text":""},{"location":"community/changelog/#added_3","title":"Added","text":"
  • Added core class refactors. For an overview, see this blog post
  • Added TaskDistribution to define distribution of records to users .
  • Added new documentation site and structure and migrated legacy documentation.
"},{"location":"community/changelog/#changed_3","title":"Changed","text":"
  • Changed FeedbackDataset to Dataset.
  • Changed rg.init into rg.Argilla class to interact with Argilla server.
"},{"location":"community/changelog/#deprecated","title":"Deprecated","text":"
  • Deprecated task specific dataset classes like TextClassification and TokenClassification. To migrate legacy datasets to rg.Dataset class, see the how-to-guide.
  • Deprecated use case extensions like listeners and ArgillaTrainer.
"},{"location":"community/changelog/#200rc1","title":"2.0.0rc1","text":"

[!NOTE] This release for 2.0.0rc1 does not contain any changelog entries because it is the first release candidate for the 2.0.0 version. The following versions will contain the changelog entries again. For a general overview of the changes in the 2.0.0 version, please refer to our blog or our new documentation.

"},{"location":"community/changelog/#1290","title":"1.29.0","text":""},{"location":"community/changelog/#added_4","title":"Added","text":"
  • Added support for rating questions to include 0 as a valid value. (#4860)
  • Added support for Python 3.12. (#4837)
  • Added search by field in the FeedbackDataset UI search. (#4746)
  • Added record metadata info in the FeedbackDataset UI. (#4851)
  • Added highlight on search results in the FeedbackDataset UI. (#4747)
"},{"location":"community/changelog/#fixed_5","title":"Fixed","text":"
  • Fix wildcard import for the whole argilla module. (#4874)
  • Fix issue when record does not have vectors related. (#4856)
  • Fix issue on character level. (#4836)
"},{"location":"community/changelog/#1280","title":"1.28.0","text":""},{"location":"community/changelog/#added_5","title":"Added","text":"
  • Added suggestion multi score attribute. (#4730)
  • Added order by suggestion first. (#4731)
  • Added multi selection entity dropdown for span annotation overlap. (#4735)
  • Added pre selection highlight for span annotation. (#4726)
  • Added banner when persistent storage is not enabled. (#4744)
  • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)
"},{"location":"community/changelog/#changed_4","title":"Changed","text":"
  • Changed the way how Hugging Face space and user is showed in sign in. (#4748)
"},{"location":"community/changelog/#fixed_6","title":"Fixed","text":"
  • Fixed Korean character reversed. (#4753)
"},{"location":"community/changelog/#fixed_7","title":"Fixed","text":"
  • Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)
"},{"location":"community/changelog/#1270","title":"1.27.0","text":""},{"location":"community/changelog/#added_6","title":"Added","text":"
  • Added Allow overlap spans in the FeedbackDataset. (#4668)
  • Added allow_overlapping parameter for span questions. (#4697)
  • Added overall progress bar on Datasets table. (#4696)
  • Added German language translation. (#4688)
"},{"location":"community/changelog/#changed_5","title":"Changed","text":"
  • New UI design for suggestions. (#4682)
"},{"location":"community/changelog/#fixed_8","title":"Fixed","text":"
  • Improve performance for more than 250 labels. (#4702)
"},{"location":"community/changelog/#1261","title":"1.26.1","text":""},{"location":"community/changelog/#added_7","title":"Added","text":"
  • Added support for automatic detection of RTL languages. (#4686)
"},{"location":"community/changelog/#1260","title":"1.26.0","text":""},{"location":"community/changelog/#added_8","title":"Added","text":"
  • If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
  • Added support for span questions in the Python SDK. (#4617)
  • Added support for span values in suggestions and responses. (#4623)
  • Added span questions for FeedbackDataset. (#4622)
  • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
"},{"location":"community/changelog/#fixed_9","title":"Fixed","text":"
  • Fixed contextualized workspaces. (#4665)
  • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
  • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
  • Fixed reading description from API response payload. (#4632)
  • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
  • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)
"},{"location":"community/changelog/#1250","title":"1.25.0","text":"

[!NOTE] For changes in the argilla-server module, visit the argilla-server release notes

"},{"location":"community/changelog/#added_9","title":"Added","text":"
  • Reorder labels in dataset settings page for single/multi label questions (#4598)
  • Added pandas v2 support using the python SDK. (#4600)
"},{"location":"community/changelog/#removed","title":"Removed","text":"
  • Removed missing response for status filter. Use pending instead. (#4533)
"},{"location":"community/changelog/#fixed_10","title":"Fixed","text":"
  • Fixed FloatMetadataProperty: value is not a valid float (#4570)
  • Fixed redirect to user-settings instead of 404 user_settings (#4609)
"},{"location":"community/changelog/#1240","title":"1.24.0","text":"

[!NOTE] This release does not contain any new features, but it includes a major change in the argilla-server dependency. The package is using the argilla-server dependency defined here. (#4537)

"},{"location":"community/changelog/#changed_6","title":"Changed","text":"
  • The package is using the argilla-server dependency defined here. (#4537)
"},{"location":"community/changelog/#1231","title":"1.23.1","text":""},{"location":"community/changelog/#fixed_11","title":"Fixed","text":"
  • Fixed Responsive view for Feedback Datasets. (#4579)
"},{"location":"community/changelog/#1230","title":"1.23.0","text":""},{"location":"community/changelog/#added_10","title":"Added","text":"
  • Added bulk annotation by filter criteria. (#4516)
  • Automatically fetch new datasets on focus tab. (#4514)
  • API v1 responses returning Record schema now always include dataset_id as attribute. (#4482)
  • API v1 responses returning Response schema now always include record_id as attribute. (#4482)
  • API v1 responses returning Question schema now always include dataset_id attribute. (#4487)
  • API v1 responses returning Field schema now always include dataset_id attribute. (#4488)
  • API v1 responses returning MetadataProperty schema now always include dataset_id attribute. (#4489)
  • API v1 responses returning VectorSettings schema now always include dataset_id attribute. (#4490)
  • Added pdf_to_html function to the .html_utils module that converts PDFs to data URLs so they can be rendered in the Argilla UI. (#4481)
  • Added ARGILLA_AUTH_SECRET_KEY environment variable. (#4539)
  • Added ARGILLA_AUTH_ALGORITHM environment variable. (#4539)
  • Added ARGILLA_AUTH_TOKEN_EXPIRATION environment variable. (#4539)
  • Added ARGILLA_AUTH_OAUTH_CFG environment variable. (#4546)
  • Added OAuth2 support for HuggingFace Hub. (#4546)
"},{"location":"community/changelog/#deprecated_1","title":"Deprecated","text":"
  • Deprecated ARGILLA_LOCAL_AUTH_* environment variables. Will be removed in the release v1.25.0. (#4539)
"},{"location":"community/changelog/#changed_7","title":"Changed","text":"
  • Changed regex pattern for username attribute in UserCreate. Now uppercase letters are allowed. (#4544)
"},{"location":"community/changelog/#removed_1","title":"Removed","text":"
  • Remove sending Authorization header from python SDK requests. (#4535)
"},{"location":"community/changelog/#fixed_12","title":"Fixed","text":"
  • Fixed keyboard shortcut for label questions. (#4530)
"},{"location":"community/changelog/#1220","title":"1.22.0","text":""},{"location":"community/changelog/#added_11","title":"Added","text":"
  • Added Bulk annotation support. (#4333)
  • Restore filters from feedback dataset settings. ([#4461])(https://github.com/argilla-io/argilla/pull/4461)
  • Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
  • Added pydantic v2 support using the python SDK. (#4459)
  • Added vector_settings to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4454)
  • Added integration for sentence-transformers using SentenceTransformersExtractor to configure vector_settings in FeedbackDataset and FeedbackRecord. (#4454)
"},{"location":"community/changelog/#changed_8","title":"Changed","text":"
  • Module argilla.cli.server definitions have been moved to argilla.server.cli module. (#4472)
  • [breaking] Changed vector_settings_by_name for generic property_by_name usage, which will return None instead of raising an error. (#4454)
  • The constant definition ES_INDEX_REGEX_PATTERN in module argilla._constants is now private. (#4472)
  • nan values in metadata properties will raise a 422 error when creating/updating records. (#4300)
  • None values are now allowed in metadata properties. (#4300)
  • Refactor and add width, height, autoplay and loop attributes as optional args in to_html functions. (#4481)
"},{"location":"community/changelog/#fixed_13","title":"Fixed","text":"
  • Paginating to a new record, automatically scrolls down to selected form area. (#4333)
"},{"location":"community/changelog/#deprecated_2","title":"Deprecated","text":"
  • The missing response status for filtering records is deprecated and will be removed in the release v1.24.0. Use pending instead. (#4433)
"},{"location":"community/changelog/#removed_2","title":"Removed","text":"
  • The deprecated python -m argilla database command has been removed. (#4472)
"},{"location":"community/changelog/#1210","title":"1.21.0","text":""},{"location":"community/changelog/#added_12","title":"Added","text":"
  • Added new draft queue for annotation view (#4334)
  • Added annotation metrics module for the FeedbackDataset (argilla.client.feedback.metrics). (#4175).
  • Added strategy to handle and translate errors from the server for 401 HTTP status code` (#4362)
  • Added integration for textdescriptives using TextDescriptivesExtractor to configure metadata_properties in FeedbackDataset and FeedbackRecord. (#4400). Contributed by @m-newhauser
  • Added POST /api/v1/me/responses/bulk endpoint to create responses in bulk for current user. (#4380)
  • Added list support for term metadata properties. (Closes #4359)
  • Added new CLI task to reindex datasets and records into the search engine. (#4404)
  • Added httpx_extra_kwargs argument to rg.init and Argilla to allow passing extra arguments to httpx.Client used by Argilla. (#4440)
  • Added ResponseStatusFilter enum in __init__ imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
"},{"location":"community/changelog/#changed_9","title":"Changed","text":"
  • More productive and simpler shortcut system (#4215)
  • Moved ArgillaSingleton, init and active_client to a new module singleton. (#4347)
  • Updated argilla.load functions to also work with FeedbackDatasets. (#4347)
  • [breaking] Updated argilla.delete functions to also work with FeedbackDatasets. It now raises an error if the dataset does not exist. (#4347)
  • Updated argilla.list_datasets functions to also work with FeedbackDatasets. (#4347)
"},{"location":"community/changelog/#fixed_14","title":"Fixed","text":"
  • Fixed error in TextClassificationSettings.from_dict method in which the label_schema created was a list of dict instead of a list of str. (#4347)
  • Fixed total records on pagination component (#4424)
"},{"location":"community/changelog/#removed_3","title":"Removed","text":"
  • Removed draft auto save for annotation view (#4334)
"},{"location":"community/changelog/#1200","title":"1.20.0","text":""},{"location":"community/changelog/#added_13","title":"Added","text":"
  • Added GET /api/v1/datasets/:dataset_id/records/search/suggestions/options endpoint to return suggestion available options for searching. (#4260)
  • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset.(#4192).
  • Added get_model_kwargs, get_trainer_kwargs, get_trainer_model, get_trainer_tokenizer and get_trainer -methods to the ArgillaTrainer to improve interoperability across frameworks. (#4214).
  • Added additional formatting checks to the ArgillaTrainer to allow for better interoperability of defaults and formatting_func usage. (#4214).
  • Added a warning to the update_config-method of ArgillaTrainer to emphasize if the kwargs were updated correctly. (#4214).
  • Added argilla.client.feedback.utils module with html_utils (this mainly includes video/audio/image_to_html, which convert media to data URLs so they can be rendered in the Argilla UI, and create_token_highlights to highlight tokens in a custom way; both work on TextQuestion and TextField with use_markdown=True) and assignments (this mainly includes assign_records to assign records according to a number of annotators and records, an overlap and the shuffle option; and assign_workspace to assign and create if needed a workspace according to the record assignment). (#4121)
"},{"location":"community/changelog/#fixed_15","title":"Fixed","text":"
  • Fixed error in ArgillaTrainer, with numerical labels, using RatingQuestion instead of RankingQuestion (#4171)
  • Fixed error in ArgillaTrainer, now we can train for extractive_question_answering using a validation sample (#4204)
  • Fixed error in ArgillaTrainer, when training for sentence-similarity it didn't work with a list of values per record (#4211)
  • Fixed error in the unification strategy for RankingQuestion (#4295)
  • Fixed TextClassificationSettings.labels_schema order was not being preserved. Closes #3828 (#4332)
  • Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
  • Fixed error when passing draft responses to create records endpoint. (#4354)
"},{"location":"community/changelog/#changed_10","title":"Changed","text":"
  • [breaking] Suggestions agent field only accepts now some specific characters and a limited length. (#4265)
  • [breaking] Suggestions score field only accepts now float values in the range 0 to 1. (#4266)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support optional query attribute. (#4327)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support filter and sort attributes. (#4327)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support optional query attribute. (#4270)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support filter and sort attributes. (#4270)
  • Changed the logging style while pulling and pushing FeedbackDataset to Argilla from tqdm style to rich. (#4267). Contributed by @zucchini-nlp.
  • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)
  • Changed models and tokenizer for the ArgillaTrainer to explicitly allow for changing them when needed. (#4214).
"},{"location":"community/changelog/#1190","title":"1.19.0","text":""},{"location":"community/changelog/#added_14","title":"Added","text":"
  • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
  • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
  • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
  • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
  • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
  • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
  • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
  • Added show_progress argument to from_huggingface() method to make the progress bar for the record-parsing process optional. (#4132)
  • Added a progress bar for the record-parsing process to from_huggingface() method using trange in tqdm. (#4132)
  • Added support to sort by inserted_at or updated_at for datasets with no metadata. (#4147)
  • Added max_records argument to pull() method for RemoteFeedbackDataset. (#4074)
  • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
  • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
  • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
  • Added max_records argument to ArgillaTrainer to limit the records used for training. (#4120)
  • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
  • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
  • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
  • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
  • Added find_similar_records method to local and remote FeedbackDataset (see the sketch after this list). (#4023)
  • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)
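A hedged sketch of the similarity-search additions above, using the 1.19 SDK; the dataset and vector names are illustrative, and the find_similar_records signature is assumed from this list rather than confirmed by the patch:

import argilla as rg

rg.init(api_url="<api_url>", api_key="<api_key>")

dataset = rg.FeedbackDataset.from_argilla(name="my_dataset")  # illustrative name

# Vector-based similarity search over records (#4023); assumes a vector
# settings entry named "sentence_embedding" exists and records carry vectors.
results = dataset.find_similar_records(
    vector_name="sentence_embedding",
    record=dataset.records[0],
    max_results=5,
)
for record, score in results:
    print(score, record.fields)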
"},{"location":"community/changelog/#changed_11","title":"Changed","text":"
  • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
  • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
  • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
  • [breaking] limit query parameter for GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • [breaking] limit query parameter for GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • Updated GET /api/v1/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Updated GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Updated POST /api/v1/datasets/:dataset_id/records endpoint to allow creating records with vectors. (#4022)
  • Updated PATCH /api/v1/datasets/:dataset_id endpoint to allow updating the allow_extra_metadata attribute. (#4112)
  • Updated PATCH /api/v1/datasets/:dataset_id/records endpoint to allow updating records with vectors. (#4062)
  • Updated PATCH /api/v1/records/:record_id endpoint to allow updating records with vectors. (#4062)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow searching records with vectors. (#4019)
  • Updated BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
  • Updated FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
  • Updated FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
  • Updated FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
  • Using cosine similarity to compute similarity between vectors. (#4124)
"},{"location":"community/changelog/#fixed_16","title":"Fixed","text":"
  • Fixed SVG images rendering off-screen when images are too large. (#4047)
  • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
  • Fixed owners deleting or updating annotators' responses. (Commit 403a66d)
  • Fixed passing user_id when getting records by id. (Commit 98c7927)
  • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
"},{"location":"community/changelog/#1180","title":"1.18.0","text":""},{"location":"community/changelog/#added_15","title":"Added","text":"
  • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
  • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
  • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
  • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
  • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
  • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
  • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
  • Added missing validations to PATCH /api/v1/questions/:question_id. Now title and description use the same validations used to create questions. (#3967)
  • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
  • Added metadata_filters to filter_by method in RemoteFeedbackDataset to filter based on metadata, i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter (see the sketch after this list). (#3834)
  • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
  • Added sort_by query parameter to listing records endpoints that allows to sort the records by inserted_at, updated_at or metadata property. (#3843)
  • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
  • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
  • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
  • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
  • Added support for updating records (metadata) from the Python SDK. (#3946)
  • Added delete_metadata_properties method to delete metadata properties. (#3932)
  • Added update_metadata_properties method to update metadata_properties. (#3961)
  • Added automatic model card generation through ArgillaTrainer.save (#3857)
  • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
  • Added a maximum limit of 50 on the number of options a ranking question can accept. (#3975)
  • New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)
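A minimal sketch of the metadata additions above, assuming the 1.18 SDK classes named in this list; the field, question, and property names are illustrative:

import argilla as rg

# Define metadata properties when creating a FeedbackDataset (#3818).
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="answer")],
    metadata_properties=[
        rg.TermsMetadataProperty(name="source", values=["news", "wiki"]),
        rg.IntegerMetadataProperty(name="tokens", min=0, max=2048),
    ],
)

# On a dataset pushed to Argilla, records can then be filtered by metadata (#3834):
# remote = dataset.push_to_argilla(name="my_dataset", workspace="my_workspace")
# filtered = remote.filter_by(
#     metadata_filters=[rg.TermsMetadataFilter(name="source", values=["news"])]
# )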
"},{"location":"community/changelog/#changed_12","title":"Changed","text":"
  • GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
  • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
  • Increased the default max result window for Elasticsearch indexes created for Feedback datasets. (#3929)
  • Forced Elasticsearch index refresh after records creation. (#3929)
  • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
  • Using metadata property name instead of id for indexing data in search engine index. (#3994)
"},{"location":"community/changelog/#fixed_17","title":"Fixed","text":"
  • Fixed response schemas to allow values to be None i.e. when a record is discarded the response.values are set to None. (#3926)
"},{"location":"community/changelog/#1170","title":"1.17.0","text":""},{"location":"community/changelog/#added_16","title":"Added","text":"
  • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
  • Added automatic model card generation through ArgillaTrainer.save (#3857).
  • Added task templates to the FeedbackDataset (#3973).
"},{"location":"community/changelog/#changed_13","title":"Changed","text":"
  • Updated Dockerfile to use multi stage build (#3221 and #3793).
  • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
  • Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
  • FeedbackDataset API methods have been aligned to be accessible through its several implementations (#3937).
  • Added unify_responses support for remote datasets (#3937).
"},{"location":"community/changelog/#fixed_18","title":"Fixed","text":"
  • Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
  • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
  • Fixed record fields validation that was preventing from logging records with optional fields (i.e. required=True) when the field value was None (#3846).
  • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
  • The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945)
  • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
  • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
  • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
  • Fixed wrong __repr__ for TrainingTask. (#3969)
  • Fixed wrong key return error in prepare_for_training_with_* for TrainingTask. (#3969)
"},{"location":"community/changelog/#deprecated_3","title":"Deprecated","text":"
  • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0
"},{"location":"community/changelog/#1160","title":"1.16.0","text":""},{"location":"community/changelog/#added_17","title":"Added","text":"
  • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
  • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
  • Added Auto save record to save automatically the current record that you are working on (#3541)
  • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
  • Added workspaces list command to list Argilla workspaces (#3594).
  • Added datasets list command to list Argilla datasets (#3658).
  • Added users create command to create users (#3667).
  • Added whoami command to get current user (#3673).
  • Added users delete command to delete users (#3671).
  • Added users list command to list users (#3688).
  • Added workspaces delete-user command to remove a user from a workspace (#3699).
  • Added workspaces create command to create an Argilla workspace (#3676).
  • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
  • Added info command to get info about the used Argilla client and server (#3707).
  • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
  • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
  • Added handling of PermissionError when executing a command with a logged-in user with insufficient permissions (#3717).
  • Added workspaces add-user command to add a user to workspace (#3712).
  • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
  • Added workspace_id arg to list_datasets in the Python SDK (#3727).
  • Added argilla script that allows executing the Argilla CLI using the argilla command (#3730).
  • Added support for passing already initialized model and tokenizer instances to the ArgillaTrainer (#3751)
  • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).
"},{"location":"community/changelog/#changed_14","title":"Changed","text":"
  • Move database commands under server group of commands (#3710)
  • server commands only included in the CLI app when server extra requirements are installed (#3710).
  • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
  • Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
  • Renamed tasks sub-package to cli (#3723).
  • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
  • Changed visible_options (of label and multi-label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
"},{"location":"community/changelog/#fixed_19","title":"Fixed","text":"
  • Fixed remove user modification in text component on clear answers (#3775)
  • Fixed Highlight raw text field in dataset feedback task (#3731)
  • Fixed Field title too long (#3734)
  • Fixed error messages when deleting a DatasetForTextClassification (#3652)
  • Fixed Pending queue pagination problems during data annotation (#3677)
  • Fixed visible_labels default value to be 20 just when visible_labels not provided and len(labels) > 20, otherwise it will either be the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
  • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
  • Added missing draft status in ResponseSchema, as now there can be responses with draft status when annotating via the UI (#3749).
  • Fixed searches when queried words are distributed across the record fields (#3759).
  • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
  • Fixed RankingValueSchema and FeedbackRankingValueModel schemas to allow rank=None when status=draft (#3781).
"},{"location":"community/changelog/#1151","title":"1.15.1","text":""},{"location":"community/changelog/#fixed_20","title":"Fixed","text":"
  • Fixed Text component text content sanitization behavior just for markdown to prevent disappear the text(#3738)
  • Fixed Text component now you need to press Escape to exit the text area (#3733)
  • Fixed SearchEngine was creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).
"},{"location":"community/changelog/#1150","title":"1.15.0","text":""},{"location":"community/changelog/#added_18","title":"Added","text":"
  • Added Enable to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
  • Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
  • Added formatting_func to ArgillaTrainer for FeedbackDataset datasets add a custom formatting for the data (#3599).
  • Added login function in argilla.client.login to login into an Argilla server and store the credentials locally (#3582).
  • Added login command to login into an Argilla server (#3600).
  • Added logout command to logout from an Argilla server (#3605).
  • Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
  • Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
  • Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
  • Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
  • Added filter_by method in RemoteFeedbackDataset to filter based on response_status (see the sketch after this list) (#3610).
  • Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces from a user in Argilla (#3641).
  • Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
  • Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
  • Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).
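A short sketch of the filter_by method referenced above, assuming the 1.15 SDK; the dataset name is illustrative:

import argilla as rg

rg.init(api_url="<api_url>", api_key="<api_key>")

remote = rg.FeedbackDataset.from_argilla(name="my_dataset")  # illustrative name

# Keep only records whose responses have been submitted (#3610).
submitted = remote.filter_by(response_status="submitted")
for record in submitted.records:
    print(record.responses)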
"},{"location":"community/changelog/#changed_15","title":"Changed","text":"
  • Changed Optional label for * mark for required question (#3608)
  • Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
  • Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
  • Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
  • Moved users CLI app under the database CLI app (#3593).
  • Moved server Enum classes to the argilla.server.enums module (#3620).
"},{"location":"community/changelog/#fixed_21","title":"Fixed","text":"
  • Fixed Filter by workspace in breadcrumbs (#3577)
  • Fixed Filter by workspace in datasets table (#3604)
  • Fixed Query search highlight for Text2Text and TextClassification (#3621)
  • Fixed RatingQuestion.values validation to raise a ValidationError when values are outside the [1, 10] range (#3626).
"},{"location":"community/changelog/#removed_4","title":"Removed","text":"
  • Removed multi_task_text_token_classification from TaskType as not used (#3640).
  • Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
  • Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
  • Removed push_to_argilla from RemoteFeedbackDataset, as it just works when calling it through a FeedbackDataset locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
  • Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any \"updateable\" attribute of a record will go through update instead (#3663).
  • Removed unused owner attribute for the client Dataset data model (#3665)
"},{"location":"community/changelog/#1141","title":"1.14.1","text":""},{"location":"community/changelog/#fixed_22","title":"Fixed","text":"
  • Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
"},{"location":"community/changelog/#fixed_23","title":"Fixed","text":"
  • Fixed settings could not be provided when updating a rating or ranking question (#3552).
"},{"location":"community/changelog/#1140","title":"1.14.0","text":""},{"location":"community/changelog/#added_19","title":"Added","text":"
  • Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
  • Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
  • Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
  • Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
  • Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return it as a local copy as a FeedbackDataset (#3465).
  • Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
  • Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
"},{"location":"community/changelog/#changed_16","title":"Changed","text":"
  • Improved efficiency of weak labeling when dataset contains vectors (#3444).
  • Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
  • Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
  • Updated CLI to use database async connection (#3450).
  • Limited rating question values to the positive range [1, 10] (#3451).
  • Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
  • Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
  • Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
  • After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
  • Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via 🤗 datasets (#3539).
"},{"location":"community/changelog/#fixed_24","title":"Fixed","text":"
  • Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
  • Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
  • Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
  • TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
  • Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
  • Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
  • Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
"},{"location":"community/changelog/#deprecated_4","title":"Deprecated","text":"
  • After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
"},{"location":"community/changelog/#1133","title":"1.13.3","text":""},{"location":"community/changelog/#fixed_25","title":"Fixed","text":"
  • Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
  • Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).
"},{"location":"community/changelog/#1132","title":"1.13.2","text":""},{"location":"community/changelog/#fixed_26","title":"Fixed","text":"
  • The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).
"},{"location":"community/changelog/#1131","title":"1.13.1","text":""},{"location":"community/changelog/#fixed_27","title":"Fixed","text":"
  • Fix database migration for PostgreSQL (See #3438)
"},{"location":"community/changelog/#1130","title":"1.13.0","text":""},{"location":"community/changelog/#added_20","title":"Added","text":"
  • Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
  • Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
  • Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
  • Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
  • Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
  • Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
  • Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
  • Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
  • Added API and Python Client support for workspace deletion (Closes #3260)
  • Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
"},{"location":"community/changelog/#changed_17","title":"Changed","text":"
  • Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
  • Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
  • The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
  • Added Telemetry support for ArgillaTrainer (closes #3325)
  • User.workspaces is no longer an attribute but a property that calls list_user_workspaces to list all the workspace names for a given user ID (#3334)
  • Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
  • The protected metadata fields now support more than textual info; existing datasets must be reindexed. See docs for more detail (Closes #3332).
  • Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
  • Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
"},{"location":"community/changelog/#removed_5","title":"Removed","text":"
  • Removed support to non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
"},{"location":"community/changelog/#fixed_28","title":"Fixed","text":"
  • Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records even if responses was not provided via the include query parameter (#3304).
  • Values for protected metadata fields are not truncated (Closes #3331).
  • Big number ids are properly rendered in UI (Closes #3265)
  • Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)
"},{"location":"community/changelog/#deprecated_5","title":"Deprecated","text":"
  • Integer support for record id in text classification, token classification and text2text datasets.
"},{"location":"community/changelog/#1121","title":"1.12.1","text":""},{"location":"community/changelog/#fixed_29","title":"Fixed","text":"
  • Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
  • Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
  • Pin pydantic dependency to version < 2 (Closes #3348)
"},{"location":"community/changelog/#1120","title":"1.12.0","text":""},{"location":"community/changelog/#added_21","title":"Added","text":"
  • Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
  • Added RankingQuestion in the Python client to create ranking questions (#3275).
  • Added Ranking component in feedback task question form (#3177 & #3246).
  • Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
  • Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
"},{"location":"community/changelog/#docs","title":"Docs","text":"
  • Added instructions for how to run the Argilla frontend in the developer docs (#3314).
"},{"location":"community/changelog/#changed_18","title":"Changed","text":"
  • All docker related files have been moved into the docker folder (#3053).
  • release.Dockerfile have been renamed to Dockerfile (#3133).
  • Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
  • Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
"},{"location":"community/changelog/#fixed_30","title":"Fixed","text":"
  • Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
"},{"location":"community/changelog/#1110","title":"1.11.0","text":""},{"location":"community/changelog/#fixed_31","title":"Fixed","text":"
  • Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
  • Fixed format_as(\"datasets\") when no responses or optional responses in FeedbackRecord, to set their value to what \ud83e\udd17 Datasets expects instead of just None (#3224).
  • Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
  • Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
  • Refactored usage of import argilla as rg to clarify package navigation (#3279).
"},{"location":"community/changelog/#docs_1","title":"Docs","text":"
  • Fixed URLs in Weak Supervision with Sentence Transformers tutorial #3243.
  • Fixed library buttons' formatting on Tutorials page (#3255).
  • Modified styling of error code outputs in notebooks (#3270).
  • Added ElasticSearch and OpenSearch versions (#3280).
  • Removed template notebook from table of contents (#3271).
  • Fixed tutorials with pip install argilla to not use older versions of the package (#3282).
"},{"location":"community/changelog/#added_22","title":"Added","text":"
  • Added metadata attribute to the Record of the FeedbackDataset (#3194)
  • New users update command to update the role for an existing user (#3188)
  • New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
  • Added User class to let users manage their Argilla users via the Python client (#3169).
  • Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
"},{"location":"community/changelog/#changed_19","title":"Changed","text":"
  • The role system now supports three different roles: owner, admin and annotator (#3104)
  • admin role is scoped to workspace-level operations (#3115)
  • The owner user is created among the default pool of users in the quickstart, and the default user in the server has now owner role (#3248), reverting (#3188).
"},{"location":"community/changelog/#deprecated_6","title":"Deprecated","text":"
  • As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
"},{"location":"community/changelog/#1100","title":"1.10.0","text":""},{"location":"community/changelog/#added_23","title":"Added","text":"
  • Added search component for feedback datasets (#3138)
  • Added markdown support for feedback dataset guidelines (#3153)
  • Added Train button for feedback datasets (#3170)
"},{"location":"community/changelog/#changed_20","title":"Changed","text":"
  • Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
"},{"location":"community/changelog/#fixed_32","title":"Fixed","text":"
  • Replaced Enum for string value in URLs for client API calls (Closes #3149)
  • Resolved breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
  • Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
  • Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
"},{"location":"community/changelog/#docs_2","title":"Docs","text":"
  • Resolved typos in the docs (#3240).
  • Fixed mention of master branch (#3254).
"},{"location":"community/changelog/#190","title":"1.9.0","text":""},{"location":"community/changelog/#added_24","title":"Added","text":"
  • Added boolean use_markdown property to TextFieldSettings model.
  • Added boolean use_markdown property to TextQuestionSettings model.
  • Added new status draft for the Response model.
  • Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005)
  • Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
  • Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
  • Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
  • Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
  • Added the information about executing tests in the developer documentation ([#3143]).
"},{"location":"community/changelog/#changed_21","title":"Changed","text":"
  • Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
  • Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API.
  • Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API.
  • Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
  • Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
  • Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)
"},{"location":"community/changelog/#fixed_33","title":"Fixed","text":"
  • Disallow fields and questions in FeedbackDataset with the same name (#3126).
  • Fixed broken links in the documentation and updated the development branch name from development to develop ([#3145]).
"},{"location":"community/changelog/#180","title":"1.8.0","text":""},{"location":"community/changelog/#added_25","title":"Added","text":"
  • /api/v1/datasets new endpoint to list and create datasets (#2615).
  • /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
  • /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
  • /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
  • /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
  • /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
  • /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
  • /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
  • /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
  • /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
  • /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
  • /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
  • /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
  • /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
  • showing new feedback task datasets in datasets list ([#2719])
  • new page for feedback task ([#2680])
  • show feedback task metrics ([#2822])
  • user can delete dataset in dataset settings page ([#2792])
  • Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
  • Integration with the HuggingFace Hub ([#2949])
  • Added ArgillaPeftTrainer for text and token classification #2854
  • Added predict_proba() method to ArgillaSetFitTrainer
  • Added ArgillaAutoTrainTrainer for Text Classification #2664
  • New database revisions command showing database revisions info
"},{"location":"community/changelog/#fixes","title":"Fixes","text":"
  • Avoid rendering html for invalid html strings in Text2text ([#2911](https://github.com/argilla-io/argilla/issues/2911))
"},{"location":"community/changelog/#changed_22","title":"Changed","text":"
  • The database migrate command accepts a --revision param to provide specific revision id
  • tokens_length metrics function returns empty data (#3045)
  • token_length metrics function returns empty data (#3045)
  • mention_length metrics function returns empty data (#3045)
  • entity_density metrics function returns empty data (#3045)
"},{"location":"community/changelog/#deprecated_7","title":"Deprecated","text":"
  • Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
  • tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
"},{"location":"community/changelog/#removed_6","title":"Removed","text":"
  • Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
  • Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
  • Removed tags-related metrics from token classification metrics storage (#3045)
"},{"location":"community/changelog/#170","title":"1.7.0","text":""},{"location":"community/changelog/#added_26","title":"Added","text":"
  • add max_retries and num_threads parameters to rg.log to run data logging requests concurrently with backoff retry policy. See #2458 and #2533
  • rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
  • Added settings param to prepare_for_training (#2689)
  • Added prepare_for_training for openai (#2658)
  • Added ArgillaOpenAITrainer (#2659)
  • Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
  • Added ArgillaTrainer CLI support. Closes (#2809)
"},{"location":"community/changelog/#fixes_1","title":"Fixes","text":"
  • fix image alignment on token classification
"},{"location":"community/changelog/#changed_23","title":"Changed","text":"
  • Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
  • bulk endpoints will upsert data when record id is present. Closes #2535
  • moved from click to typer CLI support. Closes (#2815)
  • Argilla server docker image is built with PostgreSQL support. Closes #2686
  • The rg.log computes all batches and raises an error for all failed batches.
  • The default batch size for rg.log is now 100.
"},{"location":"community/changelog/#fixed_34","title":"Fixed","text":"
  • argilla.training bugfixes and unification (#2665)
  • Resolved several small bugs in the ArgillaTrainer.
"},{"location":"community/changelog/#deprecated_8","title":"Deprecated","text":"
  • The rg.log_async function is deprecated and will be removed in the next minor release.
"},{"location":"community/changelog/#160","title":"1.6.0","text":""},{"location":"community/changelog/#added_27","title":"Added","text":"
  • ARGILLA_HOME_PATH new environment variable (#2564).
  • ARGILLA_DATABASE_URL new environment variable (#2564).
  • Basic support for user roles with admin and annotator (#2564).
  • id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
  • /api/users new endpoint to list and create users (#2564).
  • /api/users/{user_id} new endpoint to delete users (#2564).
  • /api/workspaces new endpoint to list and create workspaces (#2564).
  • /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
  • /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
  • argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
  • argilla.tasks.users.create new task to create a user (#2564).
  • argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
  • argilla.tasks.database.migrate new task to execute database migrations (#2564).
  • release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
  • Add user settings page. Closes #2496
  • Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504
"},{"location":"community/changelog/#fixes_2","title":"Fixes","text":"
  • Now the prepare_for_training method is working when multi_label=True. Closes #2606
"},{"location":"community/changelog/#changed_24","title":"Changed","text":"
  • ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from the YAML file to the database (#2564).
  • full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
  • password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
  • quickstart.Dockerfile image default users changed from team and argilla to admin and annotator, including new passwords and API keys (#2564).
  • Datasets to be managed only by users with admin role (#2564).
  • The list of rules is now accessible while metrics are computed. Closes #2117
  • Style updates for weak labeling and adding feedback toast when deleting rules. See #2626 and #2648
"},{"location":"community/changelog/#removed_7","title":"Removed","text":"
  • email user field (#2564).
  • disabled user field (#2564).
  • Support for private workspaces (#2564).
  • ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
  • The old headers for API Key and workspace from python client
  • The default value for old API Key constant. Closes #2251
"},{"location":"community/changelog/#151-2023-03-30","title":"1.5.1 - 2023-03-30","text":""},{"location":"community/changelog/#fixes_3","title":"Fixes","text":"
  • Copying datasets between workspaces with proper owner/workspace info. Closes #2562
  • Copy dataset with empty workspace to the default user workspace 905d4de
  • Using elasticsearch config to request backend version. Closes #2311
  • Remove sorting by score in labels. Closes #2622
"},{"location":"community/changelog/#changed_25","title":"Changed","text":"
  • Update field name in metadata for image url. See #2609
  • Improvements in tutorial doc cards. Closes #2216
"},{"location":"community/changelog/#150-2023-03-21","title":"1.5.0 - 2023-03-21","text":""},{"location":"community/changelog/#added_28","title":"Added","text":"
  • Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
  • Add new page and components for dataset settings. Closes #2442
  • Add ability to show image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url
  • Non-searchable fields support in metadata. #2570
  • Add record ID references to the prepare for training methods. Closes #2483
  • Add tutorial on Image Classification. #2420
  • Add Train button, visible for \"admin\" role, with code snippets from a selection of libraries. Closes [#2591](https://github.com/argilla-io/argilla/pull/2591)
"},{"location":"community/changelog/#changed_26","title":"Changed","text":"
  • Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see https://github.com/argilla-io/argilla/issues/2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
  • The shortcuts improvement for labels #2339 has been moved to the vuex ORM in the dataset settings feature #2444
  • Update \"Define a labeling schema\" section in docs.
  • The record inputs are sorted alphabetically in UI by default. #2581
  • The record inputs are fully visible when pagination size is one, and the height of the collapsed area is bigger for laptop screens. #2587
"},{"location":"community/changelog/#fixes_4","title":"Fixes","text":"
  • Allow URL to be clickable in Jupyter notebook again. Closes #2527
"},{"location":"community/changelog/#removed_8","title":"Removed","text":"
  • Removed some deprecated data scan endpoints used by old clients. This change breaks compatibility with clients <v1.3.0
  • Stopped using old deprecated scan endpoints in the Python client. This breaks client compatibility with server versions <1.3.0
  • Removed the previous way to add labels through the dataset page. Now labels can be added only through the dataset settings page.
"},{"location":"community/contributor/","title":"How to contribute?","text":"

Thank you for investing your time in contributing to the project! Any contribution you make will be reflected in the most recent version of Argilla \ud83e\udd29.

New to contributing in general?

If you're a new contributor, read the README to get an overview of the project. In addition, here are some resources to help you get started with open-source contributions:

  • Discord: You are welcome to join the Argilla Discord community, where you can keep in touch with other users, contributors and the Argilla team. In the following section, you can find more information on how to get started in Discord.
  • Git: This is a very useful tool to keep track of the changes in your files. Using the command-line interface (CLI), you can make your contributions easily. For that, you need to have it installed and updated on your computer.
  • GitHub: It is a platform and cloud-based service that uses git and allows developers to collaborate on projects. To contribute to Argilla, you'll need to create an account. Check the Contributor Workflow with Git and GitHub for more info.
  • Developer Documentation: To collaborate, you'll need to set up an efficient environment. Check the developer documentation to know how to do it.
"},{"location":"community/contributor/#first-contact-in-discord","title":"First Contact in Discord","text":"

Discord is a handy tool for more casual conversations and to answer day-to-day questions. As part of Hugging Face, we have set up some Argilla channels on the server. Click here to join the Hugging Face Discord community effortlessly.

When part of the Hugging Face Discord, you can select \"Channels & roles\" and select \"Argilla\" along with any of the other groups that are interesting to you. \"Argilla\" will cover anything about Argilla and Distilabel. You can join the following channels:

  • #argilla-announcements: \ud83d\udce2 Important announcements and updates.
  • #argilla-distilabel-general: \ud83d\udcac General discussions about Argilla and Distilabel.
  • #argilla-distilabel-help: \ud83d\ude4b\u200d\u2640\ufe0f Need assistance? We're always here to help. Select the appropriate label (argilla or distilabel) for your issue and post it.

So now there is only one thing left to do: introduce yourself and talk to the community. You'll always be welcome! \ud83e\udd17\ud83d\udc4b

"},{"location":"community/contributor/#contributor-workflow-with-git-and-github","title":"Contributor Workflow with Git and GitHub","text":"

If you're working with Argilla and suddenly a new idea comes to your mind or you find an issue that can be improved, it's time to actively participate and contribute to the project!

"},{"location":"community/contributor/#report-an-issue","title":"Report an issue","text":"

If you spot a problem, search if an issue already exists. You can use the Label filter. If that is the case, participate in the conversation. If it does not exist, create an issue by clicking on New Issue.

This will show various templates, choose the one that best suits your issue.

Below, you can see an example of the Feature request template. Once you choose one, you will need to fill it in following the guidelines. Try to be as clear as possible. In addition, you can assign yourself to the issue and add or choose the right labels. Finally, click on Submit new issue.

"},{"location":"community/contributor/#work-with-a-fork","title":"Work with a fork","text":""},{"location":"community/contributor/#fork-the-argilla-repository","title":"Fork the Argilla repository","text":"

After having reported the issue, you can start working on it. For that, you will need to create a fork of the project. To do that, click on the Fork button.

Now, fill in the information. Remember to uncheck the Copy develop branch only option if you are going to work in or from another branch (for instance, to fix documentation, the main branch is used). Then, click on Create fork.

Now, you will be redirected to your fork. You can see that you are in your fork because the name of the repository will be your username/argilla, and it will indicate forked from argilla-io/argilla.

"},{"location":"community/contributor/#clone-your-forked-repository","title":"Clone your forked repository","text":"

In order to make the required adjustments, clone the forked repository to your local machine. Choose the destination folder and run the following command:

git clone https://github.com/[your-github-username]/argilla.git\ncd argilla\n

To keep your fork\u2019s main/develop branch up to date with our repo, add it as an upstream remote branch.

git remote add upstream https://github.com/argilla-io/argilla.git\n
"},{"location":"community/contributor/#create-a-new-branch","title":"Create a new branch","text":"

For each issue you're addressing, it's advisable to create a new branch. GitHub offers a straightforward method to streamline this process.

\u26a0\ufe0f Never work directly on the main or develop branch. Always create a new branch for your changes.

Navigate to your issue and on the right column, select Create a branch.

After the new window pops up, the branch will be named after the issue; include a prefix such as feature/, bug/, or docs/ to facilitate quick recognition of the issue type. In the Repository destination, pick your fork ([your-github-username]/argilla), and then select Change branch source to specify the source branch for creating the new one. Complete the process by clicking Create branch.

\ud83e\udd14 Remember that the main branch is only used to work with the documentation. For any other changes, use the develop branch.

Now, locally change to the new branch you just created.

git fetch origin\ngit checkout [branch-name]\n
"},{"location":"community/contributor/#use-changelogmd","title":"Use CHANGELOG.md","text":"

If you are working on a new feature, it is a good practice to make note of it for others to keep up with the changes. For that, we utilize the CHANGELOG.md file in the root directory. This file is used to list changes made in each version of the project and there are headers that we use to denote each type of change.

  • Added: for new features.
  • Changed: for changes in existing functionality.
  • Deprecated: for soon-to-be removed features.
  • Removed: for now removed features.
  • Fixed: for any bug fixes.
  • Security: in case of vulnerabilities.

A sample addition would be:

- Fixed the key errors for the `init` method ([#NUMBER_OF_PR](LINK_TO_PR)). Contributed by @github_handle.\n

You can have a look at the CHANGELOG.md file to see more cases and examples.

"},{"location":"community/contributor/#make-changes-and-push-them","title":"Make changes and push them","text":"

Make the changes you want in your local repository, and test that everything works and you are following the guidelines.

Check the developer documentation to set up your environment and start working on the project.

Once you have finished, you can check the status of your repository and synchronize it with the upstream repo with the following commands:

# Check the status of your repository\ngit status\n\n# Synchronize with the upstream repo\ngit checkout [branch-name]\ngit rebase [default-branch]\n

If everything is right, we need to commit and push the changes to your fork. For that, run the following commands:

# Add the changes to the staging area\ngit add filename\n\n# Commit the changes by writing a proper message\ngit commit -m \"commit-message\"\n\n# Push the changes to your fork\ngit push origin [branch-name]\n

When pushing, you will be asked to enter your GitHub login credentials. Once the push is complete, all local commits will be on your GitHub repository.

"},{"location":"community/contributor/#create-a-pull-request","title":"Create a pull request","text":"

Come back to GitHub, navigate to the original repository where you created your fork, and click on Compare & pull request.

First, click on compare across forks and select the right repositories and branches.

In the base repository, keep in mind to select either main or develop based on the modifications made. In the head repository, indicate your forked repository and the branch corresponding to the issue.

Then, fill in the pull request template. You should add a prefix to the PR name as we did with the branch above. If you are working on a new feature, you can name your PR as feat: TITLE. If your PR consists of a solution for a bug, you can name your PR as bug: TITLE. And, if your work is for improving the documentation, you can name your PR as docs: TITLE.

In addition, on the right side, you can select a reviewer (for instance, if you discussed the issue with a member of the Argilla team) and assign the pull request to yourself. It is highly advisable to add labels to the PR as well. You can do this in the labels section on the right side of the screen. For instance, if you are addressing a bug, add the bug label, or if the PR is related to the documentation, add the documentation label. This way, PRs can be easily filtered.

Finally, fill in the template carefully and follow the guidelines. Remember to link the original issue and enable the checkbox to allow maintainer edits so the branch can be updated for a merge. Then, click on Create pull request.

"},{"location":"community/contributor/#review-your-pull-request","title":"Review your pull request","text":"

Once you submit your PR, a team member will review your proposal. We may ask questions, request additional information or ask for changes to be made before a PR can be merged, either using suggested changes or pull request comments.

You can apply the changes directly through the UI (check the files changed and click on the three dots in the right corner) or from your fork, and then commit them to your branch. The PR will be updated automatically and the suggestions will appear as outdated.

If you run into any merge issues, check out this git tutorial to help you resolve merge conflicts and other issues.

"},{"location":"community/contributor/#your-pr-is-merged","title":"Your PR is merged!","text":"

Congratulations \ud83c\udf89\ud83c\udf8a We thank you \ud83e\udd29

Once your PR is merged, your contributions will be publicly visible on the Argilla GitHub.

Additionally, we will include your changes in the next release based on our development branch.

"},{"location":"community/contributor/#additional-resources","title":"Additional resources","text":"

Here are some helpful resources for your reference.

  • Configuring Discord, a guide to learn how to get started with Discord.
  • Pro Git, a book to learn Git.
  • Git in VSCode, a guide to learn how to easily use Git in VSCode.
  • GitHub Skills, an interactive course to learn GitHub.
"},{"location":"community/developer/","title":"Developer documentation","text":"

As an Argilla developer, you are already part of the community, and your contributions drive our development. This guide will help you set up your development environment and start contributing.

Argilla core components

  • Documentation: Argilla's documentation serves as an invaluable resource, providing a comprehensive and in-depth guide for users seeking to explore, understand, and effectively harness the core components of the Argilla ecosystem.

  • Python SDK: A Python SDK installable with pip install argilla to interact with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows.

  • FastAPI Server: The core of Argilla is a Python FastAPI server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API that interacts with the data from the Python SDK and the Argilla UI (see the example request after this list). It also provides a web interface to visualize the data.

  • Relational Database: A relational database to store the metadata of the records and the annotations. SQLite is used as the default built-in option and is deployed separately with the Argilla Server, but a separate PostgreSQL database can also be used.

  • Vector Database: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support ElasticSearch and OpenSearch, which can be deployed as separate Docker images.

  • Vue.js UI: A web application to visualize and annotate your data, users, and teams. It is built with Vue.js and is directly deployed alongside the Argilla Server within our Argilla Docker image.
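
For illustration, here is a minimal sketch of how a client could call one of the server's REST endpoints with curl. The API key header name and key value below are assumptions; adjust them to your deployment:

# List the workspaces of the current user (header name and key are assumptions)\ncurl -H \"X-Argilla-Api-Key: argilla.apikey\" http://localhost:6900/api/v1/me/workspaces\n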

"},{"location":"community/developer/#the-argilla-repository","title":"The Argilla repository","text":"

The Argilla repository has a monorepo structure, which means that all the components are located in the same repository: argilla-io/argilla. This repo is divided into the following folders:

  • argilla: The Python SDK project
  • argilla-server: The FastAPI server project
  • argilla-frontend: The Vue.js UI project
  • argilla/docs: The documentation project
  • examples: Example resources for deployments, scripts and notebooks

How to contribute?

Before starting to develop, we recommend reading our contribution guide to understand the contribution process and the guidelines to follow. Once you have cloned the Argilla repository and checked out to the correct branch, you can start setting up your development environment.

"},{"location":"community/developer/#set-up-the-python-environment","title":"Set up the Python environment","text":"

To work on the Argilla Python SDK, you must install the Argilla package on your system.

Create a virtual environment

We recommend creating a dedicated virtual environment for SDK development to prevent conflicts. For this, you can use the manager of your choice, such as venv, conda, pyenv, or uv.

From the root of the cloned Argilla repository, you should move to the argilla folder in your terminal.

cd argilla\n

Next, activate your virtual environment and make the required installations:

# Install the `pdm` package manager\npip install pdm\n\n# Install argilla in editable mode and the development dependencies\npdm install --dev\n
"},{"location":"community/developer/#linting-and-formatting","title":"Linting and formatting","text":"

To maintain a consistent code format, install the pre-commit hooks to run before each commit automatically.

pre-commit install\n

In addition, run the following scripts to check the code formatting and linting:

pdm run format\npdm run lint\n
"},{"location":"community/developer/#running-tests","title":"Running tests","text":"

Running tests at the end of every development cycle is indispensable to ensure no breaking changes.

# Run all tests\npdm run tests\n\n# Run specific tests\npytest tests/integration\npytest tests/unit\n
Running linting, formatting, and tests

You can run all the checks at once by using the following command:

    pdm run all\n
"},{"location":"community/developer/#set-up-the-databases","title":"Set up the databases","text":"

To run your development environment, you need to set up Argilla's databases.

"},{"location":"community/developer/#vector-database","title":"Vector database","text":"

Argilla supports ElasticSearch as its primary search engine for the vector database by default. For more information about setting up OpenSearch, check the Server configuration.

You can run ElasticSearch locally using Docker:

# Argilla supports ElasticSearch versions >=8.5\ndocker run -d --name elasticsearch-for-argilla -p 9200:9200 -p 9300:9300 -e \"ES_JAVA_OPTS=-Xms512m -Xmx512m\" -e \"discovery.type=single-node\" -e \"xpack.security.enabled=false\" docker.elastic.co/elasticsearch/elasticsearch:8.5.3\n

Install Docker

You can find the Docker installation guides for Windows, macOS and Linux on Docker website.

"},{"location":"community/developer/#relational-database","title":"Relational database","text":"

Argilla uses SQLite as the default built-in option for the relational database, storing information about users, workspaces, etc. No additional configuration is required to start using SQLite.

By default, the database file will be created at ~/.argilla/argilla.db; this can be configured by setting different values for ARGILLA_DATABASE_URL and ARGILLA_HOME_PATH environment variables.
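
For example, a minimal sketch of pointing Argilla at a custom location before starting the server (the paths below are illustrative, and the database URL follows SQLAlchemy conventions):

# Store Argilla data under /data/argilla instead of ~/.argilla (illustrative path)\nexport ARGILLA_HOME_PATH=\"/data/argilla\"\n\n# Point the relational database to a SQLite file in that folder (SQLAlchemy-style URL)\nexport ARGILLA_DATABASE_URL=\"sqlite:////data/argilla/argilla.db\"\n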

Manage the database

For more information about the database migration and user management, refer to the Argilla server README.

"},{"location":"community/developer/#set-up-the-server","title":"Set up the server","text":"

Once you have set up the databases, you can start the Argilla server. To run the server, you can check the Argilla server README file.

"},{"location":"community/developer/#set-up-the-frontend","title":"Set up the frontend","text":"

Optionally, if you need to run the Argilla frontend, you can follow the instructions in the Argilla frontend README.

"},{"location":"community/developer/#set-up-the-documentation","title":"Set up the documentation","text":"

Documentation is essential to provide users with a comprehensive guide about Argilla.

From main or develop?

If you are updating, improving, or fixing the current documentation without a code change, work on the main branch. For new features or bug fixes that require documentation, use the develop branch.

To contribute to the documentation and generate it locally, ensure you have installed the development dependencies as shown in the \"Set up the Python environment\" section, and run the following command to start the development server with mkdocs:

mkdocs serve\n
"},{"location":"community/developer/#documentation-guidelines","title":"Documentation guidelines","text":"

As mentioned, we use mkdocs to build the documentation. You can write the documentation in markdown format, and it will automatically be converted to HTML. In addition, you can include elements such as tables, tabs, images, and others, as shown in this guide. We recommend following these guidelines:

  • Use clear and concise language: Ensure the documentation is easy to understand for all users by using straightforward language and including meaningful examples. Images are not easy to maintain, so use them only when necessary and place them in the appropriate folder within the docs/assets/images directory.
  • Verify code snippets: Double-check that all code snippets are correct and runnable.
  • Review spelling and grammar: Check the spelling and grammar of the documentation.
  • Update the table of contents: If you add a new page, include it in the relevant index.md or the mkdocs.yml file, as sketched below.
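
For instance, a nav entry for a new community page in mkdocs.yml might look like this (a hypothetical sketch; match it to the actual structure of the file):

nav:\n  - Community:\n    - How to contribute?: community/contributor.md\n    - My new page: community/my_new_page.md  # hypothetical new entry\n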

Contribute with a tutorial

You can also contribute a tutorial (.ipynb) to the \"Community\" section. We recommend aligning the tutorial with the structure of the existing tutorials. For an example, check this tutorial.

"},{"location":"community/popular_issues/","title":"Issue dashboard","text":"Most engaging open issuesLatest issues open by the communityPlanned issues for upcoming releases Rank Issue Reactions Comments 1 4637 - [FEATURE] Label breakdown in Feedback dataset stats \ud83d\udc4d 6 \ud83d\udcac 4 2 1607 - Support for hierarchical multilabel text classification (taxonomy) \ud83d\udc4d 5 \ud83d\udcac 15 3 4658 - Active listeners for Feedback Dataset \ud83d\udc4d 5 \ud83d\udcac 5 4 1800 - Add comments/notes to annotation datasets to share with teammates. \ud83d\udc4d 2 \ud83d\udcac 6 5 1837 - Custom Record UI Templates \ud83d\udc4d 2 \ud83d\udcac 6 6 1922 - Show potential number of records during filter selection \ud83d\udc4d 2 \ud83d\udcac 4 7 1630 - Accepting several predictions/annotations for the same record \ud83d\udc4d 2 \ud83d\udcac 2 8 5348 - [FEATURE] Ability to create new labels on-the-fly \ud83d\udc4d 2 \ud83d\udcac 0 9 3625 - [IMPROVE] Fields with empty title shall have exactly the same value as the user entered in the name field, without altering it \ud83d\udc4d 2 \ud83d\udcac 0 10 4372 - [FEATURE] distribution indication for filters \ud83d\udc4d 1 \ud83d\udcac 6 Rank Issue Author 1 \ud83d\udfe2 5570 - [BUG-python/deployment] by lecheuklun 2 \ud83d\udfe2 5561 - [FEATURE] Force predetermined sorting for a dataset by lgienapp 3 \ud83d\udfe2 5557 - [DOCS] \"Bulk Labeling Multimodal Data\" Notebook outdated by trojblue 4 \ud83d\udfe2 5548 - [BUG-python/deployment] verify=False parameter is not passed to httpx.Client through Argilla class (v2.2.0) by xiajing10 5 \ud83d\udfe3 5543 - automatically load token from collab secrets if it exists by not-lain 6 \ud83d\udfe3 5530 - [FEATURE] updated_at / inserted_at properties on retrieved Records by maxserras 7 \ud83d\udfe3 5529 - [BUG-UI/UX] API Key copy button not working by cceyda 8 \ud83d\udfe2 5528 - [FEATURE] Filter by responses & suggestions by cceyda 9 \ud83d\udfe2 5516 - [FEATURE] Allow all annotators in workspace to see all the submitted records by cceyda 10 \ud83d\udfe2 5513 - [ENHANCEMENT] Improve ImageField error messaging to deal with paths, urls, none by cceyda Rank Issue Milestone 1 \ud83d\udfe2 5415 - [FEATURE] Do not stop logging records if UnprocessableEntityError is raised because one single record v2.2.0 2 \ud83d\udfe2 5534 - [FEATURE] preview custom field data in dataset settings page v2.3.0 3 \ud83d\udfe2 5520 - [BUG-UI/UX] Incorrect iframe height calculation in sandBox Component v2.4.0 4 \ud83d\udfe2 5513 - [ENHANCEMENT] Improve ImageField error messaging to deal with paths, urls, none v2.4.0 5 \ud83d\udfe2 5458 - [FEATURE] Controls for data schema for images when exporting datasets and records v2.4.0 6 \ud83d\udfe2 4931 - [REFACTOR] Improve handling of question models and dicts v2.4.0 7 \ud83d\udfe2 4935 - [CONFIG] Resolve python requirements for python version and dependencies with server. v2.4.0 8 \ud83d\udfe2 1836 - Webhooks v2.4.0

Last update: 2024-10-07

"},{"location":"community/integrations/llamaindex_rag_github/","title":"LlamaIndex","text":"
!pip install \"argilla-llama-index\"\n!pip install \"llama-index-readers-github==0.1.9\"\n

Let's make the required imports:

from llama_index.core import (\n    Settings,\n    VectorStoreIndex,\n    set_global_handler,\n)\nfrom llama_index.llms.openai import OpenAI\nfrom llama_index.readers.github import (\n    GithubClient,\n    GithubRepositoryReader,\n)\n

We need to set the OpenAI API key and the GitHub token. The OpenAI API key is required to run queries using GPT models, while the GitHub token ensures you have access to the repository you're using. Although the GitHub token might not be necessary for public repositories, it is still recommended.

import os\n\nos.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\nopenai_api_key = os.getenv(\"OPENAI_API_KEY\")\n\nos.environ[\"GITHUB_TOKEN\"] = \"ghp_...\"\ngithub_token = os.getenv(\"GITHUB_TOKEN\")\n
set_global_handler(\n    \"argilla\",\n    dataset_name=\"github_query_model\",\n    api_url=\"http://localhost:6900\",\n    api_key=\"argilla.apikey\",\n    number_of_retrievals=2,\n)\n
github_client = GithubClient(github_token=github_token, verbose=True)\n

Before creating our GithubRepositoryReader instance, we need to allow nested event loops. Since the Jupyter kernel already runs on an event loop, applying nest_asyncio lets the reader's asynchronous calls run inside it until the repository is fully read.

import nest_asyncio\n\nnest_asyncio.apply()\n

Now, let\u2019s create a GithubRepositoryReader instance with the necessary repository details. In this case, we'll target the main branch of the argilla repository. Since we're interested in the documentation, we will target the argilla/docs/ folder, excluding images, JSON files, and ipynb files.

documents = GithubRepositoryReader(\n    github_client=github_client,\n    owner=\"argilla-io\",\n    repo=\"argilla\",\n    use_parser=False,\n    verbose=False,\n    filter_directories=(\n        [\"argilla/docs/\"],\n        GithubRepositoryReader.FilterType.INCLUDE,\n    ),\n    filter_file_extensions=(\n        [\n            \".png\",\n            \".jpg\",\n            \".jpeg\",\n            \".gif\",\n            \".svg\",\n            \".ico\",\n            \".json\",\n            \".ipynb\",   # Erase this line if you want to include notebooks\n\n        ],\n        GithubRepositoryReader.FilterType.EXCLUDE,\n    ),\n).load_data(branch=\"main\")\n

Now, let's create a LlamaIndex index out of this document, and we can start querying the RAG system.

# LLM settings\nSettings.llm = OpenAI(\n    model=\"gpt-3.5-turbo\", temperature=0.8, openai_api_key=openai_api_key\n)\n\n# Load the data and create the index\nindex = VectorStoreIndex.from_documents(documents)\n\n# Create the query engine\nquery_engine = index.as_query_engine()\n
response = query_engine.query(\"How do I create a Dataset in Argilla?\")\nresponse\n

The generated response will be automatically logged in our Argilla instance. Check it out! From Argilla you can quickly have a look at your predictions and annotate them, so you can combine both synthetic data and human feedback.

Let's ask a couple more questions to see the overall behavior of the RAG chatbot. Remember that the answers are automatically logged into your Argilla instance.

questions = [\n    \"How can I list the available datasets?\",\n    \"Which are the user credentials?\",\n    \"Can I use markdown in Argilla?\",\n    \"Could you explain how to annotate datasets in Argilla?\",\n]\n\nanswers = []\n\nfor question in questions:\n    answers.append(query_engine.query(question))\n\nfor question, answer in zip(questions, answers):\n    print(f\"Question: {question}\")\n    print(f\"Answer: {answer}\")\n    print(\"----------------------------\")\n
\nQuestion: How can I list the available datasets?\nAnswer: You can list all the datasets available in a workspace by utilizing the `datasets` attribute of the `Workspace` class. Additionally, you can determine the number of datasets in a workspace by using `len(workspace.datasets)`. To list the datasets, you can iterate over them and print out each dataset. Remember that dataset settings are not preloaded when listing datasets, and if you need to work with settings, you must load them explicitly for each dataset.\n----------------------------\nQuestion: Which are the user credentials?\nAnswer: The user credentials in Argilla consist of a username, password, and API key.\n----------------------------\nQuestion: Can I use markdown in Argilla?\nAnswer: Yes, you can use Markdown in Argilla.\n----------------------------\nQuestion: Could you explain how to annotate datasets in Argilla?\nAnswer: To annotate datasets in Argilla, users can manage their data annotation projects by setting up `Users`, `Workspaces`, `Datasets`, and `Records`. By deploying Argilla on the Hugging Face Hub or with `Docker`, installing the Python SDK with `pip`, and creating the first project, users can get started in just 5 minutes. The tool allows for interacting with data in a more engaging way through features like quick labeling with filters, AI feedback suggestions, and semantic search, enabling users to focus on training models and monitoring their performance effectively.\n----------------------------\n\n
"},{"location":"community/integrations/llamaindex_rag_github/#create-a-rag-system-expert-in-a-github-repository-and-log-your-predictions-in-argilla","title":"\ud83d\udd75\ud83c\udffb\u200d\u2640\ufe0f Create a RAG system expert in a GitHub repository and log your predictions in Argilla","text":"

In this tutorial, we'll show you how to create a RAG system that can answer questions about a specific GitHub repository. As an example, we will target the Argilla repository. This RAG system will target the docs of the repository, as that's where most of the natural language information about the repository can be found.

This tutorial includes the following steps:

  • Setting up the Argilla callback handler for LlamaIndex.
  • Initializing a GitHub client.
  • Creating an index with a specific set of files from the GitHub repository of our choice.
  • Creating a RAG system out of the Argilla repository, asking questions, and automatically logging the answers to Argilla.

This tutorial is based on the GitHub Repository Reader made by LlamaIndex.

"},{"location":"community/integrations/llamaindex_rag_github/#getting-started","title":"Getting started","text":""},{"location":"community/integrations/llamaindex_rag_github/#deploy-the-argilla-server","title":"Deploy the Argilla server\u00b6","text":"

If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

"},{"location":"community/integrations/llamaindex_rag_github/#set-up-the-environment","title":"Set up the environment\u00b6","text":"

To complete this tutorial, you need to install this integration and a third-party library via pip.

Note

Check the integration GitHub repository here.

"},{"location":"community/integrations/llamaindex_rag_github/#set-the-argillas-llamaindex-handler","title":"Set the Argilla's LlamaIndex handler","text":"

To easily log your data into Argilla within your LlamaIndex workflow, you only need a simple step. Just call the Argilla global handler for LlamaIndex before running queries with your LLM. This ensures that the predictions obtained using LlamaIndex are automatically logged to the Argilla instance.

  • dataset_name: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.
  • api_url: The URL to connect to the Argilla instance.
  • api_key: The API key to authenticate with the Argilla instance.
  • number_of_retrievals: The number of retrieved documents to be logged. Defaults to 0.
  • workspace_name: The name of the workspace to log the data. By default, the first available workspace.

> For more information about the credentials, check the documentation for users and workspaces.

"},{"location":"community/integrations/llamaindex_rag_github/#retrieve-the-data-from-github","title":"Retrieve the data from GitHub","text":"

First, we need to initialize the GitHub client, which will include the GitHub token for repository access.

"},{"location":"community/integrations/llamaindex_rag_github/#create-the-index-and-make-some-queries","title":"Create the index and make some queries","text":""},{"location":"getting_started/faq/","title":"FAQs","text":"What is Argilla?

Argilla is a collaboration tool for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency. It is designed to help you achieve and keep high-quality data standards, store your training data, store the results of your models, evaluate their performance, and improve the data through human and AI feedback.

Does Argilla cost money?

No. Argilla is an open-source project and is free to use. You can deploy Argilla on your own infrastructure or use our cloud offering.

What data types does Argilla support?

Text data, mostly. Argilla natively supports textual data; however, we do support rich text, which means you can represent different types of data in Argilla as long as you can convert them to text. For example, you can store images, audio, video, and any other type of data as long as you can convert them to their base64 representation or render them as HTML in, for example, an iframe (see the sketch below).
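
As a minimal sketch (the file name is hypothetical, and the target field must have HTML/markdown rendering enabled), converting a local image into an HTML snippet that can be stored as text looks like this:

import base64\n\n# Read a local image and embed it as a base64 data URI in an HTML tag (hypothetical file name)\nwith open(\"photo.png\", \"rb\") as f:\n    encoded = base64.b64encode(f.read()).decode(\"utf-8\")\n\nhtml_snippet = f'<img src=\"data:image/png;base64,{encoded}\" />'\n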

Does Argilla train models?

No. Argilla is a collaboration tool to achieve and keep high-quality data standards. You can use Argilla to store your training data, store the results of your models, evaluate their performance and improve the data. For training models, you can use any machine learning framework or library that you prefer even though we recommend starting with Hugging Face Transformers.

Does Argilla provide annotation workforces?

Yes, kind of. We don't provide an annotation workforce in-house, but we do have partnerships with workforce providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting here or contact us via email.

How does Argilla differ from competitors like Lilac, Snorkel, Prodigy and Scale?

Argilla distinguishes itself through its focus on specific use cases and human-in-the-loop approaches. While it does offer programmatic features, Argilla\u2019s core value lies in actively involving human experts in the tool-building process, setting it apart from other competitors.

Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within the realms of MLOps and NLP. Its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a tool within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing tool from the outset.

What is the difference between Argilla 2.0 and the legacy datasets in 1.0?

Argilla 1.0 relied on 3 main task datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These tasks were designed to be simple, easy to use, and high in functionality, but they were limited in adaptability. With the introduction of Large Language Models (LLMs) and the increasing complexity of NLP tasks, we realized that we needed to expand the capabilities of Argilla to support more advanced feedback mechanisms, which led to the introduction of the FeedbackDataset. Compared to its predecessor, it was high in adaptability but still limited in functionality. After having ported all of the functionality of the legacy tasks to the new FeedbackDataset, we decided to deprecate the legacy tasks in favor of a brand new SDK with the FeedbackDataset at its core.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/","title":"Hugging Face Spaces Settings","text":"

This section details how to configure and deploy Argilla on Hugging Face Spaces. It covers:

  • Persistent storage
  • How to deploy Argilla under a Hugging Face Organization
  • How to configure and disable HF OAuth access
  • How to use Private Spaces

Looking to get started easily?

If you just discovered Argilla and want to get started quickly, go to the Quickstart guide.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#persistent-storage","title":"Persistent storage","text":"

In the Space creation UI, persistent storage is set to Small PAID, which is a paid service, charged per hour of usage.

Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. Persistent storage enables Argilla to save your datasets and configurations to disk across restarts.

Ephemeral FREE persistent storage

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

If you plan to use the Argilla Space beyond testing, it's highly recommended to set persistent storage to Small.

If you just want to quickly test or use Argilla for a few hours with the risk of losing your datasets, choose Ephemeral FREE. Ephemeral FREE means your datasets and configuration will not be saved to disk; when the Space is restarted, your datasets, workspaces, and users will be lost.

If you want to disable the persistence storage warning, you can set the environment variable ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING=false

Read this if you have datasets and want to enable persistent storage

If you want to enable persistent storage Small PAID and you have created datasets, users, or workspaces, follow this process:

  • First, make a local or remote copy of your datasets, following the Import and Export guide. This is the most important step, because changing the settings of your Space leads to a restart and thus a data loss.
  • If you have created users (not signed in with Hugging Face login), consider storing a copy of users following the manage users guide.
  • Once you have stored all your data safely, go to your Space Settings tab and select Small.
  • Your Space will be restarted and existing data will be lost. From now on, all the new data you create in Argilla will be kept safely.
  • Recover your data by following the above-mentioned guides.
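
As a quick sketch, backing up and restoring a dataset with the SDK could look like this (to_disk/from_disk as described in the Import and Export guide; the path is illustrative):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\ndataset.to_disk(\"my_dataset_backup\")  # local copy of the dataset settings and records\n\n# After the Space restarts with persistent storage enabled:\nrestored = rg.Dataset.from_disk(\"my_dataset_backup\", client=client)\n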
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-configure-and-disable-oauth-access","title":"How to configure and disable OAuth access","text":"

By default, Argilla Spaces are configured with Hugging Face OAuth, in the following way:

  • Any Hugging Face user who can see your Space can use the Sign in button, join as an annotator, and contribute to the datasets available under the argilla workspace. This workspace is created during the deployment process.
  • These users can only explore and annotate datasets in the argilla workspace, but they can't perform any critical operations like creating, deleting, updating, or configuring datasets. By default, any other workspace you create won't be visible to these users.

To restrict access or change the default behaviour, there are two options:

Set your Space to private. This is especially useful if your Space is under an organization. This will only allow members within your organization to see and join your Argilla space. It can also be used for personal, solo projects.

Modify the .oauth.yml configuration file. You can find and modify this file under the Files tab of your Space. The default file looks like this:

# Change to `false` to disable HF oauth integration\n#enabled: false\n\nproviders:\n  - name: huggingface\n\n# Allowed workspaces must exists\nallowed_workspaces:\n  - name: argilla\n
You can modify two things:

  • Uncomment enabled: false to completely disable the Sign in with Hugging Face button (see the sketch below). If you disable it, make sure to set the USERNAME and PASSWORD Space secrets to be able to log in as an owner.
  • Change the list of allowed workspaces.

For example, if you want to let users join a new workspace community-initiative:

allowed_workspaces:\n  - name: argilla\n  - name: community-initiative\n
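
And, as a sketch, this is what the file could look like with the Hugging Face OAuth integration disabled (remember to also set the USERNAME and PASSWORD secrets):

# HF oauth integration disabled\nenabled: false\n\nproviders:\n  - name: huggingface\n\nallowed_workspaces:\n  - name: argilla\n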
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-deploy-argilla-under-a-hugging-face-organization","title":"How to deploy Argilla under a Hugging Face Organization","text":"

Creating an Argilla Space within an organization is useful for several scenarios:

  • You want to only enable members of your organization to join your Space. You can achieve this by setting your Space to private.
  • You want to manage the Space together with other users (e.g., Space settings, etc.). Note that if you just want to manage your Argilla datasets and workspaces, you can achieve this by granting the Argilla owner role to other users on your Argilla Server (see the sketch after this list).
  • More generally, you want to make your Space available under an organization/community umbrella.
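
As a sketch, granting the owner role to another user with the SDK could look like this (the username and password are illustrative; see the manage users guide for the details):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = rg.User(\n    username=\"another-teammate\",  # illustrative username\n    password=\"<secure-password>\",\n    role=\"owner\",\n)\nuser.create()\n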

The steps are very similar to the Quickstart guide, with two important differences:

Setup USERNAME

You need to set up the USERNAME Space Secret with your Hugging Face username. This way, the first time you enter with the Hugging Face Sign in button, you'll be granted the owner role.

Enable Persistent Storage SMALL

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

For Argilla Spaces with many users, it's strongly recommended to set persistent storage to Small.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-use-private-spaces","title":"How to use Private Spaces","text":"

Setting your Space visibility to private can be useful if:

  • You want to work on your personal, solo project.
  • You want your Argilla to be available only to members of the organization where you deploy the Argilla Space.

You can set the visibility of the Space during the Space creation process or afterwards under the Settings Tab.

To use the Python SDK with private Spaces, you need to specify your HF_TOKEN, which can be found here, when creating the client:

import argilla as rg\n\nHF_TOKEN = \"...\"\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\",\n    headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#space-secrets-overview","title":"Space Secrets overview","text":"

There are two optional secrets to set up the USERNAME and PASSWORD of the owner of the Argilla Space. Remember that, by default, Argilla Spaces are configured with a Sign in with Hugging Face button, which is also used to grant the owner role to the creator of personal Spaces.

The USERNAME and PASSWORD are only useful in a couple of scenarios:

  • You have disabled Hugging Face OAuth.
  • You want to set up Argilla under an organization and want your Hugging Face username to be granted the owner role.

In summary, when setting up a Space:

Creating a Space under your personal account

If you are creating the Space under your personal account, don't insert any value for USERNAME and PASSWORD. Once you launch the Space, you will be able to sign in with your Hugging Face account and be granted the owner role.

Creating a Space under an organization

If you are creating the Space under an organization, make sure to insert your Hugging Face username in the secret USERNAME. This way, you'll be able to sign in with your Hugging Face account.

"},{"location":"getting_started/how-to-deploy-argilla-with-docker/","title":"Deploy with Docker","text":"

This guide describes how to deploy the Argilla Server with docker compose. This is useful if you want to deploy Argilla locally and/or have full control over the configuration of the server, database, and search engine (Elasticsearch).

First, you need to install docker on your machine and make sure you can run docker compose.

Then, create a folder (you can modify the folder name):

mkdir argilla && cd argilla\n

Download docker-compose.yaml:

wget -O docker-compose.yaml https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml\n

or using curl:

curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml\n

Run the following command to deploy the server on http://localhost:6900:

docker compose up -d\n

Once it is completed, open http://localhost:6900 in your browser and you should see the Argilla login page.

If it's not available, check the logs:

docker compose logs -f\n

Most deployment issues are related to Elasticsearch. If you need support, join the Hugging Face Discord server and ask in the Argilla channel.

"},{"location":"getting_started/quickstart/","title":"Quickstart","text":"

Argilla is a free, open-source, self-hosted tool. This means you need to deploy its UI to start using it. There are two main ways to deploy Argilla:

Deploy on the Hugging Face Hub

The recommended choice to get started. You can get up and running in under 5 minutes and don't need to maintain a server or run any commands.

If you're just getting started with Argilla, click the deploy button below:

You can use the default values following these steps:

  • Leave the default Space owner (your personal account)
  • Leave USERNAME and PASSWORD secrets empty since you'll sign in with your HF user as the Argilla Space owner.
  • Click Create Space to launch Argilla \ud83d\ude80.
  • Once you see the Argilla UI, go to the Sign in into the Argilla UI section. If you see the Building message for longer than 2-3 minutes, refresh the page.

Persistent storage SMALL

Not setting persistent storage to SMALL means that you will lose your data when the Space restarts. Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. If you want to use the Space just for testing, you can use FREE temporarily.

If you want to deploy Argilla within a Hugging Face organization, setup a more stable Space, or understand the settings, check out the HF Spaces settings guide.

Deploy with Docker

If you want to run Argilla locally on your machine or a server, or tune the server configuration, choose this option and check this guide.

"},{"location":"getting_started/quickstart/#sign-in-into-the-argilla-ui","title":"Sign in into the Argilla UI","text":"

If everything went well, you should see the Argilla sign in page that looks like this:

Building errors

If you get a build error, sometimes restarting the Space from the Settings page works, otherwise check the HF Spaces settings guide.

In the sign in page:

  1. Click on Sign in with Hugging Face
  2. Authorize the application and you will be logged in to Argilla as an owner.

Unauthorized error

Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button solves the issue.

Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.

"},{"location":"getting_started/quickstart/#install-the-python-sdk","title":"Install the Python SDK","text":"

To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:

pip install argilla\n
"},{"location":"getting_started/quickstart/#create-your-first-dataset","title":"Create your first dataset","text":"

To get started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab.

To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

  • The <api_key> is in the My Settings page of your Argilla Space.

  • The <api_url> is the URL shown in your browser if it ends with *.hf.space.

import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\"\n)\n

You can't find your API URL

If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI, so the URL in the browser won't match the API URL. In these scenarios, there are two options: 1. Click on the three dots menu at the top of the Space, select \"Embed this Space\", and open the direct URL. 2. Use this pattern: https://[your-owner-name]-[your_space_name].hf.space.

To create a dataset with a simple text classification task, first, you need to define the dataset settings.

settings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"my_label\",\n            title=\"In which category does this article fit?\",\n            labels=[\"positive\", \"negative\"],\n        )\n    ],\n)\n

Now you can create the dataset with these settings. Publish the dataset to make it available in the UI and add the records.

About workspaces

Workspaces in Argilla group datasets and user access rights. The workspace parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace argilla.

By default, this workspace will be visible to users joining with the Sign in with Hugging Face button. You can create other workspaces and decide to grant access to users either with the SDK or by changing the OAuth configuration.

dataset = rg.Dataset(\n    name=\"my_first_dataset\",\n    settings=settings,\n    client=client,\n    #workspace=\"argilla\"\n)\ndataset.create()\n

Now you can add records to your dataset. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The mapping parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.

from datasets import load_dataset\n\ndata = load_dataset(\"imdb\", split=\"train[:100]\").to_list()\n\ndataset.records.log(records=data, mapping={\"text\": \"review\"})\n

\ud83c\udf89 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.

"},{"location":"getting_started/quickstart/#next-steps","title":"Next steps","text":"
  • To learn how to create your datasets, workspace, and manage users, check the how-to guides.

  • To learn Argilla with hands-on examples, check the Tutorials section.

  • To further configure your Argilla Space, check the Hugging Face Spaces settings guide.

"},{"location":"how_to_guides/","title":"How-to guides","text":"

These guides provide step-by-step instructions for common scenarios, including detailed explanations and code samples. They are divided into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.

"},{"location":"how_to_guides/#basic","title":"Basic","text":"
  • Manage users and credentials

    Learn what they are and how to manage (create, read and delete) Users in Argilla.

    How-to guide

  • Manage workspaces

    Learn what they are and how to manage (create, read and delete) Workspaces in Argilla.

    How-to guide

  • Create, update, and delete datasets

    Learn what they are and how to manage (create, read and delete) Datasets and customize them using the Settings for Fields, Questions, Metadata and Vectors.

    How-to guide

  • Add, update, and delete records

    Learn what they are and how to add, update and delete the values for a Record, which are made up of Metadata, Vectors, Suggestions and Responses.

    How-to guide

  • Distribute the annotation

    Learn how to use Argilla's automatic TaskDistribution to annotate as a team efficiently.

    How-to guide

  • Annotate a dataset

    Learn how to use the Argilla UI to navigate Datasets and submit Responses.

    How-to guide

  • Query and filter a dataset

    Learn how to query and filter a Dataset.

    How-to guide

  • Import and export datasets and records

    Learn how to export your Dataset or its Records to Python, your local disk, or the Hugging Face Hub.

    How-to guide

"},{"location":"how_to_guides/#advanced","title":"Advanced","text":"
  • Custom fields with layout templates

    Learn how to create CustomFields with HTML, CSS and JavaScript templates.

    How-to guide

  • Use Markdown to format rich content

    Learn how to use Markdown and HTML in TextField to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

    How-to guide

  • Migrate to Argilla V2

    Learn how to migrate Users, Workspaces and Datasets from Argilla V1 to V2.

    How-to guide

"},{"location":"how_to_guides/annotate/","title":"Annotate your dataset","text":"

To experience the UI features firsthand, you can take a look at the Demo \u2197.

Argilla UI offers many functions to help you manage your annotation workflow, aiming to provide the most flexible approach to fit the wide variety of use cases handled by the community.

"},{"location":"how_to_guides/annotate/#annotation-interface-overview","title":"Annotation interface overview","text":""},{"location":"how_to_guides/annotate/#flexible-layout","title":"Flexible layout","text":"

The UI is responsive with two columns for larger devices and one column for smaller devices. This enables you to annotate data using your mobile phone for simple datasets (i.e., not very long text and 1-2 questions) or resize your screen to get a more compact UI.

The UI is organized into the following areas:

  • Header: at the right side of the navigation breadcrumb, you can customize the dataset settings and edit your profile.

  • Left pane: this area displays the control panel on the top. The control panel is used for performing keyword-based search, applying filters, and sorting the results. Below the control panel, the record card(s) are displayed one by one (Focus view) or in a vertical list (Bulk view).

  • Right pane: this is where you annotate your dataset. Simply fill it out as a form, then choose to Submit, Save as Draft, or Discard.

  • Left bottom panel: this expandable area displays the annotation guidelines. The annotation guidelines can be edited by owner and admin roles in the dataset settings.

  • Right bottom panel: this expandable area displays your annotation progress.

"},{"location":"how_to_guides/annotate/#shortcuts","title":"Shortcuts","text":"

The Argilla UI includes a range of shortcuts. For the main actions (submit, discard, save as draft, and selecting labels), the keys are shown on the corresponding button.

To learn how to move from one question to another or between records using the keyboard, take a look at the table below.

Shortcuts provide a smoother annotation experience, especially with datasets using a single question (Label, MultiLabel, Rating, or Ranking).

Available shortcuts:

  • Activate form: ⇥ Tab
  • Move between questions: ↓ Down arrow or ↑ Up arrow
  • Select and unselect label: 1, 2, 3
  • Move between labels or ranking options: ⇥ Tab or ⇧ Shift ⇥ Tab
  • Select rating and rank: 1, 2, 3
  • Fit span to character selection: Hold ⇧ Shift
  • Activate text area: ⇧ Shift ↵ Enter
  • Exit text area: Esc
  • Discard: ⌫ Backspace
  • Save draft (macOS): ⌘ Cmd S
  • Save draft (Other): Ctrl S
  • Submit: ↵ Enter
  • Move between pages: → Right arrow or ← Left arrow

"},{"location":"how_to_guides/annotate/#view-by-status","title":"View by status","text":"

The view selector is set by default on Pending.

If you are starting an annotation effort, all the records are initially kept in the Pending view. Once you start annotating, the records will move to the other queues: Draft, Submitted, Discarded.

  • Pending: The records without a response.
  • Draft: The records with partial responses. They can be submitted or discarded later. You can\u2019t move them back to the pending queue.
  • Discarded: The records may or may not have responses. They can be edited but you can\u2019t move them back to the pending queue.
  • Submitted: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue.

Note

If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed.

Tip

If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response.

"},{"location":"how_to_guides/annotate/#suggestions","title":"Suggestions","text":"

If your dataset includes model predictions, you will see them represented by a sparkle icon \u2728 in the label or value button. We call them \u201cSuggestions\u201d and they appear in the form as pre-filled responses. If confidence scores have been included by the dataset admin, they will be shown alongside the label. Additionally, admins can choose to always show suggested labels at the beginning of the list. This can be configured from the dataset settings.

If you agree with the suggestions, you just need to click on the Submit button, and they will be considered as your response. If the suggestion is incorrect, you can modify it and submit your final response.

"},{"location":"how_to_guides/annotate/#focus-view","title":"Focus view","text":"

This is the default view to annotate your dataset linearly, displaying one record after another.

Tip

You should use this view if you have a large number of required questions or need a strong focus on the record content to be labelled. This is also the recommended view for annotating a dataset sample to avoid potential biases introduced by using filters, search, sorting and bulk labelling.

Once you submit your first response, the next record will appear automatically. To see your submitted response again, just click on Prev.

Navigating through the records

To navigate through the records, you can use the\u00a0Prev, shown as\u00a0<, and\u00a0Next,\u00a0> buttons on top of the record card.

Each time the page is fully refreshed, the records with modified statuses (Pending to Discarded, Pending to Save as Draft, Pending to Submitted) are sent to the corresponding queue. The control panel displays the status selector, which is set to Pending by default.

"},{"location":"how_to_guides/annotate/#bulk-view","title":"Bulk view","text":"

The bulk view is designed to speed up the annotation and get a quick overview of the whole dataset.

The bulk view displays the records in a vertical list. Once this view is active, some functions from the control panel will activate to optimize the view. You can define the number of records to display per page (10, 25, 50, or 100) and whether records are shown with a fixed height (Collapse records) or their natural height (Expand records).

Tip

You should use this view to quickly explore a dataset. This view is also recommended if you have a good understanding of the domain and want to apply your knowledge based on things like similarity and keyword search, filters, and suggestion score thresholds. For datasets with a large number of required questions or very long fields, the focus view would be more suitable.

With multiple questions, consider using the bulk view to annotate one question in bulk. Then, you can complete the annotation per record from the draft queue.

Note

Please note that suggestions are not shown in bulk view (except for Spans) and that you will need to save as a draft when you are not providing responses to all required questions.

"},{"location":"how_to_guides/annotate/#annotation-progress","title":"Annotation progress","text":"

You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed.

You can also track your own progress in real time expanding the right-bottom panel inside the dataset page. There you can see the number of records for which you have Pending,\u00a0Draft,\u00a0Submitted\u00a0and\u00a0Discarded responses.

Note

You can also explore the dataset progress from the SDK. Check the Track your team's progress to know more about it.
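
For instance, a minimal sketch of retrieving progress with the SDK (method names as in the Track your team's progress guide):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\n# overall progress, optionally broken down per user\nprogress = dataset.progress(with_users_distribution=True)\nprint(progress)\n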

"},{"location":"how_to_guides/annotate/#use-search-filters-and-sort","title":"Use search, filters, and sort","text":"

The UI offers various features designed for data exploration and understanding. Combining these features with bulk labelling can save you and your team hours of time.

Tip

You should use this when you are familiar with your data and have large volumes to annotate based on verified beliefs and experience.

"},{"location":"how_to_guides/annotate/#search","title":"Search","text":"

From the control panel at the top of the left pane, you can search by keyword across the entire dataset. If you have more than one field in your records, you may specify whether the search is to be performed on \u201cAll\u201d fields or on a specific one. Matched results are highlighted in color.

Note

If you introduce more than one keyword, the search will return results where all keywords have a match.

Tip

For more advanced searches, take a look at the advanced queries DSL.

"},{"location":"how_to_guides/annotate/#order-by-record-semantic-similarity","title":"Order by record semantic similarity","text":"

You can retrieve records based on their similarity to another record if vectors have been added to the dataset.

Note

Check these guides to know how to add vectors to your\u00a0dataset and\u00a0records.

To use the search by semantic similarity function, click on Find similar within the record you wish to use as a reference. If multiple vectors are available, select the desired vector. You can also choose whether to retrieve the most or least similar records.

The retrieved records are then ordered by similarity, with the similarity score displayed on each record card.

While the semantic search is active, you can update the selected vector or adjust the order of similarity, and specify the number of desired results.

To cancel the search, click on the cross icon next to the reference record.

"},{"location":"how_to_guides/annotate/#filter-and-sort-by-metadata-responses-and-suggestions","title":"Filter and sort by metadata, responses, and suggestions","text":""},{"location":"how_to_guides/annotate/#filter","title":"Filter","text":"

If the dataset contains metadata, responses and suggestions, click on\u00a0Filter in the control panel to display the available filters. You can select multiple filters and combine them.

Note

Record info including metadata is visible from the ellipsis menu in the record card.

From the Metadata dropdown, type and select the property. You can set a range for integer and float properties, and select specific values for term metadata.

Note

If a metadata property was set to visible_for_annotators=False, it will only appear in the metadata filter for users with the admin or owner role.

From the Responses dropdown, type and select the question. You can set a range for rating questions and select specific values for label, multi-label, and span questions.

Note

The text and ranking questions are not available for filtering.

From the Suggestions dropdown, filter the suggestions by Suggestion values, Score, or Agent.

"},{"location":"how_to_guides/annotate/#sort","title":"Sort","text":"

You can sort your records according to one or several attributes.

The insertion time and last update are available for all records.

The suggestion scores, the response and suggestion values for rating questions, and the metadata properties are available for sorting only when they were provided.

"},{"location":"how_to_guides/custom_fields/","title":"Custom fields with layout templates","text":"

This guide demonstrates how to create custom fields in Argilla using HTML, CSS, and JavaScript templates.

Main Class

rg.CustomField(\n    name=\"custom\",\n    title=\"Custom\",\n    template=\"<div>{{record.fields.custom.key}}</div>\",\n    advanced_mode=False,\n    required=True,\n    description=\"Field description\",\n)\n

Check the CustomField - Python Reference to see the attributes, arguments, and methods of the CustomField class in detail.

"},{"location":"how_to_guides/custom_fields/#understanding-the-record-object","title":"Understanding the Record Object","text":"

The record object is the main JavaScript object that contains all the information about the Argilla record in the UI, like fields, metadata, etc. Your template can use this object to display record information within the custom field. For example, you can access the fields of the record by navigating to record.fields.<field_name>, and this generally works the same for metadata, responses, etc.
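
For instance, a minimal template (a sketch assuming your dataset has a text field and a source metadata property) could reference the record object like this:

<div>\n    <p>{{record.fields.text}}</p>\n    <p>Source: {{record.metadata.source}}</p>\n</div>\n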

"},{"location":"how_to_guides/custom_fields/#using-handlebars-in-your-template","title":"Using Handlebars in your template","text":"

By default, custom fields use the handlebars syntax engine to render templates with record information. This engine converts the content inside double curly brackets {{}} to the corresponding values of the record object that you reference within your template. As described in the Understanding the Record Object section, you can access the fields of the record by navigating to {{record.fields.<field_name>}}. For more complex use cases, handlebars has various expressions, partials, and helpers that you can use to render your data. You can deactivate the handlebars engine with the advanced_mode=True parameter in CustomField; you will then need to define custom JavaScript to access the record attributes, as described in the Advanced Mode section.

"},{"location":"how_to_guides/custom_fields/#usage-example","title":"Usage example","text":"

Thanks to the handlebars syntax engine, we only need to pass the HTML and, optionally, some CSS between <style> tags.

css_template = \"\"\"\n<style>\n#container {\n    display: flex;\n    gap: 10px;\n}\n.column {\n    flex: 1;\n}\n</style>\n\"\"\" # (1)\n\nhtml_template = \"\"\"\n<div id=\"container\">\n    <div class=\"column\">\n        <h3>Original</h3>\n        <img src=\"{{record.fields.image.original}}\" />\n    </div>\n    <div class=\"column\">\n        <h3>Revision</h3>\n        <img src=\"{{record.fields.image.revision}}\" />\n    </div>\n</div>\n\"\"\" # (2)\n
  1. This is a CSS template, which ensures that the container and columns are styled.
  2. This is an HTML template, which creates a container with two columns and injects the value corresponding to the key of the image field into it.

We can now pass these templates to the CustomField class.

import argilla as rg\n\ncustom_field = rg.CustomField(\n    name=\"image\",\n    template=css_template + html_template,\n)\n\nsettings = rg.Settings(\n    fields=[custom_field],\n    questions=[rg.TextQuestion(name=\"response\")],\n)\n\ndataset = rg.Dataset(\n    name=\"custom_field_dataset\",\n    settings=settings,\n).create()\n\ndataset.records.log([\n    rg.Record(\n        fields={\n            \"image\": {\n                \"original\": \"https://argilla.io/brand-assets/argilla/argilla-logo-color-black.png\",\n                \"revision\": \"https://argilla.io/brand-assets/argilla/argilla-logo-black.png\",\n            }\n        }\n    )]\n)\n

The result will be the following:

"},{"location":"how_to_guides/custom_fields/#example-gallery","title":"Example Gallery","text":"Metadata in a table

You can make it easier to read metadata by displaying it in a table. This uses handlebars to iterate over the metadata object and display each key-value pair in a row.

template = \"\"\"\n<style>\n    .container {\n        border: 1px solid #ddd;\n        font-family: sans-serif;\n    }\n    .row {\n        display: flex;\n        border-bottom: 1px solid #ddd;\n    }\n    .row:last-child {\n        border-bottom: none;\n    }\n    .column {\n        flex: 1;\n        padding: 8px;\n    }\n    .column:first-child {\n        border-right: 1px solid #ddd;\n    }\n</style>\n<div class=\"container\">\n    <div class=\"header\">\n        <div class=\"column\">Metadata</div>\n        <div class=\"column\">Value</div>\n    </div>\n    {{#each record.metadata}}\n    <div class=\"row\">\n        <div class=\"column\">{{@key}}</div>\n        <div class=\"column\">{{this}}</div>\n    </div>\n    {{/each}}\n</div>\n\"\"\"\nrecord = rg.Record(\n    fields={\"text\": \"hello\"},\n    metadata={\n        \"name\": \"John Doe\",\n        \"age\": 25,\n    }\n)\n

JSON viewer

The value of a custom field is a dictionary in Python and a JavaScript object in the browser. You can render this object as a JSON string using the json helper, which is implemented in Argilla's frontend for convenience. If you want to learn more about handlebars helpers, you can check the handlebars documentation.

template = \"{{ json record.fields.user_profile }}\"\n\nrecord = rg.Record(\n    fields={\n        \"user_profile\": {\n            \"name\": \"John Doe\",\n            \"age\": 30,\n            \"address\": \"123 Main St\",\n            \"email\": \"john.doe@hooli.com\",\n        }\n    },\n)\n
"},{"location":"how_to_guides/custom_fields/#advanced-mode","title":"Advanced Mode","text":"

When advanced_mode=True, you can use the template argument to pass a full HTML page. This allows for more complex customizations, including the use of JavaScript. The record object will be available in the global scope, so you can access it in your JavaScript code as described in the Understanding the Record Object section.

"},{"location":"how_to_guides/custom_fields/#usage-example_1","title":"Usage example","text":"

Let's reproduce the example from the Without advanced mode section, but this time we will insert the handlebars syntax engine into the template ourselves.

template = \"\"\"\n<div id=\"custom-field-container\"></div>\n<script id=\"template\" type=\"text/x-handlebars-template\">\n    <div id=\"container\">\n        <div class=\"column\">\n            <h3>Original</h3>\n            <img src=\"{{record.fields.image.original}}\" />\n        </div>\n        <div class=\"column\">\n            <h3>Revision</h3>\n            <img src=\"{{record.fields.image.revision}}\" />\n        </div>\n    </div>\n</script>\n\"\"\" # (1)\n\nscript = \"\"\"\n<script src=\"https://cdn.jsdelivr.net/npm/handlebars@latest/dist/handlebars.js\"></script>\n<script>\n    const template = document.getElementById(\"template\").innerHTML;\n    const compiledTemplate = Handlebars.compile(template);\n    const html = compiledTemplate({ record });\n    document.getElementById(\"custom-field-container\").innerHTML = html;\n</script>\n\"\"\" # (2)\n
  1. This is the Handlebars template. We reuse the CSS from the previous example so the container and columns are styled, set the script's id to template so we can use it later in our JavaScript code, and set its type to text/x-handlebars-template to indicate that this is a Handlebars template. Note that we also added a div with id custom-field-container to render the template into.
  2. This is the JavaScript code. We load the Handlebars library and then use it to compile the template and render the record. Finally, we render the result into the div with id custom-field-container.

We can now pass these templates to the CustomField class, ensuring that the advanced_mode is set to True.

import argilla as rg\n\ncustom_field = rg.CustomField(\n    name=\"image\",\n    template=template + script,\n    advanced_mode=True\n)\n

Besides the new CustomField code above, reusing the same approach as in the Using Handlebars in your template section to create a dataset and log a record will yield the same result.

"},{"location":"how_to_guides/custom_fields/#example-gallery_1","title":"Example Gallery","text":"3D object viewer

We will now use native JavaScript and three.js to create a 3D object viewer. We will then use the record object directly to insert URLs from the record's fields.

template = \"\"\"\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js\"></script>\n<script src=\"https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/loaders/GLTFLoader.js\"></script>\n<script src=\"https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/controls/OrbitControls.js\"></script>\n\n\n<div style=\"display: flex;\">\n    <div>\n        <h3>Option A</h3>\n        <canvas id=\"canvas1\" width=\"400\" height=\"400\"></canvas>\n    </div>\n    <div>\n        <h3>Option B</h3>\n        <canvas id=\"canvas2\" width=\"400\" height=\"400\"></canvas>\n    </div>\n</div>\n\n<script>\n    function init(canvasId, modelUrl) {\n    let scene, camera, renderer, controls;\n\n    const canvas = document.getElementById(canvasId);\n    scene = new THREE.Scene();\n    camera = new THREE.PerspectiveCamera(75, 1, 0.1, 1000);\n    renderer = new THREE.WebGLRenderer({ canvas, alpha: true });\n\n    renderer.setSize(canvas.clientWidth, canvas.clientHeight);\n\n    const directionalLight = new THREE.DirectionalLight(0xffffff, 1);\n    directionalLight.position.set(2, 2, 5);\n    scene.add(directionalLight);\n\n    const ambientLight = new THREE.AmbientLight(0x404040, 7);\n    scene.add(ambientLight);\n\n    controls = new THREE.OrbitControls(camera, renderer.domElement);\n    controls.maxPolarAngle = Math.PI / 2;\n\n    const loader = new THREE.GLTFLoader();\n    loader.load(\n        modelUrl,\n        function (gltf) {\n        const model = gltf.scene;\n        scene.add(model);\n        model.position.set(0, 0, 0);\n\n        const box = new THREE.Box3().setFromObject(model);\n        const center = box.getCenter(new THREE.Vector3());\n        model.position.sub(center);\n        camera.position.set(center.x, center.y, center.z + 1.2);\n\n        animate();\n        },\n        undefined,\n        function (error) {\n        console.error(error);\n        }\n    );\n\n    function animate() {\n        requestAnimationFrame(animate);\n        controls.update();\n        renderer.render(scene, camera);\n    }\n    }\n\n    init(\"canvas1\", record.fields.object.option_a);\n    init(\"canvas2\", record.fields.object.option_b);\n</script>\n\n\"\"\"\n

Next, we will create a record with two URLs to 3D objects from the 3d-arena dataset.

record = rg.Record(\n    fields={\n        \"object\": {\n            \"option_a\": \"https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/outputs/Strawb3rry/a_bookshelf_with_ten_books_stacked_vertically.glb\",\n            \"option_b\": \"https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/outputs/MeshFormer/a_bookshelf_with_ten_books_stacked_vertically.glb\",\n        }\n    }\n)\n

"},{"location":"how_to_guides/custom_fields/#updating-templates","title":"Updating templates","text":"

As described in the dataset guide, you can update certain settings attributes for a published dataset. This includes the custom field templates, which is a useful feature when you want to iterate on the template of a custom field without the need to create a new dataset. The following example shows how to update the template of a custom field.

dataset.settings.fields[\"custom\"].template = \"<new-template>\"\ndataset.update()\n
"},{"location":"how_to_guides/dataset/","title":"Dataset management","text":"

This guide provides an overview of datasets, explaining the basics of how to set them up and manage them in Argilla.

A dataset is a collection of records that you can configure for labelers to provide feedback using the UI. Depending on the specific requirements of your task, you may need various types of feedback. You can customize the dataset to include different kinds of questions, so the first step will be to define the aim of your project and the kind of data and feedback you will need. With this information, you can start configuring a dataset by defining fields, questions, metadata, vectors, and guidelines through settings.

Question: Who can manage datasets?

Only users with the owner role can manage (create, retrieve, update and delete) all the datasets.

Users with the admin role can manage (create, retrieve, update and delete) the datasets in the workspaces they have access to.

Main Classes

The two main classes are rg.Dataset and rg.Settings:
rg.Dataset(\n    name=\"name\",\n    workspace=\"workspace\",\n    settings=settings,\n    client=client\n)\n

Check the Dataset - Python Reference to see the attributes, arguments, and methods of the Dataset class in detail.

rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        )\n    ],\n    metadata=[rg.TermsMetadataProperty(name=\"metadata\")],\n    vectors=[rg.VectorField(name=\"vector\", dimensions=10)],\n    guidelines=\"guidelines\",\n    allow_extra_metadata=True,\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n

Check the Settings - Python Reference to see the attributes, arguments, and methods of the Settings class in detail.

"},{"location":"how_to_guides/dataset/#create-a-dataset","title":"Create a dataset","text":"

To create a dataset, you can define it in the Dataset class and then call the create method that will send the dataset to the server so that it can be visualized in the UI. If the dataset does not appear in the UI, you may need to click the refresh button to update the view. For further configuration of the dataset, you can refer to the settings section.

Info

If you have deployed Argilla with Hugging Face Spaces and HF Sign in, you can use argilla as a workspace name. Otherwise, you might need to create a workspace following this guide.
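
A minimal sketch of creating a workspace with the SDK (see the workspaces guide for the full details):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = rg.Workspace(name=\"my_workspace\")\nworkspace.create()\n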

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    settings=settings,\n)\n\ndataset.create()\n

The created dataset will be empty, to add records go to this how-to guide.

Accessing attributes

Access the attributes of a dataset by calling them directly on the dataset object, for example, dataset.id, dataset.name, or dataset.settings. You can similarly access the fields, questions, metadata, vectors, and guidelines, for instance, dataset.fields or dataset.questions.
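
For example (a sketch, assuming a dataset retrieved with the client as in the other snippets):

dataset = client.datasets(\"my_dataset\")\n\nprint(dataset.id)\nprint(dataset.name)\nprint(dataset.settings)\nprint(dataset.fields)\nprint(dataset.questions)\n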

"},{"location":"how_to_guides/dataset/#create-multiple-datasets-with-the-same-settings","title":"Create multiple datasets with the same settings","text":"

To create multiple datasets with the same settings, define the settings once and pass them to each dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[rg.TextField(name=\"text\", use_markdown=True)],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=[\"label_1\", \"label_2\", \"label_3\"])\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3),\n)\n\ndataset1 = rg.Dataset(name=\"my_dataset_1\", settings=settings)\ndataset2 = rg.Dataset(name=\"my_dataset_2\", settings=settings)\n\n# Create the datasets on the server\ndataset1.create()\ndataset2.create()\n
"},{"location":"how_to_guides/dataset/#create-a-dataset-from-an-existing-dataset","title":"Create a dataset from an existing dataset","text":"

To create a new dataset from an existing dataset, get the settings from the existing dataset and pass them to the new dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nexisting_dataset = client.datasets(\"my_dataset\")\n\nnew_dataset = rg.Dataset(name=\"my_dataset_copy\", settings=existing_dataset.settings)\n\nnew_dataset.create()\n

Info

You can also copy the records from the original dataset to the new one:

records = list(existing_dataset.records)\nnew_dataset.records.log(records)\n
"},{"location":"how_to_guides/dataset/#define-dataset-settings","title":"Define dataset settings","text":"

Tip

Instead of defining your own custom settings, you can use some of our pre-built templates for text classification, ranking and rating. Learn more here.

"},{"location":"how_to_guides/dataset/#fields","title":"Fields","text":"

The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla supports plain text and markdown through the TextField, images through the ImageField, chat formatted data through the ChatField and full custom templates through our CustomField.

Note

The order of the fields in the UI follows the order in which these are added to the fields attribute in the Python SDK.

Check the Field - Python Reference to see the field classes in detail.

The available field types are Text, Image, Chat, and Custom:

rg.TextField(\n    name=\"text\",\n    title=\"Text\",\n    use_markdown=False,\n    required=True,\n    description=\"Field description\",\n)\n

rg.ImageField(\n    name=\"image\",\n    title=\"Image\",\n    required=True,\n    description=\"Field description\",\n)\n

rg.ChatField(\n    name=\"chat\",\n    title=\"Chat\",\n    use_markdown=True,\n    required=True,\n    description=\"Field description\",\n)\n

A CustomField allows you to use a custom template for the field. This is useful if you want to use a custom UI for the field. You can use the template argument to pass a string that will be rendered as the field's UI.

By default, advanced_mode=False, which uses a brackets syntax engine for the templates. This engine converts {{record.fields.field.key}} to the values of the record's field object. You can also use advanced_mode=True, which deactivates the brackets syntax engine and allows you to add custom JavaScript to your template to render the field.

rg.CustomField(\n    name=\"custom\",\n    title=\"Custom\",\n    template=\"<div>{{record.fields.custom.key}}</div>\",\n    advanced_mode=False,\n    required=True,\n    description=\"Field description\",\n)\n

Tip

To learn more about how to create custom fields with HTML and CSS templates, check this how-to guide.

"},{"location":"how_to_guides/dataset/#questions","title":"Questions","text":"

To collect feedback for your dataset, you need to formulate questions that annotators will be asked to answer.

Check the Questions - Python Reference to see the question classes in detail.

The available question types are Label, Multi-label, Ranking, Rating, Span, and Text:

A LabelQuestion asks annotators to choose a unique label from a list of options. This type is useful for text classification tasks. In the UI, they will have a rounded shape.

rg.LabelQuestion(\n    name=\"label\",\n    labels={\"YES\": \"Yes\", \"NO\": \"No\"}, # or [\"YES\", \"NO\"]\n    title=\"Is the response relevant for the given prompt?\",\n    description=\"Select the one that applies.\",\n    required=True,\n    visible_labels=10\n)\n

A MultiLabelQuestion asks annotators to choose all applicable labels from a list of options. This type is useful for multi-label text classification tasks. In the UI, they will have a squared shape.

rg.MultiLabelQuestion(\n    name=\"multi_label\",\n    labels={\n        \"hate\": \"Hate Speech\",\n        \"sexual\": \"Sexual content\",\n        \"violent\": \"Violent content\",\n        \"pii\": \"Personal information\",\n        \"untruthful\": \"Untruthful info\",\n        \"not_english\": \"Not English\",\n        \"inappropriate\": \"Inappropriate content\"\n    }, # or [\"hate\", \"sexual\", \"violent\", \"pii\", \"untruthful\", \"not_english\", \"inappropriate\"]\n    title=\"Does the response include any of the following?\",\n    description=\"Select all that apply.\",\n    required=True,\n    visible_labels=10,\n    labels_order=\"natural\"\n)\n

A RankingQuestion asks annotators to order a list of options. It is useful to gather information on the preference or relevance of a set of options.

rg.RankingQuestion(\n    name=\"ranking\",\n    values={\n        \"reply-1\": \"Reply 1\",\n        \"reply-2\": \"Reply 2\",\n        \"reply-3\": \"Reply 3\"\n    }, # or [\"reply-1\", \"reply-2\", \"reply-3\"]\n    title=\"Order replies based on your preference\",\n    description=\"1 = best, 3 = worst. Ties are allowed.\",\n    required=True,\n)\n

A RatingQuestion asks annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.

rg.RatingQuestion(\n    name=\"rating\",\n    values=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n    title=\"How satisfied are you with the response?\",\n    description=\"1 = very unsatisfied, 10 = very satisfied\",\n    required=True,\n)\n

A SpanQuestion asks annotators to select a portion of the text of a specific field and apply a label to it. This type of question is useful for named entity recognition or information extraction tasks.

rg.SpanQuestion(\n    name=\"span\",\n    field=\"text\",\n    labels={\n        \"PERSON\": \"Person\",\n        \"ORG\": \"Organization\",\n        \"LOC\": \"Location\",\n        \"MISC\": \"Miscellaneous\"\n    }, # or [\"PERSON\", \"ORG\", \"LOC\", \"MISC\"]\n    title=\"Select the entities in the text\",\n    description=\"Select the entities in the text\",\n    required=True,\n    allow_overlapping=False,\n    visible_labels=10\n)\n

A TextQuestion offers annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

rg.TextQuestion(\n    name=\"text\",\n    title=\"Please provide feedback on the response\",\n    description=\"Please provide feedback on the response\",\n    required=True,\n    use_markdown=True\n)\n

"},{"location":"how_to_guides/dataset/#metadata","title":"Metadata","text":"

Metadata properties allow you to configure the use of metadata information for the filtering and sorting features available in the UI and Python SDK.

Check the Metadata - Python Reference to see the metadata classes in detail.

The available metadata property types are Terms, Integer, and Float:

A TermsMetadataProperty allows you to add a list of strings as metadata options.

rg.TermsMetadataProperty(\n    name=\"terms\",\n    options=[\"group-a\", \"group-b\", \"group-c\"],\n    title=\"Annotation groups\",\n    visible_for_annotators=True,\n)\n

An IntegerMetadataProperty allows you to add integer values as metadata.

rg.IntegerMetadataProperty(\n    name=\"integer\",\n    title=\"length-input\",\n    min=42,\n    max=1984,\n)\n

A FloatMetadataProperty allows you to add float values as metadata.

rg.FloatMetadataProperty(\n    name=\"float\",\n    title=\"Reading ease\",\n    min=-92.29914,\n    max=119.6975,\n)\n

Note

You can also set the allow_extra_metadata argument in the dataset to True to allow records to include metadata fields other than those specified under metadata. Note that these extra fields will not be accessible from the UI for any user, only retrievable using the Python SDK.
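
For example, a minimal sketch of settings that allow extra metadata:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"label_1\", \"label_2\"])],\n    metadata=[rg.TermsMetadataProperty(name=\"terms\", options=[\"group-a\", \"group-b\"])],\n    allow_extra_metadata=True,  # records may carry extra metadata keys\n)\n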

"},{"location":"how_to_guides/dataset/#vectors","title":"Vectors","text":"

To use the similarity search in the UI and the Python SDK, you will need to configure vectors using the VectorField class.

Check the Vector - Python Reference to see the VectorField class in detail.

rg.VectorField(\n    name=\"my_vector\",\n    title=\"My Vector\",\n    dimensions=768\n)\n

"},{"location":"how_to_guides/dataset/#guidelines","title":"Guidelines","text":"

Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

  • In the dataset guidelines: these are added as an argument when you create your dataset in the Python SDK and will appear in the annotation interface.

guidelines = \"In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. [...]\"\n

  • As question descriptions: these are added as an argument when you create questions in the Python SDK. This text will appear in a tooltip next to the question in the UI.

It is good practice to use at least the dataset guidelines if not both methods. Question descriptions should be short and provide context to a specific question. They can be a summary of the guidelines to that question, but often that is not sufficient to align the whole annotation team. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc.

Tip

If you want further guidance on good practices for guidelines during the project development, check our blog post.

"},{"location":"how_to_guides/dataset/#distribution","title":"Distribution","text":"

When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the\u00a0TaskDistribution settings to configure the number of minimum submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues.

Check the Task Distribution - Python Reference to see the TaskDistribution class in detail.

rg.TaskDistribution(\n    min_submitted = 2\n)\n

To learn more about how to distribute the task among team members, check the Distribute the annotation guide.

"},{"location":"how_to_guides/dataset/#list-datasets","title":"List datasets","text":"

You can list all the datasets available in a workspace using the datasets attribute of the Workspace class. You can also use len(workspace.datasets) to get the number of datasets in a workspace.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndatasets = workspace.datasets\n\nfor dataset in datasets:\n    print(dataset)\n

When you list datasets, dataset settings are not preloaded, since this can introduce extra requests to the server. If you want to work with settings when listing datasets, you need to load them:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nfor dataset in client.datasets:\n    dataset.settings.get() # this will get the dataset settings from the server\n    print(dataset.settings)\n

Notebooks

When using a notebook, executing client.datasets will display a table with the name of the existing datasets, their id, the workspace_id to which they belong, and the last update as updated_at.

"},{"location":"how_to_guides/dataset/#retrieve-a-dataset","title":"Retrieve a dataset","text":"

You can retrieve a dataset by calling the datasets method on the Argilla class and passing the name or id of the dataset as an argument. If the dataset does not exist, a warning message will be raised and None will be returned.

You can retrieve a dataset by name or by id:

By default, this method attempts to retrieve the dataset from the first workspace. If the dataset is in a different workspace, you must specify either the workspace object or the workspace name as an argument.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Retrieve the dataset from the first workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\")\n\n# Retrieve the dataset from the specified workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/dataset/#check-dataset-existence","title":"Check dataset existence","text":"

You can check if a dataset exists. The client.datasets method will return None if the dataset was not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nif dataset is not None:\n    pass\n
"},{"location":"how_to_guides/dataset/#update-a-dataset","title":"Update a dataset","text":"

Once a dataset is published, there are limited things you can update. Here is a summary of the attributes you can change for each setting:

Fields

| Attributes | From SDK | From UI |
| --- | --- | --- |
| Name | ❌ | ❌ |
| Title | ✅ | ✅ |
| Required | ❌ | ❌ |
| Use markdown | ✅ | ✅ |
| Template | ✅ | ❌ |

Questions

| Attributes | From SDK | From UI |
| --- | --- | --- |
| Name | ❌ | ❌ |
| Title | ❌ | ✅ |
| Description | ❌ | ✅ |
| Required | ❌ | ❌ |
| Labels | ❌ | ❌ |
| Values | ❌ | ❌ |
| Label order | ❌ | ✅ |
| Suggestions first | ❌ | ✅ |
| Visible labels | ❌ | ✅ |
| Field | ❌ | ❌ |
| Allow overlapping | ❌ | ❌ |
| Use markdown | ❌ | ✅ |

Metadata

| Attributes | From SDK | From UI |
| --- | --- | --- |
| Name | ❌ | ❌ |
| Title | ✅ | ✅ |
| Options | ❌ | ❌ |
| Minimum value | ❌ | ❌ |
| Maximum value | ❌ | ❌ |
| Visible for annotators | ✅ | ✅ |
| Allow extra metadata | ✅ | ✅ |

Vectors

| Attributes | From SDK | From UI |
| --- | --- | --- |
| Name | ❌ | ❌ |
| Title | ✅ | ✅ |
| Dimensions | ❌ | ❌ |

Guidelines

| From SDK | From UI |
| --- | --- |
| ✅ | ✅ |

Distribution

| Attributes | From SDK | From UI |
| --- | --- | --- |
| Minimum submitted | ✅ | ✅ |

To modify these attributes, you can simply set the new value of the attributes you wish to change and call the update method on the Dataset object.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.fields[\"text\"].use_markdown = True\ndataset.settings.metadata[\"my_metadata\"].visible_for_annotators = False\n\ndataset.update()\n

You can also add and delete metadata properties and vector fields using the add and delete methods.

To add:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors.add(rg.VectorField(name=\"my_new_vector\", dimensions=123))\ndataset.settings.metadata.add(\n    rg.TermsMetadataProperty(\n        name=\"my_new_metadata\",\n        options=[\"option_1\", \"option_2\", \"option_3\"],\n    ),\n)\ndataset.update()\n

To delete:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors[\"my_old_vector\"].delete()\ndataset.settings.metadata[\"my_old_metadata\"].delete()\n\ndataset.update()\n
"},{"location":"how_to_guides/dataset/#delete-a-dataset","title":"Delete a dataset","text":"

You can delete an existing dataset by calling the delete method on the Dataset class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset_to_delete = client.datasets(name=\"my_dataset\")\n\ndataset_deleted = dataset_to_delete.delete()\n
"},{"location":"how_to_guides/distribution/","title":"Distribute the annotation task among the team","text":"

This guide explains how you can use Argilla\u2019s automatic task distribution to efficiently divide the task of annotating a dataset among multiple team members.

Owners and admins can define the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle which records are shown in the pending queues of all users with access to the dataset.

When a record has met the minimum number of submissions, the status of the record will change to completed, and the record will be removed from the Pending queue of all team members so they can focus on providing responses where they are most needed. The dataset\u2019s annotation task will be fully completed once all records have the completed status.

Note

The status of a record can be either completed, when it has the required number of responses with submitted status, or pending, when it doesn\u2019t meet this requirement.

Each record can have multiple responses, and each of those can have the status submitted, discarded, or draft.

Main Class

rg.TaskDistribution(\n    min_submitted = 2\n)\n

Check the Task Distribution - Python Reference to see the attributes, arguments, and methods of the TaskDistribution class in detail.

"},{"location":"how_to_guides/distribution/#configure-task-distribution-settings","title":"Configure task distribution settings","text":"

By default, Argilla will set the required minimum submitted responses to 1. This means that whenever a record has at least 1 response with the status submitted, the record's status will be completed and the record will be removed from the Pending queue of other team members.

Tip

Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

If you wish to set a different number, you can do so through the distribution setting in your dataset settings:

settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n

Learn more about configuring dataset settings in the Dataset management guide.

Tip

Increase the number of minimum submissions if you\u2019d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.

Note

Note that some records may have more responses than expected if multiple team members submit responses on the same record simultaneously.

"},{"location":"how_to_guides/distribution/#change-task-distribution-settings","title":"Change task distribution settings","text":"

If you wish to change the minimum submitted responses required in a dataset, you can do so as long as the annotation hasn\u2019t started, i.e., the dataset has no responses for any records.

Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.distribution.min_submitted = 4\n\ndataset.update()\n
"},{"location":"how_to_guides/distribution/#track-your-teams-progress","title":"Track your team's progress","text":"

You can check the progress of the annotation task by using the dataset.progress method. This method returns the number of records with the status completed and pending, as well as the total number of records in the dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\nprogress = dataset.progress()\n
{\n    \"total\": 100,\n    \"completed\": 10,\n    \"pending\": 90\n}\n

You can also include the distribution by user in the progress by setting the with_users_distribution parameter to True. This will return the number of records with the status completed and pending, the total number of records in the dataset, and the number of completed submissions per user. You can visit the Annotation Progress section for more information.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\nprogress = dataset.progress(with_users_distribution=True)\n
{\n    \"total\": 100,\n    \"completed\": 50,\n    \"pending\": 50,\n    \"users\": {\n        \"user1\": {\n           \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n           \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n        },\n        \"user2\": {\n           \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n           \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n        },\n        ...\n}\n

Note

Since the completed records can contain submissions from multiple users, the number of completed submissions per user may not match the total number of completed records.
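As a usage sketch, you can derive a completion percentage from the returned dictionary, using the total and completed keys shown above:

progress = dataset.progress()\n\ncompletion_percentage = 100 * progress[\"completed\"] / progress[\"total\"]\nprint(f\"{completion_percentage:.1f}% of the records are completed\")\n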

"},{"location":"how_to_guides/import_export/","title":"Importing and exporting datasets and records","text":"

This guide provides an overview of how to import and export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

In Argilla, you can import/export two main components of a dataset:

  • The dataset's complete configuration is defined in rg.Settings. This is useful if you want to share your feedback task or restore it later in Argilla.
  • The records stored in the dataset, including Metadata, Vectors, Suggestions, and Responses. This is useful if you want to use your dataset's records outside of Argilla.

Check the Dataset - Python Reference to see the attributes, arguments, and methods for importing and exporting datasets in detail.

Main Classes

rg.Dataset.to_hub | rg.Dataset.from_hub | rg.Dataset.to_disk | rg.Dataset.from_disk | rg.Dataset.records.to_datasets() | rg.Dataset.records.to_dict() | rg.Dataset.records.to_list()
rg.Dataset.to_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    with_records=True,\n    generate_card=True\n)\n
rg.Dataset.from_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
rg.Dataset.to_disk(\n    path=\"<path-empty-directory>\",\n    with_records=True\n)\n
rg.Dataset.from_disk(\n    path=\"<path-dataset-directory>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
rg.Dataset.records.to_datasets()\n
rg.Dataset.records.to_dict()\n
rg.Dataset.records.to_list()\n


Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

"},{"location":"how_to_guides/import_export/#importing-and-exporting-datasets","title":"Importing and exporting datasets","text":"

First, we will go through exporting a complete dataset from Argilla. This includes the dataset's settings and records. All of these methods use the rg.Dataset.from_* and rg.Dataset.to_* methods.

"},{"location":"how_to_guides/import_export/#hugging-face-hub","title":"Hugging Face Hub","text":""},{"location":"how_to_guides/import_export/#export-to-hub","title":"Export to Hub","text":"

You can push a dataset from Argilla to the Hugging Face Hub. This is useful if you want to share your dataset with the community or version control it. You can push the dataset to the Hugging Face Hub using the rg.Dataset.to_hub method.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_hub(repo_id=\"<my_org>/<my_dataset>\")\n

With or without records

The example above will push the dataset's Settings and records to the hub. If you only want to push the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

dataset.to_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n
"},{"location":"how_to_guides/import_export/#import-from-hub","title":"Import from Hub","text":"

You can pull a dataset from the Hugging Face Hub to Argilla. This is useful if you want to restore a dataset and its configuration. You can pull the dataset from the Hugging Face Hub using the rg.Dataset.from_hub method.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\")\n

The rg.Dataset.from_hub method loads the configuration and records from the dataset repo. If you only want to load records, you can pass a datasets.Dataset object to the rg.Dataset.records.log method. This enables you to configure your own dataset and reuse existing Hub datasets. See the guide on records for more information.

With or without records

The example above will pull the dataset's Settings and records from the hub. If you only want to pull the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the records.

dataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n

You could then load the dataset's records using the load_dataset function from the datasets package and log them with the rg.Dataset.records.log method.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"<my_org>/<my_dataset>\")\ndataset.records.log(hf_dataset) # (1)\n
  1. You could also use the mapping parameter to map record field names to Argilla field and question names.
"},{"location":"how_to_guides/import_export/#import-settings-from-hub","title":"Import settings from Hub","text":"

When importing datasets from the hub, Argilla will load settings from the hub in three ways:

  1. If the dataset was pushed to hub by Argilla, then the settings will be loaded from the hub via the configuration file.
  2. If the dataset was loaded by another source, then Argilla will define the settings based on the dataset's features in datasets.Features. For example, creating a TextField for a text feature or a LabelQuestion for a label class.
  3. You can pass a custom rg.Settings object to the rg.Dataset.from_hub method via the settings parameter. This will override the settings loaded from the hub.
settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"answer\")]\n) # (1)\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", settings=settings)\n
  1. The settings that you pass to the rg.Dataset.from_hub method will override the settings loaded from the hub, and need to align with the dataset being loaded.
"},{"location":"how_to_guides/import_export/#local-disk","title":"Local Disk","text":""},{"location":"how_to_guides/import_export/#export-to-disk","title":"Export to Disk","text":"

You can save a dataset from Argilla to your local disk. This is useful if you want to back up your dataset. You can use the rg.Dataset.to_disk method. We recommend using an empty directory.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_disk(path=\"<path-empty-directory>\")\n

This will save the dataset's configuration and records to the specified path. If you only want to save the dataset's configuration, you can set the with_records parameter to False.

dataset.to_disk(path=\"<path-empty-directory>\", with_records=False)\n
"},{"location":"how_to_guides/import_export/#import-from-disk","title":"Import from Disk","text":"

You can load a dataset from your local disk to Argilla. This is useful if you want to restore a dataset's configuration. You can use the rg.Dataset.from_disk method.

import argilla as rg\n\ndataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\")\n

Directing the dataset to a name and workspace

You can also specify the name and workspace of the dataset when loading it from the disk.

dataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\", name=\"my_dataset\", workspace=\"my_workspace\")\n
"},{"location":"how_to_guides/import_export/#importing-and-exporting-records","title":"Importing and exporting records","text":"

The records alone can be exported from a dataset in Argilla. This is useful if you want to process the records in Python, export them to a different platform, or use them in model training. All of these methods use the rg.Dataset.records attribute.

"},{"location":"how_to_guides/import_export/#export-records","title":"Export records","text":"

The records can be exported as a dictionary, a list of dictionaries, or a Dataset object from the datasets package.

With images

If your dataset includes images, the recommended approach for exporting records is to use the to_datasets method, which exports the images as rescaled PIL objects. With other methods, the images will be exported using the data URI schema.

To a Python dictionary | To a Python list | To the datasets package

Records can be exported from Dataset.records as a dictionary using the to_dict method. You can specify the orientation of the dictionary output and decide whether to flatten it.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a dictionary\nexported_records = dataset.records.to_dict()\n# {'fields': [{'text': 'Hello'}, {'text': 'World'}], 'suggestions': [{'label': {'value': 'positive'}}, {'label': {'value': 'negative'}}]}\n\n# Export records as a dictionary with orient=index\nexported_records = dataset.records.to_dict(orient=\"index\")\n# {\"uuid1\": {'fields': {'text': 'Hello'}, 'suggestions': {'label': {'value': 'positive'}}}, \"uuid2\": {'fields': {'text': 'World'}, 'suggestions': {'label': {'value': 'negative'}}}}\n\n# Export records as a dictionary with flatten=True\nexported_records = dataset.records.to_dict(flatten=True)\n# {\"text\": [\"Hello\", \"World\"], \"label.suggestion\": [\"positive\", \"negative\"]}\n

Records can be exported from Dataset.records as a list of dictionaries using the to_list method. You can decide whether to flatten the output.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=workspace)\n\n# Export records as a list of dictionaries\nexported_records = dataset.records.to_list()\n# [{'fields': {'text': 'Hello'}, 'suggestion': {'label': {'value': 'greeting'}}}, {'fields': {'text': 'World'}, 'suggestion': {'label': {'value': 'greeting'}}}]\n\n# Export records as a list of dictionaries with flatten=True\nexported_records = dataset.records.to_list(flatten=True)\n# [{\"text\": \"Hello\", \"label\": \"greeting\"}, {\"text\": \"World\", \"label\": \"greeting\"}]\n

Records can be exported from Dataset.records to the datasets package using the to_datasets method. You can specify the name of the dataset and the split to export the records.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records to a datasets.Dataset object\nexported_dataset = dataset.records.to_datasets()\n
"},{"location":"how_to_guides/import_export/#import-records","title":"Import records","text":"

To import records to a dataset, use the rg.Dataset.records.log method. There is a guide on how to do this in How-to guides - Record, or you can check the Record - Python Reference.
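As a minimal sketch, assuming a dataset whose fields are named question and answer (as in the examples in this guide):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Log records as dictionaries whose keys match the dataset fields\ndataset.records.log([{\"question\": \"Do you need oxygen to breathe?\", \"answer\": \"Yes\"}])\n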

"},{"location":"how_to_guides/migrate_from_legacy_datasets/","title":"Migrate users, workspaces and datasets to Argilla 2.x","text":"

This guide will help you migrate task-specific datasets to Argilla V2. These do not include the FeedbackDataset, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the SDK migration blog post. Additionally, we will provide guidance on how to migrate your Users and Workspaces to the new Argilla V2 format.

Note

Legacy datasets include: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be migrated as they are already in the Argilla V2 format. However, since the 2.x version includes changes to the search index structure, you should reindex the datasets by enabling the docker environment variable REINDEX_DATASET (this step is automatically executed if you're running Argilla in an HF Space). See the server configuration docs section for more details.

To follow this guide, you will need to have the following prerequisites:

  • An argilla 1.* server instance running with legacy datasets.
  • An argilla >=1.29 server instance running. If you don't have one, you can create one by following this Argilla guide.
  • The argilla sdk package installed in your environment.

Warning

This guide will recreate all Users and Workspaces on a new server. Hence, they will be created with new passwords and IDs. If you want to keep the same passwords and IDs, you can copy the datasets to a temporary v2 instance, then upgrade your current instance to v2.0 and copy the datasets back to your original instance afterwards.

If your current legacy datasets are on a server with an Argilla release after 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.

To migrate, you will need to install the new argilla package. This includes a new v1 module that allows you to connect to the Argilla V1 server.

pip install \"argilla>=2.0.0\"\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#migrate-users-and-workspaces","title":"Migrate Users and Workspaces","text":"

The guide will take you through two steps:

  1. Retrieve the old users and workspaces from the Argilla V1 server using the new argilla package.
  2. Recreate the users and workspaces on the Argilla V2 server, using the name as a unique identifier.
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-old-users-and-workspaces","title":"Step 1: Retrieve the old users and workspaces","text":"

You can use the v1 module to connect to the Argilla V1 server.

import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

Next, load the Users and Workspaces from the Argilla V1 server:

users_v1 = rg_v1.User.list()\nworkspaces_v1 = rg_v1.Workspace.list()\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-recreate-the-users-and-workspaces","title":"Step 2: Recreate the users and workspaces","text":"

To recreate the users and workspaces on the Argilla V2 server, you can use the argilla package.

First, instantiate the Argilla class to connect to the Argilla V2 server:

import argilla as rg\n\nclient = rg.Argilla()\n

Next, recreate the users and workspaces on the Argilla V2 server:

for workspace in workspaces_v1:\n    rg.Workspace(\n        name=workspace.name\n    ).create()\n
for user in users_v1:\n    user = rg.User(\n        username=user.username,\n        first_name=user.first_name,\n        last_name=user.last_name,\n        role=user.role,\n        password=\"<your_chosen_password>\" # (1)\n    ).create()\n    if user.role == \"owner\":\n       continue\n\n    for workspace_name in user.workspaces:\n        if workspace_name != user.name:\n            workspace = client.workspaces(name=workspace_name)\n            user.add_to_workspace(workspace)\n
  1. You need to choose a new password for the user. To do this programmatically, you can use the uuid package to generate a random password. Take care to keep track of the passwords you choose, since you will not be able to retrieve them later.
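For example, one minimal way to generate such a random password with the uuid package:

import uuid\n\nnew_password = uuid.uuid4().hex  # random 32-character hexadecimal string; store it somewhere safe\n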

Now you have successfully migrated your users and workspaces to Argilla V2 and can continue with the next steps.

"},{"location":"how_to_guides/migrate_from_legacy_datasets/#migrate-datasets","title":"Migrate datasets","text":"

The guide will take you through three steps:

  1. Retrieve the legacy dataset from the Argilla V1 server using the new argilla package.
  2. Define the new dataset in the Argilla V2 format.
  3. Upload the dataset records to the new Argilla V2 dataset format and attributes.
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-legacy-dataset","title":"Step 1: Retrieve the legacy dataset","text":"

You can use the v1 module to connect to the Argilla V1 server.

import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

Next, load the dataset settings and records from the Argilla V1 server:

dataset_name = \"news-programmatic-labeling\"\nworkspace = \"demo\"\n\nsettings_v1 = rg_v1.load_dataset_settings(dataset_name, workspace)\nrecords_v1 = rg_v1.load(dataset_name, workspace)\nhf_dataset = records_v1.to_datasets()\n

Your legacy dataset is now loaded into the hf_dataset object.

"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-define-the-new-dataset","title":"Step 2: Define the new dataset","text":"

Define the new dataset in the Argilla V2 format. The new dataset format is defined in the argilla package. You can create a new dataset with the Settings and Dataset classes:

First, instantiate the Argilla class to connect to the Argilla V2 server:

import argilla as rg\n\nclient = rg.Argilla()\n

Next, define the new dataset settings:

For single-label classificationFor multi-label classificationFor token classificationFor text generation
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
  1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.
  2. Make sure you provide all relevant metadata fields available in the dataset.
  3. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.MultiLabelQuestion(name=\"labels\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
  1. The default field in DatasetForTextClassification is text, but we should provide all fields included in record.inputs.
  2. Make sure you provide all relevant metadata fields available in the dataset.
  3. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.SpanQuestion(name=\"spans\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
  1. Make sure you provide all relevant metadata fields available in the dataset.
  2. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"text_generation\"),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
  1. We should provide all relevant metadata fields available in the dataset.
  2. We should provide all relevant vectors available in the dataset.

Finally, create the new dataset on the Argilla V2 server:

dataset = rg.Dataset(name=dataset_name, workspace=workspace, settings=settings)\ndataset.create()\n

Note

If a dataset with the same name already exists, the create method will raise an exception. You can check if the dataset exists and delete it before creating a new one.

dataset = client.datasets(name=dataset_name, workspace=workspace)\n\nif dataset is not None:\n    dataset.delete()\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-3-upload-the-dataset-records","title":"Step 3: Upload the dataset records","text":"

To upload the records to the new server, we will need to convert the records from the Argilla V1 format to the Argilla V2 format. The new argilla sdk package uses a generic Record class, but legacy datasets have specific record classes. We will need to convert the records to the generic Record class.

Here is a set of example functions to convert the records for single-label classification, multi-label classification, token classification, and text generation. You can modify these functions to suit your dataset.

For single-label classificationFor multi-label classificationFor token classificationFor text generation
def map_to_record_for_single_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        label, score = prediction[0].values()\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"label\", # (1)\n                value=label,\n                score=score,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"label\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_multi_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        labels, scores = zip(*[(pred[\"label\"], pred[\"score\"]) for pred in prediction])\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"labels\", # (1)\n                value=labels,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"labels\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_span(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a token classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        scores = [span[\"score\"] for span in prediction]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"spans\", # (1)\n                value=prediction,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"spans\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_text_generation(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text2text record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        first = prediction[0]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"text_generation\", # (1)\n                value=first[\"text\"],\n                score=first[\"score\"],\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        # From data[annotation]\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"text_generation\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

The functions above depend on the users_by_name dictionary and the current_user object to assign responses to users, so we need to load the existing users first. You can retrieve the users from the Argilla V2 server and the current user as follows:

users_by_name = {user.username: user for user in client.users}\ncurrent_user = client.me\n

Finally, upload the records to the new dataset using the log method and map functions.

records = []\n\nfor data in hf_dataset:\n    records.append(map_to_record_for_single_label(data, users_by_name, current_user))\n\n# Upload the records to the new dataset\ndataset.records.log(records)\n

You have now successfully migrated your legacy dataset to Argilla V2. For more guides on how to use the Argilla SDK, please refer to the How to guides.

"},{"location":"how_to_guides/query/","title":"Query and filter records","text":"

This guide provides an overview of how to query and filter a dataset in Argilla.

You can search for records in your dataset by querying or filtering. The query focuses on the content of the text field, while the filter is used to filter the records based on conditions. You can use them independently or combine multiple filters to create complex search queries. You can also export records from a dataset either as a single dictionary or a list of dictionaries.

Main Classes

rg.Query | rg.Filter | rg.Similar
rg.Query(\n    query=\"query\",\n    filter=filter\n)\n

Check the Query - Python Reference to see the attributes, arguments, and methods of the Query class in detail.

rg.Filter(\n    [\n        (\"field\", \"==\", \"value\"),\n    ]\n)\n

Check the Filter - Python Reference to see the attributes, arguments, and methods of the Filter class in detail.

rg.Similar(\n    name=\"vector\",\n    value=[0.1, 0.2, 0.3],\n)\n

Check the Similar - Python Reference to see the attributes, arguments, and methods of the Similar class in detail.

"},{"location":"how_to_guides/query/#query-with-search-terms","title":"Query with search terms","text":"

To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field. You can search for a single term or multiple terms; in the latter case, all of them must appear in the record for it to be retrieved.

Single term search | Multiple terms search
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term1 my_term2\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#advanced-queries","title":"Advanced queries","text":"

If you need more complex searches, you can use Elasticsearch's simple query string syntax. Here is a summary of the different available operators:

operator description example + or space AND: search both terms argilla + distilabel or argilla distilabel return records that include the terms \"argilla\" and \"distilabel\" | OR: search either term argilla | distilabel returns records that include the term \"argilla\" or \"distilabel\" - Negation: exclude a term argilla -distilabel returns records that contain the term \"argilla\" and don't have the term \"distilabel\" * Prefix: search a prefix arg* returns records with any words starting with \"arg-\" \" Phrase: search a phrase \"argilla and distilabel\" returns records that contain the phrase \"argilla and distilabel\" ( and ) Precedence: group terms (argilla | distilabel) rules returns records that contain either \"argilla\" or \"distilabel\" and \"rules\" ~N Edit distance: search a term or phrase with an edit distance argilla~1 returns records that contain the term \"argilla\" with an edit distance of 1, e.g. \"argila\"

Tip

To use one of these characters literally, escape it with a preceding backslash \\, e.g. \"1 \\+ 2\" would match records where the phrase \"1 + 2\" is found.
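As an illustration, here is a sketch that combines several of these operators in a single query (the terms are just examples):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\n# Records that contain \"argilla\" or \"distilabel\", a word starting with \"annotat\", and not \"legacy\"\nquery = rg.Query(query=\"(argilla | distilabel) annotat* -legacy\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n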

"},{"location":"how_to_guides/query/#filter-by-conditions","title":"Filter by conditions","text":"

You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records based on the conditions. Conditions include \"==\", \">=\", \"<=\", or \"in\". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses. You can use a single condition or multiple conditions to filter records.

  • ==: the field value is equal to the value
  • >=: the field value is greater than or equal to the value
  • <=: the field value is less than or equal to the value
  • in: the field value is included in a list of values

Single condition | Multiple conditions
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilter_label = rg.Filter((\"label\", \"==\", \"positive\"))\n\nfiltered_records = dataset.records(query=rg.Query(filter=filter_label)).to_list(\n    flatten=True\n)\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilters = rg.Filter(\n    [\n        (\"label.suggestion\", \"==\", \"positive\"),\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20),\n        (\"label\", \"in\", [\"positive\", \"negative\"])\n    ]\n)\n\nfiltered_records = dataset.records(\n    query=rg.Query(filter=filters), with_suggestions=True\n).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#available-fields","title":"Available fields","text":"

You can filter records based on the following fields:

field description example id The record id (\"id\", \"in\", [\"1\",\"2\",\"3\"]) _server_id The internal record id. This value must be a valida UUID (\"_server_id\", \"==\", \"ba69a996-85c2-4af0-a473-23138929641b\") inserted_at The date and time the record was inserted. You can pass a datetime or a string (\"inserted_at\" \">=\", \"2024-10-10\") updated_at The date and time the record was updated. (\"updated_at\" \">=\", \"2024-10-10\") status The record status, which can be pending or completed. (\"status\", \"==\", \"completed\") response.status The response status, which can be draft, submitted, or discarded. (\"response.status\", \"==\", \"submitted\") metadata.<name> Filter by a metadata property (\"metadata.split\", \"==\", \"train\") <question>.suggestion Filter by a question suggestion value (\"label.sugggestion\", \"==\", \"positive\") <question>.score Filter by a suggestion score (\"label.score\", \"<=\", \"0.9\") <question>.agent Filter by a suggestion agent (\"label.agent\", \"<=\", \"ChatGPT4.0\") <question>.response Filter by a question response (\"label.response\", \"==\", \"negative\")"},{"location":"how_to_guides/query/#filter-by-status","title":"Filter by status","text":"

You can filter records based on record or response status. Record status can be pending or completed, and response status can be draft, submitted, or discarded.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nstatus_filter = rg.Query(\n    filter=rg.Filter(\n        [\n            (\"status\", \"==\", \"completed\"),\n            (\"response.status\", \"==\", \"discarded\")\n        ]\n    )\n)\n\nfiltered_records = dataset.records(status_filter).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#similarity-search","title":"Similarity search","text":"

You can search for records that are similar to a given vector. You can use the Similar class to define the vector and pass it as part of the query argument to the Dataset.records attribute.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\n\nsimilar_filter = rg.Query(\n    similar=rg.Similar(\n        name=\"vector\", value=[0.1, 0.2, 0.3],\n    )\n)\n\nfiltered_records = dataset.records(similar_filter).to_list(flatten=True)\n

Note

The Similar search expects a vector field definition as part of the dataset settings. If the dataset does not have a vector field, the search will return an error. Visit the Vectors section for more details on how to define a vector field.

"},{"location":"how_to_guides/query/#query-and-filter-a-dataset","title":"Query and filter a dataset","text":"

As mentioned, you can use a query with a search term and a filter or various filters to create complex search queries.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery_filter = rg.Query(\n    query=\"my_term\",\n    filter=rg.Filter(\n        [\n            (\"label.suggestion\", \"==\", \"positive\"),\n            (\"metadata.count\", \">=\", 10),\n        ]\n    )\n)\n\nqueried_filtered_records = dataset.records(\n    query=query_filter,\n    with_metadata=True,\n    with_suggestions=True\n).to_list(flatten=True)\n
"},{"location":"how_to_guides/record/","title":"Add, update, and delete records","text":"

This guide provides an overview of records, explaining the basics of how to define and manage them in Argilla.

A record in Argilla is a data item that requires annotation, consisting of one or more fields. These are the pieces of information displayed to the user in the UI to facilitate the completion of the annotation task. Each record also includes questions that annotators are required to answer, with the option of adding suggestions and responses to assist them. Guidelines are also provided to help annotators effectively complete their tasks.

A record is part of a dataset, so you will need to create a dataset before adding records. Check this guide to learn how to create a dataset.

Main Class

rg.Record(\n    external_id=\"1234\",\n    fields={\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\"\n    },\n    metadata={\n        \"category\": \"A\"\n    },\n    vectors={\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    suggestions=[\n        rg.Suggestion(\"my_label\", \"positive\", score=0.9, agent=\"model_name\")\n    ],\n    responses=[\n        rg.Response(\"label\", \"positive\", user_id=user_id)\n    ],\n)\n

Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

"},{"location":"how_to_guides/record/#add-records","title":"Add records","text":"

You can add records to a dataset in two different ways: either by using a dictionary or by directly initializing a Record object. You should ensure that fields, metadata and vectors match those configured in the dataset settings. In both cases, records are added via the Dataset.records.log method. As soon as you add the records, these will be available in the Argilla UI. If they do not appear in the UI, you may need to click the refresh button to update the view.

Tip

Take some time to inspect the data before adding it to the dataset in case this triggers changes in the questions or fields.

Note

If you are planning to use public data, the Datasets page of the Hugging Face Hub is a good place to start. Remember to always check the license to make sure you can legally use it for your specific use case.

As Record objects | From a generic data structure | From a Hugging Face dataset

You can add records to a dataset by initializing a Record object directly. This is ideal if you need to apply logic to the data before defining the record. If the data is already structured, you should consider adding it directly as a dictionary or Hugging Face dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ), # (1)\n]\n\ndataset.records.log(records)\n
  1. This is an illustration of a definition. In a real-world scenario, you would iterate over a data structure and create Record objects for each iteration.

You can add the data directly as a dictionary-like structure, where the keys correspond to the names of fields, questions, metadata or vectors in the dataset and the values are the data to be added.

If your data structure does not correspond to your Argilla dataset names, you can use a mapping to indicate which keys in the source data correspond to the dataset fields, metadata, vectors, suggestions, or responses. If you need to add the same data to multiple attributes, you can also use a list with the names of the attributes, as shown in the sketch after the example below.

We illustrate this with Python dictionaries that represent your data, but we would not advise defining dictionaries yourself. Instead, use the Record object to instantiate records.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Add records to the dataset with the fields 'question' and 'answer'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    }, # (1)\n]\ndataset.records.log(data)\n\n# Add records to the dataset with a mapping of the fields 'question' and 'answer'\ndata = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n]\ndataset.records.log(data, mapping={\"query\": \"question\", \"response\": \"answer\"}) # (2)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
  2. The data structure has keys query and response, and the Argilla dataset has fields question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.
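As mentioned above, a source key can also be mapped to several destination attributes by passing a list of names. A sketch, assuming a label question named my_label and a user object as described in the Responses section:

# The same source key feeds both the suggestion and the response of 'my_label'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n    },\n]\ndataset.records.log(\n    data,\n    user_id=user.id,\n    mapping={\"label\": [\"my_label.suggestion\", \"my_label.response\"]},\n)\n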

You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

You can add the dataset where the column names correspond to the names of fields, metadata or vectors in the Argilla dataset.

import argilla as rg\nfrom datasets import load_dataset\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\") # (1)\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (2)\n\ndataset.records.log(records=hf_dataset)\n
  1. In this case, we are using the my_dataset dataset from the Argilla workspace. The dataset has a text field and a label question.

  2. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map of the datasets library to prepare the data before adding it to the Argilla dataset.

If the Hugging Face dataset's schema does not correspond to your Argilla dataset field names, you can use a mapping to specify the relationship. You should indicate as key the column name of the Hugging Face dataset and, as value, the field name of the Argilla dataset.

dataset.records.log(\n    records=hf_dataset, mapping={\"text\": \"review\", \"label\": \"sentiment\"}\n) # (1)\n
  1. In this case, the text key in the Hugging Face dataset would correspond to the review field in the Argilla dataset, and the label key in the Hugging Face dataset would correspond to the sentiment field in the Argilla dataset.
"},{"location":"how_to_guides/record/#fields","title":"Fields","text":"

Fields are the main pieces of information of the record. These are shown at first sight in the UI together with the questions form. You may only include fields that you have previously configured in the dataset settings. Depending on the type of fields included in the dataset, the data format may be slightly different:

Text | Image | Chat | Custom

Text fields expect input in the form of a string.

record = rg.Record(\n    fields={\"text\": \"Hello World, how are you?\"}\n)\n

Image fields expect a remote URL or local path to an image file in the form of a string, or a PIL object.

Check the Dataset.records - Python Reference to see how to add records with images in detail.

from PIL import Image\n\nrecords = [\n    rg.Record(\n        fields={\"image\": \"https://example.com/image.jpg\"}\n    ),\n    rg.Record(\n        fields={\"image\": \"path/to/image.jpg\"}\n    ),\n    rg.Record(\n        fields={\"image\": Image.open(\"path/to/image.jpg\")}\n    ),\n]\n

Chat fields expect a list of dictionaries with the keys role and content, where the role identifies the interlocutor type (e.g., user, assistant, model, etc.), whereas the content contains the text of the message.

record = rg.Record(\n    fields={\n        \"chat\": [\n            {\"role\": \"user\", \"content\": \"What is Argilla?\"},\n            {\"role\": \"assistant\", \"content\": \"Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets\"},\n        ]\n    }\n)\n

Custom fields expect a dictionary with the keys and values you define in the dataset settings. You need to ensure these are aligned with CustomField.template in order for them to be rendered in the UI.

record = rg.Record(\n    fields={\"custom\": {\"key\": \"value\"}}\n)\n
"},{"location":"how_to_guides/record/#metadata","title":"Metadata","text":"

Record metadata can include any information about the record that is not part of the fields in the form of a dictionary. To use metadata for filtering and sorting records, make sure that the key of the dictionary corresponds with the metadata property name. When the key doesn't correspond, this will be considered extra metadata that will get stored with the record (as long as allow_extra_metadata is set to True for the dataset), but will not be usable for filtering and sorting.

Note

Remember that to use metadata within a dataset, you must define a metadata property in the dataset settings.

Check the Metadata - Python Reference to see the attributes, arguments, and methods for using metadata in detail.
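As a reminder of how this is configured, a minimal settings sketch (the names are illustrative) that defines a metadata property and allows extra metadata:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"my_metadata\", options=[\"option_1\", \"option_2\"]),\n    ],\n    allow_extra_metadata=True,  # keys not defined above are stored but not usable for filtering\n)\n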

As Record objects | From a generic data structure

You can add metadata to a record in an initialized Record object.

# Add records to the dataset with the metadata 'category'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n]\ndataset.records.log(records)\n

You can add metadata to a record directly as a dictionary structure, where the keys correspond to the names of metadata properties in the dataset and the values are the metadata to be added. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the metadata 'category'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_metadata\": \"option_1\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_metadata\": \"option_1\",\n    },\n]\ndataset.records.log(data)\n
"},{"location":"how_to_guides/record/#vectors","title":"Vectors","text":"

You can associate vectors, like text embeddings, to your records. They can be used for semantic search in the UI and the Python SDK. Make sure that the length of the list corresponds to the dimensions set in the vector settings.

Note

Remember that to use vectors within a dataset, you must define them in the dataset settings.

Check the Vector - Python Reference to see the attributes, arguments, and methods of the Vector class in detail.
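As a reminder, a minimal settings sketch (illustrative names) defining a vector field whose dimensions must match the length of the vectors you log:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    vectors=[\n        rg.VectorField(name=\"my_vector\", dimensions=3),  # records must provide 3-dimensional vectors\n    ],\n)\n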

As Record objects | From a generic data structure

You can also add vectors to a record in an initialized Record object.

# Add records to the dataset with the vector 'my_vector' and dimension=3\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        vectors={\n            \"my_vector\": [0.1, 0.2, 0.3]\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        vectors={\n            \"my_vector\": [0.2, 0.5, 0.3]\n        },\n    ),\n]\ndataset.records.log(records)\n

You can add vectors from a dictionary-like structure, where the keys correspond to the names of the vector settings that were configured for your dataset and the value is a list of floats. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the vector 'my_vector' and dimension=3\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_vector\": [0.2, 0.5, 0.3],\n    },\n]\ndataset.records.log(data)\n
"},{"location":"how_to_guides/record/#suggestions","title":"Suggestions","text":"

Suggestions refer to suggested responses (e.g. model predictions) that you can add to your records to make the annotation process faster. These can be added during the creation of the record or at a later stage. Only one suggestion can be provided for each question, and suggestion values must be compliant with the pre-defined questions e.g. if we have a RatingQuestion between 1 and 5, the suggestion should have a valid value within that range.

Check the Suggestions - Python Reference to see the attributes, arguments, and methods of the Suggestion class in detail.

Tip

Check the Suggestions - Python Reference for different formats per Question type.

As Record objects | From a generic data structure

You can also add suggestions to a record in an initialized Record object.

# Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"positive\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"negative\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n]\ndataset.records.log(records)\n

You can add suggestions as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the label question 'my_label'\ndata =  [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n]\ndataset.records.log(\n    data=data,\n    mapping={\n        \"label\": \"my_label\",\n        \"score\": \"my_label.suggestion.score\",\n        \"agent\": \"my_label.suggestion.agent\",\n    },\n)\n
"},{"location":"how_to_guides/record/#responses","title":"Responses","text":"

If your dataset includes some annotations, you can add those to the records as you create them. Make sure that the responses adhere to the same format as Argilla's output and meet the schema requirements for the specific type of question being answered. Make sure to include the user_id if you plan to add more than one response for the same question; otherwise, the responses will apply to all the annotators.

Check the Responses - Python Reference to see the attributes, arguments, and methods of the Response class in detail.

Note

Keep in mind that records with responses will be displayed as \"Draft\" in the UI.

Tip

Check the Responses - Python Reference for different formats per Question type.

As Record objects / From a generic data structure

You can also add responses to a record in an initialized Record object.

# Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"positive\", user_id=user.id)\n        ]\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"negative\", user_id=user.id)\n        ]\n    ),\n]\ndataset.records.log(records)\n

You can add responses as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure. If you want to specify the user that added the response, you can use the user_id parameter.

# Add records to the dataset with the label 'my_label'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n    },\n]\ndataset.records.log(data, user_id=user.id, mapping={\"label\": \"my_label.response\"})\n
"},{"location":"how_to_guides/record/#list-records","title":"List records","text":"

To list records in a dataset, you can use the records method on the Dataset object. This method returns a list of Record objects that can be iterated over to access the record properties.

for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_vectors=True\n):\n\n    # Access the record properties\n    print(record.metadata)\n    print(record.vectors)\n    print(record.suggestions)\n    print(record.responses)\n\n    # Access the responses of the record\n    for response in record.responses:\n        print(response.value)\n
"},{"location":"how_to_guides/record/#update-records","title":"Update records","text":"

You can update records in a dataset by calling the log method on the Dataset object. To update a record, you need to provide the record id and the new data to be updated.

data = dataset.records.to_list(flatten=True)\n\nupdated_data = [\n    {\n        \"text\": sample[\"text\"],\n        \"label\": \"positive\",\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n
Update the metadata / Update vectors / Update suggestions / Update responses

The metadata of the Record object is a python dictionary. To update it, you can iterate over the records and update the metadata by key. After that, you should update the records in the dataset.

Tip

Check the Metadata - Python Reference for different formats per MetadataProperty type.

updated_records = []\n\nfor record in dataset.records():\n\n    record.metadata[\"my_metadata\"] = \"new_value\"\n    record.metadata[\"my_new_metadata\"] = \"new_value\"\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If a new vector field is added to the dataset settings, or if values of existing record vectors must be updated, you can iterate over the records and update the vectors by key. After that, you should update the records in the dataset.

updated_records = []\n\nfor record in dataset.records(with_vectors=True):\n\n    record.vectors[\"my_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n    record.vectors[\"my_new_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If values of existing record suggestions must be updated, you can iterate over the records and update the suggestions by key. You can also add a suggestion using the add method. After that, you should update the records in the dataset.

Tip

Check the Suggestions - Python Reference for different formats per Question type.

updated_records = []\n\nfor record in dataset.records(with_suggestions=True):\n\n    # We can update existing suggestions\n    record.suggestions[\"label\"].value = \"new_value\"\n    record.suggestions[\"label\"].score = 0.9\n    record.suggestions[\"label\"].agent = \"model_name\"\n\n    # We can also add new suggestions with the `add` method:\n    if not record.suggestions[\"label\"]:\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"new_value\", score=0.9, agent=\"model_name\")\n        )\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If values of existing record responses must be updated, you can iterate over the records and update the responses by key. You can also add a response using the add method. After that, you should update the records in the dataset.

Tip

Check the Responses - Python Reference for different formats per Question type.

updated_records = []\n\nfor record in dataset.records(with_responses=True):\n\n    for response in record.responses[\"label\"]:\n\n        if response:\n            response.value = \"new_value\"\n            response.user_id = \"existing_user_id\"\n\n        else:\n            record.responses.add(rg.Response(\"label\", \"YES\", user_id=user.id))\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n
"},{"location":"how_to_guides/record/#delete-records","title":"Delete records","text":"

You can delete records in a dataset by calling the delete method on the Dataset object. To delete records, you need to retrieve them from the server and get a list with those that you want to delete.

records_to_delete = list(dataset.records)[:5]\ndataset.records.delete(records=records_to_delete)\n

Delete records based on a query

This can be very useful, for example, to avoid deleting records that already have responses.

For more information about the query syntax, check this how-to guide.

status_filter = rg.Query(\n    filter = rg.Filter((\"response.status\", \"==\", \"pending\"))\n)\nrecords_to_delete = list(dataset.records(status_filter))\n\ndataset.records.delete(records_to_delete)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/","title":"Use Markdown to format rich content","text":"

This guide provides an overview of how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

The TextField and TextQuestion provide the option to enable Markdown and therefore HTML by setting use_markdown=True. Given the flexibility of HTML, we can get great control over the presentation of data to our annotators. We provide some out-of-the-box methods for multi-modality and chat templates in the examples below.
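As a minimal sketch of enabling this when configuring a dataset (the field and question names here are illustrative):

settings = rg.Settings(\n    fields=[\n        # use_markdown=True enables Markdown, and therefore HTML, rendering\n        rg.TextField(name=\"markdown_enabled_field\", use_markdown=True),\n    ],\n    questions=[rg.TextQuestion(name=\"comment\", use_markdown=True)],\n)\n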

Main Methods

image_to_html / audio_to_html / video_to_html / pdf_to_html / chat_to_html
image_to_html(\"local_image_file.png\")\n
audio_to_html(\"local_audio_file.mp3\")\n
audio_to_html(\"local_video_file.mp4\")\n
pdf_to_html(\"local_pdf_file.pdf\")\n
chat_to_html([{\"role\": \"user\", \"content\": \"hello\"}])\n

Check the Markdown - Python Reference to see the arguments of the rg.markdown methods in detail.

Tip

You can get pretty creative with HTML. For example, think about visualizing graphs and tables. You can use methods from interesting Python packages, like pandas.DataFrame.to_html and plotly.io.to_html.
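For example, a minimal sketch rendering a pandas table in a markdown-enabled field (the field name is illustrative):

import pandas as pd\n\ndf = pd.DataFrame({\"city\": [\"Paris\", \"Madrid\"], \"population_m\": [2.1, 3.3]})\n\n# to_html produces an HTML table that a markdown-enabled field can render\nrg.Record(\n    fields={\"markdown_enabled_field\": df.to_html(index=False)}\n)\n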

"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#multi-modal-support-images-audio-video-pdfs-and-more","title":"Multi-modal support: images, audio, video, PDFs and more","text":"

Argilla offers basic multi-modal support in two different ways. Each has pros and cons, but both offer the same UI experience because both rely on HTML.

"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#local-content-through-dataurls","title":"Local content through DataURLs","text":"

A DataURL is a scheme that allows data to be encoded into a base64-encoded string and then embedded directly into HTML. To facilitate this, we offer some functions: image_to_html, audio_to_html, video_to_html, and pdf_to_html. These functions accept either the file path or the file's byte data and return the corresponding HTML needed to render the media file within the Argilla user interface. Additionally, you can set the width and height in pixels or as a percentage for video and image (defaults to the original dimensions), and set the autoplay and loop attributes to True for audio and video (both default to False).

Warning

DataURLs increase memory usage compared to the original file size. Additionally, different browsers enforce different size limitations for rendering DataURLs, which might block the visualization experience for some users.

Image / Audio / Video / PDF
from argilla.markdown import image_to_html\n\nhtml = image_to_html(\n    \"local_image_file.png\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import audio_to_html\n\nhtml = audio_to_html(\n    \"local_audio_file.mp3\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import video_to_html\n\nhtml = video_to_html(\n    \"local_video_file.mp4\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import pdf_to_html\n\nhtml = pdf_to_html(\n    \"local_pdf_file.pdf\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#hosted-content","title":"Hosted content","text":"

Instead of uploading local files through DataURLs, we can also visualize URLs directly linking to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to visualize content available on platforms like Google Drive or decide to configure a private media server.

Warning

When trying to access content from a private media server, you have to ensure that the Argilla server has network access to that media server, which might be done through something like IP whitelisting.

Image / Audio / Video / PDF
html = \"<img src='https://example.com/public-image-file.jpg'>\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<audio controls>\n    <source src=\"https://example.com/public-audio-file.mp3\" type=\"audio/mpeg\">\n</audio>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<video width=\"320\" height=\"240\" controls>\n    <source src=\"https://example.com/public-video-file.mp4\" type=\"video/mp4\">\n</video>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<iframe\n    src=\"https://example.com/public-pdf-file.pdf\"\n    width=\"600\"\n    height=\"500\">\n</iframe>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#chat-and-conversation-support","title":"Chat and conversation support","text":"

When working with chat data from multi-turn interactions with a Large Language Model, it can be helpful to visualize the conversation in a way similar to a common chat interface. To facilitate this, we offer the chat_to_html function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.

OpenAI chat format

The OpenAI chat format is a way to structure a list of messages as input from users and model-generated messages as output. These messages can only contain the role \"user\" for human messages and the roles \"assistant\", \"system\" or \"model\" for model-generated messages.

from argilla.markdown import chat_to_html\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Hello! How are you?\"},\n    {\"role\": \"assistant\", \"content\": \"I'm good, thank you!\"}\n]\n\nhtml = chat_to_html(messages)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n

"},{"location":"how_to_guides/user/","title":"User management","text":"

This guide provides an overview of user roles and credentials, explaining how to set up and manage users in Argilla.

A user in Argilla is an authorized person who, depending on their role, can use the Python SDK and access the UI in a running Argilla instance. We differentiate between three types of users depending on their role, permissions and needs: owner, admin and annotator.

| | Owner | Admin | Annotator |
|---|---|---|---|
| Number | Unlimited | Unlimited | Unlimited |
| Create and delete workspaces | Yes | No | No |
| Assign users to workspaces | Yes | No | No |
| Create, configure, update, and delete datasets | Yes | Only within assigned workspaces | No |
| Create, update, and delete users | Yes | No | No |
| Provide feedback with Argilla UI | Yes | Yes | Yes |

The owner refers to the root user who created the Argilla instance. Using workspaces within Argilla proves highly beneficial for organizing tasks efficiently. Accordingly, the owner has full access to all workspaces and their functionalities:

  • Workspace management: They can create, read, and delete workspaces.
  • User management: They can create a new user, assign it to a workspace, and delete it. They can also list users and search for a specific one.
  • Dataset management: They can create, configure, retrieve, update, and delete datasets.
  • Annotation: They can annotate datasets in the Argilla UI.
  • Feedback: They can provide feedback with the Argilla UI.

An admin user can only access the workspaces they have been assigned to and cannot assign other users to them. An admin user has the following permissions:

  • Dataset management: They can create, configure, retrieve, update, and delete datasets, but only in the assigned workspaces.
  • Annotation: They can annotate datasets in the assigned workspaces via the Argilla UI.
  • Feedback: They can provide feedback with the Argilla UI.

An annotator user is limited to accessing only the datasets assigned to them within the workspace. They have two specific permissions:

  • Annotation: They can annotate the assigned datasets in the Argilla UI.
  • Feedback: They can provide feedback with the Argilla UI.
Question: Who can manage users?

Only users with the owner role can manage (create, retrieve, delete) other users.

"},{"location":"how_to_guides/user/#initial-users-and-credentials","title":"Initial users and credentials","text":"

Depending on your Argilla deployment, the initial user with the owner role will vary.

  • If you deploy on the Hugging Face Hub, the initial user will correspond to the Space owner (your personal account). The API key is automatically generated and can be copied from the \"Settings\" section of the UI.
  • If you deploy with Docker, the default values for the environment variables are: USERNAME: argilla, PASSWORD: 12345678, API_KEY: argilla.apikey.

For new users, the username and password are set during the creation process. The API key can be copied from the \"Settings\" section of the UI.

Main Class

rg.User(\n    username=\"username\",\n    first_name=\"first_name\",\n    last_name=\"last_name\",\n    role=\"owner\",\n    password=\"password\",\n    client=client\n)\n

Check the User - Python Reference to see the attributes, arguments, and methods of the User class in detail.

"},{"location":"how_to_guides/user/#get-current-user","title":"Get current user","text":"

To ensure you're using the correct credentials for managing users, you can get the current user in Argilla using the me attribute of the Argilla class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ncurrent_user = client.me\n
"},{"location":"how_to_guides/user/#create-a-user","title":"Create a user","text":"

To create a new user in Argilla, you can define it in the User class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_create = rg.User(\n    username=\"my_username\",\n    password=\"12345678\",\n)\n\ncreated_user = user_to_create.create()\n

Accessing attributes

Access the attributes of a user by calling them directly on the User object. For example, user.id or user.username.

"},{"location":"how_to_guides/user/#list-users","title":"List users","text":"

You can list all the existing users in Argilla by accessing the users attribute on the Argilla class and iterating over them. You can also use len(client.users) to get the number of users.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nusers = client.users\n\nfor user in users:\n    print(user)\n

Notebooks

When using a notebook, executing client.users will display a table with username, id, role, and the last update as updated_at.

"},{"location":"how_to_guides/user/#retrieve-a-user","title":"Retrieve a user","text":"

You can retrieve an existing user from Argilla by accessing the users attribute on the Argilla class and passing the username or id as an argument. If the user does not exist, a warning message will be raised and None will be returned.

By username / By id
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(\"my_username\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/user/#check-user-existence","title":"Check user existence","text":"

You can check if a user exists. The client.users method will return None if the user was not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users(\"my_username\")\n\nif user is not None:\n    pass\n
"},{"location":"how_to_guides/user/#list-users-in-a-workspace","title":"List users in a workspace","text":"

You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
"},{"location":"how_to_guides/user/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

You can add an existing user to a workspace in Argilla by calling the add_to_workspace method on the User class.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nadded_user = user.add_to_workspace(workspace)\n
"},{"location":"how_to_guides/user/#remove-a-user-from-a-workspace","title":"Remove a user from a workspace","text":"

You can remove an existing user from a workspace in Argilla by calling the remove_from_workspace method on the User class.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nremoved_user = user.remove_from_workspace(workspace)\n
"},{"location":"how_to_guides/user/#delete-a-user","title":"Delete a user","text":"

You can delete an existing user from Argilla by calling the delete method on the User class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_delete = client.users('my_username')\n\ndeleted_user = user_to_delete.delete()\n
"},{"location":"how_to_guides/workspace/","title":"Workspace management","text":"

This guide provides an overview of workspaces, explaining how to set up and manage workspaces in Argilla.

A workspace is a space inside your Argilla instance where authorized users can collaborate on datasets. It is accessible through the Python SDK and the UI.

Question: Who can manage workspaces?

Only users with the owner role can manage (create, read and delete) workspaces.

A user with the admin role can only read the workspace to which they belong.

"},{"location":"how_to_guides/workspace/#initial-workspaces","title":"Initial workspaces","text":"

Depending on your Argilla deployment, the initial workspace will vary.

  • If you deploy on the Hugging Face Hub, the initial workspace will be the one indicated in the .oauth.yaml file. By default, argilla.
  • If you deploy with Docker, you will need to create a workspace as shown in the next section.

Main Class

rg.Workspace(\n    name = \"name\",\n    client=client\n)\n

Check the Workspace - Python Reference to see the attributes, arguments, and methods of the Workspace class in detail.

"},{"location":"how_to_guides/workspace/#create-a-new-workspace","title":"Create a new workspace","text":"

To create a new workspace in Argilla, you can define it in the Workspace class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

When you create a new workspace, it will be empty. To create and add a new dataset, check these guides.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_create = rg.Workspace(name=\"my_workspace\")\n\ncreated_workspace = workspace_to_create.create()\n

Accessing attributes

Access the attributes of a workspace by calling them directly on the Workspace object. For example, workspace.id or workspace.name.

"},{"location":"how_to_guides/workspace/#list-workspaces","title":"List workspaces","text":"

You can list all the existing workspaces in Argilla by accessing the workspaces attribute on the Argilla class and iterating over them. You can also use len(client.workspaces) to get the number of workspaces.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspaces = client.workspaces\n\nfor workspace in workspaces:\n    print(workspace)\n

Notebooks

When using a notebook, executing client.workspaces will display a table with the number of datasets in each workspace, name, id, and the last update as updated_at.

"},{"location":"how_to_guides/workspace/#retrieve-a-workspace","title":"Retrieve a workspace","text":"

You can retrieve a workspace by accessing the workspaces method on the Argilla class and passing the name or id of the workspace as an argument. If the workspace does not exist, a warning message will be raised and None will be returned.

By name / By id
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(\"my_workspace\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/workspace/#check-workspace-existence","title":"Check workspace existence","text":"

You can check if a workspace exists. The client.workspaces method will return None if the workspace is not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nif workspace is not None:\n    pass\n
"},{"location":"how_to_guides/workspace/#list-users-in-a-workspace","title":"List users in a workspace","text":"

You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
"},{"location":"how_to_guides/workspace/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

You can also add a user to a workspace by calling the add_user method on the Workspace class.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nadded_user = workspace.add_user(\"my_username\")\n
"},{"location":"how_to_guides/workspace/#remove-a-user-from-workspace","title":"Remove a user from workspace","text":"

You can also remove a user from a workspace by calling the remove_user method on the Workspace class.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nremoved_user = workspace.remove_user(\"my_username\")\n
"},{"location":"how_to_guides/workspace/#delete-a-workspace","title":"Delete a workspace","text":"

A workspace can only be deleted if no datasets are associated with it; if the workspace contains any datasets, deletion will fail. You can delete a workspace by calling the delete method on the Workspace class.

To clear a workspace and delete all of its datasets, refer to this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_delete = client.workspaces(\"my_workspace\")\n\ndeleted_workspace = workspace_to_delete.delete()\n
"},{"location":"reference/argilla/SUMMARY/","title":"SUMMARY","text":"
  • rg.Argilla
  • rg.Workspace
  • rg.User
  • rg.Dataset
    • rg.Dataset.records
  • rg.Settings
    • Fields
    • Questions
    • Metadata
    • Vectors
    • Distribution
  • rg.Record
    • rg.Response
    • rg.Suggestion
    • rg.Vector
    • rg.Metadata
  • rg.Query
  • rg.markdown
"},{"location":"reference/argilla/client/","title":"rg.Argilla","text":"

To interact with the Argilla server from Python, you can use the Argilla class. The Argilla client is used to create, get, update, and delete all Argilla resources, such as workspaces, users, datasets, and records.

"},{"location":"reference/argilla/client/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/client/#connecting-to-an-argilla-server","title":"Connecting to an Argilla server","text":"

To connect to an Argilla server, instantiate the Argilla class and pass the api_url of the server and the api_key to authenticate.

import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"https://argilla.example.com\",\n    api_key=\"my_api_key\",\n)\n
"},{"location":"reference/argilla/client/#accessing-dataset-workspace-and-user-objects","title":"Accessing Dataset, Workspace, and User objects","text":"

The Argilla client provides access to the Dataset, Workspace, and User objects of the Argilla server.

my_dataset = client.datasets(\"my_dataset\")\n\nmy_workspace = client.workspaces(\"my_workspace\")\n\nmy_user = client.users(\"my_user\")\n

These resources can then be interacted with to access their properties and methods. For example, to list all datasets in a workspace:

for dataset in my_workspace.datasets:\n    print(dataset.name)\n
"},{"location":"reference/argilla/client/#src.argilla.client.Argilla","title":"Argilla","text":"

Bases: APIClient

Argilla API client. This is the main entry point to interact with the API.

Attributes:

| Name | Type | Description |
|---|---|---|
| workspaces | Workspaces | A collection of workspaces. |
| datasets | Datasets | A collection of datasets. |
| users | Users | A collection of users. |
| me | User | The current user. |

Source code in src/argilla/client.py
class Argilla(_api.APIClient):\n    \"\"\"Argilla API client. This is the main entry point to interact with the API.\n\n    Attributes:\n        workspaces: A collection of workspaces.\n        datasets: A collection of datasets.\n        users: A collection of users.\n        me: The current user.\n    \"\"\"\n\n    # Default instance of Argilla\n    _default_client: Optional[\"Argilla\"] = None\n\n    def __init__(\n        self,\n        api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n        api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n        timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n        retries: int = DEFAULT_HTTP_CONFIG.retries,\n        **http_client_args,\n    ) -> None:\n        \"\"\"Inits the `Argilla` client.\n\n        Args:\n            api_url: the URL of the Argilla API. If not provided, then the value will try\n                to be set from `ARGILLA_API_URL` environment variable. Defaults to\n                `\"http://localhost:6900\"`.\n            api_key: the key to be used to authenticate in the Argilla API. If not provided,\n                then the value will try to be set from `ARGILLA_API_KEY` environment variable.\n                Defaults to `None`.\n            timeout: the maximum time in seconds to wait for a request to the Argilla API\n                to be completed before raising an exception. Defaults to `60`.\n            retries: the number of times to retry the HTTP connection to the Argilla API\n                before raising an exception. Defaults to `5`.\n        \"\"\"\n        super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, retries=retries, **http_client_args)\n\n        self._set_default(self)\n\n    @property\n    def workspaces(self) -> \"Workspaces\":\n        \"\"\"A collection of workspaces on the server.\"\"\"\n        return Workspaces(client=self)\n\n    @property\n    def datasets(self) -> \"Datasets\":\n        \"\"\"A collection of datasets on the server.\"\"\"\n        return Datasets(client=self)\n\n    @property\n    def users(self) -> \"Users\":\n        \"\"\"A collection of users on the server.\"\"\"\n        return Users(client=self)\n\n    @cached_property\n    def me(self) -> \"User\":\n        from argilla.users import User\n\n        return User(client=self, _model=self.api.users.get_me())\n\n    ############################\n    # Private methods\n    ############################\n\n    @classmethod\n    def _set_default(cls, client: \"Argilla\") -> None:\n        \"\"\"Set the default instance of Argilla.\"\"\"\n        cls._default_client = client\n\n    @classmethod\n    def _get_default(cls) -> \"Argilla\":\n        \"\"\"Get the default instance of Argilla. If it doesn't exist, create a new one.\"\"\"\n        if cls._default_client is None:\n            cls._default_client = Argilla()\n        return cls._default_client\n
"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.workspaces","title":"workspaces: Workspaces property","text":"

A collection of workspaces on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.datasets","title":"datasets: Datasets property","text":"

A collection of datasets on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.users","title":"users: Users property","text":"

A collection of users on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.__init__","title":"__init__(api_url=DEFAULT_HTTP_CONFIG.api_url, api_key=DEFAULT_HTTP_CONFIG.api_key, timeout=DEFAULT_HTTP_CONFIG.timeout, retries=DEFAULT_HTTP_CONFIG.retries, **http_client_args)","text":"

Inits the Argilla client.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| api_url | Optional[str] | the URL of the Argilla API. If not provided, then the value will try to be set from ARGILLA_API_URL environment variable. Defaults to \"http://localhost:6900\". | DEFAULT_HTTP_CONFIG.api_url |
| api_key | Optional[str] | the key to be used to authenticate in the Argilla API. If not provided, then the value will try to be set from ARGILLA_API_KEY environment variable. Defaults to None. | DEFAULT_HTTP_CONFIG.api_key |
| timeout | int | the maximum time in seconds to wait for a request to the Argilla API to be completed before raising an exception. Defaults to 60. | DEFAULT_HTTP_CONFIG.timeout |
| retries | int | the number of times to retry the HTTP connection to the Argilla API before raising an exception. Defaults to 5. | DEFAULT_HTTP_CONFIG.retries |

Source code in src/argilla/client.py
def __init__(\n    self,\n    api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n    api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n    timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n    retries: int = DEFAULT_HTTP_CONFIG.retries,\n    **http_client_args,\n) -> None:\n    \"\"\"Inits the `Argilla` client.\n\n    Args:\n        api_url: the URL of the Argilla API. If not provided, then the value will try\n            to be set from `ARGILLA_API_URL` environment variable. Defaults to\n            `\"http://localhost:6900\"`.\n        api_key: the key to be used to authenticate in the Argilla API. If not provided,\n            then the value will try to be set from `ARGILLA_API_KEY` environment variable.\n            Defaults to `None`.\n        timeout: the maximum time in seconds to wait for a request to the Argilla API\n            to be completed before raising an exception. Defaults to `60`.\n        retries: the number of times to retry the HTTP connection to the Argilla API\n            before raising an exception. Defaults to `5`.\n    \"\"\"\n    super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, retries=retries, **http_client_args)\n\n    self._set_default(self)\n
"},{"location":"reference/argilla/markdown/","title":"rg.markdown","text":"

To support the usage of Markdown within Argilla, we've created some helper functions to ease the usage of DataURL conversions and chat message visualizations.

"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media","title":"media","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.video_to_html","title":"video_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

Convert a video file to an HTML tag with embedded base64 data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
| file_type | Optional[str] | The type of the video file. If not provided, it will be inferred from the file extension. | None |
| width | Optional[str] | Display width in HTML. Defaults to None. | None |
| height | Optional[str] | Display height in HTML. Defaults to None. | None |
| autoplay | bool | True to autoplay media. Defaults to False. | False |
| loop | bool | True to loop media. Defaults to False. | False |

Returns:

| Type | Description |
|---|---|
| str | The HTML tag with embedded base64 data. |

Examples:

from argilla.markdown import video_to_html\nhtml = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
Source code in src/argilla/markdown/media.py
def video_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert a video file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the video file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import video_to_html\n        html = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"video\", file_source, file_type, width, height, autoplay, loop)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.audio_to_html","title":"audio_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

Convert an audio file to an HTML tag with embedded base64 data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
| file_type | Optional[str] | The type of the audio file. If not provided, it will be inferred from the file extension. | None |
| width | Optional[str] | Display width in HTML. Defaults to None. | None |
| height | Optional[str] | Display height in HTML. Defaults to None. | None |
| autoplay | bool | True to autoplay media. Defaults to False. | False |
| loop | bool | True to loop media. Defaults to False. | False |

Returns:

| Type | Description |
|---|---|
| str | The HTML tag with embedded base64 data. |

Examples:

from argilla.markdown import audio_to_html\nhtml = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
Source code in src/argilla/markdown/media.py
def audio_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert an audio file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the audio file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import audio_to_html\n        html = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"audio\", file_source, file_type, width, height, autoplay, loop)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.image_to_html","title":"image_to_html(file_source, file_type=None, width=None, height=None)","text":"

Convert an image file to an HTML tag with embedded base64 data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
| file_type | Optional[str] | The type of the image file. If not provided, it will be inferred from the file extension. | None |
| width | Optional[str] | Display width in HTML. Defaults to None. | None |
| height | Optional[str] | Display height in HTML. Defaults to None. | None |

Returns:

| Type | Description |
|---|---|
| str | The HTML tag with embedded base64 data. |

Examples:

from argilla.markdown import image_to_html\nhtml = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n
Source code in src/argilla/markdown/media.py
def image_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n) -> str:\n    \"\"\"\n    Convert an image file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the image file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import image_to_html\n        html = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    return _media_to_html(\"image\", file_source, file_type, width, height)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.pdf_to_html","title":"pdf_to_html(file_source, width='1000px', height='1000px')","text":"

Convert a pdf file to an HTML tag with embedded data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_source | Union[str, bytes] | The path to the PDF file, a bytes object with PDF data, or a URL. | required |
| width | Optional[str] | Display width in HTML. Defaults to \"1000px\". | '1000px' |
| height | Optional[str] | Display height in HTML. Defaults to \"1000px\". | '1000px' |

Returns:

| Type | Description |
|---|---|
| str | HTML string embedding the PDF. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the width and height are not pixel or percentage values. |

Examples:

from argilla.markdown import pdf_to_html\nhtml = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n
Source code in src/argilla/markdown/media.py
def pdf_to_html(\n    file_source: Union[str, bytes], width: Optional[str] = \"1000px\", height: Optional[str] = \"1000px\"\n) -> str:\n    \"\"\"\n    Convert a pdf file to an HTML tag with embedded data.\n\n    Args:\n        file_source: The path to the PDF file, a bytes object with PDF data, or a URL.\n        width: Display width in HTML. Defaults to \"1000px\".\n        height: Display height in HTML. Defaults to \"1000px\".\n\n    Returns:\n        HTML string embedding the PDF.\n\n    Raises:\n        ValueError: If the width and height are not pixel or percentage.\n\n    Examples:\n        ```python\n        from argilla.markdown import pdf_to_html\n        html = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    if not _is_valid_dimension(width) or not _is_valid_dimension(height):\n        raise ValueError(\"Width and height must be valid pixel (e.g., '300px') or percentage (e.g., '50%') values.\")\n\n    if isinstance(file_source, str) and urlparse(file_source).scheme in [\"http\", \"https\"]:\n        return f'<embed src=\"{file_source}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></embed>'\n\n    file_data, _ = _get_file_data(file_source, \"pdf\")\n    pdf_base64 = base64.b64encode(file_data).decode(\"utf-8\")\n    data_url = f\"data:application/pdf;base64,{pdf_base64}\"\n    return f'<object id=\"pdf\" data=\"{data_url}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></object>'\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat","title":"chat","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat.chat_to_html","title":"chat_to_html(messages)","text":"

Converts a list of chat messages in the OpenAI format to HTML.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| messages | List[Dict[str, str]] | A list of dictionaries where each dictionary represents a chat message. Each dictionary should have the keys \"role\" (a string indicating the role of the sender, e.g., \"user\", \"model\", \"assistant\", \"system\") and \"content\" (the content of the message). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | An HTML string that represents the chat conversation. |

Raises:

| Type | Description |
|---|---|
| ValueError | If an invalid role is passed. |

Examples:

from argilla.markdown import chat_to_html\nhtml = chat_to_html([\n    {\"role\": \"user\", \"content\": \"hello\"},\n    {\"role\": \"assistant\", \"content\": \"goodbye\"}\n])\n
Source code in src/argilla/markdown/chat.py
def chat_to_html(messages: List[Dict[str, str]]) -> str:\n    \"\"\"\n    Converts a list of chat messages in the OpenAI format to HTML.\n\n    Args:\n        messages (List[Dict[str, str]]): A list of dictionaries where each dictionary represents a chat message.\n            Each dictionary should have the keys:\n                - \"role\": A string indicating the role of the sender (e.g., \"user\", \"model\", \"assistant\", \"system\").\n                - \"content\": The content of the message.\n\n    Returns:\n        str: An HTML string that represents the chat conversation.\n\n    Raises:\n        ValueError: If the an invalid role is passed.\n\n    Examples:\n        ```python\n        from argilla.markdown import chat_to_html\n        html = chat_to_html([\n            {\"role\": \"user\", \"content\": \"hello\"},\n            {\"role\": \"assistant\", \"content\": \"goodbye\"}\n        ])\n        ```\n    \"\"\"\n    chat_html = \"\"\n    for message in messages:\n        role = message[\"role\"]\n        content = message[\"content\"]\n        content_html = markdown.markdown(content)\n\n        if role == \"user\":\n            html = '<div class=\"user-message\">' + '<div class=\"message-content\">'\n        elif role in [\"model\", \"assistant\", \"system\"]:\n            html = '<div class=\"system-message\">' + '<div class=\"message-content\">'\n        else:\n            raise ValueError(f\"Invalid role: {role}\")\n\n        html += f\"{content_html}\"\n        html += \"</div></div>\"\n        chat_html += html\n\n    return f\"<body>{CHAT_CSS_STYLE}{chat_html}</body>\"\n
"},{"location":"reference/argilla/search/","title":"rg.Query","text":"

To collect records based on searching criteria, you can use the Query and Filter classes. The Query class is used to define the search criteria, while the Filter class is used to filter the search results. Filter is passed to a Query object so you can combine multiple filters to create complex search queries. A Query object can also be passed to Dataset.records to fetch records based on the search criteria.

"},{"location":"reference/argilla/search/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/search/#searching-for-records-with-terms","title":"Searching for records with terms","text":"

To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field.

for record in dataset.records(query=\"paris\"):\n    print(record)\n
"},{"location":"reference/argilla/search/#filtering-records-by-conditions","title":"Filtering records by conditions","text":"

Argilla allows you to filter records based on conditions. You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records based on the conditions. Conditions include \"==\", \">=\", \"<=\", or \"in\". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses.

# create a range from 10 to 20\nrange_filter = rg.Filter(\n    [\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20)\n    ]\n)\n\n# query records with metadata count greater than 10 and less than 20\nquery = rg.Query(filter=range_filter, query=\"paris\")\n\n# iterate over the results\nfor record in dataset.records(query=query):\n    print(record)\n
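Conditions on different record properties can also be combined in a single filter; a minimal sketch mixing the metadata condition above with the response-status condition used elsewhere in this guide:

# combine a metadata condition with a response-status condition\ncombined_filter = rg.Filter(\n    [\n        (\"metadata.count\", \">=\", 10),\n        (\"response.status\", \"==\", \"pending\"),\n    ]\n)\n\nfor record in dataset.records(query=rg.Query(filter=combined_filter)):\n    print(record)\n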
"},{"location":"reference/argilla/search/#src.argilla.records._search.Query","title":"Query","text":"

This class is used to map user queries to the internal query models

Source code in src/argilla/records/_search.py
class Query:\n    \"\"\"This class is used to map user queries to the internal query models\"\"\"\n\n    def __init__(\n        self,\n        *,\n        query: Union[str, None] = None,\n        similar: Union[Similar, None] = None,\n        filter: Union[Filter, Conditions, None] = None,\n    ):\n        \"\"\"Create a query object for use in Argilla search requests.add()\n\n        Parameters:\n            query (Union[str, None], optional): The query string that will be used to search.\n            similar (Union[Similar, None], optional): The similar object that will be used to search for similar records\n            filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n        \"\"\"\n\n        if isinstance(filter, tuple):\n            filter = [filter]\n\n        if isinstance(filter, list):\n            filter = Filter(conditions=filter)\n\n        self.query = query\n        self.filter = filter\n        self.similar = similar\n\n    def has_search(self) -> bool:\n        return bool(self.query or self.similar or self.filter)\n\n    def api_model(self) -> SearchQueryModel:\n        model = SearchQueryModel()\n\n        if self.query or self.similar:\n            query = QueryModel()\n\n            if self.query is not None:\n                query.text = TextQueryModel(q=self.query)\n\n            if self.similar is not None:\n                query.vector = self.similar.api_model()\n\n            model.query = query\n\n        if self.filter is not None:\n            model.filters = self.filter.api_model()\n\n        return model\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Query.__init__","title":"__init__(*, query=None, similar=None, filter=None)","text":"

Create a query object for use in Argilla search requests.add()

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, None] | The query string that will be used to search. | None |
| similar | Union[Similar, None] | The similar object that will be used to search for similar records. | None |
| filter | Union[Filter, None] | The filter object that will be used to filter the search results. | None |

Source code in src/argilla/records/_search.py
def __init__(\n    self,\n    *,\n    query: Union[str, None] = None,\n    similar: Union[Similar, None] = None,\n    filter: Union[Filter, Conditions, None] = None,\n):\n    \"\"\"Create a query object for use in Argilla search requests.add()\n\n    Parameters:\n        query (Union[str, None], optional): The query string that will be used to search.\n        similar (Union[Similar, None], optional): The similar object that will be used to search for similar records\n        filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n    \"\"\"\n\n    if isinstance(filter, tuple):\n        filter = [filter]\n\n    if isinstance(filter, list):\n        filter = Filter(conditions=filter)\n\n    self.query = query\n    self.filter = filter\n    self.similar = similar\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Filter","title":"Filter","text":"

This class is used to map user filters to the internal filter models

Source code in src/argilla/records/_search.py
class Filter:\n    \"\"\"This class is used to map user filters to the internal filter models\"\"\"\n\n    def __init__(self, conditions: Union[Conditions, None] = None):\n        \"\"\" Create a filter object for use in Argilla search requests.\n\n        Parameters:\n            conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n                The conditions that will be used to filter the search results. \\\n                The conditions should be a list of tuples where each tuple contains \\\n                the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n        \"\"\"\n\n        if isinstance(conditions, tuple):\n            conditions = [conditions]\n        self.conditions = [Condition(condition) for condition in conditions]\n\n    def api_model(self) -> AndFilterModel:\n        return AndFilterModel.model_validate({\"and\": [condition.api_model() for condition in self.conditions]})\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Filter.__init__","title":"__init__(conditions=None)","text":"

Create a filter object for use in Argilla search requests.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| conditions | Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None] | The conditions that will be used to filter the search results. The conditions should be a list of tuples where each tuple contains the field, operator, and value. For example (\"label\", \"in\", [\"positive\",\"happy\"]). | None |

Source code in src/argilla/records/_search.py
def __init__(self, conditions: Union[Conditions, None] = None):\n    \"\"\" Create a filter object for use in Argilla search requests.\n\n    Parameters:\n        conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n            The conditions that will be used to filter the search results. \\\n            The conditions should be a list of tuples where each tuple contains \\\n            the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n    \"\"\"\n\n    if isinstance(conditions, tuple):\n        conditions = [conditions]\n    self.conditions = [Condition(condition) for condition in conditions]\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Similar","title":"Similar","text":"

This class is used to map user similar queries to the internal query models

Source code in src/argilla/records/_search.py
class Similar:\n    \"\"\"This class is used to map user similar queries to the internal query models\"\"\"\n\n    def __init__(self, name: str, value: Union[Iterable[float], \"Record\"], most_similar: bool = True):\n        \"\"\"\n        Create a similar object for use in Argilla search requests.\n\n        Parameters:\n            name: The name of the vector field\n            value: The vector value or the record to search for similar records\n            most_similar: Whether to search for the most similar records or the least similar records\n        \"\"\"\n\n        self.name = name\n        self.value = value\n        self.most_similar = most_similar if most_similar is not None else True\n\n    def api_model(self) -> VectorQueryModel:\n        from argilla.records import Record\n\n        order = \"most_similar\" if self.most_similar else \"least_similar\"\n\n        if isinstance(self.value, Record):\n            return VectorQueryModel(name=self.name, record_id=self.value._server_id, order=order)\n\n        return VectorQueryModel(name=self.name, value=self.value, order=order)\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Similar.__init__","title":"__init__(name, value, most_similar=True)","text":"

Create a similar object for use in Argilla search requests.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the vector field. | required |
| value | Union[Iterable[float], Record] | The vector value or the record to search for similar records. | required |
| most_similar | bool | Whether to search for the most similar records or the least similar records. | True |

Source code in src/argilla/records/_search.py
def __init__(self, name: str, value: Union[Iterable[float], \"Record\"], most_similar: bool = True):\n    \"\"\"\n    Create a similar object for use in Argilla search requests.\n\n    Parameters:\n        name: The name of the vector field\n        value: The vector value or the record to search for similar records\n        most_similar: Whether to search for the most similar records or the least similar records\n    \"\"\"\n\n    self.name = name\n    self.value = value\n    self.most_similar = most_similar if most_similar is not None else True\n
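A minimal sketch of a similarity query, assuming Similar is exposed like Query and Filter and that a vector setting named my_vector exists (as in the earlier examples):

# search for the records whose 'my_vector' is most similar to a query vector\nsimilar_query = rg.Query(\n    similar=rg.Similar(name=\"my_vector\", value=[0.1, 0.2, 0.3])\n)\n\nfor record in dataset.records(query=similar_query):\n    print(record)\n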
"},{"location":"reference/argilla/users/","title":"rg.User","text":"

A user in Argilla is the profile of someone who uses the SDK or UI. The profile can be used to track their feedback activity and to manage their access to the Argilla server.

"},{"location":"reference/argilla/users/#usage-examples","title":"Usage Examples","text":"

To create a new user, instantiate the User object with the client and the username:

user = rg.User(username=\"my_username\", password=\"my_password\")\nuser.create()\n

Existing users can be retrieved by their username:

user = client.users(\"my_username\")\n

The current user of the rg.Argilla client can be accessed using the me attribute:

client.me\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User","title":"User","text":"

Bases: Resource

Class for interacting with Argilla users in the Argilla server. User profiles are used to manage access to the Argilla server and track responses to records.

Attributes:

Name Type Description username str

The username of the user.

first_name str

The first name of the user.

last_name str

The last name of the user.

role str

The role of the user, either 'annotator', 'admin', or 'owner'.

password str

The password of the user.

id UUID

The ID of the user.

Source code in src/argilla/users/_resource.py
class User(Resource):\n    \"\"\"Class for interacting with Argilla users in the Argilla server. User profiles \\\n        are used to manage access to the Argilla server and track responses to records.\n\n    Attributes:\n        username (str): The username of the user.\n        first_name (str): The first name of the user.\n        last_name (str): The last name of the user.\n        role (str): The role of the user, either 'annotator' or 'admin'.\n        password (str): The password of the user.\n        id (UUID): The ID of the user.\n    \"\"\"\n\n    _model: UserModel\n    _api: UsersAPI\n\n    def __init__(\n        self,\n        username: Optional[str] = None,\n        first_name: Optional[str] = None,\n        last_name: Optional[str] = None,\n        role: Optional[str] = None,\n        password: Optional[str] = None,\n        client: Optional[\"Argilla\"] = None,\n        id: Optional[UUID] = None,\n        _model: Optional[UserModel] = None,\n    ) -> None:\n        \"\"\"Initializes a User object with a client and a username\n\n        Parameters:\n            username (str): The username of the user\n            first_name (str): The first name of the user\n            last_name (str): The last name of the user\n            role (str): The role of the user, either 'annotator', admin, or 'owner'\n            password (str): The password of the user\n            client (Argilla): The client used to interact with Argilla\n\n        Returns:\n            User: The initialized user object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.users)\n\n        if _model is None:\n            _model = UserModel(\n                username=username,\n                password=password,\n                first_name=first_name or username,\n                last_name=last_name,\n                role=role or Role.annotator,\n                id=id,\n            )\n            self._log_message(f\"Initialized user with username {username}\")\n        self._model = _model\n\n    def create(self) -> \"User\":\n        \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n        Returns:\n            User: The user that was created in Argilla.\n        \"\"\"\n        model_create = self.api_model()\n        model = self._api.create(model_create)\n        # The password is not returned in the response\n        model.password = model_create.password\n        self._model = model\n        return self\n\n    def delete(self) -> None:\n        \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n        super().delete()\n        # exists relies on the id, so we need to set it to None\n        self._model = UserModel(username=self.username)\n\n    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to add the user to.\n\n        Returns:\n            User: The user that was added to the workspace.\n        \"\"\"\n        self._model = self._api.add_to_workspace(workspace.id, self.id)\n        return self\n\n    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Removes the user from a workspace. 
After removing a user from a workspace, it will no longer have access to\n        the datasets in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to remove the user from.\n\n        Returns:\n            User: The user that was removed from the workspace.\n\n        \"\"\"\n        self._model = self._api.delete_from_workspace(workspace.id, self.id)\n        return self\n\n    ############################\n    # Properties\n    ############################\n    @property\n    def username(self) -> str:\n        return self._model.username\n\n    @username.setter\n    def username(self, value: str) -> None:\n        self._model.username = value\n\n    @property\n    def password(self) -> str:\n        return self._model.password\n\n    @password.setter\n    def password(self, value: str) -> None:\n        self._model.password = value\n\n    @property\n    def first_name(self) -> str:\n        return self._model.first_name\n\n    @first_name.setter\n    def first_name(self, value: str) -> None:\n        self._model.first_name = value\n\n    @property\n    def last_name(self) -> str:\n        return self._model.last_name\n\n    @last_name.setter\n    def last_name(self, value: str) -> None:\n        self._model.last_name = value\n\n    @property\n    def role(self) -> Role:\n        return self._model.role\n\n    @role.setter\n    def role(self, value: Role) -> None:\n        self._model.role = value\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.__init__","title":"__init__(username=None, first_name=None, last_name=None, role=None, password=None, client=None, id=None, _model=None)","text":"

Initializes a User object with a client and a username

Parameters:

Name Type Description Default username str

The username of the user

None first_name str

The first name of the user

None last_name str

The last name of the user

None role str

The role of the user, either 'annotator', 'admin', or 'owner'

None password str

The password of the user

None client Argilla

The client used to interact with Argilla

None

Returns:

Name Type Description User None

The initialized user object

Source code in src/argilla/users/_resource.py
def __init__(\n    self,\n    username: Optional[str] = None,\n    first_name: Optional[str] = None,\n    last_name: Optional[str] = None,\n    role: Optional[str] = None,\n    password: Optional[str] = None,\n    client: Optional[\"Argilla\"] = None,\n    id: Optional[UUID] = None,\n    _model: Optional[UserModel] = None,\n) -> None:\n    \"\"\"Initializes a User object with a client and a username\n\n    Parameters:\n        username (str): The username of the user\n        first_name (str): The first name of the user\n        last_name (str): The last name of the user\n        role (str): The role of the user, either 'annotator', admin, or 'owner'\n        password (str): The password of the user\n        client (Argilla): The client used to interact with Argilla\n\n    Returns:\n        User: The initialized user object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.users)\n\n    if _model is None:\n        _model = UserModel(\n            username=username,\n            password=password,\n            first_name=first_name or username,\n            last_name=last_name,\n            role=role or Role.annotator,\n            id=id,\n        )\n        self._log_message(f\"Initialized user with username {username}\")\n    self._model = _model\n
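For illustration, a user can be initialized with an explicit role before being created on the server (the credential values below are placeholders):

user = rg.User(\n    username=\"new_annotator\",\n    password=\"secure_password\",\n    first_name=\"New\",\n    role=\"annotator\", # defaults to 'annotator' when omitted\n)\nuser.create()\n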
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.create","title":"create()","text":"

Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.

Returns:

Name Type Description User User

The user that was created in Argilla.

Source code in src/argilla/users/_resource.py
def create(self) -> \"User\":\n    \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n    Returns:\n        User: The user that was created in Argilla.\n    \"\"\"\n    model_create = self.api_model()\n    model = self._api.create(model_create)\n    # The password is not returned in the response\n    model.password = model_create.password\n    self._model = model\n    return self\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.delete","title":"delete()","text":"

Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.

Source code in src/argilla/users/_resource.py
def delete(self) -> None:\n    \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n    super().delete()\n    # exists relies on the id, so we need to set it to None\n    self._model = UserModel(username=self.username)\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.add_to_workspace","title":"add_to_workspace(workspace)","text":"

Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets in the workspace.

Parameters:

Name Type Description Default workspace Workspace

The workspace to add the user to.

required

Returns:

Name Type Description User User

The user that was added to the workspace.

Source code in src/argilla/users/_resource.py
def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to add the user to.\n\n    Returns:\n        User: The user that was added to the workspace.\n    \"\"\"\n    self._model = self._api.add_to_workspace(workspace.id, self.id)\n    return self\n
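For example, combining this method with the client accessors shown above (the workspace and user names are placeholders):

workspace = client.workspaces(\"my_workspace\")\nuser = client.users(\"my_username\")\nuser.add_to_workspace(workspace)\n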
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.remove_from_workspace","title":"remove_from_workspace(workspace)","text":"

Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to the datasets in the workspace.

Parameters:

Name Type Description Default workspace Workspace

The workspace to remove the user from.

required

Returns:

Name Type Description User User

The user that was removed from the workspace.

Source code in src/argilla/users/_resource.py
def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n    the datasets in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to remove the user from.\n\n    Returns:\n        User: The user that was removed from the workspace.\n\n    \"\"\"\n    self._model = self._api.delete_from_workspace(workspace.id, self.id)\n    return self\n
"},{"location":"reference/argilla/workspaces/","title":"rg.Workspace","text":"

In Argilla, workspaces are used to organize datasets into groups. For example, you might have a workspace for each project or team.

"},{"location":"reference/argilla/workspaces/#usage-examples","title":"Usage Examples","text":"

To create a new workspace, instantiate the Workspace object with the client and the name:

workspace = rg.Workspace(name=\"my_workspace\")\nworkspace.create()\n

To retrieve an existing workspace, use the client.workspaces attribute:

workspace = client.workspaces(\"my_workspace\")\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace","title":"Workspace","text":"

Bases: Resource

Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.

Attributes:

Name Type Description name str

The name of the workspace.

id UUID

The ID of the workspace. This is a unique identifier for the workspace in the server.

datasets List[Dataset]

A list of all datasets in the workspace.

users WorkspaceUsers

A list of all users in the workspace.

Source code in src/argilla/workspaces/_resource.py
class Workspace(Resource):\n    \"\"\"Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.\n\n    Attributes:\n        name (str): The name of the workspace.\n        id (UUID): The ID of the workspace. This is a unique identifier for the workspace in the server.\n        datasets (List[Dataset]): A list of all datasets in the workspace.\n        users (WorkspaceUsers): A list of all users in the workspace.\n    \"\"\"\n\n    name: Optional[str]\n\n    _api: \"WorkspacesAPI\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        id: Optional[UUID] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a Workspace object with a client and a name or id\n\n        Parameters:\n            client (Argilla): The client used to interact with Argilla\n            name (str): The name of the workspace\n            id (UUID): The id of the workspace\n\n        Returns:\n            Workspace: The initialized workspace object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.workspaces)\n\n        self._model = WorkspaceModel(name=name, id=id)\n\n    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was added to the workspace\n        \"\"\"\n        return self.users.add(user)\n\n    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n\n        Args:\n            user (Union[User, str]): The user to remove from the workspace. 
Can be a User object or a username.\n\n        Returns:\n            User: The user that was removed from the workspace.\n        \"\"\"\n        return self.users.delete(user)\n\n    # TODO: Make this method private\n    def list_datasets(self) -> List[\"Dataset\"]:\n        from argilla.datasets import Dataset\n\n        datasets = self._client.api.datasets.list(self.id)\n        self._log_message(f\"Got {len(datasets)} datasets for workspace {self.id}\")\n        return [Dataset.from_model(model=dataset, client=self._client) for dataset in datasets]\n\n    @classmethod\n    def from_model(cls, model: WorkspaceModel, client: Argilla) -> \"Workspace\":\n        instance = cls(name=model.name, id=model.id, client=client)\n        instance._model = model\n\n        return instance\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def name(self) -> Optional[str]:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def datasets(self) -> List[\"Dataset\"]:\n        \"\"\"List all datasets in the workspace\n\n        Returns:\n            List[Dataset]: A list of all datasets in the workspace\n        \"\"\"\n        return self.list_datasets()\n\n    @property\n    def users(self) -> \"WorkspaceUsers\":\n        \"\"\"List all users in the workspace\n\n        Returns:\n            WorkspaceUsers: A list of all users in the workspace\n        \"\"\"\n        return WorkspaceUsers(workspace=self)\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.datasets","title":"datasets: List[Dataset] property","text":"

List all datasets in the workspace

Returns:

Type Description List[Dataset]

List[Dataset]: A list of all datasets in the workspace

"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.users","title":"users: WorkspaceUsers property","text":"

List all users in the workspace

Returns:

Name Type Description WorkspaceUsers WorkspaceUsers

A list of all users in the workspace
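As a sketch, both properties can be used for read-only inspection, assuming, per their descriptions, that they return iterable collections:

workspace = client.workspaces(\"my_workspace\")\nfor dataset in workspace.datasets:\n    print(dataset.name)\nfor user in workspace.users:\n    print(user.username)\n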

"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.__init__","title":"__init__(name=None, id=None, client=None)","text":"

Initializes a Workspace object with a client and a name or id

Parameters:

Name Type Description Default client Argilla

The client used to interact with Argilla

None name str

The name of the workspace

None id UUID

The id of the workspace

None

Returns:

Name Type Description Workspace None

The initialized workspace object

Source code in src/argilla/workspaces/_resource.py
def __init__(\n    self,\n    name: Optional[str] = None,\n    id: Optional[UUID] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a Workspace object with a client and a name or id\n\n    Parameters:\n        client (Argilla): The client used to interact with Argilla\n        name (str): The name of the workspace\n        id (UUID): The id of the workspace\n\n    Returns:\n        Workspace: The initialized workspace object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.workspaces)\n\n    self._model = WorkspaceModel(name=name, id=id)\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.add_user","title":"add_user(user)","text":"

Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets in the workspace.

Parameters:

Name Type Description Default user Union[User, str]

The user to add to the workspace. Can be a User object or a username.

required

Returns:

Name Type Description User User

The user that was added to the workspace

Source code in src/argilla/workspaces/_resource.py
def add_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was added to the workspace\n    \"\"\"\n    return self.users.add(user)\n
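For example, since user accepts either form, a plain username string is enough (the names are placeholders):

workspace = client.workspaces(\"my_workspace\")\nworkspace.add_user(\"my_username\") # a User object works too\n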
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.remove_user","title":"remove_user(user)","text":"

Removes a user from the workspace. After removing a user from the workspace, it will no longer have access to the datasets in the workspace.

Parameters:

Name Type Description Default user Union[User, str]

The user to remove from the workspace. Can be a User object or a username.

required

Returns:

Name Type Description User User

The user that was removed from the workspace.

Source code in src/argilla/workspaces/_resource.py
def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n\n    Args:\n        user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was removed from the workspace.\n    \"\"\"\n    return self.users.delete(user)\n
"},{"location":"reference/argilla/datasets/dataset_records/","title":"rg.Dataset.records","text":""},{"location":"reference/argilla/datasets/dataset_records/#usage-examples","title":"Usage Examples","text":"

In most cases, you will not need to create a DatasetRecords object directly. Instead, you can access it via the Dataset object:

dataset.records\n

For users familiar with legacy approaches

  1. The Dataset.records object is used to interact with the records in a dataset. It fetches records from the server iteratively, in batches, without keeping a local copy of the records.
  2. The log method of Dataset.records is used to both add and update records in a dataset. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added.
"},{"location":"reference/argilla/datasets/dataset_records/#adding-records-to-a-dataset","title":"Adding records to a dataset","text":"

To add records to a dataset, use the log method. Records can be added as dictionaries or as Record objects. Single records can also be added as a dictionary or Record.

As a Record object · From a data structure · From a data structure with a mapping · From a Hugging Face dataset

You can also add records to a dataset by initializing a Record object directly.

records = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ),\n] # (1)\n\ndataset.records.log(records)\n
  1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.
data = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    },\n] # (1)\n\ndataset.records.log(data)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
data = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n] # (1)\ndataset.records.log(\n    records=data,\n    mapping={\"query\": \"question\", \"response\": \"answer\"} # (2)\n)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
  2. The data structure has keys query and response and the Argilla dataset has question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.

You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

You can add the dataset directly if its column names correspond to the names of fields, questions, metadata, or vectors in the Argilla dataset.

If the dataset's schema does not correspond to your Argilla dataset names, you can use a mapping to indicate which columns in the dataset correspond to the Argilla dataset fields.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset.

Here we use the mapping parameter to specify the relationship between the Hugging Face dataset and the Argilla dataset.

dataset.records.log(records=hf_dataset, mapping={\"txt\": \"text\", \"y\": \"label\"}) # (1)\n
  1. In this case, the txt key in the Hugging Face dataset corresponds to the text field in the Argilla dataset, and the y key in the Hugging Face dataset corresponds to the label field in the Argilla dataset.
"},{"location":"reference/argilla/datasets/dataset_records/#updating-records-in-a-dataset","title":"Updating records in a dataset","text":"

Records can also be updated using the log method with records that contain an id to identify the records to be updated. As above, records can be added as dictionaries or as Record objects.

As a Record object · From a data structure · From a data structure with a mapping · From a Hugging Face dataset

You can update records in a dataset by initializing a Record object directly and providing the id field.

records = [\n    rg.Record(\n        metadata={\"department\": \"toys\"},\n        id=\"2\" # (1)\n    ),\n]\n\ndataset.records.log(records)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

You can also update records in a dataset by providing the id field in the data structure.

data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(data)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

You can also update records in a dataset by providing the id field in the data structure and using a mapping to map the keys in the data structure to the fields in the dataset.

data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"my_id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(\n    records=data,\n    mapping={\"my_id\": \"id\"} # (2)\n)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.
  2. Let's say that your data structure has keys my_id instead of id. You can use the mapping parameter to map the keys in the data structure to the fields in the dataset.

You can also update records in an Argilla dataset using a Hugging Face dataset. To update records, the Hugging Face dataset must contain an id field to identify the records to be updated, or you can use a mapping to map the keys in the Hugging Face dataset to the fields in the Argilla dataset.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset, mapping={\"uuid\": \"id\"}) # (2)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema.
  2. The uuid key in the Hugging Face dataset corresponds to the id field in the Argilla dataset.
"},{"location":"reference/argilla/datasets/dataset_records/#adding-and-updating-records-with-images","title":"Adding and updating records with images","text":"

Argilla datasets can contain image fields. You can add images to a dataset by passing the image to the record object as either a remote URL, a local path to an image file, or a PIL object. The field name must be defined as an rg.ImageField in the dataset's Settings object to be accepted. Images will be stored in the Argilla database and returned using the data URI scheme.

As PIL objects

To retrieve the images as rescaled PIL objects, you can use the to_datasets method when exporting the records, as shown in this how-to guide.

From a data structure with remote URLs · From a data structure with local files or PIL objects · From a Hugging Face dataset
data = [\n    {\n        \"image\": \"https://example.com/image1.jpg\",\n    },\n    {\n        \"image\": \"https://example.com/image2.jpg\",\n    },\n]\n\ndataset.records.log(data)\n
import os\nfrom PIL import Image\n\nimage_dir = \"path/to/images\"\n\ndata = [\n    {\n        \"image\": os.path.join(image_dir, \"image1.jpg\"), # (1)\n    },\n    {\n        \"image\": Image.open(os.path.join(image_dir, \"image2.jpg\")), # (2)\n    },\n]\n\ndataset.records.log(data)\n
  1. The image is a local file path.
  2. The image is a PIL object.

Hugging Face datasets can be passed directly to the log method. The image field must be defined as an Image in the dataset's features.

hf_dataset = load_dataset(\"ylecun/mnist\", split=\"train[:100]\")\ndataset.records.log(records=hf_dataset)\n

If the image field is not defined as an Image in the dataset's features, you can cast the dataset to the correct schema before adding it to the Argilla dataset. This is only necessary when the image data is not already one of the types supported by Argilla (URL, local path, or PIL object).

from datasets import Features, Image, Value, load_dataset\n\nhf_dataset = load_dataset(\"<my_custom_dataset>\") # (1)\nhf_dataset = hf_dataset.cast(\n    features=Features({\"image\": Image(), \"label\": Value(\"string\")}),\n)\ndataset.records.log(records=hf_dataset)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema but the image field is not defined as an Image in the dataset's features.
"},{"location":"reference/argilla/datasets/dataset_records/#iterating-over-records-in-a-dataset","title":"Iterating over records in a dataset","text":"

Dataset.records can be used to iterate over records in a dataset from the server. The records will be fetched in batches from the server:

for record in dataset.records:\n    print(record)\n\n# Fetch records with suggestions and responses\nfor record in dataset.records(with_suggestions=True, with_responses=True):\n    print(record.suggestions)\n    print(record.responses)\n\n# Filter records by a query and fetch records with vectors\nfor record in dataset.records(query=\"capital\", with_vectors=True):\n    print(record.vectors)\n

Check out the rg.Record class reference for more information on the properties and methods available on a record and the rg.Query class reference for more information on the query syntax.

"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords","title":"DatasetRecords","text":"

Bases: Iterable[Record], LoggingMixin

This class is used to work with records from a dataset and is accessed via Dataset.records. The responsibility of this class is to provide an interface to interact with records in a dataset, by adding, updating, fetching, querying, deleting, and exporting records.

Attributes:

Name Type Description client Argilla

The Argilla client object.

dataset Dataset

The dataset object.

Source code in src/argilla/records/_dataset_records.py
class DatasetRecords(Iterable[Record], LoggingMixin):\n    \"\"\"This class is used to work with records from a dataset and is accessed via `Dataset.records`.\n    The responsibility of this class is to provide an interface to interact with records in a dataset,\n    by adding, updating, fetching, querying, deleting, and exporting records.\n\n    Attributes:\n        client (Argilla): The Argilla client object.\n        dataset (Dataset): The dataset object.\n    \"\"\"\n\n    _api: RecordsAPI\n\n    DEFAULT_BATCH_SIZE = 256\n    DEFAULT_DELETE_BATCH_SIZE = 64\n\n    def __init__(\n        self, client: \"Argilla\", dataset: \"Dataset\", mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None\n    ):\n        \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n        Args:\n            client: An Argilla client object.\n            dataset: A Dataset object.\n        \"\"\"\n        self.__client = client\n        self.__dataset = dataset\n        self._mapping = mapping or {}\n        self._api = self.__client.api.records\n\n    def __iter__(self):\n        return DatasetRecordsIterator(self.__dataset, self.__client, with_suggestions=True, with_responses=True)\n\n    def __call__(\n        self,\n        query: Optional[Union[str, Query]] = None,\n        batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n        start_offset: int = 0,\n        with_suggestions: bool = True,\n        with_responses: bool = True,\n        with_vectors: Optional[Union[List, bool, str]] = None,\n        limit: Optional[int] = None,\n    ) -> DatasetRecordsIterator:\n        \"\"\"Returns an iterator over the records in the dataset on the server.\n\n        Parameters:\n            query: A string or a Query object to filter the records.\n            batch_size: The number of records to fetch in each batch. The default is 256.\n            start_offset: The offset from which to start fetching records. The default is 0.\n            with_suggestions: Whether to include suggestions in the records. The default is True.\n            with_responses: Whether to include responses in the records. The default is True.\n            with_vectors: A list of vector names to include in the records. The default is None.\n                If a list is provided, only the specified vectors will be included.\n                If True is provided, all vectors will be included.\n            limit: The maximum number of records to fetch. 
The default is None.\n\n        Returns:\n            An iterator over the records in the dataset on the server.\n\n        \"\"\"\n        if query and isinstance(query, str):\n            query = Query(query=query)\n\n        if with_vectors:\n            self._validate_vector_names(vector_names=with_vectors)\n\n        return DatasetRecordsIterator(\n            dataset=self.__dataset,\n            client=self.__client,\n            query=query,\n            batch_size=batch_size,\n            start_offset=start_offset,\n            with_suggestions=with_suggestions,\n            with_responses=with_responses,\n            with_vectors=with_vectors,\n            limit=limit,\n        )\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}({self.__dataset})\"\n\n    ############################\n    # Public methods\n    ############################\n\n    def log(\n        self,\n        records: Union[List[dict], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        batch_size: int = DEFAULT_BATCH_SIZE,\n        on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n    ) -> \"DatasetRecords\":\n        \"\"\"Add or update records in a dataset on the server using the provided records.\n        If the record includes a known `id` field, the record will be updated.\n        If the record does not include a known `id` field, the record will be added as a new record.\n        See `rg.Record` for more information on the record definition.\n\n        Parameters:\n            records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                     If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                     fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n            mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                     To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n            user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n            batch_size: The number of records to send in each batch. 
The default is 256.\n\n        Returns:\n            A list of Record objects representing the updated records.\n        \"\"\"\n        record_models = self._ingest_records(\n            records=records, mapping=mapping, user_id=user_id or self.__client.me.id, on_error=on_error\n        )\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n        )\n\n        created_or_updated = []\n        records_updated = 0\n\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n            created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n            records_updated += updated\n\n        records_created = len(created_or_updated) - records_updated\n        self._log_message(\n            message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return self\n\n    def delete(\n        self,\n        records: List[Record],\n        batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n    ) -> List[Record]:\n        \"\"\"Delete records in a dataset on the server using the provided records\n            and matching based on the id.\n\n        Parameters:\n            records: A list of `Record` objects representing the records to be deleted.\n            batch_size: The number of records to send in each batch. The default is 64.\n\n        Returns:\n            A list of Record objects representing the deleted records.\n\n        \"\"\"\n        mapping = None\n        user_id = self.__client.me.id\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n        )\n\n        records_deleted = 0\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n            records_deleted += len(batch_records)\n\n        self._log_message(\n            message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return records\n\n    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n        \"\"\"\n        Return the records as a dictionary. 
This is a convenient shortcut for dataset.records(...).to_dict().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionary.\n                - True: The record fields, metadata, suggestions and responses will be flattened.\n                - False: The record fields, metadata, suggestions and responses will be nested.\n            orient (str): The orientation of the exported dictionary.\n                - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n                - \"index\": The keys of the dictionary will be the id of the records.\n        Returns:\n            A dictionary of records.\n\n        \"\"\"\n        return self().to_dict(flatten=flatten, orient=orient)\n\n    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n        \"\"\"\n        Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionaries in the list.\n                - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n                - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n        Returns:\n            A list of dictionaries of records.\n        \"\"\"\n        data = self().to_list(flatten=flatten)\n        return data\n\n    def to_json(self, path: Union[Path, str]) -> Path:\n        \"\"\"\n        Export the records to a file on disk.\n\n        Parameters:\n            path (str): The path to the file to save the records.\n\n        Returns:\n            The path to the file where the records were saved.\n\n        \"\"\"\n        return self().to_json(path=path)\n\n    def from_json(self, path: Union[Path, str]) -> List[Record]:\n        \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n            The JSON file should be defined by `DatasetRecords.to_json`.\n\n        Args:\n            path (str): The path to the file containing the records.\n\n        Returns:\n            DatasetRecords: The DatasetRecords object created from the disk path.\n\n        \"\"\"\n        records = JsonIO._records_from_json(path=path)\n        return self.log(records=records)\n\n    def to_datasets(self) -> HFDataset:\n        \"\"\"\n        Export the records to a HFDataset.\n\n        Returns:\n            The dataset containing the records.\n\n        \"\"\"\n\n        return self().to_datasets()\n\n    ############################\n    # Private methods\n    ############################\n\n    def _ingest_records(\n        self,\n        records: Union[List[Dict[str, Any]], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n    ) -> List[RecordModel]:\n        \"\"\"Ingests records from a list of dictionaries, a Hugging Face Dataset, or a list of Record objects.\"\"\"\n\n        mapping = mapping or self._mapping\n        if len(records) == 0:\n            raise ValueError(\"No records provided to ingest.\")\n\n        if HFDatasetsIO._is_hf_dataset(dataset=records):\n            records = 
HFDatasetsIO._record_dicts_from_datasets(hf_dataset=records)\n\n        ingested_records = []\n        record_mapper = IngestedRecordMapper(mapping=mapping, dataset=self.__dataset, user_id=user_id)\n        for record in records:\n            try:\n                if isinstance(record, dict):\n                    record = record_mapper(data=record)\n                elif isinstance(record, Record):\n                    record.dataset = self.__dataset\n                else:\n                    raise ValueError(\n                        \"Records should be a a list Record instances, \"\n                        \"a Hugging Face Dataset, or a list of dictionaries representing the records.\"\n                        f\"Found a record of type {type(record)}: {record}.\"\n                    )\n            except Exception as e:\n                if on_error == RecordErrorHandling.IGNORE:\n                    self._log_message(\n                        message=f\"Failed to ingest record from dict {record}: {e}\",\n                        level=\"info\",\n                    )\n                    continue\n                elif on_error == RecordErrorHandling.WARN:\n                    warnings.warn(f\"Failed to ingest record from dict {record}: {e}\")\n                    continue\n                raise RecordsIngestionError(f\"Failed to ingest record from dict {record}\") from e\n            ingested_records.append(record.api_model())\n        return ingested_records\n\n    def _normalize_batch_size(self, batch_size: int, records_length, max_value: int):\n        norm_batch_size = min(batch_size, records_length, max_value)\n\n        if batch_size != norm_batch_size:\n            self._log_message(\n                message=f\"The provided batch size {batch_size} was normalized. Using value {norm_batch_size}.\",\n                level=\"warning\",\n            )\n\n        return norm_batch_size\n\n    def _validate_vector_names(self, vector_names: Union[List[str], str]) -> None:\n        if not isinstance(vector_names, list):\n            vector_names = [vector_names]\n        for vector_name in vector_names:\n            if isinstance(vector_name, bool):\n                continue\n            if vector_name not in self.__dataset.schema:\n                raise ValueError(f\"Vector field {vector_name} not found in dataset schema.\")\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__init__","title":"__init__(client, dataset, mapping=None)","text":"

Initializes a DatasetRecords object with a client and a dataset.

Parameters:

Name Type Description Default client Argilla

An Argilla client object.

required dataset Dataset

A Dataset object.

required

Source code in src/argilla/records/_dataset_records.py
def __init__(\n    self, client: \"Argilla\", dataset: \"Dataset\", mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None\n):\n    \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n    Args:\n        client: An Argilla client object.\n        dataset: A Dataset object.\n    \"\"\"\n    self.__client = client\n    self.__dataset = dataset\n    self._mapping = mapping or {}\n    self._api = self.__client.api.records\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__call__","title":"__call__(query=None, batch_size=DEFAULT_BATCH_SIZE, start_offset=0, with_suggestions=True, with_responses=True, with_vectors=None, limit=None)","text":"

Returns an iterator over the records in the dataset on the server.

Parameters:

Name Type Description Default query Optional[Union[str, Query]]

A string or a Query object to filter the records.

None batch_size Optional[int]

The number of records to fetch in each batch. The default is 256.

DEFAULT_BATCH_SIZE start_offset int

The offset from which to start fetching records. The default is 0.

0 with_suggestions bool

Whether to include suggestions in the records. The default is True.

True with_responses bool

Whether to include responses in the records. The default is True.

True with_vectors Optional[Union[List, bool, str]]

A list of vector names to include in the records. The default is None. If a list is provided, only the specified vectors will be included. If True is provided, all vectors will be included.

None limit Optional[int]

The maximum number of records to fetch. The default is None.

None

Returns:

Type Description DatasetRecordsIterator

An iterator over the records in the dataset on the server.

Source code in src/argilla/records/_dataset_records.py
def __call__(\n    self,\n    query: Optional[Union[str, Query]] = None,\n    batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n    start_offset: int = 0,\n    with_suggestions: bool = True,\n    with_responses: bool = True,\n    with_vectors: Optional[Union[List, bool, str]] = None,\n    limit: Optional[int] = None,\n) -> DatasetRecordsIterator:\n    \"\"\"Returns an iterator over the records in the dataset on the server.\n\n    Parameters:\n        query: A string or a Query object to filter the records.\n        batch_size: The number of records to fetch in each batch. The default is 256.\n        start_offset: The offset from which to start fetching records. The default is 0.\n        with_suggestions: Whether to include suggestions in the records. The default is True.\n        with_responses: Whether to include responses in the records. The default is True.\n        with_vectors: A list of vector names to include in the records. The default is None.\n            If a list is provided, only the specified vectors will be included.\n            If True is provided, all vectors will be included.\n        limit: The maximum number of records to fetch. The default is None.\n\n    Returns:\n        An iterator over the records in the dataset on the server.\n\n    \"\"\"\n    if query and isinstance(query, str):\n        query = Query(query=query)\n\n    if with_vectors:\n        self._validate_vector_names(vector_names=with_vectors)\n\n    return DatasetRecordsIterator(\n        dataset=self.__dataset,\n        client=self.__client,\n        query=query,\n        batch_size=batch_size,\n        start_offset=start_offset,\n        with_suggestions=with_suggestions,\n        with_responses=with_responses,\n        with_vectors=with_vectors,\n        limit=limit,\n    )\n
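An illustrative sketch combining the parameters above (the query string and the vector flag are assumptions about the dataset at hand):

for record in dataset.records(\n    query=\"capital\",\n    batch_size=100,\n    start_offset=50,\n    limit=200,\n    with_vectors=True,\n):\n    print(record.fields)\n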
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.log","title":"log(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE, on_error=RecordErrorHandling.RAISE)","text":"

Add or update records in a dataset on the server using the provided records. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added as a new record. See rg.Record for more information on the record definition.

Parameters:

Name Type Description Default records Union[List[dict], List[Record], HFDataset]

A list of Record objects, a Hugging Face Dataset, or a list of dictionaries representing the records. If records are defined as dictionaries or a dataset, the keys/column names should correspond to the fields and questions in the Argilla dataset. id should be provided to identify the records when updating.

required mapping Optional[Dict[str, Union[str, Sequence[str]]]]

A dictionary that maps the keys/column names in the records to the fields or questions in the Argilla dataset. To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.

None user_id Optional[UUID]

The user id to be associated with the records' response. If not provided, the current user id is used.

None batch_size int

The number of records to send in each batch. The default is 256.

DEFAULT_BATCH_SIZE

Returns:

Type Description DatasetRecords

A list of Record objects representing the updated records.

Source code in src/argilla/records/_dataset_records.py
def log(\n    self,\n    records: Union[List[dict], List[Record], HFDataset],\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    user_id: Optional[UUID] = None,\n    batch_size: int = DEFAULT_BATCH_SIZE,\n    on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n) -> \"DatasetRecords\":\n    \"\"\"Add or update records in a dataset on the server using the provided records.\n    If the record includes a known `id` field, the record will be updated.\n    If the record does not include a known `id` field, the record will be added as a new record.\n    See `rg.Record` for more information on the record definition.\n\n    Parameters:\n        records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                 If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                 fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n        mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                 To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n        user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n        batch_size: The number of records to send in each batch. The default is 256.\n\n    Returns:\n        A list of Record objects representing the updated records.\n    \"\"\"\n    record_models = self._ingest_records(\n        records=records, mapping=mapping, user_id=user_id or self.__client.me.id, on_error=on_error\n    )\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n    )\n\n    created_or_updated = []\n    records_updated = 0\n\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n        created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n        records_updated += updated\n\n    records_created = len(created_or_updated) - records_updated\n    self._log_message(\n        message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return self\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.delete","title":"delete(records, batch_size=DEFAULT_DELETE_BATCH_SIZE)","text":"

Delete records in a dataset on the server using the provided records and matching based on the id.

Parameters:

Name Type Description Default records List[Record]

A list of Record objects representing the records to be deleted.

required batch_size int

The number of records to send in each batch. The default is 64.

DEFAULT_DELETE_BATCH_SIZE

Returns:

Type Description List[Record]

A list of Record objects representing the deleted records.

Source code in src/argilla/records/_dataset_records.py
def delete(\n    self,\n    records: List[Record],\n    batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n) -> List[Record]:\n    \"\"\"Delete records in a dataset on the server using the provided records\n        and matching based on the id.\n\n    Parameters:\n        records: A list of `Record` objects representing the records to be deleted.\n        batch_size: The number of records to send in each batch. The default is 64.\n\n    Returns:\n        A list of Record objects representing the deleted records.\n\n    \"\"\"\n    mapping = None\n    user_id = self.__client.me.id\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n    )\n\n    records_deleted = 0\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n        records_deleted += len(batch_records)\n\n    self._log_message(\n        message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return records\n
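For example, records matching a query can be fetched first and then deleted (the query string is a placeholder and follows the rg.Query syntax):

records_to_delete = list(dataset.records(query=\"label:negative\"))\ndataset.records.delete(records=records_to_delete)\n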
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_dict","title":"to_dict(flatten=False, orient='names')","text":"

Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

Parameters:

Name Type Description Default flatten bool

The structure of the exported dictionary. - True: The record fields, metadata, suggestions and responses will be flattened. - False: The record fields, metadata, suggestions and responses will be nested.

False orient str

The orientation of the exported dictionary. - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses. - \"index\": The keys of the dictionary will be the id of the records.

'names'

Returns: A dictionary of records.

Source code in src/argilla/records/_dataset_records.py
def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n    \"\"\"\n    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionary.\n            - True: The record fields, metadata, suggestions and responses will be flattened.\n            - False: The record fields, metadata, suggestions and responses will be nested.\n        orient (str): The orientation of the exported dictionary.\n            - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n            - \"index\": The keys of the dictionary will be the id of the records.\n    Returns:\n        A dictionary of records.\n\n    \"\"\"\n    return self().to_dict(flatten=flatten, orient=orient)\n
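A short sketch of the two orientations; the shapes in the comments are indicative, assuming fields named question and answer:

by_name = dataset.records.to_dict(orient=\"names\") # {\"question\": [...], \"answer\": [...], ...}\nby_id = dataset.records.to_dict(orient=\"index\") # {\"<record_id>\": {\"question\": ..., \"answer\": ...}, ...}\n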
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_list","title":"to_list(flatten=False)","text":"

Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

Parameters:

Name Type Description Default flatten bool

The structure of the exported dictionaries in the list. - True: The record keys are flattened, using dot notation for record attributes and their sub-attributes. For example, label.suggestion and label.response. Record responses are spread across multiple columns for values and users. - False: The record fields, metadata, suggestions and responses are returned as nested dictionaries keyed by record attribute.

False

Returns: A list of dictionaries of records.
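A short usage sketch (assuming a dataset with a question named label, as in the flattened key example above, and at least one logged record):

rows = dataset.records.to_list(flatten=True)\nprint(rows[0][\"label.suggestion\"])\n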

Source code in src/argilla/records/_dataset_records.py
def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n    \"\"\"\n    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionaries in the list.\n            - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n            - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n    Returns:\n        A list of dictionaries of records.\n    \"\"\"\n    data = self().to_list(flatten=flatten)\n    return data\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_json","title":"to_json(path)","text":"

Export the records to a file on disk.

Parameters:

Name Type Description Default path str

The path to the file to save the records.

required

Returns:

Type Description Path

The path to the file where the records were saved.
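A short usage sketch (the file name is illustrative):

json_path = dataset.records.to_json(path=\"records.json\")\n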

Source code in src/argilla/records/_dataset_records.py
def to_json(self, path: Union[Path, str]) -> Path:\n    \"\"\"\n    Export the records to a file on disk.\n\n    Parameters:\n        path (str): The path to the file to save the records.\n\n    Returns:\n        The path to the file where the records were saved.\n\n    \"\"\"\n    return self().to_json(path=path)\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.from_json","title":"from_json(path)","text":"

Creates a DatasetRecords object from a disk path to a JSON file. The JSON file should have been created with DatasetRecords.to_json.

Parameters:

Name Type Description Default path str

The path to the file containing the records.

required

Returns:

Name Type Description DatasetRecords List[Record]

The DatasetRecords object created from the disk path.
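A short usage sketch, assuming the file was previously created with DatasetRecords.to_json:

records = dataset.records.from_json(path=\"records.json\")\n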

Source code in src/argilla/records/_dataset_records.py
def from_json(self, path: Union[Path, str]) -> List[Record]:\n    \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n        The JSON file should be defined by `DatasetRecords.to_json`.\n\n    Args:\n        path (str): The path to the file containing the records.\n\n    Returns:\n        DatasetRecords: The DatasetRecords object created from the disk path.\n\n    \"\"\"\n    records = JsonIO._records_from_json(path=path)\n    return self.log(records=records)\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_datasets","title":"to_datasets()","text":"

Export the records to an HFDataset.

Returns:

Type Description HFDataset

The dataset containing the records.
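A short usage sketch (pushing the exported dataset to the Hub is optional and the repo id is hypothetical):

hf_dataset = dataset.records.to_datasets()\nhf_dataset.push_to_hub(\"my-org/my-dataset\")\n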

Source code in src/argilla/records/_dataset_records.py
def to_datasets(self) -> HFDataset:\n    \"\"\"\n    Export the records to a HFDataset.\n\n    Returns:\n        The dataset containing the records.\n\n    \"\"\"\n\n    return self().to_datasets()\n
"},{"location":"reference/argilla/datasets/datasets/","title":"rg.Dataset","text":"

Dataset is a class that represents a collection of records. It is used to store and manage records in Argilla.

"},{"location":"reference/argilla/datasets/datasets/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/datasets/datasets/#creating-a-dataset","title":"Creating a Dataset","text":"

To create a new dataset, you need to define its name and settings. The optional workspace and client parameters let you create the dataset in a specific workspace or on a specific Argilla instance.

dataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n    ),\n)\ndataset.create()\n

For a detailed guide to the dataset creation and publication process, see the Dataset how-to guide.

"},{"location":"reference/argilla/datasets/datasets/#retrieving-an-existing-dataset","title":"Retrieving an existing Dataset","text":"

To retrieve an existing dataset, use client.datasets("my_dataset") instead of instantiating a new Dataset object.

dataset = client.datasets(\"my_dataset\")\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset","title":"Dataset","text":"

Bases: Resource, HubImportExportMixin, DiskImportExportMixin

Class for interacting with Argilla Datasets

Attributes:

Name Type Description name str

Name of the dataset.

records DatasetRecords

The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.

settings Settings

The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.

fields list

The fields of the dataset, for example the rg.TextField of the dataset. Defined in the settings.

questions list

The questions of the dataset defined in the settings. For example, the rg.TextQuestion that you want labelers to answer.

guidelines str

The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.

allow_extra_metadata bool

True if extra metadata is allowed, False otherwise.

Source code in src/argilla/datasets/_resource.py
class Dataset(Resource, HubImportExportMixin, DiskImportExportMixin):\n    \"\"\"Class for interacting with Argilla Datasets\n\n    Attributes:\n        name: Name of the dataset.\n        records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.\n        settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.\n        fields (list): The fields of the dataset, for example the `rg.TextField` of the dataset. Defined in the settings.\n        questions (list): The questions of the dataset defined in the settings. For example, the `rg.TextQuestion` that you want labelers to answer.\n        guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.\n        allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.\n    \"\"\"\n\n    name: str\n    id: Optional[UUID]\n\n    _api: \"DatasetsAPI\"\n    _model: \"DatasetModel\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n        settings: Optional[Settings] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n        Parameters:\n            name (str): Name of the dataset. Replaced by random UUID if not assigned.\n            workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n            settings (Settings): Settings class to be used to configure the dataset.\n            client (Argilla): Instance of Argilla to connect with the server. 
Default is the default client.\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.datasets)\n        if name is None:\n            name = f\"dataset_{uuid4()}\"\n            self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n        self._workspace = workspace\n        self._model = DatasetModel(name=name)\n        self._settings = settings._copy() if settings else Settings(_dataset=self)\n        self._settings.dataset = self\n        self.__records = DatasetRecords(client=self._client, dataset=self, mapping=self._settings.mapping)\n\n    #####################\n    #  Properties       #\n    #####################\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def records(self) -> \"DatasetRecords\":\n        return self.__records\n\n    @property\n    def settings(self) -> Settings:\n        return self._settings\n\n    @settings.setter\n    def settings(self, value: Settings) -> None:\n        settings_copy = value._copy()\n        settings_copy.dataset = self\n        self._settings = settings_copy\n\n    @property\n    def fields(self) -> list:\n        return self.settings.fields\n\n    @property\n    def questions(self) -> list:\n        return self.settings.questions\n\n    @property\n    def guidelines(self) -> str:\n        return self.settings.guidelines\n\n    @guidelines.setter\n    def guidelines(self, value: str) -> None:\n        self.settings.guidelines = value\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.settings.allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool) -> None:\n        self.settings.allow_extra_metadata = value\n\n    @property\n    def schema(self) -> dict:\n        return self.settings.schema\n\n    @property\n    def workspace(self) -> Workspace:\n        self._workspace = self._resolve_workspace()\n        return self._workspace\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self.settings.distribution\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self.settings.distribution = value\n\n    #####################\n    #  Core methods     #\n    #####################\n\n    def get(self) -> \"Dataset\":\n        super().get()\n        self.settings.get()\n        return self\n\n    def create(self) -> \"Dataset\":\n        \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n        Returns:\n            Dataset: The created dataset object.\n        \"\"\"\n        try:\n            super().create()\n        except ForbiddenError as e:\n            settings_url = f\"{self._client.api_url}/user-settings\"\n            user_role = self._client.me.role.value\n            user_name = self._client.me.username\n            workspace_name = self.workspace.name\n            message = f\"\"\"User '{user_name}' is not authorized to create a dataset in workspace '{workspace_name}'\n            with role '{user_role}'. 
Go to {settings_url} to view your role.\"\"\"\n            raise ForbiddenError(message) from e\n        try:\n            return self._publish()\n        except Exception as e:\n            self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n            self._rollback_dataset_creation()\n            raise SettingsError from e\n\n    def update(self) -> \"Dataset\":\n        \"\"\"Updates the dataset on the server with the current settings.\n\n        Returns:\n            Dataset: The updated dataset object.\n        \"\"\"\n        self.settings.update()\n        return self\n\n    def progress(self, with_users_distribution: bool = False) -> dict:\n        \"\"\"Returns the team's progress on the dataset.\n\n        Parameters:\n            with_users_distribution (bool): If True, the progress of the dataset is returned\n                with users distribution. This includes the number of responses made by each user.\n\n        Returns:\n            dict: The team's progress on the dataset.\n\n        An example of a response when `with_users_distribution` is `True`:\n        ```json\n        {\n            \"total\": 100,\n            \"completed\": 50,\n            \"pending\": 50,\n            \"users\": {\n                \"user1\": {\n                   \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n                   \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n                },\n                \"user2\": {\n                   \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n                   \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n                },\n                ...\n        }\n        ```\n\n        \"\"\"\n\n        progress = self._api.get_progress(dataset_id=self._model.id).model_dump()\n\n        if with_users_distribution:\n            users_progress = self._api.list_users_progress(dataset_id=self._model.id)\n            users_distribution = {\n                user.username: {\n                    \"completed\": user.completed.model_dump(),\n                    \"pending\": user.pending.model_dump(),\n                }\n                for user in users_progress\n            }\n\n            progress.update({\"users\": users_distribution})\n\n        return progress\n\n    @classmethod\n    def from_model(cls, model: DatasetModel, client: \"Argilla\") -> \"Dataset\":\n        instance = cls(client=client, workspace=model.workspace_id, name=model.name)\n        instance._model = model\n\n        return instance\n\n    #####################\n    #  Utility methods  #\n    #####################\n\n    def api_model(self) -> DatasetModel:\n        self._model.workspace_id = self.workspace.id\n        return self._model\n\n    def _publish(self) -> \"Dataset\":\n        self._settings.create()\n        self._api.publish(dataset_id=self._model.id)\n\n        return self.get()\n\n    def _resolve_workspace(self) -> Workspace:\n        workspace = self._workspace\n\n        if workspace is None:\n            workspace = self._client.workspaces.default\n            warnings.warn(f\"Workspace not provided. 
Using default workspace: {workspace.name} id: {workspace.id}\")\n        elif isinstance(workspace, str):\n            workspace = self._client.workspaces(workspace)\n            if workspace is None:\n                available_workspace_names = [ws.name for ws in self._client.workspaces]\n                raise NotFoundError(\n                    f\"Workspace with name {workspace} not found. Available workspaces: {available_workspace_names}\"\n                )\n        elif isinstance(workspace, UUID):\n            ws_model = self._client.api.workspaces.get(workspace)\n            workspace = Workspace.from_model(ws_model, client=self._client)\n        elif not isinstance(workspace, Workspace):\n            raise ValueError(f\"Wrong workspace value found {workspace}\")\n\n        return workspace\n\n    def _rollback_dataset_creation(self):\n        if not self._is_published():\n            self.delete()\n\n    def _is_published(self) -> bool:\n        return self._model.status == \"ready\"\n\n    @classmethod\n    def _sanitize_name(cls, name: str):\n        name = name.replace(\" \", \"_\")\n\n        for character in [\"/\", \"\\\\\", \".\", \",\", \";\", \":\", \"-\", \"+\", \"=\"]:\n            name = name.replace(character, \"-\")\n        return name\n\n    def _with_client(self, client: Argilla) -> \"Self\":\n        return super()._with_client(client=client)\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.__init__","title":"__init__(name=None, workspace=None, settings=None, client=None)","text":"

Initializes a new Argilla Dataset object with the given parameters.

Parameters:

Name Type Description Default name str

Name of the dataset. Replaced by a random UUID if not assigned.

None workspace UUID

Workspace of the dataset. Default is the first workspace found on the server.

None settings Settings

Settings class to be used to configure the dataset.

None client Argilla

Instance of Argilla to connect with the server. Default is the default client.

None Source code in src/argilla/datasets/_resource.py
def __init__(\n    self,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n    settings: Optional[Settings] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n    Parameters:\n        name (str): Name of the dataset. Replaced by random UUID if not assigned.\n        workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n        settings (Settings): Settings class to be used to configure the dataset.\n        client (Argilla): Instance of Argilla to connect with the server. Default is the default client.\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.datasets)\n    if name is None:\n        name = f\"dataset_{uuid4()}\"\n        self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n    self._workspace = workspace\n    self._model = DatasetModel(name=name)\n    self._settings = settings._copy() if settings else Settings(_dataset=self)\n    self._settings.dataset = self\n    self.__records = DatasetRecords(client=self._client, dataset=self, mapping=self._settings.mapping)\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.create","title":"create()","text":"

Creates the dataset on the server with the Settings configuration.

Returns:

Name Type Description Dataset Dataset

The created dataset object.

Source code in src/argilla/datasets/_resource.py
def create(self) -> \"Dataset\":\n    \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n    Returns:\n        Dataset: The created dataset object.\n    \"\"\"\n    try:\n        super().create()\n    except ForbiddenError as e:\n        settings_url = f\"{self._client.api_url}/user-settings\"\n        user_role = self._client.me.role.value\n        user_name = self._client.me.username\n        workspace_name = self.workspace.name\n        message = f\"\"\"User '{user_name}' is not authorized to create a dataset in workspace '{workspace_name}'\n        with role '{user_role}'. Go to {settings_url} to view your role.\"\"\"\n        raise ForbiddenError(message) from e\n    try:\n        return self._publish()\n    except Exception as e:\n        self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n        self._rollback_dataset_creation()\n        raise SettingsError from e\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.update","title":"update()","text":"

Updates the dataset on the server with the current settings.

Returns:

Name Type Description Dataset Dataset

The updated dataset object.
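A short usage sketch (the guidelines text is illustrative; the guidelines property setter is shown in the class source above):

dataset.guidelines = \"Classify the sentiment of each text.\"\ndataset.update()\n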

Source code in src/argilla/datasets/_resource.py
def update(self) -> \"Dataset\":\n    \"\"\"Updates the dataset on the server with the current settings.\n\n    Returns:\n        Dataset: The updated dataset object.\n    \"\"\"\n    self.settings.update()\n    return self\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.progress","title":"progress(with_users_distribution=False)","text":"

Returns the team's progress on the dataset.

Parameters:

Name Type Description Default with_users_distribution bool

If True, the progress of the dataset is returned together with its distribution across users. This includes the number of responses made by each user.

False

Returns:

Name Type Description dict dict

The team's progress on the dataset.

An example of a response when with_users_distribution is True:

{\n    \"total\": 100,\n    \"completed\": 50,\n    \"pending\": 50,\n    \"users\": {\n        \"user1\": {\n           \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n           \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n        },\n        \"user2\": {\n           \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n           \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n        },\n        ...\n}\n

Source code in src/argilla/datasets/_resource.py
def progress(self, with_users_distribution: bool = False) -> dict:\n    \"\"\"Returns the team's progress on the dataset.\n\n    Parameters:\n        with_users_distribution (bool): If True, the progress of the dataset is returned\n            with users distribution. This includes the number of responses made by each user.\n\n    Returns:\n        dict: The team's progress on the dataset.\n\n    An example of a response when `with_users_distribution` is `True`:\n    ```json\n    {\n        \"total\": 100,\n        \"completed\": 50,\n        \"pending\": 50,\n        \"users\": {\n            \"user1\": {\n               \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n               \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n            },\n            \"user2\": {\n               \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n               \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n            },\n            ...\n    }\n    ```\n\n    \"\"\"\n\n    progress = self._api.get_progress(dataset_id=self._model.id).model_dump()\n\n    if with_users_distribution:\n        users_progress = self._api.list_users_progress(dataset_id=self._model.id)\n        users_distribution = {\n            user.username: {\n                \"completed\": user.completed.model_dump(),\n                \"pending\": user.pending.model_dump(),\n            }\n            for user in users_progress\n        }\n\n        progress.update({\"users\": users_distribution})\n\n    return progress\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin","title":"DiskImportExportMixin","text":"

Bases: ABC

A mixin for exporting and importing datasets to and from disk.

Source code in src/argilla/datasets/_io/_disk.py
class DiskImportExportMixin(ABC):\n    \"\"\"A mixin for exporting and importing datasets to and from disk.\"\"\"\n\n    _model: DatasetModel\n    _DEFAULT_RECORDS_PATH = \"records.json\"\n    _DEFAULT_CONFIG_REPO_DIR = \".argilla\"\n    _DEFAULT_SETTINGS_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/settings.json\"\n    _DEFAULT_DATASET_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/dataset.json\"\n    _DEFAULT_CONFIGURATION_FILES = [_DEFAULT_SETTINGS_PATH, _DEFAULT_DATASET_PATH]\n\n    def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n        \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n        Parameters:\n            path (str): The path to export the dataset to. Must be an empty directory.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n        dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n        # Export the dataset model, settings and records\n        self._persist_dataset_model(path=dataset_path)\n        self.settings.to_json(path=settings_path)\n        if with_records:\n            self.records.to_json(path=records_path)\n\n        return path\n\n    @classmethod\n    def from_disk(\n        cls: Type[\"Dataset\"],\n        path: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n    ) -> \"Dataset\":\n        \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n        The directory should be defined using the `to_disk` method.\n\n        Parameters:\n            path (str): The path to the directory containing the dataset model, settings and records.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n\n        client = client or Argilla._get_default()\n\n        try:\n            dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n            logging.info(f\"Loading dataset from {dataset_path}\")\n            logging.info(f\"Loading settings from {settings_path}\")\n            logging.info(f\"Loading records from {records_path}\")\n\n            dataset_model = cls._load_dataset_model(path=dataset_path)\n        except (NotADirectoryError, FileNotFoundError) as e:\n            raise ImportDatasetError(f\"Error loading dataset from disk. 
{e}\") from e\n\n        # Get the relevant workspace_id of the incoming dataset\n        if isinstance(workspace, str):\n            workspace = client.workspaces(workspace)\n            if not workspace:\n                raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n        else:\n            warnings.warn(\"Workspace not provided. Using default workspace.\")\n            workspace = client.workspaces.default\n        dataset_model.workspace_id = workspace.id\n\n        if name and (name != dataset_model.name):\n            logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n            dataset_model.name = name\n\n        if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n            warnings.warn(\n                f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n            )\n            dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n                name=dataset_model.name, workspace_id=workspace.id\n            )\n            dataset = cls.from_model(model=dataset_model, client=client)\n        else:\n            # Create a new dataset and load the settings and records\n            if not os.path.exists(settings_path):\n                raise ImportDatasetError(f\"Settings file not found at {settings_path}\")\n\n            dataset = cls.from_model(model=dataset_model, client=client)\n            dataset.settings = Settings.from_json(path=settings_path)\n            dataset.create()\n\n        if os.path.exists(records_path) and with_records:\n            try:\n                dataset.records.from_json(path=records_path)\n            except RecordsIngestionError as e:\n                raise RecordsIngestionError(\n                    message=\"Error importing dataset records from disk. \"\n                    \"Records and datasets settings are not compatible.\"\n                ) from e\n\n        return dataset\n\n    ############################\n    # Utility methods\n    ############################\n\n    def _persist_dataset_model(self, path: Path):\n        \"\"\"Persists the dataset model to disk.\"\"\"\n        if path.exists():\n            raise FileExistsError(f\"Dataset already exists at {path}\")\n        with open(file=path, mode=\"w\") as f:\n            json.dump(self.api_model().model_dump(), f)\n\n    @classmethod\n    def _load_dataset_model(cls, path: Path):\n        \"\"\"Loads the dataset model from disk.\"\"\"\n        if not os.path.exists(path):\n            raise FileNotFoundError(f\"Dataset model not found at {path}\")\n        with open(file=path, mode=\"r\") as f:\n            dataset_model = json.load(f)\n            dataset_model = DatasetModel(**dataset_model)\n        return dataset_model\n\n    @classmethod\n    def _define_child_paths(cls, path: Union[Path, str]) -> Tuple[Path, Path, Path]:\n        path = Path(path)\n        if not path.is_dir():\n            raise NotADirectoryError(f\"Path {path} is not a directory\")\n        main_path = path / cls._DEFAULT_CONFIG_REPO_DIR\n        main_path.mkdir(exist_ok=True)\n        dataset_path = path / cls._DEFAULT_DATASET_PATH\n        settings_path = path / cls._DEFAULT_SETTINGS_PATH\n        records_path = path / cls._DEFAULT_RECORDS_PATH\n        return dataset_path, settings_path, records_path\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin.to_disk","title":"to_disk(path, *, with_records=True)","text":"

Exports the dataset to disk at the given path. The dataset is exported as a directory containing the dataset model, settings and records as JSON files.

Parameters:

Name Type Description Default path str

The path to export the dataset to. Must be an empty directory.

required with_records bool

whether to export the records to disk along with the dataset. Defaults to True.

True Source code in src/argilla/datasets/_io/_disk.py
def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n    \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n    Parameters:\n        path (str): The path to export the dataset to. Must be an empty directory.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n    dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n    logging.info(f\"Loading dataset from {dataset_path}\")\n    logging.info(f\"Loading settings from {settings_path}\")\n    logging.info(f\"Loading records from {records_path}\")\n    # Export the dataset model, settings and records\n    self._persist_dataset_model(path=dataset_path)\n    self.settings.to_json(path=settings_path)\n    if with_records:\n        self.records.to_json(path=records_path)\n\n    return path\n
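A short usage sketch (the directory name is illustrative; per the parameters above, the target should be an empty directory):

import os\n\nos.makedirs(\"my_dataset_export\", exist_ok=True)\ndataset.to_disk(path=\"my_dataset_export\")\n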
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin.from_disk","title":"from_disk(path, *, name=None, workspace=None, client=None, with_records=True) classmethod","text":"

Imports a dataset from disk as a directory containing the dataset model, settings and records. The directory should have been created using the to_disk method.

Parameters:

Name Type Description Default path str

The path to the directory containing the dataset model, settings and records.

required name str

The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

None workspace Union[Workspace, str]

The workspace to import the dataset to. Defaults to None and the default workspace is used.

None client Argilla

The client to use for the import. Defaults to None and the default client is used.

None with_records bool

whether to load the records from disk along with the dataset. Defaults to True.

True Source code in src/argilla/datasets/_io/_disk.py
@classmethod\ndef from_disk(\n    cls: Type[\"Dataset\"],\n    path: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n) -> \"Dataset\":\n    \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n    The directory should be defined using the `to_disk` method.\n\n    Parameters:\n        path (str): The path to the directory containing the dataset model, settings and records.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n\n    client = client or Argilla._get_default()\n\n    try:\n        dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n\n        dataset_model = cls._load_dataset_model(path=dataset_path)\n    except (NotADirectoryError, FileNotFoundError) as e:\n        raise ImportDatasetError(f\"Error loading dataset from disk. {e}\") from e\n\n    # Get the relevant workspace_id of the incoming dataset\n    if isinstance(workspace, str):\n        workspace = client.workspaces(workspace)\n        if not workspace:\n            raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n    else:\n        warnings.warn(\"Workspace not provided. Using default workspace.\")\n        workspace = client.workspaces.default\n    dataset_model.workspace_id = workspace.id\n\n    if name and (name != dataset_model.name):\n        logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n        dataset_model.name = name\n\n    if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n        warnings.warn(\n            f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n        )\n        dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n            name=dataset_model.name, workspace_id=workspace.id\n        )\n        dataset = cls.from_model(model=dataset_model, client=client)\n    else:\n        # Create a new dataset and load the settings and records\n        if not os.path.exists(settings_path):\n            raise ImportDatasetError(f\"Settings file not found at {settings_path}\")\n\n        dataset = cls.from_model(model=dataset_model, client=client)\n        dataset.settings = Settings.from_json(path=settings_path)\n        dataset.create()\n\n    if os.path.exists(records_path) and with_records:\n        try:\n            dataset.records.from_json(path=records_path)\n        except RecordsIngestionError as e:\n            raise RecordsIngestionError(\n                message=\"Error importing dataset records from disk. 
\"\n                \"Records and datasets settings are not compatible.\"\n            ) from e\n\n    return dataset\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin","title":"HubImportExportMixin","text":"

Bases: DiskImportExportMixin

Source code in src/argilla/datasets/_io/_hub.py
class HubImportExportMixin(DiskImportExportMixin):\n    def to_hub(\n        self: \"Dataset\",\n        repo_id: str,\n        *,\n        with_records: bool = True,\n        generate_card: Optional[bool] = True,\n        **kwargs: Any,\n    ) -> None:\n        \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n        Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n                to `True`.\n            **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n\n        Returns:\n            None\n        \"\"\"\n\n        from huggingface_hub import DatasetCardData, HfApi\n\n        from argilla.datasets._io.card import (\n            ArgillaDatasetCard,\n            size_categories_parser,\n        )\n\n        hf_api = HfApi(token=kwargs.get(\"token\"))\n\n        hfds = False\n        if with_records:\n            hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n            hfds.push_to_hub(repo_id, **kwargs)\n        else:\n            hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n        with TemporaryDirectory() as tmpdirname:\n            config_dir = os.path.join(tmpdirname)\n\n            self.to_disk(path=config_dir, with_records=False)\n\n            if generate_card:\n                sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n                sample_huggingface_record = self._get_sample_hf_record(hfds) if with_records else None\n                dataset_size = len(hfds) if with_records else 0\n                card = ArgillaDatasetCard.from_template(\n                    card_data=DatasetCardData(\n                        size_categories=size_categories_parser(dataset_size),\n                        tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                    ),\n                    repo_id=repo_id,\n                    argilla_fields=self.settings.fields,\n                    argilla_questions=self.settings.questions,\n                    argilla_guidelines=self.settings.guidelines or None,\n                    argilla_vectors_settings=self.settings.vectors or None,\n                    argilla_metadata_properties=self.settings.metadata,\n                    argilla_record=sample_argilla_record.to_dict(),\n                    huggingface_record=sample_huggingface_record,\n                )\n                card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n            hf_api.upload_folder(\n                folder_path=tmpdirname,\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n            )\n\n    @classmethod\n    def from_hub(\n        cls: Type[\"Dataset\"],\n        repo_id: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n        settings: Optional[\"Settings\"] = None,\n        split: Optional[str] = None,\n        subset: Optional[str] = None,\n        **kwargs: Any,\n    ) -> \"Dataset\":\n        
\"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            settings: the settings to use to load the `Dataset`. If not provided, the settings will be loaded from the Hugging Face dataset.\n            split: the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.\n            **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n        Returns:\n            A `Dataset` loaded from the Hugging Face Hub.\n        \"\"\"\n        from datasets import load_dataset\n        from huggingface_hub import snapshot_download\n        from argilla import Dataset\n\n        if name is None:\n            name = Dataset._sanitize_name(repo_id)\n\n        if settings is not None:\n            dataset = cls(name=name, settings=settings)\n            dataset.create()\n        else:\n            try:\n                # download configuration files from the hub\n                folder_path = snapshot_download(\n                    repo_id=repo_id,\n                    repo_type=\"dataset\",\n                    allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                    token=kwargs.get(\"token\"),\n                )\n\n                dataset = cls.from_disk(\n                    path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n                )\n            except ImportDatasetError:\n                from argilla import Settings\n\n                settings = Settings.from_hub(repo_id=repo_id, subset=subset)\n                dataset = cls.from_hub(\n                    repo_id=repo_id,\n                    name=name,\n                    workspace=workspace,\n                    client=client,\n                    with_records=with_records,\n                    settings=settings,\n                    split=split,\n                    subset=subset,\n                    **kwargs,\n                )\n                return dataset\n\n        if with_records:\n            try:\n                hf_dataset = load_dataset(\n                    path=repo_id,\n                    split=split,\n                    name=subset,\n                    **kwargs,\n                )  # type: ignore\n                hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, split=split, **kwargs)\n                cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n            except EmptyDatasetError:\n                warnings.warn(\n                    message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                    category=UserWarning,\n                )\n\n        return dataset\n\n    @staticmethod\n    def _log_dataset_records(hf_dataset: \"HFDataset\", dataset: \"Dataset\"):\n        \"\"\"This method extracts the responses from a Hugging Face dataset and returns a list of 
`Record` objects\"\"\"\n        # THIS IS REQUIRED SINCE THE NAME RESTRICTION IN ARGILLA. HUGGING FACE DATASET COLUMNS ARE CASE SENSITIVE\n        # Also, there is a logic with column names including \".responses\" and \".suggestion\" in the name.\n        columns_map = {}\n        for column in hf_dataset.column_names:\n            if \".responses\" in column or \".suggestion\" in column:\n                columns_map[column] = column.lower()\n            else:\n                columns_map[column] = dataset.settings._sanitize_settings_name(column)\n\n        hf_dataset = hf_dataset.rename_columns(columns_map)\n\n        # Identify columns that columns that contain responses\n        responses_columns = [col for col in hf_dataset.column_names if \".responses\" in col]\n        response_questions = defaultdict(dict)\n        user_ids = {}\n        for col in responses_columns:\n            question_name = col.split(\".\")[0]\n            if col.endswith(\"users\"):\n                response_questions[question_name][\"users\"] = hf_dataset[col]\n                user_ids.update({UUID(user_id): UUID(user_id) for user_id in set(sum(hf_dataset[col], []))})\n            elif col.endswith(\"responses\"):\n                response_questions[question_name][\"responses\"] = hf_dataset[col]\n            elif col.endswith(\"status\"):\n                response_questions[question_name][\"status\"] = hf_dataset[col]\n\n        # Check if all user ids are known to this Argilla client\n        known_users_ids = [user.id for user in dataset._client.users]\n        unknown_user_ids = set(user_ids.keys()) - set(known_users_ids)\n        my_user = dataset._client.me\n        if len(unknown_user_ids) > 1:\n            warnings.warn(\n                message=f\"\"\"Found unknown user ids in dataset repo: {unknown_user_ids}.\n                    Assigning first response for each record to current user ({my_user.username}) and discarding the rest.\"\"\"\n            )\n        for unknown_user_id in unknown_user_ids:\n            user_ids[unknown_user_id] = my_user.id\n\n        # Create a mapper to map the Hugging Face dataset to a Record object\n        mapping = {col: col for col in hf_dataset.column_names if \".suggestion\" in col}\n        mapper = IngestedRecordMapper(dataset=dataset, mapping=mapping, user_id=my_user.id)\n\n        # Extract responses and create Record objects\n        records = []\n        hf_dataset = HFDatasetsIO.to_argilla(hf_dataset=hf_dataset)\n        for idx, row in enumerate(hf_dataset):\n            record = mapper(row)\n            for question_name, values in response_questions.items():\n                response_values = values[\"responses\"][idx]\n                response_users = values[\"users\"][idx]\n                response_status = values[\"status\"][idx]\n                for value, user_id, status in zip(response_values, response_users, response_status):\n                    user_id = user_ids[UUID(user_id)]\n                    if user_id in response_users:\n                        continue\n                    response_users[user_id] = True\n                    response = Response(\n                        user_id=user_id,\n                        question_name=question_name,\n                        value=value,\n                        status=status,\n                    )\n                    record.responses.add(response)\n            records.append(record)\n\n        try:\n            dataset.records.log(records=records)\n        except (RecordsIngestionError, 
UnprocessableEntityError) as e:\n            raise SettingsError(\n                message=f\"Failed to load records from Hugging Face dataset. Defined settings do not match dataset schema. Hugging face dataset features: {hf_dataset.features}. Argilla dataset settings : {dataset.settings}\"\n            ) from e\n\n    @staticmethod\n    def _get_dataset_split(hf_dataset: \"HFDataset\", split: Optional[str] = None, **kwargs: Dict) -> \"HFDataset\":\n        \"\"\"Get a single dataset from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a single dataset from.\n\n        Returns:\n            HFDataset: The single dataset.\n        \"\"\"\n\n        if isinstance(hf_dataset, DatasetDict) and split is None:\n            split = next(iter(hf_dataset.keys()))\n            if len(hf_dataset.keys()) > 1:\n                warnings.warn(\n                    message=f\"Multiple splits found in Hugging Face dataset. Using the first split: {split}. \"\n                    f\"Available splits are: {', '.join(hf_dataset.keys())}.\"\n                )\n            hf_dataset = hf_dataset[split]\n        return hf_dataset\n\n    @staticmethod\n    def _get_sample_hf_record(hf_dataset: \"HFDataset\") -> Dict:\n        \"\"\"Get a sample record from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a sample record from.\n\n        Returns:\n            Dict: The sample record.\n        \"\"\"\n\n        if hf_dataset:\n            sample_huggingface_record = {}\n            for key, value in hf_dataset[0].items():\n                try:\n                    json.dumps(value)\n                    sample_huggingface_record[key] = value\n                except TypeError:\n                    if isinstance(value, Image.Image):\n                        sample_huggingface_record[key] = pil_to_data_uri(value)\n                    else:\n                        sample_huggingface_record[key] = \"Record value is not serializable\"\n            return sample_huggingface_record\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin.to_hub","title":"to_hub(repo_id, *, with_records=True, generate_card=True, **kwargs)","text":"

Pushes the Dataset to the Hugging Face Hub. If the dataset has been previously pushed to the Hugging Face Hub, it will be updated instead of creating a new dataset repo.

Parameters:

Name Type Description Default repo_id str

the ID of the Hugging Face Hub repo to push the Dataset to.

required with_records bool

whether to push the records to the Hugging Face Hub along with the dataset. Defaults to True.

True generate_card Optional[bool]

whether to generate a dataset card for the Dataset in the Hugging Face Hub. Defaults to True.

True **kwargs Any

the kwargs to pass to datasets.Dataset.push_to_hub.

{}

Returns:

Type Description None

None

Source code in src/argilla/datasets/_io/_hub.py
def to_hub(\n    self: \"Dataset\",\n    repo_id: str,\n    *,\n    with_records: bool = True,\n    generate_card: Optional[bool] = True,\n    **kwargs: Any,\n) -> None:\n    \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n    Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n            to `True`.\n        **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n\n    Returns:\n        None\n    \"\"\"\n\n    from huggingface_hub import DatasetCardData, HfApi\n\n    from argilla.datasets._io.card import (\n        ArgillaDatasetCard,\n        size_categories_parser,\n    )\n\n    hf_api = HfApi(token=kwargs.get(\"token\"))\n\n    hfds = False\n    if with_records:\n        hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n        hfds.push_to_hub(repo_id, **kwargs)\n    else:\n        hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n    with TemporaryDirectory() as tmpdirname:\n        config_dir = os.path.join(tmpdirname)\n\n        self.to_disk(path=config_dir, with_records=False)\n\n        if generate_card:\n            sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n            sample_huggingface_record = self._get_sample_hf_record(hfds) if with_records else None\n            dataset_size = len(hfds) if with_records else 0\n            card = ArgillaDatasetCard.from_template(\n                card_data=DatasetCardData(\n                    size_categories=size_categories_parser(dataset_size),\n                    tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                ),\n                repo_id=repo_id,\n                argilla_fields=self.settings.fields,\n                argilla_questions=self.settings.questions,\n                argilla_guidelines=self.settings.guidelines or None,\n                argilla_vectors_settings=self.settings.vectors or None,\n                argilla_metadata_properties=self.settings.metadata,\n                argilla_record=sample_argilla_record.to_dict(),\n                huggingface_record=sample_huggingface_record,\n            )\n            card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n        hf_api.upload_folder(\n            folder_path=tmpdirname,\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n        )\n
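A short usage sketch (the repo id and token are hypothetical; extra kwargs such as token are forwarded to the Hugging Face API, as shown in the source above):

dataset.to_hub(\n    repo_id=\"my-org/my-dataset\",\n    with_records=True,\n    token=\"hf_...\",\n)\n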
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin.from_hub","title":"from_hub(repo_id, *, name=None, workspace=None, client=None, with_records=True, settings=None, split=None, subset=None, **kwargs) classmethod","text":"

Loads a Dataset from the Hugging Face Hub.

Parameters:

Name Type Description Default repo_id str

the ID of the Hugging Face Hub repo to load the Dataset from.

required name str

The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

None workspace Union[Workspace, str]

The workspace to import the dataset to. Defaults to None and the default workspace is used.

None client Optional[Argilla]

the client to use to load the Dataset. If not provided, the default client will be used.

None with_records bool

whether to load the records from the Hugging Face dataset. Defaults to True.

True settings Optional[Settings]

the settings to use to load the Dataset. If not provided, the settings will be loaded from the Hugging Face dataset.

None split Optional[str]

the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.

None **kwargs Any

the kwargs to pass to datasets.load_dataset.

{}

Returns:

Type Description Dataset

A Dataset loaded from the Hugging Face Hub.

Source code in src/argilla/datasets/_io/_hub.py
@classmethod\ndef from_hub(\n    cls: Type[\"Dataset\"],\n    repo_id: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n    settings: Optional[\"Settings\"] = None,\n    split: Optional[str] = None,\n    subset: Optional[str] = None,\n    **kwargs: Any,\n) -> \"Dataset\":\n    \"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        settings: the settings to use to load the `Dataset`. If not provided, the settings will be loaded from the Hugging Face dataset.\n        split: the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.\n        **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n    Returns:\n        A `Dataset` loaded from the Hugging Face Hub.\n    \"\"\"\n    from datasets import load_dataset\n    from huggingface_hub import snapshot_download\n    from argilla import Dataset\n\n    if name is None:\n        name = Dataset._sanitize_name(repo_id)\n\n    if settings is not None:\n        dataset = cls(name=name, settings=settings)\n        dataset.create()\n    else:\n        try:\n            # download configuration files from the hub\n            folder_path = snapshot_download(\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                token=kwargs.get(\"token\"),\n            )\n\n            dataset = cls.from_disk(\n                path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n            )\n        except ImportDatasetError:\n            from argilla import Settings\n\n            settings = Settings.from_hub(repo_id=repo_id, subset=subset)\n            dataset = cls.from_hub(\n                repo_id=repo_id,\n                name=name,\n                workspace=workspace,\n                client=client,\n                with_records=with_records,\n                settings=settings,\n                split=split,\n                subset=subset,\n                **kwargs,\n            )\n            return dataset\n\n    if with_records:\n        try:\n            hf_dataset = load_dataset(\n                path=repo_id,\n                split=split,\n                name=subset,\n                **kwargs,\n            )  # type: ignore\n            hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, split=split, **kwargs)\n            cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n        except EmptyDatasetError:\n            warnings.warn(\n                message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                category=UserWarning,\n            )\n\n    return dataset\n
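A short usage sketch (the repo id is hypothetical):

dataset = rg.Dataset.from_hub(\n    repo_id=\"my-org/my-dataset\",\n    split=\"train\",\n    with_records=True,\n)\n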
"},{"location":"reference/argilla/records/metadata/","title":"metadata","text":"

Metadata in Argilla is a dictionary that can be attached to a record. It is used to store additional information about the record that is not part of the record's fields or responses. For example, the source of the record, the date it was created, or any other information that is relevant to the record. Metadata can be added to a record directly or as values within a dictionary.

"},{"location":"reference/argilla/records/metadata/#usage-examples","title":"Usage Examples","text":"

To use metadata within a dataset, you must define metadata properties in the dataset settings. The metadata parameter of Settings accepts a list of metadata properties whose values can be attached to records. The following example demonstrates how to add metadata to a dataset and how to access metadata from a record object:

import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_metadata\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        metadata=[\n            rg.TermsMetadataProperty(name=\"category\", options=[\"A\", \"B\", \"C\"]),\n        ],\n    ),\n)\ndataset.create()\n

Then, you can add records to the dataset with metadata that corresponds to the metadata property defined in the dataset settings:

dataset.records.log(\n    [\n        {\"text\": \"text\", \"label\": \"positive\", \"category\": \"A\"},\n        {\"text\": \"text\", \"label\": \"negative\", \"category\": \"B\"},\n    ]\n)\n
"},{"location":"reference/argilla/records/metadata/#format-per-metadataproperty-type","title":"Format per MetadataProperty type","text":"

Depending on the MetadataProperty type, metadata might need to be formatted in a slightly different way.

For TermsMetadataPropertyFor FloatMetadataPropertyFor IntegerMetadataProperty
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": \"A\"}\n)\n\n# with multiple terms\n\nrg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": [\"A\", \"B\"]}\n)\n
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 2.1}\n)\n
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 42}\n)\n
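
For reference, a sketch of defining matching metadata properties in the dataset settings; the float and integer properties are given distinct illustrative names (price, quantity) here:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"category\", options=[\"A\", \"B\", \"C\"]),\n        rg.FloatMetadataProperty(name=\"price\"),  # accepts float values\n        rg.IntegerMetadataProperty(name=\"quantity\"),  # accepts integer values\n    ],\n)\n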
"},{"location":"reference/argilla/records/records/","title":"rg.Record","text":"

The Record object is used to represent a single record in Argilla. It contains fields, suggestions, responses, metadata, and vectors.

"},{"location":"reference/argilla/records/records/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/records/#creating-a-record","title":"Creating a Record","text":"

To create records, you can use the Record class and pass it to the Dataset.records.log method. The Record class requires a fields parameter, which is a dictionary of field names and values. The field names must match the field names in the dataset's Settings object to be accepted.

dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n        ),\n    ]\n) # (1)\n
  1. The Argilla dataset contains a field named text matching the key here.

To create records with image fields, pass the image to the record object as either a remote URL, a local path to an image file, or a PIL object. The field names must be defined as an rg.ImageField in the dataset's Settings object to be accepted. Images will be stored in the Argilla database and returned as rescaled PIL objects.

dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"image\": \"https://example.com/image.jpg\"}, # (1)\n        ),\n    ]\n)\n
  1. The image can be referenced as either a remote URL, a local file path, or a PIL object.

Note

The image will be stored in the Argilla database and can impact the dataset's storage usage. Images should be less than 5 MB in size, and datasets should contain fewer than 10,000 images.

"},{"location":"reference/argilla/records/records/#accessing-record-attributes","title":"Accessing Record Attributes","text":"

The Record object has suggestions, responses, metadata, and vectors attributes that can be accessed directly whilst iterating over records in a dataset.

for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_metadata=True,\n    with_vectors=True\n    ):\n    print(record.suggestions)\n    print(record.responses)\n    print(record.metadata)\n    print(record.vectors)\n

Record properties can also be updated whilst iterating over records in a dataset.

for record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}\n

For changes to take effect, the user must call the update method on the Dataset object, or pass the updated records to Dataset.records.log. All core record attributes can be updated in this way. Check their respective documentation for more information: Suggestions, Responses, Metadata, Vectors.
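
For example, a minimal sketch of that flow, assuming the dataset's settings define a department terms metadata property:

updated_records = []\nfor record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}  # modify the record in memory\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)  # persist the changes on the server\n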

"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record","title":"Record","text":"

Bases: Resource

The class for interacting with Argilla Records. A Record is a single sample in a dataset. Records receive feedback in the form of responses and suggestions. Records contain fields, metadata, and vectors.

Attributes:

id (Union[str, UUID]): The id of the record.
fields (RecordFields): The fields of the record.
metadata (RecordMetadata): The metadata of the record.
vectors (RecordVectors): The vectors of the record.
responses (RecordResponses): The responses of the record.
suggestions (RecordSuggestions): The suggestions of the record.
dataset (Dataset): The dataset to which the record belongs.
_server_id (UUID): An id for the record generated by the Argilla server.

Source code in src/argilla/records/_resource.py
class Record(Resource):\n    \"\"\"The class for interacting with Argilla Records. A `Record` is a single sample\n    in a dataset. Records receives feedback in the form of responses and suggestions.\n    Records contain fields, metadata, and vectors.\n\n    Attributes:\n        id (Union[str, UUID]): The id of the record.\n        fields (RecordFields): The fields of the record.\n        metadata (RecordMetadata): The metadata of the record.\n        vectors (RecordVectors): The vectors of the record.\n        responses (RecordResponses): The responses of the record.\n        suggestions (RecordSuggestions): The suggestions of the record.\n        dataset (Dataset): The dataset to which the record belongs.\n        _server_id (UUID): An id for the record generated by the Argilla server.\n    \"\"\"\n\n    _model: RecordModel\n\n    def __init__(\n        self,\n        id: Optional[Union[UUID, str]] = None,\n        fields: Optional[Dict[str, FieldValue]] = None,\n        metadata: Optional[Dict[str, MetadataValue]] = None,\n        vectors: Optional[Dict[str, VectorValue]] = None,\n        responses: Optional[List[Response]] = None,\n        suggestions: Optional[List[Suggestion]] = None,\n        _server_id: Optional[UUID] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ):\n        \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n        Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n        and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n        Args:\n            id: An id for the record. If not provided, a UUID will be generated.\n            fields: A dictionary of fields for the record.\n            metadata: A dictionary of metadata for the record.\n            vectors: A dictionary of vectors for the record.\n            responses: A list of Response objects for the record.\n            suggestions: A list of Suggestion objects for the record.\n            _server_id: An id for the record. 
(Read-only and set by the server)\n            _dataset: The dataset object to which the record belongs.\n        \"\"\"\n\n        if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n            raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n        if fields is None and id is None:\n            raise ValueError(\"If fields are not provided, an id must be provided.\")\n        if fields == {} and id is None:\n            raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n        self._dataset = _dataset\n        self._model = RecordModel(external_id=id, id=_server_id)\n        self.__fields = RecordFields(fields=fields, record=self)\n        self.__vectors = RecordVectors(vectors=vectors)\n        self.__metadata = RecordMetadata(metadata=metadata)\n        self.__responses = RecordResponses(responses=responses, record=self)\n        self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n\n    def __repr__(self) -> str:\n        return (\n            f\"Record(id={self.id},status={self.status},fields={self.fields},metadata={self.metadata},\"\n            f\"suggestions={self.suggestions},responses={self.responses})\"\n        )\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def id(self) -> str:\n        return self._model.external_id\n\n    @id.setter\n    def id(self, value: str) -> None:\n        self._model.external_id = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n\n    @property\n    def fields(self) -> \"RecordFields\":\n        return self.__fields\n\n    @property\n    def responses(self) -> \"RecordResponses\":\n        return self.__responses\n\n    @property\n    def suggestions(self) -> \"RecordSuggestions\":\n        return self.__suggestions\n\n    @property\n    def metadata(self) -> \"RecordMetadata\":\n        return self.__metadata\n\n    @property\n    def vectors(self) -> \"RecordVectors\":\n        return self.__vectors\n\n    @property\n    def status(self) -> str:\n        return self._model.status\n\n    @property\n    def _server_id(self) -> Optional[UUID]:\n        return self._model.id\n\n    ############################\n    # Public methods\n    ############################\n\n    def get(self) -> \"Record\":\n        \"\"\"Retrieves the record from the server.\"\"\"\n        model = self._client.api.records.get(self._server_id)\n        instance = self.from_model(model, dataset=self.dataset)\n        self.__dict__ = instance.__dict__\n\n        return self\n\n    def api_model(self) -> RecordModel:\n        return RecordModel(\n            id=self._model.id,\n            external_id=self._model.external_id,\n            fields=self.fields.to_dict(),\n            metadata=self.metadata.api_models(),\n            vectors=self.vectors.api_models(),\n            responses=self.responses.api_models(),\n            suggestions=self.suggestions.api_models(),\n            status=self.status,\n        )\n\n    def serialize(self) -> Dict[str, Any]:\n        \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n        serialized_model = self._model.model_dump()\n        serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n   
     serialized_responses = [response.serialize() for response in self.__responses]\n        serialized_model[\"responses\"] = serialized_responses\n        serialized_model[\"suggestions\"] = serialized_suggestions\n\n        return serialized_model\n\n    def to_dict(self) -> Dict[str, Dict]:\n        \"\"\"Converts a Record object to a dictionary for export.\n        Returns:\n            A dictionary representing the record where the keys are \"fields\",\n            \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n            represented as a key-value pair in the dictionary of the respective key. i.e.\n            `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n        \"\"\"\n        id = str(self.id) if self.id else None\n        server_id = str(self._model.id) if self._model.id else None\n        status = self.status\n        fields = self.fields.to_dict()\n        metadata = self.metadata.to_dict()\n        suggestions = self.suggestions.to_dict()\n        responses = self.responses.to_dict()\n        vectors = self.vectors.to_dict()\n\n        # TODO: Review model attributes when to_dict and serialize methods are unified\n        return {\n            \"id\": id,\n            \"fields\": fields,\n            \"metadata\": metadata,\n            \"suggestions\": suggestions,\n            \"responses\": responses,\n            \"vectors\": vectors,\n            \"status\": status,\n            \"_server_id\": server_id,\n        }\n\n    @classmethod\n    def from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n        \"\"\"Converts a dictionary to a Record object.\n        Args:\n            data: A dictionary representing the record.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        fields = data.get(\"fields\", {})\n        metadata = data.get(\"metadata\", {})\n        suggestions = data.get(\"suggestions\", {})\n        responses = data.get(\"responses\", {})\n        vectors = data.get(\"vectors\", {})\n        record_id = data.get(\"id\", None)\n        _server_id = data.get(\"_server_id\", None)\n\n        suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n        responses = [\n            Response(question_name=question_name, **value)\n            for question_name, _responses in responses.items()\n            for value in _responses\n        ]\n\n        return cls(\n            id=record_id,\n            fields=fields,\n            suggestions=suggestions,\n            responses=responses,\n            vectors=vectors,\n            metadata=metadata,\n            _dataset=dataset,\n            _server_id=_server_id,\n        )\n\n    @classmethod\n    def from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n        \"\"\"Converts a RecordModel object to a Record object.\n        Args:\n            model: A RecordModel object.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        instance = cls(\n            id=model.external_id,\n            fields=model.fields,\n            metadata={meta.name: meta.value for meta in model.metadata},\n            vectors={vector.name: vector.vector_values for vector in model.vectors},\n            _dataset=dataset,\n            responses=[],\n            
suggestions=[],\n        )\n\n        # set private attributes\n        instance._dataset = dataset\n        instance._model = model\n\n        # Responses and suggestions are computed separately based on the record model\n        instance.responses.from_models(model.responses)\n        instance.suggestions.from_models(model.suggestions)\n\n        return instance\n\n    @property\n    def _client(self) -> Optional[\"Argilla\"]:\n        if self._dataset:\n            return self.dataset._client\n\n    @property\n    def _api(self) -> Optional[\"RecordsAPI\"]:\n        if self._client:\n            return self._client.api.records\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.__init__","title":"__init__(id=None, fields=None, metadata=None, vectors=None, responses=None, suggestions=None, _server_id=None, _dataset=None)","text":"

Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id. Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions and passed to Dataset.DatasetRecords.add() as a list of dictionaries.

Parameters:

id (Optional[Union[UUID, str]], default None): An id for the record. If not provided, a UUID will be generated.
fields (Optional[Dict[str, FieldValue]], default None): A dictionary of fields for the record.
metadata (Optional[Dict[str, MetadataValue]], default None): A dictionary of metadata for the record.
vectors (Optional[Dict[str, VectorValue]], default None): A dictionary of vectors for the record.
responses (Optional[List[Response]], default None): A list of Response objects for the record.
suggestions (Optional[List[Suggestion]], default None): A list of Suggestion objects for the record.
_server_id (Optional[UUID], default None): An id for the record. (Read-only and set by the server)
_dataset (Optional[Dataset], default None): The dataset object to which the record belongs.

Source code in src/argilla/records/_resource.py
def __init__(\n    self,\n    id: Optional[Union[UUID, str]] = None,\n    fields: Optional[Dict[str, FieldValue]] = None,\n    metadata: Optional[Dict[str, MetadataValue]] = None,\n    vectors: Optional[Dict[str, VectorValue]] = None,\n    responses: Optional[List[Response]] = None,\n    suggestions: Optional[List[Suggestion]] = None,\n    _server_id: Optional[UUID] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n):\n    \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n    Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n    and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n    Args:\n        id: An id for the record. If not provided, a UUID will be generated.\n        fields: A dictionary of fields for the record.\n        metadata: A dictionary of metadata for the record.\n        vectors: A dictionary of vectors for the record.\n        responses: A list of Response objects for the record.\n        suggestions: A list of Suggestion objects for the record.\n        _server_id: An id for the record. (Read-only and set by the server)\n        _dataset: The dataset object to which the record belongs.\n    \"\"\"\n\n    if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n        raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n    if fields is None and id is None:\n        raise ValueError(\"If fields are not provided, an id must be provided.\")\n    if fields == {} and id is None:\n        raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n    self._dataset = _dataset\n    self._model = RecordModel(external_id=id, id=_server_id)\n    self.__fields = RecordFields(fields=fields, record=self)\n    self.__vectors = RecordVectors(vectors=vectors)\n    self.__metadata = RecordMetadata(metadata=metadata)\n    self.__responses = RecordResponses(responses=responses, record=self)\n    self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n
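
For illustration, a sketch constructing a Record with several of these parameters (the values are hypothetical):

record = rg.Record(\n    id=\"record-001\",  # hypothetical external id\n    fields={\"text\": \"Hello World, how are you?\"},\n    metadata={\"category\": \"A\"},\n    suggestions=[rg.Suggestion(\"label\", \"positive\", score=0.9)],\n)\n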
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.get","title":"get()","text":"

Retrieves the record from the server.

Source code in src/argilla/records/_resource.py
def get(self) -> \"Record\":\n    \"\"\"Retrieves the record from the server.\"\"\"\n    model = self._client.api.records.get(self._server_id)\n    instance = self.from_model(model, dataset=self.dataset)\n    self.__dict__ = instance.__dict__\n\n    return self\n
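
A short usage sketch, assuming a record that was previously logged and therefore has a server id:

record = record.get()  # re-fetches the record state from the Argilla server\n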
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.serialize","title":"serialize()","text":"

Serializes the Record to a dictionary for interaction with the API

Source code in src/argilla/records/_resource.py
def serialize(self) -> Dict[str, Any]:\n    \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n    serialized_model = self._model.model_dump()\n    serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n    serialized_responses = [response.serialize() for response in self.__responses]\n    serialized_model[\"responses\"] = serialized_responses\n    serialized_model[\"suggestions\"] = serialized_suggestions\n\n    return serialized_model\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.to_dict","title":"to_dict()","text":"

Converts a Record object to a dictionary for export. Returns: A dictionary representing the record where the keys are \"fields\", \"metadata\", \"suggestions\", and \"responses\". Each field and question is represented as a key-value pair in the dictionary of the respective key, i.e. `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"}}`.

Source code in src/argilla/records/_resource.py
def to_dict(self) -> Dict[str, Dict]:\n    \"\"\"Converts a Record object to a dictionary for export.\n    Returns:\n        A dictionary representing the record where the keys are \"fields\",\n        \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n        represented as a key-value pair in the dictionary of the respective key. i.e.\n        `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n    \"\"\"\n    id = str(self.id) if self.id else None\n    server_id = str(self._model.id) if self._model.id else None\n    status = self.status\n    fields = self.fields.to_dict()\n    metadata = self.metadata.to_dict()\n    suggestions = self.suggestions.to_dict()\n    responses = self.responses.to_dict()\n    vectors = self.vectors.to_dict()\n\n    # TODO: Review model attributes when to_dict and serialize methods are unified\n    return {\n        \"id\": id,\n        \"fields\": fields,\n        \"metadata\": metadata,\n        \"suggestions\": suggestions,\n        \"responses\": responses,\n        \"vectors\": vectors,\n        \"status\": status,\n        \"_server_id\": server_id,\n    }\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_dict","title":"from_dict(data, dataset=None) classmethod","text":"

Converts a dictionary to a Record object.

Args:
data: A dictionary representing the record.
dataset: The dataset object to which the record belongs.

Returns:
A Record object.

Source code in src/argilla/records/_resource.py
@classmethod\ndef from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n    \"\"\"Converts a dictionary to a Record object.\n    Args:\n        data: A dictionary representing the record.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    fields = data.get(\"fields\", {})\n    metadata = data.get(\"metadata\", {})\n    suggestions = data.get(\"suggestions\", {})\n    responses = data.get(\"responses\", {})\n    vectors = data.get(\"vectors\", {})\n    record_id = data.get(\"id\", None)\n    _server_id = data.get(\"_server_id\", None)\n\n    suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n    responses = [\n        Response(question_name=question_name, **value)\n        for question_name, _responses in responses.items()\n        for value in _responses\n    ]\n\n    return cls(\n        id=record_id,\n        fields=fields,\n        suggestions=suggestions,\n        responses=responses,\n        vectors=vectors,\n        metadata=metadata,\n        _dataset=dataset,\n        _server_id=_server_id,\n    )\n
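
As a round-trip sketch, to_dict and from_dict can be combined to export a record and rebuild it (assuming an existing record object):

record_dict = record.to_dict()  # plain-dict export of the record\nsame_record = rg.Record.from_dict(record_dict)  # rebuild a Record from the dict\n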
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_model","title":"from_model(model, dataset) classmethod","text":"

Converts a RecordModel object to a Record object.

Args:
model: A RecordModel object.
dataset: The dataset object to which the record belongs.

Returns:
A Record object.

Source code in src/argilla/records/_resource.py
@classmethod\ndef from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n    \"\"\"Converts a RecordModel object to a Record object.\n    Args:\n        model: A RecordModel object.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    instance = cls(\n        id=model.external_id,\n        fields=model.fields,\n        metadata={meta.name: meta.value for meta in model.metadata},\n        vectors={vector.name: vector.vector_values for vector in model.vectors},\n        _dataset=dataset,\n        responses=[],\n        suggestions=[],\n    )\n\n    # set private attributes\n    instance._dataset = dataset\n    instance._model = model\n\n    # Responses and suggestions are computed separately based on the record model\n    instance.responses.from_models(model.responses)\n    instance.suggestions.from_models(model.suggestions)\n\n    return instance\n
"},{"location":"reference/argilla/records/responses/","title":"rg.Response","text":"

Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

"},{"location":"reference/argilla/records/responses/#usage-examples","title":"Usage Examples","text":"

Responses can be added to an instantiated Record directly or as a dictionary. The following examples demonstrate how to add responses to a record object and how to access responses from a record object:

Instantiate the Record and related Response objects:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            responses=[rg.Response(\"label\", \"negative\", user_id=user.id)],\n            id=str(uuid.uuid4()),\n        )\n    ]\n)\n

Or, add a response from a dictionary where the key is the question name and the value is the response:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label.response\": \"negative\",\n        },\n    ]\n)\n

Responses can be accessed from a Record via the record's responses attribute, indexed by question name. So if a question is named label, the responses can be accessed as record.responses[\"label\"]. The following example demonstrates how to access responses from a record object:

# iterate over the records and responses\n\nfor record in dataset.records:\n    for response in record.responses[\"label\"]: # (1)\n        print(response.value)\n        print(response.user_id)\n\n# validate that the record has a response\n\nfor record in dataset.records:\n    if record.responses[\"label\"]:\n        for response in record.responses[\"label\"]:\n            print(response.value)\n            print(response.user_id)\n    else:\n        record.responses.add(\n            rg.Response(\"label\", \"positive\", user_id=user.id)\n        ) # (2)\n
  1. Access the responses for the question named label for each record like a dictionary containing a list of Response objects.
  2. Add a response to the record if it does not already have one.

"},{"location":"reference/argilla/records/responses/#format-per-question-type","title":"Format per Question type","text":"

Depending on the Question type, responses might need to be formatted in a slightly different way.

For LabelQuestionFor MultiLabelQuestionFor RankingQuestionFor RatingQuestionFor SpanQuestionFor TextQuestion
rg.Response(\n    question_name=\"label\",\n    value=\"positive\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"rating\",\n    value=4,\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"text\",\n    value=\"value\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response","title":"Response","text":"

Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

Source code in src/argilla/responses.py
class Response:\n    \"\"\"Class for interacting with Argilla Responses of records. Responses are answers to questions by a user.\n    Therefore, a record question can have multiple responses, one for each user that has answered the question.\n    A `Response` is typically created by a user in the UI or consumed from a data source as a label,\n    unlike a `Suggestion` which is typically created by a model prediction.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        user_id: UUID,\n        status: Optional[Union[ResponseStatus, str]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n        Attributes:\n            question_name (str): The name of the question that the suggestion is for.\n            value (str): The value of the response\n            user_id (UUID): The id of the user that submits the response\n            status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n        \"\"\"\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n        if user_id is None:\n            raise ValueError(\"user_id is required\")\n\n        if isinstance(status, str):\n            status = ResponseStatus(status)\n\n        self._record = _record\n        self.question_name = question_name\n        self.value = value\n        self.user_id = user_id\n        self.status = status\n\n    @property\n    def record(self) -> \"Record\":\n        \"\"\"Returns the record associated with the response\"\"\"\n        return self._record\n\n    @record.setter\n    def record(self, record: \"Record\") -> None:\n        \"\"\"Sets the record associated with the response\"\"\"\n        self._record = record\n\n    def serialize(self) -> dict[str, Any]:\n        \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n            but can be used for data wrangling or manual export.\n\n        Returns:\n            dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n        Examples:\n\n        ```python\n        response = rg.Response(\"label\", \"negative\", user_id=user.id)\n        response.serialize()\n        ```\n        \"\"\"\n        return {\n            \"question_name\": self.question_name,\n            \"value\": self.value,\n            \"user_id\": self.user_id,\n            \"status\": self.status,\n        }\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.record","title":"record: Record property writable","text":"

Returns the record associated with the response

"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.__init__","title":"__init__(question_name, value, user_id, status=None, _record=None)","text":"

Initializes a Response for a Record with a user_id and value

Attributes:

question_name (str): The name of the question that the response is for.
value (str): The value of the response.
user_id (UUID): The id of the user that submitted the response.
status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", or \"discarded\".

Source code in src/argilla/responses.py
def __init__(\n    self,\n    question_name: str,\n    value: Any,\n    user_id: UUID,\n    status: Optional[Union[ResponseStatus, str]] = None,\n    _record: Optional[\"Record\"] = None,\n) -> None:\n    \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the response\n        user_id (UUID): The id of the user that submits the response\n        status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n    \"\"\"\n\n    if question_name is None:\n        raise ValueError(\"question_name is required\")\n    if value is None:\n        raise ValueError(\"value is required\")\n    if user_id is None:\n        raise ValueError(\"user_id is required\")\n\n    if isinstance(status, str):\n        status = ResponseStatus(status)\n\n    self._record = _record\n    self.question_name = question_name\n    self.value = value\n    self.user_id = user_id\n    self.status = status\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.serialize","title":"serialize()","text":"

Serializes the Response to a dictionary. This is principally used for sending the response to the API, but can be used for data wrangling or manual export.

Returns:

dict[str, Any]: The serialized response as a dictionary with keys question_name, value, and user_id.

Examples:

response = rg.Response(\"label\", \"negative\", user_id=user.id)\nresponse.serialize()\n
Source code in src/argilla/responses.py
def serialize(self) -> dict[str, Any]:\n    \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n        but can be used for data wrangling or manual export.\n\n    Returns:\n        dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n    Examples:\n\n    ```python\n    response = rg.Response(\"label\", \"negative\", user_id=user.id)\n    response.serialize()\n    ```\n    \"\"\"\n    return {\n        \"question_name\": self.question_name,\n        \"value\": self.value,\n        \"user_id\": self.user_id,\n        \"status\": self.status,\n    }\n
"},{"location":"reference/argilla/records/suggestions/","title":"rg.Suggestion","text":"

Class for interacting with Argilla Suggestions of records. Suggestions are typically created by a model prediction, unlike a Response which is typically created by a user in the UI or consumed from a data source as a label.

"},{"location":"reference/argilla/records/suggestions/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/suggestions/#adding-records-with-suggestions","title":"Adding records with suggestions","text":"

Suggestions can be added to a record directly or via a dictionary structure. The following examples demonstrate how to add suggestions to a record object and how to access suggestions from a record object:

Add a suggestion from a dictionary where the key is the question name and the value is the suggestion:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label\": \"negative\", # this will be used as a suggestion\n        },\n    ]\n)\n

If your data contains scores for suggestions, you can add them as well via the mapping parameter. The following example demonstrates how to add a suggestion with a score to a record object:

dataset.records.log(\n    [\n        {\n            \"prompt\": \"Hello World, how are you?\",\n            \"label\": \"negative\",  # this will be used as a suggestion\n            \"score\": 0.9,  # this will be used as the suggestion score\n            \"model\": \"model_name\",  # this will be used as the suggestion agent\n        },\n    ],\n    mapping={\n        \"score\": \"label.suggestion.score\",\n        \"model\": \"label.suggestion.agent\",\n    },  # `label` is the question name in the dataset settings\n)\n

Or, instantiate the Record and related Suggestion objects directly, like this:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            suggestions=[rg.Suggestion(\"label\", \"negative\", score=0.9, agent=\"model_name\")],\n        )\n    ]\n)\n
"},{"location":"reference/argilla/records/suggestions/#iterating-over-records-with-suggestions","title":"Iterating over records with suggestions","text":"

Just like responses, suggestions can be accessed from a Record via the record's suggestions attribute, indexed by question name. So if a question is named label, the suggestion can be accessed as record.suggestions[\"label\"]. The following example demonstrates how to access suggestions from a record object:

for record in dataset.records(with_suggestions=True):\n    print(record.suggestions[\"label\"].value)\n

We can also add suggestions to records as we iterate over them using the add method:

for record in dataset.records(with_suggestions=True):\n    if not record.suggestions[\"label\"]: # (1)\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"positive\", score=0.9, agent=\"model_name\")\n        ) # (2)\n
  1. Validate that the record has a suggestion
  2. Add a suggestion to the record if it does not already have one
"},{"location":"reference/argilla/records/suggestions/#format-per-question-type","title":"Format per Question type","text":"

Depending on the Question type, suggestions might need to be formatted in a slightly different way.

For LabelQuestionFor MultiLabelQuestionFor RankingQuestionFor RatingQuestionFor SpanQuestionFor TextQuestion
rg.Suggestion(\n    question_name=\"label\",\n    value=\"positive\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"rating\",\n    value=4,\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"text\",\n    value=\"value\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion","title":"Suggestion","text":"

Bases: Resource

Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records. Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.

Attributes:

question_name (str): The name of the question that the suggestion is for.
value (str): The value of the suggestion.
score (float): The score of the suggestion. For example, the probability of the model prediction.
agent (str): The agent that created the suggestion. For example, the model name.
type (str): The type of suggestion, either 'model' or 'human'.

Source code in src/argilla/suggestions.py
class Suggestion(Resource):\n    \"\"\"Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records.\n    Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the suggestion\n        score (float): The score of the suggestion. For example, the probability of the model prediction.\n        agent (str): The agent that created the suggestion. For example, the model name.\n        type (str): The type of suggestion, either 'model' or 'human'.\n    \"\"\"\n\n    _model: SuggestionModel\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        score: Union[float, List[float], None] = None,\n        agent: Optional[str] = None,\n        type: Optional[Literal[\"model\", \"human\"]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        super().__init__()\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n\n        self._record = _record\n        self._model = SuggestionModel(\n            question_name=question_name,\n            value=value,\n            type=type,\n            score=score,\n            agent=agent,\n        )\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def value(self) -> Any:\n        \"\"\"The value of the suggestion.\"\"\"\n        return self._model.value\n\n    @property\n    def question_name(self) -> Optional[str]:\n        \"\"\"The name of the question that the suggestion is for.\"\"\"\n        return self._model.question_name\n\n    @question_name.setter\n    def question_name(self, value: str) -> None:\n        self._model.question_name = value\n\n    @property\n    def type(self) -> Optional[Literal[\"model\", \"human\"]]:\n        \"\"\"The type of suggestion, either 'model' or 'human'.\"\"\"\n        return self._model.type\n\n    @property\n    def score(self) -> Optional[Union[float, List[float]]]:\n        \"\"\"The score of the suggestion.\"\"\"\n        return self._model.score\n\n    @score.setter\n    def score(self, value: float) -> None:\n        self._model.score = value\n\n    @property\n    def agent(self) -> Optional[str]:\n        \"\"\"The agent that created the suggestion.\"\"\"\n        return self._model.agent\n\n    @agent.setter\n    def agent(self, value: str) -> None:\n        self._model.agent = value\n\n    @property\n    def record(self) -> Optional[\"Record\"]:\n        \"\"\"The record that the suggestion is for.\"\"\"\n        return self._record\n\n    @record.setter\n    def record(self, value: \"Record\") -> None:\n        self._record = value\n\n    @classmethod\n    def from_model(cls, model: SuggestionModel, record: \"Record\") -> \"Suggestion\":\n        question = record.dataset.settings.questions[model.question_id]\n        model.question_name = question.name\n        model.value = cls.__from_model_value(model.value, question)\n\n        instance = cls(question.name, model.value, _record=record)\n        instance._model = model\n\n        return instance\n\n    def api_model(self) -> SuggestionModel:\n        if self.record is None or self.record.dataset is None:\n            return self._model\n\n        question = 
self.record.dataset.settings.questions[self.question_name]\n        if question:\n            return SuggestionModel(\n                value=self.__to_model_value(self.value, question),\n                question_name=None if not question else question.name,\n                question_id=None if not question else question.id,\n                type=self._model.type,\n                score=self._model.score,\n                agent=self._model.agent,\n                id=self._model.id,\n            )\n        else:\n            raise RecordSuggestionsError(\n                f\"Record suggestion is invalid because question with name={self.question_name} does not exist in the dataset ({self.record.dataset.name}). Available questions are: {list(self.record.dataset.settings.questions._properties_by_name.keys())}\"\n            )\n\n    @classmethod\n    def __to_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_to_model_value(value)\n        return value\n\n    @classmethod\n    def __from_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_from_model_value(value)\n        return value\n\n    @classmethod\n    def __ranking_from_model_value(cls, value: List[Dict[str, Any]]) -> List[str]:\n        return [v[\"value\"] for v in value]\n\n    @classmethod\n    def __ranking_to_model_value(cls, value: List[str]) -> List[Dict[str, str]]:\n        return [{\"value\": str(v)} for v in value]\n
"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.value","title":"value: Any property","text":"

The value of the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.question_name","title":"question_name: Optional[str] property writable","text":"

The name of the question that the suggestion is for.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.type","title":"type: Optional[Literal['model', 'human']] property","text":"

The type of suggestion, either 'model' or 'human'.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.score","title":"score: Optional[Union[float, List[float]]] property writable","text":"

The score of the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.agent","title":"agent: Optional[str] property writable","text":"

The agent that created the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.record","title":"record: Optional[Record] property writable","text":"

The record that the suggestion is for.

"},{"location":"reference/argilla/records/vectors/","title":"rg.Vector","text":"

A vector is a numerical representation of a Record field or attribute, usually the record's text. Vectors can be used to search for similar records via the UI or SDK. Vectors can be added to a record directly or as a dictionary with a key that matches the rg.VectorField name.

"},{"location":"reference/argilla/records/vectors/#usage-examples","title":"Usage Examples","text":"

To use vectors within a dataset, you must define vector fields in the dataset settings. The vectors parameter of Settings accepts a list of vector fields that can be attached to records. The following example demonstrates how to add vectors to a dataset and how to access vectors from a record object:

import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_vectors\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        vectors=[\n            rg.VectorField(name=\"vector_name\"),\n        ],\n    ),\n)\ndataset.create()\n

Then, you can add records to the dataset with vectors that correspond to the vector field defined in the dataset settings:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"vector_name\": [0.1, 0.2, 0.3]\n        }\n    ]\n)\n

Vectors can be passed using a mapping, where the key is the key in the data source and the value is the name in the dataset's setting's rg.VectorField object. For example, the following code adds a record with a vector using a mapping:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"x\": [0.1, 0.2, 0.3]\n        }\n    ],\n    mapping={\"x\": \"vector_name\"}\n)\n

Or, vectors can be instantiated and added to a record directly, like this:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            vectors=[rg.Vector(\"vector_name\", [0.1, 0.2, 0.3])],\n        )\n    ]\n)\n
"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector","title":"Vector","text":"

Bases: Resource

Class for interacting with Argilla Vectors. Vectors are typically used to represent embeddings or features of records. The Vector class is used to deliver vectors to the Argilla server.

Attributes:

name (str): The name of the vector.
values (list[float]): The values of the vector.

Source code in src/argilla/vectors.py
class Vector(Resource):\n    \"\"\" Class for interacting with Argilla Vectors. Vectors are typically used to represent \\\n        embeddings or features of records. The `Vector` class is used to deliver vectors to the Argilla server.\n\n    Attributes:\n        name (str): The name of the vector.\n        values (list[float]): The values of the vector.\n    \"\"\"\n\n    _model: VectorModel\n\n    def __init__(\n        self,\n        name: str,\n        values: list[float],\n    ) -> None:\n        \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n        Parameters:\n            name (str): Name of the vector\n            values (list[float]): List of float values\n\n        \"\"\"\n        self._model = VectorModel(\n            name=name,\n            vector_values=values,\n        )\n\n    def __repr__(self) -> str:\n        return repr(f\"{self.__class__.__name__}({self._model})\")\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def name(self) -> str:\n        \"\"\"Name of the vector that corresponds to the name of the vector in the dataset's `Settings`\"\"\"\n        return self._model.name\n\n    @property\n    def values(self) -> list[float]:\n        \"\"\"List of float values that represent the vector.\"\"\"\n        return self._model.vector_values\n\n    ##############################\n    # Methods\n    ##############################\n\n    @classmethod\n    def from_model(cls, model: VectorModel) -> \"Vector\":\n        return cls(\n            name=model.name,\n            values=model.vector_values,\n        )\n\n    def serialize(self) -> dict[str, Any]:\n        dumped_model = self._model.model_dump()\n        name = dumped_model.pop(\"name\")\n        values = dumped_model.pop(\"vector_values\")\n        return {name: values}\n
"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.name","title":"name: str property","text":"

Name of the vector that corresponds to the name of the vector in the dataset's Settings

"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.values","title":"values: list[float] property","text":"

List of float values that represent the vector.
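
A small sketch of these properties (the name and values are illustrative):

vector = rg.Vector(name=\"embedding\", values=[0.1, 0.2, 0.3])\nprint(vector.name)    # \"embedding\"\nprint(vector.values)  # [0.1, 0.2, 0.3]\n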

"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.__init__","title":"__init__(name, values)","text":"

Initializes a Vector with a name and values that can be used to search in the Argilla UI.

Parameters:

name (str, required): Name of the vector.
values (list[float], required): List of float values.

Source code in src/argilla/vectors.py
def __init__(\n    self,\n    name: str,\n    values: list[float],\n) -> None:\n    \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n    Parameters:\n        name (str): Name of the vector\n        values (list[float]): List of float values\n\n    \"\"\"\n    self._model = VectorModel(\n        name=name,\n        vector_values=values,\n    )\n
"},{"location":"reference/argilla/settings/fields/","title":"Fields","text":"

Fields in Argilla define the content of a record that will be reviewed by a user.

"},{"location":"reference/argilla/settings/fields/#usage-examples","title":"Usage Examples","text":"

To define a field, instantiate the different field classes and pass them to the fields parameter of the Settings class.

text_field = rg.TextField(name=\"text\")\nmarkdown_field = rg.TextField(name=\"text\", use_markdown=True)\nimage_field = rg.ImageField(name=\"image\")\n

The fields parameter of the Settings class can accept a list of fields, like this:

settings = rg.Settings(\n    fields=[\n        text_field,\n        markdown_field,\n        image_field,\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n)\n\ndata = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

To add records with values for fields, refer to the rg.Dataset.records documentation.

"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField","title":"TextField","text":"

Bases: AbstractField

Text field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class TextField(AbstractField):\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Text field for use in Argilla `Dataset` `Settings`\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=TextFieldSettings(use_markdown=use_markdown),\n            _client=client,\n        )\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField.__init__","title":"__init__(name, title=None, use_markdown=False, required=True, description=None, client=None)","text":"

Text field for use in Argilla Dataset Settings.

Parameters:

name (str): The name of the field.
title (Optional[str], optional): The title of the field. Defaults to None.
use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.
required (bool): Whether the field is required. Defaults to True.
description (Optional[str], optional): The description of the field. Defaults to None.

Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=TextFieldSettings(use_markdown=use_markdown),\n        _client=client,\n    )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ImageField","title":"ImageField","text":"

Bases: AbstractField

Image field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class ImageField(AbstractField):\n    \"\"\"Image field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        required: Optional[bool] = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Text field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            required (Optional[bool], optional): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=ImageFieldSettings(),\n            _client=_client,\n        )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ImageField.__init__","title":"__init__(name, title=None, required=True, description=None, _client=None)","text":"

Image field for use in Argilla Dataset Settings

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the field | required |
| title | Optional[str] | The title of the field. Defaults to None. | None |
| required | Optional[bool] | Whether the field is required. Defaults to True. | True |
| description | Optional[str] | The description of the field. Defaults to None. | None |

Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    required: Optional[bool] = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Image field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        required (Optional[bool], optional): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=ImageFieldSettings(),\n        _client=_client,\n    )\n
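For illustration, a minimal sketch of defining an image field; the field name is an arbitrary example, and records are assumed to supply the image as a URL or data URI:

```python
import argilla as rg

# An image field; each record provides the image as a URL or data URI.
image_field = rg.ImageField(
    name="image",
    title="Image",
)
```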
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ChatField","title":"ChatField","text":"

Bases: AbstractField

Chat field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class ChatField(AbstractField):\n    \"\"\"Chat field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = True,\n        required: bool = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Chat field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to True.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=ChatFieldSettings(use_markdown=use_markdown),\n            _client=_client,\n        )\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ChatField.__init__","title":"__init__(name, title=None, use_markdown=True, required=True, description=None, _client=None)","text":"

Chat field for use in Argilla Dataset Settings

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the field | required |
| title | Optional[str] | The title of the field. Defaults to None. | None |
| use_markdown | Optional[bool] | Whether to use markdown. Defaults to True. | True |
| required | bool | Whether the field is required. Defaults to True. | True |
| description | Optional[str] | The description of the field. Defaults to None. | None |

Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = True,\n    required: bool = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Chat field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to True.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=ChatFieldSettings(use_markdown=use_markdown),\n        _client=_client,\n    )\n
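For illustration, a minimal sketch of defining a chat field; the field name is an arbitrary example, and records are assumed to supply the conversation as a list of role/content messages:

```python
import argilla as rg

# A chat field; markdown rendering is enabled by default.
# Records are assumed to supply values shaped like:
# [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]
chat_field = rg.ChatField(
    name="conversation",
)
```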
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.CustomField","title":"CustomField","text":"

Bases: AbstractField

Custom field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class CustomField(AbstractField):\n    \"\"\"Custom field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        template: Optional[str] = \"\",\n        advanced_mode: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Custom field for use in Argilla `Dataset` `Settings` for working with custom HTML and CSS templates.\n        By default argilla will use a brackets syntax engine for the templates, which converts\n        `{{ field.key }}` to the values of record's field's object.\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            template (str): The template of the field (HTML and CSS)\n            advanced_mode (Optional[bool], optional): Whether to use advanced mode. Defaults to False.\n                Deactivate the brackets syntax engine and use custom javascript to render the field.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n        template = self._load_template(template)\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=CustomFieldSettings(template=template, advanced_mode=advanced_mode),\n            _client=_client,\n        )\n\n    @property\n    def template(self) -> Optional[str]:\n        return self._model.settings.template\n\n    @template.setter\n    def template(self, value: str) -> None:\n        self._model.settings.template = self._load_template(value)\n\n    @property\n    def advanced_mode(self) -> Optional[bool]:\n        return self._model.settings.advanced_mode\n\n    @advanced_mode.setter\n    def advanced_mode(self, value: bool) -> None:\n        self._model.settings.advanced_mode = value\n\n    def validate(self):\n        if self.template is None or self.template.strip() == \"\":\n            raise SettingsError(\"A valid template is required for CustomField\")\n\n    @classmethod\n    def _load_template(cls, template: str) -> str:\n        if template.endswith(\".html\") and os.path.exists(template):\n            with open(template, \"r\") as f:\n                return f.read()\n        if template.startswith(\"http\") or template.startswith(\"https\"):\n            return requests.get(template).text\n        if isinstance(template, str):\n            return template\n        raise ArgillaError(\n            \"Invalid template. Please provide 1: a valid path or URL to a HTML file. 2: a valid HTML string.\"\n        )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.CustomField.__init__","title":"__init__(name, title=None, template='', advanced_mode=False, required=True, description=None, _client=None)","text":"

Custom field for use in Argilla Dataset Settings for working with custom HTML and CSS templates. By default, Argilla uses a brackets syntax engine for templates, which converts {{ field.key }} into the values of the record's field object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the field | required |
| title | Optional[str] | The title of the field. Defaults to None. | None |
| template | str | The template of the field (HTML and CSS) | '' |
| advanced_mode | Optional[bool] | Whether to use advanced mode. Defaults to False. Deactivates the brackets syntax engine so that custom JavaScript can render the field. | False |
| required | bool | Whether the field is required. Defaults to True. | True |
| description | Optional[str] | The description of the field. Defaults to None. | None |

Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    template: Optional[str] = \"\",\n    advanced_mode: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Custom field for use in Argilla `Dataset` `Settings` for working with custom HTML and CSS templates.\n    By default argilla will use a brackets syntax engine for the templates, which converts\n    `{{ field.key }}` to the values of record's field's object.\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        template (str): The template of the field (HTML and CSS)\n        advanced_mode (Optional[bool], optional): Whether to use advanced mode. Defaults to False.\n            Deactivate the brackets syntax engine and use custom javascript to render the field.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n    template = self._load_template(template)\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=CustomFieldSettings(template=template, advanced_mode=advanced_mode),\n        _client=_client,\n    )\n
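For illustration, a minimal sketch of a custom field that relies on the default brackets syntax engine; the template and field names are arbitrary examples:

```python
import argilla as rg

# The brackets syntax engine replaces {{ record.fields.text }} with the
# value of the record's "text" field when the template is rendered.
custom_field = rg.CustomField(
    name="rendered_text",
    template="<div>{{ record.fields.text }}</div>",
)
```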
"},{"location":"reference/argilla/settings/metadata_property/","title":"Metadata Properties","text":"

Metadata properties are used to define metadata fields in a dataset. Metadata fields are used to store additional information about the records in the dataset. For example, the category of a record, the price of a product, or any other information that is relevant to the record.

"},{"location":"reference/argilla/settings/metadata_property/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/metadata_property/#defining-metadata-property-for-a-dataset","title":"Defining Metadata Property for a dataset","text":"

We define metadata properties via type-specific classes. The following example demonstrates how to define metadata properties as float, integer, or terms metadata properties and pass them to the Settings.

TermsMetadataProperty is used to define a metadata field with a list of options: for example, a color field with the options red, blue, and green. FloatMetadataProperty and IntegerMetadataProperty are used to define metadata fields with numeric values: for example, a price field with a minimum value of 0.0 and a maximum value of 100.0.

metadata_field = rg.TermsMetadataProperty(\n    name=\"color\",\n    options=[\"red\", \"blue\", \"green\"],\n    title=\"Color\",\n)\n\nfloat_metadata_field = rg.FloatMetadataProperty(\n    name=\"price\",\n    min=0.0,\n    max=100.0,\n    title=\"Price\",\n)\n\nint_metadata_field = rg.IntegerMetadataProperty(\n    name=\"quantity\",\n    min=0,\n    max=100,\n    title=\"Quantity\",\n)\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n    metadata=[\n        metadata_field,\n        float_metadata_field,\n        int_metadata_field,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

To add records with metadata, refer to the rg.Metadata class documentation.
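As a brief sketch of what that looks like, assuming the dataset defined above has been created on the server, a record's metadata keys match the metadata property names:

```python
import argilla as rg

# Assumes `dataset` is the dataset created above, with the "color",
# "price", and "quantity" metadata properties defined in its settings.
dataset.records.log(
    [
        rg.Record(
            fields={"text": "A bright red jacket."},
            metadata={"color": "red", "price": 49.99, "quantity": 10},
        )
    ]
)
```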

"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty","title":"FloatMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class FloatMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[float] = None,\n        max: Optional[float] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with float settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n\n        super().__init__(client=client)\n\n        try:\n            settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.float,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"FloatMetadataProperty\":\n        instance = FloatMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with float settings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the metadata field | required |
| min | Optional[float] | The minimum valid value. If none is provided, it will be computed from the values provided in the records. | None |
| max | Optional[float] | The maximum valid value. If none is provided, it will be computed from the values provided in the records. | None |
| title | Optional[str] | The title of the metadata to be shown in the UI | None |
| visible_for_annotators | Optional[bool] | Whether the metadata field is visible for annotators. | True |

Raises:

| Type | Description |
| --- | --- |
| MetadataError | If an error occurs while defining metadata settings. |

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    min: Optional[float] = None,\n    max: Optional[float] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with float settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n\n    super().__init__(client=client)\n\n    try:\n        settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.float,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty","title":"IntegerMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class IntegerMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[int] = None,\n        max: Optional[int] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with integer settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.integer,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"IntegerMetadataProperty\":\n        instance = IntegerMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with integer settings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the metadata field | required |
| min | Optional[int] | The minimum valid value. If none is provided, it will be computed from the values provided in the records. | None |
| max | Optional[int] | The maximum valid value. If none is provided, it will be computed from the values provided in the records. | None |
| title | Optional[str] | The title of the metadata to be shown in the UI | None |
| visible_for_annotators | Optional[bool] | Whether the metadata field is visible for annotators. | True |

Raises:

| Type | Description |
| --- | --- |
| MetadataError | If an error occurs while defining metadata settings. |

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    min: Optional[int] = None,\n    max: Optional[int] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with integer settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.integer,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty","title":"TermsMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class TermsMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        options: Optional[List[str]] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with terms settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            options (Optional[List[str]]): The list of options\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.terms,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def options(self) -> Optional[List[str]]:\n        return self._model.settings.values\n\n    @options.setter\n    def options(self, value: list[str]) -> None:\n        self._model.settings.values = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"TermsMetadataProperty\":\n        instance = TermsMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty.__init__","title":"__init__(name, options=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with terms settings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the metadata field | required |
| options | Optional[List[str]] | The list of options | None |
| title | Optional[str] | The title of the metadata to be shown in the UI | None |
| visible_for_annotators | Optional[bool] | Whether the metadata field is visible for annotators. | True |

Raises:

| Type | Description |
| --- | --- |
| MetadataError | If an error occurs while defining metadata settings. |

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    options: Optional[List[str]] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with terms settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        options (Optional[List[str]]): The list of options\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.terms,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
"},{"location":"reference/argilla/settings/questions/","title":"Questions","text":"

Argilla uses questions to gather feedback. Questions are answered by users or models.

"},{"location":"reference/argilla/settings/questions/#usage-examples","title":"Usage Examples","text":"

To define a label question, for example, instantiate the LabelQuestion class and pass it to the Settings class.

label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n    ],\n)\n

Questions can be freely combined based on the type of feedback you want to collect. For example, you can combine a label question with a text question to collect both a label and a text response.

label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\ntext_question = rg.TextQuestion(name=\"response\")\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n        text_question,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

To add records with responses to questions, refer to the rg.Response class documentation.
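As a brief sketch, assuming the dataset defined above exists on the server and user_id identifies the responding user, a response is attached to a record like this:

```python
import argilla as rg

# Assumes `dataset` uses the settings above and `user_id` is a valid user id.
record = rg.Record(
    fields={"text": "The new update is fantastic."},
    responses=[
        rg.Response(question_name="label", value="positive", user_id=user_id),
    ],
)
dataset.records.log([record])
```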

"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion","title":"LabelQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class LabelQuestion(QuestionPropertyBase):\n    _model: LabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        visible_labels: Optional[int] = None,\n    ) -> None:\n        \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n            question is a question where the user can select one label from \\\n            a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n        \"\"\"\n        self._model = LabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=LabelQuestionSettings(\n                options=self._render_values_as_options(labels), visible_options=visible_labels\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: LabelQuestionModel) -> \"LabelQuestion\":\n        instance = cls(name=model.name, labels=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"LabelQuestion\":\n        model = LabelQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    ##############################\n    # Public properties\n    ##############################\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion.__init__","title":"__init__(name, labels, title=None, description=None, required=True, visible_labels=None)","text":"

Define a new label question for Settings of a Dataset. A label question is a question where the user can select one label from a list of available labels.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the question to be used as a reference. | required |
| labels | Union[List[str], Dict[str, str]] | The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. | required |
| title | Optional[str] | The title of the question to be shown in the UI. | None |
| description | Optional[str] | The description of the question to be shown in the UI. | None |
| required | bool | If the question is required for a record to be valid. At least one question must be required. | True |
| visible_labels | Optional[int] | The number of visible labels for the question to be shown in the UI. Setting it to None shows all options. | None |

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    visible_labels: Optional[int] = None,\n) -> None:\n    \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n        question is a question where the user can select one label from \\\n        a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n    \"\"\"\n    self._model = LabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=LabelQuestionSettings(\n            options=self._render_values_as_options(labels), visible_options=visible_labels\n        ),\n    )\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion","title":"MultiLabelQuestion","text":"

Bases: LabelQuestion

Source code in src/argilla/settings/_question.py
class MultiLabelQuestion(LabelQuestion):\n    _model: MultiLabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        visible_labels: Optional[int] = None,\n        labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n            multi-label question is a question where the user can select multiple \\\n            labels from a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n                Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n                The score of the suggestion will be taken into account for ordering if available.\n            title (Optional[str]: The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = MultiLabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=MultiLabelQuestionSettings(\n                options=self._render_values_as_options(labels),\n                visible_options=visible_labels,\n                options_order=labels_order,\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: MultiLabelQuestionModel) -> \"MultiLabelQuestion\":\n        instance = cls(\n            name=model.name,\n            labels=cls._render_options_as_values(model.settings.options),\n            labels_order=model.settings.options_order,\n        )\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"MultiLabelQuestion\":\n        model = MultiLabelQuestionModel(**data)\n        return cls.from_model(model=model)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion.__init__","title":"__init__(name, labels, visible_labels=None, labels_order='natural', title=None, description=None, required=True)","text":"

Create a new multi-label question for Settings of a Dataset. A multi-label question is a question where the user can select multiple labels from a list of available labels.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the question to be used as a reference. | required |
| labels | Union[List[str], Dict[str, str]] | The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. | required |
| visible_labels | Optional[int] | The number of visible labels for the question to be shown in the UI. Setting it to None shows all options. | None |
| labels_order | Literal['natural', 'suggestion'] | The order of the labels in the UI. Can be either "natural" (order in which they were specified) or "suggestion" (order prioritizing those associated with a suggestion). The score of the suggestion will be taken into account for ordering if available. | 'natural' |
| title | Optional[str] | The title of the question to be shown in the UI. | None |
| description | Optional[str] | The description of the question to be shown in the UI. | None |
| required | bool | If the question is required for a record to be valid. At least one question must be required. | True |

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    visible_labels: Optional[int] = None,\n    labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n        multi-label question is a question where the user can select multiple \\\n        labels from a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n        labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n            Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n            The score of the suggestion will be taken into account for ordering if available.\n        title (Optional[str]: The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = MultiLabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=MultiLabelQuestionSettings(\n            options=self._render_values_as_options(labels),\n            visible_options=visible_labels,\n            options_order=labels_order,\n        ),\n    )\n
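For illustration, a minimal sketch of a multi-label question; the name and labels are arbitrary examples:

```python
import argilla as rg

# Annotators may select several of these topic labels per record.
multi_label_question = rg.MultiLabelQuestion(
    name="topics",
    labels=["sports", "politics", "technology"],
    labels_order="natural",
)
```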
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion","title":"RankingQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class RankingQuestion(QuestionPropertyBase):\n    _model: RankingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n            is a question where the user can rank a list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RankingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RankingQuestionModel) -> \"RankingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RankingQuestion\":\n        model = RankingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.settings.options = self._render_values_as_options(values)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

Create a new ranking question for Settings of a Dataset. A ranking question is a question where the user can rank a list of options.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the question to be used as a reference. | required |
| values | Union[List[str], Dict[str, str]] | The list of options to be ranked, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. | required |
| title | Optional[str] | The title of the question to be shown in the UI. | None |
| description | Optional[str] | The description of the question to be shown in the UI. | None |
| required | bool | If the question is required for a record to be valid. At least one question must be required. | True |

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    values: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n        is a question where the user can rank a list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RankingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
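For illustration, a minimal sketch of a ranking question using the dictionary form of values; the names are arbitrary examples:

```python
import argilla as rg

# Annotators rank the candidate answers; the keys are the stored values,
# the dictionary values are the names displayed in the UI.
ranking_question = rg.RankingQuestion(
    name="preference",
    values={"answer_a": "Answer A", "answer_b": "Answer B"},
)
```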
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion","title":"TextQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class TextQuestion(QuestionPropertyBase):\n    _model: TextQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        use_markdown: bool = False,\n    ) -> None:\n        \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n            is a question where the user can input text.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        \"\"\"\n        self._model = TextQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=TextQuestionSettings(use_markdown=use_markdown),\n        )\n\n    @classmethod\n    def from_model(cls, model: TextQuestionModel) -> \"TextQuestion\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextQuestion\":\n        model = TextQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> bool:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, use_markdown: bool) -> None:\n        self._model.settings.use_markdown = use_markdown\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion.__init__","title":"__init__(name, title=None, description=None, required=True, use_markdown=False)","text":"

Create a new text question for Settings of a Dataset. A text question is a question where the user can input text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the question to be used as a reference. | required |
| title | Optional[str] | The title of the question to be shown in the UI. | None |
| description | Optional[str] | The description of the question to be shown in the UI. | None |
| required | bool | If the question is required for a record to be valid. At least one question must be required. | True |
| use_markdown | Optional[bool] | Whether to render the markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs. | False |

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    use_markdown: bool = False,\n) -> None:\n    \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n        is a question where the user can input text.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n    \"\"\"\n    self._model = TextQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=TextQuestionSettings(use_markdown=use_markdown),\n    )\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion","title":"RatingQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class RatingQuestion(QuestionPropertyBase):\n    _model: RatingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: List[int],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n            is a question where the user can select a value from a sequential list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RatingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            values=values,\n            settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RatingQuestionModel) -> \"RatingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RatingQuestion\":\n        model = RatingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[int]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.values = self._render_values_as_options(values)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

Create a new rating question for Settings of a Dataset. A rating question is a question where the user can select a value from a sequential list of options.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the question to be used as a reference. | required |
| values | List[int] | The list of selectable values. It should be defined in the range [0, 10]. | required |
| title | Optional[str] | The title of the question to be shown in the UI. | None |
| description | Optional[str] | The description of the question to be shown in the UI. | None |
| required | bool | If the question is required for a record to be valid. At least one question must be required. | True |

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    values: List[int],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n        is a question where the user can select a value from a sequential list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RatingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        values=values,\n        settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
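For illustration, a minimal sketch of a five-point rating question; the name is an arbitrary example, and the values must lie within [0, 10]:

```python
import argilla as rg

# A sequential five-point quality scale.
rating_question = rg.RatingQuestion(
    name="quality",
    values=[1, 2, 3, 4, 5],
)
```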
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion","title":"SpanQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class SpanQuestion(QuestionPropertyBase):\n    _model: SpanQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        field: str,\n        labels: Union[List[str], Dict[str, str]],\n        allow_overlapping: bool = False,\n        visible_labels: Optional[int] = None,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ):\n        \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n            is a question where the user can select a section of text within a text field \\\n            and assign it a label.\n\n            Parameters:\n                name (str): The name of the question to be used as a reference.\n                field (str): The name of the text field where the span question will be applied.\n                labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                    dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n                allow_overlapping (bool): This value specifies whether overlapped spans are allowed or not.\n                visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                    Setting it to None show all options.\n                title (Optional[str]:) The title of the question to be shown in the UI.\n                description (Optional[str]): The description of the question to be shown in the UI.\n                required (bool): If the question is required for a record to be valid. At least one question must be required.\n            \"\"\"\n        self._model = SpanQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=SpanQuestionSettings(\n                field=field,\n                allow_overlapping=allow_overlapping,\n                visible_options=visible_labels,\n                options=self._render_values_as_options(labels),\n            ),\n        )\n\n    @property\n    def name(self):\n        return self._model.name\n\n    @property\n    def field(self):\n        return self._model.settings.field\n\n    @field.setter\n    def field(self, field: str):\n        self._model.settings.field = field\n\n    @property\n    def allow_overlapping(self):\n        return self._model.settings.allow_overlapping\n\n    @allow_overlapping.setter\n    def allow_overlapping(self, allow_overlapping: bool):\n        self._model.settings.allow_overlapping = allow_overlapping\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @classmethod\n    def from_model(cls, model: SpanQuestionModel) -> \"SpanQuestion\":\n        instance = cls(\n            name=model.name,\n            field=model.settings.field,\n            labels=cls._render_options_as_values(model.settings.options),\n        )\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"SpanQuestion\":\n        model = SpanQuestionModel(**data)\n        return cls.from_model(model=model)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion.__init__","title":"__init__(name, field, labels, allow_overlapping=False, visible_labels=None, title=None, description=None, required=True)","text":"

Create a new span question for Settings of a Dataset. A span question is a question where the user can select a section of text within a text field and assign it a label.

Parameters:

  • name (str, required): The name of the question to be used as a reference.
  • field (str, required): The name of the text field where the span question will be applied.
  • labels (Union[List[str], Dict[str, str]], required): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.
  • allow_overlapping (bool, default: False): Whether overlapping spans are allowed.
  • visible_labels (Optional[int], default: None): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.
  • title (Optional[str], default: None): The title of the question to be shown in the UI.
  • description (Optional[str], default: None): The description of the question to be shown in the UI.
  • required (bool, default: True): Whether the question is required for a record to be valid. At least one question must be required.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    field: str,\n    labels: Union[List[str], Dict[str, str]],\n    allow_overlapping: bool = False,\n    visible_labels: Optional[int] = None,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n):\n    \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n        is a question where the user can select a section of text within a text field \\\n        and assign it a label.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            field (str): The name of the text field where the span question will be applied.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            allow_overlapping (bool): This value specifies whether overlapped spans are allowed or not.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n    self._model = SpanQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=SpanQuestionSettings(\n            field=field,\n            allow_overlapping=allow_overlapping,\n            visible_options=visible_labels,\n            options=self._render_values_as_options(labels),\n        ),\n    )\n
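For orientation, a minimal usage sketch is shown below; the question name, field name, and labels are illustrative assumptions, not part of the reference:

span_question = rg.SpanQuestion(\n    name=\"entities\",  # illustrative question name\n    field=\"text\",  # the text field the spans are selected from\n    labels=[\"PERSON\", \"ORG\", \"LOC\"],  # illustrative labels\n    allow_overlapping=False,\n)\n\nsettings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[span_question],\n)\n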
"},{"location":"reference/argilla/settings/settings/","title":"rg.Settings","text":"

rg.Settings is used to define the settings of an Argilla Dataset. The settings can be used to configure the behavior of the dataset, such as the fields, questions, guidelines, metadata, and vectors. The Settings class is passed to the Dataset class and used to create the dataset on the server. Once created, the settings of a dataset cannot be changed.

"},{"location":"reference/argilla/settings/settings/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/settings/#creating-a-new-dataset-with-settings","title":"Creating a new dataset with settings","text":"

To create a new dataset with settings, instantiate the Settings class and pass it to the Dataset class.

import argilla as rg\n\nsettings = rg.Settings(\n    guidelines=\"Select the sentiment of the prompt.\",\n    fields=[rg.TextField(name=\"prompt\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n)\n\ndataset = rg.Dataset(name=\"sentiment_analysis\", settings=settings)\n\n# Create the dataset on the server\ndataset.create()\n

To define the settings for fields, questions, metadata, vectors, or distribution, refer to the rg.TextField, rg.LabelQuestion, rg.TermsMetadataProperty, rg.VectorField, and rg.TaskDistribution class documentation.
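As a quick orientation, here is a hedged sketch combining several of these setting types in one Settings object; the metadata property name, vector dimensions, and distribution value are illustrative assumptions:

settings = rg.Settings(\n    guidelines=\"Select the sentiment of the prompt.\",\n    fields=[rg.TextField(name=\"prompt\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n    metadata=[rg.TermsMetadataProperty(name=\"source\")],  # illustrative metadata property\n    vectors=[rg.VectorField(name=\"prompt_embedding\", dimensions=384)],  # illustrative vector field\n    distribution=rg.TaskDistribution(min_submitted=2),  # require two submitted responses per record\n)\n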

"},{"location":"reference/argilla/settings/settings/#creating-settings-using-built-in-templates","title":"Creating settings using built in templates","text":"

Argilla provides built-in templates for creating settings for common dataset types. To use a template, use the class methods of the Settings class. There are three built-in templates available for classification, ranking, and rating tasks. Template settings also include default guidelines and mappings.

"},{"location":"reference/argilla/settings/settings/#classification-task","title":"Classification Task","text":"

You can define a classification task using the rg.Settings.for_classification class method. This will create a dataset with a text field and a label question. You can select the field type using the field_type parameter, choosing between text and image; see the sketch after the example below.

settings = rg.Settings.for_classification(labels=[\"positive\", \"negative\"])\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Select a label for the document.\",\n    fields=[rg.TextField(name=\"text\")],  # or an image field, depending on field_type\n    questions=[LabelQuestion(name=\"label\", labels=labels)],\n    mapping={\"input\": \"text\", \"output\": \"label\", \"document\": \"text\"},\n)\n
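If your inputs are images rather than text, a sketch of the same call, assuming field_type accepts "image" as described above:

settings = rg.Settings.for_classification(\n    labels=[\"positive\", \"negative\"],\n    field_type=\"image\",  # assumed value, per the field_type description above\n)\n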
"},{"location":"reference/argilla/settings/settings/#ranking-task","title":"Ranking Task","text":"

You can define a ranking task using the rg.Settings.for_ranking class method. This will create a dataset with a text field and a ranking question.

settings = rg.Settings.for_ranking()\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Rank the responses.\",\n    fields=[\n        rg.TextField(name=\"instruction\"),\n        rg.TextField(name=\"response1\"),\n        rg.TextField(name=\"response2\"),\n    ],\n    questions=[RankingQuestion(name=\"ranking\", values=[\"response1\", \"response2\"])],\n    mapping={\n        \"input\": \"instruction\",\n        \"prompt\": \"instruction\",\n        \"chosen\": \"response1\",\n        \"rejected\": \"response2\",\n    },\n)\n
"},{"location":"reference/argilla/settings/settings/#rating-task","title":"Rating Task","text":"

You can define a rating task using the rg.Settings.for_rating class method. This will create a dataset with a text field and a rating question.

settings = rg.Settings.for_rating()\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Rate the response.\",\n    fields=[\n        rg.TextField(name=\"instruction\"),\n        rg.TextField(name=\"response\"),\n    ],\n    questions=[RatingQuestion(name=\"rating\", values=[1, 2, 3, 4, 5])],\n    mapping={\n        \"input\": \"instruction\",\n        \"prompt\": \"instruction\",\n        \"output\": \"response\",\n        \"score\": \"rating\",\n    },\n)\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings","title":"Settings","text":"

Bases: DefaultSettingsMixin, Resource

Settings class for Argilla Datasets.

This class is used to define the representation of a Dataset within the UI.

Source code in src/argilla/settings/_resource.py
class Settings(DefaultSettingsMixin, Resource):\n    \"\"\"\n    Settings class for Argilla Datasets.\n\n    This class is used to define the representation of a Dataset within the UI.\n    \"\"\"\n\n    def __init__(\n        self,\n        fields: Optional[List[Field]] = None,\n        questions: Optional[List[QuestionType]] = None,\n        vectors: Optional[List[VectorField]] = None,\n        metadata: Optional[List[MetadataType]] = None,\n        guidelines: Optional[str] = None,\n        allow_extra_metadata: bool = False,\n        distribution: Optional[TaskDistribution] = None,\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ) -> None:\n        \"\"\"\n        Args:\n            fields (List[Field]): A list of Field objects that represent the fields in the Dataset.\n            questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n                A list of Question objects that represent the questions in the Dataset.\n            vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n            metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n            guidelines (str): A string containing the guidelines for the Dataset.\n            allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n                Dataset. Defaults to False.\n            distribution (TaskDistribution): The annotation task distribution configuration.\n                Default to DEFAULT_TASK_DISTRIBUTION\n            mapping (Dict[str, Union[str, Sequence[str]]]): A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.\n        \"\"\"\n        super().__init__(client=_dataset._client if _dataset else None)\n\n        self._dataset = _dataset\n        self._distribution = distribution\n        self._mapping = mapping\n        self.__guidelines = self.__process_guidelines(guidelines)\n        self.__allow_extra_metadata = allow_extra_metadata\n\n        self.__questions = QuestionsProperties(self, questions)\n        self.__fields = SettingsProperties(self, fields)\n        self.__vectors = SettingsProperties(self, vectors)\n        self.__metadata = SettingsProperties(self, metadata)\n\n    #####################\n    # Properties        #\n    #####################\n\n    @property\n    def fields(self) -> \"SettingsProperties\":\n        return self.__fields\n\n    @fields.setter\n    def fields(self, fields: List[Field]):\n        self.__fields = SettingsProperties(self, fields)\n\n    @property\n    def questions(self) -> \"SettingsProperties\":\n        return self.__questions\n\n    @questions.setter\n    def questions(self, questions: List[QuestionType]):\n        self.__questions = QuestionsProperties(self, questions)\n\n    @property\n    def vectors(self) -> \"SettingsProperties\":\n        return self.__vectors\n\n    @vectors.setter\n    def vectors(self, vectors: List[VectorField]):\n        self.__vectors = SettingsProperties(self, vectors)\n\n    @property\n    def metadata(self) -> \"SettingsProperties\":\n        return self.__metadata\n\n    @metadata.setter\n    def metadata(self, metadata: List[MetadataType]):\n        self.__metadata = SettingsProperties(self, metadata)\n\n    @property\n    def guidelines(self) -> str:\n        return self.__guidelines\n\n    
@guidelines.setter\n    def guidelines(self, guidelines: str):\n        self.__guidelines = self.__process_guidelines(guidelines)\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.__allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool):\n        self.__allow_extra_metadata = value\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self._distribution or TaskDistribution.default()\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self._distribution = value\n\n    @property\n    def mapping(self) -> Dict[str, Union[str, Sequence[str]]]:\n        return self._mapping\n\n    @mapping.setter\n    def mapping(self, value: Dict[str, Union[str, Sequence[str]]]):\n        self._mapping = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, dataset: \"Dataset\"):\n        self._dataset = dataset\n        self._client = dataset._client\n\n    @cached_property\n    def schema(self) -> dict:\n        schema_dict = {}\n\n        for field in self.fields:\n            schema_dict[field.name] = field\n\n        for question in self.questions:\n            schema_dict[question.name] = question\n\n        for vector in self.vectors:\n            schema_dict[vector.name] = vector\n\n        for metadata in self.metadata:\n            schema_dict[metadata.name] = metadata\n\n        return schema_dict\n\n    @cached_property\n    def schema_by_id(self) -> Dict[UUID, Union[Field, QuestionType, MetadataType, VectorField]]:\n        return {v.id: v for v in self.schema.values()}\n\n    def validate(self) -> None:\n        self._validate_empty_settings()\n        self._validate_duplicate_names()\n\n        for field in self.fields:\n            field.validate()\n\n    #####################\n    #  Public methods   #\n    #####################\n\n    def get(self) -> \"Settings\":\n        self.fields = self._fetch_fields()\n        self.questions = self._fetch_questions()\n        self.vectors = self._fetch_vectors()\n        self.metadata = self._fetch_metadata()\n        self.__fetch_dataset_related_attributes()\n\n        self._update_last_api_call()\n        return self\n\n    def create(self) -> \"Settings\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.create()\n        self.__questions.create()\n        self.__vectors.create()\n        self.__metadata.create()\n\n        self._update_last_api_call()\n        return self\n\n    def update(self) -> \"Resource\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.update()\n        self.__vectors.update()\n        self.__metadata.update()\n        # self.questions.update()\n\n        self._update_last_api_call()\n        return self\n\n    def serialize(self):\n        try:\n            return {\n                \"guidelines\": self.guidelines,\n                \"questions\": self.__questions.serialize(),\n                \"fields\": self.__fields.serialize(),\n                \"vectors\": self.vectors.serialize(),\n                \"metadata\": self.metadata.serialize(),\n                \"allow_extra_metadata\": self.allow_extra_metadata,\n                \"distribution\": self.distribution.to_dict(),\n                \"mapping\": self.mapping,\n            }\n        except Exception as e:\n            raise 
ArgillaSerializeError(f\"Failed to serialize the settings. {e.__class__.__name__}\") from e\n\n    def to_json(self, path: Union[Path, str]) -> None:\n        \"\"\"Save the settings to a file on disk\n\n        Parameters:\n            path (str): The path to save the settings to\n        \"\"\"\n        if not isinstance(path, Path):\n            path = Path(path)\n        if path.exists():\n            raise FileExistsError(f\"File {path} already exists\")\n        with open(path, \"w\") as file:\n            json.dump(self.serialize(), file)\n\n    @classmethod\n    def from_json(cls, path: Union[Path, str]) -> \"Settings\":\n        \"\"\"Load the settings from a file on disk\"\"\"\n\n        with open(path, \"r\") as file:\n            settings_dict = json.load(file)\n            return cls._from_dict(settings_dict)\n\n    @classmethod\n    def from_hub(\n        cls,\n        repo_id: str,\n        subset: Optional[str] = None,\n        feature_mapping: Optional[Dict[str, Literal[\"question\", \"field\", \"metadata\"]]] = None,\n        **kwargs,\n    ) -> \"Settings\":\n        \"\"\"Load the settings from the Hub\n\n        Parameters:\n            repo_id (str): The ID of the repository to load the settings from on the Hub.\n            subset (Optional[str]): The subset of the repository to load the settings from.\n            feature_mapping (Dict[str, Literal[\"question\", \"field\", \"metadata\"]]): A dictionary that maps incoming column names to Argilla attributes.\n        \"\"\"\n\n        settings = build_settings_from_repo_id(repo_id=repo_id, feature_mapping=feature_mapping, subset=subset)\n        return settings\n\n    def __eq__(self, other: \"Settings\") -> bool:\n        return self.serialize() == other.serialize()  # TODO: Create proper __eq__ methods for fields and questions\n\n    #####################\n    #  Repr Methods     #\n    #####################\n\n    def __repr__(self) -> str:\n        return (\n            f\"Settings(guidelines={self.guidelines}, allow_extra_metadata={self.allow_extra_metadata}, \"\n            f\"distribution={self.distribution}, \"\n            f\"fields={self.fields}, questions={self.questions}, vectors={self.vectors}, metadata={self.metadata})\"\n        )\n\n    #####################\n    #  Private methods  #\n    #####################\n\n    @classmethod\n    def _from_dict(cls, settings_dict: dict) -> \"Settings\":\n        fields = settings_dict.get(\"fields\", [])\n        vectors = settings_dict.get(\"vectors\", [])\n        metadata = settings_dict.get(\"metadata\", [])\n        guidelines = settings_dict.get(\"guidelines\")\n        distribution = settings_dict.get(\"distribution\")\n        allow_extra_metadata = settings_dict.get(\"allow_extra_metadata\")\n        mapping = settings_dict.get(\"mapping\")\n\n        questions = [question_from_dict(question) for question in settings_dict.get(\"questions\", [])]\n        fields = [_field_from_dict(field) for field in fields]\n        vectors = [VectorField.from_dict(vector) for vector in vectors]\n        metadata = [MetadataField.from_dict(metadata) for metadata in metadata]\n\n        if distribution:\n            distribution = TaskDistribution.from_dict(distribution)\n\n        if mapping:\n            mapping = cls._validate_mapping(mapping)\n\n        return cls(\n            questions=questions,\n            fields=fields,\n            vectors=vectors,\n            metadata=metadata,\n            guidelines=guidelines,\n            
allow_extra_metadata=allow_extra_metadata,\n            distribution=distribution,\n            mapping=mapping,\n        )\n\n    def _copy(self) -> \"Settings\":\n        instance = self.__class__._from_dict(self.serialize())\n        return instance\n\n    def _fetch_fields(self) -> List[Field]:\n        models = self._client.api.fields.list(dataset_id=self._dataset.id)\n        return [_field_from_model(model) for model in models]\n\n    def _fetch_questions(self) -> List[QuestionType]:\n        models = self._client.api.questions.list(dataset_id=self._dataset.id)\n        return [question_from_model(model) for model in models]\n\n    def _fetch_vectors(self) -> List[VectorField]:\n        models = self.dataset._client.api.vectors.list(self.dataset.id)\n        return [VectorField.from_model(model) for model in models]\n\n    def _fetch_metadata(self) -> List[MetadataType]:\n        models = self._client.api.metadata.list(dataset_id=self._dataset.id)\n        return [MetadataField.from_model(model) for model in models]\n\n    def __fetch_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = self._client.api.datasets.get(self._dataset.id)\n\n        self.guidelines = dataset_model.guidelines\n        self.allow_extra_metadata = dataset_model.allow_extra_metadata\n\n        if dataset_model.distribution:\n            self.distribution = TaskDistribution.from_model(dataset_model.distribution)\n\n    def _update_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = DatasetModel(\n            id=self._dataset.id,\n            name=self._dataset.name,\n            guidelines=self.guidelines,\n            allow_extra_metadata=self.allow_extra_metadata,\n            distribution=self.distribution._api_model(),\n        )\n        self._client.api.datasets.update(dataset_model)\n\n    def _validate_empty_settings(self):\n        if not all([self.fields, self.questions]):\n            message = \"Fields and questions are required\"\n            raise SettingsError(message=message)\n\n    def _validate_duplicate_names(self) -> None:\n        dataset_properties_by_name = {}\n\n        for properties in [self.fields, self.questions, self.vectors, self.metadata]:\n            for property in properties:\n                if property.name in dataset_properties_by_name:\n                    raise SettingsError(\n                        f\"names of dataset settings must be unique, \"\n                        f\"but the name {property.name!r} is used by {type(property).__name__!r} and {type(dataset_properties_by_name[property.name]).__name__!r} \"\n                    )\n  
              dataset_properties_by_name[property.name] = property\n\n    @classmethod\n    def _validate_mapping(cls, mapping: Dict[str, Union[str, Sequence[str]]]) -> dict:\n        validate_mapping = {}\n        for key, value in mapping.items():\n            if isinstance(value, str):\n                validate_mapping[key] = value\n            elif isinstance(value, list) or isinstance(value, tuple):\n                validate_mapping[key] = tuple(value)\n            else:\n                raise SettingsError(f\"Invalid mapping value for key {key!r}: {value}\")\n\n        return validate_mapping\n\n    @classmethod\n    def _sanitize_settings_name(cls, name: str) -> str:\n        \"\"\"Sanitize the name for the settings\"\"\"\n\n        for char in [\" \", \":\", \".\", \"&\", \"?\", \"!\"]:\n            name = name.replace(char, \"_\")\n\n        return name.lower()\n\n    def __process_guidelines(self, guidelines):\n        if guidelines is None:\n            return guidelines\n\n        if not isinstance(guidelines, str):\n            raise SettingsError(\"Guidelines must be a string or a path to a file\")\n\n        if os.path.exists(guidelines):\n            with open(guidelines, \"r\") as file:\n                return file.read()\n\n        return guidelines\n\n    @classmethod\n    def _is_valid_name(cls, name: str) -> bool:\n        \"\"\"Check if the name is valid\"\"\"\n        return bool(re.match(r\"^(?=.*[a-z0-9])[a-z0-9_-]+$\", name))\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.__init__","title":"__init__(fields=None, questions=None, vectors=None, metadata=None, guidelines=None, allow_extra_metadata=False, distribution=None, mapping=None, _dataset=None)","text":"

Parameters:

  • fields (List[Field], default: None): A list of Field objects that represent the fields in the Dataset.
  • questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]], default: None): A list of Question objects that represent the questions in the Dataset.
  • vectors (List[VectorField], default: None): A list of VectorField objects that represent the vectors in the Dataset.
  • metadata (List[MetadataField], default: None): A list of MetadataField objects that represent the metadata in the Dataset.
  • guidelines (str, default: None): A string containing the guidelines for the Dataset.
  • allow_extra_metadata (bool, default: False): A boolean that determines whether or not extra metadata is allowed in the Dataset.
  • distribution (TaskDistribution, default: None): The annotation task distribution configuration. Defaults to DEFAULT_TASK_DISTRIBUTION.
  • mapping (Dict[str, Union[str, Sequence[str]]], default: None): A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.

Source code in src/argilla/settings/_resource.py
def __init__(\n    self,\n    fields: Optional[List[Field]] = None,\n    questions: Optional[List[QuestionType]] = None,\n    vectors: Optional[List[VectorField]] = None,\n    metadata: Optional[List[MetadataType]] = None,\n    guidelines: Optional[str] = None,\n    allow_extra_metadata: bool = False,\n    distribution: Optional[TaskDistribution] = None,\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n) -> None:\n    \"\"\"\n    Args:\n        fields (List[Field]): A list of Field objects that represent the fields in the Dataset.\n        questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n            A list of Question objects that represent the questions in the Dataset.\n        vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n        metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n        guidelines (str): A string containing the guidelines for the Dataset.\n        allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n            Dataset. Defaults to False.\n        distribution (TaskDistribution): The annotation task distribution configuration.\n            Default to DEFAULT_TASK_DISTRIBUTION\n        mapping (Dict[str, Union[str, Sequence[str]]]): A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.\n    \"\"\"\n    super().__init__(client=_dataset._client if _dataset else None)\n\n    self._dataset = _dataset\n    self._distribution = distribution\n    self._mapping = mapping\n    self.__guidelines = self.__process_guidelines(guidelines)\n    self.__allow_extra_metadata = allow_extra_metadata\n\n    self.__questions = QuestionsProperties(self, questions)\n    self.__fields = SettingsProperties(self, fields)\n    self.__vectors = SettingsProperties(self, vectors)\n    self.__metadata = SettingsProperties(self, metadata)\n
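As a sketch of the mapping parameter described above, mapping incoming column names to dataset attributes; the incoming names "body" and "annotation" are illustrative assumptions:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    mapping={\"body\": \"text\", \"annotation\": \"label\"},  # incoming name -> dataset attribute\n)\n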
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.to_json","title":"to_json(path)","text":"

Save the settings to a file on disk

Parameters:

  • path (str, required): The path to save the settings to.

Source code in src/argilla/settings/_resource.py
def to_json(self, path: Union[Path, str]) -> None:\n    \"\"\"Save the settings to a file on disk\n\n    Parameters:\n        path (str): The path to save the settings to\n    \"\"\"\n    if not isinstance(path, Path):\n        path = Path(path)\n    if path.exists():\n        raise FileExistsError(f\"File {path} already exists\")\n    with open(path, \"w\") as file:\n        json.dump(self.serialize(), file)\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_json","title":"from_json(path) classmethod","text":"

Load the settings from a file on disk

Source code in src/argilla/settings/_resource.py
@classmethod\ndef from_json(cls, path: Union[Path, str]) -> \"Settings\":\n    \"\"\"Load the settings from a file on disk\"\"\"\n\n    with open(path, \"r\") as file:\n        settings_dict = json.load(file)\n        return cls._from_dict(settings_dict)\n
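A minimal round-trip sketch using the two methods above; the file name is an illustrative assumption:

settings.to_json(\"settings.json\")  # raises FileExistsError if the file already exists\nloaded_settings = rg.Settings.from_json(\"settings.json\")\n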
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_hub","title":"from_hub(repo_id, subset=None, feature_mapping=None, **kwargs) classmethod","text":"

Load the settings from the Hub

Parameters:

  • repo_id (str, required): The ID of the repository to load the settings from on the Hub.
  • subset (Optional[str], default: None): The subset of the repository to load the settings from.
  • feature_mapping (Dict[str, Literal['question', 'field', 'metadata']], default: None): A dictionary that maps incoming column names to Argilla attributes.

Source code in src/argilla/settings/_resource.py
@classmethod\ndef from_hub(\n    cls,\n    repo_id: str,\n    subset: Optional[str] = None,\n    feature_mapping: Optional[Dict[str, Literal[\"question\", \"field\", \"metadata\"]]] = None,\n    **kwargs,\n) -> \"Settings\":\n    \"\"\"Load the settings from the Hub\n\n    Parameters:\n        repo_id (str): The ID of the repository to load the settings from on the Hub.\n        subset (Optional[str]): The subset of the repository to load the settings from.\n        feature_mapping (Dict[str, Literal[\"question\", \"field\", \"metadata\"]]): A dictionary that maps incoming column names to Argilla attributes.\n    \"\"\"\n\n    settings = build_settings_from_repo_id(repo_id=repo_id, feature_mapping=feature_mapping, subset=subset)\n    return settings\n
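A short usage sketch, assuming a public Hub repository id and that its label column should become a question:

settings = rg.Settings.from_hub(\n    repo_id=\"stanfordnlp/imdb\",  # illustrative repository id\n    feature_mapping={\"label\": \"question\"},  # map the label column to a question\n)\n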
"},{"location":"reference/argilla/settings/task_distribution/","title":"Distribution","text":"

Distribution settings define the criteria Argilla uses to automatically manage records in the dataset, based on the expected number of submitted responses per record.

"},{"location":"reference/argilla/settings/task_distribution/#usage-examples","title":"Usage Examples","text":"

The default minimum submitted responses per record is 1. If you wish to increase this value, you can define it through the TaskDistribution class and pass it to the Settings class.

settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings\n)\n
"},{"location":"reference/argilla/settings/task_distribution/#src.argilla.settings._task_distribution.OverlapTaskDistribution","title":"OverlapTaskDistribution","text":"

The task distribution settings class.

This task distribution defines a number of submitted responses required to complete a record.

Parameters:

  • min_submitted (int, required): The minimum number of submitted responses required to complete the record.

Source code in src/argilla/settings/_task_distribution.py
class OverlapTaskDistribution:\n    \"\"\"The task distribution settings class.\n\n    This task distribution defines a number of submitted responses required to complete a record.\n\n    Parameters:\n        min_submitted (int): The number of min. submitted responses to complete the record\n    \"\"\"\n\n    strategy: Literal[\"overlap\"] = \"overlap\"\n\n    def __init__(self, min_submitted: int):\n        self._model = OverlapTaskDistributionModel(min_submitted=min_submitted, strategy=self.strategy)\n\n    def __repr__(self) -> str:\n        return f\"OverlapTaskDistribution(min_submitted={self.min_submitted})\"\n\n    def __eq__(self, other) -> bool:\n        if not isinstance(other, self.__class__):\n            return False\n\n        return self._model == other._model\n\n    @classmethod\n    def default(cls) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=1)\n\n    @property\n    def min_submitted(self):\n        return self._model.min_submitted\n\n    @min_submitted.setter\n    def min_submitted(self, value: int):\n        self._model.min_submitted = value\n\n    @classmethod\n    def from_model(cls, model: OverlapTaskDistributionModel) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=model.min_submitted)\n\n    @classmethod\n    def from_dict(cls, dict: Dict[str, Any]) -> \"OverlapTaskDistribution\":\n        return cls.from_model(OverlapTaskDistributionModel.model_validate(dict))\n\n    def to_dict(self):\n        return self._model.model_dump()\n\n    def _api_model(self) -> OverlapTaskDistributionModel:\n        return self._model\n
"},{"location":"reference/argilla/settings/vectors/","title":"Vectors","text":"

Vector fields in Argilla are used to define the vector form of a record that will be reviewed by a user.

"},{"location":"reference/argilla/settings/vectors/#usage-examples","title":"Usage Examples","text":"

To define a vector field, instantiate the VectorField class with a name and dimensions, then pass it to the vectors parameter of the Settings class.

settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    vectors=[\n        rg.VectorField(\n            name=\"my_vector\",\n            dimensions=768,\n            title=\"Document Embedding\",\n        ),\n    ],\n)\n

To add records with vectors, refer to the rg.Vector class documentation.
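To illustrate, a hedged sketch of logging a record that carries a vector matching the field defined above; the dataset variable, field text, and vector values are assumptions:

record = rg.Record(\n    fields={\"text\": \"Hello world, how are you?\"},\n    vectors=[rg.Vector(name=\"my_vector\", values=[0.1] * 768)],  # length must match the configured dimensions\n)\ndataset.records.log([record])\n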

"},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField","title":"VectorField","text":"

Bases: Resource

Vector field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_vector.py
class VectorField(Resource):\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: VectorFieldModel\n    _api: VectorsAPI\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        dimensions: int,\n        title: Optional[str] = None,\n        _client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the vector field\n            dimensions (int): The number of dimensions in the vector\n            title (Optional[str]): The title of the vector to be shown in the UI.\n        \"\"\"\n        client = _client or Argilla._get_default()\n        super().__init__(api=client.api.vectors, client=client)\n        self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n        self._dataset = None\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def title(self) -> Optional[str]:\n        return self._model.title\n\n    @title.setter\n    def title(self, value: Optional[str]) -> None:\n        self._model.title = value\n\n    @property\n    def dimensions(self) -> int:\n        return self._model.dimensions\n\n    @dimensions.setter\n    def dimensions(self, value: int) -> None:\n        self._model.dimensions = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(name={self.name}, title={self.title}, dimensions={self.dimensions})\"\n\n    @classmethod\n    def from_model(cls, model: VectorFieldModel) -> \"VectorField\":\n        instance = cls(name=model.name, dimensions=model.dimensions)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"VectorField\":\n        model = VectorFieldModel(**data)\n        return cls.from_model(model=model)\n\n    def _with_client(self, client: \"Argilla\") -> \"VectorField\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.vectors\n\n        return self\n
"},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField.__init__","title":"__init__(name, dimensions, title=None, _client=None)","text":"

Vector field for use in Argilla Dataset Settings

Parameters:

  • name (str, required): The name of the vector field.
  • dimensions (int, required): The number of dimensions in the vector.
  • title (Optional[str], default: None): The title of the vector to be shown in the UI.

Source code in src/argilla/settings/_vector.py
def __init__(\n    self,\n    name: str,\n    dimensions: int,\n    title: Optional[str] = None,\n    _client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the vector field\n        dimensions (int): The number of dimensions in the vector\n        title (Optional[str]): The title of the vector to be shown in the UI.\n    \"\"\"\n    client = _client or Argilla._get_default()\n    super().__init__(api=client.api.vectors, client=client)\n    self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n    self._dataset = None\n
"},{"location":"reference/argilla-server/configuration/","title":"Server configuration","text":"

This section explains advanced operations and settings for running the Argilla Server and Argilla Python Client.

By default, the Argilla Server will look for your Elasticsearch (ES) endpoint at http://localhost:9200. You can customize this by setting the ARGILLA_ELASTICSEARCH environment variable. Have a look at the list of available environment variables to further configure the Argilla server.

From Argilla version 1.19.0 onwards, you must set up the search engine manually to work with datasets. Set the environment variable ARGILLA_SEARCH_ENGINE=opensearch or ARGILLA_SEARCH_ENGINE=elasticsearch depending on the backend you're using; the default value is elasticsearch. The minimum supported version is 8.5.0 for Elasticsearch and 2.4.0 for OpenSearch. Please review your backend and upgrade it if necessary.

Warning

For vector search in OpenSearch, filtering is applied as a post_filter step, since a bug causes queries that combine filtering with knn to fail from Argilla. See https://github.com/opensearch-project/k-NN/issues/1286

This may lead to unexpected results when combining filtering with vector search on this engine.

"},{"location":"reference/argilla-server/configuration/#launching","title":"Launching","text":""},{"location":"reference/argilla-server/configuration/#using-a-proxy","title":"Using a proxy","text":"

If you run Argilla behind a proxy that adds an extra path prefix to expose the service, you should set the ARGILLA_BASE_URL environment variable so that requests are properly routed to the server application.

For example, if your proxy exposes Argilla in the URL https://my-proxy/custom-path-for-argilla, you should launch the Argilla server with ARGILLA_BASE_URL=/custom-path-for-argilla.

NGINX and Traefik have been tested and are known to work with Argilla:

  • NGINX example
  • Traefik example
"},{"location":"reference/argilla-server/configuration/#environment-variables","title":"Environment variables","text":"

You can set the following environment variables to further configure your server and client.

"},{"location":"reference/argilla-server/configuration/#server","title":"Server","text":""},{"location":"reference/argilla-server/configuration/#fastapi","title":"FastAPI","text":"
  • ARGILLA_HOME_PATH: The directory where Argilla will store all the files needed to run. If the path doesn't exist it will be automatically created (Default: ~/.argilla).

  • ARGILLA_BASE_URL: If you want to launch the Argilla server in a specific base path other than /, you should set up this environment variable. This can be useful when running Argilla behind a proxy that adds a prefix path to route the service (Default: \"/\").

  • ARGILLA_CORS_ORIGINS: List of host patterns for CORS origin access.

  • ARGILLA_DOCS_ENABLED: If False, disables the OpenAPI docs endpoint at /api/docs.

  • HF_HUB_DISABLE_TELEMETRY: If True, disables telemetry for usage metrics. Alternatively, you can disable telemetry by setting HF_HUB_OFFLINE=1.

"},{"location":"reference/argilla-server/configuration/#authentication","title":"Authentication","text":"
  • ARGILLA_AUTH_SECRET_KEY: The secret key used to sign the API token data. You can use openssl rand -hex 32 to generate a 32 character string to use with this environment variable. By default a random value is generated, so if you are using more than one server worker (or more than one Argilla server) you will need to set the same value for all of them.
  • USERNAME: If provided, the owner username (Default: None).
  • PASSWORD: If provided, the owner password (Default: None).

If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup.

"},{"location":"reference/argilla-server/configuration/#database","title":"Database","text":"
  • ARGILLA_DATABASE_URL: A URL string that contains the necessary information to connect to a database. Argilla uses SQLite by default; PostgreSQL is also officially supported (Default: sqlite:///$ARGILLA_HOME_PATH/argilla.db?check_same_thread=False).
"},{"location":"reference/argilla-server/configuration/#sqlite","title":"SQLite","text":"

The following environment variables are useful only when SQLite is used:

  • ARGILLA_DATABASE_SQLITE_TIMEOUT: How many seconds the connection should wait before raising an OperationalError when a table is locked. If another connection opens a transaction to modify a table, that table will be locked until the transaction is committed (Default: 15 seconds).
"},{"location":"reference/argilla-server/configuration/#postgresql","title":"PostgreSQL","text":"

The following environment variables are useful only when PostgreSQL is used:

  • ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE: The number of connections to keep open inside the database connection pool (Default: 15).

  • ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW: The number of connections that can be opened above and beyond ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE setting (Default: 10).

"},{"location":"reference/argilla-server/configuration/#search-engine","title":"Search engine","text":"
  • ARGILLA_ELASTICSEARCH: URL of the connection endpoint of the Elasticsearch instance (Default: http://localhost:9200).

  • ARGILLA_SEARCH_ENGINE: Search engine to use. Valid values are \"elasticsearch\" and \"opensearch\" (Default: \"elasticsearch\").

  • ARGILLA_ELASTICSEARCH_SSL_VERIFY: If \"False\", disables SSL certificate verification when connecting to the Elasticsearch backend.

  • ARGILLA_ELASTICSEARCH_CA_PATH: Path to CA cert for ES host. For example: /full/path/to/root-ca.pem (Optional)

"},{"location":"reference/argilla-server/configuration/#redis","title":"Redis","text":"

Redis is used by Argilla to store information about jobs to be processed in the background. The following environment variables can be used to configure how Argilla connects to Redis:

  • ARGILLA_REDIS_URL: A URL string that contains the necessary information to connect to a Redis instance (Default: redis://localhost:6379/0).
"},{"location":"reference/argilla-server/configuration/#datasets","title":"Datasets","text":"
  • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for label and multi-label questions (Default: 500).

  • ARGILLA_SPAN_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for span questions (Default: 500).

"},{"location":"reference/argilla-server/configuration/#hugging-face","title":"Hugging Face","text":"
  • ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING: When Argilla is running on Hugging Face Spaces, you can use this environment variable to disable the warning message shown when persistent storage is disabled for the space (Default: true).
"},{"location":"reference/argilla-server/configuration/#docker-images-only","title":"Docker images only","text":"
  • REINDEX_DATASETS: If true or 1, the datasets will be reindexed in the search engine. This is needed when the search configuration has changed or the data must be refreshed (Default: 0).

  • USERNAME: If provided, the owner username. This can be combined with HF OAuth to define the Argilla server owner (Default: \"\").

  • PASSWORD: If provided, the owner password. If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup (Default: \"\").

  • WORKSPACE: If provided, the workspace name. If USERNAME, PASSWORD and WORKSPACE are provided, a default workspace will be created with this name (Default: \"\").

  • API_KEY: The default user API key to use. If API_KEY is not provided, a new random API key will be generated (Default: \"\").

  • UVICORN_APP: [Advanced] The name of the FastAPI app to run. This is useful when you want to extend the FastAPI app with additional routes or middleware. The default value is argilla_server:app.

"},{"location":"reference/argilla-server/configuration/#rest-api-docs","title":"REST API docs","text":"

FastAPI also provides beautiful REST API docs that you can check at http://localhost:6900/api/v1/docs.

"},{"location":"reference/argilla-server/telemetry/","title":"Server Telemetry","text":"

Argilla uses telemetry to report anonymous usage and error information. As open-source software, this type of information is important for improving the product and understanding how it is used. This is done through the Hugging Face Hub library telemetry implementation.

"},{"location":"reference/argilla-server/telemetry/#how-to-opt-out","title":"How to opt-out","text":"

You can opt out of telemetry reporting using the ENV variable HF_HUB_DISABLE_TELEMETRY before launching the server. Setting this variable to 1 will completely disable telemetry reporting.

If you are a Linux/macOS user, you should run:

export HF_HUB_DISABLE_TELEMETRY=1\n

If you are a Windows user, you should run:

set HF_HUB_DISABLE_TELEMETRY=1\n

To opt in again, you can set the variable to 0.

"},{"location":"reference/argilla-server/telemetry/#why-reporting-telemetry","title":"Why reporting telemetry","text":"

Anonymous telemetry information enables us to continuously improve the product and detect recurring problems to better serve all users. We collect aggregated information about general usage and errors. We do NOT collect any information on users' data records, datasets, or metadata information.

"},{"location":"reference/argilla-server/telemetry/#sensitive-data","title":"Sensitive data","text":"

We do not collect any piece of information related to the source data you store in Argilla. We don't identify individual users. Your data does not leave your server at any time:

  • No dataset record is collected.
  • No dataset names or metadata are collected.
"},{"location":"reference/argilla-server/telemetry/#information-reported","title":"Information reported","text":"

The following usage and error information is reported:

  • The code of the raised error
  • The user-agent and accept-language http headers
  • Task name and number of records for bulk operations
  • An anonymous generated user uuid
  • An anonymous generated server uuid
  • The Argilla version running the server
  • The Python version, e.g. 3.8.13
  • The system/OS name, such as Linux, Darwin, Windows
  • The system's release version, e.g. Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020
  • The machine type, e.g. AMD64
  • The underlying platform spec with as much useful information as possible (e.g. macOS-10.16-x86_64-i386-64bit)
  • The type of deployment: huggingface_space or server
  • The dockerized deployment flag: True or False

For transparency, you can inspect the source code where this is performed here.

If you have any doubts, don't hesitate to join our Discord channel or open a GitHub issue. We'd be very happy to discuss how we can improve this.

"},{"location":"tutorials/","title":"Tutorials","text":"

These are the tutorials for the Argilla SDK. They provide step-by-step instructions for common tasks.

  • Text classification

    Learn about a standard workflow for a text classification task with model fine-tuning.

    Tutorial

  • Token classification

    Learn about a standard workflow for a token classification task with model fine-tuning.

    Tutorial

  • Image classification

    Learn about a standard workflow for an image classification task with model fine-tuning.

    Tutorial

  • Image preference

    Learn about a standard workflow for multi-modal preference datasets like image generation preference.

    Tutorial

"},{"location":"tutorials/image_classification/","title":"Image classification","text":"
  • Goal: Show a standard workflow for an image classification task.
  • Dataset: MNIST, a dataset of 28x28 grayscale images that need to be classified as digits.
  • Libraries: datasets, transformers
  • Components: ImageField, LabelQuestion, Suggestion

If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install \"transformers[torch]~=4.0\" \"accelerate~=0.34\"\n

Let's make the required imports:

import base64\nimport io\nimport re\n\nfrom IPython.display import display\nimport numpy as np\nimport torch\nfrom PIL import Image\n\nfrom datasets import load_dataset, Dataset, load_metric\nfrom transformers import (\n    AutoImageProcessor,\n    AutoModelForImageClassification,\n    pipeline,\n    Trainer,\n    TrainingArguments\n)\n\nimport argilla as rg\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a field for the image column and a label question for the label column.

Note

Check this how-to guide to know more about configuring and creating a dataset.

labels = [str(x) for x in range(10)]\n\nsettings = rg.Settings(\n    guidelines=\"The goal of this task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusively.\",\n    fields=[\n        rg.ImageField(\n            name=\"image\",\n            title=\"An image of a handwritten digit.\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"image_label\",\n            title=\"What digit do you see on the image?\",\n            labels=labels,\n        )\n    ]\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"image_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

Although we have created the dataset, it still lacks the information to be annotated (you can check this in the UI). We will use the ylecun/mnist dataset from the Hugging Face Hub. Specifically, we will use 100 examples. Because we are dealing with a potentially large image dataset, we will set streaming=True to avoid loading the entire dataset into memory and iterate over the data to load it lazily.

Tip

When working with Hugging Face datasets, you can set Image(decode=False) so that you get public image URLs instead of decoded images, but this depends on how the dataset is stored; a sketch follows.
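A minimal sketch of this, assuming the dataset stores images in a form that exposes URLs or paths when left undecoded:

from datasets import load_dataset, Image\n\nhf_dataset = load_dataset(\"ylecun/mnist\", streaming=True)\n# Keep the encoded image info (e.g. a public URL) instead of decoding to a PIL object\nhf_dataset = hf_dataset.cast_column(\"image\", Image(decode=False))\n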

n_rows = 100\n\nhf_dataset = load_dataset(\"ylecun/mnist\", streaming=True)\ndataset_rows = [row for _,row in zip(range(n_rows), hf_dataset[\"train\"])]\nhf_dataset = Dataset.from_list(dataset_rows)\n\nhf_dataset\n
\nDataset({\n    features: ['image', 'label'],\n    num_rows: 100\n})\n

Let's have a look at the first image in the dataset.

hf_dataset[0]\n
\n{'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28>,\n 'label': 5}\n

We will easily add them to the dataset using log, without needing a mapping since the names already match the Argilla resources. Additionally, since the images are already in PIL format and defined as Image in the Hugging Face dataset's features, we can log them directly. We will also include an id column in each record, allowing us to easily trace back to the external data source.

hf_dataset = hf_dataset.add_column(\"id\", range(len(hf_dataset)))\ndataset.records.log(records=hf_dataset)\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot CLIP model. However, you can use a framework or technique of your choice.

We will start by loading the model using a transformers pipeline.

checkpoint = \"openai/clip-vit-large-patch14\"\ndetector = pipeline(model=checkpoint, task=\"zero-shot-image-classification\")\n

Now, let's try to make a model prediction and see if it makes sense.

predictions = detector(hf_dataset[1][\"image\"], candidate_labels=labels)\npredictions, display(hf_dataset[1][\"image\"])\n
\n([{'score': 0.5236628651618958, 'label': '0'},\n  {'score': 0.11496700346469879, 'label': '7'},\n  {'score': 0.08030630648136139, 'label': '8'},\n  {'score': 0.07141078263521194, 'label': '9'},\n  {'score': 0.05868939310312271, 'label': '6'},\n  {'score': 0.05507850646972656, 'label': '5'},\n  {'score': 0.0341767854988575, 'label': '1'},\n  {'score': 0.027202051132917404, 'label': '4'},\n  {'score': 0.018533246591687202, 'label': '3'},\n  {'score': 0.015973029658198357, 'label': '2'}],\n None)\n

It's time to make the predictions on the dataset! We will define a function that uses the zero-shot model to infer the label based on the image. When working with large datasets, you can create a batch_predict method to speed up the process; see the sketch after the next code block.

def predict(input, labels):\n    prediction = detector(input, candidate_labels=labels)\n    prediction = prediction[0]\n    return {\"image_label\": prediction[\"label\"], \"score\": prediction[\"score\"]}\n
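A possible batch_predict sketch, assuming the pipeline accepts a list of images and returns one ranked list of predictions per input:

def batch_predict(inputs, labels, batch_size=16):\n    # Pipelines accept lists and batch internally via batch_size\n    predictions = detector(inputs, candidate_labels=labels, batch_size=batch_size)\n    return [\n        {\"image_label\": prediction[0][\"label\"], \"score\": prediction[0][\"score\"]}\n        for prediction in predictions\n    ]\n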

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it identifies the record to update and avoids creating a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"id\": sample[\"id\"],\n        **predict(sample[\"image\"], labels),\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data, mapping={\"score\": \"image_label.suggestion.score\"})\n

Voilà! We have added the suggestions to the dataset, and they will appear in the UI marked with a ✨.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to know more about annotating in the UI.

After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using transformers. However, you can select the one that best fits your requirements.

So, let's start by retrieving the annotated records and exporting them as a Dataset, so images will be in PIL format.

Note

Check this how-to guide to know more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning an image classification model.

dataset = client.datasets(\"image_classification_dataset\")\n
status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_datasets()\n

We now need to ensure our images are forwarded with the correct dimensions. Because the original MNIST dataset is greyscale and the ViT model expects RGB, we need to add a channel dimension to the images. We will do this by stacking the images along the channel axis.

def greyscale_to_rgb(img) -> Image:\n    return Image.merge('RGB', (img, img, img))\n\nsubmitted_image_rgb = [\n    {\n        \"id\": sample[\"id\"],\n        \"image\": greyscale_to_rgb(sample[\"image\"]),\n        \"label\": sample[\"image_label.responses\"][0],\n    }\n    for sample in submitted\n]\nsubmitted_image_rgb[0]\n
\n{'id': '0', 'image': <PIL.Image.Image image mode=RGB size=28x28>, 'label': '0'}\n

Next, we will load the ImageProcessor to fine-tune the model. This processor will handle image resizing and normalization so that the inputs are compatible with the model we intend to use.

checkpoint = \"google/vit-base-patch16-224-in21k\"\nprocessor = AutoImageProcessor.from_pretrained(checkpoint)\n\nsubmitted_image_rgb_processed = [\n    {\n        \"pixel_values\": processor(sample[\"image\"], return_tensors='pt')[\"pixel_values\"],\n        \"label\": sample[\"label\"],\n    }\n    for sample in submitted_image_rgb\n]\nsubmitted_image_rgb_processed[0]\n

We can now convert the images to a Hugging Face Dataset that is ready for fine-tuning.

prepared_ds = Dataset.from_list(submitted_image_rgb_processed)\nprepared_ds = prepared_ds.train_test_split(test_size=0.2)\nprepared_ds\n
\nDatasetDict({\n    train: Dataset({\n        features: ['pixel_values', 'label'],\n        num_rows: 80\n    })\n    test: Dataset({\n        features: ['pixel_values', 'label'],\n        num_rows: 20\n    })\n})\n

We then need to define our data collator, which will ensure the data is unpacked and stacked correctly for the model.

def collate_fn(batch):\n    # stack per-sample tensors, dropping the extra batch dimension added by the processor,\n    # and convert the string labels to integer ids\n    return {\n        'pixel_values': torch.stack([torch.tensor(x['pixel_values'][0]) for x in batch]),\n        'labels': torch.tensor([int(x['label']) for x in batch])\n    }\n

Next, we can define our training metrics. We will use the accuracy metric to evaluate the model's performance.

metric = load_metric(\"accuracy\", trust_remote_code=True)\ndef compute_metrics(p):\n    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)\n

We then load our model and configure the labels that we will use for training.

model = AutoModelForImageClassification.from_pretrained(\n    checkpoint,\n    num_labels=len(labels),\n    id2label={int(i): int(c) for i, c in enumerate(labels)},\n    label2id={int(c): int(i) for i, c in enumerate(labels)}\n)\nmodel.config\n

Finally, we define the training arguments and start the training process.

training_args = TrainingArguments(\n  output_dir=\"./image-classifier\",\n  per_device_train_batch_size=16,\n  eval_strategy=\"steps\",\n  num_train_epochs=1,\n  fp16=False, # True if you have a GPU with mixed precision support\n  save_steps=100,\n  eval_steps=100,\n  logging_steps=10,\n  learning_rate=2e-4,\n  save_total_limit=2,\n  remove_unused_columns=True,\n  push_to_hub=False,\n  load_best_model_at_end=True,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    data_collator=collate_fn,\n    compute_metrics=compute_metrics,\n    train_dataset=prepared_ds[\"train\"],\n    eval_dataset=prepared_ds[\"test\"],\n    tokenizer=processor,\n)\n\ntrain_results = trainer.train()\ntrainer.save_model()\ntrainer.log_metrics(\"train\", train_results.metrics)\ntrainer.save_metrics(\"train\", train_results.metrics)\ntrainer.save_state()\n
\n{'train_runtime': 12.5374, 'train_samples_per_second': 6.381, 'train_steps_per_second': 0.399, 'train_loss': 2.0533515930175783, 'epoch': 1.0}\n***** train metrics *****\n  epoch                    =        1.0\n  total_flos               =  5774017GF\n  train_loss               =     2.0534\n  train_runtime            = 0:00:12.53\n  train_samples_per_second =      6.381\n  train_steps_per_second   =      0.399\n\n

As the training data was of better quality, we can expect a better model. So we can update the remainder of our original dataset with the new model's suggestions.

pipe = pipeline(\"image-classification\", model=model, image_processor=processor)\n\ndef run_inference(batch):\n    predictions = pipe(batch[\"image\"])\n    batch[\"image_label\"] = [prediction[0][\"label\"] for prediction in predictions]\n    batch[\"score\"] = [prediction[0][\"score\"] for prediction in predictions]\n    return batch\n\nhf_dataset = hf_dataset.map(run_inference, batched=True)\n
data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"image_label\": str(sample[\"image_label\"]),\n        \"id\": sample[\"id\"],\n        \"score\": sample[\"score\"],\n    }\n    for sample in hf_dataset\n]\ndataset.records.log(records=updated_data, mapping={\"score\": \"image_label.suggestion.score\"})\n

In this tutorial, we present an end-to-end example of an image classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset and adding records and suggestions from a zero-shot model. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/image_classification/#image-classification","title":"Image classification","text":""},{"location":"tutorials/image_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/image_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/image_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/image_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will look at the dataset to understand its structure and the kind of data it contains. We do this by using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/image_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/image_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/image_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/image_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/image_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/image_classification/#formatting-the-data","title":"Formatting the data","text":""},{"location":"tutorials/image_classification/#the-actual-training","title":"The actual training","text":""},{"location":"tutorials/image_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/image_preference/","title":"Image preference","text":"
  • Goal: Show a standard workflow for working with complex multi-modal preference datasets, such as for image-generation preference.
  • Dataset: tomg-group-umd/pixelprose, a comprehensive dataset of over 16 million synthetically generated captions, leveraging cutting-edge vision-language models (Gemini 1.0 Pro Vision) for detailed and accurate descriptions.
  • Libraries: datasets, sentence-transformers
  • Components: TextField, ImageField, TextQuestion, LabelQuestion, VectorField, FloatMetadataProperty

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install \"sentence-transformers~=3.0\"\n

Let's make the required imports:

import io\nimport os\nimport time\n\nimport argilla as rg\nimport requests\nfrom PIL import Image\nfrom datasets import load_dataset, Dataset\nfrom sentence_transformers import SentenceTransformer\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. We will include a TextField, an ImageField corresponding to the url image column, and two additional ImageField fields representing the images we will generate based on the original_caption column from our dataset. Additionally, we will use a LabelQuestion and an optional TextQuestion, which will be used to collect the user's preference and the reason behind it. We will also be adding a VectorField to store the embeddings for the original_caption so that we can use semantic search and speed up our labeling process. Lastly, we will include two FloatMetadataProperty to store information from the toxicity and the identity_attack columns.

Note

Check this how-to guide to know more about configuring and creating a dataset.

settings = rg.Settings(\n    guidelines=\"The goal is to choose the image that best represents the caption.\",\n    fields=[\n        rg.TextField(\n            name=\"caption\",\n            title=\"An image caption belonging to the original image.\",\n        ),\n        rg.ImageField(\n            name=\"image_original\",\n            title=\"The original image, belonging to the caption.\",\n        ),\n        rg.ImageField(\n            name=\"image_1\",\n            title=\"An image that has been generated based on the caption.\",\n        ),\n        rg.ImageField(\n            name=\"image_2\",\n            title=\"An image that has been generated based on the caption.\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"preference\",\n            title=\"The chosen preference for the generation.\",\n            labels=[\"image_1\", \"image_2\"],\n        ),\n        rg.TextQuestion(\n            name=\"comment\",\n            title=\"Any additional comments.\",\n            required=False,\n        ),\n    ],\n    metadata=[\n        rg.FloatMetadataProperty(name=\"toxicity\", title=\"Toxicity score\"),\n        rg.FloatMetadataProperty(name=\"identity_attack\", title=\"Identity attack score\"),\n\n    ],\n    vectors=[\n        rg.VectorField(name=\"original_caption_vector\", dimensions=384),\n    ]\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"image_preference_dataset\",\n    settings=settings,\n)\ndataset.create()\n
n_rows = 25\n\nhf_dataset = load_dataset(\"tomg-group-umd/pixelprose\", streaming=True)\ndataset_rows = [row for _,row in zip(range(n_rows), hf_dataset[\"train\"])]\nhf_dataset = Dataset.from_list(dataset_rows)\n\nhf_dataset\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score'],\n    num_rows: 25\n})\n

Let's have a look at the first entry in the dataset.

hf_dataset[0]\n
\n{'uid': '0065a9b1cb4da4696f2cd6640e00304257cafd97c0064d4c61e44760bf0fa31c',\n 'url': 'https://media.gettyimages.com/photos/plate-of-food-from-murray-bros-caddy-shack-at-the-world-golf-hall-of-picture-id916117812?s=612x612',\n 'key': '007740026',\n 'status': 'success',\n 'original_caption': 'A plate of food from Murray Bros Caddy Shack at the World Golf Hall of Fame',\n 'vlm_model': 'gemini-pro-vision',\n 'vlm_caption': ' This image displays: A plate of fried calamari with a lemon wedge and a side of green beans, served in a basket with a pink bowl of marinara sauce. The basket is sitting on a table with a checkered tablecloth. In the background is a glass of water and a plate with a burger and fries. The style of the image is a photograph.',\n 'toxicity': 0.0005555678508244455,\n 'severe_toxicity': 1.7323875454167137e-06,\n 'obscene': 3.8304504414554685e-05,\n 'identity_attack': 0.00010549413127591833,\n 'insult': 0.00014773994917050004,\n 'threat': 2.5982120860135183e-05,\n 'sexual_explicit': 2.0972733182134107e-05,\n 'watermark_class_id': 1.0,\n 'watermark_class_score': 0.733799934387207,\n 'aesthetic_score': 5.390625,\n 'error_message': None,\n 'width': 612,\n 'height': 408,\n 'original_width': 612,\n 'original_height': 408,\n 'exif': '{\"Image ImageDescription\": \"A plate of food from Murray Bros. Caddy Shack at the World Golf Hall of Fame. (Photo by: Jeffrey Greenberg/Universal Images Group via Getty Images)\", \"Image XResolution\": \"300\", \"Image YResolution\": \"300\"}',\n 'sha256': '0065a9b1cb4da4696f2cd6640e00304257cafd97c0064d4c61e44760bf0fa31c',\n 'image_id': 'null',\n 'author': 'null',\n 'subreddit': -1,\n 'score': -1}\n

As we can see, the url column does not always contain an image file extension, so we will apply some additional filtering to keep only public image URLs.

hf_dataset = hf_dataset.filter(\n    lambda x: any([x[\"url\"].endswith(extension) for extension in [\".jpg\", \".png\", \".jpeg\"]]))\n\nhf_dataset\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score'],\n    num_rows: 18\n})\n
API_URL = \"https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-schnell\"\nheaders = {\"Authorization\": f\"Bearer {os.getenv('HF_TOKEN')}\"}\n\ndef query(payload):\n    response = requests.post(API_URL, headers=headers, json=payload)\n    if response.status_code == 200:\n        image_bytes = response.content\n        image = Image.open(io.BytesIO(image_bytes))\n    else:\n        print(f\"Request failed with status code {response.status_code}. retrying in 10 seconds.\")\n        time.sleep(10)\n        image = query(payload)\n    return image\n\nquery({\n    \"inputs\": \"Astronaut riding a horse\"\n})\n

Cool! Now that we have verified the generation function, let's generate the PIL images for the dataset.

def generate_image(row):\n    caption = row[\"original_caption\"]\n    row[\"image_1\"] = query({\"inputs\": caption})\n    row[\"image_2\"] = query({\"inputs\": caption + \" \"}) # space to avoid caching and getting the same image\n    return row\n\nhf_dataset_with_images = hf_dataset.map(generate_image, batched=False)\n\nhf_dataset_with_images\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score', 'image_1', 'image_2'],\n    num_rows: 18\n})\n
model = SentenceTransformer(\"TaylorAI/bge-micro-v2\")\n\ndef encode_questions(batch):\n    vectors_as_numpy = model.encode(batch[\"original_caption\"])\n    batch[\"original_caption_vector\"] = [x.tolist() for x in vectors_as_numpy]\n    return batch\n\nhf_dataset_with_images_vectors = hf_dataset_with_images.map(encode_questions, batched=True)\n
dataset.records.log(records=hf_dataset_with_images_vectors, mapping={\n    \"key\": \"id\",\n    \"original_caption\": \"caption\",\n    \"url\": \"image_original\",\n})\n

Voil\u00e0! We have our Argilla dataset ready for annotation.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records.

Note

Check this how-to guide to know more about annotating in the UI.

In this tutorial, we present an end-to-end example of an image preference task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset and adding records with the original and generated images. After the annotation process, you can evaluate the results and potentially retrain the model to improve the quality of the generated images.

"},{"location":"tutorials/image_preference/#image-preference","title":"Image preference","text":""},{"location":"tutorials/image_preference/#getting-started","title":"Getting started","text":""},{"location":"tutorials/image_preference/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/image_preference/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/image_preference/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will take a look at the dataset to understand its structure and the types of data it contains. We can do this using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/image_preference/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/image_preference/#add-records","title":"Add records","text":"

Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the tomg-group-umd/pixelprose dataset from the Hugging Face Hub. Specifically, we will use 25 examples. Because we are dealing with a potentially large image dataset, we will set streaming=True to avoid loading the entire dataset into memory and instead iterate over the data, loading it lazily.

Tip

When working with Hugging Face datasets, you can set Image(decode=False) so that you get public image URLs instead of decoded images, though this depends on the dataset.
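
As a minimal sketch of this tip, assuming a Dataset object named ds with an image column named image (pixelprose itself already exposes plain URLs in its url column, so this is not needed here):

from datasets import Image\n\n# hypothetical example: keep the column undecoded so the underlying\n# path/URL and raw bytes are exposed instead of a decoded PIL image\nds = ds.cast_column(\"image\", Image(decode=False))\nds[0][\"image\"]  # -> {'bytes': ..., 'path': ...}\n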

"},{"location":"tutorials/image_preference/#generate-images","title":"Generate images","text":"

We'll start by generating images based on the original_caption column using the recently released black-forest-labs/FLUX.1-schnell model. For this, we will use the free but rate-limited Inference API provided by Hugging Face, but you can use any other model from the Hub or any other method. We will generate two images per example. Additionally, we will add a small retry mechanism to handle the rate limit.

Let's begin by defining and testing a generation function.

"},{"location":"tutorials/image_preference/#add-vectors","title":"Add vectors","text":"

We will use the sentence-transformers library to create vectors for the original_caption. We will use the TaylorAI/bge-micro-v2 model, which strikes a good balance between speed and performance. Note that we also need to convert the vectors to a list to store them in the Argilla dataset.

"},{"location":"tutorials/image_preference/#log-to-argilla","title":"Log to Argilla","text":"

We can easily add them to the dataset using log and a mapping, where we indicate which column from our dataset needs to be mapped to which Argilla resource when the names do not correspond. We also use the key column as the id of our records so we can easily trace each record back to the external data source.

"},{"location":"tutorials/image_preference/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/image_preference/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/text_classification/","title":"Text classification","text":"
  • Goal: Show a standard workflow for a text classification task, including zero-shot suggestions and model fine-tuning.
  • Dataset: IMDB, a dataset of movie reviews that need to be classified as positive or negative.
  • Libraries: datasets, transformers, setfit
  • Components: TextField, LabelQuestion, Suggestion, Query, Filter

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install setfit==1.0.3 transformers==4.40.2\n

Let's make the required imports:

import argilla as rg\n\nfrom datasets import load_dataset, Dataset\nfrom setfit import SetFitModel, Trainer, get_templated_dataset, sample_dataset\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a label question, corresponding to the text and label columns.

Note

Check this how-to guide to know more about configuring and creating a dataset.

labels = [\"positive\", \"negative\"]\n\nsettings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"sentiment_label\",\n            title=\"In which category does this article fit?\",\n            labels=labels,\n        )\n    ],\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"text_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the imdb dataset from the Hugging Face Hub. Specifically, we will use 100 samples from the train split.

hf_dataset = load_dataset(\"imdb\", split=\"train[:100]\")\n

We will easily add them to the dataset using log and the mapping, where we indicate that the column text is the data that should be added to the field review.

dataset.records.log(records=hf_dataset, mapping={\"text\": \"review\"})\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot SetFit model. However, you can use a framework or technique of your choice.

We will start by defining an example training set with the required labels: positive and negative. Using get_templated_dataset will create sentences from the default template: \"This sentence is {label}.\"

zero_ds = get_templated_dataset(\n    candidate_labels=labels,\n    sample_size=8,\n)\n

Now, we will prepare a function to train the SetFit model.

Note

For further customization, you can check the SetFit documentation.

def train_model(model_name, dataset):\n    model = SetFitModel.from_pretrained(model_name)\n\n    trainer = Trainer(\n        model=model,\n        train_dataset=dataset,\n    )\n\n    trainer.train()\n\n    return model\n

Let's train the model. We will use TaylorAI/bge-micro-v2, available on the Hugging Face Hub.

model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=zero_ds)\n

You can save it locally or push it to the Hub, and then load it from there.

# Save and load locally\n# model.save_pretrained(\"text_classification_model\")\n# model = SetFitModel.from_pretrained(\"text_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/text_classification_model\")\n# model = SetFitModel.from_pretrained(\"[username]/text_classification_model\")\n

It's time to make the predictions! We will define a function that uses the predict method to get the suggested label. The model will infer the label based on the text.

def predict(model, input, labels):\n    model.labels = labels\n\n    prediction = model.predict([input])\n\n    return prediction[0]\n

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it is the identifier used to update an existing record rather than create a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voil\u00e0! We have added the suggestions to the dataset, and they will appear in the UI marked with a \u2728.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to know more about annotating in the UI.

After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using SetFit. However, you can select the framework that best fits your requirements. So, let's start by retrieving the annotated records.

Note

Check this how-to guide to know more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning a text classification model.

dataset = client.datasets(\"text_classification_dataset\")\n
status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

As we have a single response per record, we can retrieve the selected label directly and create the training set, keeping 8 samples per label to obtain a balanced dataset for few-shot learning.

train_records = [\n    {\n        \"text\": r[\"review\"],\n        \"label\": r[\"sentiment_label.responses\"][0],\n    }\n    for r in submitted\n]\ntrain_dataset = Dataset.from_list(train_records)\ntrain_dataset = sample_dataset(train_dataset, label_column=\"label\", num_samples=8)\n

We can train the model using our previous function, but this time with a high-quality human-annotated training set.

model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=train_dataset)\n

As the training data was of better quality, we can expect a better model. So we can update the remaining non-annotated records with the new model's suggestions.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

In this tutorial, we present an end-to-end example of a text classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset, adding records, and training a zero-shot SetFit model, as an example, to add suggestions. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/text_classification/#text-classification","title":"Text classification","text":""},{"location":"tutorials/text_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/text_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/text_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/text_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will have a look at the dataset to understand its structure and the kind of data it contains. We do this by using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/text_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/text_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/text_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/text_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/text_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/text_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/token_classification/","title":"Token classification","text":"
  • Goal: Show a standard workflow for a token classification task, including zero-shot suggestions and model fine-tuning.
  • Dataset: ontonotes5, a large corpus comprising various genres of text that need to be classified for Named Entity Recognition.
  • Libraries: datasets, gliner, transformers, spanmarker
  • Components: TextField, SpanQuestion, Suggestion, Query, Filter

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install gliner==0.2.6 transformers==4.40.2 span_marker==1.5.0\n

Let's make the needed imports:

import re\n\nimport argilla as rg\n\nimport torch\nfrom datasets import load_dataset, Dataset, DatasetDict\nfrom gliner import GLiNER\nfrom span_marker import SpanMarkerModel, Trainer\nfrom transformers import TrainingArguments\n

You also need to connect to the Argilla server with the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a span question, corresponding to the token and tags columns. We will focus on Named Entity Recognition, but this workflow can also be applied to Span Classification, which differs in that the spans are less clearly defined and often overlap.

labels = [\n    \"CARDINAL\",\n    \"DATE\",\n    \"PERSON\",\n    \"NORP\",\n    \"GPE\",\n    \"LAW\",\n    \"PERCENT\",\n    \"ORDINAL\",\n    \"MONEY\",\n    \"WORK_OF_ART\",\n    \"FAC\",\n    \"TIME\",\n    \"QUANTITY\",\n    \"PRODUCT\",\n    \"LANGUAGE\",\n    \"ORG\",\n    \"LOC\",\n    \"EVENT\",\n]\n\nsettings = rg.Settings(\n    guidelines=\"Classify individual tokens according to the specified categories, ensuring that any overlapping or nested entities are accurately captured.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n            title=\"Text\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.SpanQuestion(\n            name=\"span_label\",\n            field=\"text\",\n            labels=labels,\n            title=\"Classify the tokens according to the specified categories.\",\n            allow_overlapping=False,\n        )\n    ],\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"token_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

We have created the dataset (you can check it in the UI), but we still need to add the data for annotation. In this case, we will use the ontonotes5 dataset from the Hugging Face Hub. Specifically, we will use 2100 samples from the test split.

hf_dataset = load_dataset(\"tner/ontonotes5\", split=\"test[:2100]\")\n

We will iterate over the Hugging Face dataset, adding data to the corresponding field in the Record object for the Argilla dataset. Then, we will easily add them to the dataset using log.

records = [rg.Record(fields={\"text\": \" \".join(row[\"tokens\"])}) for row in hf_dataset]\n\ndataset.records.log(records)\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a GLiNER model. However, you can use a framework or technique of your choice.

Note

For further information, you can check the GLiNER repository and the original paper.

We will start by loading the pre-trained GLiNER model. Specifically, we will use gliner_mediumv2, available on the Hugging Face Hub.

gliner_model = GLiNER.from_pretrained(\"urchade/gliner_mediumv2.1\")\n

Next, we will create a function to generate predictions using this general model, which can identify the specified labels without being pre-trained on them. The function will return a dictionary formatted with the necessary schema to add entities to our Argilla dataset. This schema includes the keys 'start' and 'end' to indicate the indices where the span begins and ends, as well as 'label' for the entity label.

def predict_gliner(model, text, labels, threshold):\n    entities = model.predict_entities(text, labels, threshold)\n    return [\n        {k: v for k, v in ent.items() if k not in {\"score\", \"text\"}} for ent in entities\n    ]\n

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it is the identifier used to update an existing record rather than create a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_gliner(\n            model=gliner_model, text=sample[\"text\"], labels=labels, threshold=0.70\n        ),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voil\u00e0! We have added the suggestions to the dataset and they will appear in the UI marked with \u2728.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to know more about annotating in the UI.

After the annotation, we will have a robust dataset to train our model for entity recognition. For our case, we will train a SpanMarker model, but you can select any model of your choice. So, let's start by retrieving the annotated records.

Note

Check this how-to guide to learn more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning a token classification model.

dataset = client.datasets(\"token_classification_dataset\")\n

In our case, we submitted 2000 annotations using the bulk view.

status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

SpanMarker accepts any dataset as long as it has the tokens and ner_tags columns. The ner_tags can be annotated using the IOB, IOB2, BIOES or BILOU labeling scheme, as well as regular unschemed labels. In our case, we have chosen to use the IOB format. Thus, we will define a function to extract the annotated NER tags according to this schema.

Note

For further information, you can check the SpanMarker documentation.

def get_iob_tag_for_token(token_start, token_end, ner_spans):\n    for span in ner_spans:\n        if token_start >= span[\"start\"] and token_end <= span[\"end\"]:\n            if token_start == span[\"start\"]:\n                return f\"B-{span['label']}\"\n            else:\n                return f\"I-{span['label']}\"\n    return \"O\"\n\n\ndef extract_ner_tags(text, responses):\n    tokens = re.split(r\"(\\s+)\", text)\n    ner_tags = []\n\n    current_position = 0\n    for token in tokens:\n        if token.strip():\n            token_start = current_position\n            token_end = current_position + len(token)\n            tag = get_iob_tag_for_token(token_start, token_end, responses)\n            ner_tags.append(tag)\n        current_position += len(token)\n\n    return ner_tags\n
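
To make the IOB scheme concrete, here is a quick illustration of extract_ner_tags with a made-up sentence and spans (not taken from the dataset):

text = \"John Smith visited New York\"\nner_spans = [\n    {\"start\": 0, \"end\": 10, \"label\": \"PERSON\"},\n    {\"start\": 19, \"end\": 27, \"label\": \"GPE\"},\n]\n# the first token of a span gets a B- tag, the rest I-, and everything else O\nextract_ner_tags(text, ner_spans)\n# ['B-PERSON', 'I-PERSON', 'O', 'B-GPE', 'I-GPE']\n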

Let's now extract them and save two lists with the tokens and NER tags, which will help us build our dataset to train the SpanMarker model.

tokens = []\nner_tags = []\nfor r in submitted:\n    tags = extract_ner_tags(r[\"text\"], r[\"span_label.responses\"][0])\n    tks = r[\"text\"].split()\n    tokens.append(tks)\n    ner_tags.append(tags)\n

In addition, we will have to provide the labels, which should be formatted as integers. So, we will retrieve the unique labels and map them to integer ids.

labels = list(set([item for sublist in ner_tags for item in sublist]))\n\nid2label = {i: label for i, label in enumerate(labels)}\nlabel2id = {label: id_ for id_, label in id2label.items()}\n\nmapped_ner_tags = [[label2id[label] for label in ner_tag] for ner_tag in ner_tags]\n

Finally, we will create a dataset with the train and validation sets.

records = [\n    {\n        \"tokens\": token,\n        \"ner_tags\": ner_tag,\n    }\n    for token, ner_tag in zip(tokens, mapped_ner_tags)\n]\nspan_dataset = DatasetDict(\n    {\n        \"train\": Dataset.from_list(records[:1500]),\n        \"validation\": Dataset.from_list(records[1500:2000]),\n    }\n)\n

Now, let's prepare to train our model. For this, it is recommended to use a GPU. You can check whether one is available as shown below.

if torch.cuda.is_available():\n    device = torch.device(\"cuda\")\n    print(f\"Using {torch.cuda.get_device_name(0)}\")\nelif torch.backends.mps.is_available():\n    device = torch.device(\"mps\")\n    print(\"Using MPS device\")\nelse:\n    device = torch.device(\"cpu\")\n    print(\"No GPU available, using CPU instead.\")\n

We will define our model and arguments. In this case, we will use bert-base-cased, available on the Hugging Face Hub, but other encoders can be applied.

Note

The training arguments are inherited from the Transformers library. You can check more information here.

encoder_id = \"bert-base-cased\"\nmodel = SpanMarkerModel.from_pretrained(\n    encoder_id,\n    labels=labels,\n    model_max_length=256,\n    entity_max_length=8,\n)\n\nargs = TrainingArguments(\n    output_dir=\"models/span-marker\",\n    learning_rate=5e-5,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    num_train_epochs=1,\n    weight_decay=0.01,\n    warmup_ratio=0.1,\n    fp16=False,  # Set to True if available\n    logging_first_step=True,\n    logging_steps=50,\n    evaluation_strategy=\"steps\",\n    save_strategy=\"steps\",\n    eval_steps=500,\n    save_total_limit=2,\n    dataloader_num_workers=2,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=span_dataset[\"train\"],\n    eval_dataset=span_dataset[\"validation\"],\n)\n

Let's train it! This time, we use a high-quality human-annotated training set, so we can expect better results.

trainer.train()\n
trainer.evaluate()\n

You can save it locally or push it to the Hub, and then load it from there.

# Save and load locally\n# model.save_pretrained(\"token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"token_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"[username]/token_classification_model\")\n

It's time to make the predictions! We will define a function that uses the predict method to get the suggested label. The model will infer the label based on the text. The function will return the spans in the corresponding structure for the Argilla dataset.

def predict_spanmarker(model, text):\n    entities = model.predict(text)\n    return [\n        {\n            \"start\": ent[\"char_start_index\"],\n            \"end\": ent[\"char_end_index\"],\n            \"label\": ent[\"label\"],\n        }\n        for ent in entities\n    ]\n

As the training data was of better quality, we can expect a better model. So we can update the remaining non-annotated records with the new model's suggestions.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_spanmarker(model=model, text=sample[\"text\"]),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

In this tutorial, we present an end-to-end example of a token classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset, adding records, and adding suggestions based on the GLiNER predictions. After the annotation process, we trained a SpanMarker model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/token_classification/#token-classification","title":"Token classification","text":""},{"location":"tutorials/token_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/token_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/token_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/token_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will have a look at the dataset to understand its structure and the kind of data it contains. We do this by using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/token_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/token_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/token_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/token_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/token_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/token_classification/#conclusions","title":"Conclusions","text":""}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to Argilla","text":"

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.

To get started:

  • Get started in 5 minutes!

    Deploy Argilla for free on the Hugging Face Hub or with Docker. Install the Python SDK with pip and create your first project (see the sketch after this list).

    Quickstart

  • How-to guides

    Get familiar with the basic workflows of Argilla. Learn how to manage Users, Workspaces, Datasets, and Records to set up your data annotation projects.

    Learn more
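
As a minimal sketch of that first project, assuming a deployed Argilla server (the URL, API key, and dataset name below are placeholders, and the field and question names are illustrative):

import argilla as rg\n\n# connect to your deployed Argilla server\nclient = rg.Argilla(api_url=\"<your-argilla-url>\", api_key=\"<your-api-key>\")\n\n# define a simple dataset with one text field and one label question\nsettings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n)\n\ndataset = rg.Dataset(name=\"my_first_dataset\", settings=settings)\ndataset.create()\n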

Or, play with the Argilla UI by signing in with your Hugging Face account:

Looking for Argilla 1.x?

Looking for documentation for Argilla 1.x? Visit the latest release.

Migrate to Argilla 2.x

Want to learn how to migrate from Argilla 1.x to 2.x? Take a look at our dedicated Migration Guide.

"},{"location":"#why-use-argilla","title":"Why use Argilla?","text":"

Argilla can be used for collecting human feedback for a wide variety of AI projects like traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text to image, etc.).

Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

Improve your AI output quality through data quality

Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both of these problems at once. Argilla helps you to achieve and keep high-quality standards for your data. This means you can improve the quality of your AI outputs.

Take control of your data and models

Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

Improve efficiency by quickly iterating on the right data and models

Gathering data is a time-consuming process. Argilla helps by providing a tool that allows you to interact with your data in a more engaging way. This means you can quickly and easily label your data with filters, AI feedback suggestions, and semantic search, so you can focus on training your models and monitoring their performance.

"},{"location":"#what-do-people-build-with-argilla","title":"What do people build with Argilla?","text":"

Datasets and models

Argilla is a tool that can be used to achieve and keep high-quality data standards with a focus on NLP and LLMs. The community uses Argilla to create amazing open-source datasets and models, and we love contributions to open-source too.

  • cleaned UltraFeedback dataset and the Notus and Notux models, where we improved benchmark results and empirical human judgment for the Mistral and Mixtral models with cleaner data curated through human feedback.
  • distilabeled Intel Orca DPO dataset and the improved OpenHermes model, which show how we improved model performance by filtering out 50% of the original dataset through human and AI feedback.

Projects and pipelines

AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in the AI community meetup.

  • AI for good: the Red Cross presentation showcases how their experts and AI team collaborate by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
  • Customer support: during the Loris meetup they showed how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a huge number of multi-label classifiers.
  • Research studies: the showcase from Prolific announced their integration with Argilla. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.
"},{"location":"community/","title":"Community","text":"

We are an open-source community-driven project not only focused on building a great product but also on building a great community, where you can get support, share your experiences, and contribute to the project! We would love to hear from you and help you get started with Argilla.

  • Discord

    In our Discord channels (#argilla-distilabel-general and #argilla-distilabel-help), you can get direct support from the community.

    Discord \u2197

  • Community Meetup

    We host bi-weekly community meetups where you can listen in or present your work.

    Community Meetup \u2197

  • Changelog

    The changelog is where you can find the latest updates and changes to the Argilla project.

    Changelog \u2197

  • Roadmap

    We love to discuss our plans with the community. Feel encouraged to participate in our roadmap discussions.

    Roadmap \u2197

"},{"location":"community/changelog/","title":"Changelog","text":"

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

"},{"location":"community/changelog/#unreleased","title":"Unreleased","text":""},{"location":"community/changelog/#230","title":"2.3.0","text":""},{"location":"community/changelog/#added","title":"Added","text":"
  • Added support for CustomField. (#5422)
  • Added inserted_at and updated_at to Resource model as properties. (#5540)
  • Added limit argument when fetching records. (#5525)
  • Added similarity search support. (#5546)
  • Added filter support for id, _server_id, inserted_at and updated_at record attributes. (#5545)
  • Added support to read argilla credentials from colab secrets. (#5541)
"},{"location":"community/changelog/#changed","title":"Changed","text":"
  • Changed the repr method for SettingsProperties to display the details of all the properties in the Settings object. (#5380)
  • Changed error messages when creating datasets with insufficient permissions. (#5540)
"},{"location":"community/changelog/#fixed","title":"Fixed","text":"
  • Fixed serialization of ChatField when collecting records from the hub and exporting to datasets. (#5554)
"},{"location":"community/changelog/#222","title":"2.2.2","text":""},{"location":"community/changelog/#fixed_1","title":"Fixed","text":"
  • Fixed from_hub with unsupported column names. (#5524)
  • Fixed from_hub with missing dataset subset configuration value. (#5524)
"},{"location":"community/changelog/#changed_1","title":"Changed","text":"
  • Changed from_hub to only generate fields not questions for strings in dataset. (#5524)
"},{"location":"community/changelog/#221","title":"2.2.1","text":""},{"location":"community/changelog/#fixed_2","title":"Fixed","text":"
  • Fixed from_hub errors when columns names contain uppercase letters. (#5523)
  • Fixed from_hub errors when class feature values contains unlabelled values. (#5523)
  • Fixed from_hub errors when loading cached datasets. (#5523)
"},{"location":"community/changelog/#220","title":"2.2.0","text":"
  • Added new ChatField supporting chat messages. (#5376)
  • Added template settings to rg.Settings for classification, rating, and ranking questions. (#5426)
  • Added rg.Settings definition based on datasets.Features within rg.Dataset.from_hub. (#5426)
  • Added persistent record mapping to rg.Settings to be used in rg.Dataset.records.log. (#5466)
  • Added multiple error handling methods to the rg.Dataset.records.log method to warn, ignore, or raise errors. (#5466)
  • Changed dataset import and export of rg.LabelQuestion to use datasets.ClassLabel not datasets.Value. (#5474)
"},{"location":"community/changelog/#210","title":"2.1.0","text":""},{"location":"community/changelog/#added_1","title":"Added","text":"
  • Added new ImageField supporting URLs and Data URLs. (#5279)
  • Added dark mode (#5412)
  • Added settings parameter to rg.Dataset.from_hub to define the dataset settings before ingesting a dataset from the hub. (#5418)
"},{"location":"community/changelog/#201","title":"2.0.1","text":""},{"location":"community/changelog/#fixed_3","title":"Fixed","text":"
  • Fixed error when creating optional fields. (#5362)
  • Fixed error creating integer and float metadata with visible_for_annotators. (#5364)
  • Fixed error when logging records with suggestions or responses for non-existent questions. (#5396 by @maxserras)
  • Fixed error from conflicts in testing suite when running tests in parallel. (#5349)
  • Fixed error in response model when creating a response with a None value. (#5343)
"},{"location":"community/changelog/#changed_2","title":"Changed","text":"
  • Changed from_hub method to raise an error when a dataset with the same name exists. (#5258)
  • Changed log method when ingesting records with no known keys to raise a descriptive error. (#5356)
  • Changed code snippets to add new datasets (#5395)
"},{"location":"community/changelog/#added_2","title":"Added","text":"
  • Added Google Analytics to the documentation site. (#5366)
  • Added frontend skeletons to progress metrics to optimise load time and improve user experience. (#5391)
  • Added documentation in methods in API references for the Python SDK. (#5400)
"},{"location":"community/changelog/#fixed_4","title":"Fixed","text":"
  • Fixed a bug where submitting the latest record sometimes navigated to a non-existing page. (#5419)
"},{"location":"community/changelog/#200","title":"2.0.0","text":""},{"location":"community/changelog/#added_3","title":"Added","text":"
  • Added core class refactors. For an overview, see this blog post
  • Added TaskDistribution to define the distribution of records to users.
  • Added new documentation site and structure and migrated legacy documentation.
"},{"location":"community/changelog/#changed_3","title":"Changed","text":"
  • Changed FeedbackDataset to Dataset.
  • Changed rg.init into rg.Argilla class to interact with Argilla server.
"},{"location":"community/changelog/#deprecated","title":"Deprecated","text":"
  • Deprecated task specific dataset classes like TextClassification and TokenClassification. To migrate legacy datasets to rg.Dataset class, see the how-to-guide.
  • Deprecated use case extensions like listeners and ArgillaTrainer.
"},{"location":"community/changelog/#200rc1","title":"2.0.0rc1","text":"

[!NOTE] This release for 2.0.0rc1 does not contain any changelog entries because it is the first release candidate for the 2.0.0 version. The following versions will contain the changelog entries again. For a general overview of the changes in the 2.0.0 version, please refer to our blog or our new documentation.

"},{"location":"community/changelog/#1290","title":"1.29.0","text":""},{"location":"community/changelog/#added_4","title":"Added","text":"
  • Added support for rating questions to include 0 as a valid value. (#4860)
  • Added support for Python 3.12. (#4837)
  • Added search by field in the FeedbackDataset UI search. (#4746)
  • Added record metadata info in the FeedbackDataset UI. (#4851)
  • Added highlight on search results in the FeedbackDataset UI. (#4747)
"},{"location":"community/changelog/#fixed_5","title":"Fixed","text":"
  • Fixed wildcard import for the whole argilla module. (#4874)
  • Fixed issue when a record does not have related vectors. (#4856)
  • Fixed issue on character level. (#4836)
"},{"location":"community/changelog/#1280","title":"1.28.0","text":""},{"location":"community/changelog/#added_5","title":"Added","text":"
  • Added suggestion multi score attribute. (#4730)
  • Added order by suggestion first. (#4731)
  • Added multi selection entity dropdown for span annotation overlap. (#4735)
  • Added pre selection highlight for span annotation. (#4726)
  • Added banner when persistent storage is not enabled. (#4744)
  • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)
"},{"location":"community/changelog/#changed_4","title":"Changed","text":"
  • Changed the way the Hugging Face Space and user are shown on sign-in. (#4748)
"},{"location":"community/changelog/#fixed_6","title":"Fixed","text":"
  • Fixed reversed Korean characters. (#4753)
"},{"location":"community/changelog/#fixed_7","title":"Fixed","text":"
  • Fixed the wrapt library version requirement conflicting with Python 3.11. (#4693)
"},{"location":"community/changelog/#1270","title":"1.27.0","text":""},{"location":"community/changelog/#added_6","title":"Added","text":"
  • Added support for overlapping spans in the FeedbackDataset. (#4668)
  • Added allow_overlapping parameter for span questions. (#4697)
  • Added overall progress bar on Datasets table. (#4696)
  • Added German language translation. (#4688)
"},{"location":"community/changelog/#changed_5","title":"Changed","text":"
  • New UI design for suggestions. (#4682)
"},{"location":"community/changelog/#fixed_8","title":"Fixed","text":"
  • Improved performance for more than 250 labels. (#4702)
"},{"location":"community/changelog/#1261","title":"1.26.1","text":""},{"location":"community/changelog/#added_7","title":"Added","text":"
  • Added support for automatic detection of RTL languages. (#4686)
"},{"location":"community/changelog/#1260","title":"1.26.0","text":""},{"location":"community/changelog/#added_8","title":"Added","text":"
  • If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
  • Added support for span questions in the Python SDK. (#4617)
  • Added support for span values in suggestions and responses. (#4623)
  • Added span questions for FeedbackDataset. (#4622)
  • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
"},{"location":"community/changelog/#fixed_9","title":"Fixed","text":"
  • Fixed contextualized workspaces. (#4665)
  • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
  • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
  • Fixed reading description from API response payload. (#4632)
  • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
  • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)
"},{"location":"community/changelog/#1250","title":"1.25.0","text":"

[!NOTE] For changes in the argilla-server module, visit the argilla-server release notes

"},{"location":"community/changelog/#added_9","title":"Added","text":"
  • Reorder labels in dataset settings page for single/multi label questions (#4598)
  • Added pandas v2 support using the python SDK. (#4600)
"},{"location":"community/changelog/#removed","title":"Removed","text":"
  • Removed missing response for status filter. Use pending instead. (#4533)
"},{"location":"community/changelog/#fixed_10","title":"Fixed","text":"
  • Fixed FloatMetadataProperty: value is not a valid float (#4570)
  • Fixed redirect to user-settings instead of 404 user_settings (#4609)
"},{"location":"community/changelog/#1240","title":"1.24.0","text":"

[!NOTE] This release does not contain any new features, but it includes a major change in the argilla-server dependency. The package is using the argilla-server dependency defined here. (#4537)

"},{"location":"community/changelog/#changed_6","title":"Changed","text":"
  • The package is using the argilla-server dependency defined here. (#4537)
"},{"location":"community/changelog/#1231","title":"1.23.1","text":""},{"location":"community/changelog/#fixed_11","title":"Fixed","text":"
  • Fixed Responsive view for Feedback Datasets. (#4579)
"},{"location":"community/changelog/#1230","title":"1.23.0","text":""},{"location":"community/changelog/#added_10","title":"Added","text":"
  • Added bulk annotation by filter criteria. (#4516)
  • Automatically fetch new datasets on focus tab. (#4514)
  • API v1 responses returning Record schema now always include dataset_id as attribute. (#4482)
  • API v1 responses returning Response schema now always include record_id as attribute. (#4482)
  • API v1 responses returning Question schema now always include dataset_id attribute. (#4487)
  • API v1 responses returning Field schema now always include dataset_id attribute. (#4488)
  • API v1 responses returning MetadataProperty schema now always include dataset_id attribute. (#4489)
  • API v1 responses returning VectorSettings schema now always include dataset_id attribute. (#4490)
  • Added pdf_to_html function to the .html_utils module that converts PDFs to data URLs so they can be rendered in the Argilla UI (see the sketch after this list). (#4481)
  • Added ARGILLA_AUTH_SECRET_KEY environment variable. (#4539)
  • Added ARGILLA_AUTH_ALGORITHM environment variable. (#4539)
  • Added ARGILLA_AUTH_TOKEN_EXPIRATION environment variable. (#4539)
  • Added ARGILLA_AUTH_OAUTH_CFG environment variable. (#4546)
  • Added OAuth2 support for HuggingFace Hub. (#4546)
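A minimal sketch of how pdf_to_html could be used to render a PDF inside a record; the import path (assumed to be re-exported from argilla.client.feedback.utils) and the file path are assumptions:

```python
import argilla as rg
from argilla.client.feedback.utils import pdf_to_html  # assumed import path

# Sketch: convert a PDF to a data-URL-based HTML snippet (#4481) and place it
# in a field that is rendered with use_markdown=True.
html = pdf_to_html("reports/summary.pdf")  # illustrative path
record = rg.FeedbackRecord(fields={"document": html})
```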
"},{"location":"community/changelog/#deprecated_1","title":"Deprecated","text":"
  • Deprecated ARGILLA_LOCAL_AUTH_* environment variables. Will be removed in the release v1.25.0. (#4539)
"},{"location":"community/changelog/#changed_7","title":"Changed","text":"
  • Changed regex pattern for username attribute in UserCreate. Now uppercase letters are allowed. (#4544)
"},{"location":"community/changelog/#removed_1","title":"Removed","text":"
  • Remove sending Authorization header from python SDK requests. (#4535)
"},{"location":"community/changelog/#fixed_12","title":"Fixed","text":"
  • Fixed keyboard shortcut for label questions. (#4530)
"},{"location":"community/changelog/#1220","title":"1.22.0","text":""},{"location":"community/changelog/#added_11","title":"Added","text":"
  • Added Bulk annotation support. (#4333)
  • Restore filters from feedback dataset settings. (#4461)
  • Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
  • Added pydantic v2 support using the python SDK. (#4459)
  • Added vector_settings to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4454)
  • Added integration for sentence-transformers using SentenceTransformersExtractor to configure vector_settings in FeedbackDataset and FeedbackRecord. (#4454)
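A minimal sketch of the sentence-transformers integration; the import path, model name, and update_dataset method are assumptions based on the entry above:

```python
import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import (
    SentenceTransformersExtractor,  # assumed import path
)

# Sketch: configure vector_settings and compute vectors for a dataset (#4454).
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="answer")],
)
extractor = SentenceTransformersExtractor(model="all-MiniLM-L6-v2")  # illustrative model
dataset = extractor.update_dataset(dataset)  # assumed method: adds vector settings and vectors
```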
"},{"location":"community/changelog/#changed_8","title":"Changed","text":"
  • Module argilla.cli.server definitions have been moved to argilla.server.cli module. (#4472)
  • [breaking] Changed vector_settings_by_name for generic property_by_name usage, which will return None instead of raising an error. (#4454)
  • The constant definition ES_INDEX_REGEX_PATTERN in module argilla._constants is now private. (#4472)
  • nan values in metadata properties will raise a 422 error when creating/updating records. (#4300)
  • None values are now allowed in metadata properties. (#4300)
  • Refactor and add width, height, autoplay and loop attributes as optional args in to_html functions. (#4481)
"},{"location":"community/changelog/#fixed_13","title":"Fixed","text":"
  • Paginating to a new record, automatically scrolls down to selected form area. (#4333)
"},{"location":"community/changelog/#deprecated_2","title":"Deprecated","text":"
  • The missing response status for filtering records is deprecated and will be removed in the release v1.24.0. Use pending instead. (#4433)
"},{"location":"community/changelog/#removed_2","title":"Removed","text":"
  • The deprecated python -m argilla database command has been removed. (#4472)
"},{"location":"community/changelog/#1210","title":"1.21.0","text":""},{"location":"community/changelog/#added_12","title":"Added","text":"
  • Added new draft queue for annotation view (#4334)
  • Added annotation metrics module for the FeedbackDataset (argilla.client.feedback.metrics). (#4175).
  • Added strategy to handle and translate errors from the server for the 401 HTTP status code. (#4362)
  • Added integration for textdescriptives using TextDescriptivesExtractor to configure metadata_properties in FeedbackDataset and FeedbackRecord (see the sketch after this list). (#4400). Contributed by @m-newhauser
  • Added POST /api/v1/me/responses/bulk endpoint to create responses in bulk for current user. (#4380)
  • Added list support for term metadata properties. (Closes #4359)
  • Added new CLI task to reindex datasets and records into the search engine. (#4404)
  • Added httpx_extra_kwargs argument to rg.init and Argilla to allow passing extra arguments to httpx.Client used by Argilla. (#4440)
  • Added ResponseStatusFilter enum in __init__ imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
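A minimal sketch of the textdescriptives integration; the import path and update_records method are assumptions based on the entry above:

```python
import argilla as rg
from argilla.client.feedback.integrations.textdescriptives import (
    TextDescriptivesExtractor,  # assumed import path
)

# Sketch: enrich records with text metrics as metadata (#4400).
records = [rg.FeedbackRecord(fields={"text": "Argilla is a data annotation tool."})]
tde = TextDescriptivesExtractor(model="en")  # illustrative language setting
records = tde.update_records(records)        # assumed method: adds metadata to each record
```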
"},{"location":"community/changelog/#changed_9","title":"Changed","text":"
  • More productive and simpler shortcut system (#4215)
  • Move ArgillaSingleton, init and active_client to a new module singleton. (#4347)
  • Updated argilla.load functions to also work with FeedbackDatasets. (#4347)
  • [breaking] Updated argilla.delete functions to also work with FeedbackDatasets. It now raises an error if the dataset does not exist. (#4347)
  • Updated argilla.list_datasets functions to also work with FeedbackDatasets. (#4347)
"},{"location":"community/changelog/#fixed_14","title":"Fixed","text":"
  • Fixed error in TextClassificationSettings.from_dict method in which the label_schema created was a list of dict instead of a list of str. (#4347)
  • Fixed total records on pagination component (#4424)
"},{"location":"community/changelog/#removed_3","title":"Removed","text":"
  • Removed draft auto save for annotation view (#4334)
"},{"location":"community/changelog/#1200","title":"1.20.0","text":""},{"location":"community/changelog/#added_13","title":"Added","text":"
  • Added GET /api/v1/datasets/:dataset_id/records/search/suggestions/options endpoint to return suggestion available options for searching. (#4260)
  • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4192).
  • Added get_model_kwargs, get_trainer_kwargs, get_trainer_model, get_trainer_tokenizer and get_trainer -methods to the ArgillaTrainer to improve interoperability across frameworks. (#4214).
  • Added additional formatting checks to the ArgillaTrainer to allow for better interoperability of defaults and formatting_func usage. (#4214).
  • Added a warning to the update_config-method of ArgillaTrainer to emphasize if the kwargs were updated correctly. (#4214).
  • Added argilla.client.feedback.utils module with html_utils and assignments. html_utils mainly includes video/audio/image_to_html, which convert media to data URLs so they can be rendered in the Argilla UI, and create_token_highlights to highlight tokens in a custom way; both work on TextQuestion and TextField with use_markdown=True. assignments mainly includes assign_records, to assign records according to a number of annotators and records, an overlap, and the shuffle option, and assign_workspace, to assign (and create if needed) a workspace according to the record assignment. (#4121)
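A minimal sketch of the assignments helpers described above; the exact signature of assign_records is an assumption, and the user names and records are stand-ins:

```python
from argilla.client.feedback.utils import assign_records  # assumed import path

# Sketch: split records across annotators with one level of overlap (#4121).
users = ["alice", "bob", "carol"]            # illustrative user names
records = [f"record-{i}" for i in range(9)]  # stand-ins for FeedbackRecord objects
assignments = assign_records(users=users, records=records, overlap=1, shuffle=True)
# `assignments` is expected to map each user to their share of the records.
```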
"},{"location":"community/changelog/#fixed_15","title":"Fixed","text":"
  • Fixed error in ArgillaTrainer, with numerical labels, using RatingQuestion instead of RankingQuestion (#4171)
  • Fixed error in ArgillaTrainer, now we can train for extractive_question_answering using a validation sample (#4204)
  • Fixed error in ArgillaTrainer, when training for sentence-similarity it didn't work with a list of values per record (#4211)
  • Fixed error in the unification strategy for RankingQuestion (#4295)
  • Fixed TextClassificationSettings.labels_schema order was not being preserved. Closes #3828 (#4332)
  • Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
  • Fixed error when passing draft responses to create records endpoint. (#4354)
"},{"location":"community/changelog/#changed_10","title":"Changed","text":"
  • [breaking] The suggestions agent field now only accepts some specific characters and a limited length. (#4265)
  • [breaking] The suggestions score field now only accepts float values in the range 0 to 1. (#4266)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support optional query attribute. (#4327)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support filter and sort attributes. (#4327)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support optional query attribute. (#4270)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support filter and sort attributes. (#4270)
  • Changed the logging style while pulling and pushing FeedbackDataset to Argilla from tqdm style to rich. (#4267). Contributed by @zucchini-nlp.
  • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)
  • Changed models and tokenizer for the ArgillaTrainer to explicitly allow for changing them when needed. (#4214).
"},{"location":"community/changelog/#1190","title":"1.19.0","text":""},{"location":"community/changelog/#added_14","title":"Added","text":"
  • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
  • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
  • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
  • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
  • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
  • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
  • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
  • Added show_progress argument to the from_huggingface() method to make the progress bar for the record-parsing process optional. (#4132)
  • Added a progress bar for the record-parsing process to the from_huggingface() method using trange from tqdm. (#4132)
  • Added support to sort by inserted_at or updated_at for datasets with no metadata. (#4147)
  • Added max_records argument to the pull() method for RemoteFeedbackDataset. (#4074)
  • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
  • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
  • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
  • Added max_records argument to ArgillaTrainer to limit the records used for training (#4120).
  • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
  • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
  • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
  • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
  • Added find_similar_records method to local and remote FeedbackDataset (see the sketch after this list). (#4023)
  • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)
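A minimal sketch combining several of the additions above: adding vector settings to a remote dataset and running a similarity search. Dataset, workspace, and vector names are illustrative, and indexing into remote.records is an assumption:

```python
import argilla as rg

# Sketch: vector settings plus find_similar_records (#4055, #4023).
remote = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")
remote.add_vector_settings(rg.VectorSettings(name="sentence-embedding", dimensions=384))
similar = remote.find_similar_records(
    vector_name="sentence-embedding",
    record=remote.records[0],  # find records similar to the first one
    max_results=5,
)
```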
"},{"location":"community/changelog/#changed_11","title":"Changed","text":"
  • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
  • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
  • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
  • [breaking] The limit query parameter for the GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • [breaking] The limit query parameter for the GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • Update GET /api/v1/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Update GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Update POST /api/v1/datasets/:dataset_id/records endpoint to allow creating records with vectors. (#4022)
  • Update PATCH /api/v1/datasets/:dataset_id endpoint to allow updating the allow_extra_metadata attribute. (#4112)
  • Update PATCH /api/v1/datasets/:dataset_id/records endpoint to allow updating records with vectors. (#4062)
  • Update PATCH /api/v1/records/:record_id endpoint to allow updating a record with vectors. (#4062)
  • Update POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow searching records with vectors. (#4019)
  • Update BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
  • Update FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
  • Update FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
  • Update FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
  • Using cosine similarity to compute similarity between vectors. (#4124)
"},{"location":"community/changelog/#fixed_16","title":"Fixed","text":"
  • Fixed svg images out of screen with too large images (#4047)
  • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
  • Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
  • Fixed passing user_id when getting records by id. (Commit 98c7927)
  • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
"},{"location":"community/changelog/#1180","title":"1.18.0","text":""},{"location":"community/changelog/#added_15","title":"Added","text":"
  • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
  • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
  • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
  • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
  • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
  • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
  • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
  • Missing validations to PATCH /api/v1/questions/:question_id. Now title and description are using the same validations used to create questions. (#3967)
  • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
  • Added metadata_filters to the filter_by method in RemoteFeedbackDataset to filter based on metadata, i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter (see the sketch after this list). (#3834)
  • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
  • Added sort_by query parameter to the record-listing endpoints that allows sorting the records by inserted_at, updated_at or a metadata property. (#3843)
  • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
  • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
  • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
  • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
  • Added support for updating records (metadata) from the Python SDK. (#3946)
  • Added delete_metadata_properties method to delete metadata properties. (#3932)
  • Added update_metadata_properties method to update metadata_properties. (#3961)
  • Added automatic model card generation through ArgillaTrainer.save (#3857)
  • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
  • A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
  • New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)
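A minimal sketch of metadata properties and filters in use; the dataset layout, names, and top-level exports are illustrative:

```python
import argilla as rg

# Sketch: declare metadata properties (#3818) and filter a remote dataset by them (#3834).
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="answer")],
    metadata_properties=[
        rg.TermsMetadataProperty(name="source", values=["news", "wiki"]),
        rg.IntegerMetadataProperty(name="n_tokens", min=0, max=512),
    ],
)
remote = dataset.push_to_argilla(name="demo", workspace="my-workspace")
filtered = remote.filter_by(
    metadata_filters=rg.TermsMetadataFilter(name="source", values=["news"])
)
```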
"},{"location":"community/changelog/#changed_12","title":"Changed","text":"
  • GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
  • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
  • Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
  • Force elastic index refresh after records creation. (#3929)
  • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
  • Using metadata property name instead of id for indexing data in search engine index. (#3994)
"},{"location":"community/changelog/#fixed_17","title":"Fixed","text":"
  • Fixed response schemas to allow values to be None i.e. when a record is discarded the response.values are set to None. (#3926)
"},{"location":"community/changelog/#1170","title":"1.17.0","text":""},{"location":"community/changelog/#added_16","title":"Added","text":"
  • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
  • Added automatic model card generation through ArgillaTrainer.save (#3857).
  • Added task templates to the FeedbackDataset (#3973).
"},{"location":"community/changelog/#changed_13","title":"Changed","text":"
  • Updated Dockerfile to use a multi-stage build (#3221 and #3793).
  • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
  • Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
  • FeedbackDataset API methods have been aligned to be accessible across the several implementations (#3937).
  • unify_responses is now supported for remote datasets (#3937).
"},{"location":"community/changelog/#fixed_18","title":"Fixed","text":"
  • Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
  • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
  • Fixed record fields validation that was preventing logging records with optional fields (i.e. required=False) when the field value was None (#3846).
  • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
  • The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945)
  • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
  • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
  • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
  • Fixed wrong __repr__ problem for TrainingTask. (#3969)
  • Fixed wrong key return error prepare_for_training_with_* for TrainingTask. (#3969)
"},{"location":"community/changelog/#deprecated_3","title":"Deprecated","text":"
  • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0
"},{"location":"community/changelog/#1160","title":"1.16.0","text":""},{"location":"community/changelog/#added_17","title":"Added","text":"
  • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
  • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
  • Added auto-save to automatically save the current record that you are working on (#3541)
  • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
  • Added workspaces list command to list Argilla workspaces (#3594).
  • Added datasets list command to list Argilla datasets (#3658).
  • Added users create command to create users (#3667).
  • Added whoami command to get current user (#3673).
  • Added users delete command to delete users (#3671).
  • Added users list command to list users (#3688).
  • Added workspaces delete-user command to remove a user from a workspace (#3699).
  • Added workspaces create command to create an Argilla workspace (#3676).
  • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
  • Added info command to get info about the used Argilla client and server (#3707).
  • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
  • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
  • Added handling of PermissionError when executing a command with a logged-in user without enough permissions (#3717).
  • Added workspaces add-user command to add a user to workspace (#3712).
  • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
  • Added workspace_id arg to list_datasets in the Python SDK (#3727).
  • Added argilla script that allows executing the Argilla CLI using the argilla command (#3730).
  • Added support for passing already initialized model and tokenizer instances to the ArgillaTrainer (#3751)
  • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).
"},{"location":"community/changelog/#changed_14","title":"Changed","text":"
  • Moved database commands under the server group of commands (#3710)
  • server commands only included in the CLI app when server extra requirements are installed (#3710).
  • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
  • Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
  • Renamed tasks sub-package to cli (#3723).
  • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
  • Changed visible_options (of label and multi-label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
"},{"location":"community/changelog/#fixed_19","title":"Fixed","text":"
  • Fixed removing user modifications in the text component on clear answers (#3775)
  • Fixed Highlight raw text field in dataset feedback task (#3731)
  • Fixed Field title too long (#3734)
  • Fixed error messages when deleting a DatasetForTextClassification (#3652)
  • Fixed Pending queue pagination problems during data annotation (#3677)
  • Fixed visible_labels default value to be 20 only when visible_labels is not provided and len(labels) > 20; otherwise it will be either the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
  • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
  • Added missing draft status in ResponseSchema, as there can now be responses with draft status when annotating via the UI (#3749).
  • Fixed searches when queried words are distributed across the record fields (#3759).
  • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
  • Fixed RankingValueSchema and FeedbackRankingValueModel schemas to allow rank=None when status=draft (#3781).
"},{"location":"community/changelog/#1151","title":"1.15.1","text":""},{"location":"community/changelog/#fixed_20","title":"Fixed","text":"
  • Fixed the Text component text sanitization behavior, just for markdown, to prevent the text from disappearing (#3738)
  • Fixed the Text component so that you now need to press Escape to exit the text area (#3733)
  • Fixed SearchEngine was creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).
"},{"location":"community/changelog/#1150","title":"1.15.0","text":""},{"location":"community/changelog/#added_18","title":"Added","text":"
  • Added the ability to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
  • Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
  • Added formatting_func to ArgillaTrainer for FeedbackDataset datasets to add custom formatting for the data (#3599).
  • Added login function in argilla.client.login to log in to an Argilla server and store the credentials locally (#3582).
  • Added login command to log in to an Argilla server (#3600).
  • Added logout command to log out from an Argilla server (#3605).
  • Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
  • Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
  • Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
  • Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
  • Added filter_by method in RemoteFeedbackDataset to filter based on response_status (see the sketch after this list). (#3610)
  • Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces from a user in Argilla (#3641).
  • Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
  • Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
  • Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).
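A minimal sketch of filter_by on a remote dataset (dataset and workspace names are illustrative):

```python
import argilla as rg

# Sketch: keep only records with submitted responses (#3610).
remote = rg.FeedbackDataset.from_argilla(name="demo", workspace="my-workspace")
submitted = remote.filter_by(response_status="submitted")
```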
"},{"location":"community/changelog/#changed_15","title":"Changed","text":"
  • Changed the Optional label to a * mark for required questions (#3608)
  • Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
  • Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
  • Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
  • Moved users CLI app under the database CLI app (#3593).
  • Moved server Enum classes to the argilla.server.enums module (#3620).
"},{"location":"community/changelog/#fixed_21","title":"Fixed","text":"
  • Fixed Filter by workspace in breadcrumbs (#3577)
  • Fixed Filter by workspace in datasets table (#3604)
  • Fixed Query search highlight for Text2Text and TextClassification (#3621)
  • Fixed RatingQuestion.values validation to raise a ValidationError when values are out of range i.e. [1, 10] (#3626).
"},{"location":"community/changelog/#removed_4","title":"Removed","text":"
  • Removed multi_task_text_token_classification from TaskType as not used (#3640).
  • Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
  • Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
  • Removed push_to_argilla from RemoteFeedbackDataset, as it only works when called on a local FeedbackDataset; updates to remote datasets are now automatically pushed to Argilla (#3663).
  • Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any \"updateable\" attribute of a record will go through update instead (#3663).
  • Remove unused owner attribute for client Dataset data model (#3665)
"},{"location":"community/changelog/#1141","title":"1.14.1","text":""},{"location":"community/changelog/#fixed_22","title":"Fixed","text":"
  • Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
"},{"location":"community/changelog/#fixed_23","title":"Fixed","text":"
  • Fixed settings could not be provided when updating a rating or ranking question (#3552).
"},{"location":"community/changelog/#1140","title":"1.14.0","text":""},{"location":"community/changelog/#added_19","title":"Added","text":"
  • Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
  • Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
  • Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
  • Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
  • Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return them as a local copy, a FeedbackDataset (see the sketch after this list). (#3465)
  • Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
  • Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
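A minimal sketch of the new pull, delete_records, and delete methods on a remote dataset (names are illustrative, and passing the full record list to delete_records is an assumption):

```python
import argilla as rg

# Sketch: pull a local copy, then delete records and the dataset itself
# (#3465, #3526, #3512).
remote = rg.FeedbackDataset.from_argilla(name="demo", workspace="my-workspace")
local = remote.pull()                        # local FeedbackDataset copy
remote.delete_records(list(remote.records))  # bulk-delete the remote records
remote.delete()                              # remove the dataset from Argilla
```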
"},{"location":"community/changelog/#changed_16","title":"Changed","text":"
  • Improved efficiency of weak labeling when dataset contains vectors (#3444).
  • Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
  • Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
  • Update CLI to use database async connection (#3450).
  • Limit rating questions values to the positive range [1, 10] (#3451).
  • Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
  • Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
  • Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
  • After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
  • Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via 🤗 Datasets (#3539).
"},{"location":"community/changelog/#fixed_24","title":"Fixed","text":"
  • Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
  • Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
  • Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
  • TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
  • Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
  • Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
  • Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
"},{"location":"community/changelog/#deprecated_4","title":"Deprecated","text":"
  • After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
"},{"location":"community/changelog/#1133","title":"1.13.3","text":""},{"location":"community/changelog/#fixed_25","title":"Fixed","text":"
  • Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
  • Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).
"},{"location":"community/changelog/#1132","title":"1.13.2","text":""},{"location":"community/changelog/#fixed_26","title":"Fixed","text":"
  • The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).
"},{"location":"community/changelog/#1131","title":"1.13.1","text":""},{"location":"community/changelog/#fixed_27","title":"Fixed","text":"
  • Fix database migration for PostgreSQL (See #3438)
"},{"location":"community/changelog/#1130","title":"1.13.0","text":""},{"location":"community/changelog/#added_20","title":"Added","text":"
  • Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
  • Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
  • Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
  • Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
  • Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
  • Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
  • Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
  • Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
  • Added API and Python Client support for workspace deletion (Closes #3260)
  • Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
"},{"location":"community/changelog/#changed_17","title":"Changed","text":"
  • Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
  • Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
  • The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
  • Added Telemetry support for ArgillaTrainer (closes #3325)
  • User.workspaces is no longer an attribute but a property, and is calling list_user_workspaces to list all the workspace names for a given user ID (#3334)
  • Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
  • The protected metadata fields support more than textual info; existing datasets must be reindexed. See docs for more detail (Closes #3332).
  • Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
  • Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
"},{"location":"community/changelog/#removed_5","title":"Removed","text":"
  • Removed support to non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
"},{"location":"community/changelog/#fixed_28","title":"Fixed","text":"
  • Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records even if responses was not provided via the include query parameter (#3304).
  • Values for protected metadata fields are not truncated (Closes #3331).
  • Big-number IDs are properly rendered in the UI (Closes #3265)
  • Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)
"},{"location":"community/changelog/#deprecated_5","title":"Deprecated","text":"
  • Integer support for record id in text classification, token classification and text2text datasets.
"},{"location":"community/changelog/#1121","title":"1.12.1","text":""},{"location":"community/changelog/#fixed_29","title":"Fixed","text":"
  • Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
  • Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
  • Pin pydantic dependency to version < 2 (Closes #3348)
"},{"location":"community/changelog/#1120","title":"1.12.0","text":""},{"location":"community/changelog/#added_21","title":"Added","text":"
  • Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
  • Added RankingQuestion in the Python client to create ranking questions (#3275).
  • Added Ranking component in feedback task question form (#3177 & #3246).
  • Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
  • Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
"},{"location":"community/changelog/#docs","title":"Docs","text":"
  • Added instructions for how to run the Argilla frontend in the developer docs (#3314).
"},{"location":"community/changelog/#changed_18","title":"Changed","text":"
  • All docker related files have been moved into the docker folder (#3053).
  • release.Dockerfile have been renamed to Dockerfile (#3133).
  • Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
  • Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
"},{"location":"community/changelog/#fixed_30","title":"Fixed","text":"
  • Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
"},{"location":"community/changelog/#1110","title":"1.11.0","text":""},{"location":"community/changelog/#fixed_31","title":"Fixed","text":"
  • Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
  • Fixed format_as(\"datasets\") when there are no responses or optional responses in FeedbackRecord, to set their value to what 🤗 Datasets expects instead of just None (#3224).
  • Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
  • Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
  • Refactored usage of import argilla as rg to clarify package navigation (#3279).
"},{"location":"community/changelog/#docs_1","title":"Docs","text":"
  • Fixed URLs in Weak Supervision with Sentence Transformers tutorial (#3243).
  • Fixed library buttons' formatting on Tutorials page (#3255).
  • Modified styling of error code outputs in notebooks (#3270).
  • Added ElasticSearch and OpenSearch versions (#3280).
  • Removed template notebook from table of contents (#3271).
  • Fixed tutorials with pip install argilla to not use older versions of the package (#3282).
"},{"location":"community/changelog/#added_22","title":"Added","text":"
  • Added metadata attribute to the Record of the FeedbackDataset (#3194)
  • New users update command to update the role for an existing user (#3188)
  • New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
  • Added User class to let users manage their Argilla users via the Python client (see the sketch after this list) (#3169).
  • Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
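A minimal sketch of the new User and Workspace classes (URL, credentials, and names are illustrative):

```python
import argilla as rg

# Sketch: manage users and workspaces from the Python client (#3169, #3180).
rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # illustrative credentials
user = rg.User.create(username="annotator-1", password="12345678", role="annotator")
workspace = rg.Workspace.create(name="my-workspace")
workspace.add_user(user.id)
```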
"},{"location":"community/changelog/#changed_19","title":"Changed","text":"
  • The role system now supports three different roles: owner, admin and annotator (#3104)
  • admin role is scoped to workspace-level operations (#3115)
  • The owner user is created among the default pool of users in the quickstart, and the default user in the server now has the owner role (#3248), reverting (#3188).
"},{"location":"community/changelog/#deprecated_6","title":"Deprecated","text":"
  • As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
"},{"location":"community/changelog/#1100","title":"1.10.0","text":""},{"location":"community/changelog/#added_23","title":"Added","text":"
  • Added search component for feedback datasets (#3138)
  • Added markdown support for feedback dataset guidelines (#3153)
  • Added Train button for feedback datasets (#3170)
"},{"location":"community/changelog/#changed_20","title":"Changed","text":"
  • Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
"},{"location":"community/changelog/#fixed_32","title":"Fixed","text":"
  • Replaced Enum with string values in URLs for client API calls (Closes #3149)
  • Resolved breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
  • Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
  • Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
"},{"location":"community/changelog/#docs_2","title":"Docs","text":"
  • Resolved typos in the docs (#3240).
  • Fixed mention of master branch (#3254).
"},{"location":"community/changelog/#190","title":"1.9.0","text":""},{"location":"community/changelog/#added_24","title":"Added","text":"
  • Added boolean use_markdown property to TextFieldSettings model.
  • Added boolean use_markdown property to TextQuestionSettings model.
  • Added new status draft for the Response model.
  • Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005)
  • Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
  • Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
  • Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
  • Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
  • Added information about executing tests in the developer documentation (#3143).
"},{"location":"community/changelog/#changed_21","title":"Changed","text":"
  • Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
  • Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
  • Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
  • Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)
"},{"location":"community/changelog/#fixed_33","title":"Fixed","text":"
  • Disallow fields and questions in FeedbackDataset with the same name (#3126).
  • Fixed broken links in the documentation and updated the development branch name from development to develop (#3145).
"},{"location":"community/changelog/#180","title":"1.8.0","text":""},{"location":"community/changelog/#added_25","title":"Added","text":"
  • /api/v1/datasets new endpoint to list and create datasets (#2615).
  • /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
  • /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
  • /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
  • /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
  • /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
  • /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
  • /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
  • /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
  • /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
  • /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
  • /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
  • /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
  • /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
  • Showing new feedback task datasets in the datasets list (#2719)
  • New page for feedback task (#2680)
  • Show feedback task metrics (#2822)
  • Users can delete a dataset in the dataset settings page (#2792)
  • Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003)
  • Integration with the HuggingFace Hub (#2949)
  • Added ArgillaPeftTrainer for text and token classification (#2854)
  • Added predict_proba() method to ArgillaSetFitTrainer
  • Added ArgillaAutoTrainTrainer for Text Classification (#2664)
  • New database revisions command showing database revisions info
"},{"location":"community/changelog/#fixes","title":"Fixes","text":"
  • Avoid rendering HTML for invalid HTML strings in Text2text (#2911)
"},{"location":"community/changelog/#changed_22","title":"Changed","text":"
  • The database migrate command accepts a --revision param to provide a specific revision id
  • tokens_length metrics function returns empty data (#3045)
  • token_length metrics function returns empty data (#3045)
  • mention_length metrics function returns empty data (#3045)
  • entity_density metrics function returns empty data (#3045)
"},{"location":"community/changelog/#deprecated_7","title":"Deprecated","text":"
  • Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
  • tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
"},{"location":"community/changelog/#removed_6","title":"Removed","text":"
  • Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
  • Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
  • Removed tags-related metrics from token classification metrics storage (#3045)
"},{"location":"community/changelog/#170","title":"1.7.0","text":""},{"location":"community/changelog/#added_26","title":"Added","text":"
  • Added max_retries and num_threads parameters to rg.log to run data logging requests concurrently with a backoff retry policy (see the sketch after this list). See #2458 and #2533
  • rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
  • Added settings param to prepare_for_training (#2689)
  • Added prepare_for_training for openai (#2658)
  • Added ArgillaOpenAITrainer (#2659)
  • Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
  • Added ArgillaTrainer CLI support. Closes (#2809)
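A minimal sketch of concurrent logging with retries (dataset name and record are illustrative):

```python
import argilla as rg

# Sketch: rg.log with the new max_retries and num_threads parameters (#2458, #2533).
records = [
    rg.TextClassificationRecord(text="great product", prediction=[("positive", 0.9)])
]
rg.log(records, name="reviews", num_threads=4, max_retries=3)
```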
"},{"location":"community/changelog/#fixes_1","title":"Fixes","text":"
  • Fixed image alignment on token classification
"},{"location":"community/changelog/#changed_23","title":"Changed","text":"
  • Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
  • Bulk endpoints will upsert data when a record id is present. Closes #2535
  • Moved from click to typer for CLI support. Closes (#2815)
  • Argilla server docker image is built with PostgreSQL support. Closes #2686
  • The rg.log computes all batches and raises an error for all failed batches.
  • The default batch size for rg.log is now 100.
"},{"location":"community/changelog/#fixed_34","title":"Fixed","text":"
  • argilla.training bugfixes and unification (#2665)
  • Resolved several small bugs in the ArgillaTrainer.
"},{"location":"community/changelog/#deprecated_8","title":"Deprecated","text":"
  • The rg.log_async function is deprecated and will be removed in the next minor release.
"},{"location":"community/changelog/#160","title":"1.6.0","text":""},{"location":"community/changelog/#added_27","title":"Added","text":"
  • ARGILLA_HOME_PATH new environment variable (#2564).
  • ARGILLA_DATABASE_URL new environment variable (#2564).
  • Basic support for user roles with admin and annotator (#2564).
  • id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
  • /api/users new endpoint to list and create users (#2564).
  • /api/users/{user_id} new endpoint to delete users (#2564).
  • /api/workspaces new endpoint to list and create workspaces (#2564).
  • /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
  • /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
  • argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
  • argilla.tasks.users.create new task to create a user (#2564).
  • argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
  • argilla.tasks.database.migrate new task to execute database migrations (#2564).
  • release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
  • Added user settings page. Closes #2496
  • Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504
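A minimal sketch of the new training module; the exact constructor and train signature are assumptions based on the entry above:

```python
from argilla.training import ArgillaTrainer  # assumed import path

# Sketch: train on an Argilla dataset with one of the supported frameworks
# (spacy, setfit, transformers). Names are illustrative.
trainer = ArgillaTrainer(name="reviews", framework="setfit", train_size=0.8)
trainer.train(output_dir="my-setfit-model")  # assumed signature
```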
"},{"location":"community/changelog/#fixes_2","title":"Fixes","text":"
  • The prepare_for_training method now works when multi_label=True. Closes #2606
"},{"location":"community/changelog/#changed_24","title":"Changed","text":"
  • The ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from the YAML file to the database (#2564).
  • full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
  • password user field now requires a minimum of 8 and a maximum of 100 characters (#2564).
  • Changed quickstart.Dockerfile image default users from team and argilla to admin and annotator, including new passwords and API keys (#2564).
  • Datasets to be managed only by users with admin role (#2564).
  • The list of rules is now accessible while metrics are computed. Closes #2117
  • Style updates for weak labeling and adding a feedback toast when deleting rules. See #2626 and #2648
"},{"location":"community/changelog/#removed_7","title":"Removed","text":"
  • email user field (#2564).
  • disabled user field (#2564).
  • Support for private workspaces (#2564).
  • ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
  • The old headers for API Key and workspace from python client
  • The default value for old API Key constant. Closes #2251
"},{"location":"community/changelog/#151-2023-03-30","title":"1.5.1 - 2023-03-30","text":""},{"location":"community/changelog/#fixes_3","title":"Fixes","text":"
  • Copying datasets between workspaces with proper owner/workspace info. Closes #2562
  • Copy dataset with empty workspace to the default user workspace 905d4de
  • Using elasticsearch config to request backend version. Closes #2311
  • Remove sorting by score in labels. Closes #2622
"},{"location":"community/changelog/#changed_25","title":"Changed","text":"
  • Update field name in metadata for image url. See #2609
  • Improvements in tutorial doc cards. Closes #2216
"},{"location":"community/changelog/#150-2023-03-21","title":"1.5.0 - 2023-03-21","text":""},{"location":"community/changelog/#added_28","title":"Added","text":"
  • Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
  • Add new page and components for dataset settings. Closes #2442
  • Add ability to show image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url (see the sketch after this list)
  • Non-searchable fields support in metadata. #2570
  • Add record ID references to the prepare for training methods. Closes #2483
  • Add tutorial on Image Classification. #2420
  • Add Train button, visible for \"admin\" role, with code snippets from a selection of libraries. Closes #2591 (https://github.com/argilla-io/argilla/pull/2591)
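A minimal sketch of the image URL feature with the v1 client (the dataset name and image URL are placeholders):

import argilla as rg\n\n# The special \"_image_url\" metadata key makes the UI render the image with the record\nrecord = rg.TextClassificationRecord(\n    text=\"A photo of a cat\",\n    metadata={\"_image_url\": \"https://example.com/cat.png\"},  # placeholder URL\n)\n\nrg.log(record, name=\"image_classification_demo\")  # placeholder dataset name\n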
"},{"location":"community/changelog/#changed_26","title":"Changed","text":"
  • Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see https://github.com/argilla-io/argilla/issues/2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
  • The shortcuts improvement for labels #2339 has been moved to the vuex ORM in the dataset settings feature #2444
  • Update \"Define a labeling schema\" section in docs.
  • The record inputs are sorted alphabetically in UI by default. #2581
  • The record inputs are fully visible when the pagination size is one, and the height of the collapsed area is bigger for laptop screens. #2587
"},{"location":"community/changelog/#fixes_4","title":"Fixes","text":"
  • Allow URL to be clickable in Jupyter notebook again. Closes #2527
"},{"location":"community/changelog/#removed_8","title":"Removed","text":"
  • Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client <v1.3.0
  • Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version <1.3.0
  • Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.
"},{"location":"community/contributor/","title":"How to contribute?","text":"

Thank you for investing your time in contributing to the project! Any contribution you make will be reflected in the most recent version of Argilla \ud83e\udd29.

New to contributing in general?

If you're a new contributor, read the README to get an overview of the project. In addition, here are some resources to help you get started with open-source contributions:

  • Discord: You are welcome to join the Argilla Discord community, where you can keep in touch with other users, contributors and the Argilla team. In the following section, you can find more information on how to get started in Discord.
  • Git: This is a very useful tool to keep track of the changes in your files. Using the command-line interface (CLI), you can make your contributions easily. For that, you need to have it installed and updated on your computer.
  • GitHub: It is a platform and cloud-based service that uses git and allows developers to collaborate on projects. To contribute to Argilla, you'll need to create an account. Check the Contributor Workflow with Git and GitHub for more info.
  • Developer Documentation: To collaborate, you'll need to set up an efficient environment. Check the developer documentation to know how to do it.
"},{"location":"community/contributor/#first-contact-in-discord","title":"First Contact in Discord","text":"

Discord is a handy tool for more casual conversations and to answer day-to-day questions. As part of Hugging Face, we have set up some Argilla channels on the server. Click here to join the Hugging Face Discord community effortlessly.

When part of the Hugging Face Discord, you can select \"Channels & roles\" and select \"Argilla\" along with any of the other groups that are interesting to you. \"Argilla\" will cover anything about Argilla and Distilabel. You can join the following channels:

  • #argilla-announcements: \ud83d\udce2 Important announcements and updates.
  • #argilla-distilabel-general: \ud83d\udcac General discussions about Argilla and Distilabel.
  • #argilla-distilabel-help: \ud83d\ude4b\u200d\u2640\ufe0f Need assistance? We're always here to help. Select the appropriate label (argilla or distilabel) for your issue and post it.

So now there is only one thing left to do: introduce yourself and talk to the community. You'll always be welcome! \ud83e\udd17\ud83d\udc4b

"},{"location":"community/contributor/#contributor-workflow-with-git-and-github","title":"Contributor Workflow with Git and GitHub","text":"

If you're working with Argilla and suddenly a new idea comes to your mind or you find an issue that can be improved, it's time to actively participate and contribute to the project!

"},{"location":"community/contributor/#report-an-issue","title":"Report an issue","text":"

If you spot a problem, search if an issue already exists. You can use the Label filter. If that is the case, participate in the conversation. If it does not exist, create an issue by clicking on New Issue.

This will show various templates, choose the one that best suits your issue.

Below, you can see an example of the Feature request template. Once you choose one, you will need to fill it in following the guidelines. Try to be as clear as possible. In addition, you can assign yourself to the issue and add or choose the right labels. Finally, click on Submit new issue.

"},{"location":"community/contributor/#work-with-a-fork","title":"Work with a fork","text":""},{"location":"community/contributor/#fork-the-argilla-repository","title":"Fork the Argilla repository","text":"

After having reported the issue, you can start working on it. For that, you will need to create a fork of the project. To do that, click on the Fork button.

Now, fill in the information. Remember to uncheck the Copy develop branch only option if you are going to work in or from another branch (for instance, the main branch is used to fix documentation). Then, click on Create fork.

Now, you will be redirected to your fork. You can see that you are in your fork because the name of the repository will be your username/argilla, and it will indicate forked from argilla-io/argilla.

"},{"location":"community/contributor/#clone-your-forked-repository","title":"Clone your forked repository","text":"

In order to make the required adjustments, clone the forked repository to your local machine. Choose the destination folder and run the following command:

git clone https://github.com/[your-github-username]/argilla.git\ncd argilla\n

To keep your fork\u2019s main/develop branch up to date with our repo, add it as an upstream remote branch.

git remote add upstream https://github.com/argilla-io/argilla.git\n
"},{"location":"community/contributor/#create-a-new-branch","title":"Create a new branch","text":"

For each issue you're addressing, it's advisable to create a new branch. GitHub offers a straightforward method to streamline this process.

\u26a0\ufe0f Never work directly on the main or develop branch. Always create a new branch for your changes.

Navigate to your issue and on the right column, select Create a branch.

After the new window pops up, the branch will be named after the issue; include a prefix such as feature/, bug/, or docs/ to facilitate quick recognition of the issue type. In the Repository destination, pick your fork ( [your-github-username]/argilla), and then select Change branch source to specify the source branch for creating the new one. Complete the process by clicking Create branch.

\ud83e\udd14 Remember that the main branch is only used to work with the documentation. For any other changes, use the develop branch.

Now, locally change to the new branch you just created.

git fetch origin\ngit checkout [branch-name]\n
"},{"location":"community/contributor/#use-changelogmd","title":"Use CHANGELOG.md","text":"

If you are working on a new feature, it is a good practice to make note of it for others to keep up with the changes. For that, we utilize the CHANGELOG.md file in the root directory. This file is used to list changes made in each version of the project and there are headers that we use to denote each type of change.

  • Added: for new features.
  • Changed: for changes in existing functionality.
  • Deprecated: for soon-to-be removed features.
  • Removed: for now removed features.
  • Fixed: for any bug fixes.
  • Security: in case of vulnerabilities.

A sample addition would be:

- Fixed the key errors for the `init` method ([#NUMBER_OF_PR](LINK_TO_PR)). Contributed by @github_handle.\n

You can have a look at the CHANGELOG.md file to see more cases and examples.

"},{"location":"community/contributor/#make-changes-and-push-them","title":"Make changes and push them","text":"

Make the changes you want in your local repository, and test that everything works and you are following the guidelines.

Check the developer documentation to set up your environment and start working on the project.

Once you have finished, you can check the status of your repository and synchronize it with the upstream repo with the following command:

# Check the status of your repository\ngit status\n\n# Synchronize with the upstream repo\ngit checkout [branch-name]\ngit rebase [default-branch]\n

If everything is right, we need to commit and push the changes to your fork. For that, run the following commands:

# Add the changes to the staging area\ngit add filename\n\n# Commit the changes by writing a proper message\ngit commit -m \"commit-message\"\n\n# Push the changes to your fork\ngit push origin [branch-name]\n

When pushing, you will be asked to enter your GitHub login credentials. Once the push is complete, all local commits will be on your GitHub repository.

"},{"location":"community/contributor/#create-a-pull-request","title":"Create a pull request","text":"

Come back to GitHub, navigate to the original repository where you created your fork, and click on Compare & pull request.

First, click on compare across forks and select the right repositories and branches.

In the base repository, keep in mind to select either main or develop based on the modifications made. In the head repository, indicate your forked repository and the branch corresponding to the issue.

Then, fill in the pull request template. You should add a prefix to the PR name, as we did with the branch above. If you are working on a new feature, you can name your PR feat: TITLE. If your PR consists of a solution for a bug, you can name your PR bug: TITLE. And, if your work is for improving the documentation, you can name your PR docs: TITLE.

In addition, on the right side, you can select a reviewer (for instance, if you discussed the issue with a member of the Argilla team) and assign the pull request to yourself. It is highly advisable to add labels to the PR as well. You can do this in the labels section on the right side of the screen. For instance, if you are addressing a bug, add the bug label, or if the PR is related to the documentation, add the documentation label. This way, PRs can be easily filtered.

Finally, fill in the template carefully and follow the guidelines. Remember to link the original issue and enable the checkbox to allow maintainer edits so the branch can be updated for a merge. Then, click on Create pull request.

"},{"location":"community/contributor/#review-your-pull-request","title":"Review your pull request","text":"

Once you submit your PR, a team member will review your proposal. We may ask questions, request additional information or ask for changes to be made before a PR can be merged, either using suggested changes or pull request comments.

You can apply the changes directly through the UI (check the files changed and click on the right-corner three dots, see image below) or from your fork, and then commit them to your branch. The PR will be updated automatically and the suggestions will appear as outdated.

If you run into any merge issues, check out this git tutorial to help you resolve merge conflicts and other issues.

"},{"location":"community/contributor/#your-pr-is-merged","title":"Your PR is merged!","text":"

Congratulations \ud83c\udf89\ud83c\udf8a We thank you \ud83e\udd29

Once your PR is merged, your contributions will be publicly visible on the Argilla GitHub.

Additionally, we will include your changes in the next release based on our development branch.

"},{"location":"community/contributor/#additional-resources","title":"Additional resources","text":"

Here are some helpful resources for your reference.

  • Configuring Discord, a guide to learn how to get started with Discord.
  • Pro Git, a book to learn Git.
  • Git in VSCode, a guide to learn how to easily use Git in VSCode.
  • GitHub Skills, an interactive course to learn GitHub.
"},{"location":"community/developer/","title":"Developer documentation","text":"

As an Argilla developer, you are already part of the community, and your contribution is key to our development. This guide will help you set up your development environment and start contributing.

Argilla core components

  • Documentation: Argilla's documentation serves as an invaluable resource, providing a comprehensive and in-depth guide for users seeking to explore, understand, and effectively harness the core components of the Argilla ecosystem.

  • Python SDK: A Python SDK installable with pip install argilla to interact with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows.

  • FastAPI Server: The core of Argilla is a Python FastAPI server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API that interacts with the data from the Python SDK and the Argilla UI. It also provides a web interface to visualize the data.

  • Relational Database: A relational database to store the metadata of the records and the annotations. SQLite is used as the default built-in option and ships with the Argilla Server, but a separate PostgreSQL database can be used instead.

  • Vector Database: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support ElasticSearch and OpenSearch, which can be deployed as separate Docker images.

  • Vue.js UI: A web application to visualize and annotate your data, users, and teams. It is built with Vue.js and is directly deployed alongside the Argilla Server within our Argilla Docker image.

"},{"location":"community/developer/#the-argilla-repository","title":"The Argilla repository","text":"

The Argilla repository has a monorepo structure, which means that all the components are located in the same repository: argilla-io/argilla. This repo is divided into the following folders:

  • argilla: The Python SDK project
  • argilla-server: The FastAPI server project
  • argilla-frontend: The Vue.js UI project
  • argilla/docs: The documentation project
  • examples: Example resources for deployments, scripts and notebooks

How to contribute?

Before starting to develop, we recommend reading our contribution guide to understand the contribution process and the guidelines to follow. Once you have cloned the Argilla repository and checked out to the correct branch, you can start setting up your development environment.

"},{"location":"community/developer/#set-up-the-python-environment","title":"Set up the Python environment","text":"

To work on the Argilla Python SDK, you must install the Argilla package on your system.

Create a virtual environment

We recommend creating a dedicated virtual environment for SDK development to prevent conflicts. For this, you can use the manager of your choice, such as venv, conda, pyenv, or uv.
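For instance, with the built-in venv module (the environment name .venv is just a common choice):

# Create and activate a virtual environment\npython -m venv .venv\nsource .venv/bin/activate\n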

From the root of the cloned Argilla repository, you should move to the argilla folder in your terminal.

cd argilla\n

Next, activate your virtual environment and make the required installations:

# Install the `pdm` package manager\npip install pdm\n\n# Install argilla in editable mode and the development dependencies\npdm install --dev\n
"},{"location":"community/developer/#linting-and-formatting","title":"Linting and formatting","text":"

To maintain a consistent code format, install the pre-commit hooks to run before each commit automatically.

pre-commit install\n

In addition, run the following scripts to check the code formatting and linting:

pdm run format\npdm run lint\n
"},{"location":"community/developer/#running-tests","title":"Running tests","text":"

Running tests at the end of every development cycle is indispensable to ensure no breaking changes.

# Run all tests\npdm run tests\n\n# Run specific tests\npytest tests/integration\npytest tests/unit\n
Running linting, formatting, and tests

You can run all the checks at once by using the following command:

    pdm run all\n
"},{"location":"community/developer/#set-up-the-databases","title":"Set up the databases","text":"

To run your development environment, you need to set up Argilla's databases.

"},{"location":"community/developer/#vector-database","title":"Vector database","text":"

Argilla supports ElasticSearch as its primary search engine for the vector database by default. For more information about setting up OpenSearch, check the Server configuration.

You can run ElasticSearch locally using Docker:

# Argilla supports ElasticSearch versions >=8.5\ndocker run -d --name elasticsearch-for-argilla -p 9200:9200 -p 9300:9300 -e \"ES_JAVA_OPTS=-Xms512m -Xmx512m\" -e \"discovery.type=single-node\" -e \"xpack.security.enabled=false\" docker.elastic.co/elasticsearch/elasticsearch:8.5.3\n
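If you prefer OpenSearch, a comparable local setup might look like the following sketch; the image tag and the security flag are assumptions, so check the Server configuration for the supported versions:

# Assumed example: run OpenSearch locally with the security plugin disabled for development\ndocker run -d --name opensearch-for-argilla -p 9200:9200 -e \"discovery.type=single-node\" -e \"DISABLE_SECURITY_PLUGIN=true\" opensearchproject/opensearch:2.11.0\n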

Install Docker

You can find the Docker installation guides for Windows, macOS and Linux on Docker website.

"},{"location":"community/developer/#relational-database","title":"Relational database","text":"

Argilla will use SQLite as the default built-in option to store information about users, workspaces, etc., for the relational database. No additional configuration is required to start using SQLite.

By default, the database file will be created at ~/.argilla/argilla.db; this can be configured by setting different values for ARGILLA_DATABASE_URL and ARGILLA_HOME_PATH environment variables.
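For example, to keep the Argilla home directory and the SQLite file under a custom path (the values here are illustrative):

# Illustrative values: store Argilla data under /data/argilla instead of ~/.argilla\nexport ARGILLA_HOME_PATH=/data/argilla\nexport ARGILLA_DATABASE_URL=sqlite:////data/argilla/argilla.db\n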

Manage the database

For more information about the database migration and user management, refer to the Argilla server README.

"},{"location":"community/developer/#set-up-the-server","title":"Set up the server","text":"

Once you have set up the databases, you can start the Argilla server. To run the server, you can check the Argilla server README file.

"},{"location":"community/developer/#set-up-the-frontend","title":"Set up the frontend","text":"

Optionally, if you need to run the Argilla frontend, you can follow the instructions in the Argilla frontend README.

"},{"location":"community/developer/#set-up-the-documentation","title":"Set up the documentation","text":"

Documentation is essential to provide users with a comprehensive guide about Argilla.

From main or develop?

If you are updating, improving, or fixing the current documentation without a code change, work on the main branch. For new features or bug fixes that require documentation, use the develop branch.

To contribute to the documentation and generate it locally, ensure you have installed the development dependencies as shown in the \"Set up the Python environment\" section, and run the following command to create the development server with mkdocs:

mkdocs serve\n
"},{"location":"community/developer/#documentation-guidelines","title":"Documentation guidelines","text":"

As mentioned, we use mkdocs to build the documentation. You can write the documentation in markdown format, and it will automatically be converted to HTML. In addition, you can include elements such as tables, tabs, images, and others, as shown in this guide. We recommend following these guidelines:

  • Use clear and concise language: Ensure the documentation is easy to understand for all users by using straightforward language and including meaningful examples. Images are not easy to maintain, so use them only when necessary and place them in the appropriate folder within the docs/assets/images directory.
  • Verify code snippets: Double-check that all code snippets are correct and runnable.
  • Review spelling and grammar: Check the spelling and grammar of the documentation.
  • Update the table of contents: If you add a new page, include it in the relevant index.md or the mkdocs.yml file.

Contribute with a tutorial

You can also contribute a tutorial (.ipynb) to the \"Community\" section. We recommend aligning the tutorial with the structure of the existing tutorials. For an example, check this tutorial.

"},{"location":"community/popular_issues/","title":"Issue dashboard","text":"Most engaging open issuesLatest issues open by the communityPlanned issues for upcoming releases Rank Issue Reactions Comments 1 4637 - [FEATURE] Label breakdown in Feedback dataset stats \ud83d\udc4d 6 \ud83d\udcac 4 2 1607 - Support for hierarchical multilabel text classification (taxonomy) \ud83d\udc4d 5 \ud83d\udcac 15 3 4658 - Active listeners for Feedback Dataset \ud83d\udc4d 5 \ud83d\udcac 5 4 1800 - Add comments/notes to annotation datasets to share with teammates. \ud83d\udc4d 2 \ud83d\udcac 6 5 1837 - Custom Record UI Templates \ud83d\udc4d 2 \ud83d\udcac 6 6 1922 - Show potential number of records during filter selection \ud83d\udc4d 2 \ud83d\udcac 4 7 1630 - Accepting several predictions/annotations for the same record \ud83d\udc4d 2 \ud83d\udcac 2 8 5348 - [FEATURE] Ability to create new labels on-the-fly \ud83d\udc4d 2 \ud83d\udcac 0 9 3625 - [IMPROVE] Fields with empty title shall have exactly the same value as the user entered in the name field, without altering it \ud83d\udc4d 2 \ud83d\udcac 0 10 4372 - [FEATURE] distribution indication for filters \ud83d\udc4d 1 \ud83d\udcac 6 Rank Issue Author 1 \ud83d\udfe2 5570 - [BUG-python/deployment] by lecheuklun 2 \ud83d\udfe2 5561 - [FEATURE] Force predetermined sorting for a dataset by lgienapp 3 \ud83d\udfe2 5557 - [DOCS] \"Bulk Labeling Multimodal Data\" Notebook outdated by trojblue 4 \ud83d\udfe2 5548 - [BUG-python/deployment] verify=False parameter is not passed to httpx.Client through Argilla class (v2.2.0) by xiajing10 5 \ud83d\udfe3 5543 - automatically load token from collab secrets if it exists by not-lain 6 \ud83d\udfe3 5530 - [FEATURE] updated_at / inserted_at properties on retrieved Records by maxserras 7 \ud83d\udfe3 5529 - [BUG-UI/UX] API Key copy button not working by cceyda 8 \ud83d\udfe2 5528 - [FEATURE] Filter by responses & suggestions by cceyda 9 \ud83d\udfe2 5516 - [FEATURE] Allow all annotators in workspace to see all the submitted records by cceyda 10 \ud83d\udfe2 5513 - [ENHANCEMENT] Improve ImageField error messaging to deal with paths, urls, none by cceyda Rank Issue Milestone 1 \ud83d\udfe2 5415 - [FEATURE] Do not stop logging records if UnprocessableEntityError is raised because one single record v2.2.0 2 \ud83d\udfe2 5534 - [FEATURE] preview custom field data in dataset settings page v2.3.0 3 \ud83d\udfe2 5520 - [BUG-UI/UX] Incorrect iframe height calculation in sandBox Component v2.4.0 4 \ud83d\udfe2 5513 - [ENHANCEMENT] Improve ImageField error messaging to deal with paths, urls, none v2.4.0 5 \ud83d\udfe2 5458 - [FEATURE] Controls for data schema for images when exporting datasets and records v2.4.0 6 \ud83d\udfe2 4931 - [REFACTOR] Improve handling of question models and dicts v2.4.0 7 \ud83d\udfe2 4935 - [CONFIG] Resolve python requirements for python version and dependencies with server. v2.4.0 8 \ud83d\udfe2 1836 - Webhooks v2.4.0

Last update: 2024-10-07

"},{"location":"community/integrations/llamaindex_rag_github/","title":"LlamaIndex","text":"
!pip install \"argilla-llama-index\"\n!pip install \"llama-index-readers-github==0.1.9\"\n

Let's make the required imports:

from llama_index.core import (\n    Settings,\n    VectorStoreIndex,\n    set_global_handler,\n)\nfrom llama_index.llms.openai import OpenAI\nfrom llama_index.readers.github import (\n    GithubClient,\n    GithubRepositoryReader,\n)\n

We need to set the OpenAI API key and the GitHub token. The OpenAI API key is required to run queries using GPT models, while the GitHub token ensures you have access to the repository you're using. Although the GitHub token might not be necessary for public repositories, it is still recommended.

import os\n\nos.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\nopenai_api_key = os.getenv(\"OPENAI_API_KEY\")\n\nos.environ[\"GITHUB_TOKEN\"] = \"ghp_...\"\ngithub_token = os.getenv(\"GITHUB_TOKEN\")\n
set_global_handler(\n    \"argilla\",\n    dataset_name=\"github_query_model\",\n    api_url=\"http://localhost:6900\",\n    api_key=\"argilla.apikey\",\n    number_of_retrievals=2,\n)\n
github_client = GithubClient(github_token=github_token, verbose=True)\n

Before creating our GithubRepositoryReader instance, we need to adjust the nesting. Since the Jupyter kernel operates on an event loop, we must prevent this loop from finishing before the repository is fully read.

import nest_asyncio\n\nnest_asyncio.apply()\n

Now, let\u2019s create a GithubRepositoryReader instance with the necessary repository details. In this case, we'll target the main branch of the argilla repository. As we will focus on the documentation, we will include only the argilla/docs/ folder, excluding images, JSON files, and ipynb files.

documents = GithubRepositoryReader(\n    github_client=github_client,\n    owner=\"argilla-io\",\n    repo=\"argilla\",\n    use_parser=False,\n    verbose=False,\n    filter_directories=(\n        [\"argilla/docs/\"],\n        GithubRepositoryReader.FilterType.INCLUDE,\n    ),\n    filter_file_extensions=(\n        [\n            \".png\",\n            \".jpg\",\n            \".jpeg\",\n            \".gif\",\n            \".svg\",\n            \".ico\",\n            \".json\",\n            \".ipynb\",   # Erase this line if you want to include notebooks\n\n        ],\n        GithubRepositoryReader.FilterType.EXCLUDE,\n    ),\n).load_data(branch=\"main\")\n

Now, let's create a LlamaIndex index out of these documents, and we can start querying the RAG system.

# LLM settings\nSettings.llm = OpenAI(\n    model=\"gpt-3.5-turbo\", temperature=0.8, openai_api_key=openai_api_key\n)\n\n# Load the data and create the index\nindex = VectorStoreIndex.from_documents(documents)\n\n# Create the query engine\nquery_engine = index.as_query_engine()\n
response = query_engine.query(\"How do I create a Dataset in Argilla?\")\nresponse\n

The generated response will be automatically logged in our Argilla instance. Check it out! From Argilla you can quickly have a look at your predictions and annotate them, so you can combine both synthetic data and human feedback.

Let's ask a couple more questions to see the overall behavior of the RAG chatbot. Remember that the answers are automatically logged into your Argilla instance.

questions = [\n    \"How can I list the available datasets?\",\n    \"Which are the user credentials?\",\n    \"Can I use markdown in Argilla?\",\n    \"Could you explain how to annotate datasets in Argilla?\",\n]\n\nanswers = []\n\nfor question in questions:\n    answers.append(query_engine.query(question))\n\nfor question, answer in zip(questions, answers):\n    print(f\"Question: {question}\")\n    print(f\"Answer: {answer}\")\n    print(\"----------------------------\")\n
\nQuestion: How can I list the available datasets?\nAnswer: You can list all the datasets available in a workspace by utilizing the `datasets` attribute of the `Workspace` class. Additionally, you can determine the number of datasets in a workspace by using `len(workspace.datasets)`. To list the datasets, you can iterate over them and print out each dataset. Remember that dataset settings are not preloaded when listing datasets, and if you need to work with settings, you must load them explicitly for each dataset.\n----------------------------\nQuestion: Which are the user credentials?\nAnswer: The user credentials in Argilla consist of a username, password, and API key.\n----------------------------\nQuestion: Can I use markdown in Argilla?\nAnswer: Yes, you can use Markdown in Argilla.\n----------------------------\nQuestion: Could you explain how to annotate datasets in Argilla?\nAnswer: To annotate datasets in Argilla, users can manage their data annotation projects by setting up `Users`, `Workspaces`, `Datasets`, and `Records`. By deploying Argilla on the Hugging Face Hub or with `Docker`, installing the Python SDK with `pip`, and creating the first project, users can get started in just 5 minutes. The tool allows for interacting with data in a more engaging way through features like quick labeling with filters, AI feedback suggestions, and semantic search, enabling users to focus on training models and monitoring their performance effectively.\n----------------------------\n\n
"},{"location":"community/integrations/llamaindex_rag_github/#create-a-rag-system-expert-in-a-github-repository-and-log-your-predictions-in-argilla","title":"\ud83d\udd75\ud83c\udffb\u200d\u2640\ufe0f Create a RAG system expert in a GitHub repository and log your predictions in Argilla","text":"

In this tutorial, we'll show you how to create a RAG system that can answer questions about a specific GitHub repository. As an example, we will target the Argilla repository. This RAG system will target the docs of the repository, as that's where most of the natural language information about the repository can be found.

This tutorial includes the following steps:

  • Setting up the Argilla callback handler for LlamaIndex.
  • Initializing a GitHub client.
  • Creating an index with a specific set of files from the GitHub repository of our choice.
  • Creating a RAG system out of the Argilla repository, asking questions, and automatically logging the answers to Argilla.

This tutorial is based on the GitHub Repository Reader made by LlamaIndex.

"},{"location":"community/integrations/llamaindex_rag_github/#getting-started","title":"Getting started","text":""},{"location":"community/integrations/llamaindex_rag_github/#deploy-the-argilla-server","title":"Deploy the Argilla server\u00b6","text":"

If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

"},{"location":"community/integrations/llamaindex_rag_github/#set-up-the-environment","title":"Set up the environment\u00b6","text":"

To complete this tutorial, you need to install this integration and a third-party library via pip.

Note

Check the integration GitHub repository here.

"},{"location":"community/integrations/llamaindex_rag_github/#set-the-argillas-llamaindex-handler","title":"Set the Argilla's LlamaIndex handler","text":"

To easily log your data into Argilla within your LlamaIndex workflow, you only need a simple step. Just call the Argilla global handler for Llama Index before starting production with your LLM. This ensures that the predictions obtained using Llama Index are automatically logged to the Argilla instance. The handler takes the following arguments:

  • dataset_name: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.
  • api_url: The URL to connect to the Argilla instance.
  • api_key: The API key to authenticate with the Argilla instance.
  • number_of_retrievals: The number of retrieved documents to be logged. Defaults to 0.
  • workspace_name: The name of the workspace to log the data. By default, the first available workspace.

> For more information about the credentials, check the documentation for users and workspaces.
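For instance, a call that also pins the logs to a specific workspace could look like this sketch (the workspace name is a placeholder):

from llama_index.core import set_global_handler\n\nset_global_handler(\n    \"argilla\",\n    dataset_name=\"github_query_model\",\n    api_url=\"http://localhost:6900\",\n    api_key=\"argilla.apikey\",\n    number_of_retrievals=2,\n    workspace_name=\"my-workspace\",  # placeholder; defaults to the first available workspace\n)\n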

"},{"location":"community/integrations/llamaindex_rag_github/#retrieve-the-data-from-github","title":"Retrieve the data from GitHub","text":"

First, we need to initialize the GitHub client, which will include the GitHub token for repository access.

"},{"location":"community/integrations/llamaindex_rag_github/#create-the-index-and-make-some-queries","title":"Create the index and make some queries","text":""},{"location":"getting_started/faq/","title":"FAQs","text":"What is Argilla?

Argilla is a collaboration tool for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency. It is designed to help you achieve and keep high-quality data standards, store your training data, store the results of your models, evaluate their performance, and improve the data through human and AI feedback.

Does Argilla cost money?

No. Argilla is an open-source project and is free to use. You can deploy Argilla on your own infrastructure or use our cloud offering.

What data types does Argilla support?

Text data, mostly. Argilla natively supports textual data; however, we do support rich text, which means you can represent different types of data in Argilla as long as you can convert them to text. For example, you can store images, audio, video, and any other type of data as long as you can convert it to its base64 representation or render it as HTML in, for example, an IFrame.
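As a minimal sketch (the dataset name, field names, and image URL are made up), a markdown-enabled TextField can render an HTML image tag:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# A field with use_markdown=True renders HTML, e.g. an <img> tag\nsettings = rg.Settings(\n    fields=[rg.TextField(name=\"content\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"quality\", labels=[\"good\", \"bad\"])],\n)\n\ndataset = rg.Dataset(name=\"rich_content_example\", settings=settings, client=client)\ndataset.create()\n\ndataset.records.log([{\"content\": \"<img src='https://example.com/photo.png' width='300' />\"}])\n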

Does Argilla train models?

No. Argilla is a collaboration tool to achieve and keep high-quality data standards. You can use Argilla to store your training data, store the results of your models, evaluate their performance and improve the data. For training models, you can use any machine learning framework or library that you prefer even though we recommend starting with Hugging Face Transformers.

Does Argilla provide annotation workforces?

Yes, kind of. We don't provide annotation workforce in-house but we do have partnerships with workforce providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting here or contact us via email.

How does Argilla differ from competitors like Lilac, Snorkel, Prodigy and Scale?

Argilla distinguishes itself through its focus on specific use cases and human-in-the-loop approaches. While it does offer programmatic features, Argilla\u2019s core value lies in actively involving human experts in the tool-building process, setting it apart from other competitors.

Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within the realms of MLOps and NLP. So, its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a tool within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing tool from the outset.

What is the difference between Argilla 2.0 and the legacy datasets in 1.0?

Argilla 1.0 relied on 3 main task datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These tasks were designed to be simple, easy to use and high in functionality but they were limited in adaptability. With the introduction of Large Language Models (LLMs) and the increasing complexity of NLP tasks, we realized that we needed to expand the capabilities of Argilla to support more advanced feedback mechanisms which led to the introduction of the FeedbackDataset. Compared to its predecessor it was high in adaptability but still limited in functionality. After having ported all of the functionality of the legacy tasks to the new FeedbackDataset, we decided to deprecate the legacy tasks in favor of a brand new SDK with the FeedbackDataset at its core.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/","title":"Hugging Face Spaces Settings","text":"

This section details how to configure and deploy Argilla on Hugging Face Spaces. It covers:

  • Persistent storage
  • How to deploy Argilla under a Hugging Face Organization
  • How to configure and disable HF OAuth access
  • How to use Private Spaces

Looking to get started easily?

If you just discovered Argilla and want to get started quickly, go to the Quickstart guide.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#persistent-storage","title":"Persistent storage","text":"

In the Space creation UI, persistent storage is set to Small PAID, which is a paid service, charged per hour of usage.

Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. Persistent storage enables Argilla to save your datasets and configurations to disk across restarts.

Ephemeral FREE persistent storage

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

If you plan to use the Argilla Space beyond testing, it's highly recommended to set persistent storage to Small.

If you just want to quickly test or use Argilla for a few hours with the risk of losing your datasets, choose Ephemeral FREE. Ephemeral FREE means your datasets and configuration will not be saved to disk; when the Space is restarted, your datasets, workspaces, and users will be lost.

If you want to disable the persistent storage warning, you can set the environment variable ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING=false.

Read this if you have datasets and want to enable persistent storage

If you want to enable persistent storage Small PAID and you have created datasets, users, or workspaces, follow this process:

  • First, make a local or remote copy of your datasets, following the Import and Export guide (see the sketch after this list). This is the most important step, because changing the settings of your Space leads to a restart and thus data loss.
  • If you have created users (not signed in with Hugging Face login), consider storing a copy of users following the manage users guide.
  • Once you have stored all your data safely, go to your Space Settings tab and select Small.
  • Your Space will be restarted and existing data will be lost. From now on, all the new data you create in Argilla will be kept safely.
  • Recover your data by following the above-mentioned guides.
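A minimal sketch of that first backup step with the SDK (the dataset name, local path, and repo id are placeholders; the Import and Export guide covers all the options):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_first_dataset\")  # placeholder dataset name\n\n# Local copy on disk (placeholder path)\ndataset.to_disk(\"./backup/my_first_dataset\")\n\n# Or a remote copy on the Hugging Face Hub (placeholder repo id)\ndataset.to_hub(repo_id=\"my-username/my_first_dataset\")\n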
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-configure-and-disable-oauth-access","title":"How to configure and disable OAuth access","text":"

By default, Argilla Spaces are configured with Hugging Face OAuth, in the following way:

  • Any Hugging Face user that can see your Space, can use the Sign in button, join as an annotator, and contribute to the datasets available under the argilla workspace. This workspace is created during the deployment process.
  • These users can only explore and annotate datasets in the argilla workspace but can't perform any critical operation like create, delete, update, or configure datasets. By default, any other workspace you create won't be visible to these users.

To restrict access or change the default behaviour, there are two options:

Set your Space to private. This is especially useful if your Space is under an organization. This will only allow members within your organization to see and join your Argilla space. It can also be used for personal, solo projects.

Modify the .oauth.yml configuration file. You can find and modify this file under the Files tab of your Space. The default file looks like this:

# Change to `false` to disable HF oauth integration\n#enabled: false\n\nproviders:\n  - name: huggingface\n\n# Allowed workspaces must exists\nallowed_workspaces:\n  - name: argilla\n
You can modify two things:

  • Uncomment enabled: false to completely disable the Sign in with Hugging Face. If you disable it, make sure to set the USERNAME and PASSWORD Space secrets to be able to log in as an owner.
  • Change the list of allowed workspaces.

For example if you want to let users join a new workspace community-initiative:

allowed_workspaces:\n  - name: argilla\n  - name: community-initiative\n
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-deploy-argilla-under-a-hugging-face-organization","title":"How to deploy Argilla under a Hugging Face Organization","text":"

Creating an Argilla Space within an organization is useful for several scenarios:

  • You want to only enable members of your organization to join your Space. You can achieve this by setting your Space to private.
  • You want to manage the Space together with other users (e.g., Space settings, etc.). Note that if you just want to manage your Argilla datasets and workspaces, you can achieve this by adding other users with the Argilla owner role to your Argilla Server.
  • More generally, you want to make your Space available under an organization/community umbrella.

The steps are very similar to the Quickstart guide with two important differences:

Setup USERNAME

You need to set up the USERNAME Space Secret with your Hugging Face username. This way, the first time you enter with the Hugging Face Sign in button, you'll be granted the owner role.

Enable Persistent Storage SMALL

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

For Argilla Spaces with many users, it's strongly recommended to set persistent storage to Small.

"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-use-private-spaces","title":"How to use Private Spaces","text":"

Setting your Space visibility to private can be useful if:

  • You want to work on your personal, solo project.
  • You want your Argilla to be available only to members of the organization where you deploy the Argilla Space.

You can set the visibility of the Space during the Space creation process or afterwards under the Settings Tab.

To use the Python SDK with private Spaces, you need to specify your HF_TOKEN, which can be found here, when creating the client:

import argilla as rg\n\nHF_TOKEN = \"...\"\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\",\n    headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n
"},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#space-secrets-overview","title":"Space Secrets overview","text":"

There are two optional secrets to set up the USERNAME and PASSWORD of the owner of the Argilla Space. Remember that, by default, Argilla Spaces are configured with a Sign in with Hugging Face button, which is also used to grant the owner role to the creator of the Space for personal spaces.

The USERNAME and PASSWORD are only useful in a couple of scenarios:

  • You have disabled Hugging Face OAuth.
  • You want to set up Argilla under an organization and want your Hugging Face username to be granted the owner role.

In summary, when setting up a Space:

Creating a Space under your personal account

If you are creating the Space under your personal account, don't insert any value for USERNAME and PASSWORD. Once you launch the Space you will be able to Sign in with your Hugging Face username and the owner role.

Creating a Space under an organization

If you are creating the Space under an organization make sure to insert your Hugging Face username in the secret USERNAME. In this way, you'll be able to Sign in with your Hugging Face user.

"},{"location":"getting_started/how-to-deploy-argilla-with-docker/","title":"Deploy with Docker","text":"

This guide describes how to deploy the Argilla Server with docker compose. This is useful if you want to deploy Argilla locally and/or have full control over the configuration of the server, database, and search engine (Elasticsearch).

First, you need to install docker on your machine and make sure you can run docker compose.

Then, create a folder (you can modify the folder name):

mkdir argilla && cd argilla\n

Download docker-compose.yaml:

wget -O docker-compose.yaml https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml\n

or using curl:

curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml\n

Run to deploy the server on http://localhost:6900:

docker compose up -d\n

Once it is completed, open http://localhost:6900 in your browser and you should see the Argilla login page.

If it's not available, check the logs:

docker compose logs -f\n

Most of the deployment issues are related to ElasticSearch. Join the Hugging Face Discord server and ask for support on the Argilla channel.

"},{"location":"getting_started/quickstart/","title":"Quickstart","text":"

Argilla is a free, open-source, self-hosted tool. This means you need to deploy its UI to start using it. There are two main ways to deploy Argilla:

Deploy on the Hugging Face Hub

The recommended choice to get started. You can get up and running in under 5 minutes and don't need to maintain a server or run any commands.

If you're just getting started with Argilla, click the deploy button below:

You can use the default values following these steps:

  • Leave the default Space owner (your personal account)
  • Leave USERNAME and PASSWORD secrets empty since you'll sign in with your HF user as the Argilla Space owner.
  • Click create Space to launch Argilla \ud83d\ude80.
  • Once you see the Argilla UI, go to the Sign in into the Argilla UI section. If you see the Building message for longer than 2-3 minutes, refresh the page.

Persistent storage SMALL

Not setting persistent storage to SMALL means that you will lose your data when the Space restarts. Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. If you want to use the Space just for testing, you can use FREE temporarily.

If you want to deploy Argilla within a Hugging Face organization, set up a more stable Space, or understand the settings, check out the HF Spaces settings guide.

Deploy with Docker

If you want to run Argilla locally on your machine or a server, or tune the server configuration, choose this option. To use this option, check this guide.

"},{"location":"getting_started/quickstart/#sign-in-into-the-argilla-ui","title":"Sign in into the Argilla UI","text":"

If everything went well, you should see the Argilla sign in page that looks like this:

Building errors

If you get a build error, sometimes restarting the Space from the Settings page works, otherwise check the HF Spaces settings guide.

In the sign in page:

  1. Click on Sign in with Hugging Face
  2. Authorize the application and you will be logged in into Argilla as an owner.

Unauthorized error

Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button solves the issue.

Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.

"},{"location":"getting_started/quickstart/#install-the-python-sdk","title":"Install the Python SDK","text":"

To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:

pip install argilla\n
"},{"location":"getting_started/quickstart/#create-your-first-dataset","title":"Create your first dataset","text":"

For getting started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab.

To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

  • The <api_key> is in the My Settings page of your Argilla Space.

  • The <api_url> is the URL shown in your browser if it ends with *.hf.space.

import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\"\n)\n

You can't find your API URL

If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options: 1. Click on the three points menu at the top of the Space, select \"Embed this Space\", and open the direct URL. 2. Use this pattern: https://[your-owner-name]-[your_space_name].hf.space.

To create a dataset with a simple text classification task, first, you need to define the dataset settings.

settings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"my_label\",\n            title=\"In which category does this review fit?\",\n            labels=[\"positive\", \"negative\"],\n        )\n    ],\n)\n

Now you can create the dataset with these settings. Publish the dataset to make it available in the UI and add the records.

About workspaces

Workspaces in Argilla group datasets and user access rights. The workspace parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace argilla.

By default, this workspace will be visible to users joining with the Sign in with Hugging Face button. You can create other workspaces and decide to grant access to users either with the SDK (as sketched below) or by changing the OAuth configuration.
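A minimal sketch of the SDK route (the workspace name is just an example):

# Reuses the `client` created earlier in this guide\nworkspace = rg.Workspace(name=\"community-initiative\", client=client)\nworkspace.create()\n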

dataset = rg.Dataset(\n    name=\"my_first_dataset\",\n    settings=settings,\n    client=client,\n    #workspace=\"argilla\"\n)\ndataset.create()\n

Now you can add records to your dataset. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The mapping parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.

from datasets import load_dataset\n\ndata = load_dataset(\"imdb\", split=\"train[:100]\").to_list()\n\ndataset.records.log(records=data, mapping={\"text\": \"review\"})\n

\ud83c\udf89 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.

"},{"location":"getting_started/quickstart/#next-steps","title":"Next steps","text":"
  • To learn how to create your datasets, workspace, and manage users, check the how-to guides.

  • To learn Argilla with hands-on examples, check the Tutorials section.

  • To further configure your Argilla Space, check the Hugging Face Spaces settings guide.

"},{"location":"how_to_guides/","title":"How-to guides","text":"

These guides provide step-by-step instructions for common scenarios, including detailed explanations and code samples. They are divided into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.

"},{"location":"how_to_guides/#basic","title":"Basic","text":"
  • Manage users and credentials

    Learn what they are and how to manage (create, read and delete) Users in Argilla.

    How-to guide

  • Manage workspaces

    Learn what they are and how to manage (create, read and delete) Workspaces in Argilla.

    How-to guide

  • Create, update, and delete datasets

    Learn what they are and how to manage (create, read and delete) Datasets and customize them using the Settings for Fields, Questions, Metadata and Vectors.

    How-to guide

  • Add, update, and delete records

    Learn what they are and how to add, update and delete the values for a Record, which are made up of Metadata, Vectors, Suggestions and Responses.

    How-to guide

  • Distribute the annotation

    Learn how to use Argilla's automatic TaskDistribution to annotate as a team efficiently.

    How-to guide

  • Annotate a dataset

    Learn how to use the Argilla UI to navigate Datasets and submit Responses.

    How-to guide

  • Query and filter a dataset

    Learn how to query and filter a Dataset.

    How-to guide

  • Import and export datasets and records

    Learn how to export your Dataset or its Records to Python, your local disk, or the Hugging Face Hub.

    How-to guide

"},{"location":"how_to_guides/#advanced","title":"Advanced","text":"
  • Custom fields with layout templates

    Learn how to create CustomFields with HTML, CSS and JavaScript templates.

    How-to guide

  • Use Markdown to format rich content

    Learn how to use Markdown and HTML in TextField to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

    How-to guide

  • Migrate to Argilla V2

    Learn how to migrate Users, Workspaces and Datasets from Argilla V1 to V2.

    How-to guide

"},{"location":"how_to_guides/annotate/","title":"Annotate your dataset","text":"

To experience the UI features firsthand, you can take a look at the Demo \u2197.

Argilla UI offers many functions to help you manage your annotation workflow, aiming to provide the most flexible approach to fit the wide variety of use cases handled by the community.

"},{"location":"how_to_guides/annotate/#annotation-interface-overview","title":"Annotation interface overview","text":""},{"location":"how_to_guides/annotate/#flexible-layout","title":"Flexible layout","text":"

The UI is responsive with two columns for larger devices and one column for smaller devices. This enables you to annotate data using your mobile phone for simple datasets (i.e., not very long text and 1-2 questions) or resize your screen to get a more compact UI.

  • Header: At the right side of the navigation breadcrumb, you can customize the dataset settings and edit your profile.

  • Left pane: This area displays the control panel on the top. The control panel is used for performing keyword-based search, applying filters, and sorting the results. Below the control panel, the record card(s) are displayed one by one (Focus view) or in a vertical list (Bulk view).

  • Right pane: This is where you annotate your dataset. Simply fill it out as a form, then choose to Submit, Save as Draft, or Discard.

  • Left bottom panel: This expandable area displays the annotation guidelines. The annotation guidelines can be edited by owner and admin roles in the dataset settings.

  • Right bottom panel: This expandable area displays your annotation progress.

"},{"location":"how_to_guides/annotate/#shortcuts","title":"Shortcuts","text":"

The Argilla UI includes a range of shortcuts. For the main actions (submit, discard, save as draft, and selecting labels), the keys are shown on the corresponding button.

To learn how to move from one question to another or between records using the keyboard, take a look at the table below.

Shortcuts provide a smoother annotation experience, especially with datasets using a single question (Label, MultiLabel, Rating, or Ranking).

Available shortcuts:

  • Activate form: \u21e5 Tab
  • Move between questions: \u2193 Down arrow or \u2191 Up arrow
  • Select and unselect label: 1, 2, 3
  • Move between labels or ranking options: \u21e5 Tab or \u21e7 Shift \u21e5 Tab
  • Select rating and rank: 1, 2, 3
  • Fit span to character selection: Hold \u21e7 Shift
  • Activate text area: \u21e7 Shift \u21b5 Enter
  • Exit text area: Esc
  • Discard: \u232b Backspace
  • Save draft (Mac OS): \u2318 Cmd S
  • Save draft (Other): Ctrl S
  • Submit: \u21b5 Enter
  • Move between pages: \u2192 Right arrow or \u2190 Left arrow

"},{"location":"how_to_guides/annotate/#view-by-status","title":"View by status","text":"

The view selector is set by default on Pending.

If you are starting an annotation effort, all the records are initially kept in the Pending view. Once you start annotating, the records will move to the other queues: Draft, Submitted, Discarded.

  • Pending: The records without a response.
  • Draft: The records with partial responses. They can be submitted or discarded later. You can\u2019t move them back to the pending queue.
  • Discarded: The records may or may not have responses. They can be edited but you can\u2019t move them back to the pending queue.
  • Submitted: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue.

Note

If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed.

Tip

If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response.

"},{"location":"how_to_guides/annotate/#suggestions","title":"Suggestions","text":"

If your dataset includes model predictions, you will see them represented by a sparkle icon \u2728 in the label or value button. We call them \u201cSuggestions\u201d and they appear in the form as pre-filled responses. If confidence scores have been included by the dataset admin, they will be shown alongside the label. Additionally, admins can choose to always show suggested labels at the beginning of the list. This can be configured from the dataset settings.

If you agree with the suggestions, you just need to click on the Submit button, and they will be considered as your response. If the suggestion is incorrect, you can modify it and submit your final response.

"},{"location":"how_to_guides/annotate/#focus-view","title":"Focus view","text":"

This is the default view to annotate your dataset linearly, displaying one record after another.

Tip

You should use this view if you have a large number of required questions or need a strong focus on the record content to be labelled. This is also the recommended view for annotating a dataset sample to avoid potential biases introduced by using filters, search, sorting and bulk labelling.

Once you submit your first response, the next record will appear automatically. To see your submitted response again, just click on Prev.

Navigating through the records

To navigate through the records, you can use the\u00a0Prev, shown as\u00a0<, and\u00a0Next,\u00a0> buttons on top of the record card.

Each time the page is fully refreshed, the records with modified statuses (Pending to Discarded, Pending to Save as Draft, Pending to Submitted) are sent to the corresponding queue. The control panel displays the status selector, which is set to Pending by default.

"},{"location":"how_to_guides/annotate/#bulk-view","title":"Bulk view","text":"

The bulk view is designed to speed up the annotation and get a quick overview of the whole dataset.

The bulk view displays the records in a vertical list. Once this view is active, some functions from the control panel will activate to optimize the view. You can define the number of records to display per page (10, 25, 50, or 100) and whether records are shown with a fixed height (Collapse records) or their natural height (Expand records).

Tip

You should use this to quickly explore a dataset. This view is also recommended if you have a good understanding of the domain and want to apply your knowledge based on things like similarity and keyword search, filters, and suggestion score thresholds. For datasets with a large number of required questions or very long fields, the focus view would be more suitable.

With multiple questions, consider using the bulk view to annotate one question in bulk. Then, you can complete the annotation record by record from the draft queue.

Note

Please note that suggestions are not shown in bulk view (except for Spans) and that you will need to save as a draft when you are not providing responses to all required questions.

"},{"location":"how_to_guides/annotate/#annotation-progress","title":"Annotation progress","text":"

You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed.

You can also track your own progress in real time by expanding the right-bottom panel inside the dataset page. There you can see the number of records for which you have Pending, Draft, Submitted and Discarded responses.

Note

You can also explore the dataset progress from the SDK. Check the Track your team's progress section to learn more.
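
For instance, a minimal sketch of checking progress from the SDK (assuming a dataset named my_dataset) could look like this:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\n# Completed, pending and total record counts\nprint(dataset.progress())\n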

"},{"location":"how_to_guides/annotate/#use-search-filters-and-sort","title":"Use search, filters, and sort","text":"

The UI offers various features designed for data exploration and understanding. Combining these features with bulk labelling can save you and your team hours of time.

Tip

You should use this when you are familiar with your data and have large volumes to annotate based on verified beliefs and experience.

"},{"location":"how_to_guides/annotate/#search","title":"Search","text":"

From the control panel at the top of the left pane, you can search by keyword across the entire dataset. If you have more than one field in your records, you may specify whether the search is to be performed on \u201cAll\u201d fields or on a specific one. Matched results are highlighted in color.

Note

If you introduce more than one keyword, the search will return results where all keywords have a match.

Tip

For more advanced searches, take a look at the advanced queries DSL.

"},{"location":"how_to_guides/annotate/#order-by-record-semantic-similarity","title":"Order by record semantic similarity","text":"

You can retrieve records based on their similarity to another record if vectors have been added to the dataset.

Note

Check these guides to know how to add vectors to your\u00a0dataset and\u00a0records.

To use the search by semantic similarity function, click on Find similar within the record you wish to use as a reference. If multiple vectors are available, select the desired vector. You can also choose whether to retrieve the most or least similar records.

The retrieved records are then ordered by similarity, with the similarity score displayed on each record card.

While the semantic search is active, you can update the selected vector or adjust the order of similarity, and specify the number of desired results.

To cancel the search, click on the cross icon next to the reference record.

"},{"location":"how_to_guides/annotate/#filter-and-sort-by-metadata-responses-and-suggestions","title":"Filter and sort by metadata, responses, and suggestions","text":""},{"location":"how_to_guides/annotate/#filter","title":"Filter","text":"

If the dataset contains metadata, responses and suggestions, click on\u00a0Filter in the control panel to display the available filters. You can select multiple filters and combine them.

Note

Record info including metadata is visible from the ellipsis menu in the record card.

From the Metadata dropdown, type and select the property. You can set a range for integer and float properties, and select specific values for term metadata.

Note

Note that if a metadata property was set to visible_for_annotators=False, it will only appear in the metadata filter for users with the admin or owner role.
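
As a sketch, this visibility is configured on the metadata property itself when defining the dataset settings; the property name here is illustrative:

rg.TermsMetadataProperty(\n    name=\"annotator_group\",  # illustrative name\n    options=[\"group-a\", \"group-b\"],\n    visible_for_annotators=False,  # hidden from the metadata filter for annotators\n)\n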

From the Responses dropdown, type and select the question. You can set a range for rating questions and select specific values for label, multi-label, and span questions.

Note

The text and ranking questions are not available for filtering.

From the Suggestions dropdown, filter the suggestions by Suggestion values, Score, or Agent.

"},{"location":"how_to_guides/annotate/#sort","title":"Sort","text":"

You can sort your records according to one or several attributes.

The insertion time and last update are available for all records.

The suggestion scores, the response and suggestion values for rating questions, and the metadata properties are only available when they have been provided.

"},{"location":"how_to_guides/custom_fields/","title":"Custom fields with layout templates","text":"

This guide demonstrates how to create custom fields in Argilla using HTML, CSS, and JavaScript templates.

Main Class

rg.CustomField(\n    name=\"custom\",\n    title=\"Custom\",\n    template=\"<div>{{record.fields.custom.key}}</div>\",\n    advanced_mode=False,\n    required=True,\n    description=\"Field description\",\n)\n

Check the CustomField - Python Reference to see the attributes, arguments, and methods of the CustomField class in detail.

"},{"location":"how_to_guides/custom_fields/#understanding-the-record-object","title":"Understanding the Record Object","text":"

The record object is the main JavaScript object that contains all the information about the Argilla record in the UI, like fields, metadata, etc. Your template can use this object to display record information within the custom field. For example, you can access the fields of the record by navigating to record.fields.<field_name>, and this generally works the same for metadata, responses, etc.
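
For example, assuming a record with a text field and a source metadata property (both names illustrative), a template could reference them like this:

template = \"\"\"\n<div>{{record.fields.text}}</div>\n<div>{{record.metadata.source}}</div>\n\"\"\"\n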

"},{"location":"how_to_guides/custom_fields/#using-handlebars-in-your-template","title":"Using Handlebars in your template","text":"

By default, custom fields will use the Handlebars syntax engine to render templates with record information. This engine converts the content inside the brackets {{}} to the values of the record's fields object that you reference within your template. As described in the Understanding the Record Object section, you can access the fields of the record by navigating to {{record.fields.<field_name>}}. For more complex use cases, Handlebars offers various expressions, partials, and helpers that you can use to render your data. You can deactivate the Handlebars engine with the advanced_mode=True parameter in CustomField; you will then need to define custom JavaScript to access the record attributes, as described in the Advanced Mode section.

"},{"location":"how_to_guides/custom_fields/#usage-example","title":"Usage example","text":"

Because of the Handlebars syntax engine, we only need to pass the HTML and, optionally, some CSS between <style> tags.

css_template = \"\"\"\n<style>\n#container {\n    display: flex;\n    gap: 10px;\n}\n.column {\n    flex: 1;\n}\n</style>\n\"\"\" # (1)\n\nhtml_template = \"\"\"\n<div id=\"container\">\n    <div class=\"column\">\n        <h3>Original</h3>\n        <img src=\"{{record.fields.image.original}}\" />\n    </div>\n    <div class=\"column\">\n        <h3>Revision</h3>\n        <img src=\"{{record.fields.image.revision}}\" />\n    </div>\n</div>\n\"\"\" # (2)\n
  1. This is a CSS template, which ensures that the container and columns are styled.
  2. This is an HTML template, which creates a container with two columns and injects the value corresponding to the key of the image field into it.

We can now pass these templates to the CustomField class.

import argilla as rg\n\ncustom_field = rg.CustomField(\n    name=\"image\",\n    template=css_template + html_template,\n)\n\nsettings = rg.Settings(\n    fields=[custom_field],\n    questions=[rg.TextQuestion(name=\"response\")],\n)\n\ndataset = rg.Dataset(\n    name=\"custom_field_dataset\",\n    settings=settings,\n).create()\n\ndataset.records.log([\n    rg.Record(\n        fields={\n            \"image\": {\n                \"original\": \"https://argilla.io/brand-assets/argilla/argilla-logo-color-black.png\",\n                \"revision\": \"https://argilla.io/brand-assets/argilla/argilla-logo-black.png\",\n            }\n        }\n    )]\n)\n

The result will be the following:

"},{"location":"how_to_guides/custom_fields/#example-gallery","title":"Example Gallery","text":"Metadata in a table

You can make it easier to read metadata by displaying it in a table. This uses handlebars to iterate over the metadata object and display each key-value pair in a row.

template = \"\"\"\n<style>\n    .container {\n        border: 1px solid #ddd;\n        font-family: sans-serif;\n    }\n    .row {\n        display: flex;\n        border-bottom: 1px solid #ddd;\n    }\n    .row:last-child {\n        border-bottom: none;\n    }\n    .column {\n        flex: 1;\n        padding: 8px;\n    }\n    .column:first-child {\n        border-right: 1px solid #ddd;\n    }\n</style>\n<div class=\"container\">\n    <div class=\"header\">\n        <div class=\"column\">Metadata</div>\n        <div class=\"column\">Value</div>\n    </div>\n    {{#each record.metadata}}\n    <div class=\"row\">\n        <div class=\"column\">{{@key}}</div>\n        <div class=\"column\">{{this}}</div>\n    </div>\n    {{/each}}\n</div>\n\"\"\"\nrecord = rg.Record(\n    fields={\"text\": \"hello\"},\n    metadata={\n        \"name\": \"John Doe\",\n        \"age\": 25,\n    }\n)\n

JSON viewer

The value of a custom field is a dictionary in Python and a JavaScript object in the browser. You can render this object as a JSON string using the json helper. This is implemented in Argilla's frontend for convenience. If you want to learn more about handlebars helpers, you can check the handlebars documentation.

template = \"{{ json record.fields.user_profile }}\"\n\nrecord = rg.Record(\n    fields={\n        \"user_profile\": {\n            \"name\": \"John Doe\",\n            \"age\": 30,\n            \"address\": \"123 Main St\",\n            \"email\": \"john.doe@hooli.com\",\n        }\n    },\n)\n
"},{"location":"how_to_guides/custom_fields/#advanced-mode","title":"Advanced Mode","text":"

When advanced_mode=True, you can use the template argument to pass a full HTML page. This allows for more complex customizations, including the use of JavaScript. The record object will be available in the global scope, so you can access it in your JavaScript code as described in the Understanding the Record Object section.

"},{"location":"how_to_guides/custom_fields/#usage-example_1","title":"Usage example","text":"

Let's reproduce the example from the Without advanced mode section, but this time we will insert the Handlebars syntax engine into the template ourselves.

template = \"\"\"\n<div id=\"content\"></div>\n<script id=\"template\" type=\"text/x-handlebars-template\">\n    <style>\n    #container {\n        display: flex;\n        gap: 10px;\n    }\n    .column {\n        flex: 1;\n    }\n    </style>\n    <div id=\"container\">\n        <div class=\"column\">\n            <h3>Original</h3>\n            <img src=\"{{record.fields.image.original}}\" />\n        </div>\n        <div class=\"column\">\n            <h3>Revision</h3>\n            <img src=\"{{record.fields.image.revision}}\" />\n        </div>\n    </div>\n</script>\n\"\"\" # (1)\n\nscript = \"\"\"\n<script src=\"https://cdn.jsdelivr.net/npm/handlebars@latest/dist/handlebars.js\"></script>\n<script>\n    const template = document.getElementById(\"template\").innerHTML;\n    const compiledTemplate = Handlebars.compile(template);\n    const html = compiledTemplate({ record });\n    document.getElementById(\"content\").innerHTML = html;\n</script>\n\"\"\" # (2)\n
  1. This is a JavaScript template script. We set id to template to use it later in our JavaScript code and type to text/x-handlebars-template to indicate that this is a Handlebars template. Note that we also added a div with id to content to render the template into.
  2. This is a JavaScript template script. We load the Handlebars library and then use it to compile the template and render the record. Eventually, we render the result into the div with id to content.

We can now pass these templates to the CustomField class, ensuring that the advanced_mode is set to True.

import argilla as rg\n\ncustom_field = rg.CustomField(\n    name=\"image\",\n    template=template + script,\n    advanced_mode=True\n)\n

Besides the new CustomField code above, you can reuse the same approach as in the Using Handlebars in your template section to create a dataset and log a record to it, yielding the same result.

"},{"location":"how_to_guides/custom_fields/#example-gallery_1","title":"Example Gallery","text":"3D object viewer

We will now use native JavaScript and three.js to create a 3D object viewer. We will then use the record object directly to insert URLs from the record's fields.

template = \"\"\"\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js\"></script>\n<script src=\"https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/loaders/GLTFLoader.js\"></script>\n<script src=\"https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/controls/OrbitControls.js\"></script>\n\n\n<div style=\"display: flex;\">\n    <div>\n        <h3>Option A</h3>\n        <canvas id=\"canvas1\" width=\"400\" height=\"400\"></canvas>\n    </div>\n    <div>\n        <h3>Option B</h3>\n        <canvas id=\"canvas2\" width=\"400\" height=\"400\"></canvas>\n    </div>\n</div>\n\n<script>\n    function init(canvasId, modelUrl) {\n    let scene, camera, renderer, controls;\n\n    const canvas = document.getElementById(canvasId);\n    scene = new THREE.Scene();\n    camera = new THREE.PerspectiveCamera(75, 1, 0.1, 1000);\n    renderer = new THREE.WebGLRenderer({ canvas, alpha: true });\n\n    renderer.setSize(canvas.clientWidth, canvas.clientHeight);\n\n    const directionalLight = new THREE.DirectionalLight(0xffffff, 1);\n    directionalLight.position.set(2, 2, 5);\n    scene.add(directionalLight);\n\n    const ambientLight = new THREE.AmbientLight(0x404040, 7);\n    scene.add(ambientLight);\n\n    controls = new THREE.OrbitControls(camera, renderer.domElement);\n    controls.maxPolarAngle = Math.PI / 2;\n\n    const loader = new THREE.GLTFLoader();\n    loader.load(\n        modelUrl,\n        function (gltf) {\n        const model = gltf.scene;\n        scene.add(model);\n        model.position.set(0, 0, 0);\n\n        const box = new THREE.Box3().setFromObject(model);\n        const center = box.getCenter(new THREE.Vector3());\n        model.position.sub(center);\n        camera.position.set(center.x, center.y, center.z + 1.2);\n\n        animate();\n        },\n        undefined,\n        function (error) {\n        console.error(error);\n        }\n    );\n\n    function animate() {\n        requestAnimationFrame(animate);\n        controls.update();\n        renderer.render(scene, camera);\n    }\n    }\n\n    init(\"canvas1\", record.fields.object.option_a);\n    init(\"canvas2\", record.fields.object.option_b);\n</script>\n\n\"\"\"\n

Next, we will create a record with two URLs to 3D objects from the 3d-arena dataset.

record = rg.Record(\n    fields={\n        \"object\": {\n            \"option_a\": \"https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/outputs/Strawb3rry/a_bookshelf_with_ten_books_stacked_vertically.glb\",\n            \"option_b\": \"https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/outputs/MeshFormer/a_bookshelf_with_ten_books_stacked_vertically.glb\",\n        }\n    }\n)\n

"},{"location":"how_to_guides/custom_fields/#updating-templates","title":"Updating templates","text":"

As described in the dataset guide, you can update certain settings attributes for a published dataset. This includes the custom field templates, which is a useful feature when you want to iterate on the template of a custom field without needing to create a new dataset. The following example shows how to update the template of a custom field.

dataset.settings.fields[\"custom\"].template = \"<new-template>\"\ndataset.update()\n
"},{"location":"how_to_guides/dataset/","title":"Dataset management","text":"

This guide provides an overview of datasets, explaining the basics of how to set them up and manage them in Argilla.

A dataset is a collection of records that you can configure for labelers to provide feedback using the UI. Depending on the specific requirements of your task, you may need various types of feedback. You can customize the dataset to include different kinds of questions, so the first step will be to define the aim of your project and the kind of data and feedback you will need. With this information, you can start configuring a dataset by defining fields, questions, metadata, vectors, and guidelines through settings.

Question: Who can manage datasets?

Only users with the owner role can manage (create, retrieve, update and delete) all the datasets.

The users with the admin role can manage (create, retrieve, update and delete) the datasets in the workspaces they have access to.

Main Classes

rg.Datasetrg.Settings
rg.Dataset(\n    name=\"name\",\n    workspace=\"workspace\",\n    settings=settings,\n    client=client\n)\n

Check the Dataset - Python Reference to see the attributes, arguments, and methods of the Dataset class in detail.

rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        )\n    ],\n    metadata=[rg.TermsMetadataProperty(name=\"metadata\")],\n    vectors=[rg.VectorField(name=\"vector\", dimensions=10)],\n    guidelines=\"guidelines\",\n    allow_extra_metadata=True,\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n

Check the Settings - Python Reference to see the attributes, arguments, and methods of the Settings class in detail.

"},{"location":"how_to_guides/dataset/#create-a-dataset","title":"Create a dataset","text":"

To create a dataset, you can define it in the Dataset class and then call the create method that will send the dataset to the server so that it can be visualized in the UI. If the dataset does not appear in the UI, you may need to click the refresh button to update the view. For further configuration of the dataset, you can refer to the settings section.

Info

If you have deployed Argilla with Hugging Face Spaces and HF Sign in, you can use argilla as a workspace name. Otherwise, you might need to create a workspace following this guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    settings=settings,\n)\n\ndataset.create()\n

The created dataset will be empty. To add records, go to this how-to guide.

Accessing attributes

Access the attributes of a dataset by calling them directly on the dataset object. For example, dataset.id, dataset.name or dataset.settings. You can similarly access the fields, questions, metadata, vectors and guidelines. For instance, dataset.fields or dataset.questions.
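
For example, a quick sketch of attribute access (assuming a dataset named my_dataset already exists):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\nprint(dataset.id)\nprint(dataset.name)\nprint(dataset.settings)\nprint(dataset.fields)\nprint(dataset.questions)\n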

"},{"location":"how_to_guides/dataset/#create-multiple-datasets-with-the-same-settings","title":"Create multiple datasets with the same settings","text":"

To create multiple datasets with the same settings, define the settings object once and pass it to each dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[rg.TextField(name=\"text\", use_markdown=True)],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=[\"label_1\", \"label_2\", \"label_3\"])\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3),\n)\n\ndataset1 = rg.Dataset(name=\"my_dataset_1\", settings=settings)\ndataset2 = rg.Dataset(name=\"my_dataset_2\", settings=settings)\n\n# Create the datasets on the server\ndataset1.create()\ndataset2.create()\n
"},{"location":"how_to_guides/dataset/#create-a-dataset-from-an-existing-dataset","title":"Create a dataset from an existing dataset","text":"

To create a new dataset from an existing dataset, get the settings from the existing dataset and pass them to the new dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nexisting_dataset = client.datasets(\"my_dataset\")\n\nnew_dataset = rg.Dataset(name=\"my_dataset_copy\", settings=existing_dataset.settings)\n\nnew_dataset.create()\n

Info

You can also copy the records from the original dataset to the new one:

records = list(existing_dataset.records)\nnew_dataset.records.log(records)\n
"},{"location":"how_to_guides/dataset/#define-dataset-settings","title":"Define dataset settings","text":"

Tip

Instead of defining your own custom settings, you can use some of our pre-built templates for text classification, ranking and rating. Learn more here.

"},{"location":"how_to_guides/dataset/#fields","title":"Fields","text":"

The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla supports plain text and markdown through the TextField, images through the ImageField, chat formatted data through the ChatField and full custom templates through our CustomField.

Note

The order of the fields in the UI follows the order in which these are added to the fields attribute in the Python SDK.

Check the Field - Python Reference to see the field classes in detail.

TextImageChatCustom

rg.TextField(\n    name=\"text\",\n    title=\"Text\",\n    use_markdown=False,\n    required=True,\n    description=\"Field description\",\n)\n

rg.ImageField(\n    name=\"image\",\n    title=\"Image\",\n    required=True,\n    description=\"Field description\",\n)\n

rg.ChatField(\n    name=\"chat\",\n    title=\"Chat\",\n    use_markdown=True,\n    required=True,\n    description=\"Field description\",\n)\n

A CustomField allows you to use a custom template for the field. This is useful if you want to use a custom UI for the field. You can use the template argument to pass a string that will be rendered as the field's UI.

By default, advanced_mode=False, which will use a brackets syntax engine for the templates. This engine converts {{record.fields.field.key}} to the values of the record's fields object. You can also use advanced_mode=True, which deactivates the brackets syntax engine and allows you to add custom JavaScript to your template to render the field.

rg.CustomField(\n    name=\"custom\",\n    title=\"Custom\",\n    template=\"<div>{{record.fields.custom.key}}</div>\",\n    advanced_mode=False,\n    required=True,\n    description=\"Field description\",\n)\n

Tip

To learn more about how to create custom fields with HTML and CSS templates, check this how-to guide.

"},{"location":"how_to_guides/dataset/#questions","title":"Questions","text":"

To collect feedback for your dataset, you need to formulate questions that annotators will be asked to answer.

Check the Questions - Python Reference to see the question classes in detail.

LabelMulti-labelRankingRatingSpanText

A LabelQuestion asks annotators to choose a unique label from a list of options. This type is useful for text classification tasks. In the UI, they will have a rounded shape.

rg.LabelQuestion(\n    name=\"label\",\n    labels={\"YES\": \"Yes\", \"NO\": \"No\"}, # or [\"YES\", \"NO\"]\n    title=\"Is the response relevant for the given prompt?\",\n    description=\"Select the one that applies.\",\n    required=True,\n    visible_labels=10\n)\n

A MultiLabelQuestion asks annotators to choose all applicable labels from a list of options. This type is useful for multi-label text classification tasks. In the UI, they will have a squared shape.

rg.MultiLabelQuestion(\n    name=\"multi_label\",\n    labels={\n        \"hate\": \"Hate Speech\",\n        \"sexual\": \"Sexual content\",\n        \"violent\": \"Violent content\",\n        \"pii\": \"Personal information\",\n        \"untruthful\": \"Untruthful info\",\n        \"not_english\": \"Not English\",\n        \"inappropriate\": \"Inappropriate content\"\n    }, # or [\"hate\", \"sexual\", \"violent\", \"pii\", \"untruthful\", \"not_english\", \"inappropriate\"]\n    title=\"Does the response include any of the following?\",\n    description=\"Select all that apply.\",\n    required=True,\n    visible_labels=10,\n    labels_order=\"natural\"\n)\n

A RankingQuestion asks annotators to order a list of options. It is useful to gather information on the preference or relevance of a set of options.

rg.RankingQuestion(\n    name=\"ranking\",\n    values={\n        \"reply-1\": \"Reply 1\",\n        \"reply-2\": \"Reply 2\",\n        \"reply-3\": \"Reply 3\"\n    }, # or [\"reply-1\", \"reply-2\", \"reply-3\"]\n    title=\"Order replies based on your preference\",\n    description=\"1 = best, 3 = worst. Ties are allowed.\",\n    required=True,\n)\n

A RatingQuestion asks annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.

rg.RatingQuestion(\n    name=\"rating\",\n    values=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n    title=\"How satisfied are you with the response?\",\n    description=\"1 = very unsatisfied, 10 = very satisfied\",\n    required=True,\n)\n

A SpanQuestion asks annotators to select a portion of the text of a specific field and apply a label to it. This type of question is useful for named entity recognition or information extraction tasks.

rg.SpanQuestion(\n    name=\"span\",\n    field=\"text\",\n    labels={\n        \"PERSON\": \"Person\",\n        \"ORG\": \"Organization\",\n        \"LOC\": \"Location\",\n        \"MISC\": \"Miscellaneous\"\n    }, # or [\"PERSON\", \"ORG\", \"LOC\", \"MISC\"]\n    title=\"Select the entities in the text\",\n    description=\"Select the entities in the text\",\n    required=True,\n    allow_overlapping=False,\n    visible_labels=10\n)\n

A TextQuestion offers annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

rg.TextQuestion(\n    name=\"text\",\n    title=\"Please provide feedback on the response\",\n    description=\"Please provide feedback on the response\",\n    required=True,\n    use_markdown=True\n)\n

"},{"location":"how_to_guides/dataset/#metadata","title":"Metadata","text":"

Metadata properties allow you to configure the use of metadata information for the filtering and sorting features available in the UI and Python SDK.

Check the Metadata - Python Reference to see the metadata classes in detail.

TermsIntegerFloat

A TermsMetadataProperty allows you to add a list of strings as metadata options.

rg.TermsMetadataProperty(\n    name=\"terms\",\n    options=[\"group-a\", \"group-b\", \"group-c\"],\n    title=\"Annotation groups\",\n    visible_for_annotators=True,\n)\n

An IntegerMetadataProperty allows you to add integer values as metadata.

rg.IntegerMetadataProperty(\n    name=\"integer\",\n    title=\"length-input\",\n    min=42,\n    max=1984,\n)\n

A FloatMetadataProperty allows you to add float values as metadata.

rg.FloatMetadataProperty(\n    name=\"float\",\n    title=\"Reading ease\",\n    min=-92.29914,\n    max=119.6975,\n)\n

Note

You can also set the allow_extra_metadata argument in the dataset to True to specify whether the dataset will allow metadata fields in the records other than those specified under metadata. Note that these will not be accessible from the UI for any user, only retrievable using the Python SDK.
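
As a minimal sketch (field, question, and metadata names illustrative), allow_extra_metadata is set at the settings level:

import argilla as rg\n\nsettings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"response\")],\n    metadata=[rg.TermsMetadataProperty(name=\"terms\")],\n    allow_extra_metadata=True,  # records may carry extra metadata keys, not visible in the UI\n)\n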

"},{"location":"how_to_guides/dataset/#vectors","title":"Vectors","text":"

To use the similarity search in the UI and the Python SDK, you will need to configure vectors using the VectorField class.

Check the Vector - Python Reference to see the VectorField class in detail.

rg.VectorField(\n    name=\"my_vector\",\n    title=\"My Vector\",\n    dimensions=768\n)\n

"},{"location":"how_to_guides/dataset/#guidelines","title":"Guidelines","text":"

Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

  • In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK. They will appear in the annotation interface.

guidelines = \"In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. [...]\"\n

  • As question descriptions: these are added as an argument when you create questions in the Python SDK. This text will appear in a tooltip next to the question in the UI.
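
For example, a question description could look like this (question name and wording illustrative):

rg.TextQuestion(\n    name=\"corrected_text\",  # illustrative name\n    description=\"Provide a corrected version of the response, keeping the original meaning.\",\n)\n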

It is good practice to use at least the dataset guidelines if not both methods. Question descriptions should be short and provide context to a specific question. They can be a summary of the guidelines to that question, but often that is not sufficient to align the whole annotation team. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc.

Tip

If you want further guidance on good practices for guidelines during the project development, check our blog post.

"},{"location":"how_to_guides/dataset/#distribution","title":"Distribution","text":"

When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the\u00a0TaskDistribution settings to configure the number of minimum submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues.

Check the Task Distribution - Python Reference to see the TaskDistribution class in detail.

rg.TaskDistribution(\n    min_submitted = 2\n)\n

To learn more about how to distribute the task among team members, check the Distribute the annotation guide.

"},{"location":"how_to_guides/dataset/#list-datasets","title":"List datasets","text":"

You can list all the datasets available in a workspace using the datasets attribute of the Workspace class. You can also use len(workspace.datasets) to get the number of datasets in a workspace.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndatasets = workspace.datasets\n\nfor dataset in datasets:\n    print(dataset)\n

When you list datasets, dataset settings are not preloaded, since this can introduce extra requests to the server. If you want to work with settings when listing datasets, you need to load them:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nfor dataset in client.datasets:\n    dataset.settings.get() # this will get the dataset settings from the server\n    print(dataset.settings)\n

Notebooks

When using a notebook, executing client.datasets will display a table with the name of the existing datasets, their id, the workspace_id to which they belong, and the last update as updated_at.

"},{"location":"how_to_guides/dataset/#retrieve-a-dataset","title":"Retrieve a dataset","text":"

You can retrieve a dataset by calling the datasets method on the Argilla class and passing the name or id of the dataset as an argument. If the dataset does not exist, a warning message will be raised and None will be returned.

By nameBy id

By default, this method attempts to retrieve the dataset from the first workspace. If the dataset is in a different workspace, you must specify either the workspace or workspace name as an argument.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Retrieve the dataset from the first workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\")\n\n# Retrieve the dataset from the specified workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/dataset/#check-dataset-existence","title":"Check dataset existence","text":"

You can check if a dataset exists. The client.datasets method will return None if the dataset was not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nif dataset is not None:\n    pass\n
"},{"location":"how_to_guides/dataset/#update-a-dataset","title":"Update a dataset","text":"

Once a dataset is published, there are limited things you can update. Here is a summary of the attributes you can change for each setting:

FieldsQuestionsMetadataVectorsGuidelinesDistribution

  • Fields (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Required \u274c/\u274c, Use markdown \u2705/\u2705, Template \u2705/\u274c
  • Questions (From SDK / From UI): Name \u274c/\u274c, Title \u274c/\u2705, Description \u274c/\u2705, Required \u274c/\u274c, Labels \u274c/\u274c, Values \u274c/\u274c, Label order \u274c/\u2705, Suggestions first \u274c/\u2705, Visible labels \u274c/\u2705, Field \u274c/\u274c, Allow overlapping \u274c/\u274c, Use markdown \u274c/\u2705
  • Metadata (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Options \u274c/\u274c, Minimum value \u274c/\u274c, Maximum value \u274c/\u274c, Visible for annotators \u2705/\u2705, Allow extra metadata \u2705/\u2705
  • Vectors (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Dimensions \u274c/\u274c
  • Guidelines: From SDK \u2705, From UI \u2705
  • Distribution (From SDK / From UI): Minimum submitted \u2705/\u2705

To modify these attributes, you can simply set the new value of the attributes you wish to change and call the update method on the Dataset object.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.fields[\"text\"].use_markdown = True\ndataset.settings.metadata[\"my_metadata\"].visible_for_annotators = False\n\ndataset.update()\n

You can also add and delete metadata properties and vector fields using the add and delete methods.

AddDelete
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors.add(rg.VectorField(name=\"my_new_vector\", dimensions=123))\ndataset.settings.metadata.add(\n    rg.TermsMetadataProperty(\n        name=\"my_new_metadata\",\n        options=[\"option_1\", \"option_2\", \"option_3\"],\n    ),\n)\ndataset.update()\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors[\"my_old_vector\"].delete()\ndataset.settings.metadata[\"my_old_metadata\"].delete()\n\ndataset.update()\n
"},{"location":"how_to_guides/dataset/#delete-a-dataset","title":"Delete a dataset","text":"

You can delete an existing dataset by calling the delete method on the Dataset class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset_to_delete = client.datasets(name=\"my_dataset\")\n\ndataset_deleted = dataset_to_delete.delete()\n
"},{"location":"how_to_guides/distribution/","title":"Distribute the annotation task among the team","text":"

This guide explains how you can use Argilla\u2019s automatic task distribution to efficiently divide the task of annotating a dataset among multiple team members.

Owners and admins can define the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle the records that will be shown in the pending queues of all users with access to the dataset.

When a record has met the minimum number of submissions, the status of the record will change to completed, and the record will be removed from the Pending queue of all team members so they can focus on providing responses where they are most needed. The dataset\u2019s annotation task will be fully completed once all records have the completed status.

Note

The status of a record can be either completed, when it has the required number of responses with submitted status, or pending, when it doesn\u2019t meet this requirement.

Each record can have multiple responses, and each of those can have the status submitted, discarded, or draft.

Main Class

rg.TaskDistribution(\n    min_submitted = 2\n)\n

Check the Task Distribution - Python Reference to see the attributes, arguments, and methods of the TaskDistribution class in detail.

"},{"location":"how_to_guides/distribution/#configure-task-distribution-settings","title":"Configure task distribution settings","text":"

By default, Argilla will set the required minimum submitted responses to 1. This means that whenever a record has at least 1 response with the status submitted, the status of the record will change to completed and the record will be removed from the Pending queue of other team members.

Tip

Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

If you wish to set a different number, you can do so through the distribution setting in your dataset settings:

settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n

Learn more about configuring dataset settings in the Dataset management guide.

Tip

Increase the number of minimum submissions if you\u2019d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.

Note

Note that some records may have more responses than expected if multiple team members submit responses on the same record simultaneously.

"},{"location":"how_to_guides/distribution/#change-task-distribution-settings","title":"Change task distribution settings","text":"

If you wish to change the minimum submitted responses required in a dataset, you can do so as long as the annotation hasn\u2019t started, i.e., the dataset has no responses for any records.

Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.distribution.min_submitted = 4\n\ndataset.update()\n
"},{"location":"how_to_guides/distribution/#track-your-teams-progress","title":"Track your team's progress","text":"

You can check the progress of the annotation task by using the dataset.progress method. This method will return the number of records that have the status completed and pending, as well as the total number of records in the dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\nprogress = dataset.progress()\n
{\n    \"total\": 100,\n    \"completed\": 10,\n    \"pending\": 90\n}\n

You can also include the users' distribution in the progress by setting the with_users_distribution parameter to True. This will return the number of records that have the status completed and pending, the total number of records in the dataset, and the number of completed submissions per user. You can visit the Annotation Progress section for more information.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\nprogress = dataset.progress(with_users_distribution=True)\n
{\n    \"total\": 100,\n    \"completed\": 50,\n    \"pending\": 50,\n    \"users\": {\n        \"user1\": {\n           \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n           \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n        },\n        \"user2\": {\n           \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n           \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n        },\n        ...\n}\n

Note

Since the completed records can contain submissions from multiple users, the number of completed submissions per user may not match the total number of completed records.

"},{"location":"how_to_guides/import_export/","title":"Importing and exporting datasets and records","text":"

This guide provides an overview of how to import and export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

In Argilla, you can import/export two main components of a dataset:

  • The dataset's complete configuration is defined in rg.Settings. This is useful if you want to share your feedback task or restore it later in Argilla.
  • The records stored in the dataset, including Metadata, Vectors, Suggestions, and Responses. This is useful if you want to use your dataset's records outside of Argilla.


Main Classes

rg.Dataset.to_hubrg.Dataset.from_hubrg.Dataset.to_diskrg.Dataset.from_diskrg.Dataset.records.to_datasets()rg.Dataset.records.to_dict()rg.Dataset.records.to_list()
rg.Dataset.to_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    with_records=True,\n    generate_card=True\n)\n
rg.Dataset.from_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
rg.Dataset.to_disk(\n    path=\"<path-empty-directory>\",\n    with_records=True\n)\n
rg.Dataset.from_disk(\n    path=\"<path-dataset-directory>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
rg.Dataset.records.to_datasets()\n
rg.Dataset.records.to_dict()\n
rg.Dataset.records.to_list()\n

Check the Dataset - Python Reference to see the attributes, arguments, and methods of the export Dataset class in detail.

Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

"},{"location":"how_to_guides/import_export/#importing-and-exporting-datasets","title":"Importing and exporting datasets","text":"

First, we will go through exporting a complete dataset from Argilla. This includes the dataset's settings and records. All of these methods use the rg.Dataset.from_* and rg.Dataset.to_* methods.

"},{"location":"how_to_guides/import_export/#hugging-face-hub","title":"Hugging Face Hub","text":""},{"location":"how_to_guides/import_export/#export-to-hub","title":"Export to Hub","text":"

You can push a dataset from Argilla to the Hugging Face Hub. This is useful if you want to share your dataset with the community or version control it. You can push the dataset to the Hugging Face Hub using the rg.Dataset.to_hub method.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_hub(repo_id=\"<my_org>/<my_dataset>\")\n

With or without records

The example above will push the dataset's Settings and records to the hub. If you only want to push the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

dataset.to_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n
"},{"location":"how_to_guides/import_export/#import-from-hub","title":"Import from Hub","text":"

You can pull a dataset from the Hugging Face Hub to Argilla. This is useful if you want to restore a dataset and its configuration. You can pull the dataset from the Hugging Face Hub using the rg.Dataset.from_hub method.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\")\n

The rg.Dataset.from_hub method loads the configuration and records from the dataset repo. If you only want to load records, you can pass a datasets.Dataset object to the rg.Dataset.records.log method. This enables you to configure your own dataset and reuse existing Hub datasets. See the guide on records for more information.

With or without records

The example above will pull the dataset's Settings and records from the hub. If you only want to pull the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the records.

dataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n

You could then load the dataset's records using the load_dataset method of the datasets package and pass them to the rg.Dataset.records.log method.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"<my_org>/<my_dataset>\")\ndataset.records.log(hf_dataset) # (1)\n
  1. You could also use the mapping parameter to map record field names to argilla field and question names.
"},{"location":"how_to_guides/import_export/#import-settings-from-hub","title":"Import settings from Hub","text":"

When importing datasets from the hub, Argilla will load settings from the hub in three ways:

  1. If the dataset was pushed to hub by Argilla, then the settings will be loaded from the hub via the configuration file.
  2. If the dataset was loaded by another source, then Argilla will define the settings based on the dataset's features in datasets.Features. For example, creating a TextField for a text feature or a LabelQuestion for a label class.
  3. You can pass a custom rg.Settings object to the rg.Dataset.from_hub method via the settings parameter. This will override the settings loaded from the hub.
settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"answer\")]\n) # (1)\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", settings=settings)\n
  1. The settings that you pass to the rg.Dataset.from_hub method will override the settings loaded from the hub, and need to align with the dataset being loaded.
"},{"location":"how_to_guides/import_export/#local-disk","title":"Local Disk","text":""},{"location":"how_to_guides/import_export/#export-to-disk","title":"Export to Disk","text":"

You can save a dataset from Argilla to your local disk. This is useful if you want to back up your dataset. You can use the rg.Dataset.to_disk method. We recommend using an empty directory.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_disk(path=\"<path-empty-directory>\")\n

This will save the dataset's configuration and records to the specified path. If you only want to save the dataset's configuration, you can set the with_records parameter to False.

dataset.to_disk(path=\"<path-empty-directory>\", with_records=False)\n
"},{"location":"how_to_guides/import_export/#import-from-disk","title":"Import from Disk","text":"

You can load a dataset from your local disk to Argilla. This is useful if you want to restore a dataset's configuration. You can use the rg.Dataset.from_disk method.

import argilla as rg\n\ndataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\")\n

Directing the dataset to a name and workspace

You can also specify the name and workspace of the dataset when loading it from the disk.

dataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\", name=\"my_dataset\", workspace=\"my_workspace\")\n
"},{"location":"how_to_guides/import_export/#importing-and-exporting-records","title":"Importing and exporting records","text":"

The records alone can be exported from a dataset in Argilla. This is useful if you want to process the records in Python, export them to a different platform, or use them in model training. All of these methods use the rg.Dataset.records attribute.

"},{"location":"how_to_guides/import_export/#export-records","title":"Export records","text":"

The records can be exported as a dictionary, a list of dictionaries, or a Dataset of the datasets package.

With images

If your dataset includes images, the recommended approach for exporting records is to use the to_datasets method, which exports the images as rescaled PIL objects. With other methods, the images will be exported using the data URI schema.

To a python dictionaryTo a python listTo the datasets package

Records can be exported from Dataset.records as a dictionary. The to_dict method can be used to export records as a dictionary. You can specify the orientation of the dictionary output. You can also decide whether or not to flatten the dictionary.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a dictionary\nexported_records = dataset.records.to_dict()\n# {'fields': [{'text': 'Hello'},{'text': 'World'}], suggestions': [{'label': {'value': 'positive'}}, {'label': {'value': 'negative'}}]\n\n# Export records as a dictionary with orient=index\nexported_records = dataset.records.to_dict(orient=\"index\")\n# {\"uuid\": {'fields': {'text': 'Hello'}, 'suggestions': {'label': {'value': 'positive'}}}, {\"uuid\": {'fields': {'text': 'World'}, 'suggestions': {'label': {'value': 'negative'}}},\n\n# Export records as a dictionary with flatten=True\nexported_records = dataset.records.to_dict(flatten=True)\n# {\"text\": [\"Hello\", \"World\"], \"label.suggestion\": [\"greeting\", \"greeting\"]}\n

Records can be exported from Dataset.records as a list of dictionaries. The to_list method can be used to export records as a list of dictionaries. You can decide whether or not to flatten it.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=workspace)\n\n# Export records as a list of dictionaries\nexported_records = dataset.records.to_list()\n# [{'fields': {'text': 'Hello'}, 'suggestion': {'label': {value: 'greeting'}}}, {'fields': {'text': 'World'}, 'suggestion': {'label': {value: 'greeting'}}}]\n\n# Export records as a list of dictionaries with flatten=True\nexported_records = dataset.records.to_list(flatten=True)\n# [{\"text\": \"Hello\", \"label\": \"greeting\"}, {\"text\": \"World\", \"label\": \"greeting\"}]\n

Records can be exported from Dataset.records to the datasets package. The to_datasets method can be used to export records to the datasets package. You can specify the name of the dataset and the split to export the records.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a datasets.Dataset\nexported_dataset = dataset.records.to_datasets()\n
"},{"location":"how_to_guides/import_export/#import-records","title":"Import records","text":"

To import records to a dataset, use the rg.Dataset.records.log method. There is a guide on how to do this in How-to guides - Record, or you can check the Record - Python Reference.
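
As a minimal sketch (field name illustrative), logging records from a list of dictionaries could look like this:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Dictionary keys are matched to the dataset's field and question names\ndataset.records.log([{\"text\": \"Hello world\"}])\n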

"},{"location":"how_to_guides/migrate_from_legacy_datasets/","title":"Migrate users, workspaces and datasets to Argilla 2.x","text":"

This guide will help you migrate task-specific datasets to Argilla V2. These do not include the FeedbackDataset, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the SDK migration blog post. Additionally, we will provide guidance on how to maintain your Users and Workspaces within the new Argilla V2 format.

Note

Legacy datasets include: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be migrated as they are already in the Argilla V2 format. However, since the 2.x version includes changes to the search index structure, you should reindex the datasets by enabling the docker environment variable REINDEX_DATASET (this step is executed automatically if you're running Argilla in an HF Space). See the server configuration docs section for more details.

To follow this guide, you will need to have the following prerequisites:

  • An argilla 1.* server instance running with legacy datasets.
  • An argilla >=1.29 server instance running. If you don't have one, you can create one by following this Argilla guide.
  • The argilla sdk package installed in your environment.

Warning

This guide will recreate all Users and Workspaces on a new server. Hence, they will be created with new passwords and IDs. If you want to keep the same passwords and IDs, you can copy the datasets to a temporary v2 instance, then upgrade your current instance to v2.0 and copy the datasets back to your original instance afterward.

If your current legacy datasets are on a server with an Argilla release after 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.

To migrate the datasets, you will need to install the new argilla package. This includes a new v1 module that allows you to connect to the Argilla V1 server.

pip install \"argilla>=2.0.0\"\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#migrate-users-and-workspaces","title":"Migrate Users and Workspaces","text":"

The guide will take you through two steps:

  1. Retrieve the old users and workspaces from the Argilla V1 server using the new argilla package.
  2. Recreate the users and workspaces on the Argilla V2 server, using the name as the unique identifier.
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-old-users-and-workspaces","title":"Step 1: Retrieve the old users and workspaces","text":"

You can use the v1 module to connect to the Argilla V1 server.

import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

Next, load the Users and Workspaces from the Argilla V1 server:

users_v1 = rg_v1.User.list()\nworkspaces_v1 = rg_v1.Workspace.list()\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-recreate-the-users-and-workspaces","title":"Step 2: Recreate the users and workspaces","text":"

To recreate the users and workspaces on the Argilla V2 server, you can use the argilla package.

First, instantiate the Argilla class to connect to the Argilla V2 server:

import argilla as rg\n\nclient = rg.Argilla()\n

Next, recreate the users and workspaces on the Argilla V2 server:

for workspace in workspaces_v1:\n    rg.Workspace(\n        name=workspace.name\n    ).create()\n
for user in users_v1:\n    user = rg.User(\n        username=user.username,\n        first_name=user.first_name,\n        last_name=user.last_name,\n        role=user.role,\n        password=\"<your_chosen_password>\" # (1)\n    ).create()\n    if user.role == \"owner\":\n       continue\n\n    for workspace_name in user.workspaces:\n        if workspace_name != user.name:\n            workspace = client.workspaces(name=workspace_name)\n            user.add_to_workspace(workspace)\n
  1. You need to choose a new password for the user. To do this programmatically, you can use the uuid package to generate a random password, as sketched below. Take care to keep track of the passwords you choose, since you will not be able to retrieve them later.
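A minimal sketch of that approach, assuming you keep the generated passwords in a local dictionary so you can share them with your users later:

import uuid\n\n# map each username to a freshly generated random password; store this mapping somewhere safe\npasswords = {user.username: uuid.uuid4().hex for user in users_v1}\n\n# then pass passwords[user.username] as the password argument when recreating each user\n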

Now you have successfully migrated your users and workspaces to Argilla V2 and can continue with the next steps.

"},{"location":"how_to_guides/migrate_from_legacy_datasets/#migrate-datasets","title":"Migrate datasets","text":"

The guide will take you through three steps:

  1. Retrieve the legacy dataset from the Argilla V1 server using the new argilla package.
  2. Define the new dataset in the Argilla V2 format.
  3. Upload the dataset records to the new Argilla V2 dataset format and attributes.
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-legacy-dataset","title":"Step 1: Retrieve the legacy dataset","text":"

You can use the v1 module to connect to the Argilla V1 server.

import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

Next, load the dataset settings and records from the Argilla V1 server:

dataset_name = \"news-programmatic-labeling\"\nworkspace = \"demo\"\n\nsettings_v1 = rg_v1.load_dataset_settings(dataset_name, workspace)\nrecords_v1 = rg_v1.load(dataset_name, workspace)\nhf_dataset = records_v1.to_datasets()\n

Your legacy dataset is now loaded into the hf_dataset object.

"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-define-the-new-dataset","title":"Step 2: Define the new dataset","text":"

Define the new dataset in the Argilla V2 format. The new dataset format is defined in the argilla package. You can create a new dataset with the Settings and Dataset classes:

First, instantiate the Argilla class to connect to the Argilla V2 server:

import argilla as rg\n\nclient = rg.Argilla()\n

Next, define the new dataset settings:

For single-label classification | For multi-label classification | For token classification | For text generation
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
  1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.
  2. Make sure you provide all relevant metadata fields available in the dataset.
  3. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.MultiLabelQuestion(name=\"labels\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
  1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.
  2. Make sure you provide all relevant metadata fields available in the dataset.
  3. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.SpanQuestion(name=\"spans\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
  1. Make sure you provide all relevant metadata fields available in the dataset.
  2. Make sure you provide all relevant vectors available in the dataset.
settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"text_generation\"),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
  1. Make sure you provide all relevant metadata fields available in the dataset.
  2. Make sure you provide all relevant vectors available in the dataset.

Finally, create the new dataset on the Argilla V2 server:

dataset = rg.Dataset(name=dataset_name, workspace=workspace, settings=settings)\ndataset.create()\n

Note

If a dataset with the same name already exists, the create method will raise an exception. You can check if the dataset exists and delete it before creating a new one.

dataset = client.datasets(name=dataset_name, workspace=workspace)\n\nif dataset is not None:\n    dataset.delete()\n
"},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-3-upload-the-dataset-records","title":"Step 3: Upload the dataset records","text":"

To upload the records to the new server, we will need to convert the records from the Argilla V1 format to the Argilla V2 format. The new argilla sdk package uses a generic Record class, but legacy datasets have specific record classes. We will need to convert the records to the generic Record class.

Here is a set of example functions to convert the records for single-label and multi-label classification. You can modify these functions to suit your dataset.

For single-label classification | For multi-label classification | For token classification | For text generation
def map_to_record_for_single_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        label, score = prediction[0].values()\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"label\", # (1)\n                value=label,\n                score=score,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"label\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_multi_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        labels, scores = zip(*[(pred[\"label\"], pred[\"score\"]) for pred in prediction])\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"labels\", # (1)\n                value=labels,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"labels\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_span(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a token classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        scores = [span[\"score\"] for span in prediction]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"spans\", # (1)\n                value=prediction,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"spans\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

def map_to_record_for_text_generation(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text2text record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        first = prediction[0]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"text_generation\", # (1)\n                value=first[\"text\"],\n                score=first[\"score\"],\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        # From data[annotation]\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"text_generation\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
  1. Make sure the question_name matches the name of the question in question settings.

  2. Make sure the question_name matches the name of the question in question settings.

The functions above depend on the users_by_name dictionary and the current_user object to assign responses to users, so we need to load the existing users. You can retrieve the users from the Argilla V2 server and the current user as follows:

users_by_name = {user.username: user for user in client.users}\ncurrent_user = client.me\n

Finally, upload the records to the new dataset using the log method and the mapping function that corresponds to your dataset type.

records = []\n\nfor data in hf_dataset:\n    records.append(map_to_record_for_single_label(data, users_by_name, current_user))\n\n# Upload the records to the new dataset\ndataset.records.log(records)\n

You have now successfully migrated your legacy dataset to Argilla V2. For more guides on how to use the Argilla SDK, please refer to the How to guides.

"},{"location":"how_to_guides/query/","title":"Query and filter records","text":"

This guide provides an overview of how to query and filter a dataset in Argilla.

You can search for records in your dataset by querying or filtering. The query focuses on the content of the text field, while the filter is used to filter the records based on conditions. You can use them independently or combine multiple filters to create complex search queries. You can also export records from a dataset either as a single dictionary or a list of dictionaries.

Main Classes

rg.Query | rg.Filter | rg.Similar
rg.Query(\n    query=\"query\",\n    filter=filter\n)\n

Check the Query - Python Reference to see the attributes, arguments, and methods of the Query class in detail.

rg.Filter(\n    [\n        (\"field\", \"==\", \"value\"),\n    ]\n)\n

Check the Filter - Python Reference to see the attributes, arguments, and methods of the Filter class in detail.

rg.Similar(\n    name=\"vector\",\n    value=[0.1, 0.2, 0.3],\n)\n

Check the Similar - Python Reference to see the attributes, arguments, and methods of the Similar class in detail.

"},{"location":"how_to_guides/query/#query-with-search-terms","title":"Query with search terms","text":"

To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to find records that contain them in the text field. You can search for a single term or for multiple terms; in the latter case, all of them must appear in the record for it to be retrieved.

Single term search | Multiple terms search
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term1 my_term2\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#advanced-queries","title":"Advanced queries","text":"

If you need more complex searches, you can use Elasticsearch's simple query string syntax. Here is a summary of the different available operators:

operator description example + or space AND: search both terms argilla + distilabel or argilla distilabel return records that include the terms \"argilla\" and \"distilabel\" | OR: search either term argilla | distilabel returns records that include the term \"argilla\" or \"distilabel\" - Negation: exclude a term argilla -distilabel returns records that contain the term \"argilla\" and don't have the term \"distilabel\" * Prefix: search a prefix arg* returns records with any words starting with \"arg-\" \" Phrase: search a phrase \"argilla and distilabel\" returns records that contain the phrase \"argilla and distilabel\" ( and ) Precedence: group terms (argilla | distilabel) rules returns records that contain either \"argilla\" or \"distilabel\" and \"rules\" ~N Edit distance: search a term or phrase with an edit distance argilla~1 returns records that contain the term \"argilla\" with an edit distance of 1, e.g. \"argila\"

Tip

To use one of these characters literally, escape it with a preceding backslash \\, e.g. \"1 \\+ 2\" would match records where the phrase \"1 + 2\" is found.
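For example, a sketch combining several of these operators in a single query string (client and dataset set up as in the previous examples; the search terms are illustrative):

query = rg.Query(query=\"(argilla | distilabel) -legacy\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n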

"},{"location":"how_to_guides/query/#filter-by-conditions","title":"Filter by conditions","text":"

You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records based on the conditions. Conditions include \"==\", \">=\", \"<=\", or \"in\". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses. You can use a single condition or multiple conditions to filter records.

  • ==: The field value is equal to the value.
  • >=: The field value is greater than or equal to the value.
  • <=: The field value is less than or equal to the value.
  • in: The field value is included in a list of values.

Single condition | Multiple conditions
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilter_label = rg.Filter((\"label\", \"==\", \"positive\"))\n\nfiltered_records = dataset.records(query=rg.Query(filter=filter_label)).to_list(\n    flatten=True\n)\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilters = rg.Filter(\n    [\n        (\"label.suggestion\", \"==\", \"positive\"),\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20),\n        (\"label\", \"in\", [\"positive\", \"negative\"])\n    ]\n)\n\nfiltered_records = dataset.records(\n    query=rg.Query(filter=filters), with_suggestions=True\n).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#available-fields","title":"Available fields","text":"

You can filter records based on the following fields:

  • id: The record id. ("id", "in", ["1","2","3"])
  • _server_id: The internal record id. This value must be a valid UUID. ("_server_id", "==", "ba69a996-85c2-4af0-a473-23138929641b")
  • inserted_at: The date and time the record was inserted. You can pass a datetime or a string. ("inserted_at", ">=", "2024-10-10")
  • updated_at: The date and time the record was updated. ("updated_at", ">=", "2024-10-10")
  • status: The record status, which can be pending or completed. ("status", "==", "completed")
  • response.status: The response status, which can be draft, submitted, or discarded. ("response.status", "==", "submitted")
  • metadata.<name>: Filter by a metadata property. ("metadata.split", "==", "train")
  • <question>.suggestion: Filter by a question suggestion value. ("label.suggestion", "==", "positive")
  • <question>.score: Filter by a suggestion score. ("label.score", "<=", "0.9")
  • <question>.agent: Filter by a suggestion agent. ("label.agent", "==", "ChatGPT4.0")
  • <question>.response: Filter by a question response. ("label.response", "==", "negative")
"},{"location":"how_to_guides/query/#filter-by-status","title":"Filter by status","text":"

You can filter records based on record or response status. Record status can be pending or completed, and response status can be draft, submitted, or discarded.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nstatus_filter = rg.Query(\n    filter=rg.Filter(\n        [\n            (\"status\", \"==\", \"completed\"),\n            (\"response.status\", \"==\", \"discarded\")\n        ]\n    )\n)\n\nfiltered_records = dataset.records(status_filter).to_list(flatten=True)\n
"},{"location":"how_to_guides/query/#similarity-search","title":"Similarity search","text":"

You can search for records that are similar to a given vector. You can use the Similar class to define the vector and pass it as part of the query argument to the Dataset.records attribute.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\n\nsimilar_filter = rg.Query(\n    similar=rg.Similar(\n        name=\"vector\", value=[0.1, 0.2, 0.3],\n    )\n)\n\nfiltered_records = dataset.records(similar_filter).to_list(flatten=True)\n

Note

The Similar search expects a vector field definition as part of the dataset settings. If the dataset does not have a vector field, the search will return an error. Visit the Vectors section for more details on how to define a vector field.

"},{"location":"how_to_guides/query/#query-and-filter-a-dataset","title":"Query and filter a dataset","text":"

As mentioned, you can use a query with a search term and a filter or various filters to create complex search queries.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery_filter = rg.Query(\n    query=\"my_term\",\n    filter=rg.Filter(\n        [\n            (\"label.suggestion\", \"==\", \"positive\"),\n            (\"metadata.count\", \">=\", 10),\n        ]\n    )\n)\n\nqueried_filtered_records = dataset.records(\n    query=query_filter,\n    with_metadata=True,\n    with_suggestions=True\n).to_list(flatten=True)\n
"},{"location":"how_to_guides/record/","title":"Add, update, and delete records","text":"

This guide provides an overview of records, explaining the basics of how to define and manage them in Argilla.

A record in Argilla is a data item that requires annotation, consisting of one or more fields. These are the pieces of information displayed to the user in the UI to facilitate the completion of the annotation task. Each record also includes questions that annotators are required to answer, with the option of adding suggestions and responses to assist them. Guidelines are also provided to help annotators effectively complete their tasks.

A record is part of a dataset, so you will need to create a dataset before adding records. Check this guide to learn how to create a dataset.

Main Class

rg.Record(\n    external_id=\"1234\",\n    fields={\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\"\n    },\n    metadata={\n        \"category\": \"A\"\n    },\n    vectors={\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    suggestions=[\n        rg.Suggestion(\"my_label\", \"positive\", score=0.9, agent=\"model_name\")\n    ],\n    responses=[\n        rg.Response(\"label\", \"positive\", user_id=user_id)\n    ],\n)\n

Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

"},{"location":"how_to_guides/record/#add-records","title":"Add records","text":"

You can add records to a dataset in two different ways: either by using a dictionary or by directly initializing a Record object. You should ensure that fields, metadata and vectors match those configured in the dataset settings. In both cases, records are added via the Dataset.records.log method. As soon as you add the records, they will be available in the Argilla UI. If they do not appear in the UI, you may need to click the refresh button to update the view.

Tip

Take some time to inspect the data before adding it to the dataset in case this triggers changes in the questions or fields.

Note

If you are planning to use public data, the Datasets page of the Hugging Face Hub is a good place to start. Remember to always check the license to make sure you can legally use it for your specific use case.

As Record objects | From a generic data structure | From a Hugging Face dataset

You can add records to a dataset by initializing a Record object directly. This is ideal if you need to apply logic to the data before defining the record. If the data is already structured, you should consider adding it directly as a dictionary or Hugging Face dataset.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ), # (1)\n]\n\ndataset.records.log(records)\n
  1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.

You can add the data directly as a dictionary-like structure, where the keys correspond to the names of fields, questions, metadata or vectors in the dataset and the values are the data to be added.

If your data structure does not correspond to your Argilla dataset names, you can use a mapping to indicate which keys in the source data correspond to the dataset fields, metadata, vectors, suggestions, or responses. If you need to add the same data to multiple attributes, you can also use a list with the names of the attributes (see the sketch after the example below).

We illustrate this with Python dictionaries that represent your data, but we would not advise you to define dictionaries manually. Instead, use the Record object to instantiate records.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Add records to the dataset with the fields 'question' and 'answer'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    }, # (1)\n]\ndataset.records.log(data)\n\n# Add records to the dataset with a mapping of the fields 'question' and 'answer'\ndata = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n]\ndataset.records.log(data, mapping={\"query\": \"question\", \"response\": \"answer\"}) # (2)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
  2. The data structure has keys query and response, and the Argilla dataset has fields question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.
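As referenced above, here is a sketch of mapping one source key to several dataset attributes at once by passing a list of attribute names; the key and attribute names (including the dot-notation suggestion path) are hypothetical:

data = [\n    {\"query\": \"Do you need oxygen to breathe?\", \"response\": \"Yes\"},\n]\n\n# \"response\" is written both to the \"answer\" field and to the \"answer\" question's suggestion\ndataset.records.log(\n    data,\n    mapping={\"query\": \"question\", \"response\": [\"answer\", \"answer.suggestion\"]},\n)\n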

You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

You can add the dataset where the column names correspond to the names of fields, metadata or vectors in the Argilla dataset.

import argilla as rg\nfrom datasets import load_dataset\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\") # (1)\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (2)\n\ndataset.records.log(records=hf_dataset)\n
  1. In this case, we are using the my_dataset dataset from the Argilla workspace. The dataset has a text field and a label question.

  2. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset, as sketched below.
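As a sketch of that preparation step, assuming the source dataset stores integer labels that must be converted to the label names defined in the Argilla settings:

id2label = {0: \"negative\", 1: \"positive\"}  # hypothetical label mapping\n\n# replace the integer label column with the corresponding label names\nhf_dataset = hf_dataset.map(lambda row: {\"label\": id2label[row[\"label\"]]})\n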

If the Hugging Face dataset's schema does not correspond to your Argilla dataset field names, you can use a mapping to specify the relationship. You should indicate as key the column name of the Hugging Face dataset and, as value, the field name of the Argilla dataset.

dataset.records.log(\n    records=hf_dataset, mapping={\"text\": \"review\", \"label\": \"sentiment\"}\n) # (1)\n
  1. In this case, the text key in the Hugging Face dataset would correspond to the review field in the Argilla dataset, and the label key in the Hugging Face dataset would correspond to the sentiment field in the Argilla dataset.
"},{"location":"how_to_guides/record/#fields","title":"Fields","text":"

Fields are the main pieces of information of the record. They are the first thing shown in the UI, together with the questions form. You may only include fields that you have previously configured in the dataset settings. Depending on the type of fields included in the dataset, the data format may be slightly different:

Text | Image | Chat | Custom

Text fields expect input in the form of a string.

record = rg.Record(\n    fields={\"text\": \"Hello World, how are you?\"}\n)\n

Image fields expect a remote URL or local path to an image file in the form of a string, or a PIL object.

Check the Dataset.records - Python Reference to see how to add records with images in detail.

from PIL import Image\n\nrecords = [\n    rg.Record(\n        fields={\"image\": \"https://example.com/image.jpg\"}\n    ),\n    rg.Record(\n        fields={\"image\": \"path/to/image.jpg\"}\n    ),\n    rg.Record(\n        fields={\"image\": Image.open(\"path/to/image.jpg\")}\n    ),\n]\n

Chat fields expect a list of dictionaries with the keys role and content, where the role identifies the interlocutor type (e.g., user, assistant, model, etc.), whereas the content contains the text of the message.

record = rg.Record(\n    fields={\n        \"chat\": [\n            {\"role\": \"user\", \"content\": \"What is Argilla?\"},\n            {\"role\": \"assistant\", \"content\": \"Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets\"},\n        ]\n    }\n)\n

Custom fields expect a dictionary with the keys and values you define in the dataset settings. You need to ensure these are aligned with CustomField.template in order for them to be rendered in the UI.

record = rg.Record(\n    fields={\"custom\": {\"key\": \"value\"}}\n)\n
"},{"location":"how_to_guides/record/#metadata","title":"Metadata","text":"

Record metadata can include any information about the record that is not part of the fields in the form of a dictionary. To use metadata for filtering and sorting records, make sure that the key of the dictionary corresponds with the metadata property name. When the key doesn't correspond, this will be considered extra metadata that will get stored with the record (as long as allow_extra_metadata is set to True for the dataset), but will not be usable for filtering and sorting.

Note

Remember that to use metadata within a dataset, you must define a metadata property in the dataset settings.

Check the Metadata - Python Reference to see the attributes, arguments, and methods for using metadata in detail.

As Record objects | From a generic data structure

You can add metadata to a record in an initialized Record object.

# Add records to the dataset with the metadata 'category'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n]\ndataset.records.log(records)\n

You can add metadata to a record directly as a dictionary structure, where the keys correspond to the names of metadata properties in the dataset and the values are the metadata to be added. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the metadata 'category'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_metadata\": \"option_1\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_metadata\": \"option_1\",\n    },\n]\ndataset.records.log(data)\n
"},{"location":"how_to_guides/record/#vectors","title":"Vectors","text":"

You can associate vectors, like text embeddings, to your records. They can be used for semantic search in the UI and the Python SDK. Make sure that the length of the list corresponds to the dimensions set in the vector settings.

Note

Remember that to use vectors within a dataset, you must define them in the dataset settings.

Check the Vector - Python Reference to see the attributes, arguments, and methods of the Vector class in detail.

As Record objects | From a generic data structure

You can also add vectors to a record in an initialized Record object.

# Add records to the dataset with the vector 'my_vector' and dimension=3\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        vectors={\n            \"my_vector\": [0.1, 0.2, 0.3]\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        vectors={\n            \"my_vector\": [0.2, 0.5, 0.3]\n        },\n    ),\n]\ndataset.records.log(records)\n

You can add vectors from a dictionary-like structure, where the keys correspond to the names of the vector settings that were configured for your dataset and the value is a list of floats. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the vector 'my_vector' and dimension=3\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_vector\": [0.2, 0.5, 0.3],\n    },\n]\ndataset.records.log(data)\n
"},{"location":"how_to_guides/record/#suggestions","title":"Suggestions","text":"

Suggestions refer to suggested responses (e.g. model predictions) that you can add to your records to make the annotation process faster. These can be added during the creation of the record or at a later stage. Only one suggestion can be provided for each question, and suggestion values must be compliant with the pre-defined questions e.g. if we have a RatingQuestion between 1 and 5, the suggestion should have a valid value within that range.

Check the Suggestions - Python Reference to see the attributes, arguments, and methods of the Suggestion class in detail.

Tip

Check the Suggestions - Python Reference for different formats per Question type.

As Record objects | From a generic data structure

You can also add suggestions to a record in an initialized Record object.

# Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"positive\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"negative\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n]\ndataset.records.log(records)\n

You can add suggestions as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure.

# Add records to the dataset with the label question 'my_label'\ndata =  [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n]\ndataset.records.log(\n    data=data,\n    mapping={\n        \"label\": \"my_label\",\n        \"score\": \"my_label.suggestion.score\",\n        \"agent\": \"my_label.suggestion.agent\",\n    },\n)\n
"},{"location":"how_to_guides/record/#responses","title":"Responses","text":"

If your dataset includes some annotations, you can add those to the records as you create them. Make sure that the responses adhere to the same format as Argilla's output and meet the schema requirements for the specific type of question being answered. Make sure to include the user_id if you're planning to add more than one response for the same question; otherwise, the responses will apply to all the annotators.

Check the Responses - Python Reference to see the attributes, arguments, and methods of the Response class in detail.

Note

Keep in mind that records with responses will be displayed as \"Draft\" in the UI.

Tip

Check the Responses - Python Reference for different formats per Question type.

As Record objects | From a generic data structure

You can also add responses to a record in an initialized Record object.

# Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"positive\", user_id=user.id)\n        ]\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"negative\", user_id=user.id)\n        ]\n    ),\n]\ndataset.records.log(records)\n

You can add responses as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure. If you want to specify the user that added the response, you can use the user_id parameter.

# Add records to the dataset with the label 'my_label'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n    },\n]\ndataset.records.log(data, user_id=user.id, mapping={\"label\": \"my_label.response\"})\n
"},{"location":"how_to_guides/record/#list-records","title":"List records","text":"

To list records in a dataset, you can use the records method on the Dataset object. This method returns a list of Record objects that can be iterated over to access the record properties.

for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_vectors=True\n):\n\n    # Access the record properties\n    print(record.metadata)\n    print(record.vectors)\n    print(record.suggestions)\n    print(record.responses)\n\n    # Access the responses of the record\n    for response in record.responses:\n        print(response.value)\n
"},{"location":"how_to_guides/record/#update-records","title":"Update records","text":"

You can update records in a dataset by calling the log method on the Dataset object. To update a record, you need to provide the record id and the new data to be updated.

data = dataset.records.to_list(flatten=True)\n\nupdated_data = [\n    {\n        \"text\": sample[\"text\"],\n        \"label\": \"positive\",\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n
Update the metadata | Update vectors | Update suggestions | Update responses

The metadata of the Record object is a Python dictionary. To update it, you can iterate over the records and update the metadata by key. After that, you should update the records in the dataset.

Tip

Check the Metadata - Python Reference for different formats per MetadataProperty type.

updated_records = []\n\nfor record in dataset.records():\n\n    record.metadata[\"my_metadata\"] = \"new_value\"\n    record.metadata[\"my_new_metadata\"] = \"new_value\"\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If a new vector field is added to the dataset settings or some value for the existing record vectors must be updated, you can iterate over the records and update the vectors by key. After that, you should update the records in the dataset.

updated_records = []\n\nfor record in dataset.records(with_vectors=True):\n\n    record.vectors[\"my_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n    record.vectors[\"my_new_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If some value for the existing record suggestions must be updated, you can iterate over the records and update the suggestions by key. You can also add a suggestion using the add method. After that, you should update the records in the dataset.

Tip

Check the Suggestions - Python Reference for different formats per Question type.

updated_records = []\n\nfor record in dataset.records(with_suggestions=True):\n\n    # We can update existing suggestions\n    record.suggestions[\"label\"].value = \"new_value\"\n    record.suggestions[\"label\"].score = 0.9\n    record.suggestions[\"label\"].agent = \"model_name\"\n\n    # We can also add new suggestions with the `add` method:\n    if not record.suggestions[\"label\"]:\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"new_value\", score=0.9, agent=\"model_name\")\n        )\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

If some value for the existing record responses must be updated, you can iterate over the records and update the responses by key. You can also add a response using the add method. After that, you should update the records in the dataset.

Tip

Check the Responses - Python Reference for different formats per Question type.

updated_records = []\n\nfor record in dataset.records(with_responses=True):\n\n    for response in record.responses[\"label\"]:\n\n        if response:\n            response.value = \"new_value\"\n            response.user_id = \"existing_user_id\"\n\n        else:\n            record.responses.add(rg.Response(\"label\", \"YES\", user_id=user.id))\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n
"},{"location":"how_to_guides/record/#delete-records","title":"Delete records","text":"

You can delete records in a dataset by calling the delete method on the Dataset object. To delete records, you need to retrieve them from the server and get a list of those that you want to delete.

records_to_delete = list(dataset.records)[:5]\ndataset.records.delete(records=records_to_delete)\n

Delete records based on a query

This can be very useful to avoid deleting records that already have responses.

For more information about the query syntax, check this how-to guide.

status_filter = rg.Query(\n    filter = rg.Filter((\"response.status\", \"==\", \"pending\"))\n)\nrecords_to_delete = list(dataset.records(status_filter))\n\ndataset.records.delete(records_to_delete)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/","title":"Use Markdown to format rich content","text":"

This guide provides an overview of how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

The TextField and TextQuestion provide the option to enable Markdown and therefore HTML by setting use_markdown=True. Given the flexibility of HTML, we can get great control over the presentation of data to our annotators. We provide some out-of-the-box methods for multi-modality and chat templates in the examples below.

Main Methods

image_to_html | audio_to_html | video_to_html | pdf_to_html | chat_to_html
image_to_html(\"local_image_file.png\")\n
audio_to_html(\"local_audio_file.mp3\")\n
audio_to_html(\"local_video_file.mp4\")\n
pdf_to_html(\"local_pdf_file.pdf\")\n
chat_to_html([{\"role\": \"user\", \"content\": \"hello\"}])\n

Check the Markdown - Python Reference to see the arguments of the rg.markdown methods in detail.

Tip

You can get pretty creative with HTML. For example, think about visualizing graphs and tables. You can use methods from some interesting Python packages, like pandas.DataFrame.to_html and plotly.io.to_html.
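For instance, a minimal sketch that renders a small pandas table into a Markdown-enabled field, assuming use_markdown=True is set on that field:

import pandas as pd\nimport argilla as rg\n\ndf = pd.DataFrame({\"model\": [\"a\", \"b\"], \"accuracy\": [0.91, 0.87]})\n\nrecord = rg.Record(\n    fields={\"markdown_enabled_field\": df.to_html(index=False)}\n)\n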

"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#multi-modal-support-images-audio-video-pdfs-and-more","title":"Multi-modal support: images, audio, video, PDFs and more","text":"

Argilla has basic multi-modal support in two different ways, each with its pros and cons, but both offer the same UI experience because they both rely on HTML.

"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#local-content-through-dataurls","title":"Local content through DataURLs","text":"

A DataURL is a scheme that allows data to be encoded into a base64-encoded string and then embedded directly into HTML. To facilitate this, we offer some functions: image_to_html, audio_to_html, video_to_html, and pdf_to_html. These functions accept either the file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. Additionally, you can also set the width and height in pixels or percentages for video and image (defaults to the original dimensions) and the autoplay and loop attributes to True for audio and video (defaults to False).

Warning

DataURLs increase the memory usage compared to the original file size. Additionally, different browsers enforce different size limitations for rendering DataURLs, which might block the visualization experience per user.

Image | Audio | Video | PDF
from argilla.markdown import image_to_html\n\nhtml = image_to_html(\n    \"local_image_file.png\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import audio_to_html\n\nhtml = audio_to_html(\n    \"local_audio_file.mp3\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import video_to_html\n\nhtml = video_to_html(\n    \"local_video_file.mp4\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
from argilla.markdown import pdf_to_html\n\nhtml = pdf_to_html(\n    \"local_pdf_file.pdf\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#hosted-content","title":"Hosted content","text":"

Instead of uploading local files through DataURLs, we can also visualize URLs directly linking to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to visualize content available on platforms like Google Drive or decide to configure a private media server.

Warning

When trying to access content from a private media server, you have to ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.

Image | Audio | Video | PDF
html = \"<img src='https://example.com/public-image-file.jpg'>\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<audio controls>\n    <source src=\"https://example.com/public-audio-file.mp3\" type=\"audio/mpeg\">\n</audio>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<video width=\"320\" height=\"240\" controls>\n    <source src=\"https://example.com/public-video-file.mp4\" type=\"video/mp4\">\n</video>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
html = \"\"\"\n<iframe\n    src=\"https://example.com/public-pdf-file.pdf\"\n    width=\"600\"\n    height=\"500\">\n</iframe>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
"},{"location":"how_to_guides/use_markdown_to_format_rich_content/#chat-and-conversation-support","title":"Chat and conversation support","text":"

When working with chat data from multi-turn interactions with a Large Language Model, it might be nice to visualize the conversation in a way similar to a common chat interface. To facilitate this, we offer the chat_to_html function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.

OpenAI chat format

The OpenAI chat format is a way to structure a list of messages as input from users and returns a model-generated message as output. These messages can only contain the roles \"user\" for human messages and \"assistant\", \"system\" or \"model\" for model-generated messages.

from argilla.markdown import chat_to_html\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Hello! How are you?\"},\n    {\"role\": \"assistant\", \"content\": \"I'm good, thank you!\"}\n]\n\nhtml = chat_to_html(messages)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n

"},{"location":"how_to_guides/user/","title":"User management","text":"

This guide provides an overview of user roles and credentials, explaining how to set up and manage users in Argilla.

A user in Argilla is an authorized person who, depending on their role, can use the Python SDK and access the UI in a running Argilla instance. We differentiate between three types of users depending on their role, permissions and needs: owner, admin and annotator.

Overview | Owner | Admin | Annotator

| | Owner | Admin | Annotator |
| --- | --- | --- | --- |
| Number | Unlimited | Unlimited | Unlimited |
| Create and delete workspaces | Yes | No | No |
| Assign users to workspaces | Yes | No | No |
| Create, configure, update, and delete datasets | Yes | Only within assigned workspaces | No |
| Create, update, and delete users | Yes | No | No |
| Provide feedback with Argilla UI | Yes | Yes | Yes |

The owner refers to the root user who created the Argilla instance. Using workspaces within Argilla is highly beneficial for organizing tasks efficiently, and the owner has full access to all workspaces and their functionalities:

  • Workspace management: It can create, read and delete a workspace.
  • User management: It can create a new user, assign it to a workspace, and delete it. It can also list them and search for a specific one.
  • Dataset management: It can create, configure, retrieve, update, and delete datasets.
  • Annotation: It can annotate datasets in the Argilla UI.
  • Feedback: It can provide feedback with the Argilla UI.

An admin user can only access the workspaces it has been assigned to and cannot assign other users to it. An admin user has the following permissions:

  • Dataset management: It can create, configure, retrieve, update, and delete datasets only on the assigned workspaces.
  • Annotation: It can annotate datasets in the assigned workspaces via the Argilla UI.
  • Feedback: It can provide feedback with the Argilla UI.

An annotator user is limited to accessing only the datasets assigned to it within the workspace. It has two specific permissions:

  • Annotation: It can annotate the assigned datasets in the Argilla UI.
  • Feedback: It can provide feedback with the Argilla UI.
Question: Who can manage users?

Only users with the owner role can manage (create, retrieve, delete) other users.

"},{"location":"how_to_guides/user/#initial-users-and-credentials","title":"Initial users and credentials","text":"

Depending on your Argilla deployment, the initial user with the owner role will vary.

  • If you deploy on the Hugging Face Hub, the initial user will correspond to the Space owner (your personal account). The API key is automatically generated and can be copied from the \"Settings\" section of the UI.
  • If you deploy with Docker, the default values for the environment variables are: USERNAME: argilla, PASSWORD: 12345678, API_KEY: argilla.apikey.

For the new users, the username and password are set during the creation process. The API key can be copied from the \"Settings\" section of the UI.

Main Class

rg.User(\n    username=\"username\",\n    first_name=\"first_name\",\n    last_name=\"last_name\",\n    role=\"owner\",\n    password=\"password\",\n    client=client\n)\n

Check the User - Python Reference to see the attributes, arguments, and methods of the User class in detail.

"},{"location":"how_to_guides/user/#get-current-user","title":"Get current user","text":"

To ensure you're using the correct credentials for managing users, you can get the current user in Argilla using the me attribute of the Argilla class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ncurrent_user = client.me\n
"},{"location":"how_to_guides/user/#create-a-user","title":"Create a user","text":"

To create a new user in Argilla, you can define it in the User class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_create = rg.User(\n    username=\"my_username\",\n    password=\"12345678\",\n)\n\ncreated_user = user_to_create.create()\n

Accessing attributes

Access the attributes of a user by calling them directly on the User object. For example, user.id or user.username.
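
For example, the following sketch (assuming a user named \"my_username\" exists) reads a few attributes directly:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users(\"my_username\")\n\nprint(user.id)\nprint(user.username)\nprint(user.role)\n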

"},{"location":"how_to_guides/user/#list-users","title":"List users","text":"

You can list all the existing users in Argilla by accessing the users attribute on the Argilla class and iterating over them. You can also use len(client.users) to get the number of users.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nusers = client.users\n\nfor user in users:\n    print(user)\n

Notebooks

When using a notebook, executing client.users will display a table with username, id, role, and the last update as updated_at.

"},{"location":"how_to_guides/user/#retrieve-a-user","title":"Retrieve a user","text":"

You can retrieve an existing user from Argilla by accessing the users attribute on the Argilla class and passing the username or id as an argument. If the user does not exist, a warning message will be raised and None will be returned.

By usernameBy id
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(\"my_username\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/user/#check-user-existence","title":"Check user existence","text":"

You can check if a user exists. The client.users method will return None if the user was not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users(\"my_username\")\n\nif user is not None:\n    pass\n
"},{"location":"how_to_guides/user/#list-users-in-a-workspace","title":"List users in a workspace","text":"

You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users in the workspace.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
"},{"location":"how_to_guides/user/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

You can add an existing user to a workspace in Argilla by calling the add_to_workspace method on the User class.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nadded_user = user.add_to_workspace(workspace)\n
"},{"location":"how_to_guides/user/#remove-a-user-from-a-workspace","title":"Remove a user from a workspace","text":"

You can remove an existing user from a workspace in Argilla by calling the remove_from_workspace method on the User class.

For further information on how to manage workspaces, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nremoved_user = user.remove_from_workspace(workspace)\n
"},{"location":"how_to_guides/user/#delete-a-user","title":"Delete a user","text":"

You can delete an existing user from Argilla by calling the delete method on the User class.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_delete = client.users('my_username')\n\ndeleted_user = user_to_delete.delete()\n
"},{"location":"how_to_guides/workspace/","title":"Workspace management","text":"

This guide provides an overview of workspaces, explaining how to set up and manage workspaces in Argilla.

A workspace is a space inside your Argilla instance where authorized users can collaborate on datasets. It is accessible through the Python SDK and the UI.

Question: Who can manage workspaces?

Only users with the owner role can manage (create, read and delete) workspaces.

A user with the admin role can only read the workspace to which it belongs.

"},{"location":"how_to_guides/workspace/#initial-workspaces","title":"Initial workspaces","text":"

Depending on your Argilla deployment, the initial workspace will vary.

  • If you deploy on the Hugging Face Hub, the initial workspace will be the one indicated in the .oauth.yaml file. By default, argilla.
  • If you deploy with Docker, you will need to create a workspace as shown in the next section.

Main Class

rg.Workspace(\n    name = \"name\",\n    client=client\n)\n

Check the Workspace - Python Reference to see the attributes, arguments, and methods of the Workspace class in detail.

"},{"location":"how_to_guides/workspace/#create-a-new-workspace","title":"Create a new workspace","text":"

To create a new workspace in Argilla, you can define it in the Workspace class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

When you create a new workspace, it will be empty. To create and add a new dataset, check these guides.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_create = rg.Workspace(name=\"my_workspace\")\n\ncreated_workspace = workspace_to_create.create()\n

Accessing attributes

Access the attributes of a workspace by calling them directly on the Workspace object. For example, workspace.id or workspace.name.
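
For example, the following sketch (assuming a workspace named \"my_workspace\" exists) reads both attributes:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nprint(workspace.id)\nprint(workspace.name)\n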

"},{"location":"how_to_guides/workspace/#list-workspaces","title":"List workspaces","text":"

You can list all the existing workspaces in Argilla by accessing the workspaces attribute on the Argilla class and iterating over them. You can also use len(client.workspaces) to get the number of workspaces.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspaces = client.workspaces\n\nfor workspace in workspaces:\n    print(workspace)\n

Notebooks

When using a notebook, executing client.workspaces will display a table with the number of datasets in each workspace, name, id, and the last update as updated_at.

"},{"location":"how_to_guides/workspace/#retrieve-a-workspace","title":"Retrieve a workspace","text":"

You can retrieve a workspace by accessing the workspaces method on the Argilla class and passing the name or id of the workspace as an argument. If the workspace does not exist, a warning message will be raised and None will be returned.

By nameBy id
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(\"my_workspace\")\n
import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(id=\"<uuid-or-uuid-string>\")\n
"},{"location":"how_to_guides/workspace/#check-workspace-existence","title":"Check workspace existence","text":"

You can check if a workspace exists. The client.workspaces method will return None if the workspace is not found.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nif workspace is not None:\n    pass\n
"},{"location":"how_to_guides/workspace/#list-users-in-a-workspace","title":"List users in a workspace","text":"

You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users in the workspace.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
"},{"location":"how_to_guides/workspace/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

You can also add a user to a workspace by calling the add_user method on the Workspace class.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nadded_user = workspace.add_user(\"my_username\")\n
"},{"location":"how_to_guides/workspace/#remove-a-user-from-workspace","title":"Remove a user from workspace","text":"

You can also remove a user from a workspace by calling the remove_user method on the Workspace class.

For further information on how to manage users, check this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nremoved_user = workspace.remove_user(\"my_username\")\n
"},{"location":"how_to_guides/workspace/#delete-a-workspace","title":"Delete a workspace","text":"

To delete a workspace, it must not have any datasets associated with it; if the workspace contains any dataset, deletion will fail. You can delete a workspace by calling the delete method on the Workspace class.

To clear a workspace and delete all of its datasets, refer to this how-to guide.

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_delete = client.workspaces(\"my_workspace\")\n\ndeleted_workspace = workspace_to_delete.delete()\n
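
If deletion fails because the workspace still contains datasets, you can delete them first. This is a sketch that assumes each Dataset object exposes the delete method inherited from the Resource base class:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\n# Delete every dataset in the workspace, then the workspace itself\nfor dataset in workspace.datasets:\n    dataset.delete()\n\nworkspace.delete()\n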
"},{"location":"reference/argilla/SUMMARY/","title":"SUMMARY","text":"
  • rg.Argilla
  • rg.Workspace
  • rg.User
  • rg.Dataset
    • rg.Dataset.records
  • rg.Settings
    • Fields
    • Questions
    • Metadata
    • Vectors
    • Distribution
  • rg.Record
    • rg.Response
    • rg.Suggestion
    • rg.Vector
    • rg.Metadata
  • rg.Query
  • rg.markdown
"},{"location":"reference/argilla/client/","title":"rg.Argilla","text":"

To interact with the Argilla server from Python you can use the Argilla class. The Argilla client is used to create, get, update, and delete all Argilla resources, such as workspaces, users, datasets, and records.

"},{"location":"reference/argilla/client/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/client/#connecting-to-an-argilla-server","title":"Connecting to an Argilla server","text":"

To connect to an Argilla server, instantiate the Argilla class and pass the api_url of the server and the api_key to authenticate.

import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"https://argilla.example.com\",\n    api_key=\"my_api_key\",\n)\n
"},{"location":"reference/argilla/client/#accessing-dataset-workspace-and-user-objects","title":"Accessing Dataset, Workspace, and User objects","text":"

The Argilla client provides access to the Dataset, Workspace, and User objects of the Argilla server.

my_dataset = client.datasets(\"my_dataset\")\n\nmy_workspace = client.workspaces(\"my_workspace\")\n\nmy_user = client.users(\"my_user\")\n

These resources can then be interacted with to access their properties and methods. For example, to list all datasets in a workspace:

for dataset in my_workspace.datasets:\n    print(dataset.name)\n
"},{"location":"reference/argilla/client/#src.argilla.client.Argilla","title":"Argilla","text":"

Bases: APIClient

Argilla API client. This is the main entry point to interact with the API.

Attributes:

Name Type Description workspaces Workspaces

A collection of workspaces.

datasets Datasets

A collection of datasets.

users Users

A collection of users.

me User

The current user.

Source code in src/argilla/client.py
class Argilla(_api.APIClient):\n    \"\"\"Argilla API client. This is the main entry point to interact with the API.\n\n    Attributes:\n        workspaces: A collection of workspaces.\n        datasets: A collection of datasets.\n        users: A collection of users.\n        me: The current user.\n    \"\"\"\n\n    # Default instance of Argilla\n    _default_client: Optional[\"Argilla\"] = None\n\n    def __init__(\n        self,\n        api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n        api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n        timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n        retries: int = DEFAULT_HTTP_CONFIG.retries,\n        **http_client_args,\n    ) -> None:\n        \"\"\"Inits the `Argilla` client.\n\n        Args:\n            api_url: the URL of the Argilla API. If not provided, then the value will try\n                to be set from `ARGILLA_API_URL` environment variable. Defaults to\n                `\"http://localhost:6900\"`.\n            api_key: the key to be used to authenticate in the Argilla API. If not provided,\n                then the value will try to be set from `ARGILLA_API_KEY` environment variable.\n                Defaults to `None`.\n            timeout: the maximum time in seconds to wait for a request to the Argilla API\n                to be completed before raising an exception. Defaults to `60`.\n            retries: the number of times to retry the HTTP connection to the Argilla API\n                before raising an exception. Defaults to `5`.\n        \"\"\"\n        super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, retries=retries, **http_client_args)\n\n        self._set_default(self)\n\n    @property\n    def workspaces(self) -> \"Workspaces\":\n        \"\"\"A collection of workspaces on the server.\"\"\"\n        return Workspaces(client=self)\n\n    @property\n    def datasets(self) -> \"Datasets\":\n        \"\"\"A collection of datasets on the server.\"\"\"\n        return Datasets(client=self)\n\n    @property\n    def users(self) -> \"Users\":\n        \"\"\"A collection of users on the server.\"\"\"\n        return Users(client=self)\n\n    @cached_property\n    def me(self) -> \"User\":\n        from argilla.users import User\n\n        return User(client=self, _model=self.api.users.get_me())\n\n    ############################\n    # Private methods\n    ############################\n\n    @classmethod\n    def _set_default(cls, client: \"Argilla\") -> None:\n        \"\"\"Set the default instance of Argilla.\"\"\"\n        cls._default_client = client\n\n    @classmethod\n    def _get_default(cls) -> \"Argilla\":\n        \"\"\"Get the default instance of Argilla. If it doesn't exist, create a new one.\"\"\"\n        if cls._default_client is None:\n            cls._default_client = Argilla()\n        return cls._default_client\n
"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.workspaces","title":"workspaces: Workspaces property","text":"

A collection of workspaces on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.datasets","title":"datasets: Datasets property","text":"

A collection of datasets on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.users","title":"users: Users property","text":"

A collection of users on the server.

"},{"location":"reference/argilla/client/#src.argilla.client.Argilla.__init__","title":"__init__(api_url=DEFAULT_HTTP_CONFIG.api_url, api_key=DEFAULT_HTTP_CONFIG.api_key, timeout=DEFAULT_HTTP_CONFIG.timeout, retries=DEFAULT_HTTP_CONFIG.retries, **http_client_args)","text":"

Inits the Argilla client.

Parameters:

Name Type Description Default api_url Optional[str]

the URL of the Argilla API. If not provided, then the value will try to be set from ARGILLA_API_URL environment variable. Defaults to \"http://localhost:6900\".

api_url api_key Optional[str]

the key to be used to authenticate in the Argilla API. If not provided, then the value will try to be set from ARGILLA_API_KEY environment variable. Defaults to None.

api_key timeout int

the maximum time in seconds to wait for a request to the Argilla API to be completed before raising an exception. Defaults to 60.

timeout retries int

the number of times to retry the HTTP connection to the Argilla API before raising an exception. Defaults to 5.

retries Source code in src/argilla/client.py
def __init__(\n    self,\n    api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n    api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n    timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n    retries: int = DEFAULT_HTTP_CONFIG.retries,\n    **http_client_args,\n) -> None:\n    \"\"\"Inits the `Argilla` client.\n\n    Args:\n        api_url: the URL of the Argilla API. If not provided, then the value will try\n            to be set from `ARGILLA_API_URL` environment variable. Defaults to\n            `\"http://localhost:6900\"`.\n        api_key: the key to be used to authenticate in the Argilla API. If not provided,\n            then the value will try to be set from `ARGILLA_API_KEY` environment variable.\n            Defaults to `None`.\n        timeout: the maximum time in seconds to wait for a request to the Argilla API\n            to be completed before raising an exception. Defaults to `60`.\n        retries: the number of times to retry the HTTP connection to the Argilla API\n            before raising an exception. Defaults to `5`.\n    \"\"\"\n    super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, retries=retries, **http_client_args)\n\n    self._set_default(self)\n
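
Because api_url and api_key fall back to the ARGILLA_API_URL and ARGILLA_API_KEY environment variables, the client can also be instantiated without arguments. A sketch, assuming both variables are set before the client is created:

import os\n\nos.environ[\"ARGILLA_API_URL\"] = \"http://localhost:6900\"\nos.environ[\"ARGILLA_API_KEY\"] = \"argilla.apikey\"\n\nimport argilla as rg\n\n# No arguments needed: the values are read from the environment variables\nclient = rg.Argilla()\n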
"},{"location":"reference/argilla/markdown/","title":"rg.markdown","text":"

To support the usage of Markdown within Argilla, we've created some helper functions to ease the use of data URL conversions and chat message visualizations.
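
For instance, the HTML strings returned by these helpers can be used as the content of a markdown-enabled field in a record. A sketch, assuming a dataset whose \"content\" field renders markdown/HTML:

from argilla.markdown import image_to_html\n\nhtml = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n\n# Use the returned HTML string as the value of a markdown-enabled field\nrecord = {\"content\": html}\n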

"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media","title":"media","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.video_to_html","title":"video_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

Convert a video file to an HTML tag with embedded base64 data.

Parameters:

Name Type Description Default file_source Union[str, bytes]

The path to the media file or a non-b64 encoded byte string.

required file_type Optional[str]

The type of the video file. If not provided, it will be inferred from the file extension.

None width Optional[str]

Display width in HTML. Defaults to None.

None height Optional[str]

Display height in HTML. Defaults to None.

None autoplay bool

True to autoplay media. Defaults to False.

False loop bool

True to loop media. Defaults to False.

False

Returns:

Type Description str

The HTML tag with embedded base64 data.

Examples:

from argilla.markdown import video_to_html\nhtml = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
Source code in src/argilla/markdown/media.py
def video_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert a video file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the video file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import video_to_html\n        html = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"video\", file_source, file_type, width, height, autoplay, loop)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.audio_to_html","title":"audio_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

Convert an audio file to an HTML tag with embedded base64 data.

Parameters:

Name Type Description Default file_source Union[str, bytes]

The path to the media file or a non-b64 encoded byte string.

required file_type Optional[str]

The type of the audio file. If not provided, it will be inferred from the file extension.

None width Optional[str]

Display width in HTML. Defaults to None.

None height Optional[str]

Display height in HTML. Defaults to None.

None autoplay bool

True to autoplay media. Defaults to False.

False loop bool

True to loop media. Defaults to False.

False

Returns:

Type Description str

The HTML tag with embedded base64 data.

Examples:

from argilla.markdown import audio_to_html\nhtml = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
Source code in src/argilla/markdown/media.py
def audio_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert an audio file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the audio file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import audio_to_html\n        html = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"audio\", file_source, file_type, width, height, autoplay, loop)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.image_to_html","title":"image_to_html(file_source, file_type=None, width=None, height=None)","text":"

Convert an image file to an HTML tag with embedded base64 data.

Parameters:

Name Type Description Default file_source Union[str, bytes]

The path to the media file or a non-b64 encoded byte string.

required file_type Optional[str]

The type of the image file. If not provided, it will be inferred from the file extension.

None width Optional[str]

Display width in HTML. Defaults to None.

None height Optional[str]

Display height in HTML. Defaults to None.

None

Returns:

Type Description str

The HTML tag with embedded base64 data.

Examples:

from argilla.markdown import image_to_html\nhtml = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n
Source code in src/argilla/markdown/media.py
def image_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n) -> str:\n    \"\"\"\n    Convert an image file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the image file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import image_to_html\n        html = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    return _media_to_html(\"image\", file_source, file_type, width, height)\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.pdf_to_html","title":"pdf_to_html(file_source, width='1000px', height='1000px')","text":"

Convert a pdf file to an HTML tag with embedded data.

Parameters:

Name Type Description Default file_source Union[str, bytes]

The path to the PDF file, a bytes object with PDF data, or a URL.

required width Optional[str]

Display width in HTML. Defaults to \"1000px\".

'1000px' height Optional[str]

Display height in HTML. Defaults to \"1000px\".

'1000px'

Returns:

Type Description str

HTML string embedding the PDF.

Raises:

Type Description ValueError

If the width and height are not valid pixel or percentage values.

Examples:

from argilla.markdown import pdf_to_html\nhtml = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n
Source code in src/argilla/markdown/media.py
def pdf_to_html(\n    file_source: Union[str, bytes], width: Optional[str] = \"1000px\", height: Optional[str] = \"1000px\"\n) -> str:\n    \"\"\"\n    Convert a pdf file to an HTML tag with embedded data.\n\n    Args:\n        file_source: The path to the PDF file, a bytes object with PDF data, or a URL.\n        width: Display width in HTML. Defaults to \"1000px\".\n        height: Display height in HTML. Defaults to \"1000px\".\n\n    Returns:\n        HTML string embedding the PDF.\n\n    Raises:\n        ValueError: If the width and height are not pixel or percentage.\n\n    Examples:\n        ```python\n        from argilla.markdown import pdf_to_html\n        html = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    if not _is_valid_dimension(width) or not _is_valid_dimension(height):\n        raise ValueError(\"Width and height must be valid pixel (e.g., '300px') or percentage (e.g., '50%') values.\")\n\n    if isinstance(file_source, str) and urlparse(file_source).scheme in [\"http\", \"https\"]:\n        return f'<embed src=\"{file_source}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></embed>'\n\n    file_data, _ = _get_file_data(file_source, \"pdf\")\n    pdf_base64 = base64.b64encode(file_data).decode(\"utf-8\")\n    data_url = f\"data:application/pdf;base64,{pdf_base64}\"\n    return f'<object id=\"pdf\" data=\"{data_url}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></object>'\n
"},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat","title":"chat","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat.chat_to_html","title":"chat_to_html(messages)","text":"

Converts a list of chat messages in the OpenAI format to HTML.

Parameters:

Name Type Description Default messages List[Dict[str, str]]

A list of dictionaries where each dictionary represents a chat message. Each dictionary should have the keys: - \"role\": A string indicating the role of the sender (e.g., \"user\", \"model\", \"assistant\", \"system\"). - \"content\": The content of the message.

required

Returns:

Name Type Description str str

An HTML string that represents the chat conversation.

Raises:

Type Description ValueError

If an invalid role is passed.

Examples:

from argilla.markdown import chat_to_html\nhtml = chat_to_html([\n    {\"role\": \"user\", \"content\": \"hello\"},\n    {\"role\": \"assistant\", \"content\": \"goodbye\"}\n])\n
Source code in src/argilla/markdown/chat.py
def chat_to_html(messages: List[Dict[str, str]]) -> str:\n    \"\"\"\n    Converts a list of chat messages in the OpenAI format to HTML.\n\n    Args:\n        messages (List[Dict[str, str]]): A list of dictionaries where each dictionary represents a chat message.\n            Each dictionary should have the keys:\n                - \"role\": A string indicating the role of the sender (e.g., \"user\", \"model\", \"assistant\", \"system\").\n                - \"content\": The content of the message.\n\n    Returns:\n        str: An HTML string that represents the chat conversation.\n\n    Raises:\n        ValueError: If the an invalid role is passed.\n\n    Examples:\n        ```python\n        from argilla.markdown import chat_to_html\n        html = chat_to_html([\n            {\"role\": \"user\", \"content\": \"hello\"},\n            {\"role\": \"assistant\", \"content\": \"goodbye\"}\n        ])\n        ```\n    \"\"\"\n    chat_html = \"\"\n    for message in messages:\n        role = message[\"role\"]\n        content = message[\"content\"]\n        content_html = markdown.markdown(content)\n\n        if role == \"user\":\n            html = '<div class=\"user-message\">' + '<div class=\"message-content\">'\n        elif role in [\"model\", \"assistant\", \"system\"]:\n            html = '<div class=\"system-message\">' + '<div class=\"message-content\">'\n        else:\n            raise ValueError(f\"Invalid role: {role}\")\n\n        html += f\"{content_html}\"\n        html += \"</div></div>\"\n        chat_html += html\n\n    return f\"<body>{CHAT_CSS_STYLE}{chat_html}</body>\"\n
"},{"location":"reference/argilla/search/","title":"rg.Query","text":"

To collect records based on searching criteria, you can use the Query and Filter classes. The Query class is used to define the search criteria, while the Filter class is used to filter the search results. Filter is passed to a Query object so you can combine multiple filters to create complex search queries. A Query object can also be passed to Dataset.records to fetch records based on the search criteria.

"},{"location":"reference/argilla/search/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/search/#searching-for-records-with-terms","title":"Searching for records with terms","text":"

To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field.

for record in dataset.records(query=\"paris\"):\n    print(record)\n
"},{"location":"reference/argilla/search/#filtering-records-by-conditions","title":"Filtering records by conditions","text":"

Argilla allows you to filter records based on conditions. You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records that match. Supported operators are \"==\", \">=\", \"<=\", and \"in\". Conditions target metadata, suggestions, or responses using dot notation (for example, \"metadata.count\"), and multiple conditions can be combined in a single filter.

# create a range filter from 10 to 20\nrange_filter = rg.Filter(\n    [\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20)\n    ]\n)\n\n# query records with a metadata count between 10 and 20 (inclusive)\nquery = rg.Query(filter=range_filter, query=\"paris\")\n\n# iterate over the results\nfor record in dataset.records(query=query):\n    print(record)\n
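
The \"in\" operator works the same way. A sketch, assuming records carry a \"category\" metadata property:

# keep only records whose category is one of the listed values\nin_filter = rg.Filter((\"metadata.category\", \"in\", [\"news\", \"sports\"]))\n\nquery = rg.Query(filter=in_filter, query=\"paris\")\n\nfor record in dataset.records(query=query):\n    print(record)\n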
"},{"location":"reference/argilla/search/#src.argilla.records._search.Query","title":"Query","text":"

This class is used to map user queries to the internal query models

Source code in src/argilla/records/_search.py
class Query:\n    \"\"\"This class is used to map user queries to the internal query models\"\"\"\n\n    def __init__(\n        self,\n        *,\n        query: Union[str, None] = None,\n        similar: Union[Similar, None] = None,\n        filter: Union[Filter, Conditions, None] = None,\n    ):\n        \"\"\"Create a query object for use in Argilla search requests.add()\n\n        Parameters:\n            query (Union[str, None], optional): The query string that will be used to search.\n            similar (Union[Similar, None], optional): The similar object that will be used to search for similar records\n            filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n        \"\"\"\n\n        if isinstance(filter, tuple):\n            filter = [filter]\n\n        if isinstance(filter, list):\n            filter = Filter(conditions=filter)\n\n        self.query = query\n        self.filter = filter\n        self.similar = similar\n\n    def has_search(self) -> bool:\n        return bool(self.query or self.similar or self.filter)\n\n    def api_model(self) -> SearchQueryModel:\n        model = SearchQueryModel()\n\n        if self.query or self.similar:\n            query = QueryModel()\n\n            if self.query is not None:\n                query.text = TextQueryModel(q=self.query)\n\n            if self.similar is not None:\n                query.vector = self.similar.api_model()\n\n            model.query = query\n\n        if self.filter is not None:\n            model.filters = self.filter.api_model()\n\n        return model\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Query.__init__","title":"__init__(*, query=None, similar=None, filter=None)","text":"

Create a query object for use in Argilla search requests.

Parameters:

Name Type Description Default query Union[str, None]

The query string that will be used to search.

None similar Union[Similar, None]

The similar object that will be used to search for similar records

None filter Union[Filter, None]

The filter object that will be used to filter the search results.

None Source code in src/argilla/records/_search.py
def __init__(\n    self,\n    *,\n    query: Union[str, None] = None,\n    similar: Union[Similar, None] = None,\n    filter: Union[Filter, Conditions, None] = None,\n):\n    \"\"\"Create a query object for use in Argilla search requests.add()\n\n    Parameters:\n        query (Union[str, None], optional): The query string that will be used to search.\n        similar (Union[Similar, None], optional): The similar object that will be used to search for similar records\n        filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n    \"\"\"\n\n    if isinstance(filter, tuple):\n        filter = [filter]\n\n    if isinstance(filter, list):\n        filter = Filter(conditions=filter)\n\n    self.query = query\n    self.filter = filter\n    self.similar = similar\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Filter","title":"Filter","text":"

This class is used to map user filters to the internal filter models

Source code in src/argilla/records/_search.py
class Filter:\n    \"\"\"This class is used to map user filters to the internal filter models\"\"\"\n\n    def __init__(self, conditions: Union[Conditions, None] = None):\n        \"\"\" Create a filter object for use in Argilla search requests.\n\n        Parameters:\n            conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n                The conditions that will be used to filter the search results. \\\n                The conditions should be a list of tuples where each tuple contains \\\n                the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n        \"\"\"\n\n        if isinstance(conditions, tuple):\n            conditions = [conditions]\n        self.conditions = [Condition(condition) for condition in conditions]\n\n    def api_model(self) -> AndFilterModel:\n        return AndFilterModel.model_validate({\"and\": [condition.api_model() for condition in self.conditions]})\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Filter.__init__","title":"__init__(conditions=None)","text":"

Create a filter object for use in Argilla search requests.

Parameters:

Name Type Description Default conditions Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None]

The conditions that will be used to filter the search results. The conditions should be a list of tuples where each tuple contains the field, operator, and value. For example (\"label\", \"in\", [\"positive\",\"happy\"]).

None Source code in src/argilla/records/_search.py
def __init__(self, conditions: Union[Conditions, None] = None):\n    \"\"\" Create a filter object for use in Argilla search requests.\n\n    Parameters:\n        conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n            The conditions that will be used to filter the search results. \\\n            The conditions should be a list of tuples where each tuple contains \\\n            the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n    \"\"\"\n\n    if isinstance(conditions, tuple):\n        conditions = [conditions]\n    self.conditions = [Condition(condition) for condition in conditions]\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Similar","title":"Similar","text":"

This class is used to map user similar queries to the internal query models

Source code in src/argilla/records/_search.py
class Similar:\n    \"\"\"This class is used to map user similar queries to the internal query models\"\"\"\n\n    def __init__(self, name: str, value: Union[Iterable[float], \"Record\"], most_similar: bool = True):\n        \"\"\"\n        Create a similar object for use in Argilla search requests.\n\n        Parameters:\n            name: The name of the vector field\n            value: The vector value or the record to search for similar records\n            most_similar: Whether to search for the most similar records or the least similar records\n        \"\"\"\n\n        self.name = name\n        self.value = value\n        self.most_similar = most_similar if most_similar is not None else True\n\n    def api_model(self) -> VectorQueryModel:\n        from argilla.records import Record\n\n        order = \"most_similar\" if self.most_similar else \"least_similar\"\n\n        if isinstance(self.value, Record):\n            return VectorQueryModel(name=self.name, record_id=self.value._server_id, order=order)\n\n        return VectorQueryModel(name=self.name, value=self.value, order=order)\n
"},{"location":"reference/argilla/search/#src.argilla.records._search.Similar.__init__","title":"__init__(name, value, most_similar=True)","text":"

Create a similar object for use in Argilla search requests.

Parameters:

Name Type Description Default name str

The name of the vector field

required value Union[Iterable[float], Record]

The vector value or the record to search for similar records

required most_similar bool

Whether to search for the most similar records or the least similar records

True Source code in src/argilla/records/_search.py
def __init__(self, name: str, value: Union[Iterable[float], \"Record\"], most_similar: bool = True):\n    \"\"\"\n    Create a similar object for use in Argilla search requests.\n\n    Parameters:\n        name: The name of the vector field\n        value: The vector value or the record to search for similar records\n        most_similar: Whether to search for the most similar records or the least similar records\n    \"\"\"\n\n    self.name = name\n    self.value = value\n    self.most_similar = most_similar if most_similar is not None else True\n
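
Putting it together, a vector search can be issued by passing a Similar object to a Query. A sketch, assuming the dataset settings define a vector field named \"my_vector\" and that the class is exposed as rg.Similar:

similar = rg.Similar(name=\"my_vector\", value=[0.1, 0.2, 0.3])\n\nquery = rg.Query(similar=similar)\n\nfor record in dataset.records(query=query):\n    print(record)\n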
"},{"location":"reference/argilla/users/","title":"rg.User","text":"

A user in Argilla is a profile for a person who uses the SDK or UI. The profile can be used to track their feedback activity and to manage their access to the Argilla server.

"},{"location":"reference/argilla/users/#usage-examples","title":"Usage Examples","text":"

To create a new user, instantiate the User object with the client and the username:

user = rg.User(username=\"my_username\", password=\"my_password\")\nuser.create()\n

Existing users can be retrieved by their username:

user = client.users(\"my_username\")\n

The current user of the rg.Argilla client can be accessed using the me attribute:

client.me\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User","title":"User","text":"

Bases: Resource

Class for interacting with Argilla users in the Argilla server. User profiles are used to manage access to the Argilla server and track responses to records.

Attributes:

Name Type Description username str

The username of the user.

first_name str

The first name of the user.

last_name str

The last name of the user.

role str

The role of the user, either 'annotator', 'admin', or 'owner'.

password str

The password of the user.

id UUID

The ID of the user.

Source code in src/argilla/users/_resource.py
class User(Resource):\n    \"\"\"Class for interacting with Argilla users in the Argilla server. User profiles \\\n        are used to manage access to the Argilla server and track responses to records.\n\n    Attributes:\n        username (str): The username of the user.\n        first_name (str): The first name of the user.\n        last_name (str): The last name of the user.\n        role (str): The role of the user, either 'annotator' or 'admin'.\n        password (str): The password of the user.\n        id (UUID): The ID of the user.\n    \"\"\"\n\n    _model: UserModel\n    _api: UsersAPI\n\n    def __init__(\n        self,\n        username: Optional[str] = None,\n        first_name: Optional[str] = None,\n        last_name: Optional[str] = None,\n        role: Optional[str] = None,\n        password: Optional[str] = None,\n        client: Optional[\"Argilla\"] = None,\n        id: Optional[UUID] = None,\n        _model: Optional[UserModel] = None,\n    ) -> None:\n        \"\"\"Initializes a User object with a client and a username\n\n        Parameters:\n            username (str): The username of the user\n            first_name (str): The first name of the user\n            last_name (str): The last name of the user\n            role (str): The role of the user, either 'annotator', admin, or 'owner'\n            password (str): The password of the user\n            client (Argilla): The client used to interact with Argilla\n\n        Returns:\n            User: The initialized user object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.users)\n\n        if _model is None:\n            _model = UserModel(\n                username=username,\n                password=password,\n                first_name=first_name or username,\n                last_name=last_name,\n                role=role or Role.annotator,\n                id=id,\n            )\n            self._log_message(f\"Initialized user with username {username}\")\n        self._model = _model\n\n    def create(self) -> \"User\":\n        \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n        Returns:\n            User: The user that was created in Argilla.\n        \"\"\"\n        model_create = self.api_model()\n        model = self._api.create(model_create)\n        # The password is not returned in the response\n        model.password = model_create.password\n        self._model = model\n        return self\n\n    def delete(self) -> None:\n        \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n        super().delete()\n        # exists relies on the id, so we need to set it to None\n        self._model = UserModel(username=self.username)\n\n    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to add the user to.\n\n        Returns:\n            User: The user that was added to the workspace.\n        \"\"\"\n        self._model = self._api.add_to_workspace(workspace.id, self.id)\n        return self\n\n    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n        the datasets in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to remove the user from.\n\n        Returns:\n            User: The user that was removed from the workspace.\n\n        \"\"\"\n        self._model = self._api.delete_from_workspace(workspace.id, self.id)\n        return self\n\n    ############################\n    # Properties\n    ############################\n    @property\n    def username(self) -> str:\n        return self._model.username\n\n    @username.setter\n    def username(self, value: str) -> None:\n        self._model.username = value\n\n    @property\n    def password(self) -> str:\n        return self._model.password\n\n    @password.setter\n    def password(self, value: str) -> None:\n        self._model.password = value\n\n    @property\n    def first_name(self) -> str:\n        return self._model.first_name\n\n    @first_name.setter\n    def first_name(self, value: str) -> None:\n        self._model.first_name = value\n\n    @property\n    def last_name(self) -> str:\n        return self._model.last_name\n\n    @last_name.setter\n    def last_name(self, value: str) -> None:\n        self._model.last_name = value\n\n    @property\n    def role(self) -> Role:\n        return self._model.role\n\n    @role.setter\n    def role(self, value: Role) -> None:\n        self._model.role = value\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.__init__","title":"__init__(username=None, first_name=None, last_name=None, role=None, password=None, client=None, id=None, _model=None)","text":"

Initializes a User object with a client and a username

Parameters:

Name Type Description Default username str

The username of the user

None first_name str

The first name of the user

None last_name str

The last name of the user

None role str

The role of the user, either 'annotator', 'admin', or 'owner'

None password str

The password of the user

None client Argilla

The client used to interact with Argilla

None

Returns:

Name Type Description User None

The initialized user object

Source code in src/argilla/users/_resource.py
def __init__(\n    self,\n    username: Optional[str] = None,\n    first_name: Optional[str] = None,\n    last_name: Optional[str] = None,\n    role: Optional[str] = None,\n    password: Optional[str] = None,\n    client: Optional[\"Argilla\"] = None,\n    id: Optional[UUID] = None,\n    _model: Optional[UserModel] = None,\n) -> None:\n    \"\"\"Initializes a User object with a client and a username\n\n    Parameters:\n        username (str): The username of the user\n        first_name (str): The first name of the user\n        last_name (str): The last name of the user\n        role (str): The role of the user, either 'annotator', admin, or 'owner'\n        password (str): The password of the user\n        client (Argilla): The client used to interact with Argilla\n\n    Returns:\n        User: The initialized user object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.users)\n\n    if _model is None:\n        _model = UserModel(\n            username=username,\n            password=password,\n            first_name=first_name or username,\n            last_name=last_name,\n            role=role or Role.annotator,\n            id=id,\n        )\n        self._log_message(f\"Initialized user with username {username}\")\n    self._model = _model\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.create","title":"create()","text":"

Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.

Returns:

Name Type Description User User

The user that was created in Argilla.

Source code in src/argilla/users/_resource.py
def create(self) -> \"User\":\n    \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n    Returns:\n        User: The user that was created in Argilla.\n    \"\"\"\n    model_create = self.api_model()\n    model = self._api.create(model_create)\n    # The password is not returned in the response\n    model.password = model_create.password\n    self._model = model\n    return self\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.delete","title":"delete()","text":"

Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.

Source code in src/argilla/users/_resource.py
def delete(self) -> None:\n    \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n    super().delete()\n    # exists relies on the id, so we need to set it to None\n    self._model = UserModel(username=self.username)\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.add_to_workspace","title":"add_to_workspace(workspace)","text":"

Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets in the workspace.

Parameters:

Name Type Description Default workspace Workspace

The workspace to add the user to.

required

Returns:

Name Type Description User User

The user that was added to the workspace.

Source code in src/argilla/users/_resource.py
def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to add the user to.\n\n    Returns:\n        User: The user that was added to the workspace.\n    \"\"\"\n    self._model = self._api.add_to_workspace(workspace.id, self.id)\n    return self\n
"},{"location":"reference/argilla/users/#src.argilla.users._resource.User.remove_from_workspace","title":"remove_from_workspace(workspace)","text":"

Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to the datasets in the workspace.

Parameters:

Name Type Description Default workspace Workspace

The workspace to remove the user from.

required

Returns:

Name Type Description User User

The user that was removed from the workspace.

Source code in src/argilla/users/_resource.py
def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n    the datasets in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to remove the user from.\n\n    Returns:\n        User: The user that was removed from the workspace.\n\n    \"\"\"\n    self._model = self._api.delete_from_workspace(workspace.id, self.id)\n    return self\n
"},{"location":"reference/argilla/workspaces/","title":"rg.Workspace","text":"

In Argilla, workspaces are used to organize datasets into groups. For example, you might have a workspace for each project or team.

"},{"location":"reference/argilla/workspaces/#usage-examples","title":"Usage Examples","text":"

To create a new workspace, instantiate the Workspace object with the client and the name:

workspace = rg.Workspace(name=\"my_workspace\")\nworkspace.create()\n

To retrieve an existing workspace, use the client.workspaces attribute:

workspace = client.workspaces(\"my_workspace\")\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace","title":"Workspace","text":"

Bases: Resource

Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.

Attributes:

Name Type Description name str

The name of the workspace.

id UUID

The ID of the workspace. This is a unique identifier for the workspace in the server.

datasets List[Dataset]

A list of all datasets in the workspace.

users WorkspaceUsers

A list of all users in the workspace.

Source code in src/argilla/workspaces/_resource.py
class Workspace(Resource):\n    \"\"\"Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.\n\n    Attributes:\n        name (str): The name of the workspace.\n        id (UUID): The ID of the workspace. This is a unique identifier for the workspace in the server.\n        datasets (List[Dataset]): A list of all datasets in the workspace.\n        users (WorkspaceUsers): A list of all users in the workspace.\n    \"\"\"\n\n    name: Optional[str]\n\n    _api: \"WorkspacesAPI\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        id: Optional[UUID] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a Workspace object with a client and a name or id\n\n        Parameters:\n            client (Argilla): The client used to interact with Argilla\n            name (str): The name of the workspace\n            id (UUID): The id of the workspace\n\n        Returns:\n            Workspace: The initialized workspace object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.workspaces)\n\n        self._model = WorkspaceModel(name=name, id=id)\n\n    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was added to the workspace\n        \"\"\"\n        return self.users.add(user)\n\n    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n\n        Args:\n            user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was removed from the workspace.\n        \"\"\"\n        return self.users.delete(user)\n\n    # TODO: Make this method private\n    def list_datasets(self) -> List[\"Dataset\"]:\n        from argilla.datasets import Dataset\n\n        datasets = self._client.api.datasets.list(self.id)\n        self._log_message(f\"Got {len(datasets)} datasets for workspace {self.id}\")\n        return [Dataset.from_model(model=dataset, client=self._client) for dataset in datasets]\n\n    @classmethod\n    def from_model(cls, model: WorkspaceModel, client: Argilla) -> \"Workspace\":\n        instance = cls(name=model.name, id=model.id, client=client)\n        instance._model = model\n\n        return instance\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def name(self) -> Optional[str]:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def datasets(self) -> List[\"Dataset\"]:\n        \"\"\"List all datasets in the workspace\n\n        Returns:\n            List[Dataset]: A list of all datasets in the workspace\n        \"\"\"\n        return self.list_datasets()\n\n    @property\n    def users(self) -> \"WorkspaceUsers\":\n        \"\"\"List all users in the workspace\n\n        Returns:\n            WorkspaceUsers: A list of all users in the workspace\n        \"\"\"\n        return WorkspaceUsers(workspace=self)\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.datasets","title":"datasets: List[Dataset] property","text":"

List all datasets in the workspace

Returns:

Type Description List[Dataset]

List[Dataset]: A list of all datasets in the workspace

"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.users","title":"users: WorkspaceUsers property","text":"

List all users in the workspace

Returns:

Name Type Description WorkspaceUsers WorkspaceUsers

A list of all users in the workspace

"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.__init__","title":"__init__(name=None, id=None, client=None)","text":"

Initializes a Workspace object with a client and a name or id

Parameters:

client (Argilla): The client used to interact with Argilla. Default: None
name (str): The name of the workspace. Default: None
id (UUID): The id of the workspace. Default: None

Returns:

Workspace: The initialized workspace object

Source code in src/argilla/workspaces/_resource.py
def __init__(\n    self,\n    name: Optional[str] = None,\n    id: Optional[UUID] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a Workspace object with a client and a name or id\n\n    Parameters:\n        client (Argilla): The client used to interact with Argilla\n        name (str): The name of the workspace\n        id (UUID): The id of the workspace\n\n    Returns:\n        Workspace: The initialized workspace object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.workspaces)\n\n    self._model = WorkspaceModel(name=name, id=id)\n
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.add_user","title":"add_user(user)","text":"

Adds a user to the workspace. Once added, the user will have access to the datasets in the workspace.

Parameters:

user (Union[User, str]): The user to add to the workspace. Can be a User object or a username. Required.

Returns:

User: The user that was added to the workspace

Source code in src/argilla/workspaces/_resource.py
def add_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was added to the workspace\n    \"\"\"\n    return self.users.add(user)\n
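
For example (a sketch; the workspace and user names are illustrative):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\nworkspace = client.workspaces(\"my_workspace\")\nuser = workspace.add_user(\"my_username\")  # a User object also works\n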
"},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.remove_user","title":"remove_user(user)","text":"

Removes a user from the workspace. Once removed, the user will no longer have access to the datasets in the workspace.

Parameters:

user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username. Required.

Returns:

User: The user that was removed from the workspace.

Source code in src/argilla/workspaces/_resource.py
def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n\n    Args:\n        user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was removed from the workspace.\n    \"\"\"\n    return self.users.delete(user)\n
"},{"location":"reference/argilla/datasets/dataset_records/","title":"rg.Dataset.records","text":""},{"location":"reference/argilla/datasets/dataset_records/#usage-examples","title":"Usage Examples","text":"

In most cases, you will not need to create a DatasetRecords object directly. Instead, you can access it via the Dataset object:

dataset.records\n

For users familiar with legacy approaches

  1. The Dataset.records object is used to interact with the records in a dataset. It fetches records from the server in batches on demand, without keeping a local copy of the records.
  2. The log method of Dataset.records is used to both add and update records in a dataset. If a record includes a known id field, it will be updated; if not, it will be added as a new record, as sketched below.
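
A distilled sketch of both behaviors (the field names are illustrative):

dataset.records.log([{\"question\": \"Is water wet?\", \"answer\": \"Yes\"}])  # no id: added as a new record\ndataset.records.log([{\"id\": \"2\", \"answer\": \"No\"}])  # known id: the existing record is updated\n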
"},{"location":"reference/argilla/datasets/dataset_records/#adding-records-to-a-dataset","title":"Adding records to a dataset","text":"

To add records to a dataset, use the log method. Records can be added as dictionaries or as Record objects; a single record can likewise be passed as a dictionary or a Record object.

As a Record object · From a data structure · From a data structure with a mapping · From a Hugging Face dataset

You can also add records to a dataset by initializing a Record object directly.

records = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ),\n] # (1)\n\ndataset.records.log(records)\n
  1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.
data = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    },\n] # (1)\n\ndataset.records.log(data)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
data = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n] # (1)\ndataset.records.log(\n    records=data,\n    mapping={\"query\": \"question\", \"response\": \"answer\"} # (2)\n)\n
  1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
  2. The data structure has keys query and response and the Argilla dataset has question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.

You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

You can add the dataset directly when its column names correspond to the names of fields, questions, metadata, or vectors in the Argilla dataset.

If the dataset's schema does not correspond to your Argilla dataset names, you can use a mapping to indicate which columns in the dataset correspond to the Argilla dataset fields.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset, as sketched below.
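
A sketch of that preparation step (the lowercase transform is illustrative):

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\")\nhf_dataset = hf_dataset.map(lambda row: {\"text\": row[\"text\"].lower()})  # normalize the text column\ndataset.records.log(records=hf_dataset)\n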

Here we use the mapping parameter to specify the relationship between the Hugging Face dataset and the Argilla dataset.

dataset.records.log(records=hf_dataset, mapping={\"txt\": \"text\", \"y\": \"label\"}) # (1)\n
  1. In this case, the txt key in the Hugging Face dataset corresponds to the text field in the Argilla dataset, and the y key in the Hugging Face dataset corresponds to the label field in the Argilla dataset.
"},{"location":"reference/argilla/datasets/dataset_records/#updating-records-in-a-dataset","title":"Updating records in a dataset","text":"

Records can also be updated using the log method with records that contain an id to identify the records to be updated. As above, records can be passed as dictionaries or as Record objects.

As a Record object · From a data structure · From a data structure with a mapping · From a Hugging Face dataset

You can update records in a dataset by initializing a Record object directly and providing the id field.

records = [\n    rg.Record(\n        metadata={\"department\": \"toys\"},\n        id=\"2\" # (1)\n    ),\n]\n\ndataset.records.log(records)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

You can also update records in a dataset by providing the id field in the data structure.

data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(data)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

You can also update records in a dataset by providing the id field in the data structure and using a mapping to map the keys in the data structure to the fields in the dataset.

data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"my_id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(\n    records=data,\n    mapping={\"my_id\": \"id\"} # (2)\n)\n
  1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.
  2. Let's say that your data structure has keys my_id instead of id. You can use the mapping parameter to map the keys in the data structure to the fields in the dataset.

You can also update records in an Argilla dataset using a Hugging Face dataset. To update records, the Hugging Face dataset must contain an id field to identify the records to be updated, or you can use a mapping to map the keys in the Hugging Face dataset to the fields in the Argilla dataset.

from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset, mapping={\"uuid\": \"id\"}) # (2)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema.
  2. The uuid key in the Hugging Face dataset corresponds to the id field in the Argilla dataset.
"},{"location":"reference/argilla/datasets/dataset_records/#adding-and-updating-records-with-images","title":"Adding and updating records with images","text":"

Argilla datasets can contain image fields. You can add images to a dataset by passing the image to the record object as a remote URL, a local path to an image file, or a PIL object. The field must be defined as an rg.ImageField in the dataset's Settings object to be accepted. Images are stored in the Argilla database and returned using the data URI scheme.
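
For instance, a dataset accepting such records could be configured with an rg.ImageField (a minimal sketch; the field and question names are illustrative):

settings = rg.Settings(\n    fields=[\n        rg.ImageField(name=\"image\"),  # accepts a remote URL, a local path, or a PIL object\n    ],\n    questions=[\n        rg.TextQuestion(name=\"caption\"),\n    ],\n)\ndataset = rg.Dataset(name=\"image_dataset\", settings=settings)\ndataset.create()\n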

As PIL objects

To retrieve the images as rescaled PIL objects, you can use the to_datasets method when exporting the records, as shown in this how-to guide.
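
A sketch of that export (assuming an image field named image):

exported = dataset.records.to_datasets()\npil_image = exported[0][\"image\"]  # image fields are returned as PIL objects\n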

From a data structure with remote URLs · From a data structure with local files or PIL objects · From a Hugging Face dataset
data = [\n    {\n        \"image\": \"https://example.com/image1.jpg\",\n    },\n    {\n        \"image\": \"https://example.com/image2.jpg\",\n    },\n]\n\ndataset.records.log(data)\n
import os\nfrom PIL import Image\n\nimage_dir = \"path/to/images\"\n\ndata = [\n    {\n        \"image\": os.path.join(image_dir, \"image1.jpg\"), # (1)\n    },\n    {\n        \"image\": Image.open(os.path.join(image_dir, \"image2.jpg\")), # (2)\n    },\n]\n\ndataset.records.log(data)\n
  1. The image is a local file path.
  2. The image is a PIL object.

Hugging Face datasets can be passed directly to the log method. The image field must be defined as an Image in the dataset's features.

hf_dataset = load_dataset(\"ylecun/mnist\", split=\"train[:100]\")\ndataset.records.log(records=hf_dataset)\n

If the image field is not defined as an Image in the dataset's features and the values are not one of the image types Argilla supports (a URL, local path, or PIL object), you can cast the dataset to the correct schema before adding it to the Argilla dataset.

hf_dataset = load_dataset(\"<my_custom_dataset>\") # (1)\nhf_dataset = hf_dataset.cast(\n    features=Features({\"image\": Image(), \"label\": Value(\"string\")}),\n)\ndataset.records.log(records=hf_dataset)\n
  1. In this example, the Hugging Face dataset matches the Argilla dataset schema but the image field is not defined as an Image in the dataset's features.
"},{"location":"reference/argilla/datasets/dataset_records/#iterating-over-records-in-a-dataset","title":"Iterating over records in a dataset","text":"

Dataset.records can be used to iterate over the records in a dataset on the server. The records are fetched from the server in batches:

for record in dataset.records:\n    print(record)\n\n# Fetch records with suggestions and responses\nfor record in dataset.records(with_suggestions=True, with_responses=True):\n    print(record.suggestions)\n    print(record.responses)\n\n# Filter records by a query and fetch records with vectors\nfor record in dataset.records(query=\"capital\", with_vectors=True):\n    print(record.vectors)\n

Check out the rg.Record class reference for more information on the properties and methods available on a record and the rg.Query class reference for more information on the query syntax.
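
The query string above is shorthand for a Query object; passing one explicitly is equivalent (a minimal sketch):

query = rg.Query(query=\"capital\")\nfor record in dataset.records(query=query):\n    print(record)\n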

"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords","title":"DatasetRecords","text":"

Bases: Iterable[Record], LoggingMixin

This class is used to work with records from a dataset and is accessed via Dataset.records. The responsibility of this class is to provide an interface to interact with records in a dataset, by adding, updating, fetching, querying, deleting, and exporting records.
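
For example, deleting the records matched by a query (a sketch; the query string is illustrative):

matched = list(dataset.records(query=\"spam\"))\ndataset.records.delete(records=matched)\n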

Attributes:

client (Argilla): The Argilla client object.
dataset (Dataset): The dataset object.

Source code in src/argilla/records/_dataset_records.py
class DatasetRecords(Iterable[Record], LoggingMixin):\n    \"\"\"This class is used to work with records from a dataset and is accessed via `Dataset.records`.\n    The responsibility of this class is to provide an interface to interact with records in a dataset,\n    by adding, updating, fetching, querying, deleting, and exporting records.\n\n    Attributes:\n        client (Argilla): The Argilla client object.\n        dataset (Dataset): The dataset object.\n    \"\"\"\n\n    _api: RecordsAPI\n\n    DEFAULT_BATCH_SIZE = 256\n    DEFAULT_DELETE_BATCH_SIZE = 64\n\n    def __init__(\n        self, client: \"Argilla\", dataset: \"Dataset\", mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None\n    ):\n        \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n        Args:\n            client: An Argilla client object.\n            dataset: A Dataset object.\n        \"\"\"\n        self.__client = client\n        self.__dataset = dataset\n        self._mapping = mapping or {}\n        self._api = self.__client.api.records\n\n    def __iter__(self):\n        return DatasetRecordsIterator(self.__dataset, self.__client, with_suggestions=True, with_responses=True)\n\n    def __call__(\n        self,\n        query: Optional[Union[str, Query]] = None,\n        batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n        start_offset: int = 0,\n        with_suggestions: bool = True,\n        with_responses: bool = True,\n        with_vectors: Optional[Union[List, bool, str]] = None,\n        limit: Optional[int] = None,\n    ) -> DatasetRecordsIterator:\n        \"\"\"Returns an iterator over the records in the dataset on the server.\n\n        Parameters:\n            query: A string or a Query object to filter the records.\n            batch_size: The number of records to fetch in each batch. The default is 256.\n            start_offset: The offset from which to start fetching records. The default is 0.\n            with_suggestions: Whether to include suggestions in the records. The default is True.\n            with_responses: Whether to include responses in the records. The default is True.\n            with_vectors: A list of vector names to include in the records. The default is None.\n                If a list is provided, only the specified vectors will be included.\n                If True is provided, all vectors will be included.\n            limit: The maximum number of records to fetch. 
The default is None.\n\n        Returns:\n            An iterator over the records in the dataset on the server.\n\n        \"\"\"\n        if query and isinstance(query, str):\n            query = Query(query=query)\n\n        if with_vectors:\n            self._validate_vector_names(vector_names=with_vectors)\n\n        return DatasetRecordsIterator(\n            dataset=self.__dataset,\n            client=self.__client,\n            query=query,\n            batch_size=batch_size,\n            start_offset=start_offset,\n            with_suggestions=with_suggestions,\n            with_responses=with_responses,\n            with_vectors=with_vectors,\n            limit=limit,\n        )\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}({self.__dataset})\"\n\n    ############################\n    # Public methods\n    ############################\n\n    def log(\n        self,\n        records: Union[List[dict], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        batch_size: int = DEFAULT_BATCH_SIZE,\n        on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n    ) -> \"DatasetRecords\":\n        \"\"\"Add or update records in a dataset on the server using the provided records.\n        If the record includes a known `id` field, the record will be updated.\n        If the record does not include a known `id` field, the record will be added as a new record.\n        See `rg.Record` for more information on the record definition.\n\n        Parameters:\n            records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                     If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                     fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n            mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                     To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n            user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n            batch_size: The number of records to send in each batch. 
The default is 256.\n\n        Returns:\n            A list of Record objects representing the updated records.\n        \"\"\"\n        record_models = self._ingest_records(\n            records=records, mapping=mapping, user_id=user_id or self.__client.me.id, on_error=on_error\n        )\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n        )\n\n        created_or_updated = []\n        records_updated = 0\n\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n            created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n            records_updated += updated\n\n        records_created = len(created_or_updated) - records_updated\n        self._log_message(\n            message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return self\n\n    def delete(\n        self,\n        records: List[Record],\n        batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n    ) -> List[Record]:\n        \"\"\"Delete records in a dataset on the server using the provided records\n            and matching based on the id.\n\n        Parameters:\n            records: A list of `Record` objects representing the records to be deleted.\n            batch_size: The number of records to send in each batch. The default is 64.\n\n        Returns:\n            A list of Record objects representing the deleted records.\n\n        \"\"\"\n        mapping = None\n        user_id = self.__client.me.id\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n        )\n\n        records_deleted = 0\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n            records_deleted += len(batch_records)\n\n        self._log_message(\n            message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return records\n\n    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n        \"\"\"\n        Return the records as a dictionary. 
This is a convenient shortcut for dataset.records(...).to_dict().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionary.\n                - True: The record fields, metadata, suggestions and responses will be flattened.\n                - False: The record fields, metadata, suggestions and responses will be nested.\n            orient (str): The orientation of the exported dictionary.\n                - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n                - \"index\": The keys of the dictionary will be the id of the records.\n        Returns:\n            A dictionary of records.\n\n        \"\"\"\n        return self().to_dict(flatten=flatten, orient=orient)\n\n    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n        \"\"\"\n        Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionaries in the list.\n                - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n                - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n        Returns:\n            A list of dictionaries of records.\n        \"\"\"\n        data = self().to_list(flatten=flatten)\n        return data\n\n    def to_json(self, path: Union[Path, str]) -> Path:\n        \"\"\"\n        Export the records to a file on disk.\n\n        Parameters:\n            path (str): The path to the file to save the records.\n\n        Returns:\n            The path to the file where the records were saved.\n\n        \"\"\"\n        return self().to_json(path=path)\n\n    def from_json(self, path: Union[Path, str]) -> List[Record]:\n        \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n            The JSON file should be defined by `DatasetRecords.to_json`.\n\n        Args:\n            path (str): The path to the file containing the records.\n\n        Returns:\n            DatasetRecords: The DatasetRecords object created from the disk path.\n\n        \"\"\"\n        records = JsonIO._records_from_json(path=path)\n        return self.log(records=records)\n\n    def to_datasets(self) -> HFDataset:\n        \"\"\"\n        Export the records to a HFDataset.\n\n        Returns:\n            The dataset containing the records.\n\n        \"\"\"\n\n        return self().to_datasets()\n\n    ############################\n    # Private methods\n    ############################\n\n    def _ingest_records(\n        self,\n        records: Union[List[Dict[str, Any]], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n    ) -> List[RecordModel]:\n        \"\"\"Ingests records from a list of dictionaries, a Hugging Face Dataset, or a list of Record objects.\"\"\"\n\n        mapping = mapping or self._mapping\n        if len(records) == 0:\n            raise ValueError(\"No records provided to ingest.\")\n\n        if HFDatasetsIO._is_hf_dataset(dataset=records):\n            records = 
HFDatasetsIO._record_dicts_from_datasets(hf_dataset=records)\n\n        ingested_records = []\n        record_mapper = IngestedRecordMapper(mapping=mapping, dataset=self.__dataset, user_id=user_id)\n        for record in records:\n            try:\n                if isinstance(record, dict):\n                    record = record_mapper(data=record)\n                elif isinstance(record, Record):\n                    record.dataset = self.__dataset\n                else:\n                    raise ValueError(\n                        \"Records should be a a list Record instances, \"\n                        \"a Hugging Face Dataset, or a list of dictionaries representing the records.\"\n                        f\"Found a record of type {type(record)}: {record}.\"\n                    )\n            except Exception as e:\n                if on_error == RecordErrorHandling.IGNORE:\n                    self._log_message(\n                        message=f\"Failed to ingest record from dict {record}: {e}\",\n                        level=\"info\",\n                    )\n                    continue\n                elif on_error == RecordErrorHandling.WARN:\n                    warnings.warn(f\"Failed to ingest record from dict {record}: {e}\")\n                    continue\n                raise RecordsIngestionError(f\"Failed to ingest record from dict {record}\") from e\n            ingested_records.append(record.api_model())\n        return ingested_records\n\n    def _normalize_batch_size(self, batch_size: int, records_length, max_value: int):\n        norm_batch_size = min(batch_size, records_length, max_value)\n\n        if batch_size != norm_batch_size:\n            self._log_message(\n                message=f\"The provided batch size {batch_size} was normalized. Using value {norm_batch_size}.\",\n                level=\"warning\",\n            )\n\n        return norm_batch_size\n\n    def _validate_vector_names(self, vector_names: Union[List[str], str]) -> None:\n        if not isinstance(vector_names, list):\n            vector_names = [vector_names]\n        for vector_name in vector_names:\n            if isinstance(vector_name, bool):\n                continue\n            if vector_name not in self.__dataset.schema:\n                raise ValueError(f\"Vector field {vector_name} not found in dataset schema.\")\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__init__","title":"__init__(client, dataset, mapping=None)","text":"

Initializes a DatasetRecords object with a client and a dataset.

Parameters:

client (Argilla): An Argilla client object.
dataset (Dataset): A Dataset object.

Source code in src/argilla/records/_dataset_records.py
def __init__(\n    self, client: \"Argilla\", dataset: \"Dataset\", mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None\n):\n    \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n    Args:\n        client: An Argilla client object.\n        dataset: A Dataset object.\n    \"\"\"\n    self.__client = client\n    self.__dataset = dataset\n    self._mapping = mapping or {}\n    self._api = self.__client.api.records\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__call__","title":"__call__(query=None, batch_size=DEFAULT_BATCH_SIZE, start_offset=0, with_suggestions=True, with_responses=True, with_vectors=None, limit=None)","text":"

Returns an iterator over the records in the dataset on the server.

Parameters:

query (Optional[Union[str, Query]]): A string or a Query object to filter the records. Default: None
batch_size (Optional[int]): The number of records to fetch in each batch. Default: DEFAULT_BATCH_SIZE (256)
start_offset (int): The offset from which to start fetching records. Default: 0
with_suggestions (bool): Whether to include suggestions in the records. Default: True
with_responses (bool): Whether to include responses in the records. Default: True
with_vectors (Optional[Union[List, bool, str]]): A list of vector names to include in the records. If a list is provided, only the specified vectors will be included. If True is provided, all vectors will be included. Default: None
limit (Optional[int]): The maximum number of records to fetch. Default: None

Returns:

DatasetRecordsIterator: An iterator over the records in the dataset on the server.

Source code in src/argilla/records/_dataset_records.py
def __call__(\n    self,\n    query: Optional[Union[str, Query]] = None,\n    batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n    start_offset: int = 0,\n    with_suggestions: bool = True,\n    with_responses: bool = True,\n    with_vectors: Optional[Union[List, bool, str]] = None,\n    limit: Optional[int] = None,\n) -> DatasetRecordsIterator:\n    \"\"\"Returns an iterator over the records in the dataset on the server.\n\n    Parameters:\n        query: A string or a Query object to filter the records.\n        batch_size: The number of records to fetch in each batch. The default is 256.\n        start_offset: The offset from which to start fetching records. The default is 0.\n        with_suggestions: Whether to include suggestions in the records. The default is True.\n        with_responses: Whether to include responses in the records. The default is True.\n        with_vectors: A list of vector names to include in the records. The default is None.\n            If a list is provided, only the specified vectors will be included.\n            If True is provided, all vectors will be included.\n        limit: The maximum number of records to fetch. The default is None.\n\n    Returns:\n        An iterator over the records in the dataset on the server.\n\n    \"\"\"\n    if query and isinstance(query, str):\n        query = Query(query=query)\n\n    if with_vectors:\n        self._validate_vector_names(vector_names=with_vectors)\n\n    return DatasetRecordsIterator(\n        dataset=self.__dataset,\n        client=self.__client,\n        query=query,\n        batch_size=batch_size,\n        start_offset=start_offset,\n        with_suggestions=with_suggestions,\n        with_responses=with_responses,\n        with_vectors=with_vectors,\n        limit=limit,\n    )\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.log","title":"log(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE, on_error=RecordErrorHandling.RAISE)","text":"

Add or update records in a dataset on the server using the provided records. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added as a new record. See rg.Record for more information on the record definition.

Parameters:

records (Union[List[dict], List[Record], HFDataset]): A list of Record objects, a Hugging Face Dataset, or a list of dictionaries representing the records. If records are defined as dictionaries or a dataset, the keys/column names should correspond to the fields and questions in the Argilla dataset. An id should be provided to identify the records when updating. Required.
mapping (Optional[Dict[str, Union[str, Sequence[str]]]]): A dictionary that maps the keys/column names in the records to the fields or questions in the Argilla dataset. To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names. Default: None
user_id (Optional[UUID]): The user id to be associated with the records' response. If not provided, the current user id is used. Default: None
batch_size (int): The number of records to send in each batch. Default: DEFAULT_BATCH_SIZE (256)
on_error (RecordErrorHandling): How records that fail ingestion are handled: RAISE raises the error, WARN warns and skips the record, IGNORE skips it (see the source code below). Default: RecordErrorHandling.RAISE

Returns:

DatasetRecords: The DatasetRecords object (the method returns self; the records are created or updated on the server).

Source code in src/argilla/records/_dataset_records.py
def log(\n    self,\n    records: Union[List[dict], List[Record], HFDataset],\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    user_id: Optional[UUID] = None,\n    batch_size: int = DEFAULT_BATCH_SIZE,\n    on_error: RecordErrorHandling = RecordErrorHandling.RAISE,\n) -> \"DatasetRecords\":\n    \"\"\"Add or update records in a dataset on the server using the provided records.\n    If the record includes a known `id` field, the record will be updated.\n    If the record does not include a known `id` field, the record will be added as a new record.\n    See `rg.Record` for more information on the record definition.\n\n    Parameters:\n        records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                 If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                 fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n        mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                 To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n        user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n        batch_size: The number of records to send in each batch. The default is 256.\n\n    Returns:\n        A list of Record objects representing the updated records.\n    \"\"\"\n    record_models = self._ingest_records(\n        records=records, mapping=mapping, user_id=user_id or self.__client.me.id, on_error=on_error\n    )\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n    )\n\n    created_or_updated = []\n    records_updated = 0\n\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n        created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n        records_updated += updated\n\n    records_created = len(created_or_updated) - records_updated\n    self._log_message(\n        message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return self\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.delete","title":"delete(records, batch_size=DEFAULT_DELETE_BATCH_SIZE)","text":"

Delete records in a dataset on the server using the provided records and matching based on the id.

Parameters:

records (List[Record]): A list of Record objects representing the records to be deleted. Required.
batch_size (int): The number of records to send in each batch. Default: DEFAULT_DELETE_BATCH_SIZE (64)

Returns:

List[Record]: A list of Record objects representing the deleted records.

Source code in src/argilla/records/_dataset_records.py
def delete(\n    self,\n    records: List[Record],\n    batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n) -> List[Record]:\n    \"\"\"Delete records in a dataset on the server using the provided records\n        and matching based on the id.\n\n    Parameters:\n        records: A list of `Record` objects representing the records to be deleted.\n        batch_size: The number of records to send in each batch. The default is 64.\n\n    Returns:\n        A list of Record objects representing the deleted records.\n\n    \"\"\"\n    mapping = None\n    user_id = self.__client.me.id\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n    )\n\n    records_deleted = 0\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n        records_deleted += len(batch_records)\n\n    self._log_message(\n        message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return records\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_dict","title":"to_dict(flatten=False, orient='names')","text":"

Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

Parameters:

flatten (bool): The structure of the exported dictionary. True: the record fields, metadata, suggestions, and responses are flattened. False: they are nested. Default: False
orient (str): The orientation of the exported dictionary. \"names\": the keys are the names of the fields, metadata, suggestions, and responses. \"index\": the keys are the ids of the records. Default: \"names\"

Returns: A dictionary of records.

Source code in src/argilla/records/_dataset_records.py
def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n    \"\"\"\n    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionary.\n            - True: The record fields, metadata, suggestions and responses will be flattened.\n            - False: The record fields, metadata, suggestions and responses will be nested.\n        orient (str): The orientation of the exported dictionary.\n            - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n            - \"index\": The keys of the dictionary will be the id of the records.\n    Returns:\n        A dictionary of records.\n\n    \"\"\"\n    return self().to_dict(flatten=flatten, orient=orient)\n
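
A usage sketch (the key names and shape shown are illustrative):

data = dataset.records.to_dict(orient=\"names\")\n# e.g. {\"question\": [\"Do you need oxygen to breathe?\"], \"answer\": [\"Yes\"]}\n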
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_list","title":"to_list(flatten=False)","text":"

Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

Parameters:

flatten (bool): The structure of the exported dictionaries in the list. True: record keys are flattened, using dot notation for nested attributes (for example, label.suggestion and label.response); responses are spread across multiple columns for values and users. False: the record fields, metadata, suggestions, and responses are nested dictionaries keyed by record attribute. Default: False

Returns: A list of dictionaries of records.

Source code in src/argilla/records/_dataset_records.py
def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n    \"\"\"\n    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionaries in the list.\n            - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n            - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n    Returns:\n        A list of dictionaries of records.\n    \"\"\"\n    data = self().to_list(flatten=flatten)\n    return data\n
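
A usage sketch of the flattened form (the column names are illustrative):

rows = dataset.records.to_list(flatten=True)\n# e.g. [{\"question\": \"Do you need oxygen to breathe?\", \"label.suggestion\": \"positive\"}]\n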
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_json","title":"to_json(path)","text":"

Export the records to a file on disk.

Parameters:

path (str): The path to the file to save the records. Required.

Returns:

Path: The path to the file where the records were saved.

Source code in src/argilla/records/_dataset_records.py
def to_json(self, path: Union[Path, str]) -> Path:\n    \"\"\"\n    Export the records to a file on disk.\n\n    Parameters:\n        path (str): The path to the file to save the records.\n\n    Returns:\n        The path to the file where the records were saved.\n\n    \"\"\"\n    return self().to_json(path=path)\n
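
A round-trip sketch combining to_json with from_json (described next):

path = dataset.records.to_json(\"records.json\")\ndataset.records.from_json(path)  # logs the saved records back into the dataset\n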
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.from_json","title":"from_json(path)","text":"

Creates a DatasetRecords object from a disk path to a JSON file. The JSON file should be defined by DatasetRecords.to_json.

Parameters:

path (str): The path to the file containing the records. Required.

Returns:

DatasetRecords: The DatasetRecords object created from the disk path.

Source code in src/argilla/records/_dataset_records.py
def from_json(self, path: Union[Path, str]) -> List[Record]:\n    \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n        The JSON file should be defined by `DatasetRecords.to_json`.\n\n    Args:\n        path (str): The path to the file containing the records.\n\n    Returns:\n        DatasetRecords: The DatasetRecords object created from the disk path.\n\n    \"\"\"\n    records = JsonIO._records_from_json(path=path)\n    return self.log(records=records)\n
"},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_datasets","title":"to_datasets()","text":"

Export the records to an HFDataset.

Returns:

HFDataset: The dataset containing the records.

Source code in src/argilla/records/_dataset_records.py
def to_datasets(self) -> HFDataset:\n    \"\"\"\n    Export the records to a HFDataset.\n\n    Returns:\n        The dataset containing the records.\n\n    \"\"\"\n\n    return self().to_datasets()\n
"},{"location":"reference/argilla/datasets/datasets/","title":"rg.Dataset","text":"

Dataset is a class that represents a collection of records. It is used to store and manage records in Argilla.

"},{"location":"reference/argilla/datasets/datasets/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/datasets/datasets/#creating-a-dataset","title":"Creating a Dataset","text":"

To create a new dataset you need to define its name and settings. Optional parameters are workspace and client, if you want to create the dataset in a specific workspace or on a specific Argilla instance.

dataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n    ),\n)\ndataset.create()\n

For a detailed guide to the dataset creation and publication process, see the Dataset how-to guide.

"},{"location":"reference/argilla/datasets/datasets/#retrieving-an-existing-dataset","title":"Retrieving an existing Dataset","text":"

To retrieve an existing dataset, use client.datasets(\"my_dataset\") instead of initializing a new Dataset object.

dataset = client.datasets(\"my_dataset\")\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset","title":"Dataset","text":"

Bases: Resource, HubImportExportMixin, DiskImportExportMixin

Class for interacting with Argilla Datasets

Attributes:

name (str): Name of the dataset.
records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.
settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.
fields (list): The fields of the dataset, for example the rg.TextField of the dataset. Defined in the settings.
questions (list): The questions of the dataset defined in the settings. For example, the rg.TextQuestion that you want labelers to answer.
guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.
allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.

Source code in src/argilla/datasets/_resource.py
class Dataset(Resource, HubImportExportMixin, DiskImportExportMixin):\n    \"\"\"Class for interacting with Argilla Datasets\n\n    Attributes:\n        name: Name of the dataset.\n        records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.\n        settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.\n        fields (list): The fields of the dataset, for example the `rg.TextField` of the dataset. Defined in the settings.\n        questions (list): The questions of the dataset defined in the settings. For example, the `rg.TextQuestion` that you want labelers to answer.\n        guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.\n        allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.\n    \"\"\"\n\n    name: str\n    id: Optional[UUID]\n\n    _api: \"DatasetsAPI\"\n    _model: \"DatasetModel\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n        settings: Optional[Settings] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n        Parameters:\n            name (str): Name of the dataset. Replaced by random UUID if not assigned.\n            workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n            settings (Settings): Settings class to be used to configure the dataset.\n            client (Argilla): Instance of Argilla to connect with the server. 
Default is the default client.\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.datasets)\n        if name is None:\n            name = f\"dataset_{uuid4()}\"\n            self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n        self._workspace = workspace\n        self._model = DatasetModel(name=name)\n        self._settings = settings._copy() if settings else Settings(_dataset=self)\n        self._settings.dataset = self\n        self.__records = DatasetRecords(client=self._client, dataset=self, mapping=self._settings.mapping)\n\n    #####################\n    #  Properties       #\n    #####################\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def records(self) -> \"DatasetRecords\":\n        return self.__records\n\n    @property\n    def settings(self) -> Settings:\n        return self._settings\n\n    @settings.setter\n    def settings(self, value: Settings) -> None:\n        settings_copy = value._copy()\n        settings_copy.dataset = self\n        self._settings = settings_copy\n\n    @property\n    def fields(self) -> list:\n        return self.settings.fields\n\n    @property\n    def questions(self) -> list:\n        return self.settings.questions\n\n    @property\n    def guidelines(self) -> str:\n        return self.settings.guidelines\n\n    @guidelines.setter\n    def guidelines(self, value: str) -> None:\n        self.settings.guidelines = value\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.settings.allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool) -> None:\n        self.settings.allow_extra_metadata = value\n\n    @property\n    def schema(self) -> dict:\n        return self.settings.schema\n\n    @property\n    def workspace(self) -> Workspace:\n        self._workspace = self._resolve_workspace()\n        return self._workspace\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self.settings.distribution\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self.settings.distribution = value\n\n    #####################\n    #  Core methods     #\n    #####################\n\n    def get(self) -> \"Dataset\":\n        super().get()\n        self.settings.get()\n        return self\n\n    def create(self) -> \"Dataset\":\n        \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n        Returns:\n            Dataset: The created dataset object.\n        \"\"\"\n        try:\n            super().create()\n        except ForbiddenError as e:\n            settings_url = f\"{self._client.api_url}/user-settings\"\n            user_role = self._client.me.role.value\n            user_name = self._client.me.username\n            workspace_name = self.workspace.name\n            message = f\"\"\"User '{user_name}' is not authorized to create a dataset in workspace '{workspace_name}'\n            with role '{user_role}'. 
Go to {settings_url} to view your role.\"\"\"\n            raise ForbiddenError(message) from e\n        try:\n            return self._publish()\n        except Exception as e:\n            self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n            self._rollback_dataset_creation()\n            raise SettingsError from e\n\n    def update(self) -> \"Dataset\":\n        \"\"\"Updates the dataset on the server with the current settings.\n\n        Returns:\n            Dataset: The updated dataset object.\n        \"\"\"\n        self.settings.update()\n        return self\n\n    def progress(self, with_users_distribution: bool = False) -> dict:\n        \"\"\"Returns the team's progress on the dataset.\n\n        Parameters:\n            with_users_distribution (bool): If True, the progress of the dataset is returned\n                with users distribution. This includes the number of responses made by each user.\n\n        Returns:\n            dict: The team's progress on the dataset.\n\n        An example of a response when `with_users_distribution` is `True`:\n        ```json\n        {\n            \"total\": 100,\n            \"completed\": 50,\n            \"pending\": 50,\n            \"users\": {\n                \"user1\": {\n                   \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n                   \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n                },\n                \"user2\": {\n                   \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n                   \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n                },\n                ...\n        }\n        ```\n\n        \"\"\"\n\n        progress = self._api.get_progress(dataset_id=self._model.id).model_dump()\n\n        if with_users_distribution:\n            users_progress = self._api.list_users_progress(dataset_id=self._model.id)\n            users_distribution = {\n                user.username: {\n                    \"completed\": user.completed.model_dump(),\n                    \"pending\": user.pending.model_dump(),\n                }\n                for user in users_progress\n            }\n\n            progress.update({\"users\": users_distribution})\n\n        return progress\n\n    @classmethod\n    def from_model(cls, model: DatasetModel, client: \"Argilla\") -> \"Dataset\":\n        instance = cls(client=client, workspace=model.workspace_id, name=model.name)\n        instance._model = model\n\n        return instance\n\n    #####################\n    #  Utility methods  #\n    #####################\n\n    def api_model(self) -> DatasetModel:\n        self._model.workspace_id = self.workspace.id\n        return self._model\n\n    def _publish(self) -> \"Dataset\":\n        self._settings.create()\n        self._api.publish(dataset_id=self._model.id)\n\n        return self.get()\n\n    def _resolve_workspace(self) -> Workspace:\n        workspace = self._workspace\n\n        if workspace is None:\n            workspace = self._client.workspaces.default\n            warnings.warn(f\"Workspace not provided. 
Using default workspace: {workspace.name} id: {workspace.id}\")\n        elif isinstance(workspace, str):\n            workspace = self._client.workspaces(workspace)\n            if workspace is None:\n                available_workspace_names = [ws.name for ws in self._client.workspaces]\n                raise NotFoundError(\n                    f\"Workspace with name {workspace} not found. Available workspaces: {available_workspace_names}\"\n                )\n        elif isinstance(workspace, UUID):\n            ws_model = self._client.api.workspaces.get(workspace)\n            workspace = Workspace.from_model(ws_model, client=self._client)\n        elif not isinstance(workspace, Workspace):\n            raise ValueError(f\"Wrong workspace value found {workspace}\")\n\n        return workspace\n\n    def _rollback_dataset_creation(self):\n        if not self._is_published():\n            self.delete()\n\n    def _is_published(self) -> bool:\n        return self._model.status == \"ready\"\n\n    @classmethod\n    def _sanitize_name(cls, name: str):\n        name = name.replace(\" \", \"_\")\n\n        for character in [\"/\", \"\\\\\", \".\", \",\", \";\", \":\", \"-\", \"+\", \"=\"]:\n            name = name.replace(character, \"-\")\n        return name\n\n    def _with_client(self, client: Argilla) -> \"Self\":\n        return super()._with_client(client=client)\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.__init__","title":"__init__(name=None, workspace=None, settings=None, client=None)","text":"

Initializes a new Argilla Dataset object with the given parameters.

Parameters:

name (str): Name of the dataset. Replaced by a random UUID if not assigned. Default: None
workspace (UUID): Workspace of the dataset. Default is the first workspace found on the server. Default: None
settings (Settings): Settings class to be used to configure the dataset. Default: None
client (Argilla): Instance of Argilla to connect with the server. Default is the default client. Default: None

Source code in src/argilla/datasets/_resource.py
def __init__(\n    self,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n    settings: Optional[Settings] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n    Parameters:\n        name (str): Name of the dataset. Replaced by random UUID if not assigned.\n        workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n        settings (Settings): Settings class to be used to configure the dataset.\n        client (Argilla): Instance of Argilla to connect with the server. Default is the default client.\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.datasets)\n    if name is None:\n        name = f\"dataset_{uuid4()}\"\n        self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n    self._workspace = workspace\n    self._model = DatasetModel(name=name)\n    self._settings = settings._copy() if settings else Settings(_dataset=self)\n    self._settings.dataset = self\n    self.__records = DatasetRecords(client=self._client, dataset=self, mapping=self._settings.mapping)\n
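A minimal construction sketch (the server URL, API key, workspace name, and settings below are illustrative):

import argilla as rg\n\nclient = rg.Argilla(api_url=\"http://localhost:6900\", api_key=\"argilla.apikey\")  # illustrative credentials\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",  # omit to fall back to a random UUID-based name\n    workspace=\"my_workspace\",  # illustrative workspace name\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    ),\n    client=client,\n)\n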
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.create","title":"create()","text":"

Creates the dataset on the server with the Settings configuration.

Returns:

Name Type Description Dataset Dataset

The created dataset object.

Source code in src/argilla/datasets/_resource.py
def create(self) -> \"Dataset\":\n    \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n    Returns:\n        Dataset: The created dataset object.\n    \"\"\"\n    try:\n        super().create()\n    except ForbiddenError as e:\n        settings_url = f\"{self._client.api_url}/user-settings\"\n        user_role = self._client.me.role.value\n        user_name = self._client.me.username\n        workspace_name = self.workspace.name\n        message = f\"\"\"User '{user_name}' is not authorized to create a dataset in workspace '{workspace_name}'\n        with role '{user_role}'. Go to {settings_url} to view your role.\"\"\"\n        raise ForbiddenError(message) from e\n    try:\n        return self._publish()\n    except Exception as e:\n        self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n        self._rollback_dataset_creation()\n        raise SettingsError from e\n
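For example, a configured dataset can then be created on the server (assuming the dataset object built in the constructor example above):

dataset.create()  # creates the dataset on the server and publishes its settings\n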
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.update","title":"update()","text":"

Updates the dataset on the server with the current settings.

Returns:

Name Type Description Dataset Dataset

The updated dataset object.

Source code in src/argilla/datasets/_resource.py
def update(self) -> \"Dataset\":\n    \"\"\"Updates the dataset on the server with the current settings.\n\n    Returns:\n        Dataset: The updated dataset object.\n    \"\"\"\n    self.settings.update()\n    return self\n
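A sketch of a settings change followed by an update (the guidelines text is illustrative):

dataset.settings.guidelines = \"Classify the sentiment of each text.\"  # illustrative guidelines\ndataset = dataset.update()  # pushes the modified settings to the server\n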
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.progress","title":"progress(with_users_distribution=False)","text":"

Returns the team's progress on the dataset.

Parameters:

Name Type Description Default with_users_distribution bool

If True, the progress of the dataset is returned with the users' distribution. This includes the number of responses made by each user.

False

Returns:

Name Type Description dict dict

The team's progress on the dataset.

An example of a response when with_users_distribution is True:

{\n    \"total\": 100,\n    \"completed\": 50,\n    \"pending\": 50,\n    \"users\": {\n        \"user1\": {\n           \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n           \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n        },\n        \"user2\": {\n           \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n           \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n        },\n        ...\n}\n

Source code in src/argilla/datasets/_resource.py
def progress(self, with_users_distribution: bool = False) -> dict:\n    \"\"\"Returns the team's progress on the dataset.\n\n    Parameters:\n        with_users_distribution (bool): If True, the progress of the dataset is returned\n            with users distribution. This includes the number of responses made by each user.\n\n    Returns:\n        dict: The team's progress on the dataset.\n\n    An example of a response when `with_users_distribution` is `True`:\n    ```json\n    {\n        \"total\": 100,\n        \"completed\": 50,\n        \"pending\": 50,\n        \"users\": {\n            \"user1\": {\n               \"completed\": { \"submitted\": 10, \"draft\": 5, \"discarded\": 5},\n               \"pending\": { \"submitted\": 5, \"draft\": 10, \"discarded\": 10},\n            },\n            \"user2\": {\n               \"completed\": { \"submitted\": 20, \"draft\": 10, \"discarded\": 5},\n               \"pending\": { \"submitted\": 2, \"draft\": 25, \"discarded\": 0},\n            },\n            ...\n    }\n    ```\n\n    \"\"\"\n\n    progress = self._api.get_progress(dataset_id=self._model.id).model_dump()\n\n    if with_users_distribution:\n        users_progress = self._api.list_users_progress(dataset_id=self._model.id)\n        users_distribution = {\n            user.username: {\n                \"completed\": user.completed.model_dump(),\n                \"pending\": user.pending.model_dump(),\n            }\n            for user in users_progress\n        }\n\n        progress.update({\"users\": users_distribution})\n\n    return progress\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin","title":"DiskImportExportMixin","text":"

Bases: ABC

A mixin for exporting and importing datasets to and from disk.

Source code in src/argilla/datasets/_io/_disk.py
class DiskImportExportMixin(ABC):\n    \"\"\"A mixin for exporting and importing datasets to and from disk.\"\"\"\n\n    _model: DatasetModel\n    _DEFAULT_RECORDS_PATH = \"records.json\"\n    _DEFAULT_CONFIG_REPO_DIR = \".argilla\"\n    _DEFAULT_SETTINGS_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/settings.json\"\n    _DEFAULT_DATASET_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/dataset.json\"\n    _DEFAULT_CONFIGURATION_FILES = [_DEFAULT_SETTINGS_PATH, _DEFAULT_DATASET_PATH]\n\n    def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n        \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n        Parameters:\n            path (str): The path to export the dataset to. Must be an empty directory.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n        dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n        # Export the dataset model, settings and records\n        self._persist_dataset_model(path=dataset_path)\n        self.settings.to_json(path=settings_path)\n        if with_records:\n            self.records.to_json(path=records_path)\n\n        return path\n\n    @classmethod\n    def from_disk(\n        cls: Type[\"Dataset\"],\n        path: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n    ) -> \"Dataset\":\n        \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n        The directory should be defined using the `to_disk` method.\n\n        Parameters:\n            path (str): The path to the directory containing the dataset model, settings and records.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n\n        client = client or Argilla._get_default()\n\n        try:\n            dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n            logging.info(f\"Loading dataset from {dataset_path}\")\n            logging.info(f\"Loading settings from {settings_path}\")\n            logging.info(f\"Loading records from {records_path}\")\n\n            dataset_model = cls._load_dataset_model(path=dataset_path)\n        except (NotADirectoryError, FileNotFoundError) as e:\n            raise ImportDatasetError(f\"Error loading dataset from disk. 
{e}\") from e\n\n        # Get the relevant workspace_id of the incoming dataset\n        if isinstance(workspace, str):\n            workspace = client.workspaces(workspace)\n            if not workspace:\n                raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n        else:\n            warnings.warn(\"Workspace not provided. Using default workspace.\")\n            workspace = client.workspaces.default\n        dataset_model.workspace_id = workspace.id\n\n        if name and (name != dataset_model.name):\n            logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n            dataset_model.name = name\n\n        if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n            warnings.warn(\n                f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n            )\n            dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n                name=dataset_model.name, workspace_id=workspace.id\n            )\n            dataset = cls.from_model(model=dataset_model, client=client)\n        else:\n            # Create a new dataset and load the settings and records\n            if not os.path.exists(settings_path):\n                raise ImportDatasetError(f\"Settings file not found at {settings_path}\")\n\n            dataset = cls.from_model(model=dataset_model, client=client)\n            dataset.settings = Settings.from_json(path=settings_path)\n            dataset.create()\n\n        if os.path.exists(records_path) and with_records:\n            try:\n                dataset.records.from_json(path=records_path)\n            except RecordsIngestionError as e:\n                raise RecordsIngestionError(\n                    message=\"Error importing dataset records from disk. \"\n                    \"Records and datasets settings are not compatible.\"\n                ) from e\n\n        return dataset\n\n    ############################\n    # Utility methods\n    ############################\n\n    def _persist_dataset_model(self, path: Path):\n        \"\"\"Persists the dataset model to disk.\"\"\"\n        if path.exists():\n            raise FileExistsError(f\"Dataset already exists at {path}\")\n        with open(file=path, mode=\"w\") as f:\n            json.dump(self.api_model().model_dump(), f)\n\n    @classmethod\n    def _load_dataset_model(cls, path: Path):\n        \"\"\"Loads the dataset model from disk.\"\"\"\n        if not os.path.exists(path):\n            raise FileNotFoundError(f\"Dataset model not found at {path}\")\n        with open(file=path, mode=\"r\") as f:\n            dataset_model = json.load(f)\n            dataset_model = DatasetModel(**dataset_model)\n        return dataset_model\n\n    @classmethod\n    def _define_child_paths(cls, path: Union[Path, str]) -> Tuple[Path, Path, Path]:\n        path = Path(path)\n        if not path.is_dir():\n            raise NotADirectoryError(f\"Path {path} is not a directory\")\n        main_path = path / cls._DEFAULT_CONFIG_REPO_DIR\n        main_path.mkdir(exist_ok=True)\n        dataset_path = path / cls._DEFAULT_DATASET_PATH\n        settings_path = path / cls._DEFAULT_SETTINGS_PATH\n        records_path = path / cls._DEFAULT_RECORDS_PATH\n        return dataset_path, settings_path, records_path\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin.to_disk","title":"to_disk(path, *, with_records=True)","text":"

Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.

Parameters:

Name Type Description Default path str

The path to export the dataset to. Must be an empty directory.

required with_records bool

whether to export the dataset records to disk. Defaults to True.

True Source code in src/argilla/datasets/_io/_disk.py
def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n    \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n    Parameters:\n        path (str): The path to export the dataset to. Must be an empty directory.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n    dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n    logging.info(f\"Loading dataset from {dataset_path}\")\n    logging.info(f\"Loading settings from {settings_path}\")\n    logging.info(f\"Loading records from {records_path}\")\n    # Export the dataset model, settings and records\n    self._persist_dataset_model(path=dataset_path)\n    self.settings.to_json(path=settings_path)\n    if with_records:\n        self.records.to_json(path=records_path)\n\n    return path\n
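A usage sketch (the directory name is illustrative; it must already exist and should be empty):

export_path = dataset.to_disk(path=\"my_dataset_export\", with_records=True)\n# writes records.json plus .argilla/settings.json and .argilla/dataset.json\n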
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._disk.DiskImportExportMixin.from_disk","title":"from_disk(path, *, name=None, workspace=None, client=None, with_records=True) classmethod","text":"

Imports a dataset from disk as a directory containing the dataset model, settings and records. The directory should have been created using the to_disk method.

Parameters:

Name Type Description Default path str

The path to the directory containing the dataset model, settings and records.

required name str

The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

None workspace Union[Workspace, str]

The workspace to import the dataset to. Defaults to None and default workspace is used.

None client Argilla

The client to use for the import. Defaults to None and the default client is used.

None with_records bool

whether to load the records from disk. Defaults to True.

True Source code in src/argilla/datasets/_io/_disk.py
@classmethod\ndef from_disk(\n    cls: Type[\"Dataset\"],\n    path: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n) -> \"Dataset\":\n    \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n    The directory should be defined using the `to_disk` method.\n\n    Parameters:\n        path (str): The path to the directory containing the dataset model, settings and records.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n\n    client = client or Argilla._get_default()\n\n    try:\n        dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n\n        dataset_model = cls._load_dataset_model(path=dataset_path)\n    except (NotADirectoryError, FileNotFoundError) as e:\n        raise ImportDatasetError(f\"Error loading dataset from disk. {e}\") from e\n\n    # Get the relevant workspace_id of the incoming dataset\n    if isinstance(workspace, str):\n        workspace = client.workspaces(workspace)\n        if not workspace:\n            raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n    else:\n        warnings.warn(\"Workspace not provided. Using default workspace.\")\n        workspace = client.workspaces.default\n    dataset_model.workspace_id = workspace.id\n\n    if name and (name != dataset_model.name):\n        logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n        dataset_model.name = name\n\n    if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n        warnings.warn(\n            f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n        )\n        dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n            name=dataset_model.name, workspace_id=workspace.id\n        )\n        dataset = cls.from_model(model=dataset_model, client=client)\n    else:\n        # Create a new dataset and load the settings and records\n        if not os.path.exists(settings_path):\n            raise ImportDatasetError(f\"Settings file not found at {settings_path}\")\n\n        dataset = cls.from_model(model=dataset_model, client=client)\n        dataset.settings = Settings.from_json(path=settings_path)\n        dataset.create()\n\n    if os.path.exists(records_path) and with_records:\n        try:\n            dataset.records.from_json(path=records_path)\n        except RecordsIngestionError as e:\n            raise RecordsIngestionError(\n                message=\"Error importing dataset records from disk. 
\"\n                \"Records and datasets settings are not compatible.\"\n            ) from e\n\n    return dataset\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin","title":"HubImportExportMixin","text":"

Bases: DiskImportExportMixin

Source code in src/argilla/datasets/_io/_hub.py
class HubImportExportMixin(DiskImportExportMixin):\n    def to_hub(\n        self: \"Dataset\",\n        repo_id: str,\n        *,\n        with_records: bool = True,\n        generate_card: Optional[bool] = True,\n        **kwargs: Any,\n    ) -> None:\n        \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n        Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n                to `True`.\n            **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n\n        Returns:\n            None\n        \"\"\"\n\n        from huggingface_hub import DatasetCardData, HfApi\n\n        from argilla.datasets._io.card import (\n            ArgillaDatasetCard,\n            size_categories_parser,\n        )\n\n        hf_api = HfApi(token=kwargs.get(\"token\"))\n\n        hfds = False\n        if with_records:\n            hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n            hfds.push_to_hub(repo_id, **kwargs)\n        else:\n            hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n        with TemporaryDirectory() as tmpdirname:\n            config_dir = os.path.join(tmpdirname)\n\n            self.to_disk(path=config_dir, with_records=False)\n\n            if generate_card:\n                sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n                sample_huggingface_record = self._get_sample_hf_record(hfds) if with_records else None\n                dataset_size = len(hfds) if with_records else 0\n                card = ArgillaDatasetCard.from_template(\n                    card_data=DatasetCardData(\n                        size_categories=size_categories_parser(dataset_size),\n                        tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                    ),\n                    repo_id=repo_id,\n                    argilla_fields=self.settings.fields,\n                    argilla_questions=self.settings.questions,\n                    argilla_guidelines=self.settings.guidelines or None,\n                    argilla_vectors_settings=self.settings.vectors or None,\n                    argilla_metadata_properties=self.settings.metadata,\n                    argilla_record=sample_argilla_record.to_dict(),\n                    huggingface_record=sample_huggingface_record,\n                )\n                card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n            hf_api.upload_folder(\n                folder_path=tmpdirname,\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n            )\n\n    @classmethod\n    def from_hub(\n        cls: Type[\"Dataset\"],\n        repo_id: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n        settings: Optional[\"Settings\"] = None,\n        split: Optional[str] = None,\n        subset: Optional[str] = None,\n        **kwargs: Any,\n    ) -> \"Dataset\":\n        
\"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            settings: the settings to use to load the `Dataset`. If not provided, the settings will be loaded from the Hugging Face dataset.\n            split: the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.\n            **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n        Returns:\n            A `Dataset` loaded from the Hugging Face Hub.\n        \"\"\"\n        from datasets import load_dataset\n        from huggingface_hub import snapshot_download\n        from argilla import Dataset\n\n        if name is None:\n            name = Dataset._sanitize_name(repo_id)\n\n        if settings is not None:\n            dataset = cls(name=name, settings=settings)\n            dataset.create()\n        else:\n            try:\n                # download configuration files from the hub\n                folder_path = snapshot_download(\n                    repo_id=repo_id,\n                    repo_type=\"dataset\",\n                    allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                    token=kwargs.get(\"token\"),\n                )\n\n                dataset = cls.from_disk(\n                    path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n                )\n            except ImportDatasetError:\n                from argilla import Settings\n\n                settings = Settings.from_hub(repo_id=repo_id, subset=subset)\n                dataset = cls.from_hub(\n                    repo_id=repo_id,\n                    name=name,\n                    workspace=workspace,\n                    client=client,\n                    with_records=with_records,\n                    settings=settings,\n                    split=split,\n                    subset=subset,\n                    **kwargs,\n                )\n                return dataset\n\n        if with_records:\n            try:\n                hf_dataset = load_dataset(\n                    path=repo_id,\n                    split=split,\n                    name=subset,\n                    **kwargs,\n                )  # type: ignore\n                hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, split=split, **kwargs)\n                cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n            except EmptyDatasetError:\n                warnings.warn(\n                    message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                    category=UserWarning,\n                )\n\n        return dataset\n\n    @staticmethod\n    def _log_dataset_records(hf_dataset: \"HFDataset\", dataset: \"Dataset\"):\n        \"\"\"This method extracts the responses from a Hugging Face dataset and returns a list of 
`Record` objects\"\"\"\n        # THIS IS REQUIRED SINCE THE NAME RESTRICTION IN ARGILLA. HUGGING FACE DATASET COLUMNS ARE CASE SENSITIVE\n        # Also, there is a logic with column names including \".responses\" and \".suggestion\" in the name.\n        columns_map = {}\n        for column in hf_dataset.column_names:\n            if \".responses\" in column or \".suggestion\" in column:\n                columns_map[column] = column.lower()\n            else:\n                columns_map[column] = dataset.settings._sanitize_settings_name(column)\n\n        hf_dataset = hf_dataset.rename_columns(columns_map)\n\n        # Identify columns that columns that contain responses\n        responses_columns = [col for col in hf_dataset.column_names if \".responses\" in col]\n        response_questions = defaultdict(dict)\n        user_ids = {}\n        for col in responses_columns:\n            question_name = col.split(\".\")[0]\n            if col.endswith(\"users\"):\n                response_questions[question_name][\"users\"] = hf_dataset[col]\n                user_ids.update({UUID(user_id): UUID(user_id) for user_id in set(sum(hf_dataset[col], []))})\n            elif col.endswith(\"responses\"):\n                response_questions[question_name][\"responses\"] = hf_dataset[col]\n            elif col.endswith(\"status\"):\n                response_questions[question_name][\"status\"] = hf_dataset[col]\n\n        # Check if all user ids are known to this Argilla client\n        known_users_ids = [user.id for user in dataset._client.users]\n        unknown_user_ids = set(user_ids.keys()) - set(known_users_ids)\n        my_user = dataset._client.me\n        if len(unknown_user_ids) > 1:\n            warnings.warn(\n                message=f\"\"\"Found unknown user ids in dataset repo: {unknown_user_ids}.\n                    Assigning first response for each record to current user ({my_user.username}) and discarding the rest.\"\"\"\n            )\n        for unknown_user_id in unknown_user_ids:\n            user_ids[unknown_user_id] = my_user.id\n\n        # Create a mapper to map the Hugging Face dataset to a Record object\n        mapping = {col: col for col in hf_dataset.column_names if \".suggestion\" in col}\n        mapper = IngestedRecordMapper(dataset=dataset, mapping=mapping, user_id=my_user.id)\n\n        # Extract responses and create Record objects\n        records = []\n        hf_dataset = HFDatasetsIO.to_argilla(hf_dataset=hf_dataset)\n        for idx, row in enumerate(hf_dataset):\n            record = mapper(row)\n            for question_name, values in response_questions.items():\n                response_values = values[\"responses\"][idx]\n                response_users = values[\"users\"][idx]\n                response_status = values[\"status\"][idx]\n                for value, user_id, status in zip(response_values, response_users, response_status):\n                    user_id = user_ids[UUID(user_id)]\n                    if user_id in response_users:\n                        continue\n                    response_users[user_id] = True\n                    response = Response(\n                        user_id=user_id,\n                        question_name=question_name,\n                        value=value,\n                        status=status,\n                    )\n                    record.responses.add(response)\n            records.append(record)\n\n        try:\n            dataset.records.log(records=records)\n        except (RecordsIngestionError, 
UnprocessableEntityError) as e:\n            raise SettingsError(\n                message=f\"Failed to load records from Hugging Face dataset. Defined settings do not match dataset schema. Hugging face dataset features: {hf_dataset.features}. Argilla dataset settings : {dataset.settings}\"\n            ) from e\n\n    @staticmethod\n    def _get_dataset_split(hf_dataset: \"HFDataset\", split: Optional[str] = None, **kwargs: Dict) -> \"HFDataset\":\n        \"\"\"Get a single dataset from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a single dataset from.\n\n        Returns:\n            HFDataset: The single dataset.\n        \"\"\"\n\n        if isinstance(hf_dataset, DatasetDict) and split is None:\n            split = next(iter(hf_dataset.keys()))\n            if len(hf_dataset.keys()) > 1:\n                warnings.warn(\n                    message=f\"Multiple splits found in Hugging Face dataset. Using the first split: {split}. \"\n                    f\"Available splits are: {', '.join(hf_dataset.keys())}.\"\n                )\n            hf_dataset = hf_dataset[split]\n        return hf_dataset\n\n    @staticmethod\n    def _get_sample_hf_record(hf_dataset: \"HFDataset\") -> Dict:\n        \"\"\"Get a sample record from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a sample record from.\n\n        Returns:\n            Dict: The sample record.\n        \"\"\"\n\n        if hf_dataset:\n            sample_huggingface_record = {}\n            for key, value in hf_dataset[0].items():\n                try:\n                    json.dumps(value)\n                    sample_huggingface_record[key] = value\n                except TypeError:\n                    if isinstance(value, Image.Image):\n                        sample_huggingface_record[key] = pil_to_data_uri(value)\n                    else:\n                        sample_huggingface_record[key] = \"Record value is not serializable\"\n            return sample_huggingface_record\n
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin.to_hub","title":"to_hub(repo_id, *, with_records=True, generate_card=True, **kwargs)","text":"

Pushes the Dataset to the Hugging Face Hub. If the dataset has been previously pushed to the Hugging Face Hub, it will be updated instead of creating a new dataset repo.

Parameters:

Name Type Description Default repo_id str

the ID of the Hugging Face Hub repo to push the Dataset to.

required with_records bool

whether to push the dataset records to the Hub. Defaults to True.

True generate_card Optional[bool]

whether to generate a dataset card for the Dataset in the Hugging Face Hub. Defaults to True.

True **kwargs Any

the kwargs to pass to datasets.Dataset.push_to_hub.

{}

Returns:

Type Description None

None

Source code in src/argilla/datasets/_io/_hub.py
def to_hub(\n    self: \"Dataset\",\n    repo_id: str,\n    *,\n    with_records: bool = True,\n    generate_card: Optional[bool] = True,\n    **kwargs: Any,\n) -> None:\n    \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n    Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n            to `True`.\n        **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n\n    Returns:\n        None\n    \"\"\"\n\n    from huggingface_hub import DatasetCardData, HfApi\n\n    from argilla.datasets._io.card import (\n        ArgillaDatasetCard,\n        size_categories_parser,\n    )\n\n    hf_api = HfApi(token=kwargs.get(\"token\"))\n\n    hfds = False\n    if with_records:\n        hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n        hfds.push_to_hub(repo_id, **kwargs)\n    else:\n        hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n    with TemporaryDirectory() as tmpdirname:\n        config_dir = os.path.join(tmpdirname)\n\n        self.to_disk(path=config_dir, with_records=False)\n\n        if generate_card:\n            sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n            sample_huggingface_record = self._get_sample_hf_record(hfds) if with_records else None\n            dataset_size = len(hfds) if with_records else 0\n            card = ArgillaDatasetCard.from_template(\n                card_data=DatasetCardData(\n                    size_categories=size_categories_parser(dataset_size),\n                    tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                ),\n                repo_id=repo_id,\n                argilla_fields=self.settings.fields,\n                argilla_questions=self.settings.questions,\n                argilla_guidelines=self.settings.guidelines or None,\n                argilla_vectors_settings=self.settings.vectors or None,\n                argilla_metadata_properties=self.settings.metadata,\n                argilla_record=sample_argilla_record.to_dict(),\n                huggingface_record=sample_huggingface_record,\n            )\n            card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n        hf_api.upload_folder(\n            folder_path=tmpdirname,\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n        )\n
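A usage sketch (the repo id and token are illustrative; the token is forwarded through **kwargs):

dataset.to_hub(\n    repo_id=\"my-org/my-argilla-dataset\",  # illustrative Hub repo id\n    with_records=True,\n    generate_card=True,\n    token=\"hf_xxx\",  # illustrative token, forwarded via **kwargs\n)\n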
"},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._io._hub.HubImportExportMixin.from_hub","title":"from_hub(repo_id, *, name=None, workspace=None, client=None, with_records=True, settings=None, split=None, subset=None, **kwargs) classmethod","text":"

Loads a Dataset from the Hugging Face Hub.

Parameters:

Name Type Description Default repo_id str

the ID of the Hugging Face Hub repo to load the Dataset from.

required name str

The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

None workspace Union[Workspace, str]

The workspace to import the dataset to. Defaults to None and default workspace is used.

None client Optional[Argilla]

the client to use to load the Dataset. If not provided, the default client will be used.

None with_records bool

whether to load the records from the Hugging Face dataset. Defaults to True.

True settings Optional[Settings]

the settings to use to load the Dataset. If not provided, the settings will be loaded from the Hugging Face dataset.

None split Optional[str]

the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.

None subset Optional[str]

the subset (configuration name) of the Hugging Face dataset to load. If not provided, the default subset is used.

None **kwargs Any

the kwargs to pass to datasets.load_dataset.

{}

Returns:

Type Description Dataset

A Dataset loaded from the Hugging Face Hub.

Source code in src/argilla/datasets/_io/_hub.py
@classmethod\ndef from_hub(\n    cls: Type[\"Dataset\"],\n    repo_id: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n    settings: Optional[\"Settings\"] = None,\n    split: Optional[str] = None,\n    subset: Optional[str] = None,\n    **kwargs: Any,\n) -> \"Dataset\":\n    \"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        settings: the settings to use to load the `Dataset`. If not provided, the settings will be loaded from the Hugging Face dataset.\n        split: the split to load from the Hugging Face dataset. If not provided, the first split will be loaded.\n        **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n    Returns:\n        A `Dataset` loaded from the Hugging Face Hub.\n    \"\"\"\n    from datasets import load_dataset\n    from huggingface_hub import snapshot_download\n    from argilla import Dataset\n\n    if name is None:\n        name = Dataset._sanitize_name(repo_id)\n\n    if settings is not None:\n        dataset = cls(name=name, settings=settings)\n        dataset.create()\n    else:\n        try:\n            # download configuration files from the hub\n            folder_path = snapshot_download(\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                token=kwargs.get(\"token\"),\n            )\n\n            dataset = cls.from_disk(\n                path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n            )\n        except ImportDatasetError:\n            from argilla import Settings\n\n            settings = Settings.from_hub(repo_id=repo_id, subset=subset)\n            dataset = cls.from_hub(\n                repo_id=repo_id,\n                name=name,\n                workspace=workspace,\n                client=client,\n                with_records=with_records,\n                settings=settings,\n                split=split,\n                subset=subset,\n                **kwargs,\n            )\n            return dataset\n\n    if with_records:\n        try:\n            hf_dataset = load_dataset(\n                path=repo_id,\n                split=split,\n                name=subset,\n                **kwargs,\n            )  # type: ignore\n            hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, split=split, **kwargs)\n            cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n        except EmptyDatasetError:\n            warnings.warn(\n                message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                category=UserWarning,\n            )\n\n    return dataset\n
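And loading back from the Hub (a sketch; the repo id and split are illustrative):

dataset = rg.Dataset.from_hub(\n    repo_id=\"my-org/my-argilla-dataset\",  # illustrative\n    split=\"train\",  # optional; the first split is used when omitted\n    with_records=True,\n)\n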
"},{"location":"reference/argilla/records/metadata/","title":"metadata","text":"

Metadata in Argilla is a dictionary that can be attached to a record. It is used to store additional information about the record that is not part of the record's fields or responses. For example, the source of the record, the date it was created, or any other information that is relevant to the record. Metadata can be added to a record directly, or as values within a dictionary when records are logged.

"},{"location":"reference/argilla/records/metadata/#usage-examples","title":"Usage Examples","text":"

To use metadata within a dataset, you must define metadata properties in the dataset settings. The metadata setting is a list of metadata properties that can be attached to a record. The following example demonstrates how to add metadata to a dataset and how to access metadata from a record object:

import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_metadata\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        metadata=[\n            rg.TermsMetadataProperty(name=\"category\", options=[\"A\", \"B\", \"C\"]),\n        ],\n    ),\n)\ndataset.create()\n

Then, you can add records to the dataset with metadata that corresponds to the metadata property defined in the dataset settings:

dataset.records.log(\n    [\n        {\"text\": \"text\", \"label\": \"positive\", \"category\": \"A\"},\n        {\"text\": \"text\", \"label\": \"negative\", \"category\": \"B\"},\n    ]\n)\n
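Once logged, the metadata can be read back from the record objects (a sketch):

for record in dataset.records(with_metadata=True):\n    print(record.metadata)\n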
"},{"location":"reference/argilla/records/metadata/#format-per-metadataproperty-type","title":"Format per MetadataProperty type","text":"

Depending on the MetadataProperty type, metadata might need to be formatted in a slightly different way.

For TermsMetadataProperty:
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": \"A\"}\n)\n\n# with multiple terms\n\nrg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": [\"A\", \"B\"]}\n)\n
For FloatMetadataProperty:
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 2.1}\n)\n
For IntegerMetadataProperty:
rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 42}\n)\n
"},{"location":"reference/argilla/records/records/","title":"rg.Record","text":"

The Record object is used to represent a single record in Argilla. It contains fields, suggestions, responses, metadata, and vectors.

"},{"location":"reference/argilla/records/records/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/records/#creating-a-record","title":"Creating a Record","text":"

To create records, you can instantiate the Record class and pass the records to the Dataset.records.log method. The Record class requires a fields parameter, which is a dictionary of field names and values. The field names must match the field names in the dataset's Settings object to be accepted.

dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n        ),\n    ]\n) # (1)\n
  1. The Argilla dataset contains a field named text matching the key here.

To create records with image fields, pass the image to the record object as either a remote URL, a local path to an image file, or a PIL object. The field names must be defined as an rg.ImageField in the dataset's Settings object to be accepted. Images will be stored in the Argilla database and returned as rescaled PIL objects.

dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"image\": \"https://example.com/image.jpg\"}, # (1)\n        ),\n    ]\n)\n
  1. The image can be referenced as either a remote URL, a local file path, or a PIL object.

Note

The image will be stored in the Argilla database and can impact the dataset's storage usage. Images should be less than 5 MB in size, and datasets should contain fewer than 10,000 images.

"},{"location":"reference/argilla/records/records/#accessing-record-attributes","title":"Accessing Record Attributes","text":"

The Record object has suggestions, responses, metadata, and vectors attributes that can be accessed directly whilst iterating over records in a dataset.

for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_metadata=True,\n    with_vectors=True\n    ):\n    print(record.suggestions)\n    print(record.responses)\n    print(record.metadata)\n    print(record.vectors)\n

Record properties can also be updated whilst iterating over records in a dataset.

for record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}\n

For changes to take effect, the user must call the update method on the Dataset object, or pass the updated records to Dataset.records.log. All core record attributes can be updated in this way. Check their respective documentation for more information: Suggestions, Responses, Metadata, Vectors.
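A sketch of that flow (the metadata key and value are illustrative):

updated_records = []\nfor record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}  # illustrative metadata\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)  # persists the changes\n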

"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record","title":"Record","text":"

Bases: Resource

The class for interacting with Argilla Records. A Record is a single sample in a dataset. Records receive feedback in the form of responses and suggestions. Records contain fields, metadata, and vectors.

Attributes:

Name Type Description id Union[str, UUID]

The id of the record.

fields RecordFields

The fields of the record.

metadata RecordMetadata

The metadata of the record.

vectors RecordVectors

The vectors of the record.

responses RecordResponses

The responses of the record.

suggestions RecordSuggestions

The suggestions of the record.

dataset Dataset

The dataset to which the record belongs.

_server_id UUID

An id for the record generated by the Argilla server.

Source code in src/argilla/records/_resource.py
class Record(Resource):\n    \"\"\"The class for interacting with Argilla Records. A `Record` is a single sample\n    in a dataset. Records receives feedback in the form of responses and suggestions.\n    Records contain fields, metadata, and vectors.\n\n    Attributes:\n        id (Union[str, UUID]): The id of the record.\n        fields (RecordFields): The fields of the record.\n        metadata (RecordMetadata): The metadata of the record.\n        vectors (RecordVectors): The vectors of the record.\n        responses (RecordResponses): The responses of the record.\n        suggestions (RecordSuggestions): The suggestions of the record.\n        dataset (Dataset): The dataset to which the record belongs.\n        _server_id (UUID): An id for the record generated by the Argilla server.\n    \"\"\"\n\n    _model: RecordModel\n\n    def __init__(\n        self,\n        id: Optional[Union[UUID, str]] = None,\n        fields: Optional[Dict[str, FieldValue]] = None,\n        metadata: Optional[Dict[str, MetadataValue]] = None,\n        vectors: Optional[Dict[str, VectorValue]] = None,\n        responses: Optional[List[Response]] = None,\n        suggestions: Optional[List[Suggestion]] = None,\n        _server_id: Optional[UUID] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ):\n        \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n        Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n        and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n        Args:\n            id: An id for the record. If not provided, a UUID will be generated.\n            fields: A dictionary of fields for the record.\n            metadata: A dictionary of metadata for the record.\n            vectors: A dictionary of vectors for the record.\n            responses: A list of Response objects for the record.\n            suggestions: A list of Suggestion objects for the record.\n            _server_id: An id for the record. 
(Read-only and set by the server)\n            _dataset: The dataset object to which the record belongs.\n        \"\"\"\n\n        if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n            raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n        if fields is None and id is None:\n            raise ValueError(\"If fields are not provided, an id must be provided.\")\n        if fields == {} and id is None:\n            raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n        self._dataset = _dataset\n        self._model = RecordModel(external_id=id, id=_server_id)\n        self.__fields = RecordFields(fields=fields, record=self)\n        self.__vectors = RecordVectors(vectors=vectors)\n        self.__metadata = RecordMetadata(metadata=metadata)\n        self.__responses = RecordResponses(responses=responses, record=self)\n        self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n\n    def __repr__(self) -> str:\n        return (\n            f\"Record(id={self.id},status={self.status},fields={self.fields},metadata={self.metadata},\"\n            f\"suggestions={self.suggestions},responses={self.responses})\"\n        )\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def id(self) -> str:\n        return self._model.external_id\n\n    @id.setter\n    def id(self, value: str) -> None:\n        self._model.external_id = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n\n    @property\n    def fields(self) -> \"RecordFields\":\n        return self.__fields\n\n    @property\n    def responses(self) -> \"RecordResponses\":\n        return self.__responses\n\n    @property\n    def suggestions(self) -> \"RecordSuggestions\":\n        return self.__suggestions\n\n    @property\n    def metadata(self) -> \"RecordMetadata\":\n        return self.__metadata\n\n    @property\n    def vectors(self) -> \"RecordVectors\":\n        return self.__vectors\n\n    @property\n    def status(self) -> str:\n        return self._model.status\n\n    @property\n    def _server_id(self) -> Optional[UUID]:\n        return self._model.id\n\n    ############################\n    # Public methods\n    ############################\n\n    def get(self) -> \"Record\":\n        \"\"\"Retrieves the record from the server.\"\"\"\n        model = self._client.api.records.get(self._server_id)\n        instance = self.from_model(model, dataset=self.dataset)\n        self.__dict__ = instance.__dict__\n\n        return self\n\n    def api_model(self) -> RecordModel:\n        return RecordModel(\n            id=self._model.id,\n            external_id=self._model.external_id,\n            fields=self.fields.to_dict(),\n            metadata=self.metadata.api_models(),\n            vectors=self.vectors.api_models(),\n            responses=self.responses.api_models(),\n            suggestions=self.suggestions.api_models(),\n            status=self.status,\n        )\n\n    def serialize(self) -> Dict[str, Any]:\n        \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n        serialized_model = self._model.model_dump()\n        serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n   
     serialized_responses = [response.serialize() for response in self.__responses]\n        serialized_model[\"responses\"] = serialized_responses\n        serialized_model[\"suggestions\"] = serialized_suggestions\n\n        return serialized_model\n\n    def to_dict(self) -> Dict[str, Dict]:\n        \"\"\"Converts a Record object to a dictionary for export.\n        Returns:\n            A dictionary representing the record where the keys are \"fields\",\n            \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n            represented as a key-value pair in the dictionary of the respective key. i.e.\n            `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n        \"\"\"\n        id = str(self.id) if self.id else None\n        server_id = str(self._model.id) if self._model.id else None\n        status = self.status\n        fields = self.fields.to_dict()\n        metadata = self.metadata.to_dict()\n        suggestions = self.suggestions.to_dict()\n        responses = self.responses.to_dict()\n        vectors = self.vectors.to_dict()\n\n        # TODO: Review model attributes when to_dict and serialize methods are unified\n        return {\n            \"id\": id,\n            \"fields\": fields,\n            \"metadata\": metadata,\n            \"suggestions\": suggestions,\n            \"responses\": responses,\n            \"vectors\": vectors,\n            \"status\": status,\n            \"_server_id\": server_id,\n        }\n\n    @classmethod\n    def from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n        \"\"\"Converts a dictionary to a Record object.\n        Args:\n            data: A dictionary representing the record.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        fields = data.get(\"fields\", {})\n        metadata = data.get(\"metadata\", {})\n        suggestions = data.get(\"suggestions\", {})\n        responses = data.get(\"responses\", {})\n        vectors = data.get(\"vectors\", {})\n        record_id = data.get(\"id\", None)\n        _server_id = data.get(\"_server_id\", None)\n\n        suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n        responses = [\n            Response(question_name=question_name, **value)\n            for question_name, _responses in responses.items()\n            for value in _responses\n        ]\n\n        return cls(\n            id=record_id,\n            fields=fields,\n            suggestions=suggestions,\n            responses=responses,\n            vectors=vectors,\n            metadata=metadata,\n            _dataset=dataset,\n            _server_id=_server_id,\n        )\n\n    @classmethod\n    def from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n        \"\"\"Converts a RecordModel object to a Record object.\n        Args:\n            model: A RecordModel object.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        instance = cls(\n            id=model.external_id,\n            fields=model.fields,\n            metadata={meta.name: meta.value for meta in model.metadata},\n            vectors={vector.name: vector.vector_values for vector in model.vectors},\n            _dataset=dataset,\n            responses=[],\n            
suggestions=[],\n        )\n\n        # set private attributes\n        instance._dataset = dataset\n        instance._model = model\n\n        # Responses and suggestions are computed separately based on the record model\n        instance.responses.from_models(model.responses)\n        instance.suggestions.from_models(model.suggestions)\n\n        return instance\n\n    @property\n    def _client(self) -> Optional[\"Argilla\"]:\n        if self._dataset:\n            return self.dataset._client\n\n    @property\n    def _api(self) -> Optional[\"RecordsAPI\"]:\n        if self._client:\n            return self._client.api.records\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.__init__","title":"__init__(id=None, fields=None, metadata=None, vectors=None, responses=None, suggestions=None, _server_id=None, _dataset=None)","text":"

Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id. Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions and passed to Dataset.DatasetRecords.add() as a list of dictionaries.

Parameters:

Name Type Description Default id Optional[Union[UUID, str]]

An id for the record. If not provided, a UUID will be generated.

None fields Optional[Dict[str, FieldValue]]

A dictionary of fields for the record.

None metadata Optional[Dict[str, MetadataValue]]

A dictionary of metadata for the record.

None vectors Optional[Dict[str, VectorValue]]

A dictionary of vectors for the record.

None responses Optional[List[Response]]

A list of Response objects for the record.

None suggestions Optional[List[Suggestion]]

A list of Suggestion objects for the record.

None _server_id Optional[UUID]

An id for the record. (Read-only and set by the server)

None _dataset Optional[Dataset]

The dataset object to which the record belongs.

None Source code in src/argilla/records/_resource.py
def __init__(\n    self,\n    id: Optional[Union[UUID, str]] = None,\n    fields: Optional[Dict[str, FieldValue]] = None,\n    metadata: Optional[Dict[str, MetadataValue]] = None,\n    vectors: Optional[Dict[str, VectorValue]] = None,\n    responses: Optional[List[Response]] = None,\n    suggestions: Optional[List[Suggestion]] = None,\n    _server_id: Optional[UUID] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n):\n    \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n    Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n    and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n    Args:\n        id: An id for the record. If not provided, a UUID will be generated.\n        fields: A dictionary of fields for the record.\n        metadata: A dictionary of metadata for the record.\n        vectors: A dictionary of vectors for the record.\n        responses: A list of Response objects for the record.\n        suggestions: A list of Suggestion objects for the record.\n        _server_id: An id for the record. (Read-only and set by the server)\n        _dataset: The dataset object to which the record belongs.\n    \"\"\"\n\n    if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n        raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n    if fields is None and id is None:\n        raise ValueError(\"If fields are not provided, an id must be provided.\")\n    if fields == {} and id is None:\n        raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n    self._dataset = _dataset\n    self._model = RecordModel(external_id=id, id=_server_id)\n    self.__fields = RecordFields(fields=fields, record=self)\n    self.__vectors = RecordVectors(vectors=vectors)\n    self.__metadata = RecordMetadata(metadata=metadata)\n    self.__responses = RecordResponses(responses=responses, record=self)\n    self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n
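
For orientation, a minimal sketch of constructing a Record directly; the field and metadata names here are hypothetical and assume matching dataset settings:

record = rg.Record(\n    fields={\"text\": \"Hello World, how are you?\"},  # field names must match the dataset settings\n    metadata={\"source\": \"demo\"},  # hypothetical metadata key\n)\n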
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.get","title":"get()","text":"

Retrieves the record from the server.

Source code in src/argilla/records/_resource.py
def get(self) -> \"Record\":\n    \"\"\"Retrieves the record from the server.\"\"\"\n    model = self._client.api.records.get(self._server_id)\n    instance = self.from_model(model, dataset=self.dataset)\n    self.__dict__ = instance.__dict__\n\n    return self\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.serialize","title":"serialize()","text":"

Serializes the Record to a dictionary for interaction with the API

Source code in src/argilla/records/_resource.py
def serialize(self) -> Dict[str, Any]:\n    \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n    serialized_model = self._model.model_dump()\n    serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n    serialized_responses = [response.serialize() for response in self.__responses]\n    serialized_model[\"responses\"] = serialized_responses\n    serialized_model[\"suggestions\"] = serialized_suggestions\n\n    return serialized_model\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.to_dict","title":"to_dict()","text":"

Converts a Record object to a dictionary for export.

Returns:

A dictionary representing the record where the keys are \"fields\", \"metadata\", \"suggestions\", and \"responses\". Each field and question is represented as a key-value pair in the dictionary of the respective key, i.e. `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"}}`.

Source code in src/argilla/records/_resource.py
def to_dict(self) -> Dict[str, Dict]:\n    \"\"\"Converts a Record object to a dictionary for export.\n    Returns:\n        A dictionary representing the record where the keys are \"fields\",\n        \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n        represented as a key-value pair in the dictionary of the respective key. i.e.\n        `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n    \"\"\"\n    id = str(self.id) if self.id else None\n    server_id = str(self._model.id) if self._model.id else None\n    status = self.status\n    fields = self.fields.to_dict()\n    metadata = self.metadata.to_dict()\n    suggestions = self.suggestions.to_dict()\n    responses = self.responses.to_dict()\n    vectors = self.vectors.to_dict()\n\n    # TODO: Review model attributes when to_dict and serialize methods are unified\n    return {\n        \"id\": id,\n        \"fields\": fields,\n        \"metadata\": metadata,\n        \"suggestions\": suggestions,\n        \"responses\": responses,\n        \"vectors\": vectors,\n        \"status\": status,\n        \"_server_id\": server_id,\n    }\n
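
As a sketch, to_dict pairs with from_dict (documented below) for a simple export/import round trip, assuming record and dataset objects already exist:

exported = record.to_dict()  # plain dictionary, safe to store as JSON\nrestored = rg.Record.from_dict(exported, dataset=dataset)\n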
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_dict","title":"from_dict(data, dataset=None) classmethod","text":"

Converts a dictionary to a Record object.

Args:

data: A dictionary representing the record.
dataset: The dataset object to which the record belongs.

Returns:

A Record object.

Source code in src/argilla/records/_resource.py
@classmethod\ndef from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n    \"\"\"Converts a dictionary to a Record object.\n    Args:\n        data: A dictionary representing the record.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    fields = data.get(\"fields\", {})\n    metadata = data.get(\"metadata\", {})\n    suggestions = data.get(\"suggestions\", {})\n    responses = data.get(\"responses\", {})\n    vectors = data.get(\"vectors\", {})\n    record_id = data.get(\"id\", None)\n    _server_id = data.get(\"_server_id\", None)\n\n    suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n    responses = [\n        Response(question_name=question_name, **value)\n        for question_name, _responses in responses.items()\n        for value in _responses\n    ]\n\n    return cls(\n        id=record_id,\n        fields=fields,\n        suggestions=suggestions,\n        responses=responses,\n        vectors=vectors,\n        metadata=metadata,\n        _dataset=dataset,\n        _server_id=_server_id,\n    )\n
"},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_model","title":"from_model(model, dataset) classmethod","text":"

Converts a RecordModel object to a Record object.

Args:

model: A RecordModel object.
dataset: The dataset object to which the record belongs.

Returns:

A Record object.

Source code in src/argilla/records/_resource.py
@classmethod\ndef from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n    \"\"\"Converts a RecordModel object to a Record object.\n    Args:\n        model: A RecordModel object.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    instance = cls(\n        id=model.external_id,\n        fields=model.fields,\n        metadata={meta.name: meta.value for meta in model.metadata},\n        vectors={vector.name: vector.vector_values for vector in model.vectors},\n        _dataset=dataset,\n        responses=[],\n        suggestions=[],\n    )\n\n    # set private attributes\n    instance._dataset = dataset\n    instance._model = model\n\n    # Responses and suggestions are computed separately based on the record model\n    instance.responses.from_models(model.responses)\n    instance.suggestions.from_models(model.suggestions)\n\n    return instance\n
"},{"location":"reference/argilla/records/responses/","title":"rg.Response","text":"

Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

"},{"location":"reference/argilla/records/responses/#usage-examples","title":"Usage Examples","text":"

Responses can be added to an instantiated Record directly or as a dictionary. The following examples demonstrate how to add responses to a record object and how to access responses from a record object:

Instantiate the Record and related Response objects:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            responses=[rg.Response(\"label\", \"negative\", user_id=user.id)],\n            id=str(uuid.uuid4()),\n        )\n    ]\n)\n

Or, add a response from a dictionary where key is the question name and value is the response:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label.response\": \"negative\",\n        },\n    ]\n)\n

Responses can be accessed from a Record via their question name. So if a question is named label, the responses can be accessed as record.responses[\"label\"]. The following example demonstrates how to access responses from a record object:

# iterate over the records and responses\n\nfor record in dataset.records:\n    for response in record.responses[\"label\"]: # (1)\n        print(response.value)\n        print(response.user_id)\n\n# validate that the record has a response\n\nfor record in dataset.records:\n    if record.responses[\"label\"]:\n        for response in record.responses[\"label\"]:\n            print(response.value)\n            print(response.user_id)\n    else:\n        record.responses.add(\n            rg.Response(\"label\", \"positive\", user_id=user.id)\n        ) # (2)\n
1. Access the responses for the question named label for each record like a dictionary containing a list of Response objects.
2. Add a response to the record if it does not already have one.

"},{"location":"reference/argilla/records/responses/#format-per-question-type","title":"Format per Question type","text":"

Depending on the Question type, responses might need to be formatted in a slightly different way.

For LabelQuestion | For MultiLabelQuestion | For RankingQuestion | For RatingQuestion | For SpanQuestion | For TextQuestion
rg.Response(\n    question_name=\"label\",\n    value=\"positive\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"rating\",\n    value=4,\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    user_id=user.id,\n    status=\"draft\"\n)\n
rg.Response(\n    question_name=\"text\",\n    value=\"value\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response","title":"Response","text":"

Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

Source code in src/argilla/responses.py
class Response:\n    \"\"\"Class for interacting with Argilla Responses of records. Responses are answers to questions by a user.\n    Therefore, a record question can have multiple responses, one for each user that has answered the question.\n    A `Response` is typically created by a user in the UI or consumed from a data source as a label,\n    unlike a `Suggestion` which is typically created by a model prediction.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        user_id: UUID,\n        status: Optional[Union[ResponseStatus, str]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n        Attributes:\n            question_name (str): The name of the question that the suggestion is for.\n            value (str): The value of the response\n            user_id (UUID): The id of the user that submits the response\n            status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n        \"\"\"\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n        if user_id is None:\n            raise ValueError(\"user_id is required\")\n\n        if isinstance(status, str):\n            status = ResponseStatus(status)\n\n        self._record = _record\n        self.question_name = question_name\n        self.value = value\n        self.user_id = user_id\n        self.status = status\n\n    @property\n    def record(self) -> \"Record\":\n        \"\"\"Returns the record associated with the response\"\"\"\n        return self._record\n\n    @record.setter\n    def record(self, record: \"Record\") -> None:\n        \"\"\"Sets the record associated with the response\"\"\"\n        self._record = record\n\n    def serialize(self) -> dict[str, Any]:\n        \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n            but can be used for data wrangling or manual export.\n\n        Returns:\n            dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n        Examples:\n\n        ```python\n        response = rg.Response(\"label\", \"negative\", user_id=user.id)\n        response.serialize()\n        ```\n        \"\"\"\n        return {\n            \"question_name\": self.question_name,\n            \"value\": self.value,\n            \"user_id\": self.user_id,\n            \"status\": self.status,\n        }\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.record","title":"record: Record property writable","text":"

Returns the record associated with the response

"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.__init__","title":"__init__(question_name, value, user_id, status=None, _record=None)","text":"

Initializes a Response for a Record with a user_id and value

Attributes:

Name Type Description question_name str

The name of the question that the response is for.

value str

The value of the response

user_id UUID

The id of the user that submits the response

status Union[ResponseStatus, str]

The status of the response as \"draft\", \"submitted\", \"discarded\".

Source code in src/argilla/responses.py
def __init__(\n    self,\n    question_name: str,\n    value: Any,\n    user_id: UUID,\n    status: Optional[Union[ResponseStatus, str]] = None,\n    _record: Optional[\"Record\"] = None,\n) -> None:\n    \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the response\n        user_id (UUID): The id of the user that submits the response\n        status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n    \"\"\"\n\n    if question_name is None:\n        raise ValueError(\"question_name is required\")\n    if value is None:\n        raise ValueError(\"value is required\")\n    if user_id is None:\n        raise ValueError(\"user_id is required\")\n\n    if isinstance(status, str):\n        status = ResponseStatus(status)\n\n    self._record = _record\n    self.question_name = question_name\n    self.value = value\n    self.user_id = user_id\n    self.status = status\n
"},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.serialize","title":"serialize()","text":"

Serializes the Response to a dictionary. This is principally used for sending the response to the API, but can be used for data wrangling or manual export.

Returns:

Type Description dict[str, Any]

dict[str, Any]: The serialized response as a dictionary with keys question_name, value, user_id, and status.

Examples:

response = rg.Response(\"label\", \"negative\", user_id=user.id)\nresponse.serialize()\n
Source code in src/argilla/responses.py
def serialize(self) -> dict[str, Any]:\n    \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n        but can be used for data wrangling or manual export.\n\n    Returns:\n        dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n    Examples:\n\n    ```python\n    response = rg.Response(\"label\", \"negative\", user_id=user.id)\n    response.serialize()\n    ```\n    \"\"\"\n    return {\n        \"question_name\": self.question_name,\n        \"value\": self.value,\n        \"user_id\": self.user_id,\n        \"status\": self.status,\n    }\n
"},{"location":"reference/argilla/records/suggestions/","title":"rg.Suggestion","text":"

Class for interacting with Argilla Suggestions of records. Suggestions are typically created by a model prediction, unlike a Response which is typically created by a user in the UI or consumed from a data source as a label.

"},{"location":"reference/argilla/records/suggestions/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/suggestions/#adding-records-with-suggestions","title":"Adding records with suggestions","text":"

Suggestions can be added to a record directly or via a dictionary structure. The following examples demonstrate how to add suggestions to a record object and how to access suggestions from a record object:

Add a response from a dictionary where key is the question name and value is the response:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label\": \"negative\", # this will be used as a suggestion\n        },\n    ]\n)\n

If your data contains scores for suggestions, you can add them as well via the mapping parameter. The following example demonstrates how to add a suggestion with a score to a record object:

dataset.records.log(\n    [\n        {\n            \"prompt\": \"Hello World, how are you?\",\n            \"label\": \"negative\",  # this will be used as a suggestion\n            \"score\": 0.9,  # this will be used as the suggestion score\n            \"model\": \"model_name\",  # this will be used as the suggestion agent\n        },\n    ],\n    mapping={\n        \"score\": \"label.suggestion.score\",\n        \"model\": \"label.suggestion.agent\",\n    },  # `label` is the question name in the dataset settings\n)\n

Or, instantiate the Record and related Suggestions objects directly, like this:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            suggestions=[rg.Suggestion(\"label\", \"negative\", score=0.9, agent=\"model_name\")],\n        )\n    ]\n)\n
"},{"location":"reference/argilla/records/suggestions/#iterating-over-records-with-suggestions","title":"Iterating over records with suggestions","text":"

Just like responses, suggestions can be accessed from a Record via their question name. So if a question is named label, the suggestion can be accessed as record.suggestions[\"label\"]. The following example demonstrates how to access suggestions from a record object:

for record in dataset.records(with_suggestions=True):\n    print(record.suggestions[\"label\"].value)\n
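
The other Suggestion properties documented below, such as score and agent, are available on the same object; a minimal sketch:

for record in dataset.records(with_suggestions=True):\n    suggestion = record.suggestions[\"label\"]\n    print(suggestion.value, suggestion.score, suggestion.agent)\n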

We can also add suggestions to records as we iterate over them using the add method:

for record in dataset.records(with_suggestions=True):\n    if not record.suggestions[\"label\"]: # (1)\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"positive\", score=0.9, agent=\"model_name\")\n        ) # (2)\n
  1. Validate that the record has a suggestion
  2. Add a suggestion to the record if it does not already have one
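
Note that adding a suggestion while iterating only modifies the local Record objects. A hedged sketch for persisting the changes, assuming that re-logging records updates them on the server:

updated_records = []\nfor record in dataset.records(with_suggestions=True):\n    if not record.suggestions[\"label\"]:\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"positive\", score=0.9, agent=\"model_name\")\n        )\n        updated_records.append(record)\n\ndataset.records.log(updated_records)  # assumption: logging existing records upserts them\n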
"},{"location":"reference/argilla/records/suggestions/#format-per-question-type","title":"Format per Question type","text":"

Depending on the Question type, suggestions might need to be formatted in a slightly different way.

For LabelQuestion | For MultiLabelQuestion | For RankingQuestion | For RatingQuestion | For SpanQuestion | For TextQuestion
rg.Suggestion(\n    question_name=\"label\",\n    value=\"positive\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"rating\",\n    value=4,\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    score=0.9,\n    agent=\"model_name\"\n)\n
rg.Suggestion(\n    question_name=\"text\",\n    value=\"value\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion","title":"Suggestion","text":"

Bases: Resource

Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records. Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.

Attributes:

Name Type Description question_name str

The name of the question that the suggestion is for.

value str

The value of the suggestion

score float

The score of the suggestion. For example, the probability of the model prediction.

agent str

The agent that created the suggestion. For example, the model name.

type str

The type of suggestion, either 'model' or 'human'.

Source code in src/argilla/suggestions.py
class Suggestion(Resource):\n    \"\"\"Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records.\n    Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the suggestion\n        score (float): The score of the suggestion. For example, the probability of the model prediction.\n        agent (str): The agent that created the suggestion. For example, the model name.\n        type (str): The type of suggestion, either 'model' or 'human'.\n    \"\"\"\n\n    _model: SuggestionModel\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        score: Union[float, List[float], None] = None,\n        agent: Optional[str] = None,\n        type: Optional[Literal[\"model\", \"human\"]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        super().__init__()\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n\n        self._record = _record\n        self._model = SuggestionModel(\n            question_name=question_name,\n            value=value,\n            type=type,\n            score=score,\n            agent=agent,\n        )\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def value(self) -> Any:\n        \"\"\"The value of the suggestion.\"\"\"\n        return self._model.value\n\n    @property\n    def question_name(self) -> Optional[str]:\n        \"\"\"The name of the question that the suggestion is for.\"\"\"\n        return self._model.question_name\n\n    @question_name.setter\n    def question_name(self, value: str) -> None:\n        self._model.question_name = value\n\n    @property\n    def type(self) -> Optional[Literal[\"model\", \"human\"]]:\n        \"\"\"The type of suggestion, either 'model' or 'human'.\"\"\"\n        return self._model.type\n\n    @property\n    def score(self) -> Optional[Union[float, List[float]]]:\n        \"\"\"The score of the suggestion.\"\"\"\n        return self._model.score\n\n    @score.setter\n    def score(self, value: float) -> None:\n        self._model.score = value\n\n    @property\n    def agent(self) -> Optional[str]:\n        \"\"\"The agent that created the suggestion.\"\"\"\n        return self._model.agent\n\n    @agent.setter\n    def agent(self, value: str) -> None:\n        self._model.agent = value\n\n    @property\n    def record(self) -> Optional[\"Record\"]:\n        \"\"\"The record that the suggestion is for.\"\"\"\n        return self._record\n\n    @record.setter\n    def record(self, value: \"Record\") -> None:\n        self._record = value\n\n    @classmethod\n    def from_model(cls, model: SuggestionModel, record: \"Record\") -> \"Suggestion\":\n        question = record.dataset.settings.questions[model.question_id]\n        model.question_name = question.name\n        model.value = cls.__from_model_value(model.value, question)\n\n        instance = cls(question.name, model.value, _record=record)\n        instance._model = model\n\n        return instance\n\n    def api_model(self) -> SuggestionModel:\n        if self.record is None or self.record.dataset is None:\n            return self._model\n\n        question = 
self.record.dataset.settings.questions[self.question_name]\n        if question:\n            return SuggestionModel(\n                value=self.__to_model_value(self.value, question),\n                question_name=None if not question else question.name,\n                question_id=None if not question else question.id,\n                type=self._model.type,\n                score=self._model.score,\n                agent=self._model.agent,\n                id=self._model.id,\n            )\n        else:\n            raise RecordSuggestionsError(\n                f\"Record suggestion is invalid because question with name={self.question_name} does not exist in the dataset ({self.record.dataset.name}). Available questions are: {list(self.record.dataset.settings.questions._properties_by_name.keys())}\"\n            )\n\n    @classmethod\n    def __to_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_to_model_value(value)\n        return value\n\n    @classmethod\n    def __from_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_from_model_value(value)\n        return value\n\n    @classmethod\n    def __ranking_from_model_value(cls, value: List[Dict[str, Any]]) -> List[str]:\n        return [v[\"value\"] for v in value]\n\n    @classmethod\n    def __ranking_to_model_value(cls, value: List[str]) -> List[Dict[str, str]]:\n        return [{\"value\": str(v)} for v in value]\n
"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.value","title":"value: Any property","text":"

The value of the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.question_name","title":"question_name: Optional[str] property writable","text":"

The name of the question that the suggestion is for.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.type","title":"type: Optional[Literal['model', 'human']] property","text":"

The type of suggestion, either 'model' or 'human'.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.score","title":"score: Optional[Union[float, List[float]]] property writable","text":"

The score of the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.agent","title":"agent: Optional[str] property writable","text":"

The agent that created the suggestion.

"},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.record","title":"record: Optional[Record] property writable","text":"

The record that the suggestion is for.

"},{"location":"reference/argilla/records/vectors/","title":"rg.Vector","text":"

A vector is a numerical representation of a Record field or attribute, usually the record's text. Vectors can be used to search for similar records via the UI or SDK. Vectors can be added to a record directly or as a dictionary with a key that matches the rg.VectorField name.

"},{"location":"reference/argilla/records/vectors/#usage-examples","title":"Usage Examples","text":"

To use vectors within a dataset, you must define vector fields in the dataset settings. The vectors setting is a list of vector fields that can be attached to a record. The following example demonstrates how to add vectors to a dataset and how to access vectors from a record object:

import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_vectors\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        vectors=[\n            rg.VectorField(name=\"vector_name\"),\n        ],\n    ),\n)\ndataset.create()\n

Then, you can add records to the dataset with vectors that correspond to the vector field defined in the dataset settings:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"vector_name\": [0.1, 0.2, 0.3]\n        }\n    ]\n)\n

Vectors can be passed using a mapping, where the key is the key in the data source and the value is the name of the rg.VectorField object in the dataset's settings. For example, the following code adds a record with a vector using a mapping:

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"x\": [0.1, 0.2, 0.3]\n        }\n    ],\n    mapping={\"x\": \"vector_name\"}\n)\n

Or, vectors can be instantiated and added to a record directly, like this:

dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            vectors=[rg.Vector(\"vector_name\", [0.1, 0.2, 0.3])],\n        )\n    ]\n)\n
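
To read vectors back while iterating, a hedged sketch, assuming dataset.records accepts a with_vectors flag analogous to the with_suggestions flag used elsewhere in this reference:

for record in dataset.records(with_vectors=True):  # with_vectors is an assumption\n    print(record.vectors[\"vector_name\"])\n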
"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector","title":"Vector","text":"

Bases: Resource

Class for interacting with Argilla Vectors. Vectors are typically used to represent embeddings or features of records. The Vector class is used to deliver vectors to the Argilla server.

Attributes:

Name Type Description name str

The name of the vector.

values list[float]

The values of the vector.

Source code in src/argilla/vectors.py
class Vector(Resource):\n    \"\"\" Class for interacting with Argilla Vectors. Vectors are typically used to represent \\\n        embeddings or features of records. The `Vector` class is used to deliver vectors to the Argilla server.\n\n    Attributes:\n        name (str): The name of the vector.\n        values (list[float]): The values of the vector.\n    \"\"\"\n\n    _model: VectorModel\n\n    def __init__(\n        self,\n        name: str,\n        values: list[float],\n    ) -> None:\n        \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n        Parameters:\n            name (str): Name of the vector\n            values (list[float]): List of float values\n\n        \"\"\"\n        self._model = VectorModel(\n            name=name,\n            vector_values=values,\n        )\n\n    def __repr__(self) -> str:\n        return repr(f\"{self.__class__.__name__}({self._model})\")\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def name(self) -> str:\n        \"\"\"Name of the vector that corresponds to the name of the vector in the dataset's `Settings`\"\"\"\n        return self._model.name\n\n    @property\n    def values(self) -> list[float]:\n        \"\"\"List of float values that represent the vector.\"\"\"\n        return self._model.vector_values\n\n    ##############################\n    # Methods\n    ##############################\n\n    @classmethod\n    def from_model(cls, model: VectorModel) -> \"Vector\":\n        return cls(\n            name=model.name,\n            values=model.vector_values,\n        )\n\n    def serialize(self) -> dict[str, Any]:\n        dumped_model = self._model.model_dump()\n        name = dumped_model.pop(\"name\")\n        values = dumped_model.pop(\"vector_values\")\n        return {name: values}\n
"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.name","title":"name: str property","text":"

Name of the vector that corresponds to the name of the vector in the dataset's Settings

"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.values","title":"values: list[float] property","text":"

List of float values that represent the vector.

"},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.__init__","title":"__init__(name, values)","text":"

Initializes a Vector with a name and values that can be used to search in the Argilla UI.

Parameters:

Name Type Description Default name str

Name of the vector

required values list[float]

List of float values

required Source code in src/argilla/vectors.py
def __init__(\n    self,\n    name: str,\n    values: list[float],\n) -> None:\n    \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n    Parameters:\n        name (str): Name of the vector\n        values (list[float]): List of float values\n\n    \"\"\"\n    self._model = VectorModel(\n        name=name,\n        vector_values=values,\n    )\n
"},{"location":"reference/argilla/settings/fields/","title":"Fields","text":"

Fields in Argilla define the content of a record that will be reviewed by a user.

"},{"location":"reference/argilla/settings/fields/#usage-examples","title":"Usage Examples","text":"

To define a field, instantiate the different field classes and pass them to the fields parameter of the Settings class.

text_field = rg.TextField(name=\"text\")\nmarkdown_field = rg.TextField(name=\"markdown\", use_markdown=True)\nimage_field = rg.ImageField(name=\"image\")\n

The fields parameter of the Settings class can accept a list of fields, like this:

settings = rg.Settings(\n    fields=[\n        text_field,\n        markdown_field,\n        image_field,\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n)\n\ndata = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n
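
For instance, a hedged sketch of logging a record with values for these fields, assuming the dataset has been created with data.create() and that image fields accept a URL (the URL below is hypothetical):

data.create()\n\ndata.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"markdown\": \"**Hello World**, how are you?\",\n            \"image\": \"https://example.com/image.png\",  # hypothetical URL\n        },\n    ]\n)\n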

To add records with values for fields, refer to the rg.Dataset.records documentation.

"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField","title":"TextField","text":"

Bases: AbstractField

Text field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class TextField(AbstractField):\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Text field for use in Argilla `Dataset` `Settings`\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=TextFieldSettings(use_markdown=use_markdown),\n            _client=client,\n        )\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField.__init__","title":"__init__(name, title=None, use_markdown=False, required=True, description=None, client=None)","text":"

Text field for use in Argilla Dataset Settings

Parameters:

name (str): The name of the field
title (Optional[str], optional): The title of the field. Defaults to None.
use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.
required (bool): Whether the field is required. Defaults to True.
description (Optional[str], optional): The description of the field. Defaults to None.

Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to False.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=TextFieldSettings(use_markdown=use_markdown),\n        _client=client,\n    )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ImageField","title":"ImageField","text":"

Bases: AbstractField

Image field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class ImageField(AbstractField):\n    \"\"\"Image field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        required: Optional[bool] = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Text field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            required (Optional[bool], optional): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=ImageFieldSettings(),\n            _client=_client,\n        )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ImageField.__init__","title":"__init__(name, title=None, required=True, description=None, _client=None)","text":"

Image field for use in Argilla Dataset Settings

Parameters:

Name Type Description Default name str

The name of the field

required title Optional[str]

The title of the field. Defaults to None.

None required Optional[bool]

Whether the field is required. Defaults to True.

True description Optional[str]

The description of the field. Defaults to None.

None Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    required: Optional[bool] = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Text field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        required (Optional[bool], optional): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=ImageFieldSettings(),\n        _client=_client,\n    )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ChatField","title":"ChatField","text":"

Bases: AbstractField

Chat field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class ChatField(AbstractField):\n    \"\"\"Chat field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = True,\n        required: bool = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Chat field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to True.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=ChatFieldSettings(use_markdown=use_markdown),\n            _client=_client,\n        )\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.ChatField.__init__","title":"__init__(name, title=None, use_markdown=True, required=True, description=None, _client=None)","text":"

Chat field for use in Argilla Dataset Settings

Parameters:

Name Type Description Default name str

The name of the field

required title Optional[str]

The title of the field. Defaults to None.

None use_markdown Optional[bool]

Whether to use markdown. Defaults to True.

True required bool

Whether the field is required. Defaults to True.

True description Optional[str]

The description of the field. Defaults to None.

None Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = True,\n    required: bool = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Chat field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        use_markdown (Optional[bool], optional): Whether to use markdown. Defaults to True.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=ChatFieldSettings(use_markdown=use_markdown),\n        _client=_client,\n    )\n
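
A short usage sketch for ChatField, assuming a dataset whose settings include the field; the list-of-messages value format shown here is an assumption, not taken from this reference:

chat_field = rg.ChatField(name=\"chat\")\n\ndataset.records.log(\n    [\n        {\n            \"chat\": [  # assumed message schema with role/content keys\n                {\"role\": \"user\", \"content\": \"Hello World, how are you?\"},\n                {\"role\": \"assistant\", \"content\": \"I'm fine, thank you!\"},\n            ],\n        },\n    ]\n)\n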
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.CustomField","title":"CustomField","text":"

Bases: AbstractField

Custom field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_field.py
class CustomField(AbstractField):\n    \"\"\"Custom field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        template: Optional[str] = \"\",\n        advanced_mode: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        _client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"\n        Custom field for use in Argilla `Dataset` `Settings` for working with custom HTML and CSS templates.\n        By default argilla will use a brackets syntax engine for the templates, which converts\n        `{{ field.key }}` to the values of record's field's object.\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str], optional): The title of the field. Defaults to None.\n            template (str): The template of the field (HTML and CSS)\n            advanced_mode (Optional[bool], optional): Whether to use advanced mode. Defaults to False.\n                Deactivate the brackets syntax engine and use custom javascript to render the field.\n            required (Optional[bool], optional): Whether the field is required. Defaults to True.\n            required (bool): Whether the field is required. Defaults to True.\n            description (Optional[str], optional): The description of the field. Defaults to None.\n        \"\"\"\n        template = self._load_template(template)\n        super().__init__(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=CustomFieldSettings(template=template, advanced_mode=advanced_mode),\n            _client=_client,\n        )\n\n    @property\n    def template(self) -> Optional[str]:\n        return self._model.settings.template\n\n    @template.setter\n    def template(self, value: str) -> None:\n        self._model.settings.template = self._load_template(value)\n\n    @property\n    def advanced_mode(self) -> Optional[bool]:\n        return self._model.settings.advanced_mode\n\n    @advanced_mode.setter\n    def advanced_mode(self, value: bool) -> None:\n        self._model.settings.advanced_mode = value\n\n    def validate(self):\n        if self.template is None or self.template.strip() == \"\":\n            raise SettingsError(\"A valid template is required for CustomField\")\n\n    @classmethod\n    def _load_template(cls, template: str) -> str:\n        if template.endswith(\".html\") and os.path.exists(template):\n            with open(template, \"r\") as f:\n                return f.read()\n        if template.startswith(\"http\") or template.startswith(\"https\"):\n            return requests.get(template).text\n        if isinstance(template, str):\n            return template\n        raise ArgillaError(\n            \"Invalid template. Please provide 1: a valid path or URL to a HTML file. 2: a valid HTML string.\"\n        )\n
"},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.CustomField.__init__","title":"__init__(name, title=None, template='', advanced_mode=False, required=True, description=None, _client=None)","text":"

Custom field for use in Argilla Dataset Settings for working with custom HTML and CSS templates. By default, Argilla will use a brackets syntax engine for the templates, which converts {{ field.key }} into the corresponding values of the record's fields.

Parameters:

Name Type Description Default name str

The name of the field

required title Optional[str]

The title of the field. Defaults to None.

None template str

The template of the field (HTML and CSS)

'' advanced_mode Optional[bool]

Whether to use advanced mode. Defaults to False. Deactivates the brackets syntax engine so that custom JavaScript can be used to render the field.

False required bool

Whether the field is required. Defaults to True.

True description Optional[str]

The description of the field. Defaults to None.

None Source code in src/argilla/settings/_field.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    template: Optional[str] = \"\",\n    advanced_mode: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    _client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"\n    Custom field for use in Argilla `Dataset` `Settings` for working with custom HTML and CSS templates.\n    By default argilla will use a brackets syntax engine for the templates, which converts\n    `{{ field.key }}` to the values of record's field's object.\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str], optional): The title of the field. Defaults to None.\n        template (str): The template of the field (HTML and CSS)\n        advanced_mode (Optional[bool], optional): Whether to use advanced mode. Defaults to False.\n            Deactivate the brackets syntax engine and use custom javascript to render the field.\n        required (Optional[bool], optional): Whether the field is required. Defaults to True.\n        required (bool): Whether the field is required. Defaults to True.\n        description (Optional[str], optional): The description of the field. Defaults to None.\n    \"\"\"\n    template = self._load_template(template)\n    super().__init__(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=CustomFieldSettings(template=template, advanced_mode=advanced_mode),\n        _client=_client,\n    )\n
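
A minimal sketch of the default (non-advanced) mode, following the {{ field.key }} placeholder pattern described above; the field name and key are hypothetical:

custom_field = rg.CustomField(\n    name=\"profile\",  # hypothetical field name\n    template=\"<div>{{ profile.name }}</div>\",  # hypothetical key on the field's value object\n)\n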
"},{"location":"reference/argilla/settings/metadata_property/","title":"Metadata Properties","text":"

Metadata properties are used to define metadata fields in a dataset. Metadata fields are used to store additional information about the records in the dataset. For example, the category of a record, the price of a product, or any other information that is relevant to the record.

"},{"location":"reference/argilla/settings/metadata_property/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/metadata_property/#defining-metadata-property-for-a-dataset","title":"Defining Metadata Property for a dataset","text":"

We define metadata properties via type-specific classes. The following example demonstrates how to define metadata properties as either a float, integer, or terms metadata property and pass them to the Settings.

TermsMetadataProperty is used to define a metadata field with a list of options. For example, a color field with options red, blue, and green. FloatMetadataProperty and IntegerMetadataProperty are used to define metadata fields with float or integer values. For example, a price field with a minimum value of 0.0 and a maximum value of 100.0.

metadata_field = rg.TermsMetadataProperty(\n    name=\"color\",\n    options=[\"red\", \"blue\", \"green\"],\n    title=\"Color\",\n)\n\nfloat_metadata_field = rg.FloatMetadataProperty(\n    name=\"price\",\n    min=0.0,\n    max=100.0,\n    title=\"Price\",\n)\n\nint_metadata_field = rg.IntegerMetadataProperty(\n    name=\"quantity\",\n    min=0,\n    max=100,\n    title=\"Quantity\",\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n        metadata=[\n            metadata_field,\n            float_metadata_field,\n            int_metadata_field,\n        ],\n    ),\n)\n
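
As a sketch, records with values for these metadata properties can then be logged as flat dictionaries, following the pattern used elsewhere in this reference (after creating the dataset with dataset.create()):

dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"color\": \"red\",\n            \"price\": 19.99,\n            \"quantity\": 3,\n        },\n    ]\n)\n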

To add records with metadata, refer to the rg.Metadata class documentation.

"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty","title":"FloatMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class FloatMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[float] = None,\n        max: Optional[float] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with float settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n\n        super().__init__(client=client)\n\n        try:\n            settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.float,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"FloatMetadataProperty\":\n        instance = FloatMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with float settings.

Parameters:

name (str, required): The name of the metadata field.
min (Optional[float], default None): The minimum valid value. If none is provided, it will be computed from the values provided in the records.
max (Optional[float], default None): The maximum valid value. If none is provided, it will be computed from the values provided in the records.
title (Optional[str], default None): The title of the metadata to be shown in the UI.
visible_for_annotators (Optional[bool], default True): Whether the metadata field is visible for annotators.

Raises:

MetadataError: If an error occurs while defining metadata settings.

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    min: Optional[float] = None,\n    max: Optional[float] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with float settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n\n    super().__init__(client=client)\n\n    try:\n        settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.float,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
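
Since min and max are exposed as property setters (see the class source above), the bounds can also be adjusted after instantiation; a minimal sketch with illustrative values:

price = rg.FloatMetadataProperty(name=\"price\")  # unset bounds are computed from record values\nprice.min = 0.0  # tighten the lower bound via the property setter\nprice.max = 500.0  # tighten the upper bound\n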
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty","title":"IntegerMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class IntegerMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[int] = None,\n        max: Optional[int] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with integer settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.integer,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"IntegerMetadataProperty\":\n        instance = IntegerMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with integer settings.

Parameters:

name (str, required): The name of the metadata field.
min (Optional[int], default None): The minimum valid value. If none is provided, it will be computed from the values provided in the records.
max (Optional[int], default None): The maximum valid value. If none is provided, it will be computed from the values provided in the records.
title (Optional[str], default None): The title of the metadata to be shown in the UI.
visible_for_annotators (Optional[bool], default True): Whether the metadata field is visible for annotators.

Raises:

MetadataError: If an error occurs while defining metadata settings.

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    min: Optional[int] = None,\n    max: Optional[int] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with integer settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.integer,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty","title":"TermsMetadataProperty","text":"

Bases: MetadataPropertyBase

Source code in src/argilla/settings/_metadata.py
class TermsMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        options: Optional[List[str]] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with terms settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            options (Optional[List[str]]): The list of options\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.terms,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def options(self) -> Optional[List[str]]:\n        return self._model.settings.values\n\n    @options.setter\n    def options(self, value: list[str]) -> None:\n        self._model.settings.values = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"TermsMetadataProperty\":\n        instance = TermsMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
"},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty.__init__","title":"__init__(name, options=None, title=None, visible_for_annotators=True, client=None)","text":"

Create a metadata field with terms settings.

Parameters:

name (str, required): The name of the metadata field.
options (Optional[List[str]], default None): The list of options.
title (Optional[str], default None): The title of the metadata to be shown in the UI.
visible_for_annotators (Optional[bool], default True): Whether the metadata field is visible for annotators.

Raises:

MetadataError: If an error occurs while defining metadata settings.

Source code in src/argilla/settings/_metadata.py
def __init__(\n    self,\n    name: str,\n    options: Optional[List[str]] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with terms settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        options (Optional[List[str]]): The list of options\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.terms,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
"},{"location":"reference/argilla/settings/questions/","title":"Questions","text":"

Argilla uses questions to gather feedback. Questions are answered by users or models.

"},{"location":"reference/argilla/settings/questions/#usage-examples","title":"Usage Examples","text":"

To define a label question, for example, instantiate the LabelQuestion class and pass it to the Settings class.

label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n    ],\n)\n

Questions can be combined to match the type of feedback you want to collect. For example, you can combine a label question with a text question to collect both a label and a free-text response.

label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\ntext_question = rg.TextQuestion(name=\"response\")\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n        text_question,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

To add records with responses to questions, refer to the rg.Response class documentation.
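
As a hedged sketch (the username and record content are illustrative, and an authenticated rg.Argilla client is assumed), a response ties an answer value to a question name and a user id:

user = client.users(\"my_username\")  # look up the responding user\nrecord = rg.Record(\n    fields={\"text\": \"I love this product.\"},\n    responses=[rg.Response(\"label\", \"positive\", user_id=user.id)],  # question name, value, user\n)\n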

"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion","title":"LabelQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class LabelQuestion(QuestionPropertyBase):\n    _model: LabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        visible_labels: Optional[int] = None,\n    ) -> None:\n        \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n            question is a question where the user can select one label from \\\n            a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n        \"\"\"\n        self._model = LabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=LabelQuestionSettings(\n                options=self._render_values_as_options(labels), visible_options=visible_labels\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: LabelQuestionModel) -> \"LabelQuestion\":\n        instance = cls(name=model.name, labels=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"LabelQuestion\":\n        model = LabelQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    ##############################\n    # Public properties\n    ##############################\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion.__init__","title":"__init__(name, labels, title=None, description=None, required=True, visible_labels=None)","text":"

Define a new label question for Settings of a Dataset. A label question is a question where the user can select one label from a list of available labels.

Parameters:

name (str, required): The name of the question to be used as a reference.
labels (Union[List[str], Dict[str, str]], required): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.
visible_labels (Optional[int], default None): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    visible_labels: Optional[int] = None,\n) -> None:\n    \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n        question is a question where the user can select one label from \\\n        a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n    \"\"\"\n    self._model = LabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=LabelQuestionSettings(\n            options=self._render_values_as_options(labels), visible_options=visible_labels\n        ),\n    )\n
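
As an illustrative sketch, labels can also be given as a dictionary to decouple the stored value from the text shown in the UI:

label_question = rg.LabelQuestion(\n    name=\"sentiment\",\n    labels={\"POS\": \"Positive\", \"NEG\": \"Negative\"},  # stored value -> display name in the UI\n    visible_labels=None,  # None shows all options\n)\n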
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion","title":"MultiLabelQuestion","text":"

Bases: LabelQuestion

Source code in src/argilla/settings/_question.py
class MultiLabelQuestion(LabelQuestion):\n    _model: MultiLabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        visible_labels: Optional[int] = None,\n        labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n            multi-label question is a question where the user can select multiple \\\n            labels from a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n                Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n                The score of the suggestion will be taken into account for ordering if available.\n            title (Optional[str]: The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = MultiLabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=MultiLabelQuestionSettings(\n                options=self._render_values_as_options(labels),\n                visible_options=visible_labels,\n                options_order=labels_order,\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: MultiLabelQuestionModel) -> \"MultiLabelQuestion\":\n        instance = cls(\n            name=model.name,\n            labels=cls._render_options_as_values(model.settings.options),\n            labels_order=model.settings.options_order,\n        )\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"MultiLabelQuestion\":\n        model = MultiLabelQuestionModel(**data)\n        return cls.from_model(model=model)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion.__init__","title":"__init__(name, labels, visible_labels=None, labels_order='natural', title=None, description=None, required=True)","text":"

Create a new multi-label question for Settings of a Dataset. A multi-label question is a question where the user can select multiple labels from a list of available labels.

Parameters:

name (str, required): The name of the question to be used as a reference.
labels (Union[List[str], Dict[str, str]], required): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.
visible_labels (Optional[int], default None): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.
labels_order (Literal["natural", "suggestion"], default "natural"): The order of the labels in the UI. Can be either "natural" (order in which they were specified) or "suggestion" (order prioritizing those associated with a suggestion). The score of the suggestion will be taken into account for ordering if available.
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    visible_labels: Optional[int] = None,\n    labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n        multi-label question is a question where the user can select multiple \\\n        labels from a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n        labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n            Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n            The score of the suggestion will be taken into account for ordering if available.\n        title (Optional[str]: The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = MultiLabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=MultiLabelQuestionSettings(\n            options=self._render_values_as_options(labels),\n            visible_options=visible_labels,\n            options_order=labels_order,\n        ),\n    )\n
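
A brief sketch (the label names are illustrative) showing labels_order, which can surface suggested labels first:

topics_question = rg.MultiLabelQuestion(\n    name=\"topics\",\n    labels=[\"politics\", \"sports\", \"technology\", \"economy\"],\n    visible_labels=3,  # collapse the label list after three entries\n    labels_order=\"suggestion\",  # show labels with suggestions (and higher scores) first\n)\n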
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion","title":"RankingQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class RankingQuestion(QuestionPropertyBase):\n    _model: RankingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n            is a question where the user can rank a list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RankingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RankingQuestionModel) -> \"RankingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RankingQuestion\":\n        model = RankingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.settings.options = self._render_values_as_options(values)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

Create a new ranking question for Settings of a Dataset. A ranking question is a question where the user can rank a list of options.

Parameters:

name (str, required): The name of the question to be used as a reference.
values (Union[List[str], Dict[str, str]], required): The list of options to be ranked, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    values: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n        is a question where the user can rank a list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RankingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
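
For example (a minimal sketch with illustrative names), the options to rank can be passed as a dictionary mapping stored values to UI names:

ranking_question = rg.RankingQuestion(\n    name=\"preference\",\n    values={\"response-1\": \"Response 1\", \"response-2\": \"Response 2\"},  # stored value -> display name\n)\n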
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion","title":"TextQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class TextQuestion(QuestionPropertyBase):\n    _model: TextQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        use_markdown: bool = False,\n    ) -> None:\n        \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n            is a question where the user can input text.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        \"\"\"\n        self._model = TextQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=TextQuestionSettings(use_markdown=use_markdown),\n        )\n\n    @classmethod\n    def from_model(cls, model: TextQuestionModel) -> \"TextQuestion\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextQuestion\":\n        model = TextQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> bool:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, use_markdown: bool) -> None:\n        self._model.settings.use_markdown = use_markdown\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion.__init__","title":"__init__(name, title=None, description=None, required=True, use_markdown=False)","text":"

Create a new text question for Settings of a Dataset. A text question is a question where the user can input text.

Parameters:

name (str, required): The name of the question to be used as a reference.
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.
use_markdown (Optional[bool], default False): Whether to render the markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    use_markdown: bool = False,\n) -> None:\n    \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n        is a question where the user can input text.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n    \"\"\"\n    self._model = TextQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=TextQuestionSettings(use_markdown=use_markdown),\n    )\n
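
A minimal sketch (the question name and description are illustrative) enabling Markdown rendering for free-text answers:

text_question = rg.TextQuestion(\n    name=\"corrected_text\",\n    description=\"Provide a corrected version of the text.\",\n    use_markdown=True,  # render Markdown (including LaTeX formulas) in the UI\n)\n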
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion","title":"RatingQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class RatingQuestion(QuestionPropertyBase):\n    _model: RatingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: List[int],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n            is a question where the user can select a value from a sequential list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RatingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            values=values,\n            settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RatingQuestionModel) -> \"RatingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RatingQuestion\":\n        model = RatingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[int]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.values = self._render_values_as_options(values)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

Create a new rating question for Settings of a Dataset. A rating question is a question where the user can select a value from a sequential list of options.

Parameters:

name (str, required): The name of the question to be used as a reference.
values (List[int], required): The list of selectable values. It should be defined in the range [0, 10].
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    values: List[int],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n        is a question where the user can select a value from a sequential list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RatingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        values=values,\n        settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
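
A minimal sketch; the values must stay within the supported [0, 10] range:

rating_question = rg.RatingQuestion(\n    name=\"quality\",\n    values=[1, 2, 3, 4, 5],  # sequential options shown to the annotator\n)\n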
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion","title":"SpanQuestion","text":"

Bases: QuestionPropertyBase

Source code in src/argilla/settings/_question.py
class SpanQuestion(QuestionPropertyBase):\n    _model: SpanQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        field: str,\n        labels: Union[List[str], Dict[str, str]],\n        allow_overlapping: bool = False,\n        visible_labels: Optional[int] = None,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ):\n        \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n            is a question where the user can select a section of text within a text field \\\n            and assign it a label.\n\n            Parameters:\n                name (str): The name of the question to be used as a reference.\n                field (str): The name of the text field where the span question will be applied.\n                labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                    dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n                allow_overlapping (bool): This value specifies whether overlapped spans are allowed or not.\n                visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                    Setting it to None show all options.\n                title (Optional[str]:) The title of the question to be shown in the UI.\n                description (Optional[str]): The description of the question to be shown in the UI.\n                required (bool): If the question is required for a record to be valid. At least one question must be required.\n            \"\"\"\n        self._model = SpanQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=SpanQuestionSettings(\n                field=field,\n                allow_overlapping=allow_overlapping,\n                visible_options=visible_labels,\n                options=self._render_values_as_options(labels),\n            ),\n        )\n\n    @property\n    def name(self):\n        return self._model.name\n\n    @property\n    def field(self):\n        return self._model.settings.field\n\n    @field.setter\n    def field(self, field: str):\n        self._model.settings.field = field\n\n    @property\n    def allow_overlapping(self):\n        return self._model.settings.allow_overlapping\n\n    @allow_overlapping.setter\n    def allow_overlapping(self, allow_overlapping: bool):\n        self._model.settings.allow_overlapping = allow_overlapping\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @classmethod\n    def from_model(cls, model: SpanQuestionModel) -> \"SpanQuestion\":\n        instance = cls(\n            name=model.name,\n            field=model.settings.field,\n            labels=cls._render_options_as_values(model.settings.options),\n        )\n        instance._model = model\n\n        return 
instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"SpanQuestion\":\n        model = SpanQuestionModel(**data)\n        return cls.from_model(model=model)\n
"},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion.__init__","title":"__init__(name, field, labels, allow_overlapping=False, visible_labels=None, title=None, description=None, required=True)","text":"

Create a new span question for Settings of a Dataset. A span question is a question where the user can select a section of text within a text field and assign it a label.

Parameters:

name (str, required): The name of the question to be used as a reference.
field (str, required): The name of the text field where the span question will be applied.
labels (Union[List[str], Dict[str, str]], required): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.
allow_overlapping (bool, default False): Whether overlapping spans are allowed.
visible_labels (Optional[int], default None): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.
title (Optional[str], default None): The title of the question to be shown in the UI.
description (Optional[str], default None): The description of the question to be shown in the UI.
required (bool, default True): If the question is required for a record to be valid. At least one question must be required.

Source code in src/argilla/settings/_question.py
def __init__(\n    self,\n    name: str,\n    field: str,\n    labels: Union[List[str], Dict[str, str]],\n    allow_overlapping: bool = False,\n    visible_labels: Optional[int] = None,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n):\n    \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n        is a question where the user can select a section of text within a text field \\\n        and assign it a label.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            field (str): The name of the text field where the span question will be applied.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            allow_overlapping (bool): This value specifies whether overlapped spans are allowed or not.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n    self._model = SpanQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=SpanQuestionSettings(\n            field=field,\n            allow_overlapping=allow_overlapping,\n            visible_options=visible_labels,\n            options=self._render_values_as_options(labels),\n        ),\n    )\n
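
A minimal sketch (the field and label names are illustrative) for a span question over a text field named "text":

span_question = rg.SpanQuestion(\n    name=\"entities\",\n    field=\"text\",  # the text field spans are selected from\n    labels={\"PER\": \"Person\", \"ORG\": \"Organization\"},  # stored value -> display name\n    allow_overlapping=False,  # disallow overlapping spans\n)\n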
"},{"location":"reference/argilla/settings/settings/","title":"rg.Settings","text":"

rg.Settings is used to define the settings of an Argilla Dataset. The settings configure the behavior of the dataset, such as its fields, questions, guidelines, metadata, and vectors. The Settings class is passed to the Dataset class and used to create the dataset on the server. Once created, the settings of a dataset cannot be changed.

"},{"location":"reference/argilla/settings/settings/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/settings/#creating-a-new-dataset-with-settings","title":"Creating a new dataset with settings","text":"

To create a new dataset with settings, instantiate the Settings class and pass it to the Dataset class.

import argilla as rg\n\nsettings = rg.Settings(\n    guidelines=\"Select the sentiment of the prompt.\",\n    fields=[rg.TextField(name=\"prompt\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n)\n\ndataset = rg.Dataset(name=\"sentiment_analysis\", settings=settings)\n\n# Create the dataset on the server\ndataset.create()\n

To define the settings for fields, questions, metadata, vectors, or distribution, refer to the rg.TextField, rg.LabelQuestion, rg.TermsMetadataProperty, rg.VectorField, and rg.TaskDistribution class documentation.
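
For example (a hedged sketch; the names, options, and vector dimension are illustrative), a single Settings object can combine these building blocks:

settings = rg.Settings(\n    guidelines=\"Classify the prompt and rate the response.\",\n    fields=[rg.TextField(name=\"prompt\")],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n    metadata=[rg.TermsMetadataProperty(name=\"source\", options=[\"web\", \"api\"])],\n    vectors=[rg.VectorField(name=\"embedding\", dimensions=384)],\n    distribution=rg.TaskDistribution(min_submitted=2),  # require two submissions per record\n)\n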

"},{"location":"reference/argilla/settings/settings/#creating-settings-using-built-in-templates","title":"Creating settings using built in templates","text":"

Argilla provides built-in templates for creating settings for common dataset types. To use a template, call the corresponding class method of the Settings class. There are three built-in templates, covering classification, ranking, and rating tasks. Template settings also include default guidelines and mappings.

"},{"location":"reference/argilla/settings/settings/#classification-task","title":"Classification Task","text":"

You can define a classification task using the rg.Settings.for_classification class method. This will create settings with a text field and a label question. You can select the field type with the field_type parameter, which accepts text or image (see the sketch after the settings listing below).

settings = rg.Settings.for_classification(labels=[\"positive\", \"negative\"]) # (1)\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Select a label for the document.\",\n    fields=[rg.TextField(name=\"text\")],  # an image field is used instead when field_type=\"image\"\n    questions=[LabelQuestion(name=\"label\", labels=labels)],\n    mapping={\"input\": \"text\", \"output\": \"label\", \"document\": \"text\"},\n)\n
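
As a brief sketch, passing field_type="image" swaps the default text field for an image field (the labels are illustrative):

settings = rg.Settings.for_classification(\n    labels=[\"cat\", \"dog\"],\n    field_type=\"image\",  # accepts \"text\" (default) or \"image\"\n)\n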
"},{"location":"reference/argilla/settings/settings/#ranking-task","title":"Ranking Task","text":"

You can define a ranking task using the rg.Settings.for_ranking class method. This will create settings with text fields for an instruction and two responses, plus a ranking question.

settings = rg.Settings.for_ranking()\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Rank the responses.\",\n    fields=[\n        rg.TextField(name=\"instruction\"),\n        rg.TextField(name=\"response1\"),\n        rg.TextField(name=\"response2\"),\n    ],\n    questions=[RankingQuestion(name=\"ranking\", values=[\"response1\", \"response2\"])],\n    mapping={\n        \"input\": \"instruction\",\n        \"prompt\": \"instruction\",\n        \"chosen\": \"response1\",\n        \"rejected\": \"response2\",\n    },\n)\n
"},{"location":"reference/argilla/settings/settings/#rating-task","title":"Rating Task","text":"

You can define a rating task using the rg.Settings.for_rating class method. This will create settings with text fields for an instruction and a response, plus a rating question.

settings = rg.Settings.for_rating()\n

This will return a Settings object with the following settings:

settings = Settings(\n    guidelines=\"Rate the response.\",\n    fields=[\n        rg.TextField(name=\"instruction\"),\n        rg.TextField(name=\"response\"),\n    ],\n    questions=[RatingQuestion(name=\"rating\", values=[1, 2, 3, 4, 5])],\n    mapping={\n        \"input\": \"instruction\",\n        \"prompt\": \"instruction\",\n        \"output\": \"response\",\n        \"score\": \"rating\",\n    },\n)\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings","title":"Settings","text":"

Bases: DefaultSettingsMixin, Resource

Settings class for Argilla Datasets.

This class is used to define the representation of a Dataset within the UI.

Source code in src/argilla/settings/_resource.py
class Settings(DefaultSettingsMixin, Resource):\n    \"\"\"\n    Settings class for Argilla Datasets.\n\n    This class is used to define the representation of a Dataset within the UI.\n    \"\"\"\n\n    def __init__(\n        self,\n        fields: Optional[List[Field]] = None,\n        questions: Optional[List[QuestionType]] = None,\n        vectors: Optional[List[VectorField]] = None,\n        metadata: Optional[List[MetadataType]] = None,\n        guidelines: Optional[str] = None,\n        allow_extra_metadata: bool = False,\n        distribution: Optional[TaskDistribution] = None,\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ) -> None:\n        \"\"\"\n        Args:\n            fields (List[Field]): A list of Field objects that represent the fields in the Dataset.\n            questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n                A list of Question objects that represent the questions in the Dataset.\n            vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n            metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n            guidelines (str): A string containing the guidelines for the Dataset.\n            allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n                Dataset. Defaults to False.\n            distribution (TaskDistribution): The annotation task distribution configuration.\n                Default to DEFAULT_TASK_DISTRIBUTION\n            mapping (Dict[str, Union[str, Sequence[str]]]): A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.\n        \"\"\"\n        super().__init__(client=_dataset._client if _dataset else None)\n\n        self._dataset = _dataset\n        self._distribution = distribution\n        self._mapping = mapping\n        self.__guidelines = self.__process_guidelines(guidelines)\n        self.__allow_extra_metadata = allow_extra_metadata\n\n        self.__questions = QuestionsProperties(self, questions)\n        self.__fields = SettingsProperties(self, fields)\n        self.__vectors = SettingsProperties(self, vectors)\n        self.__metadata = SettingsProperties(self, metadata)\n\n    #####################\n    # Properties        #\n    #####################\n\n    @property\n    def fields(self) -> \"SettingsProperties\":\n        return self.__fields\n\n    @fields.setter\n    def fields(self, fields: List[Field]):\n        self.__fields = SettingsProperties(self, fields)\n\n    @property\n    def questions(self) -> \"SettingsProperties\":\n        return self.__questions\n\n    @questions.setter\n    def questions(self, questions: List[QuestionType]):\n        self.__questions = QuestionsProperties(self, questions)\n\n    @property\n    def vectors(self) -> \"SettingsProperties\":\n        return self.__vectors\n\n    @vectors.setter\n    def vectors(self, vectors: List[VectorField]):\n        self.__vectors = SettingsProperties(self, vectors)\n\n    @property\n    def metadata(self) -> \"SettingsProperties\":\n        return self.__metadata\n\n    @metadata.setter\n    def metadata(self, metadata: List[MetadataType]):\n        self.__metadata = SettingsProperties(self, metadata)\n\n    @property\n    def guidelines(self) -> str:\n        return self.__guidelines\n\n    
@guidelines.setter\n    def guidelines(self, guidelines: str):\n        self.__guidelines = self.__process_guidelines(guidelines)\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.__allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool):\n        self.__allow_extra_metadata = value\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self._distribution or TaskDistribution.default()\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self._distribution = value\n\n    @property\n    def mapping(self) -> Dict[str, Union[str, Sequence[str]]]:\n        return self._mapping\n\n    @mapping.setter\n    def mapping(self, value: Dict[str, Union[str, Sequence[str]]]):\n        self._mapping = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, dataset: \"Dataset\"):\n        self._dataset = dataset\n        self._client = dataset._client\n\n    @cached_property\n    def schema(self) -> dict:\n        schema_dict = {}\n\n        for field in self.fields:\n            schema_dict[field.name] = field\n\n        for question in self.questions:\n            schema_dict[question.name] = question\n\n        for vector in self.vectors:\n            schema_dict[vector.name] = vector\n\n        for metadata in self.metadata:\n            schema_dict[metadata.name] = metadata\n\n        return schema_dict\n\n    @cached_property\n    def schema_by_id(self) -> Dict[UUID, Union[Field, QuestionType, MetadataType, VectorField]]:\n        return {v.id: v for v in self.schema.values()}\n\n    def validate(self) -> None:\n        self._validate_empty_settings()\n        self._validate_duplicate_names()\n\n        for field in self.fields:\n            field.validate()\n\n    #####################\n    #  Public methods   #\n    #####################\n\n    def get(self) -> \"Settings\":\n        self.fields = self._fetch_fields()\n        self.questions = self._fetch_questions()\n        self.vectors = self._fetch_vectors()\n        self.metadata = self._fetch_metadata()\n        self.__fetch_dataset_related_attributes()\n\n        self._update_last_api_call()\n        return self\n\n    def create(self) -> \"Settings\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.create()\n        self.__questions.create()\n        self.__vectors.create()\n        self.__metadata.create()\n\n        self._update_last_api_call()\n        return self\n\n    def update(self) -> \"Resource\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.update()\n        self.__vectors.update()\n        self.__metadata.update()\n        # self.questions.update()\n\n        self._update_last_api_call()\n        return self\n\n    def serialize(self):\n        try:\n            return {\n                \"guidelines\": self.guidelines,\n                \"questions\": self.__questions.serialize(),\n                \"fields\": self.__fields.serialize(),\n                \"vectors\": self.vectors.serialize(),\n                \"metadata\": self.metadata.serialize(),\n                \"allow_extra_metadata\": self.allow_extra_metadata,\n                \"distribution\": self.distribution.to_dict(),\n                \"mapping\": self.mapping,\n            }\n        except Exception as e:\n            raise 
ArgillaSerializeError(f\"Failed to serialize the settings. {e.__class__.__name__}\") from e\n\n    def to_json(self, path: Union[Path, str]) -> None:\n        \"\"\"Save the settings to a file on disk\n\n        Parameters:\n            path (str): The path to save the settings to\n        \"\"\"\n        if not isinstance(path, Path):\n            path = Path(path)\n        if path.exists():\n            raise FileExistsError(f\"File {path} already exists\")\n        with open(path, \"w\") as file:\n            json.dump(self.serialize(), file)\n\n    @classmethod\n    def from_json(cls, path: Union[Path, str]) -> \"Settings\":\n        \"\"\"Load the settings from a file on disk\"\"\"\n\n        with open(path, \"r\") as file:\n            settings_dict = json.load(file)\n            return cls._from_dict(settings_dict)\n\n    @classmethod\n    def from_hub(\n        cls,\n        repo_id: str,\n        subset: Optional[str] = None,\n        feature_mapping: Optional[Dict[str, Literal[\"question\", \"field\", \"metadata\"]]] = None,\n        **kwargs,\n    ) -> \"Settings\":\n        \"\"\"Load the settings from the Hub\n\n        Parameters:\n            repo_id (str): The ID of the repository to load the settings from on the Hub.\n            subset (Optional[str]): The subset of the repository to load the settings from.\n            feature_mapping (Dict[str, Literal[\"question\", \"field\", \"metadata\"]]): A dictionary that maps incoming column names to Argilla attributes.\n        \"\"\"\n\n        settings = build_settings_from_repo_id(repo_id=repo_id, feature_mapping=feature_mapping, subset=subset)\n        return settings\n\n    def __eq__(self, other: \"Settings\") -> bool:\n        return self.serialize() == other.serialize()  # TODO: Create proper __eq__ methods for fields and questions\n\n    #####################\n    #  Repr Methods     #\n    #####################\n\n    def __repr__(self) -> str:\n        return (\n            f\"Settings(guidelines={self.guidelines}, allow_extra_metadata={self.allow_extra_metadata}, \"\n            f\"distribution={self.distribution}, \"\n            f\"fields={self.fields}, questions={self.questions}, vectors={self.vectors}, metadata={self.metadata})\"\n        )\n\n    #####################\n    #  Private methods  #\n    #####################\n\n    @classmethod\n    def _from_dict(cls, settings_dict: dict) -> \"Settings\":\n        fields = settings_dict.get(\"fields\", [])\n        vectors = settings_dict.get(\"vectors\", [])\n        metadata = settings_dict.get(\"metadata\", [])\n        guidelines = settings_dict.get(\"guidelines\")\n        distribution = settings_dict.get(\"distribution\")\n        allow_extra_metadata = settings_dict.get(\"allow_extra_metadata\")\n        mapping = settings_dict.get(\"mapping\")\n\n        questions = [question_from_dict(question) for question in settings_dict.get(\"questions\", [])]\n        fields = [_field_from_dict(field) for field in fields]\n        vectors = [VectorField.from_dict(vector) for vector in vectors]\n        metadata = [MetadataField.from_dict(metadata) for metadata in metadata]\n\n        if distribution:\n            distribution = TaskDistribution.from_dict(distribution)\n\n        if mapping:\n            mapping = cls._validate_mapping(mapping)\n\n        return cls(\n            questions=questions,\n            fields=fields,\n            vectors=vectors,\n            metadata=metadata,\n            guidelines=guidelines,\n            
allow_extra_metadata=allow_extra_metadata,\n            distribution=distribution,\n            mapping=mapping,\n        )\n\n    def _copy(self) -> \"Settings\":\n        instance = self.__class__._from_dict(self.serialize())\n        return instance\n\n    def _fetch_fields(self) -> List[Field]:\n        models = self._client.api.fields.list(dataset_id=self._dataset.id)\n        return [_field_from_model(model) for model in models]\n\n    def _fetch_questions(self) -> List[QuestionType]:\n        models = self._client.api.questions.list(dataset_id=self._dataset.id)\n        return [question_from_model(model) for model in models]\n\n    def _fetch_vectors(self) -> List[VectorField]:\n        models = self.dataset._client.api.vectors.list(self.dataset.id)\n        return [VectorField.from_model(model) for model in models]\n\n    def _fetch_metadata(self) -> List[MetadataType]:\n        models = self._client.api.metadata.list(dataset_id=self._dataset.id)\n        return [MetadataField.from_model(model) for model in models]\n\n    def __fetch_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything points to the fact that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = self._client.api.datasets.get(self._dataset.id)\n\n        self.guidelines = dataset_model.guidelines\n        self.allow_extra_metadata = dataset_model.allow_extra_metadata\n\n        if dataset_model.distribution:\n            self.distribution = TaskDistribution.from_model(dataset_model.distribution)\n\n    def _update_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything points to the fact that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = DatasetModel(\n            id=self._dataset.id,\n            name=self._dataset.name,\n            guidelines=self.guidelines,\n            allow_extra_metadata=self.allow_extra_metadata,\n            distribution=self.distribution._api_model(),\n        )\n        self._client.api.datasets.update(dataset_model)\n\n    def _validate_empty_settings(self):\n        if not all([self.fields, self.questions]):\n            message = \"Fields and questions are required\"\n            raise SettingsError(message=message)\n\n    def _validate_duplicate_names(self) -> None:\n        dataset_properties_by_name = {}\n\n        for properties in [self.fields, self.questions, self.vectors, self.metadata]:\n            for property in properties:\n                if property.name in dataset_properties_by_name:\n                    raise SettingsError(\n                        f\"names of dataset settings must be unique, \"\n                        f\"but the name {property.name!r} is used by {type(property).__name__!r} and {type(dataset_properties_by_name[property.name]).__name__!r} \"\n                    )\n
                dataset_properties_by_name[property.name] = property\n\n    @classmethod\n    def _validate_mapping(cls, mapping: Dict[str, Union[str, Sequence[str]]]) -> dict:\n        validate_mapping = {}\n        for key, value in mapping.items():\n            if isinstance(value, str):\n                validate_mapping[key] = value\n            elif isinstance(value, list) or isinstance(value, tuple):\n                validate_mapping[key] = tuple(value)\n            else:\n                raise SettingsError(f\"Invalid mapping value for key {key!r}: {value}\")\n\n        return validate_mapping\n\n    @classmethod\n    def _sanitize_settings_name(cls, name: str) -> str:\n        \"\"\"Sanitize the name for the settings\"\"\"\n\n        for char in [\" \", \":\", \".\", \"&\", \"?\", \"!\"]:\n            name = name.replace(char, \"_\")\n\n        return name.lower()\n\n    def __process_guidelines(self, guidelines):\n        if guidelines is None:\n            return guidelines\n\n        if not isinstance(guidelines, str):\n            raise SettingsError(\"Guidelines must be a string or a path to a file\")\n\n        if os.path.exists(guidelines):\n            with open(guidelines, \"r\") as file:\n                return file.read()\n\n        return guidelines\n\n    @classmethod\n    def _is_valid_name(cls, name: str) -> bool:\n        \"\"\"Check if the name is valid\"\"\"\n        return bool(re.match(r\"^(?=.*[a-z0-9])[a-z0-9_-]+$\", name))\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.__init__","title":"__init__(fields=None, questions=None, vectors=None, metadata=None, guidelines=None, allow_extra_metadata=False, distribution=None, mapping=None, _dataset=None)","text":"

Parameters:

Name Type Description Default fields List[Field]

A list of Field objects that represent the fields in the Dataset.

None questions List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]

A list of Question objects that represent the questions in the Dataset.

None vectors List[VectorField]

A list of VectorField objects that represent the vectors in the Dataset.

None metadata List[MetadataField]

A list of MetadataField objects that represent the metadata in the Dataset.

None guidelines str

A string containing the guidelines for the Dataset.

None allow_extra_metadata bool

A boolean that determines whether or not extra metadata is allowed in the Dataset. Defaults to False.

False distribution TaskDistribution

The annotation task distribution configuration. Default to DEFAULT_TASK_DISTRIBUTION

None mapping Dict[str, Union[str, Sequence[str]]]

A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.

None Source code in src/argilla/settings/_resource.py
def __init__(\n    self,\n    fields: Optional[List[Field]] = None,\n    questions: Optional[List[QuestionType]] = None,\n    vectors: Optional[List[VectorField]] = None,\n    metadata: Optional[List[MetadataType]] = None,\n    guidelines: Optional[str] = None,\n    allow_extra_metadata: bool = False,\n    distribution: Optional[TaskDistribution] = None,\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n) -> None:\n    \"\"\"\n    Args:\n        fields (List[Field]): A list of Field objects that represent the fields in the Dataset.\n        questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n            A list of Question objects that represent the questions in the Dataset.\n        vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n        metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n        guidelines (str): A string containing the guidelines for the Dataset.\n        allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n            Dataset. Defaults to False.\n        distribution (TaskDistribution): The annotation task distribution configuration.\n            Default to DEFAULT_TASK_DISTRIBUTION\n        mapping (Dict[str, Union[str, Sequence[str]]]): A dictionary that maps incoming data names to Argilla dataset attributes in DatasetRecords.\n    \"\"\"\n    super().__init__(client=_dataset._client if _dataset else None)\n\n    self._dataset = _dataset\n    self._distribution = distribution\n    self._mapping = mapping\n    self.__guidelines = self.__process_guidelines(guidelines)\n    self.__allow_extra_metadata = allow_extra_metadata\n\n    self.__questions = QuestionsProperties(self, questions)\n    self.__fields = SettingsProperties(self, fields)\n    self.__vectors = SettingsProperties(self, vectors)\n    self.__metadata = SettingsProperties(self, metadata)\n
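
For reference, a minimal sketch of constructing these settings directly; the field and question names are illustrative, and note that guidelines also accepts a path to a file whose contents are read:

settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],  # at least one field is required\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"yes\", \"no\"])],  # at least one question is required\n    guidelines=\"Label each record.\",\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n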
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.to_json","title":"to_json(path)","text":"

Save the settings to a file on disk

Parameters:

Name Type Description Default path str

The path to save the settings to

required Source code in src/argilla/settings/_resource.py
def to_json(self, path: Union[Path, str]) -> None:\n    \"\"\"Save the settings to a file on disk\n\n    Parameters:\n        path (str): The path to save the settings to\n    \"\"\"\n    if not isinstance(path, Path):\n        path = Path(path)\n    if path.exists():\n        raise FileExistsError(f\"File {path} already exists\")\n    with open(path, \"w\") as file:\n        json.dump(self.serialize(), file)\n
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_json","title":"from_json(path) classmethod","text":"

Load the settings from a file on disk

Source code in src/argilla/settings/_resource.py
@classmethod\ndef from_json(cls, path: Union[Path, str]) -> \"Settings\":\n    \"\"\"Load the settings from a file on disk\"\"\"\n\n    with open(path, \"r\") as file:\n        settings_dict = json.load(file)\n        return cls._from_dict(settings_dict)\n
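
As a usage sketch, saving and reloading settings is a simple round trip; the file name here is illustrative:

settings.to_json(\"settings.json\")  # raises FileExistsError if the file already exists\nloaded_settings = rg.Settings.from_json(\"settings.json\")\nassert loaded_settings == settings  # __eq__ compares the serialized settings\n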
"},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_hub","title":"from_hub(repo_id, subset=None, feature_mapping=None, **kwargs) classmethod","text":"

Load the settings from the Hub

Parameters:

Name Type Description Default repo_id str

The ID of the repository to load the settings from on the Hub.

required subset Optional[str]

The subset of the repository to load the settings from.

None feature_mapping Dict[str, Literal['question', 'field', 'metadata']]

A dictionary that maps incoming column names to Argilla attributes.

None Source code in src/argilla/settings/_resource.py
@classmethod\ndef from_hub(\n    cls,\n    repo_id: str,\n    subset: Optional[str] = None,\n    feature_mapping: Optional[Dict[str, Literal[\"question\", \"field\", \"metadata\"]]] = None,\n    **kwargs,\n) -> \"Settings\":\n    \"\"\"Load the settings from the Hub\n\n    Parameters:\n        repo_id (str): The ID of the repository to load the settings from on the Hub.\n        subset (Optional[str]): The subset of the repository to load the settings from.\n        feature_mapping (Dict[str, Literal[\"question\", \"field\", \"metadata\"]]): A dictionary that maps incoming column names to Argilla attributes.\n    \"\"\"\n\n    settings = build_settings_from_repo_id(repo_id=repo_id, feature_mapping=feature_mapping, subset=subset)\n    return settings\n
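
As an illustrative sketch (the repo_id and mapping below are examples, not fixed values), loading settings from a Hub dataset repository looks like this:

settings = rg.Settings.from_hub(\n    repo_id=\"stanfordnlp/imdb\",\n    feature_mapping={\"label\": \"question\"},  # map an incoming column to an Argilla attribute\n)\n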
"},{"location":"reference/argilla/settings/task_distribution/","title":"Distribution","text":"

Distribution settings are used to define the criteria used by the tool to automatically manage records in the dataset depending on the expected number of submitted responses per record.

"},{"location":"reference/argilla/settings/task_distribution/#usage-examples","title":"Usage Examples","text":"

The default minimum submitted responses per record is 1. If you wish to increase this value, you can define it through the TaskDistribution class and pass it to the Settings class.

settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings\n)\n
"},{"location":"reference/argilla/settings/task_distribution/#src.argilla.settings._task_distribution.OverlapTaskDistribution","title":"OverlapTaskDistribution","text":"

The task distribution settings class.

This task distribution defines a number of submitted responses required to complete a record.

Parameters:

Name Type Description Default min_submitted int

The number of min. submitted responses to complete the record

required Source code in src/argilla/settings/_task_distribution.py
class OverlapTaskDistribution:\n    \"\"\"The task distribution settings class.\n\n    This task distribution defines a number of submitted responses required to complete a record.\n\n    Parameters:\n        min_submitted (int): The number of min. submitted responses to complete the record\n    \"\"\"\n\n    strategy: Literal[\"overlap\"] = \"overlap\"\n\n    def __init__(self, min_submitted: int):\n        self._model = OverlapTaskDistributionModel(min_submitted=min_submitted, strategy=self.strategy)\n\n    def __repr__(self) -> str:\n        return f\"OverlapTaskDistribution(min_submitted={self.min_submitted})\"\n\n    def __eq__(self, other) -> bool:\n        if not isinstance(other, self.__class__):\n            return False\n\n        return self._model == other._model\n\n    @classmethod\n    def default(cls) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=1)\n\n    @property\n    def min_submitted(self):\n        return self._model.min_submitted\n\n    @min_submitted.setter\n    def min_submitted(self, value: int):\n        self._model.min_submitted = value\n\n    @classmethod\n    def from_model(cls, model: OverlapTaskDistributionModel) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=model.min_submitted)\n\n    @classmethod\n    def from_dict(cls, dict: Dict[str, Any]) -> \"OverlapTaskDistribution\":\n        return cls.from_model(OverlapTaskDistributionModel.model_validate(dict))\n\n    def to_dict(self):\n        return self._model.model_dump()\n\n    def _api_model(self) -> OverlapTaskDistributionModel:\n        return self._model\n
"},{"location":"reference/argilla/settings/vectors/","title":"Vectors","text":"

Vector fields in Argilla are used to store vector representations (embeddings) of the records that will be reviewed by a user, for example to enable semantic search.

"},{"location":"reference/argilla/settings/vectors/#usage-examples","title":"Usage Examples","text":"

To define a vector field, instantiate the VectorField class with a name and dimensions, then pass it to the vectors parameter of the Settings class.

settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    vectors=[\n        rg.VectorField(\n            name=\"my_vector\",\n            dimensions=768,\n            title=\"Document Embedding\",\n        ),\n    ],\n)\n

To add records with vectors, refer to the rg.Vector class documentation.
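
As a brief, hedged sketch of logging a record with a vector (assuming a dataset whose settings include the vector field above plus at least one question, since both fields and questions are required; the values are illustrative), the vector field name is used as a record key:

dataset = rg.Dataset(name=\"my_dataset\", settings=settings)\ndataset.create()\n\ndataset.records.log(\n    [\n        {\n            \"text\": \"A sample record.\",  # the field defined above\n            \"my_vector\": [0.1] * 768,  # length must match the configured dimensions\n        }\n    ]\n)\n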

"},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField","title":"VectorField","text":"

Bases: Resource

Vector field for use in Argilla Dataset Settings

Source code in src/argilla/settings/_vector.py
class VectorField(Resource):\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: VectorFieldModel\n    _api: VectorsAPI\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        dimensions: int,\n        title: Optional[str] = None,\n        _client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the vector field\n            dimensions (int): The number of dimensions in the vector\n            title (Optional[str]): The title of the vector to be shown in the UI.\n        \"\"\"\n        client = _client or Argilla._get_default()\n        super().__init__(api=client.api.vectors, client=client)\n        self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n        self._dataset = None\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def title(self) -> Optional[str]:\n        return self._model.title\n\n    @title.setter\n    def title(self, value: Optional[str]) -> None:\n        self._model.title = value\n\n    @property\n    def dimensions(self) -> int:\n        return self._model.dimensions\n\n    @dimensions.setter\n    def dimensions(self, value: int) -> None:\n        self._model.dimensions = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(name={self.name}, title={self.title}, dimensions={self.dimensions})\"\n\n    @classmethod\n    def from_model(cls, model: VectorFieldModel) -> \"VectorField\":\n        instance = cls(name=model.name, dimensions=model.dimensions)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"VectorField\":\n        model = VectorFieldModel(**data)\n        return cls.from_model(model=model)\n\n    def _with_client(self, client: \"Argilla\") -> \"VectorField\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.vectors\n\n        return self\n
"},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField.__init__","title":"__init__(name, dimensions, title=None, _client=None)","text":"

Vector field for use in Argilla Dataset Settings

Parameters:

Name Type Description Default name str

The name of the vector field

required dimensions int

The number of dimensions in the vector

required title Optional[str]

The title of the vector to be shown in the UI.

None Source code in src/argilla/settings/_vector.py
def __init__(\n    self,\n    name: str,\n    dimensions: int,\n    title: Optional[str] = None,\n    _client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the vector field\n        dimensions (int): The number of dimensions in the vector\n        title (Optional[str]): The title of the vector to be shown in the UI.\n    \"\"\"\n    client = _client or Argilla._get_default()\n    super().__init__(api=client.api.vectors, client=client)\n    self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n    self._dataset = None\n
"},{"location":"reference/argilla-server/configuration/","title":"Server configuration","text":"

This section explains advanced operations and settings for running the Argilla Server and Argilla Python Client.

By default, the Argilla Server will look for your Elasticsearch (ES) endpoint at http://localhost:9200. You can customize this by setting the ARGILLA_ELASTICSEARCH environment variable. Have a look at the list of available environment variables to further configure the Argilla server.

From Argilla version 1.19.0 onwards, you must set up the search engine manually to work with datasets. Set the environment variable ARGILLA_SEARCH_ENGINE=opensearch or ARGILLA_SEARCH_ENGINE=elasticsearch, depending on the backend you're using. The default value for this variable is elasticsearch. The minimum supported version is 8.5.0 for Elasticsearch and 2.4.0 for OpenSearch. Please review your backend and upgrade it if necessary.

Warning

For vector search in OpenSearch, filtering is applied as a post_filter step, since a bug makes queries that combine filtering with knn fail from Argilla. See https://github.com/opensearch-project/k-NN/issues/1286

This may lead to unexpected results when combining filtering with vector search on this engine.

"},{"location":"reference/argilla-server/configuration/#launching","title":"Launching","text":""},{"location":"reference/argilla-server/configuration/#using-a-proxy","title":"Using a proxy","text":"

If you run Argilla behind a proxy by adding some extra prefix to expose the service, you should set the ARGILLA_BASE_URL environment variable to properly route requests to the server application.

For example, if your proxy exposes Argilla in the URL https://my-proxy/custom-path-for-argilla, you should launch the Argilla server with ARGILLA_BASE_URL=/custom-path-for-argilla.

NGINX and Traefik have been tested and are known to work with Argilla:

  • NGINX example
  • Traefik example
"},{"location":"reference/argilla-server/configuration/#environment-variables","title":"Environment variables","text":"

You can set the following environment variables to further configure your server and client.

"},{"location":"reference/argilla-server/configuration/#server","title":"Server","text":""},{"location":"reference/argilla-server/configuration/#fastapi","title":"FastAPI","text":"
  • ARGILLA_HOME_PATH: The directory where Argilla will store all the files needed to run. If the path doesn't exist, it will be created automatically (Default: ~/.argilla).

  • ARGILLA_BASE_URL: If you want to launch the Argilla server in a specific base path other than /, you should set up this environment variable. This can be useful when running Argilla behind a proxy that adds a prefix path to route the service (Default: \"/\").

  • ARGILLA_CORS_ORIGINS: List of host patterns for CORS origin access.

  • ARGILLA_DOCS_ENABLED: If False, disables the OpenAPI docs endpoint at /api/docs.

  • HF_HUB_DISABLE_TELEMETRY: If True, disables telemetry for usage metrics. Alternatively, you can disable telemetry by setting HF_HUB_OFFLINE=1.

"},{"location":"reference/argilla-server/configuration/#authentication","title":"Authentication","text":"
  • ARGILLA_AUTH_SECRET_KEY: The secret key used to sign the API token data. You can use openssl rand -hex 32 to generate a 32 character string to use with this environment variable. By default a random value is generated, so if you are using more than one server worker (or more than one Argilla server) you will need to set the same value for all of them.
  • USERNAME: If provided, the owner username (Default: None).
  • PASSWORD: If provided, the owner password (Default: None).

If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup.

"},{"location":"reference/argilla-server/configuration/#database","title":"Database","text":"
  • ARGILLA_DATABASE_URL: A URL string that contains the necessary information to connect to a database. Argilla uses SQLite by default; PostgreSQL is also officially supported (Default: sqlite:///$ARGILLA_HOME_PATH/argilla.db?check_same_thread=False).
"},{"location":"reference/argilla-server/configuration/#sqlite","title":"SQLite","text":"

The following environment variables are useful only when SQLite is used:

  • ARGILLA_DATABASE_SQLITE_TIMEOUT: How many seconds the connection should wait before raising an OperationalError when a table is locked. If another connection opens a transaction to modify a table, that table will be locked until the transaction is committed. (Default: 15 seconds).
"},{"location":"reference/argilla-server/configuration/#postgresql","title":"PostgreSQL","text":"

The following environment variables are useful only when PostgreSQL is used:

  • ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE: The number of connections to keep open inside the database connection pool (Default: 15).

  • ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW: The number of connections that can be opened above and beyond ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE setting (Default: 10).

"},{"location":"reference/argilla-server/configuration/#search-engine","title":"Search engine","text":"
  • ARGILLA_ELASTICSEARCH: URL of the connection endpoint of the Elasticsearch instance (Default: http://localhost:9200).

  • ARGILLA_SEARCH_ENGINE: Search engine to use. Valid values are \"elasticsearch\" and \"opensearch\" (Default: \"elasticsearch\").

  • ARGILLA_ELASTICSEARCH_SSL_VERIFY: If \"False\", disables SSL certificate verification when connecting to the Elasticsearch backend.

  • ARGILLA_ELASTICSEARCH_CA_PATH: Path to CA cert for ES host. For example: /full/path/to/root-ca.pem (Optional)

"},{"location":"reference/argilla-server/configuration/#redis","title":"Redis","text":"

Redis is used by Argilla to store information about jobs to be processed in the background. The following environment variables are useful to configure how Argilla connects to Redis:

  • ARGILLA_REDIS_URL: A URL string that contains the necessary information to connect to a Redis instance (Default: redis://localhost:6379/0).
"},{"location":"reference/argilla-server/configuration/#datasets","title":"Datasets","text":"
  • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for label and multi-label questions (Default: 500).

  • ARGILLA_SPAN_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for span questions (Default: 500).

"},{"location":"reference/argilla-server/configuration/#hugging-face","title":"Hugging Face","text":"
  • ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING: When Argilla is running on Hugging Face Spaces, you can use this environment variable to disable the warning message shown when persistent storage is disabled for the space (Default: true).
"},{"location":"reference/argilla-server/configuration/#docker-images-only","title":"Docker images only","text":"
  • REINDEX_DATASETS: If true or 1, the datasets will be reindexed in the search engine. This is needed when the search configuration has changed or the data must be refreshed (Default: 0).

  • USERNAME: If provided, the owner username. This can be combined with HF OAuth to define the Argilla server owner (Default: \"\").

  • PASSWORD: If provided, the owner password. If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup (Default: \"\").

  • WORKSPACE: If provided, the workspace name. If USERNAME, PASSWORD and WORKSPACE are provided, a default workspace will be created with this name (Default: \"\").

  • API_KEY: The default user API key to use. If API_KEY is not provided, a new random API key will be generated (Default: \"\").

  • UVICORN_APP: [Advanced] The name of the FastAPI app to run. This is useful when you want to extend the FastAPI app with additional routes or middleware. The default value is argilla_server:app.

"},{"location":"reference/argilla-server/configuration/#rest-api-docs","title":"REST API docs","text":"

FastAPI also provides beautiful REST API docs that you can check at http://localhost:6900/api/v1/docs.

"},{"location":"reference/argilla-server/telemetry/","title":"Server Telemetry","text":"

Argilla uses telemetry to report anonymous usage and error information. As open-source software, this type of information is important for improving the product and understanding how it is used. This is done through the Hugging Face Hub library telemetry implementations.

"},{"location":"reference/argilla-server/telemetry/#how-to-opt-out","title":"How to opt-out","text":"

You can opt out of telemetry reporting by setting the environment variable HF_HUB_DISABLE_TELEMETRY before launching the server. Setting this variable to 1 will completely disable telemetry reporting.

If you are a Linux/macOS user, you should run:

export HF_HUB_DISABLE_TELEMETRY=1\n

If you are a Windows user, you should run:

set HF_HUB_DISABLE_TELEMETRY=1\n

To opt in again, you can set the variable to 0.

"},{"location":"reference/argilla-server/telemetry/#why-reporting-telemetry","title":"Why reporting telemetry","text":"

Anonymous telemetry information enables us to continuously improve the product and detect recurring problems to better serve all users. We collect aggregated information about general usage and errors. We do NOT collect any information on users' data records, datasets, or metadata information.

"},{"location":"reference/argilla-server/telemetry/#sensitive-data","title":"Sensitive data","text":"

We do not collect any piece of information related to the source data you store in Argilla. We don't identify individual users. Your data does not leave your server at any time:

  • No dataset record is collected.
  • No dataset names or metadata are collected.
"},{"location":"reference/argilla-server/telemetry/#information-reported","title":"Information reported","text":"

The following usage and error information is reported:

  • The code of the raised error
  • The user-agent and accept-language http headers
  • Task name and number of records for bulk operations
  • An anonymous generated user uuid
  • An anonymous generated server uuid
  • The Argilla version running the server
  • The Python version, e.g. 3.8.13
  • The system/OS name, such as Linux, Darwin, Windows
  • The system\u2019s release version, e.g. Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020
  • The machine type, e.g. AMD64
  • The underlying platform spec with as much useful information as possible (e.g. macOS-10.16-x86_64-i386-64bit)
  • The type of deployment: huggingface_space or server
  • The dockerized deployment flag: True or False

For transparency, you can inspect the source code where this is performed here.

If you have any doubts, don't hesitate to join our Discord channel or open a GitHub issue. We'd be very happy to discuss how we can improve this.

"},{"location":"tutorials/","title":"Tutorials","text":"

These are the tutorials for the Argilla SDK. They provide step-by-step instructions for common tasks.

  • Text classification

    Learn about a standard workflow for a text classification task with model fine-tuning.

    Tutorial

  • Token classification

    Learn about a standard workflow for a token classification task with model fine-tuning.

    Tutorial

  • Image classification

    Learn about a standard workflow for an image classification task with model fine-tuning.

    Tutorial

  • Image preference

    Learn about a standard workflow for multi-modal preference datasets like image generation preference.

    Tutorial

"},{"location":"tutorials/image_classification/","title":"Image classification","text":"
  • Goal: Show a standard workflow for an image classification task.
  • Dataset: MNIST, a dataset of 28x28 grayscale images that need to be classified as digits.
  • Libraries: datasets, transformers
  • Components: ImageField, LabelQuestion, Suggestion

If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install \"transformers[torch]~=4.0\" \"accelerate~=0.34\"\n

Let's make the required imports:

import base64\nimport io\nimport re\n\nfrom IPython.display import display\nimport numpy as np\nimport torch\nfrom PIL import Image\n\nfrom datasets import load_dataset, Dataset, load_metric\nfrom transformers import (\n    AutoImageProcessor,\n    AutoModelForImageClassification,\n    pipeline,\n    Trainer,\n    TrainingArguments\n)\n\nimport argilla as rg\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a field for the image column and a label question for the label column.

Note

Check this how-to guide to know more about configuring and creating a dataset.

labels = [str(x) for x in range(10)]\n\nsettings = rg.Settings(\n    guidelines=\"The goal of this task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusively.\",\n    fields=[\n        rg.ImageField(\n            name=\"image\",\n            title=\"An image of a handwritten digit.\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"image_label\",\n            title=\"What digit do you see on the image?\",\n            labels=labels,\n        )\n    ]\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"image_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the ylecun/mnist dataset from the Hugging Face Hub. Specifically, we will use 100 examples. Because we are dealing with a potentially large image dataset, we will set streaming=True to avoid loading the entire dataset into memory and iterate over the data to lazily load it.

Tip

When working with Hugging Face datasets, you can set Image(decode=False) so that you can get public image URLs, but this depends on the dataset.

n_rows = 100\n\nhf_dataset = load_dataset(\"ylecun/mnist\", streaming=True)\ndataset_rows = [row for _,row in zip(range(n_rows), hf_dataset[\"train\"])]\nhf_dataset = Dataset.from_list(dataset_rows)\n\nhf_dataset\n
\nDataset({\n    features: ['image', 'label'],\n    num_rows: 100\n})\n

Let's have a look at the first image in the dataset.

hf_dataset[0]\n
\n{'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28>,\n 'label': 5}\n

We will easily add them to the dataset using log, without needing a mapping since the names already match the Argilla resources. Additionally, since the images are already in PIL format and defined as Image in the Hugging Face dataset\u2019s features, we can log them directly. We will also include an id column in each record, allowing us to easily trace back to the external data source.

hf_dataset = hf_dataset.add_column(\"id\", range(len(hf_dataset)))\ndataset.records.log(records=hf_dataset)\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot CLIP model. However, you can use a framework or technique of your choice.

We will start by loading the model using a transformers pipeline.

checkpoint = \"openai/clip-vit-large-patch14\"\ndetector = pipeline(model=checkpoint, task=\"zero-shot-image-classification\")\n

Now, let's try to make a model prediction and see if it makes sense.

predictions = detector(hf_dataset[1][\"image\"], candidate_labels=labels)\npredictions, display(hf_dataset[1][\"image\"])\n
\n([{'score': 0.5236628651618958, 'label': '0'},\n  {'score': 0.11496700346469879, 'label': '7'},\n  {'score': 0.08030630648136139, 'label': '8'},\n  {'score': 0.07141078263521194, 'label': '9'},\n  {'score': 0.05868939310312271, 'label': '6'},\n  {'score': 0.05507850646972656, 'label': '5'},\n  {'score': 0.0341767854988575, 'label': '1'},\n  {'score': 0.027202051132917404, 'label': '4'},\n  {'score': 0.018533246591687202, 'label': '3'},\n  {'score': 0.015973029658198357, 'label': '2'}],\n None)\n

It's time to make predictions on the dataset! We will define a function that uses the zero-shot model to infer the label for each image. When working with large datasets, you can create a batch_predict method to speed up the process, as shown in the sketch after the function below.

def predict(input, labels):\n    prediction = detector(input, candidate_labels=labels)\n    prediction = prediction[0]\n    return {\"image_label\": prediction[\"label\"], \"score\": prediction[\"score\"]}\n
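
As mentioned above, here is a hedged sketch of a batch_predict variant; transformers pipelines also accept a list of inputs, so batching is mostly a matter of passing several images at once:

def batch_predict(inputs, labels, batch_size=16):\n    \"\"\"Run the zero-shot detector over a list of images in batches.\"\"\"\n    results = []\n    for i in range(0, len(inputs), batch_size):\n        batch = inputs[i : i + batch_size]\n        predictions = detector(batch, candidate_labels=labels)\n        results.extend(\n            {\"image_label\": p[0][\"label\"], \"score\": p[0][\"score\"]} for p in predictions\n        )\n    return results\n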

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it is the identifier used to update an existing record rather than create a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"id\": sample[\"id\"],\n        **predict(sample[\"image\"], labels),\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data, mapping={\"score\": \"image_label.suggestion.score\"})\n

Voil\u00e0! We have added the suggestions to the dataset, and they will appear in the UI marked with a \u2728.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to know more about annotating in the UI.

After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using transformers. However, you can select the one that best fits your requirements.

So, let's start by retrieving the annotated records and exporting them as a Dataset, so images will be in PIL format.

Note

Check this how-to guide to know more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning an image classification model.

dataset = client.datasets(\"image_classification_dataset\")\n
status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_datasets()\n

We now need to ensure our images have the correct dimensions. Because the original MNIST images are greyscale and the ViT model expects RGB, we need to add a channel dimension to the images. We will do this by stacking each image along the channel axis.

def greyscale_to_rgb(img) -> Image:\n    return Image.merge('RGB', (img, img, img))\n\nsubmitted_image_rgb = [\n    {\n        \"id\": sample[\"id\"],\n        \"image\": greyscale_to_rgb(sample[\"image\"]),\n        \"label\": sample[\"image_label.responses\"][0],\n    }\n    for sample in submitted\n]\nsubmitted_image_rgb[0]\n
\n{'id': '0', 'image': <PIL.Image.Image image mode=RGB size=28x28>, 'label': '0'}\n

Next, we will load the ImageProcessor to fine-tune the model. This processor will handle the image resizing and normalization in order to be compatible with the model we intend to use.

checkpoint = \"google/vit-base-patch16-224-in21k\"\nprocessor = AutoImageProcessor.from_pretrained(checkpoint)\n\nsubmitted_image_rgb_processed = [\n    {\n        \"pixel_values\": processor(sample[\"image\"], return_tensors='pt')[\"pixel_values\"],\n        \"label\": sample[\"label\"],\n    }\n    for sample in submitted_image_rgb\n]\nsubmitted_image_rgb_processed[0]\n

We can now convert the images to a Hugging Face Dataset that is ready for fine-tuning.

prepared_ds = Dataset.from_list(submitted_image_rgb_processed)\nprepared_ds = prepared_ds.train_test_split(test_size=0.2)\nprepared_ds\n
\nDatasetDict({\n    train: Dataset({\n        features: ['pixel_values', 'label'],\n        num_rows: 80\n    })\n    test: Dataset({\n        features: ['pixel_values', 'label'],\n        num_rows: 20\n    })\n})\n

We then need to define our data collator, which will ensure the data is unpacked and stacked correctly for the model.

def collate_fn(batch):\n    return {\n        'pixel_values': torch.stack([torch.tensor(x['pixel_values'][0]) for x in batch]),\n        'labels': torch.tensor([int(x['label']) for x in batch])\n    }\n

Next, we can define our training metrics. We will use the accuracy metric to evaluate the model's performance.

metric = load_metric(\"accuracy\", trust_remote_code=True)\ndef compute_metrics(p):\n    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)\n

We then load our model and configure the labels that we will use for training.

model = AutoModelForImageClassification.from_pretrained(\n    checkpoint,\n    num_labels=len(labels),\n    id2label={int(i): int(c) for i, c in enumerate(labels)},\n    label2id={int(c): int(i) for i, c in enumerate(labels)}\n)\nmodel.config\n

Finally, we define the training arguments and start the training process.

training_args = TrainingArguments(\n  output_dir=\"./image-classifier\",\n  per_device_train_batch_size=16,\n  eval_strategy=\"steps\",\n  num_train_epochs=1,\n  fp16=False, # True if you have a GPU with mixed precision support\n  save_steps=100,\n  eval_steps=100,\n  logging_steps=10,\n  learning_rate=2e-4,\n  save_total_limit=2,\n  remove_unused_columns=True,\n  push_to_hub=False,\n  load_best_model_at_end=True,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    data_collator=collate_fn,\n    compute_metrics=compute_metrics,\n    train_dataset=prepared_ds[\"train\"],\n    eval_dataset=prepared_ds[\"test\"],\n    tokenizer=processor,\n)\n\ntrain_results = trainer.train()\ntrainer.save_model()\ntrainer.log_metrics(\"train\", train_results.metrics)\ntrainer.save_metrics(\"train\", train_results.metrics)\ntrainer.save_state()\n
\n{'train_runtime': 12.5374, 'train_samples_per_second': 6.381, 'train_steps_per_second': 0.399, 'train_loss': 2.0533515930175783, 'epoch': 1.0}\n***** train metrics *****\n  epoch                    =        1.0\n  total_flos               =  5774017GF\n  train_loss               =     2.0534\n  train_runtime            = 0:00:12.53\n  train_samples_per_second =      6.381\n  train_steps_per_second   =      0.399\n\n

Since the training data was of higher quality, we can expect a better model, so we will update the remainder of our original dataset with the new model's suggestions.

pipe = pipeline(\"image-classification\", model=model, image_processor=processor)\n\ndef run_inference(batch):\n    predictions = pipe(batch[\"image\"])\n    batch[\"image_label\"] = [prediction[0][\"label\"] for prediction in predictions]\n    batch[\"score\"] = [prediction[0][\"score\"] for prediction in predictions]\n    return batch\n\nhf_dataset = hf_dataset.map(run_inference, batched=True)\n
data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"image_label\": str(sample[\"image_label\"]),\n        \"id\": sample[\"id\"],\n        \"score\": sample[\"score\"],\n    }\n    for sample in hf_dataset\n]\ndataset.records.log(records=updated_data, mapping={\"score\": \"image_label.suggestion.score\"})\n

In this tutorial, we present an end-to-end example of an image classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset and adding records and suggestions from a zero-shot model. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/image_classification/#image-classification","title":"Image classification","text":""},{"location":"tutorials/image_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/image_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/image_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/image_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will look at the dataset to understand its structure and the kind of data it contains. We do this by using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/image_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/image_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/image_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/image_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/image_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/image_classification/#formatting-the-data","title":"Formatting the data","text":""},{"location":"tutorials/image_classification/#the-actual-training","title":"The actual training","text":""},{"location":"tutorials/image_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/image_preference/","title":"Image preference","text":"
  • Goal: Show a standard workflow for working with complex multi-modal preference datasets, such as for image-generation preference.
  • Dataset: tomg-group-umd/pixelprose, a comprehensive dataset of over 16 million synthetically generated captions, leveraging cutting-edge vision-language models (Gemini 1.0 Pro Vision) for detailed and accurate descriptions.
  • Libraries: datasets, sentence-transformers
  • Components: TextField, ImageField, TextQuestion, LabelQuestion, VectorField, FloatMetadataProperty

If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install \"sentence-transformers~=3.0\"\n

Let's make the required imports:

import io\nimport os\nimport time\n\nimport argilla as rg\nimport requests\nfrom PIL import Image\nfrom datasets import load_dataset, Dataset\nfrom sentence_transformers import SentenceTransformer\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. We will include a TextField, an ImageField corresponding to the url image column, and two additional ImageField fields representing the images we will generate based on the original_caption column from our dataset. Additionally, we will use a LabelQuestion and an optional TextQuestion, which will be used to collect the user's preference and the reason behind it. We will also be adding a VectorField to store the embeddings for the original_caption so that we can use semantic search and speed up our labeling process. Lastly, we will include two FloatMetadataProperty to store information from the toxicity and the identity_attack columns.

Note

Check this how-to guide to know more about configuring and creating a dataset.

settings = rg.Settings(\n    guidelines=\"The goal is to choose the image that best represents the caption.\",\n    fields=[\n        rg.TextField(\n            name=\"caption\",\n            title=\"An image caption belonging to the original image.\",\n        ),\n        rg.ImageField(\n            name=\"image_original\",\n            title=\"The original image, belonging to the caption.\",\n        ),\n        rg.ImageField(\n            name=\"image_1\",\n            title=\"An image that has been generated based on the caption.\",\n        ),\n        rg.ImageField(\n            name=\"image_2\",\n            title=\"An image that has been generated based on the caption.\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"preference\",\n            title=\"The chosen preference for the generation.\",\n            labels=[\"image_1\", \"image_2\"],\n        ),\n        rg.TextQuestion(\n            name=\"comment\",\n            title=\"Any additional comments.\",\n            required=False,\n        ),\n    ],\n    metadata=[\n        rg.FloatMetadataProperty(name=\"toxicity\", title=\"Toxicity score\"),\n        rg.FloatMetadataProperty(name=\"identity_attack\", title=\"Identity attack score\"),\n\n    ],\n    vectors=[\n        rg.VectorField(name=\"original_caption_vector\", dimensions=384),\n    ]\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"image_preference_dataset\",\n    settings=settings,\n)\ndataset.create()\n
n_rows = 25\n\nhf_dataset = load_dataset(\"tomg-group-umd/pixelprose\", streaming=True)\ndataset_rows = [row for _,row in zip(range(n_rows), hf_dataset[\"train\"])]\nhf_dataset = Dataset.from_list(dataset_rows)\n\nhf_dataset\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score'],\n    num_rows: 25\n})\n

Let's have a look at the first entry in the dataset.

hf_dataset[0]\n
\n{'uid': '0065a9b1cb4da4696f2cd6640e00304257cafd97c0064d4c61e44760bf0fa31c',\n 'url': 'https://media.gettyimages.com/photos/plate-of-food-from-murray-bros-caddy-shack-at-the-world-golf-hall-of-picture-id916117812?s=612x612',\n 'key': '007740026',\n 'status': 'success',\n 'original_caption': 'A plate of food from Murray Bros Caddy Shack at the World Golf Hall of Fame',\n 'vlm_model': 'gemini-pro-vision',\n 'vlm_caption': ' This image displays: A plate of fried calamari with a lemon wedge and a side of green beans, served in a basket with a pink bowl of marinara sauce. The basket is sitting on a table with a checkered tablecloth. In the background is a glass of water and a plate with a burger and fries. The style of the image is a photograph.',\n 'toxicity': 0.0005555678508244455,\n 'severe_toxicity': 1.7323875454167137e-06,\n 'obscene': 3.8304504414554685e-05,\n 'identity_attack': 0.00010549413127591833,\n 'insult': 0.00014773994917050004,\n 'threat': 2.5982120860135183e-05,\n 'sexual_explicit': 2.0972733182134107e-05,\n 'watermark_class_id': 1.0,\n 'watermark_class_score': 0.733799934387207,\n 'aesthetic_score': 5.390625,\n 'error_message': None,\n 'width': 612,\n 'height': 408,\n 'original_width': 612,\n 'original_height': 408,\n 'exif': '{\"Image ImageDescription\": \"A plate of food from Murray Bros. Caddy Shack at the World Golf Hall of Fame. (Photo by: Jeffrey Greenberg/Universal Images Group via Getty Images)\", \"Image XResolution\": \"300\", \"Image YResolution\": \"300\"}',\n 'sha256': '0065a9b1cb4da4696f2cd6640e00304257cafd97c0064d4c61e44760bf0fa31c',\n 'image_id': 'null',\n 'author': 'null',\n 'subreddit': -1,\n 'score': -1}\n

As we can see, the url column does not always end with an image extension, so we will apply some additional filtering to ensure we keep only public image URLs.

hf_dataset = hf_dataset.filter(\n    lambda x: any([x[\"url\"].endswith(extension) for extension in [\".jpg\", \".png\", \".jpeg\"]]))\n\nhf_dataset\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score'],\n    num_rows: 18\n})\n
API_URL = \"https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-schnell\"\nheaders = {\"Authorization\": f\"Bearer {os.getenv('HF_TOKEN')}\"}\n\ndef query(payload):\n    response = requests.post(API_URL, headers=headers, json=payload)\n    if response.status_code == 200:\n        image_bytes = response.content\n        image = Image.open(io.BytesIO(image_bytes))\n    else:\n        print(f\"Request failed with status code {response.status_code}. retrying in 10 seconds.\")\n        time.sleep(10)\n        image = query(payload)\n    return image\n\nquery({\n    \"inputs\": \"Astronaut riding a horse\"\n})\n

Cool! Now that we've tested the generation function, let's generate the PIL images for the dataset.

def generate_image(row):\n    caption = row[\"original_caption\"]\n    row[\"image_1\"] = query({\"inputs\": caption})\n    row[\"image_2\"] = query({\"inputs\": caption + \" \"}) # space to avoid caching and getting the same image\n    return row\n\nhf_dataset_with_images = hf_dataset.map(generate_image, batched=False)\n\nhf_dataset_with_images\n
\nDataset({\n    features: ['uid', 'url', 'key', 'status', 'original_caption', 'vlm_model', 'vlm_caption', 'toxicity', 'severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit', 'watermark_class_id', 'watermark_class_score', 'aesthetic_score', 'error_message', 'width', 'height', 'original_width', 'original_height', 'exif', 'sha256', 'image_id', 'author', 'subreddit', 'score', 'image_1', 'image_2'],\n    num_rows: 18\n})\n
model = SentenceTransformer(\"TaylorAI/bge-micro-v2\")\n\ndef encode_questions(batch):\n    vectors_as_numpy = model.encode(batch[\"original_caption\"])\n    batch[\"original_caption_vector\"] = [x.tolist() for x in vectors_as_numpy]\n    return batch\n\nhf_dataset_with_images_vectors = hf_dataset_with_images.map(encode_questions, batched=True)\n
dataset.records.log(records=hf_dataset_with_images_vectors, mapping={\n    \"key\": \"id\",\n    \"original_caption\": \"caption\",\n    \"url\": \"image_original\",\n})\n

Voil\u00e0! We have our Argilla dataset ready for annotation.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records.

Note

Check this how-to guide to know more about annotating in the UI.

In this tutorial, we present an end-to-end example of an image preference task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset and adding records with the original and generated images. After the annotation process, you can evaluate the results and potentially retrain the model to improve the quality of the generated images.

"},{"location":"tutorials/image_preference/#image-preference","title":"Image preference","text":""},{"location":"tutorials/image_preference/#getting-started","title":"Getting started","text":""},{"location":"tutorials/image_preference/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/image_preference/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/image_preference/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will take a look at the dataset to understand its structure and the types of data it contains. We can do this using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/image_preference/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/image_preference/#add-records","title":"Add records","text":"

Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the tomg-group-umd/pixelprose dataset from the Hugging Face Hub. Specifically, we will use 25 examples. Because we are dealing with a potentially large image dataset, we will set streaming=True to avoid loading the entire dataset into memory and iterate over the data to lazily load it.

Tip

When working with Hugging Face datasets, you can set Image(decode=False) so that you can get public image URLs, but this depends on the dataset.

"},{"location":"tutorials/image_preference/#generate-images","title":"Generate images","text":"

We'll start by generating images based on the original_caption column using the recently released black-forest-labs/FLUX.1-schnell model. For this, we will use the free but rate-limited Inference API provided by Hugging Face, but you can use any other model from the Hub or another method. We will generate 2 images per example. Additionally, we will add a small retry mechanism to handle the rate limit.

Let's begin by defining and testing a generation function.
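
A possible sketch of such a function, using the InferenceClient from huggingface_hub with a naive retry loop (the retry count, wait time, and test prompt are arbitrary choices):

import time\n\nfrom huggingface_hub import InferenceClient\n\n# A dedicated name avoids shadowing the Argilla client\ninference_client = InferenceClient()\n\n\ndef generate_image(prompt, retries=3, wait=5):\n    # Retry a few times to work around the rate limits of the free Inference API\n    for attempt in range(retries):\n        try:\n            return inference_client.text_to_image(\n                prompt, model=\"black-forest-labs/FLUX.1-schnell\"\n            )\n        except Exception:\n            if attempt == retries - 1:\n                raise\n            time.sleep(wait)\n\n\n# Quick test with a made-up caption\nimage = generate_image(\"a watercolor painting of a lighthouse at dusk\")\n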

"},{"location":"tutorials/image_preference/#add-vectors","title":"Add vectors","text":"

We will use the sentence-transformers library to create vectors for the original_caption. We will use the TaylorAI/bge-micro-v2 model, which strikes a good balance between speed and performance. Note that we also need to convert the vectors to a list to store them in the Argilla dataset.

"},{"location":"tutorials/image_preference/#log-to-argilla","title":"Log to Argilla","text":"

We can add them to the dataset using log and a mapping, which indicates which column from our dataset should be mapped to which Argilla resource when the names do not match. We also use the key column as the id for our records so we can easily trace each record back to the external data source.

"},{"location":"tutorials/image_preference/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/image_preference/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/text_classification/","title":"Text classification","text":"
  • Goal: Show a standard workflow for a text classification task, including zero-shot suggestions and model fine-tuning.
  • Dataset: IMDB, a dataset of movie reviews that need to be classified as positive or negative.
  • Libraries: datasets, transformers, setfit
  • Components: TextField, LabelQuestion, Suggestion, Query, Filter

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install setfit==1.0.3 transformers==4.40.2\n

Let's make the required imports:

import argilla as rg\n\nfrom datasets import load_dataset, Dataset\nfrom setfit import SetFitModel, Trainer, get_templated_dataset, sample_dataset\n

You also need to connect to the Argilla server using the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a label question, corresponding to the text and label columns.

Note

Check this how-to guide to learn more about configuring and creating a dataset.

labels = [\"positive\", \"negative\"]\n\nsettings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"sentiment_label\",\n            title=\"In which category does this article fit?\",\n            labels=labels,\n        )\n    ],\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"text_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

Even though we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the imdb dataset from the Hugging Face Hub. Specifically, we will use 100 samples from the train split.

hf_dataset = load_dataset(\"imdb\", split=\"train[:100]\")\n

We can add them to the dataset using log and a mapping that indicates that the column text should be added to the field review.

dataset.records.log(records=hf_dataset, mapping={\"text\": \"review\"})\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot SetFit model. However, you can use a framework or technique of your choice.

We will start by defining an example training set with the required labels: positive and negative. Using get_templated_dataset will create sentences from the default template: \"This sentence is {label}.\"

zero_ds = get_templated_dataset(\n    candidate_labels=labels,\n    sample_size=8,\n)\n
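
With two labels and sample_size=8, the templated dataset should contain 16 synthetic rows, roughly like this (illustrative output):

print(zero_ds)\n# Dataset({\n#     features: ['text', 'label'],\n#     num_rows: 16\n# })\nprint(zero_ds[0][\"text\"])\n# e.g. \"This sentence is positive\"\n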

Now, we will prepare a function to train the SetFit model.

Note

For further customization, you can check the SetFit documentation.

def train_model(model_name, dataset):\n    model = SetFitModel.from_pretrained(model_name)\n\n    trainer = Trainer(\n        model=model,\n        train_dataset=dataset,\n    )\n\n    trainer.train()\n\n    return model\n

Let's train the model. We will use TaylorAI/bge-micro-v2, available on the Hugging Face Hub.

model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=zero_ds)\n

You can save it locally or push it to the Hub, and then load it from there.

# Save and load locally\n# model.save_pretrained(\"text_classification_model\")\n# model = SetFitModel.from_pretrained(\"text_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/text_classification_model\")\n# model = SetFitModel.from_pretrained(\"[username]/text_classification_model\")\n

It's time to make the predictions! We will define a function that uses the predict method to get the suggested label. The model will infer the label based on the text.

def predict(model, input, labels):\n    model.labels = labels\n\n    prediction = model.predict([input])\n\n    return prediction[0]\n
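
For example, a quick sanity check on a made-up review could look like this (the output depends on the trained model):

example = \"I loved this movie, the acting was fantastic!\"\nprint(predict(model, example, labels))\n# Expected output (model-dependent): 'positive'\n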

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it identifies the record to be updated and prevents creating a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voil\u00e0! We have added the suggestions to the dataset, and they will appear in the UI marked with a \u2728.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to learn more about annotating in the UI.

After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using SetFit. However, you can select the framework that best fits your requirements. So, let's start by retrieving the annotated records.

Note

Check this how-to guide to learn more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning a text classification model.

dataset = client.datasets(\"text_classification_dataset\")\n
status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

As we have a single response per record, we can retrieve the selected label directly and create the training set with 8 samples per label, which gives us a balanced dataset for few-shot learning.

train_records = [\n    {\n        \"text\": r[\"review\"],\n        \"label\": r[\"sentiment_label.responses\"][0],\n    }\n    for r in submitted\n]\ntrain_dataset = Dataset.from_list(train_records)\ntrain_dataset = sample_dataset(train_dataset, label_column=\"label\", num_samples=8)\n

We can train the model using our previous function, but this time with a high-quality human-annotated training set.

model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=train_dataset)\n

As the training data was of better quality, we can expect a better model, so we can update the remaining non-annotated records with the new model's suggestions.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

In this tutorial, we present an end-to-end example of a text classification task. This serves as a base that can be run iteratively and integrated seamlessly into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset, adding records, and training a zero-shot SetFit model, as an example, to add suggestions. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/text_classification/#text-classification","title":"Text classification","text":""},{"location":"tutorials/text_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/text_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/text_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/text_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will take a look at the dataset to understand its structure and the kind of data it contains, using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/text_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/text_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/text_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/text_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/text_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/text_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/token_classification/","title":"Token classification","text":"
  • Goal: Show a standard workflow for a token classification task, including zero-shot suggestions and model fine-tuning.
  • Dataset: ontonotes5, a large corpus comprising various genres of text that need to be classified for Named Entity Recognition.
  • Libraries: datasets, gliner, transformers, spanmarker
  • Components: TextField, SpanQuestion, Suggestion, Query, Filter

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

!pip install argilla\n
!pip install gliner==0.2.6 transformers==4.40.2 span_marker==1.5.0\n

Let's make the required imports:

import re\n\nimport argilla as rg\n\nimport torch\nfrom datasets import load_dataset, Dataset, DatasetDict\nfrom gliner import GLiNER\nfrom span_marker import SpanMarkerModel, Trainer\nfrom transformers import TrainingArguments\n

You also need to connect to the Argilla server with the api_url and api_key.

# Replace api_url with your url if using Docker\n# Replace api_key with your API key under \"My Settings\" in the UI\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"[your-api-key]\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a span question, corresponding to the token and tags columns. We will focus on Named Entity Recognition, but this workflow can also be applied to Span Classification, which differs in that the spans are less clearly defined and often overlap.

labels = [\n    \"CARDINAL\",\n    \"DATE\",\n    \"PERSON\",\n    \"NORP\",\n    \"GPE\",\n    \"LAW\",\n    \"PERCENT\",\n    \"ORDINAL\",\n    \"MONEY\",\n    \"WORK_OF_ART\",\n    \"FAC\",\n    \"TIME\",\n    \"QUANTITY\",\n    \"PRODUCT\",\n    \"LANGUAGE\",\n    \"ORG\",\n    \"LOC\",\n    \"EVENT\",\n]\n\nsettings = rg.Settings(\n    guidelines=\"Classify individual tokens according to the specified categories, ensuring that any overlapping or nested entities are accurately captured.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n            title=\"Text\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.SpanQuestion(\n            name=\"span_label\",\n            field=\"text\",\n            labels=labels,\n            title=\"Classify the tokens according to the specified categories.\",\n            allow_overlapping=False,\n        )\n    ],\n)\n

Let's create the dataset with the name and the defined settings:

dataset = rg.Dataset(\n    name=\"token_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

We have created the dataset (you can check it in the UI), but we still need to add the data for annotation. In this case, we will use the ontonotes5 dataset from the Hugging Face Hub. Specifically, we will use 2100 samples from the test split.

hf_dataset = load_dataset(\"tner/ontonotes5\", split=\"test[:2100]\")\n

We will iterate over the Hugging Face dataset, adding data to the corresponding field in the Record object for the Argilla dataset. Then, we will easily add them to the dataset using log.

records = [rg.Record(fields={\"text\": \" \".join(row[\"tokens\"])}) for row in hf_dataset]\n\ndataset.records.log(records)\n

The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a GLiNER model. However, you can use a framework or technique of your choice.

Note

For further information, you can check the GLiNER repository and the original paper.

We will start by loading the pre-trained GLiNER model. Specifically, we will use gliner_mediumv2, available on the Hugging Face Hub.

gliner_model = GLiNER.from_pretrained(\"urchade/gliner_mediumv2.1\")\n

Next, we will create a function to generate predictions using this general model, which can identify the specified labels without being pre-trained on them. The function will return a dictionary formatted with the necessary schema to add entities to our Argilla dataset. This schema includes the keys start and end to indicate the indices where the span begins and ends, as well as label for the entity label.

def predict_gliner(model, text, labels, threshold):\n    entities = model.predict_entities(text, labels, threshold)\n    return [\n        {k: v for k, v in ent.items() if k not in {\"score\", \"text\"}} for ent in entities\n    ]\n

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it identifies the record to be updated and prevents creating a new one.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_gliner(\n            model=gliner_model, text=sample[\"text\"], labels=labels, threshold=0.70\n        ),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voil\u00e0! We have added the suggestions to the dataset, and they will appear in the UI marked with a \u2728.

Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

Note

Check this how-to guide to learn more about annotating in the UI.

After the annotation, we will have a robust dataset to train our model for entity recognition. In our case, we will train a SpanMarker model, but you can select any model of your choice. So, let's start by retrieving the annotated records.

Note

Check this how-to guide to learn more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on fine-tuning a token classification model.

dataset = client.datasets(\"token_classification_dataset\")\n

In our case, we submitted 2000 annotations using the bulk view.

status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

SpanMarker accepts any dataset as long as it has the tokens and ner_tags columns. The ner_tags can be annotated using the IOB, IOB2, BIOES or BILOU labeling scheme, as well as regular unschemed labels. In our case, we have chosen to use the IOB format. Thus, we will define a function to extract the annotated NER tags according to this scheme.

Note

For further information, you can check the SpanMarker documentation.
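
To make the scheme concrete, here is a small, made-up example of the tags the function below should produce:

# Hypothetical sentence split into tokens\nexample_tokens = [\"John\", \"lives\", \"in\", \"New\", \"York\"]\n# IOB tags: \"New York\" is a GPE span covering two tokens\nexample_tags = [\"B-PERSON\", \"O\", \"O\", \"B-GPE\", \"I-GPE\"]\n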

def get_iob_tag_for_token(token_start, token_end, ner_spans):\n    for span in ner_spans:\n        if token_start >= span[\"start\"] and token_end <= span[\"end\"]:\n            if token_start == span[\"start\"]:\n                return f\"B-{span['label']}\"\n            else:\n                return f\"I-{span['label']}\"\n    return \"O\"\n\n\ndef extract_ner_tags(text, responses):\n    tokens = re.split(r\"(\\s+)\", text)\n    ner_tags = []\n\n    current_position = 0\n    for token in tokens:\n        if token.strip():\n            token_start = current_position\n            token_end = current_position + len(token)\n            tag = get_iob_tag_for_token(token_start, token_end, responses)\n            ner_tags.append(tag)\n        current_position += len(token)\n\n    return ner_tags\n

Let's now extract them and save two lists with the tokens and NER tags, which will help us build our dataset to train the SpanMarker model.

tokens = []\nner_tags = []\nfor r in submitted:\n    tags = extract_ner_tags(r[\"text\"], r[\"span_label.responses\"][0])\n    tks = r[\"text\"].split()\n    tokens.append(tks)\n    ner_tags.append(tags)\n

In addition, we have to indicate the labels, which should be formatted as integers, so we will retrieve them and map them to their ids.

labels = list(set([item for sublist in ner_tags for item in sublist]))\n\nid2label = {i: label for i, label in enumerate(labels)}\nlabel2id = {label: id_ for id_, label in id2label.items()}\n\nmapped_ner_tags = [[label2id[label] for label in ner_tag] for ner_tag in ner_tags]\n

Finally, we will create a dataset with the train and validation sets.

records = [\n    {\n        \"tokens\": token,\n        \"ner_tags\": ner_tag,\n    }\n    for token, ner_tag in zip(tokens, mapped_ner_tags)\n]\nspan_dataset = DatasetDict(\n    {\n        \"train\": Dataset.from_list(records[:1500]),\n        \"validation\": Dataset.from_list(records[1500:2000]),\n    }\n)\n

Now, let's prepare to train our model. For this, it is recommended to use a GPU. You can check if one is available as shown below.

if torch.cuda.is_available():\n    device = torch.device(\"cuda\")\n    print(f\"Using {torch.cuda.get_device_name(0)}\")\nelif torch.backends.mps.is_available():\n    device = torch.device(\"mps\")\n    print(\"Using MPS device\")\nelse:\n    device = torch.device(\"cpu\")\n    print(\"No GPU available, using CPU instead.\")\n

We will define our model and arguments. In this case, we will use bert-base-cased, available on the Hugging Face Hub, but other encoders can be used.

Note

The training arguments are inherited from the Transformers library. You can find more information here.

encoder_id = \"bert-base-cased\"\nmodel = SpanMarkerModel.from_pretrained(\n    encoder_id,\n    labels=labels,\n    model_max_length=256,\n    entity_max_length=8,\n)\n\nargs = TrainingArguments(\n    output_dir=\"models/span-marker\",\n    learning_rate=5e-5,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    num_train_epochs=1,\n    weight_decay=0.01,\n    warmup_ratio=0.1,\n    fp16=False,  # Set to True if available\n    logging_first_step=True,\n    logging_steps=50,\n    evaluation_strategy=\"steps\",\n    save_strategy=\"steps\",\n    eval_steps=500,\n    save_total_limit=2,\n    dataloader_num_workers=2,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=span_dataset[\"train\"],\n    eval_dataset=span_dataset[\"validation\"],\n)\n

Let's train it! This time, we use a high-quality human-annotated training set, so the results are expected to improve.

trainer.train()\n
trainer.evaluate()\n

You can save it locally or push it to the Hub, and then load it from there.

# Save and load locally\n# model.save_pretrained(\"token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"token_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"[username]/token_classification_model\")\n

It's time to make the predictions! We will define a function that uses the predict method to get the suggested label. The model will infer the label based on the text. The function will return the spans in the corresponding structure for the Argilla dataset.

def predict_spanmarker(model, text):\n    entities = model.predict(text)\n    return [\n        {\n            \"start\": ent[\"char_start_index\"],\n            \"end\": ent[\"char_end_index\"],\n            \"label\": ent[\"label\"],\n        }\n        for ent in entities\n    ]\n
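
As a quick, illustrative check (the exact spans and offsets depend on the trained model):

print(predict_spanmarker(model, \"John Smith visited Paris last Tuesday.\"))\n# Possible output (model-dependent):\n# [{'start': 0, 'end': 10, 'label': 'PERSON'}, {'start': 19, 'end': 24, 'label': 'GPE'}]\n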

As the training data was of better quality, we can expect a better model, so we can update the remaining non-annotated records with the new model's suggestions.

data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_spanmarker(model=model, text=sample[\"text\"]),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

In this tutorial, we present an end-to-end example of a token classification task. This serves as a base that can be run iteratively and integrated seamlessly into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset, adding records, and adding suggestions based on the GLiNER predictions. After the annotation process, we trained a SpanMarker model with the annotated data and updated the remaining records with the new suggestions.

"},{"location":"tutorials/token_classification/#token-classification","title":"Token classification","text":""},{"location":"tutorials/token_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/token_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/token_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/token_classification/#vibe-check-the-dataset","title":"Vibe check the dataset","text":"

We will take a look at the dataset to understand its structure and the kind of data it contains, using the embedded Hugging Face Dataset Viewer.

"},{"location":"tutorials/token_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/token_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/token_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/token_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/token_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/token_classification/#conclusions","title":"Conclusions","text":""}]} \ No newline at end of file