From ec1f61ce41385da9d1dc382e1c7b2a54e3435100 Mon Sep 17 00:00:00 2001 From: nataliaElv Date: Fri, 30 Aug 2024 09:04:22 +0000 Subject: [PATCH] Deployed 7495136 to dev with MkDocs 1.6.0 and mike 2.1.3 --- dev/community/popular_issues/index.html | 34 ++++----- dev/how_to_guides/annotate/index.html | 8 +++ dev/how_to_guides/query/index.html | 86 ++++++++++++++++++++++- dev/search/search_index.json | 2 +- dev/sitemap.xml | 88 ++++++++++++------------ dev/sitemap.xml.gz | Bin 625 -> 625 bytes 6 files changed, 154 insertions(+), 64 deletions(-) diff --git a/dev/community/popular_issues/index.html b/dev/community/popular_issues/index.html index 52a73e31ad..ac87efdbf8 100644 --- a/dev/community/popular_issues/index.html +++ b/dev/community/popular_issues/index.html @@ -2139,54 +2139,54 @@

Issue dashboard

1 🟢 5442 - [BUG-python/deployment] by nicolamassarenti
2 🟢 5438 - [FEATURE] Make text box size of TextQuestion adjustable by MoritzLaurer
3 🟣 5424 - [BUG-python/deployment] The status of all the dataset.records.to_dict(orient='index') records are pending by Huarong
4 🟢 5414 - docker download failed by njhouse365
5 🟣 5357 - [BUG-python/deployment] Response sanity check not working due to variable renaming by maxserras
6 🟢 5348 - [FEATURE] Ability to create new labels on-the-fly by uahmad235
7 🟢 5338 - [BUG-UI/UX] CSS is being stripped from TextQuestion by paulbauriegel
8 🟢 5318 - [BUG-python/deployment] filter_by returning unexpected results for response_status by bertozzivill
9 🟢 5302 - [FEATURE] Auto-annotation of Repeated Tokens by bikash119
10 🟣 5290 - [BUG-python/deployment] Docker deployment issues by zhongze-fish
-🟣 5281 - [BUG-python/deployment] Error thrown by Argilla SDK: The reason of the error thrown by Argilla SDK is different/ambiguous from the cause of the error by bikash119
@@ -2237,17 +2237,17 @@

Issue dashboard

8 🟢 5361 - [BUG-UI/UX] required/optional differentiation for Fields are not represented in the dataset settings v2.1.0
9 🟢 3338 - [FEATURE] Add conversation support to fields in Argilla dataset (ChatField) v2.1.0
10 🟢 5371 - [UI/UX] Implement dark theme v2.1.0
-🟢 5278 - [TASK] Check records using ImageField with URLs using HTTP protocol serving application from HTTPS
@@ -2255,7 +2255,7 @@

Issue dashboard

-Last update: 2024-08-29
+Last update: 2024-08-30

diff --git a/dev/how_to_guides/annotate/index.html b/dev/how_to_guides/annotate/index.html index 4eeb1f7b8f..b19a5d9075 100644 --- a/dev/how_to_guides/annotate/index.html +++ b/dev/how_to_guides/annotate/index.html @@ -2553,6 +2553,14 @@

Use search, filters, and sort

Search

From the control panel at the top of the left pane, you can search by keyword across the entire dataset. If you have more than one field in your records, you may specify whether the search is to be performed on "All" fields or on a specific one. Matched results are highlighted in color.

+
+

Note

+

If you enter more than one keyword, the search will return results where all keywords have a match.

+
+
+

Tip

+

For more advanced searches, take a look at the advanced queries DSL.

+

Order by record semantic similarity

You can retrieve records based on their similarity to another record if vectors have been added to the dataset.

diff --git a/dev/how_to_guides/query/index.html b/dev/how_to_guides/query/index.html index 43249b83e6..15447017c2 100644 --- a/dev/how_to_guides/query/index.html +++ b/dev/how_to_guides/query/index.html @@ -933,6 +933,21 @@
@@ -2132,6 +2147,21 @@
@@ -2217,7 +2247,7 @@

    Query and filter records

    Query with search terms

To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to retrieve records that contain them in the text field. You can search for a single term or multiple terms; in the latter case, all of them must appear in the record for it to be retrieved (see the sketch below).

    @@ -2245,6 +2275,58 @@ 
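As a minimal sketch of this with the Python SDK (the API URL, API key, and dataset name below are placeholders for your own deployment):

```python
import argilla as rg

# Connect to a running Argilla server (placeholder credentials).
client = rg.Argilla(api_url="https://my-argilla-server.example", api_key="my_api_key")

# Fetch an existing dataset by name (hypothetical dataset).
dataset = client.datasets(name="my_dataset")

# "argilla" retrieves records containing that term; a query string with
# several terms, e.g. "argilla distilabel", requires all of them to match.
query = rg.Query(query="argilla")

for record in dataset.records(query=query):
    print(record.id)
```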

    Query with search terms

    +

    Advanced queries

    +

    If you need more complex searches, you can use Elasticsearch's simple query string syntax. Here is a summary of the different available operators:

| operator | description | example |
|----------|-------------|---------|
| `+` or space | AND: search both terms | `argilla + distilabel` or `argilla distilabel` return records that include the terms "argilla" and "distilabel" |
| `\|` | OR: search either term | `argilla \| distilabel` returns records that include the term "argilla" or "distilabel" |
| `-` | Negation: exclude a term | `argilla -distilabel` returns records that contain the term "argilla" and don't have the term "distilabel" |
| `*` | Prefix: search a prefix | `arg*` returns records with any words starting with "arg-" |
| `"` | Phrase: search a phrase | `"argilla and distilabel"` returns records that contain the phrase "argilla and distilabel" |
| `(` and `)` | Precedence: group terms | `(argilla \| distilabel) rules` returns records that contain either "argilla" or "distilabel", and "rules" |
| `~N` | Edit distance: search a term or phrase with an edit distance | `argilla~1` returns records that contain the term "argilla" with an edit distance of 1, e.g. "argila" |

    Tip

    +

    To use one of these characters literally, escape it with a preceding backslash \, e.g. "1 \+ 2" would match records where the phrase "1 + 2" is found.

    +
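As a sketch of how such a query string could be passed through the SDK (same placeholder server and dataset as above; the query terms are illustrative):

```python
import argilla as rg

client = rg.Argilla(api_url="https://my-argilla-server.example", api_key="my_api_key")
dataset = client.datasets(name="my_dataset")

# Records containing "argilla" or "distilabel", but not the term "legacy".
query = rg.Query(query="(argilla | distilabel) -legacy")

# Escaped operator: matches records containing the literal phrase "1 + 2".
escaped_query = rg.Query(query='"1 \\+ 2"')

for record in dataset.records(query=query):
    print(record.id)
```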

    Filter by conditions

    You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records based on the conditions. Conditions include "==", ">=", "<=", or "in". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses. You can use a single condition or multiple conditions to filter records.
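A sketch of both forms, assuming a dataset with a split metadata property (the property name and the credentials are placeholders):

```python
import argilla as rg

client = rg.Argilla(api_url="https://my-argilla-server.example", api_key="my_api_key")
dataset = client.datasets(name="my_dataset")

# A single condition: only records with a submitted response.
status_filter = rg.Filter(("response.status", "==", "submitted"))

# Multiple conditions passed as a list (all of them must hold); dot
# notation reaches into the record's metadata.
combined_filter = rg.Filter(
    [
        ("metadata.split", "==", "train"),
        ("response.status", "==", "submitted"),
    ]
)

filtered_records = dataset.records(query=rg.Query(filter=combined_filter))
for record in filtered_records:
    print(record.id)
```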

    @@ -2269,7 +2351,7 @@

Filter by conditions

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.

    To get started:

    • Get started in 5 minutes!

      Deploy Argilla for free on the Hugging Face Hub or with Docker. Install the Python SDK with pip and create your first project.

      Quickstart

    • How-to guides

      Get familiar with the basic workflows of Argilla. Learn how to manage Users, Workspaces, Datasets, and Records to set up your data annotation projects.

      Learn more

    Or, play with the Argilla UI by signing in with your Hugging Face account:

    Looking for Argilla 1.x?

    Looking for documentation for Argilla 1.x? Visit the latest release.

    Migrate to Argilla 2.x

    Want to learn how to migrate from Argilla 1.x to 2.x? Take a look at our dedicated Migration Guide.

    "},{"location":"#why-use-argilla","title":"Why use Argilla?","text":"

    Argilla can be used for collecting human feedback for a wide variety of AI projects like traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text to image, etc.).

    Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

    Improve your AI output quality through data quality

    Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both of these problems at once. Argilla helps you to achieve and keep high-quality standards for your data. This means you can improve the quality of your AI outputs.

    Take control of your data and models

    Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

    Improve efficiency by quickly iterating on the right data and models

    Gathering data is a time-consuming process. Argilla helps by providing a tool that allows you to interact with your data in a more engaging way. This means you can quickly and easily label your data with filters, AI feedback suggestions and semantic search. So you can focus on training your models and monitoring their performance.

    "},{"location":"#what-do-people-build-with-argilla","title":"What do people build with Argilla?","text":"

    Datasets and models

    Argilla is a tool that can be used to achieve and keep high-quality data standards with a focus on NLP and LLMs. The community uses Argilla to create amazing open-source datasets and models, and we love contributions to open-source too.

    • cleaned UltraFeedback dataset and the Notus and Notux models, where we improved benchmark and empirical human judgment for the Mistral and Mixtral models with cleaner data using human feedback.
• distilabeled Intel Orca DPO dataset and the improved OpenHermes model, which show how we improved model performance by filtering out 50% of the original dataset through human and AI feedback.

    Projects and pipelines

    AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in the AI community meetup.

    • AI for good: the Red Cross presentation showcases how their experts and AI team collaborate by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
• Customer support: during the Loris meetup they showed how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a large number of multi-label classifiers.
    • Research studies: the showcase from Prolific announced their integration with Argilla. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.
    "},{"location":"community/","title":"Community","text":"

    We are an open-source community-driven project not only focused on building a great product but also on building a great community, where you can get support, share your experiences, and contribute to the project! We would love to hear from you and help you get started with Argilla.

    • Discord

      In our Discord channels (#argilla-distilabel-general and #argilla-distilabel-help), you can get direct support from the community.

      Discord \u2197

    • Community Meetup

      We host bi-weekly community meetups where you can listen in or present your work.

      Community Meetup \u2197

    • Changelog

      The changelog is where you can find the latest updates and changes to the Argilla project.

      Changelog \u2197

    • Roadmap

      We love to discuss our plans with the community. Feel encouraged to participate in our roadmap discussions.

      Roadmap \u2197

    "},{"location":"community/changelog/","title":"Changelog","text":"

    All notable changes to this project will be documented in this file.

    The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

    "},{"location":"community/changelog/#201","title":"2.0.1","text":""},{"location":"community/changelog/#fixed","title":"Fixed","text":"
    • Fixed error when creating optional fields. (#5362)
    • Fixed error creating integer and float metadata with visible_for_annotators. (#5364)
    • Fixed error when logging records with suggestions or responses for non-existent questions. (#5396 by @maxserras)
    • Fixed error from conflicts in testing suite when running tests in parallel. (#5349)
    • Fixed error in response model when creating a response with a None value. (#5343)
    "},{"location":"community/changelog/#changed","title":"Changed","text":"
    • Changed from_hub method to raise an error when a dataset with the same name exists. (#5258)
    • Changed log method when ingesting records with no known keys to raise a descriptive error. (#5356)
    • Changed code snippets to add new datasets (#5395)
    "},{"location":"community/changelog/#added","title":"Added","text":"
    • Added Google Analytics to the documentation site. (#5366)
    • Added frontend skeletons to progress metrics to optimise load time and improve user experience. (#5391)
    • Added documentation in methods in API references for the Python SDK. (#5400)
    "},{"location":"community/changelog/#fixed_1","title":"Fixed","text":"
• Fixed a bug where submitting the latest record sometimes navigated to a non-existing page (#5419)
    "},{"location":"community/changelog/#200","title":"2.0.0","text":""},{"location":"community/changelog/#added_1","title":"Added","text":"
    • Added core class refactors. For an overview, see this blog post
• Added TaskDistribution to define the distribution of records to users.
    • Added new documentation site and structure and migrated legacy documentation.
    "},{"location":"community/changelog/#changed_1","title":"Changed","text":"
    • Changed FeedbackDataset to Dataset.
    • Changed rg.init into rg.Argilla class to interact with Argilla server.
    "},{"location":"community/changelog/#deprecated","title":"Deprecated","text":"
    • Deprecated task specific dataset classes like TextClassification and TokenClassification. To migrate legacy datasets to rg.Dataset class, see the how-to-guide.
    • Deprecated use case extensions like listeners and ArgillaTrainer.
    "},{"location":"community/changelog/#200rc1","title":"2.0.0rc1","text":"

    [!NOTE] This release for 2.0.0rc1 does not contain any changelog entries because it is the first release candidate for the 2.0.0 version. The following versions will contain the changelog entries again. For a general overview of the changes in the 2.0.0 version, please refer to our blog or our new documentation.

    "},{"location":"community/changelog/#1290","title":"1.29.0","text":""},{"location":"community/changelog/#added_2","title":"Added","text":"
    • Added support for rating questions to include 0 as a valid value. (#4860)
    • Added support for Python 3.12. (#4837)
    • Added search by field in the FeedbackDataset UI search. (#4746)
    • Added record metadata info in the FeedbackDataset UI. (#4851)
    • Added highlight on search results in the FeedbackDataset UI. (#4747)
    "},{"location":"community/changelog/#fixed_2","title":"Fixed","text":"
    • Fix wildcard import for the whole argilla module. (#4874)
• Fix issue when a record does not have related vectors. (#4856)
    • Fix issue on character level. (#4836)
    "},{"location":"community/changelog/#1280","title":"1.28.0","text":""},{"location":"community/changelog/#added_3","title":"Added","text":"
    • Added suggestion multi score attribute. (#4730)
    • Added order by suggestion first. (#4731)
    • Added multi selection entity dropdown for span annotation overlap. (#4735)
    • Added pre selection highlight for span annotation. (#4726)
    • Added banner when persistent storage is not enabled. (#4744)
    • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)
    "},{"location":"community/changelog/#changed_2","title":"Changed","text":"
• Changed how the Hugging Face space and user are shown on sign-in. (#4748)
    "},{"location":"community/changelog/#fixed_3","title":"Fixed","text":"
• Fixed reversed Korean characters. (#4753)
    "},{"location":"community/changelog/#fixed_4","title":"Fixed","text":"
    • Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)
    "},{"location":"community/changelog/#1270","title":"1.27.0","text":""},{"location":"community/changelog/#added_4","title":"Added","text":"
    • Added Allow overlap spans in the FeedbackDataset. (#4668)
    • Added allow_overlapping parameter for span questions. (#4697)
    • Added overall progress bar on Datasets table. (#4696)
    • Added German language translation. (#4688)
    "},{"location":"community/changelog/#changed_3","title":"Changed","text":"
    • New UI design for suggestions. (#4682)
    "},{"location":"community/changelog/#fixed_5","title":"Fixed","text":"
    • Improve performance for more than 250 labels. (#4702)
    "},{"location":"community/changelog/#1261","title":"1.26.1","text":""},{"location":"community/changelog/#added_5","title":"Added","text":"
    • Added support for automatic detection of RTL languages. (#4686)
    "},{"location":"community/changelog/#1260","title":"1.26.0","text":""},{"location":"community/changelog/#added_6","title":"Added","text":"
    • If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
    • Added support for span questions in the Python SDK. (#4617)
    • Added support for span values in suggestions and responses. (#4623)
    • Added span questions for FeedbackDataset. (#4622)
    • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
    "},{"location":"community/changelog/#fixed_6","title":"Fixed","text":"
    • Fixed contextualized workspaces. (#4665)
    • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
    • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
    • Fixed reading description from API response payload. (#4632)
    • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
    • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)
    "},{"location":"community/changelog/#1250","title":"1.25.0","text":"

    [!NOTE] For changes in the argilla-server module, visit the argilla-server release notes

    "},{"location":"community/changelog/#added_7","title":"Added","text":"
    • Reorder labels in dataset settings page for single/multi label questions (#4598)
    • Added pandas v2 support using the python SDK. (#4600)
    "},{"location":"community/changelog/#removed","title":"Removed","text":"
    • Removed missing response for status filter. Use pending instead. (#4533)
    "},{"location":"community/changelog/#fixed_7","title":"Fixed","text":"
    • Fixed FloatMetadataProperty: value is not a valid float (#4570)
    • Fixed redirect to user-settings instead of 404 user_settings (#4609)
    "},{"location":"community/changelog/#1240","title":"1.24.0","text":"

    [!NOTE] This release does not contain any new features, but it includes a major change in the argilla-server dependency. The package is using the argilla-server dependency defined here. (#4537)

    "},{"location":"community/changelog/#changed_4","title":"Changed","text":"
    • The package is using the argilla-server dependency defined here. (#4537)
    "},{"location":"community/changelog/#1231","title":"1.23.1","text":""},{"location":"community/changelog/#fixed_8","title":"Fixed","text":"
    • Fixed Responsive view for Feedback Datasets. (#4579)
    "},{"location":"community/changelog/#1230","title":"1.23.0","text":""},{"location":"community/changelog/#added_8","title":"Added","text":"
    • Added bulk annotation by filter criteria. (#4516)
    • Automatically fetch new datasets on focus tab. (#4514)
    • API v1 responses returning Record schema now always include dataset_id as attribute. (#4482)
    • API v1 responses returning Response schema now always include record_id as attribute. (#4482)
    • API v1 responses returning Question schema now always include dataset_id attribute. (#4487)
    • API v1 responses returning Field schema now always include dataset_id attribute. (#4488)
    • API v1 responses returning MetadataProperty schema now always include dataset_id attribute. (#4489)
    • API v1 responses returning VectorSettings schema now always include dataset_id attribute. (#4490)
• Added pdf_to_html function to the .html_utils module that converts PDFs to dataURLs so they can be rendered in the Argilla UI. (#4481)
    • Added ARGILLA_AUTH_SECRET_KEY environment variable. (#4539)
    • Added ARGILLA_AUTH_ALGORITHM environment variable. (#4539)
    • Added ARGILLA_AUTH_TOKEN_EXPIRATION environment variable. (#4539)
    • Added ARGILLA_AUTH_OAUTH_CFG environment variable. (#4546)
    • Added OAuth2 support for HuggingFace Hub. (#4546)
    "},{"location":"community/changelog/#deprecated_1","title":"Deprecated","text":"
    • Deprecated ARGILLA_LOCAL_AUTH_* environment variables. Will be removed in the release v1.25.0. (#4539)
    "},{"location":"community/changelog/#changed_5","title":"Changed","text":"
    • Changed regex pattern for username attribute in UserCreate. Now uppercase letters are allowed. (#4544)
    "},{"location":"community/changelog/#removed_1","title":"Removed","text":"
    • Remove sending Authorization header from python SDK requests. (#4535)
    "},{"location":"community/changelog/#fixed_9","title":"Fixed","text":"
    • Fixed keyboard shortcut for label questions. (#4530)
    "},{"location":"community/changelog/#1220","title":"1.22.0","text":""},{"location":"community/changelog/#added_9","title":"Added","text":"
    • Added Bulk annotation support. (#4333)
• Restore filters from feedback dataset settings. (#4461)
    • Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
    • Added pydantic v2 support using the python SDK. (#4459)
    • Added vector_settings to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4454)
    • Added integration for sentence-transformers using SentenceTransformersExtractor to configure vector_settings in FeedbackDataset and FeedbackRecord. (#4454)
    "},{"location":"community/changelog/#changed_6","title":"Changed","text":"
    • Module argilla.cli.server definitions have been moved to argilla.server.cli module. (#4472)
    • [breaking] Changed vector_settings_by_name for generic property_by_name usage, which will return None instead of raising an error. (#4454)
    • The constant definition ES_INDEX_REGEX_PATTERN in module argilla._constants is now private. (#4472)
    • nan values in metadata properties will raise a 422 error when creating/updating records. (#4300)
    • None values are now allowed in metadata properties. (#4300)
    • Refactor and add width, height, autoplay and loop attributes as optional args in to_html functions. (#4481)
    "},{"location":"community/changelog/#fixed_10","title":"Fixed","text":"
• Paginating to a new record automatically scrolls down to the selected form area. (#4333)
    "},{"location":"community/changelog/#deprecated_2","title":"Deprecated","text":"
    • The missing response status for filtering records is deprecated and will be removed in the release v1.24.0. Use pending instead. (#4433)
    "},{"location":"community/changelog/#removed_2","title":"Removed","text":"
    • The deprecated python -m argilla database command has been removed. (#4472)
    "},{"location":"community/changelog/#1210","title":"1.21.0","text":""},{"location":"community/changelog/#added_10","title":"Added","text":"
    • Added new draft queue for annotation view (#4334)
    • Added annotation metrics module for the FeedbackDataset (argilla.client.feedback.metrics). (#4175).
• Added strategy to handle and translate errors from the server for the 401 HTTP status code. (#4362)
    • Added integration for textdescriptives using TextDescriptivesExtractor to configure metadata_properties in FeedbackDataset and FeedbackRecord. (#4400). Contributed by @m-newhauser
    • Added POST /api/v1/me/responses/bulk endpoint to create responses in bulk for current user. (#4380)
    • Added list support for term metadata properties. (Closes #4359)
    • Added new CLI task to reindex datasets and records into the search engine. (#4404)
    • Added httpx_extra_kwargs argument to rg.init and Argilla to allow passing extra arguments to httpx.Client used by Argilla. (#4440)
    • Added ResponseStatusFilter enum in __init__ imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
    "},{"location":"community/changelog/#changed_7","title":"Changed","text":"
    • More productive and simpler shortcut system (#4215)
    • Move ArgillaSingleton, init and active_client to a new module singleton. (#4347)
    • Updated argilla.load functions to also work with FeedbackDatasets. (#4347)
    • [breaking] Updated argilla.delete functions to also work with FeedbackDatasets. It now raises an error if the dataset does not exist. (#4347)
    • Updated argilla.list_datasets functions to also work with FeedbackDatasets. (#4347)
    "},{"location":"community/changelog/#fixed_11","title":"Fixed","text":"
    • Fixed error in TextClassificationSettings.from_dict method in which the label_schema created was a list of dict instead of a list of str. (#4347)
    • Fixed total records on pagination component (#4424)
    "},{"location":"community/changelog/#removed_3","title":"Removed","text":"
    • Removed draft auto save for annotation view (#4334)
    "},{"location":"community/changelog/#1200","title":"1.20.0","text":""},{"location":"community/changelog/#added_11","title":"Added","text":"
    • Added GET /api/v1/datasets/:dataset_id/records/search/suggestions/options endpoint to return suggestion available options for searching. (#4260)
    • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset.(#4192).
• Added get_model_kwargs, get_trainer_kwargs, get_trainer_model, get_trainer_tokenizer and get_trainer methods to the ArgillaTrainer to improve interoperability across frameworks. (#4214).
    • Added additional formatting checks to the ArgillaTrainer to allow for better interoperability of defaults and formatting_func usage. (#4214).
    • Added a warning to the update_config-method of ArgillaTrainer to emphasize if the kwargs were updated correctly. (#4214).
• Added argilla.client.feedback.utils module with html_utils (this mainly includes video/audio/image_to_html that convert media to dataURLs to be able to render them in the Argilla UI and create_token_highlights to highlight tokens in a custom way. Both work on TextQuestion and TextField with use_markdown=True) and assignments (this mainly includes assign_records to assign records according to a number of annotators and records, an overlap and the shuffle option; and assign_workspace to assign and create if needed a workspace according to the record assignment). (#4121)
    "},{"location":"community/changelog/#fixed_12","title":"Fixed","text":"
    • Fixed error in ArgillaTrainer, with numerical labels, using RatingQuestion instead of RankingQuestion (#4171)
    • Fixed error in ArgillaTrainer, now we can train for extractive_question_answering using a validation sample (#4204)
    • Fixed error in ArgillaTrainer, when training for sentence-similarity it didn't work with a list of values per record (#4211)
    • Fixed error in the unification strategy for RankingQuestion (#4295)
    • Fixed TextClassificationSettings.labels_schema order was not being preserved. Closes #3828 (#4332)
    • Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
    • Fixed error when passing draft responses to create records endpoint. (#4354)
    "},{"location":"community/changelog/#changed_8","title":"Changed","text":"
    • [breaking] Suggestions agent field only accepts now some specific characters and a limited length. (#4265)
    • [breaking] Suggestions score field only accepts now float values in the range 0 to 1. (#4266)
    • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support optional query attribute. (#4327)
    • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support filter and sort attributes. (#4327)
    • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support optional query attribute. (#4270)
    • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support filter and sort attributes. (#4270)
    • Changed the logging style while pulling and pushing FeedbackDataset to Argilla from tqdm style to rich. (#4267). Contributed by @zucchini-nlp.
    • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)
    • Changed models and tokenizer for the ArgillaTrainer to explicitly allow for changing them when needed. (#4214).
    "},{"location":"community/changelog/#1190","title":"1.19.0","text":""},{"location":"community/changelog/#added_12","title":"Added","text":"
    • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
    • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
    • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
    • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
    • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
    • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
    • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
    • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
    • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
• Added show_progress argument to the from_huggingface() method to make the progress bar for the record-parsing process optional. (#4132)
• Added a progress bar for the record-parsing process to the from_huggingface() method using trange from tqdm. (#4132)
• Added sorting by inserted_at or updated_at for datasets with no metadata. (#4147)
    • Added max_records argument to pull() method for RemoteFeedbackDataset.(#4074)
    • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
    • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
    • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
    • Added max_records argument to ArgillaTrainer to limit record used for training (#4120).
    • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
    • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
    • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
    • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
    • Added find_similar_records method to local and remote FeedbackDataset. (#4023)
    • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)
    "},{"location":"community/changelog/#changed_9","title":"Changed","text":"
    • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
    • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
    • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
• [breaking] limit query parameter for GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
• [breaking] limit query parameter for GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
    • Update GET /api/v1/datasets/:dataset_id/records endpoint to fetch record using the search engine. (#4142)
    • Update GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch record using the search engine. (#4142)
    • Update POST /api/v1/datasets/:dataset_id/records endpoint to allow to create records with vectors (#4022)
    • Update PATCH /api/v1/datasets/:dataset_id endpoint to allow updating allow_extra_metadata attribute. (#4112)
    • Update PATCH /api/v1/datasets/:dataset_id/records endpoint to allow to update records with vectors. (#4062)
    • Update PATCH /api/v1/records/:record_id endpoint to allow to update record with vectors. (#4062)
    • Update POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow to search records with vectors. (#4019)
    • Update BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
    • Update FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
    • Update FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
    • Update FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
    • Using cosine similarity to compute similarity between vectors. (#4124)
    "},{"location":"community/changelog/#fixed_13","title":"Fixed","text":"
    • Fixed svg images out of screen with too large images (#4047)
    • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
    • Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
    • Fixed passing user_id when getting records by id. (Commit 98c7927)
    • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
    "},{"location":"community/changelog/#1180","title":"1.18.0","text":""},{"location":"community/changelog/#added_13","title":"Added","text":"
    • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
    • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
    • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
    • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
    • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
    • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
    • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
    • Missing validations to PATCH /api/v1/questions/:question_id. Now title and description are using the same validations used to create questions. (#3967)
    • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
    • Added metadata_filters to filter_by method in RemoteFeedbackDataset to filter based on metadata i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. (#3834)
    • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
    • Added sort_by query parameter to listing records endpoints that allows to sort the records by inserted_at, updated_at or metadata property. (#3843)
    • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
    • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
    • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
    • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
• Added support for updating records (metadata) from the Python SDK. (#3946)
    • Added delete_metadata_properties method to delete metadata properties. (#3932)
    • Added update_metadata_properties method to update metadata_properties. (#3961)
    • Added automatic model card generation through ArgillaTrainer.save (#3857)
    • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
    • A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
    • New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)
    "},{"location":"community/changelog/#changed_10","title":"Changed","text":"
    • GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
    • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
    • Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
    • Force elastic index refresh after records creation. (#3929)
    • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
    • Using metadata property name instead of id for indexing data in search engine index. (#3994)
    "},{"location":"community/changelog/#fixed_14","title":"Fixed","text":"
    • Fixed response schemas to allow values to be None i.e. when a record is discarded the response.values are set to None. (#3926)
    "},{"location":"community/changelog/#1170","title":"1.17.0","text":""},{"location":"community/changelog/#added_14","title":"Added","text":"
    • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
    • Added automatic model card generation through ArgillaTrainer.save (#3857).
    • Added task templates to the FeedbackDataset (#3973).
    "},{"location":"community/changelog/#changed_11","title":"Changed","text":"
    • Updated Dockerfile to use multi stage build (#3221 and #3793).
    • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
    • Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
    • FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
    • The unify_responses support for remote datasets (#3937).
    "},{"location":"community/changelog/#fixed_15","title":"Fixed","text":"
    • Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
    • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
• Fixed record fields validation that was preventing logging records with optional fields (i.e. required=False) when the field value was None (#3846).
    • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
• The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945)
    • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
    • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
    • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
    • Fixed wrong __repr__ problem for TrainingTask. (#3969)
• Fixed wrong key return error in prepare_for_training_with_* for TrainingTask. (#3969)
    "},{"location":"community/changelog/#deprecated_3","title":"Deprecated","text":"
    • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0
    "},{"location":"community/changelog/#1160","title":"1.16.0","text":""},{"location":"community/changelog/#added_15","title":"Added","text":"
    • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
    • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
• Added Auto save record to automatically save the current record that you are working on (#3541)
    • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
    • Added workspaces list command to list Argilla workspaces (#3594).
    • Added datasets list command to list Argilla datasets (#3658).
    • Added users create command to create users (#3667).
    • Added whoami command to get current user (#3673).
    • Added users delete command to delete users (#3671).
    • Added users list command to list users (#3688).
    • Added workspaces delete-user command to remove a user from a workspace (#3699).
    • Added workspaces create command to create an Argilla workspace (#3676).
    • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
    • Added info command to get info about the used Argilla client and server (#3707).
    • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
    • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
• Added handling of PermissionError when executing a command with a logged-in user without enough permissions (#3717).
    • Added workspaces add-user command to add a user to workspace (#3712).
    • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
    • Added workspace_id arg to list_datasets in the Python SDK (#3727).
    • Added argilla script that allows to execute Argilla CLI using the argilla command (#3730).
    • Added support for passing already initialized model and tokenizer instances to the ArgillaTrainer (#3751)
    • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).
    "},{"location":"community/changelog/#changed_12","title":"Changed","text":"
    • Move database commands under server group of commands (#3710)
    • server commands only included in the CLI app when server extra requirements are installed (#3710).
    • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
    • Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of an user with the owner role as they don't require explicit permissions (#3716).
    • Rename tasks sub-package to cli (#3723).
    • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
• Changed visible_options (of label and multi label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
    "},{"location":"community/changelog/#fixed_16","title":"Fixed","text":"
    • Fixed remove user modification in text component on clear answers (#3775)
    • Fixed Highlight raw text field in dataset feedback task (#3731)
    • Fixed Field title too long (#3734)
    • Fixed error messages when deleting a DatasetForTextClassification (#3652)
    • Fixed Pending queue pagination problems when during data annotation (#3677)
    • Fixed visible_labels default value to be 20 just when visible_labels not provided and len(labels) > 20, otherwise it will either be the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
    • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
    • Add missing draft status in ResponseSchema as now there can be responses with draft status when annotating via the UI (#3749).
    • Searches when queried words are distributed along the record fields (#3759).
    • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
    • Fixed RankingValueSchema and FeedbackRankingValueModel schemas to allow rank=None when status=draft (#3781).
    "},{"location":"community/changelog/#1151","title":"1.15.1","text":""},{"location":"community/changelog/#fixed_17","title":"Fixed","text":"
• Fixed Text component text content sanitization behavior just for markdown to prevent the text from disappearing (#3738)
• Fixed Text component so that you now need to press Escape to exit the text area (#3733)
    • Fixed SearchEngine was creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).
    "},{"location":"community/changelog/#1150","title":"1.15.0","text":""},{"location":"community/changelog/#added_16","title":"Added","text":"
• Added the ability to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
    • Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
• Added formatting_func to ArgillaTrainer for FeedbackDataset datasets to add custom formatting for the data (#3599).
    • Added login function in argilla.client.login to login into an Argilla server and store the credentials locally (#3582).
    • Added login command to login into an Argilla server (#3600).
    • Added logout command to logout from an Argilla server (#3605).
    • Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
    • Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
    • Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
    • Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
    • Added filter_by method in RemoteFeedbackDataset to filter based on response_status (#3610).
• Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces from a user in Argilla (#3641).
    • Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
• Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
    • Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).
    "},{"location":"community/changelog/#changed_13","title":"Changed","text":"
• Changed the Optional label to a * mark for required questions (#3608)
    • Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
    • Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
    • Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
    • Move users CLI app under database CLI app (#3593).
    • Move server Enum classes to argilla.server.enums module (#3620).
    "},{"location":"community/changelog/#fixed_18","title":"Fixed","text":"
    • Fixed Filter by workspace in breadcrumbs (#3577)
    • Fixed Filter by workspace in datasets table (#3604)
    • Fixed Query search highlight for Text2Text and TextClassification (#3621)
    • Fixed RatingQuestion.values validation to raise a ValidationError when values are out of range i.e. [1, 10] (#3626).
    "},{"location":"community/changelog/#removed_4","title":"Removed","text":"
    • Removed multi_task_text_token_classification from TaskType as not used (#3640).
    • Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
    • Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
    • Removed push_to_argilla from RemoteFeedbackDataset, as it just works when calling it through a FeedbackDataset locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
    • Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any \"updateable\" attribute of a record will go through update instead (#3663).
    • Remove unused owner attribute for client Dataset data model (#3665)
    "},{"location":"community/changelog/#1141","title":"1.14.1","text":""},{"location":"community/changelog/#fixed_19","title":"Fixed","text":"
    • Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
    "},{"location":"community/changelog/#fixed_20","title":"Fixed","text":"
    • Fixed settings could not be provided when updating a rating or ranking question (#3552).
    "},{"location":"community/changelog/#1140","title":"1.14.0","text":""},{"location":"community/changelog/#added_17","title":"Added","text":"
    • Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
    • Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
    • Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
    • Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
    • Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return it as a local copy as a FeedbackDataset (#3465).
    • Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
    • Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
    "},{"location":"community/changelog/#changed_14","title":"Changed","text":"
    • Improved efficiency of weak labeling when dataset contains vectors (#3444).
    • Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
    • Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
    • Update CLI to use database async connection (#3450).
    • Limit rating questions values to the positive range [1, 10] (#3451).
• Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
• Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
    • Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
    • Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
    • Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
    • After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
    • Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via \ud83e\udd17datasets (#3539).
    "},{"location":"community/changelog/#fixed_21","title":"Fixed","text":"
    • Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
    • Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
    • Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
    • TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
    • Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
    • Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
    • Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
    "},{"location":"community/changelog/#deprecated_4","title":"Deprecated","text":"
    • After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
    • After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
    • After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
    "},{"location":"community/changelog/#1133","title":"1.13.3","text":""},{"location":"community/changelog/#fixed_22","title":"Fixed","text":"
    • Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
    • Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).
    "},{"location":"community/changelog/#1132","title":"1.13.2","text":""},{"location":"community/changelog/#fixed_23","title":"Fixed","text":"
    • The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).
    "},{"location":"community/changelog/#1131","title":"1.13.1","text":""},{"location":"community/changelog/#fixed_24","title":"Fixed","text":"
    • Fix database migration for PostgreSQL (See #3438)
    "},{"location":"community/changelog/#1130","title":"1.13.0","text":""},{"location":"community/changelog/#added_18","title":"Added","text":"
    • Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
    • Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
    • Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
    • Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
    • Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
    • Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & 3391).
    • Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
    • Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
    • Added API and Python Client support for workspace deletion (Closes #3260)
    • Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
    "},{"location":"community/changelog/#changed_15","title":"Changed","text":"
    • Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
    • Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
• The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
    • Added Telemetry support for ArgillaTrainer (closes #3325)
    • User.workspaces is no longer an attribute but a property, and is calling list_user_workspaces to list all the workspace names for a given user ID (#3334)
    • Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
• Protected metadata fields now support non-textual info - existing datasets must be reindexed. See docs for more detail (Closes #3332).
    • Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
    • Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
    "},{"location":"community/changelog/#removed_5","title":"Removed","text":"
    • Removed support to non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
    "},{"location":"community/changelog/#fixed_25","title":"Fixed","text":"
• Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records, even if responses was not provided via the include query parameter (#3304).
    • Values for protected metadata fields are not truncated (Closes #3331).
    • Big number ids are properly rendered in UI (Closes #3265)
    • Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)
    "},{"location":"community/changelog/#deprecated_5","title":"Deprecated","text":"
    • Integer support for record id in text classification, token classification and text2text datasets.
    "},{"location":"community/changelog/#1121","title":"1.12.1","text":""},{"location":"community/changelog/#fixed_26","title":"Fixed","text":"
    • Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
    • Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
• Pin pydantic dependency to version < 2 (Closes #3348)
    "},{"location":"community/changelog/#1120","title":"1.12.0","text":""},{"location":"community/changelog/#added_19","title":"Added","text":"
• Added RankingQuestionSettings class allowing the creation of ranking questions in the API using the POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
    • Added RankingQuestion in the Python client to create ranking questions (#3275).
    • Added Ranking component in feedback task question form (#3177 & #3246).
• Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
    • Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
    "},{"location":"community/changelog/#docs","title":"Docs","text":"
    • Added instructions for how to run the Argilla frontend in the developer docs (#3314).
    "},{"location":"community/changelog/#changed_16","title":"Changed","text":"
    • All docker related files have been moved into the docker folder (#3053).
    • release.Dockerfile have been renamed to Dockerfile (#3133).
• Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
    • Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
    "},{"location":"community/changelog/#fixed_27","title":"Fixed","text":"
    • Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
    "},{"location":"community/changelog/#1110","title":"1.11.0","text":""},{"location":"community/changelog/#fixed_28","title":"Fixed","text":"
    • Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
• Fixed format_as(\"datasets\") when no responses or optional responses in FeedbackRecord, to set their value to what \ud83e\udd17 Datasets expects instead of just None (#3224).
    • Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
    • Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
    • Refactored usage of import argilla as rg to clarify package navigation (#3279).
    "},{"location":"community/changelog/#docs_1","title":"Docs","text":"
• Fixed URLs in Weak Supervision with Sentence Transformers tutorial #3243.
    • Fixed library buttons' formatting on Tutorials page (#3255).
    • Modified styling of error code outputs in notebooks (#3270).
    • Added ElasticSearch and OpenSearch versions (#3280).
    • Removed template notebook from table of contents (#3271).
    • Fixed tutorials with pip install argilla to not use older versions of the package (#3282).
    "},{"location":"community/changelog/#added_20","title":"Added","text":"
    • Added metadata attribute to the Record of the FeedbackDataset (#3194)
    • New users update command to update the role for an existing user (#3188)
• New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
    • Added User class to let users manage their Argilla users via the Python client (#3169).
    • Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
    "},{"location":"community/changelog/#changed_17","title":"Changed","text":"
• The role system now supports three different roles: owner, admin and annotator (#3104)
    • admin role is scoped to workspace-level operations (#3115)
    • The owner user is created among the default pool of users in the quickstart, and the default user in the server has now owner role (#3248), reverting (#3188).
    "},{"location":"community/changelog/#deprecated_6","title":"Deprecated","text":"
    • As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
    "},{"location":"community/changelog/#1100","title":"1.10.0","text":""},{"location":"community/changelog/#added_21","title":"Added","text":"
    • Added search component for feedback datasets (#3138)
    • Added markdown support for feedback dataset guidelines (#3153)
    • Added Train button for feedback datasets (#3170)
    "},{"location":"community/changelog/#changed_18","title":"Changed","text":"
    • Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
    "},{"location":"community/changelog/#fixed_29","title":"Fixed","text":"
    • Replaced Enum for string value in URLs for client API calls (Closes #3149)
    • Resolve breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
    • Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
    • Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
    "},{"location":"community/changelog/#docs_2","title":"Docs","text":"
    • Resolved typos in the docs (#3240).
    • Fixed mention of master branch (#3254).
    "},{"location":"community/changelog/#190","title":"1.9.0","text":""},{"location":"community/changelog/#added_22","title":"Added","text":"
    • Added boolean use_markdown property to TextFieldSettings model.
    • Added boolean use_markdown property to TextQuestionSettings model.
    • Added new status draft for the Response model.
• Added LabelSelectionQuestionSettings class allowing the creation of label selection (single-choice) questions in the API (#3005)
• Added MultiLabelSelectionQuestionSettings class allowing the creation of multi-label selection (multi-choice) questions in the API (#3010).
    • Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
    • Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
    • Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
    • Added the information about executing tests in the developer documentation ([#3143]).
    "},{"location":"community/changelog/#changed_19","title":"Changed","text":"
    • Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
• Added LabelSelectionQuestionSettings class allowing the creation of label selection (single-choice) questions in the API.
• Added MultiLabelSelectionQuestionSettings class allowing the creation of multi-label selection (multi-choice) questions in the API.
    • Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
    • Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
    • Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)
    "},{"location":"community/changelog/#fixed_30","title":"Fixed","text":"
    • Disallow fields and questions in FeedbackDataset with the same name (#3126).
    • Fixed broken links in the documentation and updated the development branch name from development to develop ([#3145]).
    "},{"location":"community/changelog/#180","title":"1.8.0","text":""},{"location":"community/changelog/#added_23","title":"Added","text":"
    • /api/v1/datasets new endpoint to list and create datasets (#2615).
    • /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
    • /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
    • /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
    • /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
• /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
    • /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
    • /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
    • /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
    • /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
    • /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
    • /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
    • /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
    • /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
    • showing new feedback task datasets in datasets list ([#2719])
    • new page for feedback task ([#2680])
    • show feedback task metrics ([#2822])
    • user can delete dataset in dataset settings page ([#2792])
    • Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
    • Integration with the HuggingFace Hub ([#2949])
• Added ArgillaPeftTrainer for text and token classification #2854
    • Added predict_proba() method to ArgillaSetFitTrainer
    • Added ArgillaAutoTrainTrainer for Text Classification #2664
    • New database revisions command showing database revisions info
    "},{"location":"community/changelog/#fixes","title":"Fixes","text":"
• Avoid rendering html for invalid html strings in Text2text (#2911)
    "},{"location":"community/changelog/#changed_20","title":"Changed","text":"
    • The database migrate command accepts a --revision param to provide specific revision id
    • tokens_length metrics function returns empty data (#3045)
    • token_length metrics function returns empty data (#3045)
    • mention_length metrics function returns empty data (#3045)
    • entity_density metrics function returns empty data (#3045)
    "},{"location":"community/changelog/#deprecated_7","title":"Deprecated","text":"
    • Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
    • tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    "},{"location":"community/changelog/#removed_6","title":"Removed","text":"
    • Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
    • Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
    • Removed tags-related metrics from token classification metrics storage (#3045)
    "},{"location":"community/changelog/#170","title":"1.7.0","text":""},{"location":"community/changelog/#added_24","title":"Added","text":"
• add max_retries and num_threads parameters to rg.log to run data logging requests concurrently with backoff retry policy. See #2458 and #2533
    • rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
    • Added settings param to prepare_for_training (#2689)
    • Added prepare_for_training for openai (#2658)
    • Added ArgillaOpenAITrainer (#2659)
    • Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
    • Added ArgillaTrainer CLI support. Closes (#2809)
    "},{"location":"community/changelog/#fixes_1","title":"Fixes","text":"
    • fix image alignment on token classification
    "},{"location":"community/changelog/#changed_21","title":"Changed","text":"
    • Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
    • bulk endpoints will upsert data when record id is present. Closes #2535
    • moved from click to typer CLI support. Closes (#2815)
    • Argilla server docker image is built with PostgreSQL support. Closes #2686
• The rg.log computes all batches and raises an error for all failed batches.
    • The default batch size for rg.log is now 100.
    "},{"location":"community/changelog/#fixed_31","title":"Fixed","text":"
    • argilla.training bugfixes and unification (#2665)
    • Resolved several small bugs in the ArgillaTrainer.
    "},{"location":"community/changelog/#deprecated_8","title":"Deprecated","text":"
• The rg.log_async function is deprecated and will be removed in the next minor release.
    "},{"location":"community/changelog/#160","title":"1.6.0","text":""},{"location":"community/changelog/#added_25","title":"Added","text":"
    • ARGILLA_HOME_PATH new environment variable (#2564).
    • ARGILLA_DATABASE_URL new environment variable (#2564).
    • Basic support for user roles with admin and annotator (#2564).
    • id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
    • /api/users new endpoint to list and create users (#2564).
    • /api/users/{user_id} new endpoint to delete users (#2564).
    • /api/workspaces new endpoint to list and create workspaces (#2564).
    • /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
    • /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
    • argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
    • argilla.tasks.users.create new task to create a user (#2564).
    • argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
    • argilla.tasks.database.migrate new task to execute database migrations (#2564).
• release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
    • Add user settings page. Closes #2496
    • Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504
    "},{"location":"community/changelog/#fixes_2","title":"Fixes","text":"
    • Now the prepare_for_training method is working when multi_label=True. Closes #2606
    "},{"location":"community/changelog/#changed_22","title":"Changed","text":"
• ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from the YAML file to the database (#2564).
    • full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
    • password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
• quickstart.Dockerfile image default users changed from team and argilla to admin and annotator, including new passwords and API keys (#2564).
• Datasets to be managed only by users with admin role (#2564).
• The list of rules is now accessible while metrics are computed. Closes #2117
• Style updates for weak labeling and adding feedback toast when deleting rules. See #2626 and #2648
    "},{"location":"community/changelog/#removed_7","title":"Removed","text":"
    • email user field (#2564).
    • disabled user field (#2564).
    • Support for private workspaces (#2564).
    • ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
    • The old headers for API Key and workspace from python client
    • The default value for old API Key constant. Closes #2251
    "},{"location":"community/changelog/#151-2023-03-30","title":"1.5.1 - 2023-03-30","text":""},{"location":"community/changelog/#fixes_3","title":"Fixes","text":"
    • Copying datasets between workspaces with proper owner/workspace info. Closes #2562
    • Copy dataset with empty workspace to the default user workspace 905d4de
    • Using elasticsearch config to request backend version. Closes #2311
    • Remove sorting by score in labels. Closes #2622
    "},{"location":"community/changelog/#changed_23","title":"Changed","text":"
    • Update field name in metadata for image url. See #2609
    • Improvements in tutorial doc cards. Closes #2216
    "},{"location":"community/changelog/#150-2023-03-21","title":"1.5.0 - 2023-03-21","text":""},{"location":"community/changelog/#added_26","title":"Added","text":"
    • Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
    • Add new page and components for dataset settings. Closes #2442
• Add ability to show image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url
    • Non-searchable fields support in metadata. #2570
    • Add record ID references to the prepare for training methods. Closes #2483
    • Add tutorial on Image Classification. #2420
• Add Train button, visible for \"admin\" role, with code snippets from a selection of libraries. Closes #2591
    "},{"location":"community/changelog/#changed_24","title":"Changed","text":"
• Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see https://github.com/argilla-io/argilla/issues/2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
• The shortcuts improvement for labels #2339 has been moved to the vuex ORM in dataset settings feature #2444
    • Update \"Define a labeling schema\" section in docs.
    • The record inputs are sorted alphabetically in UI by default. #2581
• The record inputs are fully visible when pagination size is one, and the height of the collapsed area is bigger for laptop screens. #2587
    "},{"location":"community/changelog/#fixes_4","title":"Fixes","text":"
    • Allow URL to be clickable in Jupyter notebook again. Closes #2527
    "},{"location":"community/changelog/#removed_8","title":"Removed","text":"
• Removing some deprecated data scan endpoints used by old clients. This change will break compatibility with client <v1.3.0
• Stop using old deprecated scan endpoints in the python client. This logic will break client compatibility with server version <1.3.0
    • Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.
    "},{"location":"community/contributor/","title":"How to contribute?","text":"

    Thank you for investing your time in contributing to the project! Any contribution you make will be reflected in the most recent version of Argilla \ud83e\udd29.

    New to contributing in general?

    If you're a new contributor, read the README to get an overview of the project. In addition, here are some resources to help you get started with open-source contributions:

    • Discord: You are welcome to join the Argilla Discord community, where you can keep in touch with other users, contributors and the Argilla team. In the following section, you can find more information on how to get started in Discord.
    • Git: This is a very useful tool to keep track of the changes in your files. Using the command-line interface (CLI), you can make your contributions easily. For that, you need to have it installed and updated on your computer.
    • GitHub: It is a platform and cloud-based service that uses git and allows developers to collaborate on projects. To contribute to Argilla, you'll need to create an account. Check the Contributor Workflow with Git and Github for more info.
    • Developer Documentation: To collaborate, you'll need to set up an efficient environment. Check the Server and Frontend READMEs to know how to do it.
    • Schedule a meeting with our developer advocate: If you have more questions, do not hesitate to contact our developer advocate and schedule a meeting.
    "},{"location":"community/contributor/#first-contact-in-discord","title":"First Contact in Discord","text":"

    Discord is a handy tool for more casual conversations and to answer day-to-day questions. As part of Hugging Face, we have set up some Argilla channels on the server. Click here to join the Hugging Face Discord community effortlessly.

    When part of the Hugging Face Discord, you can select \"Channels & roles\" and select \"Argilla\" along with any of the other groups that are interesting to you. \"Argilla\" will cover anything about argilla and distilabel. You can join the following channels:

    • #argilla-distilabel-general: \ud83d\udce3 Stay up-to-date and general discussions.
    • #argilla-distilabel-help: \ud83d\ude4b\u200d\u2640\ufe0f Need assistance? We're always here to help. Select the appropriate label (argilla or distilabel) for your issue and post it.

    So now there is only one thing left to do: introduce yourself and talk to the community. You'll always be welcome! \ud83e\udd17\ud83d\udc4b

    "},{"location":"community/contributor/#contributor-workflow-with-git-and-github","title":"Contributor Workflow with Git and GitHub","text":"

    If you're working with Argilla and suddenly a new idea comes to your mind or you find an issue that can be improved, it's time to actively participate and contribute to the project!

    "},{"location":"community/contributor/#report-an-issue","title":"Report an issue","text":"

    If you spot a problem, search if an issue already exists. You can use the Label filter. If that is the case, participate in the conversation. If it does not exist, create an issue by clicking on New Issue.

    This will show various templates, choose the one that best suits your issue.

Below, you can see an example of the Feature request template. Once you choose one, you will need to fill it in following the guidelines. Try to be as clear as possible. In addition, you can assign yourself to the issue and add or choose the right labels. Finally, click on Submit new issue.

    "},{"location":"community/contributor/#work-with-a-fork","title":"Work with a fork","text":""},{"location":"community/contributor/#fork-the-argilla-repository","title":"Fork the Argilla repository","text":"

    After having reported the issue, you can start working on it. For that, you will need to create a fork of the project. To do that, click on the Fork button.

Now, fill in the information. Remember to uncheck the Copy develop branch only option if you are going to work in or from another branch (for instance, the main branch is used to fix documentation). Then, click on Create fork.

    Now, you will be redirected to your fork. You can see that you are in your fork because the name of the repository will be your username/argilla, and it will indicate forked from argilla-io/argilla.

    "},{"location":"community/contributor/#clone-your-forked-repository","title":"Clone your forked repository","text":"

    In order to make the required adjustments, clone the forked repository to your local machine. Choose the destination folder and run the following command:

    git clone https://github.com/[your-github-username]/argilla.git\ncd argilla\n

To keep your fork\u2019s main/develop branch up to date with our repo, add our repo as an upstream remote.

    git remote add upstream https://github.com/argilla-io/argilla.git\n
    "},{"location":"community/contributor/#create-a-new-branch","title":"Create a new branch","text":"

    For each issue you're addressing, it's advisable to create a new branch. GitHub offers a straightforward method to streamline this process.

    \u26a0\ufe0f Never work directly on the main or develop branch. Always create a new branch for your changes.

    Navigate to your issue and on the right column, select Create a branch.

After the new window pops up, the branch will be named after the issue; include a prefix such as feature/, bug/, or docs/ to facilitate quick recognition of the issue type. In the Repository destination, pick your fork ([your-github-username]/argilla), and then select Change branch source to specify the source branch for creating the new one. Complete the process by clicking Create branch.

    \ud83e\udd14 Remember that the main branch is only used to work with the documentation. For any other changes, use the develop branch.

    Now, locally change to the new branch you just created.

    git fetch origin\ngit checkout [branch-name]\n
    "},{"location":"community/contributor/#use-changelogmd","title":"Use CHANGELOG.md","text":"

    If you are working on a new feature, it is a good practice to make note of it for others to keep up with the changes. For that, we utilize the CHANGELOG.md file in the root directory. This file is used to list changes made in each version of the project and there are headers that we use to denote each type of change.

    • Added: for new features.
    • Changed: for changes in existing functionality.
    • Deprecated: for soon-to-be removed features.
    • Removed: for now removed features.
    • Fixed: for any bug fixes.
    • Security: in case of vulnerabilities.

    A sample addition would be:

    - Fixed the key errors for the `init` method ([#NUMBER_OF_PR](LINK_TO_PR)). Contributed by @github_handle.\n

You can have a look at the CHANGELOG.md file to see more cases and examples.

    "},{"location":"community/contributor/#make-changes-and-push-them","title":"Make changes and push them","text":"

    Make the changes you want in your local repository, and test that everything works and you are following the guidelines.

Once you have finished, you can check the status of your repository and synchronize it with the upstream repo with the following commands:

# Check the status of your repository\ngit status\n\n# Synchronize with the upstream repo\ngit checkout [branch-name]\ngit rebase [default-branch]\n

    If everything is right, we need to commit and push the changes to your fork. For that, run the following commands:

    # Add the changes to the staging area\ngit add filename\n\n# Commit the changes by writing a proper message\ngit commit -m \"commit-message\"\n\n# Push the changes to your fork\ngit push origin [branch-name]\n

    When pushing, you will be asked to enter your GitHub login credentials. Once the push is complete, all local commits will be on your GitHub repository.

    "},{"location":"community/contributor/#create-a-pull-request","title":"Create a pull request","text":"

    Come back to GitHub, navigate to the original repository where you created your fork, and click on Compare & pull request.

    First, click on compare across forks and select the right repositories and branches.

    In the base repository, keep in mind to select either main or develop based on the modifications made. In the head repository, indicate your forked repository and the branch corresponding to the issue.

Then, fill in the pull request template. You should add a prefix to the PR name, as we did with the branch above. If you are working on a new feature, you can name your PR as feat: TITLE. If your PR consists of a solution for a bug, you can name your PR as bug: TITLE. And, if your work is for improving the documentation, you can name your PR as docs: TITLE.

In addition, on the right side, you can select a reviewer (for instance, if you discussed the issue with a member of the Argilla team) and assign the pull request to yourself. It is highly advisable to add labels to the PR as well. You can do this in the Labels section on the right side of the screen. For instance, if you are addressing a bug, add the bug label, or if the PR is related to the documentation, add the documentation label. This way, PRs can be easily filtered.

    Finally, fill in the template carefully and follow the guidelines. Remember to link the original issue and enable the checkbox to allow maintainer edits so the branch can be updated for a merge. Then, click on Create pull request.

    "},{"location":"community/contributor/#review-your-pull-request","title":"Review your pull request","text":"

    Once you submit your PR, a team member will review your proposal. We may ask questions, request additional information or ask for changes to be made before a PR can be merged, either using suggested changes or pull request comments.

You can apply the changes directly through the UI (check the files changed and click on the three dots in the right corner; see image below) or from your fork, and then commit them to your branch. The PR will be updated automatically and the suggestions will appear as outdated.

    If you run into any merge issues, check out this git tutorial to help you resolve merge conflicts and other issues.

    "},{"location":"community/contributor/#your-pr-is-merged","title":"Your PR is merged!","text":"

    Congratulations \ud83c\udf89\ud83c\udf8a We thank you \ud83e\udd29

    Once your PR is merged, your contributions will be publicly visible on the Argilla GitHub.

    Additionally, we will include your changes in the next release based on our development branch.

    "},{"location":"community/contributor/#additional-resources","title":"Additional resources","text":"

    Here are some helpful resources for your reference.

    • Configuring Discord, a guide to learn how to get started with Discord.
    • Pro Git, a book to learn Git.
    • Git in VSCode, a guide to learn how to easily use Git in VSCode.
    • GitHub Skills, an interactive course to learn GitHub.
    "},{"location":"community/popular_issues/","title":"Issue dashboard","text":"Most engaging open issuesLatest issues open by the communityPlanned issues for upcoming releases Rank Issue Reactions Comments 1 4637 - [FEATURE] Label breakdown in Feedback dataset stats \ud83d\udc4d 6 \ud83d\udcac 3 2 1607 - Support for hierarchical multilabel text classification (taxonomy) \ud83d\udc4d 4 \ud83d\udcac 15 3 4658 - Active listeners for Feedback Dataset \ud83d\udc4d 4 \ud83d\udcac 5 4 4867 - [FEATURE] Add dependencies to support dev mode on HF Spaces \ud83d\udc4d 3 \ud83d\udcac 5 5 4964 - [DOCS] Improvements to docker-compose getting started installation guide \ud83d\udc4d 3 \ud83d\udcac 4 6 3338 - [FEATURE] Add conversation support to fields in Argilla dataset (ChatField) \ud83d\udc4d 2 \ud83d\udcac 11 7 1800 - Add comments/notes to annotation datasets to share with teammates. \ud83d\udc4d 2 \ud83d\udcac 6 8 1837 - Custom Record UI Templates \ud83d\udc4d 2 \ud83d\udcac 6 9 1630 - Accepting several predictions/annotations for the same record \ud83d\udc4d 2 \ud83d\udcac 2 10 4823 - [FEATURE] ImageField \ud83d\udc4d 2 \ud83d\udcac 1 Rank Issue Author 1 \ud83d\udfe2 5438 - [FEATURE] Make text box size of TextQuestion adjustable by MoritzLaurer 2 \ud83d\udfe3 5424 - [BUG-python/deployment]The status of all the dataset.records.to_dict(orient='index') records are pending by Huarong 3 \ud83d\udfe2 5414 - docker download failed by njhouse365 4 \ud83d\udfe3 5357 - [BUG-python/deployment] Response sanity check not working due to variable renaming by maxserras 5 \ud83d\udfe2 5348 - [FEATURE] Ability to create new labels on-the-fly by uahmad235 6 \ud83d\udfe2 5338 - [BUG-UI/UX] CSS is being stripped from TextQuestion by paulbauriegel 7 \ud83d\udfe2 5318 - [BUG-python/deployment] filter_by returning unexpected results for response_status by bertozzivill 8 \ud83d\udfe2 5302 - [FEATURE]Auto-annotation of Repeated Tokens by bikash119 9 \ud83d\udfe3 5290 - [BUG-python/deployment] Docker deployment issues by zhongze-fish 10 \ud83d\udfe3 5281 - [BUG-python/deployment] Error thrown by Argilla SDK : The reason of error thrown by Argilla SDK is different / ambigious from the cause of the error by bikash119 Rank Issue Milestone 1 \ud83d\udfe2 5204 - [FEATURE] add huggingface_hub.utils.send_telemetry to the argilla-server v2.1.0 2 \ud83d\udfe2 4952 - [DOCS] Add contributing pages v2.1.0 3 \ud83d\udfe2 4951 - [DOCS] Add tutorials pages to new documentation v2.1.0 4 \ud83d\udfe2 4950 - [DOCS] Add integrations to new documentation v2.1.0 5 \ud83d\udfe2 4949 - FEAT Implement external integrations v2.1.0 6 \ud83d\udfe2 4823 - [FEATURE] ImageField v2.1.0 7 \ud83d\udfe2 4944 - [REFACTOR] Simplify naming of serialize, to_dict, and to_json methods v2.1.0 8 \ud83d\udfe2 5278 - [TASK] Check records using ImageField with URLs using HTTP protocol serving application from HTTPS v2.1.0 9 \ud83d\udfe2 5361 - [BUG-UI/UX] required/optional differentiation forFields are not represented in the dataset settings v2.1.0 10 \ud83d\udfe2 3338 - [FEATURE] Add conversation support to fields in Argilla dataset (ChatField) v2.1.0

Last update: 2024-08-30

    "},{"location":"getting_started/faq/","title":"FAQs","text":"What is Argilla?

    Argilla is a collaboration tool for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency. It is designed to help you achieve and keep high-quality data standards, store your training data, store the results of your models, evaluate their performance, and improve the data through human and AI feedback.

    Does Argilla cost money?

    No. Argilla is an open-source project and is free to use. You can deploy Argilla on your own infrastructure or use our cloud offering.

    What data types does Argilla support?

Text data, mostly. Argilla natively supports textual data; however, it also supports rich text, which means you can represent different types of data in Argilla as long as you can convert them to text. For example, you can store images, audio, video, and any other type of data as long as you can convert it to its base64 representation or render it as HTML in, for example, an IFrame.
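As an illustration, here is a minimal sketch of that pattern with the Python SDK, assuming a dataset whose TextField has use_markdown=True; the dataset, field, question, and file names are hypothetical:

import base64\n\nimport argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Hypothetical dataset with a markdown-enabled TextField named \"image\"\nsettings = rg.Settings(\n    fields=[rg.TextField(name=\"image\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"good\", \"bad\"])],\n)\ndataset = rg.Dataset(name=\"image-demo\", settings=settings, client=client)\ndataset.create()\n\n# Encode a local image and render it as an inline HTML <img> tag\nwith open(\"photo.png\", \"rb\") as f:\n    b64 = base64.b64encode(f.read()).decode(\"utf-8\")\n\ndataset.records.log([{\"image\": f'<img src=\"data:image/png;base64,{b64}\">'}])\n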

    Does Argilla train models?

    No. Argilla is a collaboration tool to achieve and keep high-quality data standards. You can use Argilla to store your training data, store the results of your models, evaluate their performance and improve the data. For training models, you can use any machine learning framework or library that you prefer even though we recommend starting with Hugging Face Transformers.

    Does Argilla provide annotation workforces?

Yes, kind of. We don't provide an annotation workforce in-house, but we do have partnerships with workforce providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting here or contact us via email.

    How does Argilla differ from competitors like Lilac, Snorkel, Prodigy and Scale?

Argilla distinguishes itself through its focus on specific use cases and human-in-the-loop approaches. While it does offer programmatic features, Argilla\u2019s core value lies in actively involving human experts in the tool-building process, setting it apart from other competitors.

    Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within the realms of MLOps and NLP. So, its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

    Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a tool within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing tool from the outset.

    What is the difference between Argilla 2.0 and the legacy datasets in 1.0?

    Argilla 1.0 relied on 3 main task datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These tasks were designed to be simple, easy to use and high in functionality but they were limited in adaptability. With the introduction of Large Language Models (LLMs) and the increasing complexity of NLP tasks, we realized that we needed to expand the capabilities of Argilla to support more advanced feedback mechanisms which led to the introduction of the FeedbackDataset. Compared to its predecessor it was high in adaptability but still limited in functionality. After having ported all of the functionality of the legacy tasks to the new FeedbackDataset, we decided to deprecate the legacy tasks in favor of a brand new SDK with the FeedbackDataset at its core.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/","title":"Hugging Face Spaces Settings","text":"

    This section details how to configure and deploy Argilla on Hugging Face Spaces. It covers:

    • Persistent storage
    • How to deploy Argilla under a Hugging Face Organization
    • How to configure and disable HF OAuth access
    • How to use Private Spaces

    Looking to get started easily?

    If you just discovered Argilla and want to get started quickly, go to the Quickstart guide.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#persistent-storage","title":"Persistent storage","text":"

    In the Space creation UI, persistent storage is set to Small PAID, which is a paid service, charged per hour of usage.

Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. Persistent storage enables Argilla to save your datasets and configurations to disk across restarts.

Ephemeral FREE persistent storage

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

    If you plan to use the Argilla Space beyond testing, it's highly recommended to set persistent storage to Small.

If you just want to quickly test or use Argilla for a few hours, with the risk of losing your datasets, choose Ephemeral FREE. Ephemeral FREE means your datasets and configuration will not be saved to disk; when the Space is restarted, your datasets, workspaces, and users will be lost.

If you want to disable the persistent storage warning, you can set the environment variable ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING=false.

    Read this if you have datasets and want to enable persistent storage

    If you want to enable persistent storage Small PAID and you have created datasets, users, or workspaces, follow this process:

• First, make a local or remote copy of your datasets, following the Import and Export guide (see the sketch after this list). This is the most important step, because changing the settings of your Space leads to a restart and thus data loss.
    • If you have created users (not signed in with Hugging Face login), consider storing a copy of users following the manage users guide.
• Once you have stored all your data safely, go to your Space Settings tab and select Small.
• Your Space will be restarted and existing data will be lost. From now on, all the new data you create in Argilla will be kept safely.
• Recover your data by following the above-mentioned guides.
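As a sketch of the backup step, the snippet below saves every dataset to disk before the restart and restores one afterwards; it assumes the to_disk/from_disk helpers described in the Import and Export guide, an iterable client.datasets collection, and a hypothetical dataset name:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Before changing the Space settings: save every dataset to local disk\nfor dataset in client.datasets:\n    dataset.to_disk(f\"backup/{dataset.name}\")\n\n# After the restart: restore each local copy into the now-persistent Space\nrg.Dataset.from_disk(\"backup/my_first_dataset\", client=client)\n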
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-configure-and-disable-oauth-access","title":"How to configure and disable OAuth access","text":"

    By default, Argilla Spaces are configured with Hugging Face OAuth, in the following way:

• Any Hugging Face user that can see your Space can use the Sign in button, join as an annotator, and contribute to the datasets available under the argilla workspace. This workspace is created during the deployment process.
• These users can only explore and annotate datasets in the argilla workspace but can't perform any critical operation like creating, deleting, updating, or configuring datasets. By default, any other workspace you create won't be visible to these users.

To restrict access or change the default behaviour, there are two options:

    Set your Space to private. This is especially useful if your Space is under an organization. This will only allow members within your organization to see and join your Argilla space. It can also be used for personal, solo projects.

    Modify the .oauth.yml configuration file. You can find and modify this file under the Files tab of your Space. The default file looks like this:

    # Change to `false` to disable HF oauth integration\n#enabled: false\n\nproviders:\n  - name: huggingface\n\n# Allowed workspaces must exists\nallowed_workspaces:\n  - name: argilla\n
    You can modify two things:

• Uncomment enabled: false to completely disable the Sign in with Hugging Face. If you disable it, make sure to set the USERNAME and PASSWORD Space secrets to be able to log in as an owner.
    • Change the list of allowed workspaces.

For example, if you want to let users join a new workspace community-initiative:

    allowed_workspaces:\n  - name: argilla\n  - name: community-initiative\n
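Since allowed workspaces must already exist, you may need to create the workspace first. Here is a minimal sketch with the Python SDK, assuming the Workspace resource follows the same create() pattern as datasets:

import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Create the workspace referenced in .oauth.yml so signed-in users can join it\nworkspace = rg.Workspace(name=\"community-initiative\", client=client)\nworkspace.create()\n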
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-deploy-argilla-under-a-hugging-face-organization","title":"How to deploy Argilla under a Hugging Face Organization","text":"

    Creating an Argilla Space within an organization is useful for several scenarios:

    • You want to only enable members of your organization to join your Space. You can achieve this by setting your Space to private.
• You want to manage the Space together with other users (e.g., Space settings). Note that if you just want to manage your Argilla datasets and workspaces, you can achieve this by granting other users the owner role on your Argilla Server.
• More generally, you want to make your Space available under an organization/community umbrella.

The steps are very similar to the Quickstart guide with two important differences:

    Setup USERNAME

    You need to set up the USERNAME Space Secret with your Hugging Face username. This way, the first time you enter with the Hugging Face Sign in button, you'll be granted the owner role.

    Enable Persistent Storage SMALL

Not setting persistent storage to Small means that you will lose your data when the Space restarts.

    For Argilla Spaces with many users, it's strongly recommended to set persistent storage to Small.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-use-private-spaces","title":"How to use Private Spaces","text":"

    Setting your Space visibility to private can be useful if:

    • You want to work on your personal, solo project.
    • You want your Argilla to be available only to members of the organization where you deploy the Argilla Space.

    You can set the visibility of the Space during the Space creation process or afterwards under the Settings Tab.

To use the Python SDK with private Spaces, you need to specify your HF_TOKEN, which can be found here, when creating the client:

import argilla as rg\n\nHF_TOKEN = \"...\"\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\",\n    headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#space-secrets-overview","title":"Space Secrets overview","text":"

There are two optional secrets to set up the USERNAME and PASSWORD of the owner of the Argilla Space. Remember that, by default, Argilla Spaces are configured with a Sign in with Hugging Face button, which is also used to grant the owner role to the creator of the Space for personal Spaces.

    The USERNAME and PASSWORD are only useful in a couple of scenarios:

    • You have disabled Hugging Face OAuth.
    • You want to set up Argilla under an organization and want your Hugging Face username to be granted the owner role.

    In summary, when setting up a Space:

    Creating a Space under your personal account

    If you are creating the Space under your personal account, don't insert any value for USERNAME and PASSWORD. Once you launch the Space you will be able to Sign in with your Hugging Face username and the owner role.

    Creating a Space under an organization

    If you are creating the Space under an organization make sure to insert your Hugging Face username in the secret USERNAME. In this way, you'll be able to Sign in with your Hugging Face user.

    "},{"location":"getting_started/how-to-deploy-argilla-with-docker/","title":"Deploy with Docker","text":"

This guide describes how to deploy the Argilla Server with docker compose. This is useful if you want to deploy Argilla locally and/or have full control over the configuration of the server, database, and search engine (Elasticsearch).

    First, you need to install docker on your machine and make sure you can run docker compose.

    Then, create a folder (you can modify the folder name):

    mkdir argilla && cd argilla\n

    Download docker-compose.yaml:

    wget -O docker-compose.yaml https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml\n

    or using curl:

    curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml\n

    Run to deploy the server on http://localhost:6900:

    docker compose up -d\n

Once the deployment is complete, go to this URL in your browser: http://localhost:6900. You should see the Argilla login page.

    If it's not available, check the logs:

    docker compose logs -f\n

Most deployment issues are related to Elasticsearch. Join the Hugging Face Discord server and ask for support in the Argilla channel.
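To double-check the deployment from Python, here is a minimal sketch that reuses the SDK client pattern from the Quickstart; replace <api_key> with the key shown in the My Settings page of the UI, and note that iterating over client.workspaces is an assumption about the SDK:

import argilla as rg\n\n# Connect to the local deployment; <api_key> comes from the My Settings page\nclient = rg.Argilla(api_url=\"http://localhost:6900\", api_key=\"<api_key>\")\n\n# Listing workspaces confirms the server and the search engine are reachable\nfor workspace in client.workspaces:\n    print(workspace.name)\n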

    "},{"location":"getting_started/quickstart/","title":"Quickstart","text":"

Argilla is a free, open-source, self-hosted tool. This means you need to deploy its UI to start using it. There are two main ways to deploy Argilla:

    Deploy on the Hugging Face Hub

    The recommended choice to get started. You can get up and running in under 5 minutes and don't need to maintain a server or run any commands.

    If you're just getting started with Argilla, click the deploy button below:

    You can use the default values following these steps:

    • Leave the default Space owner (your personal account)
    • Leave USERNAME and PASSWORD secrets empty since you'll sign in with your HF user as the Argilla Space owner.
    • Click create Space to launch Argilla \ud83d\ude80.
• Once you see the Argilla UI, go to the Sign in into the Argilla UI section. If you see the Building message for longer than 2-3 min, refresh the page.

    Persistent storage SMALL

Not setting persistent storage to SMALL means that you will lose your data when the Space restarts. Spaces get restarted due to maintenance, inactivity, and every time you change your Space settings. If you want to use the Space just for testing, you can use FREE temporarily.

If you want to deploy Argilla within a Hugging Face organization, set up a more stable Space, or understand the settings, check out the HF Spaces settings guide.

    Deploy with Docker

    If you want to run Argilla locally on your machine or a server, or tune the server configuration, choose this option. To use this option, check this guide.

    "},{"location":"getting_started/quickstart/#sign-in-into-the-argilla-ui","title":"Sign in into the Argilla UI","text":"

    If everything went well, you should see the Argilla sign in page that looks like this:

    Building errors

    If you get a build error, sometimes restarting the Space from the Settings page works, otherwise check the HF Spaces settings guide.

    In the sign in page:

    1. Click on Sign in with Hugging Face
    2. Authorize the application and you will be logged in into Argilla as an owner.

    Unauthorized error

    Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button solves the issue.

    Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.

    "},{"location":"getting_started/quickstart/#install-the-python-sdk","title":"Install the Python SDK","text":"

    To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:

    pip install argilla\n
    "},{"location":"getting_started/quickstart/#create-your-first-dataset","title":"Create your first dataset","text":"

For getting started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab.

To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

    • The <api_key> is in the My Settings page of your Argilla Space.

    • The <api_url> is the URL shown in your browser if it ends with *.hf.space.

    import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"<api_url>\",\n    api_key=\"<api_key>\"\n)\n

    You can't find your API URL

    If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options: 1. Click on the three points menu at the top of the Space, select \"Embed this Space\", and open the direct URL. 2. Use this pattern: https://[your-owner-name]-[your_space_name].hf.space.

    To create a dataset with a simple text classification task, first, you need to define the dataset settings.

    settings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"my_label\",\n            title=\"In which category does this article fit?\",\n            labels=[\"positive\", \"negative\"],\n        )\n    ],\n)\n

    Now you can create the dataset with these settings. Publish the dataset to make it available in the UI and add the records.

    About workspaces

    Workspaces in Argilla group datasets and user access rights. The workspace parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace argilla.

By default, this workspace will be visible to users joining with the Sign in with Hugging Face button. You can create other workspaces and decide to grant access to users either with the SDK or by changing the OAuth configuration.

dataset = rg.Dataset(\n    name=\"my_first_dataset\",\n    settings=settings,\n    client=client,\n    #workspace=\"argilla\"\n)\ndataset.create()\n

    Now you can add records to your dataset. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The mapping parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.

    from datasets import load_dataset\n\ndata = load_dataset(\"imdb\", split=\"train[:100]\").to_list()\n\ndataset.records.log(records=data, mapping={\"text\": \"review\"})\n

    \ud83c\udf89 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.

    "},{"location":"getting_started/quickstart/#next-steps","title":"Next steps","text":"
    • To learn how to create your datasets, workspace, and manage users, check the how-to guides.

    • To learn Argilla with hands-on examples, check the Tutorials section.

    • To further configure your Argilla Space, check the Hugging Face Spaces settings guide.

    "},{"location":"how_to_guides/","title":"How-to guides","text":"

    These guides provide step-by-step instructions for common scenarios, including detailed explanations and code samples. They are divided into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.

    "},{"location":"how_to_guides/#basic","title":"Basic","text":"
    • Manage users and credentials

      Learn what they are and how to manage (create, read and delete) Users in Argilla.

      How-to guide

    • Manage workspaces

      Learn what they are and how to manage (create, read and delete) Workspaces in Argilla.

      How-to guide

    • Create, update, and delete datasets

      Learn what they are and how to manage (create, read and delete) Datasets and customize them using the Settings for Fields, Questions, Metadata and Vectors.

      How-to guide

    • Add, update, and delete records

      Learn what they are and how to add, update and delete the values for a Record, which are made up of Metadata, Vectors, Suggestions and Responses.

      How-to guide

    • Distribute the annotation

      Learn how to use Argilla's automatic task distribution to annotate as a team efficiently.

      How-to guide

    • Annotate a dataset

      Learn how to use the Argilla UI to navigate datasets and submit responses.

      How-to guide

    • Query and filter a dataset

      Learn how to query and filter a Dataset.

      How-to guide

    • Import and export datasets and records

Learn how to export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

      How-to guide

    "},{"location":"how_to_guides/#advanced","title":"Advanced","text":"
    • Use Markdown to format rich content

      Learn how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

      How-to guide

    • Migrate to Argilla V2

      Learn how to migrate your legacy datasets from Argilla 1.x to 2.x.

      How-to guide

    "},{"location":"how_to_guides/annotate/","title":"Annotate your dataset","text":"

    To experience the UI features firsthand, you can take a look at the Demo \u2197.

    Argilla UI offers many functions to help you manage your annotation workflow, aiming to provide the most flexible approach to fit the wide variety of use cases handled by the community.

    "},{"location":"how_to_guides/annotate/#annotation-interface-overview","title":"Annotation interface overview","text":""},{"location":"how_to_guides/annotate/#flexible-layout","title":"Flexible layout","text":"

    The UI is responsive with two columns for larger devices and one column for smaller devices. This enables you to annotate data using your mobile phone for simple datasets (i.e., not very long text and 1-2 questions) or resize your screen to get a more compact UI.

    • Header: At the right side of the navigation breadcrumb, you can customize the dataset settings and edit your profile.

    • Left pane: This area displays the control panel on the top. The control panel is used for performing keyword-based search, applying filters, and sorting the results. Below the control panel, the record card(s) are displayed one by one (Focus view) or in a vertical list (Bulk view).

    • Right pane: This is where you annotate your dataset. Simply fill it out as a form, then choose to Submit, Save as Draft, or Discard.

    • Left bottom panel: This expandable area displays the annotation guidelines. The annotation guidelines can be edited by owner and admin roles in the dataset settings.

    • Right bottom panel: This expandable area displays your annotation progress.

    "},{"location":"how_to_guides/annotate/#shortcuts","title":"Shortcuts","text":"

    The Argilla UI includes a range of shortcuts. For the main actions (submit, discard, save as draft and selecting labels) the keys are shown on the corresponding button.

    To learn how to move from one question to another or between records using the keyboard, take a look at the table below.

    Shortcuts provide a smoother annotation experience, especially with datasets using a single question (Label, MultiLabel, Rating, or Ranking).

    Available shortcuts:

    • Activate form: \u21e5 Tab
    • Move between questions: \u2193 Down arrow or \u2191 Up arrow
    • Select and unselect label: 1, 2, 3
    • Move between labels or ranking options: \u21e5 Tab or \u21e7 Shift \u21e5 Tab
    • Select rating and rank: 1, 2, 3
    • Fit span to character selection: Hold \u21e7 Shift
    • Activate text area: \u21e7 Shift \u21b5 Enter
    • Exit text area: Esc
    • Discard: \u232b Backspace
    • Save draft (macOS): \u2318 Cmd S
    • Save draft (Other): Ctrl S
    • Submit: \u21b5 Enter
    • Move between pages: \u2192 Right arrow or \u2190 Left arrow

    "},{"location":"how_to_guides/annotate/#view-by-status","title":"View by status","text":"

    The view selector is set by default on Pending.

    If you are starting an annotation effort, all the records are initially kept in the Pending view. Once you start annotating, the records will move to the other queues: Draft, Submitted, Discarded.

    • Pending: The records without a response.
    • Draft: The records with partial responses. They can be submitted or discarded later. You can\u2019t move them back to the pending queue.
    • Discarded: The records may or may not have responses. They can be edited but you can\u2019t move them back to the pending queue.
    • Submitted: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue.

    Note

    If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed.

    Tip

    If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response.

    "},{"location":"how_to_guides/annotate/#suggestions","title":"Suggestions","text":"

    If your dataset includes model predictions, you will see them represented by a sparkle icon \u2728 in the label or value button. We call them \u201cSuggestions\u201d and they appear in the form as pre-filled responses. If confidence scores have been included by the dataset admin, they will be shown alongside the label. Additionally, admins can choose to always show suggested labels at the beginning of the list. This can be configured from the dataset settings.

    If you agree with the suggestions, you just need to click on the Submit button, and they will be considered as your response. If the suggestion is incorrect, you can modify it and submit your final response.

    "},{"location":"how_to_guides/annotate/#focus-view","title":"Focus view","text":"

    This is the default view to annotate your dataset linearly, displaying one record after another.

    Tip

    You should use this view if you have a large number of required questions or need a strong focus on the record content to be labelled. This is also the recommended view for annotating a dataset sample to avoid potential biases introduced by using filters, search, sorting and bulk labelling.

    Once you submit your first response, the next record will appear automatically. To see your submitted response again, just click on Prev.

    Navigating through the records

    To navigate through the records, you can use the Prev (shown as <) and Next (shown as >) buttons on top of the record card.

    Each time the page is fully refreshed, the records with modified statuses (Pending to Discarded, Pending to Save as Draft, Pending to Submitted) are sent to the corresponding queue. The control panel displays the status selector, which is set to Pending by default.

    "},{"location":"how_to_guides/annotate/#bulk-view","title":"Bulk view","text":"

    The bulk view is designed to speed up the annotation and get a quick overview of the whole dataset.

    The bulk view displays the records in a vertical list. Once this view is active, some functions from the control panel will activate to optimize the view. You can define the number of records to display per page (10, 25, 50, or 100) and whether records are shown with a fixed height (Collapse records) or their natural height (Expand records).

    Tip

    You should use this to quickly explore a dataset. This view is also recommended if you have a good understanding of the domain and want to apply your knowledge based on things like similarity and keyword search, filters, and suggestion score thresholds. For datasets with a large number of required questions or very long fields, the focus view would be more suitable.

    With multiple questions, consider using the bulk view to annotate one question at scale. Then, you can complete the annotation record by record from the draft queue.

    Note

    Please note that suggestions are not shown in bulk view (except for Spans) and that you will need to save as a draft when you are not providing responses to all required questions.

    "},{"location":"how_to_guides/annotate/#annotation-progress","title":"Annotation progress","text":"

    You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed.

    You can also track your own progress in real time by expanding the right-bottom panel inside the dataset page. There you can see the number of records for which you have Pending,\u00a0Draft,\u00a0Submitted\u00a0and\u00a0Discarded responses.

    "},{"location":"how_to_guides/annotate/#use-search-filters-and-sort","title":"Use search, filters, and sort","text":"

    The UI offers various features designed for data exploration and understanding. Combining these features with bulk labelling can save you and your team hours of time.

    Tip

    You should use this when you are familiar with your data and have large volumes to annotate based on verified beliefs and experience.

    "},{"location":"how_to_guides/annotate/#search","title":"Search","text":"

    From the control panel at the top of the left pane, you can search by keyword across the entire dataset. If you have more than one field in your records, you may specify if the search is to be performed on \u201cAll\u201d fields or on a specific one. Matched results are highlighted in color.

    "},{"location":"how_to_guides/annotate/#order-by-record-semantic-similarity","title":"Order by record semantic similarity","text":"

    You can retrieve records based on their similarity to another record if vectors have been added to the dataset.

    Note

    Check these guides to know how to add vectors to your\u00a0dataset and\u00a0records.

    To use the search by semantic similarity function, click on Find similar within the record you wish to use as a reference. If multiple vectors are available, select the desired vector. You can also choose whether to retrieve the most or least similar records.

    The retrieved records are then ordered by similarity, with the similarity score displayed on each record card.

    While the semantic search is active, you can update the selected vector, adjust the order of similarity, and specify the number of desired results.

    To cancel the search, click on the cross icon next to the reference record.

    "},{"location":"how_to_guides/annotate/#filter-and-sort-by-metadata-responses-and-suggestions","title":"Filter and sort by metadata, responses, and suggestions","text":""},{"location":"how_to_guides/annotate/#filter","title":"Filter","text":"

    If the dataset contains metadata, responses and suggestions, click on\u00a0Filter in the control panel to display the available filters. You can select multiple filters and combine them.

    Note

    Record info including metadata is visible from the ellipsis menu in the record card.

    From the Metadata dropdown, type and select the property. You can set a range for integer and float properties, and select specific values for term metadata.

    Note

    Note that if a metadata property was set to visible_for_annotators=False, this metadata property will only appear in the metadata filter for users with the admin or owner role.

    From the Responses dropdown, type and select the question. You can set a range for rating questions and select specific values for label, multi-label, and span questions.

    Note

    The text and ranking questions are not available for filtering.

    From the Suggestions dropdown, filter the suggestions by Suggestion values, Score, or Agent.

    "},{"location":"how_to_guides/annotate/#sort","title":"Sort","text":"

    You can sort your records according to one or several attributes.

    The insertion time and last update are available for all records.

    Suggestion scores, response and suggestion values for rating questions, and metadata properties are available for sorting only when they have been provided.

    "},{"location":"how_to_guides/dataset/","title":"Dataset management","text":"

    This guide provides an overview of datasets, explaining the basics of how to set them up and manage them in Argilla.

    A dataset is a collection of records that you can configure for labelers to provide feedback using the UI. Depending on the specific requirements of your task, you may need various types of feedback. You can customize the dataset to include different kinds of questions, so the first step will be to define the aim of your project and the kind of data and feedback you will need. With this information, you can start configuring a dataset by defining fields, questions, metadata, vectors, and guidelines through settings.

    Question: Who can manage datasets?

    Only users with the owner role can manage (create, retrieve, update and delete) all the datasets.

    The users with the admin role can manage (create, retrieve, update and delete) the datasets in the workspaces they have access to.

    Main Classes

    rg.Dataset / rg.Settings
    rg.Dataset(\n    name=\"name\",\n    workspace=\"workspace\",\n    settings=settings,\n    client=client\n)\n

    Check the Dataset - Python Reference to see the attributes, arguments, and methods of the Dataset class in detail.

    rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        )\n    ],\n    metadata=[rg.TermsMetadataProperty(name=\"metadata\")],\n    vectors=[rg.VectorField(name=\"vector\", dimensions=10)],\n    guidelines=\"guidelines\",\n    allow_extra_metadata=True,\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n

    Check the Settings - Python Reference to see the attributes, arguments, and methods of the Settings class in detail.

    "},{"location":"how_to_guides/dataset/#create-a-dataset","title":"Create a dataset","text":"

    To create a dataset, you can define it in the Dataset class and then call the create method that will send the dataset to the server so that it can be visualized in the UI. If the dataset does not appear in the UI, you may need to click the refresh button to update the view. For further configuration of the dataset, you can refer to the settings section.

    Info

    If you have deployed Argilla with Hugging Face Spaces and HF Sign in, you can use argilla as a workspace name. Otherwise, you might need to create a workspace following this guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    settings=settings,\n)\n\ndataset.create()\n

    The created dataset will be empty. To add records, go to this how-to guide.
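
    As a quick sketch, assuming the dataset configured above with a single text field, you can log records directly from a list of dictionaries whose keys match the field names:

    dataset.records.log(records=[{\"text\": \"This is my first record.\"}])\n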

    Accessing attributes

    Access the attributes of a dataset by calling them directly on the dataset object. For example, dataset.id, dataset.name or dataset.settings. You can similarly access the fields, questions, metadata, vectors and guidelines. For instance, dataset.fields or dataset.questions.
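
    For example, a minimal sketch on an existing dataset object (assuming a field named text, as in the example above):

    print(dataset.id)\nprint(dataset.name)\nprint(dataset.settings.fields[\"text\"].title)\n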

    "},{"location":"how_to_guides/dataset/#create-multiple-datasets-with-the-same-settings","title":"Create multiple datasets with the same settings","text":"

    To create multiple datasets with the same settings, define the settings once and pass them to each dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[rg.TextField(name=\"text\", use_markdown=True)],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=[\"label_1\", \"label_2\", \"label_3\"])\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3),\n)\n\ndataset1 = rg.Dataset(name=\"my_dataset_1\", settings=settings)\ndataset2 = rg.Dataset(name=\"my_dataset_2\", settings=settings)\n\n# Create the datasets on the server\ndataset1.create()\ndataset2.create()\n
    "},{"location":"how_to_guides/dataset/#create-a-dataset-from-an-existing-dataset","title":"Create a dataset from an existing dataset","text":"

    To create a new dataset from an existing dataset, get the settings from the existing dataset and pass them to the new dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nexisting_dataset = client.datasets(\"my_dataset\")\n\nnew_dataset = rg.Dataset(name=\"my_dataset_copy\", settings=existing_dataset.settings)\n\nnew_dataset.create()\n

    Info

    You can also copy the records from the original dataset to the new one:

    records = list(existing_dataset.records)\nnew_dataset.records.log(records)\n
    "},{"location":"how_to_guides/dataset/#define-dataset-settings","title":"Define dataset settings","text":""},{"location":"how_to_guides/dataset/#fields","title":"Fields","text":"

    The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla only supports plain text and markdown through the TextField, though we plan to introduce additional field types in future updates.

    Note

    The order of the fields in the UI follows the order in which these are added to the fields attribute in the Python SDK.

    Check the Field - Python Reference to see the field classes in detail.

    rg.TextField(\n    name=\"text\",\n    title=\"Text\",\n    use_markdown=False,\n    required=True,\n    description=\"Field description\",\n)\n

    "},{"location":"how_to_guides/dataset/#questions","title":"Questions","text":"

    To collect feedback for your dataset, you need to formulate questions that annotators will be asked to answer.

    Check the Questions - Python Reference to see the question classes in detail.

    Label / Multi-label / Ranking / Rating / Span / Text

    A LabelQuestion asks annotators to choose a unique label from a list of options. This type is useful for text classification tasks. In the UI, they will have a rounded shape.

    rg.LabelQuestion(\n    name=\"label\",\n    labels={\"YES\": \"Yes\", \"NO\": \"No\"}, # or [\"YES\", \"NO\"]\n    title=\"Is the response relevant for the given prompt?\",\n    description=\"Select the one that applies.\",\n    required=True,\n    visible_labels=10\n)\n

    A MultiLabelQuestion asks annotators to choose all applicable labels from a list of options. This type is useful for multi-label text classification tasks. In the UI, they will have a squared shape.

    rg.MultiLabelQuestion(\n    name=\"multi_label\",\n    labels={\n        \"hate\": \"Hate Speech\",\n        \"sexual\": \"Sexual content\",\n        \"violent\": \"Violent content\",\n        \"pii\": \"Personal information\",\n        \"untruthful\": \"Untruthful info\",\n        \"not_english\": \"Not English\",\n        \"inappropriate\": \"Inappropriate content\"\n    }, # or [\"hate\", \"sexual\", \"violent\", \"pii\", \"untruthful\", \"not_english\", \"inappropriate\"]\n    title=\"Does the response include any of the following?\",\n    description=\"Select all that apply.\",\n    required=True,\n    visible_labels=10,\n    labels_order=\"natural\"\n)\n

    A RankingQuestion asks annotators to order a list of options. It is useful to gather information on the preference or relevance of a set of options.

    rg.RankingQuestion(\n    name=\"ranking\",\n    values={\n        \"reply-1\": \"Reply 1\",\n        \"reply-2\": \"Reply 2\",\n        \"reply-3\": \"Reply 3\"\n    }, # or [\"reply-1\", \"reply-2\", \"reply-3\"]\n    title=\"Order replies based on your preference\",\n    description=\"1 = best, 3 = worst. Ties are allowed.\",\n    required=True,\n)\n

    A RatingQuestion asks annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.

    rg.RatingQuestion(\n    name=\"rating\",\n    values=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n    title=\"How satisfied are you with the response?\",\n    description=\"1 = very unsatisfied, 10 = very satisfied\",\n    required=True,\n)\n

    A SpanQuestion asks annotators to select a portion of the text of a specific field and apply a label to it. This type of question is useful for named entity recognition or information extraction tasks.

    rg.SpanQuestion(\n    name=\"span\",\n    field=\"text\",\n    labels={\n        \"PERSON\": \"Person\",\n        \"ORG\": \"Organization\",\n        \"LOC\": \"Location\",\n        \"MISC\": \"Miscellaneous\"\n    }, # or [\"PERSON\", \"ORG\", \"LOC\", \"MISC\"]\n    title=\"Select the entities in the text\",\n    description=\"Select the entities in the text\",\n    required=True,\n    allow_overlapping=False,\n    visible_labels=10\n)\n

    A TextQuestion offers annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

    rg.TextQuestion(\n    name=\"text\",\n    title=\"Please provide feedback on the response\",\n    description=\"Please provide feedback on the response\",\n    required=True,\n    use_markdown=True\n)\n

    "},{"location":"how_to_guides/dataset/#metadata","title":"Metadata","text":"

    Metadata properties allow you to configure the use of metadata information for the filtering and sorting features available in the UI and Python SDK.

    Check the Metadata - Python Reference to see the metadata classes in detail.

    Terms / Integer / Float

    A TermsMetadataProperty allows you to add a list of strings as metadata options.

    rg.TermsMetadataProperty(\n    name=\"terms\",\n    options=[\"group-a\", \"group-b\", \"group-c\"],\n    title=\"Annotation groups\",\n    visible_for_annotators=True,\n)\n

    An IntegerMetadataProperty allows you to add integer values as metadata.

    rg.IntegerMetadataProperty(\n    name=\"integer\",\n    title=\"length-input\",\n    min=42,\n    max=1984,\n)\n

    A FloatMetadataProperty allows you to add float values as metadata.

    rg.FloatMetadataProperty(\n    name=\"float\",\n    title=\"Reading ease\",\n    min=-92.29914,\n    max=119.6975,\n)\n

    Note

    You can also set the allow_extra_metadata argument in the dataset to True to specify whether the dataset will allow metadata fields in the records other than those specified under metadata. Note that these will not be accessible from the UI for any user, only retrievable using the Python SDK.
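
    As an illustrative sketch, a dataset that declares one metadata property but still accepts extra metadata keys on its records could be configured like this:

    settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"comment\")],\n    metadata=[rg.TermsMetadataProperty(name=\"group\")],\n    allow_extra_metadata=True,  # extra keys are stored but not shown in the UI\n)\n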

    "},{"location":"how_to_guides/dataset/#vectors","title":"Vectors","text":"

    To use the similarity search in the UI and the Python SDK, you will need to configure vectors using the VectorField class.

    Check the Vector - Python Reference to see the VectorField class in detail.

    rg.VectorField(\n    name=\"my_vector\",\n    title=\"My Vector\",\n    dimensions=768\n)\n

    "},{"location":"how_to_guides/dataset/#guidelines","title":"Guidelines","text":"

    Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

    • In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK. They will appear in the annotation interface.

    guidelines = \"In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. [...]\"\n

    • As question descriptions: these are added as an argument when you create questions in the Python SDK. This text will appear in a tooltip next to the question in the UI.

    It is good practice to use at least the dataset guidelines if not both methods. Question descriptions should be short and provide context to a specific question. They can be a summary of the guidelines to that question, but often that is not sufficient to align the whole annotation team. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc.
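
    For instance, a short question description could be added like this (the names are illustrative):

    rg.LabelQuestion(\n    name=\"relevance\",\n    labels=[\"relevant\", \"not_relevant\"],\n    description=\"Select whether the response addresses the prompt. See the guidelines for edge cases.\",\n)\n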

    Tip

    If you want further guidance on good practices for guidelines during the project development, check our blog post.

    "},{"location":"how_to_guides/dataset/#distribution","title":"Distribution","text":"

    When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the\u00a0TaskDistribution settings to configure the number of minimum submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues.

    Check the Task Distribution - Python Reference to see the TaskDistribution class in detail.

    rg.TaskDistribution(\n    min_submitted=2\n)\n

    To learn more about how to distribute the task among team members, check the Distribute the annotation guide.

    "},{"location":"how_to_guides/dataset/#list-datasets","title":"List datasets","text":"

    You can list all the datasets available in a workspace using the datasets attribute of the Workspace class. You can also use len(workspace.datasets) to get the number of datasets in a workspace.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndatasets = workspace.datasets\n\nfor dataset in datasets:\n    print(dataset)\n
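
    For example, to print just the number of datasets in the workspace:

    print(len(workspace.datasets))\n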

    When you list datasets, dataset settings are not preloaded, since this can introduce extra requests to the server. If you want to work with settings when listing datasets, you need to load them:

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nfor dataset in client.datasets:\n    dataset.settings.get() # this will get the dataset settings from the server\n    print(dataset.settings)\n

    Notebooks

    When using a notebook, executing client.datasets will display a table with the name of the existing datasets, their id, the workspace_id to which they belong, and the last update as updated_at.

    "},{"location":"how_to_guides/dataset/#retrieve-a-dataset","title":"Retrieve a dataset","text":"

    You can retrieve a dataset by calling the datasets method on the Argilla class and passing the name or id of the dataset as an argument. If the dataset does not exist, a warning message will be raised and None will be returned.

    By name / By id

    By default, this method attempts to retrieve the dataset from the first workspace. If the dataset is in a different workspace, you must specify either the workspace or workspace name as an argument.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Retrieve the dataset from the first workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\")\n\n# Retrieve the dataset from the specified workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/dataset/#check-dataset-existence","title":"Check dataset existence","text":"

    You can check if a dataset exists. The client.datasets method will return None if the dataset was not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nif dataset is not None:\n    pass\n
    "},{"location":"how_to_guides/dataset/#update-a-dataset","title":"Update a dataset","text":"

    Once a dataset is published, there are limited things you can update. Here is a summary of the attributes you can change for each setting:

    Fields / Questions / Metadata / Vectors / Guidelines / Distribution

    • Fields: Name (SDK \u274c, UI \u274c); Title (SDK \u2705, UI \u2705); Required (SDK \u274c, UI \u274c); Use markdown (SDK \u2705, UI \u2705)
    • Questions: Name (SDK \u274c, UI \u274c); Title (SDK \u274c, UI \u2705); Description (SDK \u274c, UI \u2705); Required (SDK \u274c, UI \u274c); Labels (SDK \u274c, UI \u274c); Values (SDK \u274c, UI \u274c); Label order (SDK \u274c, UI \u2705); Suggestions first (SDK \u274c, UI \u2705); Visible labels (SDK \u274c, UI \u2705); Field (SDK \u274c, UI \u274c); Allow overlapping (SDK \u274c, UI \u274c); Use markdown (SDK \u274c, UI \u2705)
    • Metadata: Name (SDK \u274c, UI \u274c); Title (SDK \u2705, UI \u2705); Options (SDK \u274c, UI \u274c); Minimum value (SDK \u274c, UI \u274c); Maximum value (SDK \u274c, UI \u274c); Visible for annotators (SDK \u2705, UI \u2705); Allow extra metadata (SDK \u2705, UI \u2705)
    • Vectors: Name (SDK \u274c, UI \u274c); Title (SDK \u2705, UI \u2705); Dimensions (SDK \u274c, UI \u274c)
    • Guidelines: SDK \u2705, UI \u2705
    • Distribution: Minimum submitted (SDK \u2705*, UI \u2705*)

    * Can be changed as long as the dataset doesn't have any responses.

    To modify these attributes, you can simply set the new value of the attributes you wish to change and call the update method on the Dataset object.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.fields[\"text\"].use_markdown = True\ndataset.settings.metadata[\"my_metadata\"].visible_for_annotators = False\n\ndataset.update()\n

    You can also add and delete metadata properties and vector fields using the add and delete methods.

    Add / Delete
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors.add(rg.VectorField(name=\"my_new_vector\", dimensions=123))\ndataset.settings.metadata.add(\n    rg.TermsMetadataProperty(\n        name=\"my_new_metadata\",\n        options=[\"option_1\", \"option_2\", \"option_3\"],\n    ),\n)\ndataset.update()\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors[\"my_old_vector\"].delete()\ndataset.settings.metadata[\"my_old_metadata\"].delete()\n\ndataset.update()\n
    "},{"location":"how_to_guides/dataset/#delete-a-dataset","title":"Delete a dataset","text":"

    You can delete an existing dataset by calling the delete method on the Dataset class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset_to_delete = client.datasets(name=\"my_dataset\")\n\ndataset_deleted = dataset_to_delete.delete()\n
    "},{"location":"how_to_guides/distribution/","title":"Distribute the annotation task among the team","text":"

    This guide explains how you can use Argilla\u2019s automatic task distribution to efficiently divide the task of annotating a dataset among multiple team members.

    Owners and admins can define the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle the records that are shown in the pending queues of all users with access to the dataset.

    When a record has met the minimum number of submissions, the status of the record will change to completed, and the record will be removed from the Pending queue of all team members so they can focus on providing responses where they are most needed. The dataset\u2019s annotation task will be fully completed once all records have the completed status.

    Note

    The status of a record can be either completed, when it has the required number of responses with submitted status, or pending, when it doesn\u2019t meet this requirement.

    Each record can have multiple responses, and each of those can have the status submitted, discarded, or draft.

    Main Class

    rg.TaskDistribution(\n    min_submitted=2\n)\n

    Check the Task Distribution - Python Reference to see the attributes, arguments, and methods of the TaskDistribution class in detail.

    "},{"location":"how_to_guides/distribution/#configure-task-distribution-settings","title":"Configure task distribution settings","text":"

    By default, Argilla will set the required minimum submitted responses to 1. This means that whenever a record has at least one response with the status submitted, the record's status will change to completed and it will be removed from the Pending queue of other team members.

    Tip

    Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

    If you wish to set a different number, you can do so through the distribution setting in your dataset settings:

    settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n

    Learn more about configuring dataset settings in the Dataset management guide.

    Tip

    Increase the number of minimum submissions if you\u2019d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.

    Note

    Note that some records may have more responses than expected if multiple team members submit responses on the same record simultaneously.

    "},{"location":"how_to_guides/distribution/#change-task-distribution-settings","title":"Change task distribution settings","text":"

    If you wish to change the minimum submitted responses required in a dataset, you can do so as long as the annotation hasn\u2019t started, i.e., the dataset has no responses for any records.

    Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.distribution.min_submitted = 4\n\ndataset.update()\n
    "},{"location":"how_to_guides/import_export/","title":"Importing and exporting datasets and records","text":"

    This guide provides an overview of how to import and export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

    In Argilla, you can import/export two main components of a dataset:

    • The dataset's complete configuration is defined in rg.Settings. This is useful if you want to share your feedback task or restore it later in Argilla.
    • The records stored in the dataset, including Metadata, Vectors, Suggestions, and Responses. This is useful if you want to use your dataset's records outside of Argilla.

    Main Classes

    rg.Dataset.to_hub / rg.Dataset.from_hub / rg.Dataset.to_disk / rg.Dataset.from_disk / rg.Dataset.records.to_datasets() / rg.Dataset.records.to_dict() / rg.Dataset.records.to_list()
    rg.Dataset.to_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    with_records=True,\n    generate_card=True\n)\n
    rg.Dataset.from_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
    rg.Dataset.to_disk(\n    path=\"<path-empty-directory>\",\n    with_records=True\n)\n
    rg.Dataset.from_disk(\n    path=\"<path-dataset-directory>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
    rg.Dataset.records.to_datasets()\n
    rg.Dataset.records.to_dict()\n
    rg.Dataset.records.to_list()\n

    Check the Dataset - Python Reference to see the attributes, arguments, and methods of the Dataset class in detail.

    Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

    "},{"location":"how_to_guides/import_export/#importing-and-exporting-datasets","title":"Importing and exporting datasets","text":"

    First, we will go through exporting a complete dataset from Argilla. This includes the dataset's settings and records. All of these methods use the rg.Dataset.from_* and rg.Dataset.to_* methods.

    "},{"location":"how_to_guides/import_export/#hugging-face-hub","title":"Hugging Face Hub","text":""},{"location":"how_to_guides/import_export/#export-to-hub","title":"Export to Hub","text":"

    You can push a dataset from Argilla to the Hugging Face Hub. This is useful if you want to share your dataset with the community or version control it. You can push the dataset to the Hugging Face Hub using the rg.Dataset.to_hub method.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_hub(repo_id=\"<my_org>/<my_dataset>\")\n

    With or without records

    The example above will push the dataset's Settings and records to the hub. If you only want to push the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

    dataset.to_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n
    "},{"location":"how_to_guides/import_export/#import-from-hub","title":"Import from Hub","text":"

    You can pull a dataset from the Hugging Face Hub to Argilla. This is useful if you want to restore a dataset and its configuration. You can pull the dataset from the Hugging Face Hub using the rg.Dataset.from_hub method.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\")\n

    The rg.Dataset.from_hub method loads the configuration and records from the dataset repo. If you only want to load records, you can pass a datasets.Dataset object to the rg.Dataset.records.log method. This enables you to configure your own dataset and reuse existing Hub datasets. See the guide on records for more information.

    With or without records

    The example above will pull the dataset's Settings and records from the hub. If you only want to pull the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

    dataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n

    With the dataset's configuration, you could then make changes to the dataset. For example, you could adapt the dataset's settings for a different task:

    dataset.settings.questions = [rg.TextQuestion(name=\"answer\")]\ndataset.update()\n

    You could then log the dataset's records using the load_dataset method of the datasets package and pass the dataset to the rg.Dataset.log method.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"<my_org>/<my_dataset>\")\ndataset.records.log(hf_dataset)\n
    "},{"location":"how_to_guides/import_export/#local-disk","title":"Local Disk","text":""},{"location":"how_to_guides/import_export/#export-to-disk","title":"Export to Disk","text":"

    You can save a dataset from Argilla to your local disk. This is useful if you want to back up your dataset. You can use the rg.Dataset.to_disk method. We recommend using an empty directory.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_disk(path=\"<path-empty-directory>\")\n

    This will save the dataset's configuration and records to the specified path. If you only want to save the dataset's configuration, you can set the with_records parameter to False.

    dataset.to_disk(path=\"<path-empty-directory>\", with_records=False)\n
    "},{"location":"how_to_guides/import_export/#import-from-disk","title":"Import from Disk","text":"

    You can load a dataset from your local disk to Argilla. This is useful if you want to restore a dataset's configuration. You can use the rg.Dataset.from_disk method.

    import argilla as rg\n\ndataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\")\n

    Directing the dataset to a name and workspace

    You can also specify the name and workspace of the dataset when loading it from the disk.

    dataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\", name=\"my_dataset\", workspace=\"my_workspace\")\n
    "},{"location":"how_to_guides/import_export/#importing-and-exporting-records","title":"Importing and exporting records","text":"

    The records alone can be exported from a dataset in Argilla. This is useful if you want to process the records in Python, export them to a different platform, or use them in model training. All of these methods use the rg.Dataset.records attribute.

    "},{"location":"how_to_guides/import_export/#export-records","title":"Export records","text":"

    The records can be exported as a dictionary, a list of dictionaries, or a Dataset of the datasets package.

    To a Python dictionary / To a Python list / To the datasets package

    Records can be exported from Dataset.records as a dictionary. The to_dict method can be used to export records as a dictionary. You can specify the orientation of the dictionary output. You can also decide whether or not to flatten the dictionary.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a dictionary\nexported_records = dataset.records.to_dict()\n# {'fields': [{'text': 'Hello'}, {'text': 'World'}], 'suggestions': [{'label': {'value': 'positive'}}, {'label': {'value': 'negative'}}]}\n\n# Export records as a dictionary with orient=index\nexported_records = dataset.records.to_dict(orient=\"index\")\n# {'uuid-1': {'fields': {'text': 'Hello'}, 'suggestions': {'label': {'value': 'positive'}}}, 'uuid-2': {'fields': {'text': 'World'}, 'suggestions': {'label': {'value': 'negative'}}}}\n\n# Export records as a dictionary with flatten=True\nexported_records = dataset.records.to_dict(flatten=True)\n# {\"text\": [\"Hello\", \"World\"], \"label.suggestion\": [\"greeting\", \"greeting\"]}\n

    Records can be exported from Dataset.records as a list of dictionaries. The to_list method can be used to export records as a list of dictionaries. You can decide whether or not to flatten it.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=workspace)\n\n# Export records as a list of dictionaries\nexported_records = dataset.records.to_list()\n# [{'fields': {'text': 'Hello'}, 'suggestion': {'label': {'value': 'greeting'}}}, {'fields': {'text': 'World'}, 'suggestion': {'label': {'value': 'greeting'}}}]\n\n# Export records as a list of dictionaries with flatten=True\nexported_records = dataset.records.to_list(flatten=True)\n# [{\"text\": \"Hello\", \"label\": \"greeting\"}, {\"text\": \"World\", \"label\": \"greeting\"}]\n

    Records can be exported from Dataset.records to the datasets package. The to_datasets method can be used to export records as a datasets.Dataset object.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a datasets.Dataset\nexported_dataset = dataset.records.to_datasets()\n
    "},{"location":"how_to_guides/import_export/#import-records","title":"Import records","text":"

    To import records to a dataset, use the rg.Dataset.records.log method. There is a guide on how to do this in How-to guides - Record, or you can check the Record - Python Reference.
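
    As a quick sketch (the field name is illustrative and should match your dataset settings):

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.records.log(\n    records=[\n        rg.Record(fields={\"text\": \"Hello\"}),\n        rg.Record(fields={\"text\": \"World\"}),\n    ]\n)\n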

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/","title":"Migrate your legacy datasets to Argilla V2","text":"

    This guide will help you migrate task-specific datasets to Argilla V2. These do not include the FeedbackDataset, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the SDK migration blog post.

    Note

    Legacy Datasets include: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

    FeedbackDatasets do not need to be migrated as they are already in the Argilla V2 format.

    To follow this guide, you will need to have the following prerequisites:

    • An argilla 1.* server instance running with legacy datasets.
    • An argilla >=1.29 server instance running. If you don't have one, you can create one by following this Argilla guide.
    • The argilla SDK package installed in your environment.

    If your current legacy datasets are on a server running an Argilla release later than 1.29, you can choose to recreate your legacy datasets as new datasets on the same server. You can then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#steps","title":"Steps","text":"

    The guide will take you through three steps:

    1. Retrieve the legacy dataset from the Argilla V1 server using the new argilla package.
    2. Define the new dataset in the Argilla V2 format.
    3. Upload the dataset records to the new Argilla V2 dataset format and attributes.
    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-legacy-dataset","title":"Step 1: Retrieve the legacy dataset","text":"

    Connect to the Argilla V1 server via the new argilla package. First, you should install an extra dependency:

    pip install \"argilla[legacy]\"\n

    Now, you can use the v1 module to connect to the Argilla V1 server.

    import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

    Next, load the dataset settings and records from the Argilla V1 server:

    dataset_name = \"news-programmatic-labeling\"\nworkspace = \"demo\"\n\nsettings_v1 = rg_v1.load_dataset_settings(dataset_name, workspace)\nrecords_v1 = rg_v1.load(dataset_name, workspace)\nhf_dataset = records_v1.to_datasets()\n

    Your legacy dataset is now loaded into the hf_dataset object.

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-define-the-new-dataset","title":"Step 2: Define the new dataset","text":"

    Define the new dataset in the Argilla V2 format. The new dataset format is defined in the argilla package. You can create a new dataset with the Settings and Dataset classes:

    First, instantiate the Argilla class to connect to the Argilla V2 server:

    import argilla as rg\n\nclient = rg.Argilla()\n

    Next, define the new dataset settings:

    For single-label classification / For multi-label classification / For token classification / For text generation
    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
    1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.

    2. Make sure you provide all relevant metadata fields available in the dataset.

    3. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.MultiLabelQuestion(name=\"labels\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
    1. The default field in DatasetForTextClassification is text, but we should provide all fields included in record.inputs.

    2. Make sure you provide all relevant metadata fields available in the dataset.

    3. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.SpanQuestion(name=\"spans\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
    1. Make sure you provide all relevant metadata fields available in the dataset.

    2. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"text_generation\"),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
    1. We should provide all relevant metadata fields available in the dataset.

    2. We should provide all relevant vectors available in the dataset.

    Finally, create the new dataset on the Argilla V2 server:

    dataset = rg.Dataset(name=dataset_name, settings=settings)\ndataset.create()\n

    Note

    If a dataset with the same name already exists, the create method will raise an exception. You can check if the dataset exists and delete it before creating a new one.

    dataset = client.datasets(name=dataset_name)\n\nif dataset is not None:\n    dataset.delete()\n
    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-3-upload-the-dataset-records","title":"Step 3: Upload the dataset records","text":"

    To upload the records to the new server, we will need to convert the records from the Argilla V1 format to the Argilla V2 format. The new argilla sdk package uses a generic Record class, but legacy datasets have specific record classes. We will need to convert the records to the generic Record class.

    Here are a set of example functions to convert the records for single-label and multi-label classification. You can modify these functions to suit your dataset.

    For single-label classification / For multi-label classification / For token classification / For text generation
    def map_to_record_for_single_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        label, score = prediction[0].values()\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"label\", # (1)\n                value=label,\n                score=score,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"label\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_multi_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        labels, scores = zip(*[(pred[\"label\"], pred[\"score\"]) for pred in prediction])\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"labels\", # (1)\n                value=labels,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"labels\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_span(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a token classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        scores = [span[\"score\"] for span in prediction]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"spans\", # (1)\n                value=prediction,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"spans\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_text_generation(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text2text record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        first = prediction[0]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"text_generation\", # (1)\n                value=first[\"text\"],\n                score=first[\"score\"],\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        # From data[annotation]\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"text_generation\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    The functions above depend on the users_by_name dictionary and the current_user object to assign responses to users, so we first need to load the existing users. You can retrieve the users from the Argilla V2 server and the current user as follows:

    users_by_name = {user.username: user for user in client.users}\ncurrent_user = client.me\n

    Finally, map the records with the functions defined above and upload them to the new dataset using the log method.

    records = []\n\nfor data in hf_records:\n    records.append(map_to_record_for_single_label(data, users_by_name, current_user))\n\n# Upload the records to the new dataset\ndataset.records.log(records)\n

    You have now successfully migrated your legacy dataset to Argilla V2. For more guides on how to use the Argilla SDK, please refer to the How to guides.

    "},{"location":"how_to_guides/query/","title":"Query and filter records","text":"

    This guide provides an overview of how to query and filter a dataset in Argilla.

    You can search for records in your dataset by querying or filtering. The query focuses on the content of the text field, while the filter is used to filter the records based on conditions. You can use them independently or combine multiple filters to create complex search queries. You can also export records from a dataset either as a single dictionary or a list of dictionaries.
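    For instance, a minimal sketch of both export options (a dataset object like the ones created in the examples below is assumed):

    exported_dict = dataset.records.to_dict()  # a single dictionary of records\nexported_list = dataset.records.to_list(flatten=True)  # a list of flattened dictionaries\n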

    Main Classes

    rg.Query rg.Filter
    rg.Query(\n    query=\"query\",\n    filter=filter\n)\n

    Check the Query - Python Reference to see the attributes, arguments, and methods of the Query class in detail.

    rg.Filter(\n    [\n        (\"field\", \"==\", \"value\"),\n    ]\n)\n

    Check the Filter - Python Reference to see the attributes, arguments, and methods of the Filter class in detail.

    "},{"location":"how_to_guides/query/#query-with-search-terms","title":"Query with search terms","text":"

    To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field. You can search for a single term or multiple terms; in the latter case, all of them must appear in the record for it to be retrieved.

    Single search termMultiple search term
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term1 my_term2\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#filter-by-conditions","title":"Filter by conditions","text":"

    You can use the Filter class to define conditions and pass them to the Dataset.records attribute to fetch records based on those conditions. The available operators are \"==\", \">=\", \"<=\", and \"in\". You can use dot notation to filter records based on metadata, suggestions, or responses, and you can apply a single condition or combine multiple conditions.

    The operators and their descriptions are:

    • == : The field value is equal to the value
    • >= : The field value is greater than or equal to the value
    • <= : The field value is less than or equal to the value
    • in : The field value is included in a list of values

    Single conditionMultiple conditions
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilter_label = rg.Filter((\"label\", \"==\", \"positive\"))\n\nfiltered_records = dataset.records(query=rg.Query(filter=filter_label)).to_list(\n    flatten=True\n)\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilters = rg.Filter(\n    [\n        (\"label.suggestion\", \"==\", \"positive\"),\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20),\n        (\"label\", \"in\", [\"positive\", \"negative\"])\n    ]\n)\n\nfiltered_records = dataset.records(\n    query=rg.Query(filter=filters), with_suggestions=True\n).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#filter-by-status","title":"Filter by status","text":"

    You can filter records based on record or response status. Record status can be pending or completed, and response status can be pending, draft, submitted, or discarded.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nstatus_filter = rg.Query(\n    filter=rg.Filter(\n        [\n            (\"status\", \"==\", \"completed\"),\n            (\"response.status\", \"==\", \"discarded\")\n        ]\n    )\n)\n\nfiltered_records = dataset.records(status_filter).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#query-and-filter-a-dataset","title":"Query and filter a dataset","text":"

    As mentioned, you can use a query with a search term together with one or more filters to create complex search queries.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery_filter = rg.Query(\n    query=\"my_term\",\n    filter=rg.Filter(\n        [\n            (\"label.suggestion\", \"==\", \"positive\"),\n            (\"metadata.count\", \">=\", 10),\n        ]\n    )\n)\n\nqueried_filtered_records = dataset.records(\n    query=query_filter,\n    with_metadata=True,\n    with_suggestions=True\n).to_list(flatten=True)\n
    "},{"location":"how_to_guides/record/","title":"Add, update, and delete records","text":"

    This guide provides an overview of records, explaining the basics of how to define and manage them in Argilla.

    A record in Argilla is a data item that requires annotation, consisting of one or more fields. These are the pieces of information displayed to the user in the UI to facilitate the completion of the annotation task. Each record also includes questions that annotators are required to answer, with the option of adding suggestions and responses to assist them. Guidelines are also provided to help annotators effectively complete their tasks.

    A record is part of a dataset, so you will need to create a dataset before adding records. Check this guide to learn how to create a dataset.

    Main Class

    rg.Record(\n    external_id=\"1234\",\n    fields={\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\"\n    },\n    metadata={\n        \"category\": \"A\"\n    },\n    vectors={\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    suggestions=[\n        rg.Suggestion(\"my_label\", \"positive\", score=0.9, agent=\"model_name\")\n    ],\n    responses=[\n        rg.Response(\"label\", \"positive\", user_id=user_id)\n    ],\n)\n

    Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

    "},{"location":"how_to_guides/record/#add-records","title":"Add records","text":"

    You can add records to a dataset in two different ways: either by using a dictionary or by directly initializing a Record object. You should ensure that fields, metadata and vectors match those configured in the dataset settings. In both cases, records are added via the Dataset.records.log method. As soon as you add the records, they will be available in the Argilla UI. If they do not appear in the UI, you may need to click the refresh button to update the view.

    Tip

    Take some time to inspect the data before adding it to the dataset in case this triggers changes in the questions or fields.

    Note

    If you are planning to use public data, the Datasets page of the Hugging Face Hub is a good place to start. Remember to always check the license to make sure you can legally use it for your specific use case.

    As Record objectsFrom a generic data structureFrom a Hugging Face dataset

    You can add records to a dataset by initializing a Record object directly. This is ideal if you need to apply logic to the data before defining the record. If the data is already structured, you should consider adding it directly as a dictionary or Hugging Face dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ), # (1)\n]\n\ndataset.records.log(records)\n
    1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create Record objects for each iteration.

    You can add the data directly as a dictionary-like structure, where the keys correspond to the names of fields, questions, metadata or vectors in the dataset, and the values are the data to be added.

    If your data structure does not correspond to your Argilla dataset names, you can use a mapping to indicate which keys in the source data correspond to the dataset fields, metadata, vectors, suggestions, or responses. If you need to add the same data to multiple attributes, you can also use a list with the names of the attributes, as sketched after the example below.

    We illustrate this with Python dictionaries that represent your data, but rather than defining raw dictionaries, we would advise you to instantiate records with the Record object.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Add records to the dataset with the fields 'question' and 'answer'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    }, # (1)\n]\ndataset.records.log(data)\n\n# Add records to the dataset with a mapping of the fields 'question' and 'answer'\ndata = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n]\ndataset.records.log(data, mapping={\"query\": \"question\", \"response\": \"answer\"}) # (2)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    2. The data structure has keys query and response, and the Argilla dataset has fields question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.
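    As a sketch of the list variant mentioned above (the key and attribute names here are illustrative), one source key can feed several dataset attributes at once:

    data = [{\"text\": \"Do you need oxygen to breathe?\"}]\n\n# The source key \"text\" feeds both the \"question\" field and the \"my_metadata\" metadata property\ndataset.records.log(data, mapping={\"text\": [\"question\", \"my_metadata\"]})\n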

    You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

    You can add the dataset where the column names correspond to the names of fields, metadata or vectors in the Argilla dataset.

    import argilla as rg\nfrom datasets import load_dataset\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\") # (1)\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (2)\n\ndataset.records.log(records=hf_dataset)\n
    1. In this case, we are using the my_dataset dataset from the Argilla workspace. The dataset has a text field and a label question.

    2. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset.

    If the Hugging Face dataset's schema does not correspond to your Argilla dataset field names, you can use a mapping to specify the relationship. You should indicate as key the column name of the Hugging Face dataset and, as value, the field name of the Argilla dataset.

    dataset.records.log(\n    records=hf_dataset, mapping={\"text\": \"review\", \"label\": \"sentiment\"}\n) # (1)\n
    1. In this case, the text key in the Hugging Face dataset would correspond to the review field in the Argilla dataset, and the label key in the Hugging Face dataset would correspond to the sentiment field in the Argilla dataset.
    "},{"location":"how_to_guides/record/#metadata","title":"Metadata","text":"

    Record metadata can include any information about the record that is not part of the fields, in the form of a dictionary. To use metadata for filtering and sorting records, make sure that the keys of the dictionary correspond to the metadata property names. When a key doesn't correspond, it will be considered extra metadata that gets stored with the record (as long as allow_extra_metadata is set to True for the dataset), but it will not be usable for filtering and sorting.

    Note

    Remember that to use metadata within a dataset, you must define a metadata property in the dataset settings.

    Check the Metadata - Python Reference to see the attributes, arguments, and methods for using metadata in detail.
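    As a hedged sketch of where this is configured (the field, question, and property names are illustrative), a metadata property is declared in the dataset settings alongside allow_extra_metadata:

    settings = rg.Settings(\n    fields=[rg.TextField(name=\"question\")],\n    questions=[rg.LabelQuestion(name=\"my_label\", labels=[\"positive\", \"negative\"])],\n    metadata=[rg.TermsMetadataProperty(name=\"my_metadata\")],\n    allow_extra_metadata=True,  # unmatched keys are stored but not filterable or sortable\n)\n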

    As Record objectsFrom a generic data structure

    You can add metadata to a record in an initialized Record object.

    # Add records to the dataset with the metadata 'category'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n]\ndataset.records.log(records)\n

    You can add metadata to a record directly as a dictionary structure, where the keys correspond to the names of metadata properties in the dataset and the values are the metadata to be added. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the metadata 'category'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_metadata\": \"option_1\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_metadata\": \"option_1\",\n    },\n]\ndataset.records.log(data)\n
    "},{"location":"how_to_guides/record/#vectors","title":"Vectors","text":"

    You can associate vectors, like text embeddings, to your records. They can be used for semantic search in the UI and the Python SDK. Make sure that the length of the list corresponds to the dimensions set in the vector settings.

    Note

    Remember that to use vectors within a dataset, you must define them in the dataset settings.

    Check the Vector - Python Reference to see the attributes, arguments, and methods of the Vector class in detail.
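    For reference, a sketch of the matching vector settings (the name my_vector and dimension 3 mirror the examples below):

    rg.VectorField(name=\"my_vector\", dimensions=3)  # vectors logged under \"my_vector\" must have length 3\n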

    As Record objectsFrom a generic data structure

    You can also add vectors to a record in an initialized Record object.

    # Add records to the dataset with the vector 'my_vector' and dimension=3\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        vectors={\n            \"my_vector\": [0.1, 0.2, 0.3]\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        vectors={\n            \"my_vector\": [0.2, 0.5, 0.3]\n        },\n    ),\n]\ndataset.records.log(records)\n

    You can add vectors from a dictionary-like structure, where the keys correspond to the names of the vector settings that were configured for your dataset and the value is a list of floats. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the vector 'my_vector' and dimension=3\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_vector\": [0.2, 0.5, 0.3],\n    },\n]\ndataset.records.log(data)\n
    "},{"location":"how_to_guides/record/#suggestions","title":"Suggestions","text":"

    Suggestions refer to suggested responses (e.g. model predictions) that you can add to your records to make the annotation process faster. These can be added during the creation of the record or at a later stage. Only one suggestion can be provided for each question, and suggestion values must be compliant with the pre-defined questions; e.g., if we have a RatingQuestion between 1 and 5, the suggestion should have a valid value within that range.

    Check the Suggestions - Python Reference to see the attributes, arguments, and methods of the Suggestion class in detail.
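    For example, a sketch of a valid suggestion for a RatingQuestion defined between 1 and 5 (the question name rating is illustrative):

    rg.Suggestion(question_name=\"rating\", value=4, agent=\"model_name\")  # 4 falls within the allowed 1-5 range\n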

    Tip

    Check the Suggestions - Python Reference for different formats per Question type.

    As Record objectsFrom a generic data structure

    You can also add suggestions to a record in an initialized Record object.

    # Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"positive\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"negative\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n]\ndataset.records.log(records)\n

    You can add suggestions as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the label question 'my_label'\ndata =  [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n]\ndataset.records.log(\n    data=data,\n    mapping={\n        \"label\": \"my_label\",\n        \"score\": \"my_label.suggestion.score\",\n        \"agent\": \"my_label.suggestion.agent\",\n    },\n)\n
    "},{"location":"how_to_guides/record/#responses","title":"Responses","text":"

    If your dataset includes some annotations, you can add those to the records as you create them. Make sure that the responses adhere to the same format as Argilla's output and meet the schema requirements for the specific type of question being answered. Make sure to include the user_id if you plan to add more than one response for the same question; otherwise, the responses will apply to all annotators.

    Check the Responses - Python Reference to see the attributes, arguments, and methods of the Response class in detail.

    Note

    Keep in mind that records with responses will be displayed as \"Draft\" in the UI.

    Tip

    Check the Responses - Python Reference for different formats per Question type.

    As Record objectsFrom a generic data structure

    You can also add responses to a record in an initialized Record object.

    # Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"positive\", user_id=user.id)\n        ]\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"negative\", user_id=user.id)\n        ]\n    ),\n]\ndataset.records.log(records)\n

    You can add responses as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure. If you want to specify the user that added the response, you can use the user_id parameter.

    # Add records to the dataset with the label 'my_label'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n    },\n]\ndataset.records.log(data, user_id=user.id, mapping={\"label\": \"my_label.response\"})\n
    "},{"location":"how_to_guides/record/#list-records","title":"List records","text":"

    To list records in a dataset, you can use the records method on the Dataset object. This method returns a list of Record objects that can be iterated over to access the record properties.

    for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_vectors=True\n):\n\n    # Access the record properties\n    print(record.metadata)\n    print(record.vectors)\n    print(record.suggestions)\n    print(record.responses)\n\n    # Access the responses of the record\n    for response in record.responses:\n        print(response.value)\n
    "},{"location":"how_to_guides/record/#update-records","title":"Update records","text":"

    You can update records in a dataset by calling the log method on the Dataset object. To update a record, you need to provide the record id and the new data to be updated.

    data = dataset.records.to_list(flatten=True)\n\nupdated_data = [\n    {\n        \"text\": sample[\"text\"],\n        \"label\": \"positive\",\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n
    Update the metadataUpdate vectorsUpdate suggestionsUpdate responses

    The metadata of the Record object is a python dictionary. To update it, you can iterate over the records and update the metadata by key. After that, you should update the records in the dataset.

    Tip

    Check the Metadata - Python Reference for different formats per MetadataProperty type.

    updated_records = []\n\nfor record in dataset.records():\n\n    record.metadata[\"my_metadata\"] = \"new_value\"\n    record.metadata[\"my_new_metadata\"] = \"new_value\"\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If a new vector field is added to the dataset settings or some value for the existing record vectors must be updated, you can iterate over the records and update the vectors by key. After that, you should update the records in the dataset.

    updated_records = []\n\nfor record in dataset.records(with_vectors=True):\n\n    record.vectors[\"my_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n    record.vectors[\"my_new_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If some value for the existing record suggestions must be updated, you can iterate over the records and update the suggestions by key. You can also add a suggestion using the add method. After that, you should update the records in the dataset.

    Tip

    Check the Suggestions - Python Reference for different formats per Question type.

    updated_records = []\n\nfor record in dataset.records(with_suggestions=True):\n\n    # We can update existing suggestions\n    record.suggestions[\"label\"].value = \"new_value\"\n    record.suggestions[\"label\"].score = 0.9\n    record.suggestions[\"label\"].agent = \"model_name\"\n\n    # We can also add new suggestions with the `add` method:\n    if not record.suggestions[\"label\"]:\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"value\", score=0.9, agent=\"model_name\")\n        )\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If some value for the existing record responses must be updated, you can iterate over the records and update the responses by key. You can also add a response using the add method. After that, you should update the records in the dataset.

    Tip

    Check the Responses - Python Reference for different formats per Question type.

    updated_records = []\n\nfor record in dataset.records(with_responses=True):\n\n    for response in record.responses[\"label\"]:\n\n        if response:\n            response.value = \"new_value\"\n            response.user_id = \"existing_user_id\"\n\n        else:\n            record.responses.add(rg.Response(\"label\", \"YES\", user_id=user.id))\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n
    "},{"location":"how_to_guides/record/#delete-records","title":"Delete records","text":"

    You can delete records in a dataset by calling the delete method on the Dataset object. To delete records, you need to retrieve them from the server and build a list with those that you want to delete.

    records_to_delete = list(dataset.records)[:5]\ndataset.records.delete(records=records_to_delete)\n

    Delete records based on a query

    This can be very useful, for example, to avoid deleting records that already have responses.

    For more information about the query syntax, check this how-to guide.

    status_filter = rg.Query(\n    filter = rg.Filter((\"response.status\", \"==\", \"pending\"))\n)\nrecords_to_delete = list(dataset.records(status_filter))\n\ndataset.records.delete(records_to_delete)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/","title":"Use Markdown to format rich content","text":"

    This guide provides an overview of how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

    The TextField and TextQuestion provide the option to enable Markdown and therefore HTML by setting use_markdown=True. Given the flexibility of HTML, we can get great control over the presentation of data to our annotators. We provide some out-of-the-box methods for multi-modality and chat templates in the examples below.
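    As a sketch of enabling this at the settings level (the field name markdown_enabled_field mirrors the examples below; the question name is illustrative):

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"markdown_enabled_field\", use_markdown=True)  # renders Markdown/HTML in the UI\n    ],\n    questions=[\n        rg.TextQuestion(name=\"comment\", use_markdown=True)\n    ],\n)\n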

    Main Methods

    image_to_htmlaudio_to_htmlvideo_to_htmlpdf_to_htmlchat_to_html
    image_to_html(\"local_image_file.png\")\n
    audio_to_html(\"local_audio_file.mp3\")\n
    audio_to_html(\"local_video_file.mp4\")\n
    pdf_to_html(\"local_pdf_file.pdf\")\n
    chat_to_html([{\"role\": \"user\", \"content\": \"hello\"}])\n

    Check the Markdown - Python Reference to see the arguments of the rg.markdown methods in detail.

    Tip

    You can get pretty creative with HTML. For example, think about visualizing graphs and tables. You can use methods from some interesting Python packages, like pandas.DataFrame.to_html and plotly.io.to_html.

    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#multi-modal-support-images-audio-video-pdfs-and-more","title":"Multi-modal support: images, audio, video, PDFs and more","text":"

    Argilla has basic multi-modal support in two different ways, each with pros and cons, but both offer the same UI experience because they rely on HTML.

    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#local-content-through-dataurls","title":"Local content through DataURLs","text":"

    A DataURL is a scheme that allows data to be encoded into a base64-encoded string and then embedded directly into HTML. To facilitate this, we offer some functions: image_to_html, audio_to_html, video_to_html, and pdf_to_html. These functions accept either the file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. Additionally, you can also set the width and height in pixels or percentages for video and image (defaults to the original dimensions) and the autoplay and loop attributes to True for audio and video (defaults to False).

    Warning

    DataURLs increase memory usage beyond the original file size. Additionally, different browsers enforce different size limitations for rendering DataURLs, which might break the visualization experience for some users.

    ImageAudioVideoPDF
    from argilla.markdown import image_to_html\n\nhtml = image_to_html(\n    \"local_image_file.png\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import audio_to_html\n\nhtml = audio_to_html(\n    \"local_audio_file.mp3\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import video_to_html\n\nhtml = video_to_html(\n    \"local_video_file.mp4\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import pdf_to_html\n\nhtml = pdf_to_html(\n    \"local_pdf_file.pdf\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#hosted-content","title":"Hosted content","text":"

    Instead of uploading local files through DataURLs, we can also visualize URLs directly linking to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to visualize content available on platforms like Google Drive or decide to configure a private media server.

    Warning

    When trying to access content from a private media server you have to ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.

    ImageAudioVideoPDF
    html = \"<img src='https://example.com/public-image-file.jpg'>\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<audio controls>\n    <source src=\"https://example.com/public-audio-file.mp3\" type=\"audio/mpeg\">\n</audio>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<video width=\"320\" height=\"240\" controls>\n    <source src=\"https://example.com/public-video-file.mp4\" type=\"video/mp4\">\n</video>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<iframe\n    src=\"https://example.com/public-pdf-file.pdf\"\n    width=\"600\"\n    height=\"500\">\n</iframe>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#chat-and-conversation-support","title":"Chat and conversation support","text":"

    When working with chat data from multi-turn interaction with a Large Language Model, it might be nice to be able to visualize the conversation in a similar way as a common chat interface. To facilitate this, we offer the chat_to_html function, which converts messages from OpenAI chat format to an HTML-formatted chat interface.

    OpenAI chat format

    The OpenAI chat format is a way to structure a list of messages as input from users and returns a model-generated message as output. These messages can only contain the roles \"user\" for human messages and \"assistant\", \"system\" or \"model\" for model-generated messages.

    from argilla.markdown import chat_to_html\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Hello! How are you?\"},\n    {\"role\": \"assistant\", \"content\": \"I'm good, thank you!\"}\n]\n\nhtml = chat_to_html(messages)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n

    "},{"location":"how_to_guides/user/","title":"User management","text":"

    This guide provides an overview of user roles and credentials, explaining how to set up and manage users in Argilla.

    A user in Argilla is an authorized person who, depending on their role, can use the Python SDK and access the UI in a running Argilla instance. We differentiate between three types of users depending on their role, permissions and needs: owner, admin and annotator.

    OverviewOwnerAdminAnnotator

    Permissions per role (Owner / Admin / Annotator):

    • Number: Unlimited / Unlimited / Unlimited
    • Create and delete workspaces: Yes / No / No
    • Assign users to workspaces: Yes / No / No
    • Create, configure, update, and delete datasets: Yes / Only within assigned workspaces / No
    • Create, update, and delete users: Yes / No / No
    • Provide feedback with Argilla UI: Yes / Yes / Yes

    The owner refers to the root user who created the Argilla instance. Using workspaces within Argilla proves highly beneficial for organizing tasks efficiently. So, the owner has full access to all workspaces and their functionalities:

    • Workspace management: It can create, read and delete a workspace.
    • User management: It can create a new user, assign it to a workspace, and delete it. It can also list them and search for a specific one.
    • Dataset management: It can create, configure, retrieve, update, and delete datasets.
    • Annotation: It can annotate datasets in the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.

    An admin user can only access the workspaces it has been assigned to and cannot assign other users to it. An admin user has the following permissions:

    • Dataset management: It can create, configure, retrieve, update, and delete datasets only on the assigned workspaces.
    • Annotation: It can annotate datasets in the assigned workspaces via the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.

    An annotator user is limited to accessing only the datasets assigned to it within the workspace. It has two specific permissions:

    • Annotation: It can annotate the assigned datasets in the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.
    Question: Who can manage users?

    Only users with the owner role can manage (create, retrieve, delete) other users.

    "},{"location":"how_to_guides/user/#initial-users-and-credentials","title":"Initial users and credentials","text":"

    Depending on your Argilla deployment, the initial user with the owner role will vary.

    • If you deploy on the Hugging Face Hub, the initial user will correspond to the Space owner (your personal account). The API key is automatically generated and can be copied from the \"Settings\" section of the UI.
    • If you deploy with Docker, the default values for the environment variables are: USERNAME: argilla, PASSWORD: 12345678, API_KEY: argilla.apikey.

    For new users, the username and password are set during the creation process. The API key can be copied from the \"Settings\" section of the UI.
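    For instance, a sketch of connecting with the Docker defaults above (the local URL and port are an assumption about your deployment):

    import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"http://localhost:6900\",  # assumed address of a local Docker deployment\n    api_key=\"argilla.apikey\"\n)\n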

    Main Class

    rg.User(\n    username=\"username\",\n    first_name=\"first_name\",\n    last_name=\"last_name\",\n    role=\"owner\",\n    password=\"password\",\n    client=client\n)\n

    Check the User - Python Reference to see the attributes, arguments, and methods of the User class in detail.

    "},{"location":"how_to_guides/user/#get-current-user","title":"Get current user","text":"

    To ensure you're using the correct credentials for managing users, you can get the current user in Argilla using the me attribute of the Argilla class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ncurrent_user = client.me\n
    "},{"location":"how_to_guides/user/#create-a-user","title":"Create a user","text":"

    To create a new user in Argilla, you can define it in the User class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_create = rg.User(\n    username=\"my_username\",\n    password=\"12345678\",\n)\n\ncreated_user = user_to_create.create()\n

    Accessing attributes

    Access the attributes of a user by calling them directly on the User object. For example, user.id or user.username.

    "},{"location":"how_to_guides/user/#list-users","title":"List users","text":"

    You can list all the existing users in Argilla by accessing the users attribute on the Argilla class and iterating over them. You can also use len(client.users) to get the number of users.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nusers = client.users\n\nfor user in users:\n    print(user)\n

    Notebooks

    When using a notebook, executing client.users will display a table with username, id, role, and the last update as updated_at.

    "},{"location":"how_to_guides/user/#retrieve-a-user","title":"Retrieve a user","text":"

    You can retrieve an existing user from Argilla by accessing the users attribute on the Argilla class and passing the username or id as an argument. If the user does not exist, a warning message will be raised and None will be returned.

    By usernameBy id
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(\"my_username\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/user/#check-user-existence","title":"Check user existence","text":"

    You can check if a user exists. The client.users method will return None if the user was not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users(\"my_username\")\n\nif user is not None:\n    pass\n
    "},{"location":"how_to_guides/user/#list-users-in-a-workspace","title":"List users in a workspace","text":"

    You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
    "},{"location":"how_to_guides/user/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

    You can add an existing user to a workspace in Argilla by calling the add_to_workspace method on the User class.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nadded_user = user.add_to_workspace(workspace)\n
    "},{"location":"how_to_guides/user/#remove-a-user-from-a-workspace","title":"Remove a user from a workspace","text":"

    You can remove an existing user from a workspace in Argilla by calling the remove_from_workspace method on the User class.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nremoved_user = user.remove_from_workspace(workspace)\n
    "},{"location":"how_to_guides/user/#delete-a-user","title":"Delete a user","text":"

    You can delete an existing user from Argilla by calling the delete method on the User class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_delete = client.users('my_username')\n\ndeleted_user = user_to_delete.delete()\n
    "},{"location":"how_to_guides/workspace/","title":"Workspace management","text":"

    This guide provides an overview of workspaces, explaining how to set up and manage workspaces in Argilla.

    A workspace is a space inside your Argilla instance where authorized users can collaborate on datasets. It is accessible through the Python SDK and the UI.

    Question: Who can manage workspaces?

    Only users with the owner role can manage (create, read and delete) workspaces.

    A user with the admin role can only read the workspace to which it belongs.

    "},{"location":"how_to_guides/workspace/#initial-workspaces","title":"Initial workspaces","text":"

    Depending on your Argilla deployment, the initial workspace will vary.

    • If you deploy on the Hugging Face Hub, the initial workspace will be the one indicated in the .oauth.yaml file. By default, argilla.
    • If you deploy with Docker, you will need to create a workspace as shown in the next section.

    Main Class

    rg.Workspace(\n    name = \"name\",\n    client=client\n)\n

    Check the Workspace - Python Reference to see the attributes, arguments, and methods of the Workspace class in detail.

    "},{"location":"how_to_guides/workspace/#create-a-new-workspace","title":"Create a new workspace","text":"

    To create a new workspace in Argilla, you can define it in the Workspace class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

    When you create a new workspace, it will be empty. To create and add a new dataset, check these guides.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_create = rg.Workspace(name=\"my_workspace\")\n\ncreated_workspace = workspace_to_create.create()\n

    Accessing attributes

    Access the attributes of a workspace by calling them directly on the Workspace object. For example, workspace.id or workspace.name.

    "},{"location":"how_to_guides/workspace/#list-workspaces","title":"List workspaces","text":"

    You can list all the existing workspaces in Argilla by calling the workspaces attribute on the Argilla class and iterating over them. You can also use len(client.workspaces) to get the number of workspaces.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspaces = client.workspaces\n\nfor workspace in workspaces:\n    print(workspace)\n

    Notebooks

    When using a notebook, executing client.workspaces will display a table with the number of datasets in each workspace, name, id, and the last update as updated_at.

    "},{"location":"how_to_guides/workspace/#retrieve-a-workspace","title":"Retrieve a workspace","text":"

    You can retrieve a workspace by accessing the workspaces method on the Argilla class and passing the name or id of the workspace as an argument. If the workspace does not exist, a warning message will be raised and None will be returned.

    By nameBy id
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(\"my_workspace\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/workspace/#check-workspace-existence","title":"Check workspace existence","text":"

    You can check if a workspace exists. The client.workspaces method will return None if the workspace is not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nif workspace is not None:\n    pass\n
    "},{"location":"how_to_guides/workspace/#list-users-in-a-workspace","title":"List users in a workspace","text":"

    You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
    "},{"location":"how_to_guides/workspace/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

    You can also add a user to a workspace by calling the add_user method on the Workspace class.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nadded_user = workspace.add_user(\"my_username\")\n
    "},{"location":"how_to_guides/workspace/#remove-a-user-from-workspace","title":"Remove a user from workspace","text":"

    You can also remove a user from a workspace by calling the remove_user method on the Workspace class.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nremoved_user = workspace.remove_user(\"my_username\")\n
    "},{"location":"how_to_guides/workspace/#delete-a-workspace","title":"Delete a workspace","text":"

    To delete a workspace, no dataset can be associated with it. If the workspace contains any dataset, deletion will fail. You can delete a workspace by calling the delete method on the Workspace class.

    To clear a workspace and delete all its datasets, refer to this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_delete = client.workspaces(\"my_workspace\")\n\ndeleted_workspace = workspace_to_delete.delete()\n
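    If the workspace still holds datasets, a minimal sketch of clearing it first (note that this permanently deletes every dataset in the workspace):

    workspace_to_clear = client.workspaces(\"my_workspace\")\n\n# Delete every dataset in the workspace, then the workspace itself\nfor dataset in workspace_to_clear.datasets:\n    dataset.delete()\n\nworkspace_to_clear.delete()\n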
    "},{"location":"reference/argilla/SUMMARY/","title":"SUMMARY","text":"
    • rg.Argilla
    • rg.Workspace
    • rg.User
    • rg.Dataset
      • rg.Dataset.records
    • rg.Settings
      • Fields
      • Questions
      • Metadata
      • Vectors
      • Distribution
    • rg.Record
      • rg.Response
      • rg.Suggestion
      • rg.Vector
      • rg.Metadata
    • rg.Query
    • rg.markdown
    "},{"location":"reference/argilla/client/","title":"rg.Argilla","text":"

    To interact with the Argilla server from Python you can use the Argilla class. The Argilla client is used to create, get, update, and delete all Argilla resources, such as workspaces, users, datasets, and records.

    "},{"location":"reference/argilla/client/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/client/#connecting-to-an-argilla-server","title":"Connecting to an Argilla server","text":"

    To connect to an Argilla server, instantiate the Argilla class and pass the api_url of the server and the api_key to authenticate.

    import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"https://argilla.example.com\",\n    api_key=\"my_api_key\",\n)\n
    "},{"location":"reference/argilla/client/#accessing-dataset-workspace-and-user-objects","title":"Accessing Dataset, Workspace, and User objects","text":"

    The Argilla client provides access to the Dataset, Workspace, and User objects of the Argilla server.

    my_dataset = client.datasets(\"my_dataset\")\n\nmy_workspace = client.workspaces(\"my_workspace\")\n\nmy_user = client.users(\"my_user\")\n

    These resources can then be interacted with to access their properties and methods. For example, to list all datasets in a workspace:

    for dataset in my_workspace.datasets:\n    print(dataset.name)\n
    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla","title":"Argilla","text":"

    Bases: APIClient

    Argilla API client. This is the main entry point to interact with the API.

    Attributes:

    • workspaces (Workspaces): A collection of workspaces.
    • datasets (Datasets): A collection of datasets.
    • users (Users): A collection of users.
    • me (User): The current user.

    Source code in src/argilla/client.py
    class Argilla(_api.APIClient):\n    \"\"\"Argilla API client. This is the main entry point to interact with the API.\n\n    Attributes:\n        workspaces: A collection of workspaces.\n        datasets: A collection of datasets.\n        users: A collection of users.\n        me: The current user.\n\n    \"\"\"\n\n    workspaces: \"Workspaces\"\n    datasets: \"Datasets\"\n    users: \"Users\"\n    me: \"User\"\n\n    # Default instance of Argilla\n    _default_client: Optional[\"Argilla\"] = None\n\n    def __init__(\n        self,\n        api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n        api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n        timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n        **http_client_args,\n    ) -> None:\n        super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, **http_client_args)\n\n        self._set_default(self)\n\n    @property\n    def workspaces(self) -> \"Workspaces\":\n        \"\"\"A collection of workspaces on the server.\"\"\"\n        return Workspaces(client=self)\n\n    @property\n    def datasets(self) -> \"Datasets\":\n        \"\"\"A collection of datasets on the server.\"\"\"\n        return Datasets(client=self)\n\n    @property\n    def users(self) -> \"Users\":\n        \"\"\"A collection of users on the server.\"\"\"\n        return Users(client=self)\n\n    @cached_property\n    def me(self) -> \"User\":\n        from argilla.users import User\n\n        return User(client=self, _model=self.api.users.get_me())\n\n    ############################\n    # Private methods\n    ############################\n\n    @classmethod\n    def _set_default(cls, client: \"Argilla\") -> None:\n        \"\"\"Set the default instance of Argilla.\"\"\"\n        cls._default_client = client\n\n    @classmethod\n    def _get_default(cls) -> \"Argilla\":\n        \"\"\"Get the default instance of Argilla. If it doesn't exist, create a new one.\"\"\"\n        if cls._default_client is None:\n            cls._default_client = Argilla()\n        return cls._default_client\n
    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.workspaces","title":"workspaces: Workspaces property","text":"

    A collection of workspaces on the server.

    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.datasets","title":"datasets: Datasets property","text":"

    A collection of datasets on the server.

    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.users","title":"users: Users property","text":"

    A collection of users on the server.

    "},{"location":"reference/argilla/markdown/","title":"rg.markdown","text":"

    To support the usage of Markdown within Argilla, we've created some helper functions to ease the use of DataURL conversions and chat message visualizations.

    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media","title":"media","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.video_to_html","title":"video_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

    Convert a video file to an HTML tag with embedded base64 data.

    Parameters:

    • file_source (Union[str, bytes], required): The path to the media file or a non-b64 encoded byte string.
    • file_type (Optional[str], default None): The type of the video file. If not provided, it will be inferred from the file extension.
    • width (Optional[str], default None): Display width in HTML.
    • height (Optional[str], default None): Display height in HTML.
    • autoplay (bool, default False): True to autoplay media.
    • loop (bool, default False): True to loop media.

    Returns:

    • str: The HTML tag with embedded base64 data.

    Examples:

    from argilla.markdown import video_to_html\nhtml = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
    Source code in src/argilla/markdown/media.py
    def video_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert a video file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the video file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import video_to_html\n        html = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"video\", file_source, file_type, width, height, autoplay, loop)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.audio_to_html","title":"audio_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

    Convert an audio file to an HTML tag with embedded base64 data.

    Parameters:

    • file_source (Union[str, bytes], required): The path to the media file or a non-b64 encoded byte string.
    • file_type (Optional[str], default None): The type of the audio file. If not provided, it will be inferred from the file extension.
    • width (Optional[str], default None): Display width in HTML.
    • height (Optional[str], default None): Display height in HTML.
    • autoplay (bool, default False): True to autoplay media.
    • loop (bool, default False): True to loop media.

    Returns:

    • str: The HTML tag with embedded base64 data.

    Examples:

    from argilla.markdown import audio_to_html\nhtml = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
    Source code in src/argilla/markdown/media.py
    def audio_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert an audio file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the audio file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import audio_to_html\n        html = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"audio\", file_source, file_type, width, height, autoplay, loop)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.image_to_html","title":"image_to_html(file_source, file_type=None, width=None, height=None)","text":"

    Convert an image file to an HTML tag with embedded base64 data.

    Parameters:

    Name Type Description Default file_source Union[str, bytes]

    The path to the media file or a non-b64 encoded byte string.

    required file_type Optional[str]

    The type of the image file. If not provided, it will be inferred from the file extension.

    None width Optional[str]

    Display width in HTML. Defaults to None.

    None height Optional[str]

    Display height in HTML. Defaults to None.

    None

    Returns:

    Type Description str

    The HTML tag with embedded base64 data.

    Examples:

    from argilla.markdown import image_to_html\nhtml = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n
    Source code in src/argilla/markdown/media.py
    def image_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n) -> str:\n    \"\"\"\n    Convert an image file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the image file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import image_to_html\n        html = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    return _media_to_html(\"image\", file_source, file_type, width, height)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.pdf_to_html","title":"pdf_to_html(file_source, width='1000px', height='1000px')","text":"

    Convert a pdf file to an HTML tag with embedded data.

    Parameters:

    Name Type Description Default file_source Union[str, bytes]

    The path to the PDF file, a bytes object with PDF data, or a URL.

    required width Optional[str]

    Display width in HTML. Defaults to \"1000px\".

    '1000px' height Optional[str]

    Display height in HTML. Defaults to \"1000px\".

    '1000px'

    Returns:

    Type Description str

    HTML string embedding the PDF.

    Raises:

    Type Description ValueError

    If the width and height are not valid pixel or percentage values.

    Examples:

    from argilla.markdown import pdf_to_html\nhtml = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n
    Source code in src/argilla/markdown/media.py
    def pdf_to_html(\n    file_source: Union[str, bytes], width: Optional[str] = \"1000px\", height: Optional[str] = \"1000px\"\n) -> str:\n    \"\"\"\n    Convert a pdf file to an HTML tag with embedded data.\n\n    Args:\n        file_source: The path to the PDF file, a bytes object with PDF data, or a URL.\n        width: Display width in HTML. Defaults to \"1000px\".\n        height: Display height in HTML. Defaults to \"1000px\".\n\n    Returns:\n        HTML string embedding the PDF.\n\n    Raises:\n        ValueError: If the width and height are not pixel or percentage.\n\n    Examples:\n        ```python\n        from argilla.markdown import pdf_to_html\n        html = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    if not _is_valid_dimension(width) or not _is_valid_dimension(height):\n        raise ValueError(\"Width and height must be valid pixel (e.g., '300px') or percentage (e.g., '50%') values.\")\n\n    if isinstance(file_source, str) and urlparse(file_source).scheme in [\"http\", \"https\"]:\n        return f'<embed src=\"{file_source}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></embed>'\n\n    file_data, _ = _get_file_data(file_source, \"pdf\")\n    pdf_base64 = base64.b64encode(file_data).decode(\"utf-8\")\n    data_url = f\"data:application/pdf;base64,{pdf_base64}\"\n    return f'<object id=\"pdf\" data=\"{data_url}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></object>'\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat","title":"chat","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat.chat_to_html","title":"chat_to_html(messages)","text":"

    Converts a list of chat messages in the OpenAI format to HTML.

    Parameters:

    Name Type Description Default messages List[Dict[str, str]]

    A list of dictionaries where each dictionary represents a chat message. Each dictionary should have the keys \"role\" (a string indicating the role of the sender, e.g., \"user\", \"model\", \"assistant\", \"system\") and \"content\" (the content of the message).

    required

    Returns:

    Name Type Description str str

    An HTML string that represents the chat conversation.

    Raises:

    Type Description ValueError

    If an invalid role is passed.

    Examples:

    from argilla.markdown import chat_to_html\nhtml = chat_to_html([\n    {\"role\": \"user\", \"content\": \"hello\"},\n    {\"role\": \"assistant\", \"content\": \"goodbye\"}\n])\n
    Source code in src/argilla/markdown/chat.py
    def chat_to_html(messages: List[Dict[str, str]]) -> str:\n    \"\"\"\n    Converts a list of chat messages in the OpenAI format to HTML.\n\n    Args:\n        messages (List[Dict[str, str]]): A list of dictionaries where each dictionary represents a chat message.\n            Each dictionary should have the keys:\n                - \"role\": A string indicating the role of the sender (e.g., \"user\", \"model\", \"assistant\", \"system\").\n                - \"content\": The content of the message.\n\n    Returns:\n        str: An HTML string that represents the chat conversation.\n\n    Raises:\n        ValueError: If an invalid role is passed.\n\n    Examples:\n        ```python\n        from argilla.markdown import chat_to_html\n        html = chat_to_html([\n            {\"role\": \"user\", \"content\": \"hello\"},\n            {\"role\": \"assistant\", \"content\": \"goodbye\"}\n        ])\n        ```\n    \"\"\"\n    chat_html = \"\"\n    for message in messages:\n        role = message[\"role\"]\n        content = message[\"content\"]\n        content_html = markdown.markdown(content)\n\n        if role == \"user\":\n            html = '<div class=\"user-message\">' + '<div class=\"message-content\">'\n        elif role in [\"model\", \"assistant\", \"system\"]:\n            html = '<div class=\"system-message\">' + '<div class=\"message-content\">'\n        else:\n            raise ValueError(f\"Invalid role: {role}\")\n\n        html += f\"{content_html}\"\n        html += \"</div></div>\"\n        chat_html += html\n\n    return f\"<body>{CHAT_CSS_STYLE}{chat_html}</body>\"\n
    "},{"location":"reference/argilla/search/","title":"rg.Query","text":"

    To collect records based on search criteria, you can use the Query and Filter classes. The Query class defines the search criteria, while the Filter class filters the search results. A Filter is passed to a Query object, so you can combine multiple filters to create complex search queries. A Query object can then be passed to Dataset.records to fetch records matching the criteria.

    "},{"location":"reference/argilla/search/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/search/#searching-for-records-with-terms","title":"Searching for records with terms","text":"

    To search for records with terms, you can use the Dataset.records attribute with a query string. Records that contain the search terms in their text fields are returned.

    for record in dataset.records(query=\"paris\"):\n    print(record)\n
    "},{"location":"reference/argilla/search/#filtering-records-by-conditions","title":"Filtering records by conditions","text":"

    Argilla allows you to filter records based on conditions. You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records that match. Supported operators include \"==\", \">=\", \"<=\", and \"in\". Conditions use dot notation to filter records based on metadata, suggestions, or responses.

    # create a range from 10 to 20\nrange_filter = rg.Filter(\n    [\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20)\n    ]\n)\n\n# query records with metadata count greater than 10 and less than 20\nquery = rg.Query(filters=range_filter, query=\"paris\")\n\n# iterate over the results\nfor record in dataset.records(query=query):\n    print(record)\n
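
    For instance, assuming your dataset has a question named label (a placeholder name), a single condition can be passed as one tuple and combined with a search term:

    # illustrative sketch: keep only records whose label matches one of the given values\nlabel_filter = rg.Filter((\"label\", \"in\", [\"positive\", \"happy\"]))\n\nfor record in dataset.records(query=rg.Query(query=\"paris\", filter=label_filter)):\n    print(record)\n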
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Query","title":"Query","text":"

    This class is used to map user queries to the internal query models

    Source code in src/argilla/records/_search.py
    class Query:\n    \"\"\"This class is used to map user queries to the internal query models\"\"\"\n\n    query: Optional[str] = None\n\n    def __init__(self, *, query: Union[str, None] = None, filter: Union[Filter, None] = None):\n        \"\"\"Create a query object for use in Argilla search requests.\n\n        Parameters:\n            query (Union[str, None], optional): The query string that will be used to search.\n            filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n        \"\"\"\n\n        self.query = query\n        self.filter = filter\n\n    def api_model(self) -> SearchQueryModel:\n        model = SearchQueryModel()\n\n        if self.query is not None:\n            text_query = TextQueryModel(q=self.query)\n            model.query = QueryModel(text=text_query)\n\n        if self.filter is not None:\n            model.filters = self.filter.api_model()\n\n        return model\n
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Query.__init__","title":"__init__(*, query=None, filter=None)","text":"

    Create a query object for use in Argilla search requests.

    Parameters:

    Name Type Description Default query Union[str, None]

    The query string that will be used to search.

    None filter Union[Filter, None]

    The filter object that will be used to filter the search results.

    None Source code in src/argilla/records/_search.py
    def __init__(self, *, query: Union[str, None] = None, filter: Union[Filter, None] = None):\n    \"\"\"Create a query object for use in Argilla search requests.\n\n    Parameters:\n        query (Union[str, None], optional): The query string that will be used to search.\n        filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n    \"\"\"\n\n    self.query = query\n    self.filter = filter\n
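
    As a minimal sketch, both arguments are keyword-only and optional, so a query can be built from a string, a filter, or both (all values here are placeholders):

    query_only = rg.Query(query=\"paris\")\nfilter_only = rg.Query(filter=rg.Filter((\"label\", \"in\", [\"positive\", \"happy\"])))\ncombined = rg.Query(query=\"paris\", filter=rg.Filter((\"metadata.count\", \">=\", 10)))\n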
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Filter","title":"Filter","text":"

    This class is used to map user filters to the internal filter models

    Source code in src/argilla/records/_search.py
    class Filter:\n    \"\"\"This class is used to map user filters to the internal filter models\"\"\"\n\n    def __init__(self, conditions: Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None] = None):\n        \"\"\" Create a filter object for use in Argilla search requests.\n\n        Parameters:\n            conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n                The conditions that will be used to filter the search results. \\\n                The conditions should be a list of tuples where each tuple contains \\\n                the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n\n        \"\"\"\n\n        if isinstance(conditions, tuple):\n            conditions = [conditions]\n        self.conditions = [Condition(condition) for condition in conditions]\n\n    def api_model(self) -> AndFilterModel:\n        return AndFilterModel.model_validate({\"and\": [condition.api_model() for condition in self.conditions]})\n
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Filter.__init__","title":"__init__(conditions=None)","text":"

    Create a filter object for use in Argilla search requests.

    Parameters:

    Name Type Description Default conditions Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None]

    The conditions that will be used to filter the search results. The conditions should be a list of tuples where each tuple contains the field, operator, and value. For example (\"label\", \"in\", [\"positive\",\"happy\"]).

    None Source code in src/argilla/records/_search.py
    def __init__(self, conditions: Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None] = None):\n    \"\"\" Create a filter object for use in Argilla search requests.\n\n    Parameters:\n        conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n            The conditions that will be used to filter the search results. \\\n            The conditions should be a list of tuples where each tuple contains \\\n            the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n\n    \"\"\"\n\n    if isinstance(conditions, tuple):\n        conditions = [conditions]\n    self.conditions = [Condition(condition) for condition in conditions]\n
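
    Because a lone tuple is wrapped into a list internally, the following two sketches are equivalent (the field name is a placeholder):

    rg.Filter((\"metadata.count\", \">=\", 10))\nrg.Filter([(\"metadata.count\", \">=\", 10)])\n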
    "},{"location":"reference/argilla/users/","title":"rg.User","text":"

    A user in Argilla is an account that interacts with the SDK or UI. A user's profile can be used to track their feedback activity and to manage their access to the Argilla server.

    "},{"location":"reference/argilla/users/#usage-examples","title":"Usage Examples","text":"

    To create a new user, instantiate the User object with the client and the username:

    user = rg.User(username=\"my_username\", password=\"my_password\")\nuser.create()\n

    Existing users can be retrieved by their username:

    user = client.users(\"my_username\")\n

    The current user of the rg.Argilla client can be accessed using the me attribute:

    client.me\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User","title":"User","text":"

    Bases: Resource

    Class for interacting with Argilla users in the Argilla server. User profiles are used to manage access to the Argilla server and track responses to records.

    Attributes:

    Name Type Description username str

    The username of the user.

    first_name str

    The first name of the user.

    last_name str

    The last name of the user.

    role str

    The role of the user, either 'annotator', 'admin', or 'owner'.

    password str

    The password of the user.

    id UUID

    The ID of the user.

    Source code in src/argilla/users/_resource.py
    class User(Resource):\n    \"\"\"Class for interacting with Argilla users in the Argilla server. User profiles \\\n        are used to manage access to the Argilla server and track responses to records.\n\n    Attributes:\n        username (str): The username of the user.\n        first_name (str): The first name of the user.\n        last_name (str): The last name of the user.\n        role (str): The role of the user, either 'annotator', 'admin', or 'owner'.\n        password (str): The password of the user.\n        id (UUID): The ID of the user.\n    \"\"\"\n\n    _model: UserModel\n    _api: UsersAPI\n\n    def __init__(\n        self,\n        username: Optional[str] = None,\n        first_name: Optional[str] = None,\n        last_name: Optional[str] = None,\n        role: Optional[str] = None,\n        password: Optional[str] = None,\n        client: Optional[\"Argilla\"] = None,\n        id: Optional[UUID] = None,\n        _model: Optional[UserModel] = None,\n    ) -> None:\n        \"\"\"Initializes a User object with a client and a username\n\n        Parameters:\n            username (str): The username of the user\n            first_name (str): The first name of the user\n            last_name (str): The last name of the user\n            role (str): The role of the user, either 'annotator', 'admin', or 'owner'\n            password (str): The password of the user\n            client (Argilla): The client used to interact with Argilla\n\n        Returns:\n            User: The initialized user object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.users)\n\n        if _model is None:\n            _model = UserModel(\n                username=username,\n                password=password,\n                first_name=first_name or username,\n                last_name=last_name,\n                role=role or Role.annotator,\n                id=id,\n            )\n            self._log_message(f\"Initialized user with username {username}\")\n        self._model = _model\n\n    def create(self) -> \"User\":\n        \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n        Returns:\n            User: The user that was created in Argilla.\n        \"\"\"\n        model_create = self.api_model()\n        model = self._api.create(model_create)\n        # The password is not returned in the response\n        model.password = model_create.password\n        self._model = model\n        return self\n\n    def delete(self) -> None:\n        \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n        super().delete()\n        # exists relies on the id, so we need to set it to None\n        self._model = UserModel(username=self.username)\n\n    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to add the user to.\n\n        Returns:\n            User: The user that was added to the workspace.\n        \"\"\"\n        self._model = self._api.add_to_workspace(workspace.id, self.id)\n        return self\n\n    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n        the datasets in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to remove the user from.\n\n        Returns:\n            User: The user that was removed from the workspace.\n\n        \"\"\"\n        self._model = self._api.delete_from_workspace(workspace.id, self.id)\n        return self\n\n    ############################\n    # Properties\n    ############################\n    @property\n    def username(self) -> str:\n        return self._model.username\n\n    @username.setter\n    def username(self, value: str) -> None:\n        self._model.username = value\n\n    @property\n    def password(self) -> str:\n        return self._model.password\n\n    @password.setter\n    def password(self, value: str) -> None:\n        self._model.password = value\n\n    @property\n    def first_name(self) -> str:\n        return self._model.first_name\n\n    @first_name.setter\n    def first_name(self, value: str) -> None:\n        self._model.first_name = value\n\n    @property\n    def last_name(self) -> str:\n        return self._model.last_name\n\n    @last_name.setter\n    def last_name(self, value: str) -> None:\n        self._model.last_name = value\n\n    @property\n    def role(self) -> Role:\n        return self._model.role\n\n    @role.setter\n    def role(self, value: Role) -> None:\n        self._model.role = value\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.__init__","title":"__init__(username=None, first_name=None, last_name=None, role=None, password=None, client=None, id=None, _model=None)","text":"

    Initializes a User object with a client and a username

    Parameters:

    Name Type Description Default username str

    The username of the user

    None first_name str

    The first name of the user

    None last_name str

    The last name of the user

    None role str

    The role of the user, either 'annotator', 'admin', or 'owner'

    None password str

    The password of the user

    None client Argilla

    The client used to interact with Argilla

    None

    Returns:

    Name Type Description User None

    The initialized user object

    Source code in src/argilla/users/_resource.py
    def __init__(\n    self,\n    username: Optional[str] = None,\n    first_name: Optional[str] = None,\n    last_name: Optional[str] = None,\n    role: Optional[str] = None,\n    password: Optional[str] = None,\n    client: Optional[\"Argilla\"] = None,\n    id: Optional[UUID] = None,\n    _model: Optional[UserModel] = None,\n) -> None:\n    \"\"\"Initializes a User object with a client and a username\n\n    Parameters:\n        username (str): The username of the user\n        first_name (str): The first name of the user\n        last_name (str): The last name of the user\n        role (str): The role of the user, either 'annotator', 'admin', or 'owner'\n        password (str): The password of the user\n        client (Argilla): The client used to interact with Argilla\n\n    Returns:\n        User: The initialized user object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.users)\n\n    if _model is None:\n        _model = UserModel(\n            username=username,\n            password=password,\n            first_name=first_name or username,\n            last_name=last_name,\n            role=role or Role.annotator,\n            id=id,\n        )\n        self._log_message(f\"Initialized user with username {username}\")\n    self._model = _model\n
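
    For example, a hypothetical annotator account could be initialized and created like this (all values are placeholders):

    user = rg.User(\n    username=\"my_username\",\n    first_name=\"First\",\n    last_name=\"Last\",\n    role=\"annotator\",\n    password=\"my_password\",\n)\nuser.create()\n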
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.create","title":"create()","text":"

    Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.

    Returns:

    Name Type Description User User

    The user that was created in Argilla.

    Source code in src/argilla/users/_resource.py
    def create(self) -> \"User\":\n    \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n    Returns:\n        User: The user that was created in Argilla.\n    \"\"\"\n    model_create = self.api_model()\n    model = self._api.create(model_create)\n    # The password is not returned in the response\n    model.password = model_create.password\n    self._model = model\n    return self\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.delete","title":"delete()","text":"

    Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.

    Source code in src/argilla/users/_resource.py
    def delete(self) -> None:\n    \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n    super().delete()\n    # exists relies on the id, so we need to set it to None\n    self._model = UserModel(username=self.username)\n
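
    A minimal usage sketch, assuming the user already exists on the server:

    user = client.users(\"my_username\")\nuser.delete()\n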
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.add_to_workspace","title":"add_to_workspace(workspace)","text":"

    Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default workspace Workspace

    The workspace to add the user to.

    required

    Returns:

    Name Type Description User User

    The user that was added to the workspace.

    Source code in src/argilla/users/_resource.py
    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to add the user to.\n\n    Returns:\n        User: The user that was added to the workspace.\n    \"\"\"\n    self._model = self._api.add_to_workspace(workspace.id, self.id)\n    return self\n
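
    For instance, assuming a workspace named my_workspace and a user named my_username already exist:

    workspace = client.workspaces(\"my_workspace\")\nuser = client.users(\"my_username\")\nuser.add_to_workspace(workspace)\n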
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.remove_from_workspace","title":"remove_from_workspace(workspace)","text":"

    Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default workspace Workspace

    The workspace to remove the user from.

    required

    Returns:

    Name Type Description User User

    The user that was removed from the workspace.

    Source code in src/argilla/users/_resource.py
    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n    the datasets in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to remove the user from.\n\n    Returns:\n        User: The user that was removed from the workspace.\n\n    \"\"\"\n    self._model = self._api.delete_from_workspace(workspace.id, self.id)\n    return self\n
    "},{"location":"reference/argilla/workspaces/","title":"rg.Workspace","text":"

    In Argilla, workspaces are used to organize datasets into groups. For example, you might have a workspace for each project or team.

    "},{"location":"reference/argilla/workspaces/#usage-examples","title":"Usage Examples","text":"

    To create a new workspace, instantiate the Workspace object with the client and the name:

    workspace = rg.Workspace(name=\"my_workspace\")\nworkspace.create()\n

    To retrieve an existing workspace, use the client.workspaces attribute:

    workspace = client.workspaces(\"my_workspace\")\n
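
    Once retrieved, the datasets and users properties list the workspace's contents, as in this short sketch:

    for dataset in workspace.datasets:\n    print(dataset.name)\n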
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace","title":"Workspace","text":"

    Bases: Resource

    Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.

    Attributes:

    Name Type Description name str

    The name of the workspace.

    id UUID

    The ID of the workspace. This is a unique identifier for the workspace in the server.

    datasets List[Dataset]

    A list of all datasets in the workspace.

    users WorkspaceUsers

    A list of all users in the workspace.

    Source code in src/argilla/workspaces/_resource.py
    class Workspace(Resource):\n    \"\"\"Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.\n\n    Attributes:\n        name (str): The name of the workspace.\n        id (UUID): The ID of the workspace. This is a unique identifier for the workspace in the server.\n        datasets (List[Dataset]): A list of all datasets in the workspace.\n        users (WorkspaceUsers): A list of all users in the workspace.\n    \"\"\"\n\n    name: Optional[str]\n\n    _api: \"WorkspacesAPI\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        id: Optional[UUID] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a Workspace object with a client and a name or id\n\n        Parameters:\n            client (Argilla): The client used to interact with Argilla\n            name (str): The name of the workspace\n            id (UUID): The id of the workspace\n\n        Returns:\n            Workspace: The initialized workspace object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.workspaces)\n\n        self._model = WorkspaceModel(name=name, id=id)\n\n    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was added to the workspace\n        \"\"\"\n        return self.users.add(user)\n\n    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access to\n        the datasets in the workspace.\n\n        Args:\n            user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was removed from the workspace.\n        \"\"\"\n        return self.users.delete(user)\n\n    # TODO: Make this method private\n    def list_datasets(self) -> List[\"Dataset\"]:\n        from argilla.datasets import Dataset\n\n        datasets = self._client.api.datasets.list(self.id)\n        self._log_message(f\"Got {len(datasets)} datasets for workspace {self.id}\")\n        return [Dataset.from_model(model=dataset, client=self._client) for dataset in datasets]\n\n    @classmethod\n    def from_model(cls, model: WorkspaceModel, client: Argilla) -> \"Workspace\":\n        instance = cls(name=model.name, id=model.id, client=client)\n        instance._model = model\n\n        return instance\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def name(self) -> Optional[str]:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def datasets(self) -> List[\"Dataset\"]:\n        \"\"\"List all datasets in the workspace\n\n        Returns:\n            List[Dataset]: A list of all datasets in the workspace\n        \"\"\"\n        return self.list_datasets()\n\n    @property\n    def users(self) -> \"WorkspaceUsers\":\n        \"\"\"List all users in the workspace\n\n        Returns:\n            WorkspaceUsers: A list of all users in the workspace\n        \"\"\"\n        return WorkspaceUsers(workspace=self)\n
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.datasets","title":"datasets: List[Dataset] property","text":"

    List all datasets in the workspace

    Returns:

    Type Description List[Dataset]

    List[Dataset]: A list of all datasets in the workspace

    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.users","title":"users: WorkspaceUsers property","text":"

    List all users in the workspace

    Returns:

    Name Type Description WorkspaceUsers WorkspaceUsers

    A list of all users in the workspace

    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.__init__","title":"__init__(name=None, id=None, client=None)","text":"

    Initializes a Workspace object with a client and a name or id

    Parameters:

    Name Type Description Default client Argilla

    The client used to interact with Argilla

    None name str

    The name of the workspace

    None id UUID

    The id of the workspace

    None

    Returns:

    Name Type Description Workspace None

    The initialized workspace object

    Source code in src/argilla/workspaces/_resource.py
    def __init__(\n    self,\n    name: Optional[str] = None,\n    id: Optional[UUID] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a Workspace object with a client and a name or id\n\n    Parameters:\n        client (Argilla): The client used to interact with Argilla\n        name (str): The name of the workspace\n        id (UUID): The id of the workspace\n\n    Returns:\n        Workspace: The initialized workspace object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.workspaces)\n\n    self._model = WorkspaceModel(name=name, id=id)\n
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.add_user","title":"add_user(user)","text":"

    Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default user Union[User, str]

    The user to add to the workspace. Can be a User object or a username.

    required

    Returns:

    Name Type Description User User

    The user that was added to the workspace

    Source code in src/argilla/workspaces/_resource.py
    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was added to the workspace\n    \"\"\"\n    return self.users.add(user)\n
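
    Since user can be a User object or a username, a sketch as simple as this should work (the username is a placeholder):

    workspace.add_user(\"my_username\")\n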
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.remove_user","title":"remove_user(user)","text":"

    Removes a user from the workspace. After removing a user from the workspace, it will no longer have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default user Union[User, str]

    The user to remove from the workspace. Can be a User object or a username.

    required

    Returns:

    Name Type Description User User

    The user that was removed from the workspace.

    Source code in src/argilla/workspaces/_resource.py
    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n    to the datasets in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was removed from the workspace.\n    \"\"\"\n    return self.users.delete(user)\n
    "},{"location":"reference/argilla/datasets/dataset_records/","title":"rg.Dataset.records","text":""},{"location":"reference/argilla/datasets/dataset_records/#usage-examples","title":"Usage Examples","text":"

    In most cases, you will not need to create a DatasetRecords object directly. Instead, you can access it via the Dataset object:

    dataset.records\n

    For users familiar with legacy approaches

    1. The Dataset.records object is used to interact with the records in a dataset. It fetches records from the server in batches, without keeping a local copy of the records.
    2. The log method of Dataset.records is used to both add and update records in a dataset. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added.
    "},{"location":"reference/argilla/datasets/dataset_records/#adding-records-to-a-dataset","title":"Adding records to a dataset","text":"

    To add records to a dataset, use the log method. Records can be added as dictionaries or as Record objects, and a single record can also be passed on its own as a dictionary or a Record.

    Tabs: As a Record object | From a data structure | From a data structure with a mapping | From a Hugging Face dataset

    You can also add records to a dataset by initializing a Record object directly.

    records = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ),\n] # (1)\n\ndataset.records.log(records)\n
    1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.
    data = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    },\n] # (1)\n\ndataset.records.log(data)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    data = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n] # (1)\ndataset.records.log(\n    records=data,\n    mapping={\"query\": \"question\", \"response\": \"answer\"} # (2)\n)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    2. The data structure has keys query and response and the Argilla dataset has question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.

    You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

    You can add the dataset directly when its column names correspond to the names of fields, questions, metadata, or vectors in the Argilla dataset.

    If the dataset's schema does not correspond to your Argilla dataset names, you can use a mapping to indicate which columns in the dataset correspond to the Argilla dataset fields.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset)\n
    1. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map of the datasets library to prepare the data before adding it to the Argilla dataset.

    Here we use the mapping parameter to specify the relationship between the Hugging Face dataset and the Argilla dataset.

    dataset.records.log(records=hf_dataset, mapping={\"txt\": \"text\", \"y\": \"label\"}) # (1)\n
    1. In this case, the txt key in the Hugging Face dataset corresponds to the text field in the Argilla dataset, and the y key in the Hugging Face dataset corresponds to the label field in the Argilla dataset.
    "},{"location":"reference/argilla/datasets/dataset_records/#updating-records-in-a-dataset","title":"Updating records in a dataset","text":"

    Records can also be updated using the log method with records that contain an id to identify the records to be updated. As above, records can be added as dictionaries or as Record objects.

    Tabs: As a Record object | From a data structure | From a data structure with a mapping | From a Hugging Face dataset

    You can update records in a dataset by initializing a Record object directly and providing the id field.

    records = [\n    rg.Record(\n        metadata={\"department\": \"toys\"},\n        id=\"2\" # (1)\n    ),\n]\n\ndataset.records.log(records)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

    You can also update records in a dataset by providing the id field in the data structure.

    data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(data)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

    You can also update records in a dataset by providing the id field in the data structure and using a mapping to map the keys in the data structure to the fields in the dataset.

    data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"my_id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(\n    records=data,\n    mapping={\"my_id\": \"id\"} # (2)\n)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.
    2. Let's say that your data structure has keys my_id instead of id. You can use the mapping parameter to map the keys in the data structure to the fields in the dataset.

    You can also update records in an Argilla dataset using a Hugging Face dataset. To update records, the Hugging Face dataset must contain an id field to identify the records to be updated, or you can use a mapping to map the keys in the Hugging Face dataset to the fields in the Argilla dataset.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset, mapping={\"uuid\": \"id\"}) # (2)\n
    1. In this example, the Hugging Face dataset matches the Argilla dataset schema.
    2. The uuid key in the Hugging Face dataset corresponds to the id field in the Argilla dataset.
    "},{"location":"reference/argilla/datasets/dataset_records/#iterating-over-records-in-a-dataset","title":"Iterating over records in a dataset","text":"

    Dataset.records can be used to iterate over records in a dataset from the server. The records will be fetched in batches from the server:

    for record in dataset.records:\n    print(record)\n\n# Fetch records with suggestions and responses\nfor record in dataset.records(with_suggestions=True, with_responses=True):\n    print(record.suggestions)\n    print(record.responses)\n\n# Filter records by a query and fetch records with vectors\nfor record in dataset.records(query=\"capital\", with_vectors=True):\n    print(record.vectors)\n

    Check out the rg.Record class reference for more information on the properties and methods available on a record and the rg.Query class reference for more information on the query syntax.
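
    Records fetched this way can also be passed back to other methods; for example, this illustrative sketch deletes all records matching a query (the query is a placeholder):

    records_to_delete = list(dataset.records(query=\"paris\"))\ndataset.records.delete(records=records_to_delete)\n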

    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords","title":"DatasetRecords","text":"

    Bases: Iterable[Record], LoggingMixin

    This class is used to work with records from a dataset and is accessed via Dataset.records. The responsibility of this class is to provide an interface to interact with records in a dataset, by adding, updating, fetching, querying, deleting, and exporting records.

    Attributes:

    Name Type Description client Argilla

    The Argilla client object.

    dataset Dataset

    The dataset object.

    Source code in src/argilla/records/_dataset_records.py
    class DatasetRecords(Iterable[Record], LoggingMixin):\n    \"\"\"This class is used to work with records from a dataset and is accessed via `Dataset.records`.\n    The responsibility of this class is to provide an interface to interact with records in a dataset,\n    by adding, updating, fetching, querying, deleting, and exporting records.\n\n    Attributes:\n        client (Argilla): The Argilla client object.\n        dataset (Dataset): The dataset object.\n    \"\"\"\n\n    _api: RecordsAPI\n\n    DEFAULT_BATCH_SIZE = 256\n    DEFAULT_DELETE_BATCH_SIZE = 64\n\n    def __init__(self, client: \"Argilla\", dataset: \"Dataset\"):\n        \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n        Args:\n            client: An Argilla client object.\n            dataset: A Dataset object.\n        \"\"\"\n        self.__client = client\n        self.__dataset = dataset\n        self._api = self.__client.api.records\n\n    def __iter__(self):\n        return DatasetRecordsIterator(self.__dataset, self.__client, with_suggestions=True, with_responses=True)\n\n    def __call__(\n        self,\n        query: Optional[Union[str, Query]] = None,\n        batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n        start_offset: int = 0,\n        with_suggestions: bool = True,\n        with_responses: bool = True,\n        with_vectors: Optional[Union[List, bool, str]] = None,\n    ) -> DatasetRecordsIterator:\n        \"\"\"Returns an iterator over the records in the dataset on the server.\n\n        Parameters:\n            query: A string or a Query object to filter the records.\n            batch_size: The number of records to fetch in each batch. The default is 256.\n            start_offset: The offset from which to start fetching records. The default is 0.\n            with_suggestions: Whether to include suggestions in the records. The default is True.\n            with_responses: Whether to include responses in the records. The default is True.\n            with_vectors: A list of vector names to include in the records. The default is None.\n                If a list is provided, only the specified vectors will be included.\n                If True is provided, all vectors will be included.\n\n        Returns:\n            An iterator over the records in the dataset on the server.\n\n        \"\"\"\n        if query and isinstance(query, str):\n            query = Query(query=query)\n\n        if with_vectors:\n            self._validate_vector_names(vector_names=with_vectors)\n\n        return DatasetRecordsIterator(\n            dataset=self.__dataset,\n            client=self.__client,\n            query=query,\n            batch_size=batch_size,\n            start_offset=start_offset,\n            with_suggestions=with_suggestions,\n            with_responses=with_responses,\n            with_vectors=with_vectors,\n        )\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}({self.__dataset})\"\n\n    ############################\n    # Public methods\n    ############################\n\n    def log(\n        self,\n        records: Union[List[dict], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        batch_size: int = DEFAULT_BATCH_SIZE,\n    ) -> \"DatasetRecords\":\n        \"\"\"Add or update records in a dataset on the server using the provided records.\n        If the record includes a known `id` field, the record will be updated.\n        If the record does not include a known `id` field, the record will be added as a new record.\n        See `rg.Record` for more information on the record definition.\n\n        Parameters:\n            records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                     If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                     fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n            mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                     To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n            user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n            batch_size: The number of records to send in each batch. The default is 256.\n\n        Returns:\n            A list of Record objects representing the updated records.\n        \"\"\"\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n        )\n\n        created_or_updated = []\n        records_updated = 0\n\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n            created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n            records_updated += updated\n\n        records_created = len(created_or_updated) - records_updated\n        self._log_message(\n            message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return self\n\n    def delete(\n        self,\n        records: List[Record],\n        batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n    ) -> List[Record]:\n        \"\"\"Delete records in a dataset on the server using the provided records\n            and matching based on the id.\n\n        Parameters:\n            records: A list of `Record` objects representing the records to be deleted.\n            batch_size: The number of records to send in each batch. The default is 64.\n\n        Returns:\n            A list of Record objects representing the deleted records.\n\n        \"\"\"\n        mapping = None\n        user_id = self.__client.me.id\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n        )\n\n        records_deleted = 0\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n            records_deleted += len(batch_records)\n\n        self._log_message(\n            message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return records\n\n    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n        \"\"\"\n        Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionary.\n                - True: The record fields, metadata, suggestions and responses will be flattened.\n                - False: The record fields, metadata, suggestions and responses will be nested.\n            orient (str): The orientation of the exported dictionary.\n                - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n                - \"index\": The keys of the dictionary will be the id of the records.\n        Returns:\n            A dictionary of records.\n\n        \"\"\"\n        return self().to_dict(flatten=flatten, orient=orient)\n\n    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n        \"\"\"\n        Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionaries in the list.\n                - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n                - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n        Returns:\n            A list of dictionaries of records.\n        \"\"\"\n        data = self().to_list(flatten=flatten)\n        return data\n\n    def to_json(self, path: Union[Path, str]) -> Path:\n        \"\"\"\n        Export the records to a file on disk.\n\n        Parameters:\n            path (str): The path to the file to save the records.\n\n        Returns:\n            The path to the file where the records were saved.\n\n        \"\"\"\n        return self().to_json(path=path)\n\n    def from_json(self, path: Union[Path, str]) -> List[Record]:\n        \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n            The JSON file should be defined by `DatasetRecords.to_json`.\n\n        Args:\n            path (str): The path to the file containing the records.\n\n        Returns:\n            DatasetRecords: The DatasetRecords object created from the disk path.\n\n        \"\"\"\n        records = JsonIO._records_from_json(path=path)\n        return self.log(records=records)\n\n    def to_datasets(self) -> HFDataset:\n        \"\"\"\n        Export the records to a HFDataset.\n\n        Returns:\n            The dataset containing the records.\n\n        \"\"\"\n\n        return self().to_datasets()\n\n    ############################\n    # Private methods\n    ############################\n\n    def _ingest_records(\n        self,\n        records: Union[List[Dict[str, Any]], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n    ) -> List[RecordModel]:\n        \"\"\"Ingests records from a list of dictionaries, a Hugging Face Dataset, or a list of Record objects.\"\"\"\n\n        if len(records) == 0:\n            raise ValueError(\"No records provided to ingest.\")\n\n        if HFDatasetsIO._is_hf_dataset(dataset=records):\n            records = HFDatasetsIO._record_dicts_from_datasets(dataset=records)\n\n        ingested_records = []\n        record_mapper = IngestedRecordMapper(mapping=mapping, dataset=self.__dataset, user_id=user_id)\n        for record in records:\n            try:\n                if isinstance(record, dict):\n                    record = record_mapper(data=record)\n                elif isinstance(record, Record):\n                    record.dataset = self.__dataset\n                else:\n                    raise ValueError(\n                        \"Records should be a a list Record instances, \"\n                        \"a Hugging Face Dataset, or a list of dictionaries representing the records.\"\n                        f\"Found a record of type {type(record)}: {record}.\"\n                    )\n            except Exception as e:\n                raise RecordsIngestionError(f\"Failed to ingest record from dict {record}: {e}\")\n            ingested_records.append(record.api_model())\n        return ingested_records\n\n    def _normalize_batch_size(self, batch_size: int, records_length, max_value: int):\n        norm_batch_size = min(batch_size, records_length, max_value)\n\n        if batch_size != norm_batch_size:\n            self._log_message(\n                message=f\"The provided batch size {batch_size} was normalized. Using value {norm_batch_size}.\",\n                level=\"warning\",\n            )\n\n        return norm_batch_size\n\n    def _validate_vector_names(self, vector_names: Union[List[str], str]) -> None:\n        if not isinstance(vector_names, list):\n            vector_names = [vector_names]\n        for vector_name in vector_names:\n            if isinstance(vector_name, bool):\n                continue\n            if vector_name not in self.__dataset.schema:\n                raise ValueError(f\"Vector field {vector_name} not found in dataset schema.\")\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__init__","title":"__init__(client, dataset)","text":"

    Initializes a DatasetRecords object with a client (an Argilla client object) and a dataset (a Dataset object).

    Source code in src/argilla/records/_dataset_records.py
    def __init__(self, client: \"Argilla\", dataset: \"Dataset\"):\n    \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n    Args:\n        client: An Argilla client object.\n        dataset: A Dataset object.\n    \"\"\"\n    self.__client = client\n    self.__dataset = dataset\n    self._api = self.__client.api.records\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__call__","title":"__call__(query=None, batch_size=DEFAULT_BATCH_SIZE, start_offset=0, with_suggestions=True, with_responses=True, with_vectors=None)","text":"

    Returns an iterator over the records in the dataset on the server.

    Parameters:

    Name Type Description Default query Optional[Union[str, Query]]

    A string or a Query object to filter the records.

    None batch_size Optional[int]

    The number of records to fetch in each batch. The default is 256.

    DEFAULT_BATCH_SIZE start_offset int

    The offset from which to start fetching records. The default is 0.

    0 with_suggestions bool

    Whether to include suggestions in the records. The default is True.

    True with_responses bool

    Whether to include responses in the records. The default is True.

    True with_vectors Optional[Union[List, bool, str]]

    A list of vector names to include in the records. The default is None. If a list is provided, only the specified vectors will be included. If True is provided, all vectors will be included.

    None

    Returns:

    Type Description DatasetRecordsIterator

    An iterator over the records in the dataset on the server.

    Source code in src/argilla/records/_dataset_records.py
    def __call__(\n    self,\n    query: Optional[Union[str, Query]] = None,\n    batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n    start_offset: int = 0,\n    with_suggestions: bool = True,\n    with_responses: bool = True,\n    with_vectors: Optional[Union[List, bool, str]] = None,\n) -> DatasetRecordsIterator:\n    \"\"\"Returns an iterator over the records in the dataset on the server.\n\n    Parameters:\n        query: A string or a Query object to filter the records.\n        batch_size: The number of records to fetch in each batch. The default is 256.\n        start_offset: The offset from which to start fetching records. The default is 0.\n        with_suggestions: Whether to include suggestions in the records. The default is True.\n        with_responses: Whether to include responses in the records. The default is True.\n        with_vectors: A list of vector names to include in the records. The default is None.\n            If a list is provided, only the specified vectors will be included.\n            If True is provided, all vectors will be included.\n\n    Returns:\n        An iterator over the records in the dataset on the server.\n\n    \"\"\"\n    if query and isinstance(query, str):\n        query = Query(query=query)\n\n    if with_vectors:\n        self._validate_vector_names(vector_names=with_vectors)\n\n    return DatasetRecordsIterator(\n        dataset=self.__dataset,\n        client=self.__client,\n        query=query,\n        batch_size=batch_size,\n        start_offset=start_offset,\n        with_suggestions=with_suggestions,\n        with_responses=with_responses,\n        with_vectors=with_vectors,\n    )\n
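
    For example, a minimal sketch of iterating over filtered records; the query string and the field name \"text\" are illustrative assumptions:

    for record in dataset.records(query=\"my search terms\", batch_size=100):\n    print(record.fields[\"text\"])\n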
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.log","title":"log(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE)","text":"

    Add or update records in a dataset on the server using the provided records. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added as a new record. See rg.Record for more information on the record definition.

    Parameters:

    Name Type Description Default records Union[List[dict], List[Record], HFDataset]

A list of Record objects, a Hugging Face Dataset, or a list of dictionaries representing the records. If records are defined as dictionaries or a dataset, the keys/column names should correspond to the fields and questions in the Argilla dataset. An id should be provided to identify the records when updating.

    required mapping Optional[Dict[str, Union[str, Sequence[str]]]]

A dictionary that maps the keys/column names in the records to the fields or questions in the Argilla dataset. To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.

    None user_id Optional[UUID]

    The user id to be associated with the records' response. If not provided, the current user id is used.

    None batch_size int

    The number of records to send in each batch. The default is 256.

    DEFAULT_BATCH_SIZE

    Returns:

    Type Description DatasetRecords

The DatasetRecords object, after the records have been created or updated on the server.

    Source code in src/argilla/records/_dataset_records.py
def log(\n    self,\n    records: Union[List[dict], List[Record], HFDataset],\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    user_id: Optional[UUID] = None,\n    batch_size: int = DEFAULT_BATCH_SIZE,\n) -> \"DatasetRecords\":\n    \"\"\"Add or update records in a dataset on the server using the provided records.\n    If the record includes a known `id` field, the record will be updated.\n    If the record does not include a known `id` field, the record will be added as a new record.\n    See `rg.Record` for more information on the record definition.\n\n    Parameters:\n        records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                 If records are defined as dictionaries or a dataset, the keys/column names should correspond to the\n                 fields and questions in the Argilla dataset. `id` should be provided to identify the records when updating.\n        mapping: A dictionary that maps the keys/column names in the records to the fields or questions in the Argilla dataset.\n                 To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n        user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n        batch_size: The number of records to send in each batch. The default is 256.\n\n    Returns:\n        The `DatasetRecords` object, after the records have been created or updated on the server.\n    \"\"\"\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n    )\n\n    created_or_updated = []\n    records_updated = 0\n\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n        created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n        records_updated += updated\n\n    records_created = len(created_or_updated) - records_updated\n    self._log_message(\n        message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return self\n
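
    For example, a sketch of logging dictionaries with a mapping; the incoming keys \"input\" and \"prediction\", and the target names \"text\" and \"label\", are illustrative assumptions:

    dataset.records.log(\n    records=[\n        {\"input\": \"Hello World, how are you?\", \"prediction\": \"positive\"},  # keys are mapped below\n    ],\n    mapping={\"input\": \"text\", \"prediction\": \"label\"},\n)\n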
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.delete","title":"delete(records, batch_size=DEFAULT_DELETE_BATCH_SIZE)","text":"

    Delete records in a dataset on the server using the provided records and matching based on the id.

    Parameters:

    Name Type Description Default records List[Record]

    A list of Record objects representing the records to be deleted.

    required batch_size int

    The number of records to send in each batch. The default is 64.

    DEFAULT_DELETE_BATCH_SIZE

    Returns:

    Type Description List[Record]

    A list of Record objects representing the deleted records.

    Source code in src/argilla/records/_dataset_records.py
    def delete(\n    self,\n    records: List[Record],\n    batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n) -> List[Record]:\n    \"\"\"Delete records in a dataset on the server using the provided records\n        and matching based on the id.\n\n    Parameters:\n        records: A list of `Record` objects representing the records to be deleted.\n        batch_size: The number of records to send in each batch. The default is 64.\n\n    Returns:\n        A list of Record objects representing the deleted records.\n\n    \"\"\"\n    mapping = None\n    user_id = self.__client.me.id\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n    )\n\n    records_deleted = 0\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n        records_deleted += len(batch_records)\n\n    self._log_message(\n        message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return records\n
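
    For example, a sketch that collects the records matching a query and deletes them; the query string is an illustrative placeholder:

    records_to_delete = list(dataset.records(query=\"label:negative\"))  # fetch matching records first\ndataset.records.delete(records=records_to_delete)\n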
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_dict","title":"to_dict(flatten=False, orient='names')","text":"

    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

    Parameters:

    Name Type Description Default flatten bool

    The structure of the exported dictionary. - True: The record fields, metadata, suggestions and responses will be flattened. - False: The record fields, metadata, suggestions and responses will be nested.

    False orient str

    The orientation of the exported dictionary. - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses. - \"index\": The keys of the dictionary will be the id of the records.

    'names'

    Returns: A dictionary of records.

    Source code in src/argilla/records/_dataset_records.py
    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n    \"\"\"\n    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionary.\n            - True: The record fields, metadata, suggestions and responses will be flattened.\n            - False: The record fields, metadata, suggestions and responses will be nested.\n        orient (str): The orientation of the exported dictionary.\n            - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n            - \"index\": The keys of the dictionary will be the id of the records.\n    Returns:\n        A dictionary of records.\n\n    \"\"\"\n    return self().to_dict(flatten=flatten, orient=orient)\n
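
    A sketch contrasting the two orientations, assuming a dataset with a \"text\" field and a \"label\" question; the shapes in the comments are illustrative:

    records_by_name = dataset.records.to_dict(orient=\"names\")  # {\"text\": [...], \"label\": [...], ...}\nrecords_by_id = dataset.records.to_dict(orient=\"index\")  # {record_id: {...}, ...}\n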
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_list","title":"to_list(flatten=False)","text":"

    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

    Parameters:

    Name Type Description Default flatten bool

The structure of the exported dictionaries in the list. - True: The record keys are flattened and dot notation is used for record attributes and their sub-attributes. For example, label.suggestion and label.response. Record responses are spread across multiple columns for values and users. - False: The record fields, metadata, suggestions and responses will be nested dictionaries with keys for record attributes.

    False

    Returns: A list of dictionaries of records.

    Source code in src/argilla/records/_dataset_records.py
def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n    \"\"\"\n    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionaries in the list.\n            - True: The record keys are flattened and dot notation is used for record attributes and their sub-attributes. For example, `label.suggestion` and `label.response`. Record responses are spread across multiple columns for values and users.\n            - False: The record fields, metadata, suggestions and responses will be nested dictionaries with keys for record attributes.\n    Returns:\n        A list of dictionaries of records.\n    \"\"\"\n    data = self().to_list(flatten=flatten)\n    return data\n
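
    A sketch of both export structures; the keys shown in the comments are illustrative:

    nested_records = dataset.records.to_list(flatten=False)  # [{\"fields\": {...}, \"metadata\": {...}, ...}]\nflat_records = dataset.records.to_list(flatten=True)  # [{\"label.suggestion\": ..., \"label.response\": ..., ...}]\n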
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_json","title":"to_json(path)","text":"

    Export the records to a file on disk.

    Parameters:

    Name Type Description Default path str

    The path to the file to save the records.

    required

    Returns:

    Type Description Path

    The path to the file where the records were saved.

    Source code in src/argilla/records/_dataset_records.py
    def to_json(self, path: Union[Path, str]) -> Path:\n    \"\"\"\n    Export the records to a file on disk.\n\n    Parameters:\n        path (str): The path to the file to save the records.\n\n    Returns:\n        The path to the file where the records were saved.\n\n    \"\"\"\n    return self().to_json(path=path)\n
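
    A sketch of exporting the records to disk; the file name is an assumption:

    from pathlib import Path\n\nsaved_path = dataset.records.to_json(path=Path(\"my_records.json\"))  # returns the path written to\n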
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.from_json","title":"from_json(path)","text":"

    Creates a DatasetRecords object from a disk path to a JSON file. The JSON file should be defined by DatasetRecords.to_json.

    Parameters:

    Name Type Description Default path str

    The path to the file containing the records.

    required

    Returns:

    Name Type Description DatasetRecords List[Record]

    The DatasetRecords object created from the disk path.

    Source code in src/argilla/records/_dataset_records.py
    def from_json(self, path: Union[Path, str]) -> List[Record]:\n    \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n        The JSON file should be defined by `DatasetRecords.to_json`.\n\n    Args:\n        path (str): The path to the file containing the records.\n\n    Returns:\n        DatasetRecords: The DatasetRecords object created from the disk path.\n\n    \"\"\"\n    records = JsonIO._records_from_json(path=path)\n    return self.log(records=records)\n
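
    A sketch of loading records previously exported with to_json back into a dataset; the file name matches the export sketch above and is an assumption:

    dataset.records.from_json(path=\"my_records.json\")  # reads the file and logs the records to the server\n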
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_datasets","title":"to_datasets()","text":"

Export the records to an HFDataset.

    Returns:

    Type Description HFDataset

    The dataset containing the records.

    Source code in src/argilla/records/_dataset_records.py
def to_datasets(self) -> HFDataset:\n    \"\"\"\n    Export the records to an HFDataset.\n\n    Returns:\n        The dataset containing the records.\n\n    \"\"\"\n\n    return self().to_datasets()\n
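
    A minimal sketch of the export:

    hf_dataset = dataset.records.to_datasets()  # a datasets.Dataset built from the records\nprint(hf_dataset)\n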
    "},{"location":"reference/argilla/datasets/datasets/","title":"rg.Dataset","text":"

    Dataset is a class that represents a collection of records. It is used to store and manage records in Argilla.

    "},{"location":"reference/argilla/datasets/datasets/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/datasets/datasets/#creating-a-dataset","title":"Creating a Dataset","text":"

To create a new dataset, you need to define its name and settings. The workspace and client parameters are optional; use them to create the dataset in a specific workspace or on a specific Argilla instance.

    dataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n    ),\n)\ndataset.create()\n

For a detailed guide to the dataset creation and publication process, see the Dataset how-to guide.

    "},{"location":"reference/argilla/datasets/datasets/#retrieving-an-existing-dataset","title":"Retrieving an existing Dataset","text":"

To retrieve an existing dataset, use client.datasets(\"my_dataset\") instead of instantiating a new Dataset object.

    dataset = client.datasets(\"my_dataset\")\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset","title":"Dataset","text":"

    Bases: Resource, HubImportExportMixin, DiskImportExportMixin

    Class for interacting with Argilla Datasets

    Attributes:

    Name Type Description name str

    Name of the dataset.

    records DatasetRecords

    The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.

    settings Settings

    The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.

    fields list

    The fields of the dataset, for example the rg.TextField of the dataset. Defined in the settings.

    questions list

    The questions of the dataset defined in the settings. For example, the rg.TextQuestion that you want labelers to answer.

    guidelines str

    The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.

    allow_extra_metadata bool

    True if extra metadata is allowed, False otherwise.

    Source code in src/argilla/datasets/_resource.py
    class Dataset(Resource, HubImportExportMixin, DiskImportExportMixin):\n    \"\"\"Class for interacting with Argilla Datasets\n\n    Attributes:\n        name: Name of the dataset.\n        records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.\n        settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.\n        fields (list): The fields of the dataset, for example the `rg.TextField` of the dataset. Defined in the settings.\n        questions (list): The questions of the dataset defined in the settings. For example, the `rg.TextQuestion` that you want labelers to answer.\n        guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.\n        allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.\n    \"\"\"\n\n    name: str\n    id: Optional[UUID]\n\n    _api: \"DatasetsAPI\"\n    _model: \"DatasetModel\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n        settings: Optional[Settings] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n        Parameters:\n            name (str): Name of the dataset. Replaced by random UUID if not assigned.\n            workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n            settings (Settings): Settings class to be used to configure the dataset.\n            client (Argilla): Instance of Argilla to connect with the server. 
Default is the default client.\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.datasets)\n        if name is None:\n            name = f\"dataset_{uuid4()}\"\n            self._log_message(f\"Setting dataset name to unique UUID: {name}\")\n\n        self._workspace = workspace\n        self._model = DatasetModel(name=name)\n        self._settings = settings._copy() if settings else Settings(_dataset=self)\n        self._settings.dataset = self\n        self.__records = DatasetRecords(client=self._client, dataset=self)\n\n    #####################\n    #  Properties       #\n    #####################\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def records(self) -> \"DatasetRecords\":\n        return self.__records\n\n    @property\n    def settings(self) -> Settings:\n        return self._settings\n\n    @settings.setter\n    def settings(self, value: Settings) -> None:\n        settings_copy = value._copy()\n        settings_copy.dataset = self\n        self._settings = settings_copy\n\n    @property\n    def fields(self) -> list:\n        return self.settings.fields\n\n    @property\n    def questions(self) -> list:\n        return self.settings.questions\n\n    @property\n    def guidelines(self) -> str:\n        return self.settings.guidelines\n\n    @guidelines.setter\n    def guidelines(self, value: str) -> None:\n        self.settings.guidelines = value\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.settings.allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool) -> None:\n        self.settings.allow_extra_metadata = value\n\n    @property\n    def schema(self) -> dict:\n        return self.settings.schema\n\n    @property\n    def workspace(self) -> Workspace:\n        self._workspace = self._resolve_workspace()\n        return self._workspace\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self.settings.distribution\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self.settings.distribution = value\n\n    #####################\n    #  Core methods     #\n    #####################\n\n    def get(self) -> \"Dataset\":\n        super().get()\n        self.settings.get()\n        return self\n\n    def create(self) -> \"Dataset\":\n        \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n        Returns:\n            Dataset: The created dataset object.\n        \"\"\"\n        super().create()\n        try:\n            return self._publish()\n        except Exception as e:\n            self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n            self._rollback_dataset_creation()\n            raise SettingsError from e\n\n    def update(self) -> \"Dataset\":\n        \"\"\"Updates the dataset on the server with the current settings.\n\n        Returns:\n            Dataset: The updated dataset object.\n        \"\"\"\n        self.settings.update()\n        return self\n\n    @classmethod\n    def from_model(cls, model: DatasetModel, client: \"Argilla\") -> \"Dataset\":\n        instance = cls(client=client, workspace=model.workspace_id, name=model.name)\n        instance._model = model\n\n        return instance\n\n    
#####################\n    #  Utility methods  #\n    #####################\n\n    def api_model(self) -> DatasetModel:\n        self._model.workspace_id = self.workspace.id\n        return self._model\n\n    def _publish(self) -> \"Dataset\":\n        self._settings.create()\n        self._api.publish(dataset_id=self._model.id)\n\n        return self.get()\n\n    def _resolve_workspace(self) -> Workspace:\n        workspace = self._workspace\n\n        if workspace is None:\n            workspace = self._client.workspaces.default\n            warnings.warn(f\"Workspace not provided. Using default workspace: {workspace.name} id: {workspace.id}\")\n        elif isinstance(workspace, str):\n            workspace = self._client.workspaces(workspace)\n            if workspace is None:\n                available_workspace_names = [ws.name for ws in self._client.workspaces]\n                raise NotFoundError(\n                    f\"Workspace with name {workspace} not found. Available workspaces: {available_workspace_names}\"\n                )\n        elif isinstance(workspace, UUID):\n            ws_model = self._client.api.workspaces.get(workspace)\n            workspace = Workspace.from_model(ws_model, client=self._client)\n        elif not isinstance(workspace, Workspace):\n            raise ValueError(f\"Wrong workspace value found {workspace}\")\n\n        return workspace\n\n    def _rollback_dataset_creation(self):\n        if not self._is_published():\n            self.delete()\n\n    def _is_published(self) -> bool:\n        return self._model.status == \"ready\"\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.__init__","title":"__init__(name=None, workspace=None, settings=None, client=None)","text":"

    Initializes a new Argilla Dataset object with the given parameters.

    Parameters:

    Name Type Description Default name str

    Name of the dataset. Replaced by random UUID if not assigned.

    None workspace UUID

    Workspace of the dataset. Default is the first workspace found in the server.

    None settings Settings

    Settings class to be used to configure the dataset.

    None client Argilla

    Instance of Argilla to connect with the server. Default is the default client.

    None Source code in src/argilla/datasets/_resource.py
def __init__(\n    self,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n    settings: Optional[Settings] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n    Parameters:\n        name (str): Name of the dataset. Replaced by random UUID if not assigned.\n        workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n        settings (Settings): Settings class to be used to configure the dataset.\n        client (Argilla): Instance of Argilla to connect with the server. Default is the default client.\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.datasets)\n    if name is None:\n        name = f\"dataset_{uuid4()}\"\n        self._log_message(f\"Setting dataset name to unique UUID: {name}\")\n\n    self._workspace = workspace\n    self._model = DatasetModel(name=name)\n    self._settings = settings._copy() if settings else Settings(_dataset=self)\n    self._settings.dataset = self\n    self.__records = DatasetRecords(client=self._client, dataset=self)\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.create","title":"create()","text":"

    Creates the dataset on the server with the Settings configuration.

    Returns:

    Name Type Description Dataset Dataset

    The created dataset object.

    Source code in src/argilla/datasets/_resource.py
    def create(self) -> \"Dataset\":\n    \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n    Returns:\n        Dataset: The created dataset object.\n    \"\"\"\n    super().create()\n    try:\n        return self._publish()\n    except Exception as e:\n        self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n        self._rollback_dataset_creation()\n        raise SettingsError from e\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.update","title":"update()","text":"

    Updates the dataset on the server with the current settings.

    Returns:

    Name Type Description Dataset Dataset

    The updated dataset object.

    Source code in src/argilla/datasets/_resource.py
    def update(self) -> \"Dataset\":\n    \"\"\"Updates the dataset on the server with the current settings.\n\n    Returns:\n        Dataset: The updated dataset object.\n    \"\"\"\n    self.settings.update()\n    return self\n
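
    A sketch of changing a setting and pushing it to the server; the guidelines text is an assumption:

    dataset.guidelines = \"Label each text as positive or negative.\"  # uses the guidelines setter\ndataset.update()\n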
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin","title":"DiskImportExportMixin","text":"

    Bases: ABC

    A mixin for exporting and importing datasets to and from disk.

    Source code in src/argilla/datasets/_export/_disk.py
class DiskImportExportMixin(ABC):\n    \"\"\"A mixin for exporting and importing datasets to and from disk.\"\"\"\n\n    _model: DatasetModel\n    _DEFAULT_RECORDS_PATH = \"records.json\"\n    _DEFAULT_CONFIG_REPO_DIR = \".argilla\"\n    _DEFAULT_SETTINGS_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/settings.json\"\n    _DEFAULT_DATASET_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/dataset.json\"\n    _DEFAULT_CONFIGURATION_FILES = [_DEFAULT_SETTINGS_PATH, _DEFAULT_DATASET_PATH]\n\n    def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n        \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n        Parameters:\n            path (str): The path to export the dataset to. Must be an empty directory.\n            with_records: whether to export the records to disk. Defaults to `True`.\n        \"\"\"\n        dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n        logging.info(f\"Saving dataset to {dataset_path}\")\n        logging.info(f\"Saving settings to {settings_path}\")\n        logging.info(f\"Saving records to {records_path}\")\n        # Export the dataset model, settings and records\n        self._persist_dataset_model(path=dataset_path)\n        self.settings.to_json(path=settings_path)\n        if with_records:\n            self.records.to_json(path=records_path)\n\n        return path\n\n    @classmethod\n    def from_disk(\n        cls: Type[\"Dataset\"],\n        path: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n    ) -> \"Dataset\":\n        \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n        The directory should be defined using the `to_disk` method.\n\n        Parameters:\n            path (str): The path to the directory containing the dataset model, settings and records.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n            with_records: whether to load the records from disk. Defaults to `True`.\n        \"\"\"\n\n        client = client or Argilla._get_default()\n\n        dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n        dataset_model = cls._load_dataset_model(path=dataset_path)\n\n        # Get the relevant workspace_id of the incoming dataset\n        if isinstance(workspace, str):\n            workspace = client.workspaces(workspace)\n            if not workspace:\n                raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n        else:\n            warnings.warn(\"Workspace not provided. 
Using default workspace.\")\n            workspace = client.workspaces.default\n        dataset_model.workspace_id = workspace.id\n\n        if name and (name != dataset_model.name):\n            logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n            dataset_model.name = name\n\n        if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n            warnings.warn(\n                f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n            )\n            dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n                name=dataset_model.name, workspace_id=workspace.id\n            )\n            dataset = cls.from_model(model=dataset_model, client=client)\n        else:\n            # Create a new dataset and load the settings and records\n            dataset = cls.from_model(model=dataset_model, client=client)\n            dataset.settings = Settings.from_json(path=settings_path)\n            dataset.create()\n\n        if os.path.exists(records_path) and with_records:\n            try:\n                dataset.records.from_json(path=records_path)\n            except RecordsIngestionError as e:\n                raise RecordsIngestionError(\n                    message=\"Error importing dataset records from disk. Records and datasets settings are not compatible.\"\n                ) from e\n        return dataset\n\n    ############################\n    # Utility methods\n    ############################\n\n    def _persist_dataset_model(self, path: Path):\n        \"\"\"Persists the dataset model to disk.\"\"\"\n        if path.exists():\n            raise FileExistsError(f\"Dataset already exists at {path}\")\n        with open(file=path, mode=\"w\") as f:\n            json.dump(self.api_model().model_dump(), f)\n\n    @classmethod\n    def _load_dataset_model(cls, path: Path):\n        \"\"\"Loads the dataset model from disk.\"\"\"\n        if not os.path.exists(path):\n            raise FileNotFoundError(f\"Dataset model not found at {path}\")\n        with open(file=path, mode=\"r\") as f:\n            dataset_model = json.load(f)\n            dataset_model = DatasetModel(**dataset_model)\n        return dataset_model\n\n    @classmethod\n    def _define_child_paths(cls, path: Union[Path, str]) -> Tuple[Path, Path, Path]:\n        path = Path(path)\n        if not path.is_dir():\n            raise NotADirectoryError(f\"Path {path} is not a directory\")\n        main_path = path / cls._DEFAULT_CONFIG_REPO_DIR\n        main_path.mkdir(exist_ok=True)\n        dataset_path = path / cls._DEFAULT_DATASET_PATH\n        settings_path = path / cls._DEFAULT_SETTINGS_PATH\n        records_path = path / cls._DEFAULT_RECORDS_PATH\n        return dataset_path, settings_path, records_path\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin.to_disk","title":"to_disk(path, *, with_records=True)","text":"

    Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.

    Parameters:

    Name Type Description Default path str

    The path to export the dataset to. Must be an empty directory.

    required with_records bool

whether to export the records to disk. Defaults to True.

    True Source code in src/argilla/datasets/_export/_disk.py
def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n    \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n    Parameters:\n        path (str): The path to export the dataset to. Must be an empty directory.\n        with_records: whether to export the records to disk. Defaults to `True`.\n    \"\"\"\n    dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n    logging.info(f\"Saving dataset to {dataset_path}\")\n    logging.info(f\"Saving settings to {settings_path}\")\n    logging.info(f\"Saving records to {records_path}\")\n    # Export the dataset model, settings and records\n    self._persist_dataset_model(path=dataset_path)\n    self.settings.to_json(path=settings_path)\n    if with_records:\n        self.records.to_json(path=records_path)\n\n    return path\n
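
    A sketch of exporting a dataset to a local directory; the directory name is an assumption, and the directory must exist and be empty:

    import os\n\nos.makedirs(\"my_dataset_dir\", exist_ok=True)  # the target must be an existing, empty directory\ndataset.to_disk(path=\"my_dataset_dir\", with_records=True)\n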
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin.from_disk","title":"from_disk(path, *, name=None, workspace=None, client=None, with_records=True) classmethod","text":"

    Imports a dataset from disk as a directory containing the dataset model, settings and records. The directory should be defined using the to_disk method.

    Parameters:

    Name Type Description Default path str

    The path to the directory containing the dataset model, settings and records.

    required name str

    The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

    None workspace Union[Workspace, str]

    The workspace to import the dataset to. Defaults to None and default workspace is used.

    None client Argilla

    The client to use for the import. Defaults to None and the default client is used.

    None with_records bool

whether to load the records from disk. Defaults to True.

    True Source code in src/argilla/datasets/_export/_disk.py
@classmethod\ndef from_disk(\n    cls: Type[\"Dataset\"],\n    path: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n) -> \"Dataset\":\n    \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n    The directory should be defined using the `to_disk` method.\n\n    Parameters:\n        path (str): The path to the directory containing the dataset model, settings and records.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n        with_records: whether to load the records from disk. Defaults to `True`.\n    \"\"\"\n\n    client = client or Argilla._get_default()\n\n    dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n    logging.info(f\"Loading dataset from {dataset_path}\")\n    logging.info(f\"Loading settings from {settings_path}\")\n    logging.info(f\"Loading records from {records_path}\")\n    dataset_model = cls._load_dataset_model(path=dataset_path)\n\n    # Get the relevant workspace_id of the incoming dataset\n    if isinstance(workspace, str):\n        workspace = client.workspaces(workspace)\n        if not workspace:\n            raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n    else:\n        warnings.warn(\"Workspace not provided. Using default workspace.\")\n        workspace = client.workspaces.default\n    dataset_model.workspace_id = workspace.id\n\n    if name and (name != dataset_model.name):\n        logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n        dataset_model.name = name\n\n    if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n        warnings.warn(\n            f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n        )\n        dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n            name=dataset_model.name, workspace_id=workspace.id\n        )\n        dataset = cls.from_model(model=dataset_model, client=client)\n    else:\n        # Create a new dataset and load the settings and records\n        dataset = cls.from_model(model=dataset_model, client=client)\n        dataset.settings = Settings.from_json(path=settings_path)\n        dataset.create()\n\n    if os.path.exists(records_path) and with_records:\n        try:\n            dataset.records.from_json(path=records_path)\n        except RecordsIngestionError as e:\n            raise RecordsIngestionError(\n                message=\"Error importing dataset records from disk. Records and datasets settings are not compatible.\"\n            ) from e\n    return dataset\n
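
    A sketch of importing the directory created by to_disk; the path, name, and workspace values are assumptions:

    dataset = rg.Dataset.from_disk(\n    path=\"my_dataset_dir\",\n    name=\"my_imported_dataset\",\n    workspace=\"my_workspace\",\n)\n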
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin","title":"HubImportExportMixin","text":"

    Bases: DiskImportExportMixin

    Source code in src/argilla/datasets/_export/_hub.py
class HubImportExportMixin(DiskImportExportMixin):\n    def to_hub(\n        self: \"Dataset\",\n        repo_id: str,\n        *,\n        with_records: bool = True,\n        generate_card: Optional[bool] = True,\n        **kwargs,\n    ) -> None:\n        \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n        Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n            with_records: whether to push the records to the Hugging Face Hub. Defaults to `True`.\n            generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n                to `True`.\n            **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n        \"\"\"\n\n        from huggingface_hub import DatasetCardData, HfApi\n\n        from argilla.datasets._export.card import (\n            ArgillaDatasetCard,\n            size_categories_parser,\n        )\n\n        hf_api = HfApi(token=kwargs.get(\"token\"))\n\n        hfds = False\n        if with_records:\n            hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n            hfds.push_to_hub(repo_id, **kwargs)\n        else:\n            hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n        with TemporaryDirectory() as tmpdirname:\n            config_dir = os.path.join(tmpdirname)\n\n            self.to_disk(path=config_dir, with_records=False)\n\n            if generate_card:\n                sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n                if hfds:\n                    sample_huggingface_record = hfds[0]\n                    size_categories = len(hfds)\n                else:\n                    sample_huggingface_record = \"No sample records provided\"\n                    size_categories = 0\n                card = ArgillaDatasetCard.from_template(\n                    card_data=DatasetCardData(\n                        size_categories=size_categories_parser(size_categories),\n                        tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                    ),\n                    repo_id=repo_id,\n                    argilla_fields=self.settings.fields,\n                    argilla_questions=self.settings.questions,\n                    argilla_guidelines=self.settings.guidelines or None,\n                    argilla_vectors_settings=self.settings.vectors or None,\n                    argilla_metadata_properties=self.settings.metadata,\n                    argilla_record=sample_argilla_record.to_dict(),\n                    huggingface_record=sample_huggingface_record,\n                )\n                card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n            hf_api.upload_folder(\n                folder_path=tmpdirname,\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n            )\n\n    @classmethod\n    def from_hub(\n        cls: Type[\"Dataset\"],\n        repo_id: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n        settings: Optional[\"Settings\"] = None,\n        **kwargs: Any,\n    ):\n        \"\"\"Loads a `Dataset` 
from the Hugging Face Hub.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            **kwargs: the kwargs to pass to `datasets.load_dataset`.\n\n        Returns:\n            A `Dataset` loaded from the Hugging Face Hub.\n        \"\"\"\n        from datasets import load_dataset\n        from huggingface_hub import snapshot_download\n\n        if name is None:\n            name = repo_id.replace(\"/\", \"_\")\n\n        if settings is not None:\n            dataset = cls(name=name, settings=settings)\n            dataset.create()\n        else:\n            # download configuration files from the hub\n            folder_path = snapshot_download(\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                token=kwargs.get(\"token\"),\n            )\n\n            dataset = cls.from_disk(\n                path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n            )\n\n        if with_records:\n            try:\n                hf_dataset = load_dataset(path=repo_id, **kwargs)  # type: ignore\n                hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, **kwargs)\n                cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n            except EmptyDatasetError:\n                warnings.warn(\n                    message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                    category=UserWarning,\n                )\n\n        return dataset\n\n    @staticmethod\n    def _log_dataset_records(hf_dataset: \"HFDataset\", dataset: \"Dataset\"):\n        \"\"\"This method extracts the responses from a Hugging Face dataset and returns a list of `Record` objects\"\"\"\n\n        # Identify the columns that contain responses\n        responses_columns = [col for col in hf_dataset.column_names if \".responses\" in col]\n        response_questions = defaultdict(dict)\n        user_ids = {}\n        for col in responses_columns:\n            question_name = col.split(\".\")[0]\n            if col.endswith(\"users\"):\n                response_questions[question_name][\"users\"] = hf_dataset[col]\n                user_ids.update({UUID(user_id): UUID(user_id) for user_id in set(sum(hf_dataset[col], []))})\n            elif col.endswith(\"responses\"):\n                response_questions[question_name][\"responses\"] = hf_dataset[col]\n            elif col.endswith(\"status\"):\n                response_questions[question_name][\"status\"] = hf_dataset[col]\n\n        # Check if all user ids are known to this Argilla client\n        known_users_ids = [user.id for user in dataset._client.users]\n        unknown_user_ids = set(user_ids.keys()) - set(known_users_ids)\n        my_user = dataset._client.me\n        if len(unknown_user_ids) > 1:\n      
      warnings.warn(\n                message=f\"\"\"Found unknown user ids in dataset repo: {unknown_user_ids}.\n                    Assigning first response for each record to current user ({my_user.username}) and discarding the rest.\"\"\"\n            )\n        for unknown_user_id in unknown_user_ids:\n            user_ids[unknown_user_id] = my_user.id\n\n        # Create a mapper to map the Hugging Face dataset to a Record object\n        mapping = {col: col for col in hf_dataset.column_names if \".suggestion\" in col}\n        mapper = IngestedRecordMapper(dataset=dataset, mapping=mapping, user_id=my_user.id)\n\n        # Extract responses and create Record objects\n        records = []\n        for idx, row in enumerate(hf_dataset):\n            record = mapper(row)\n            for question_name, values in response_questions.items():\n                response_values = values[\"responses\"][idx]\n                response_users = values[\"users\"][idx]\n                response_status = values[\"status\"][idx]\n                for value, user_id, status in zip(response_values, response_users, response_status):\n                    user_id = user_ids[UUID(user_id)]\n                    if user_id in response_users:\n                        continue\n                    response_users[user_id] = True\n                    response = Response(\n                        user_id=user_id,\n                        question_name=question_name,\n                        value=value,\n                        status=status,\n                    )\n                    record.responses.add(response)\n            records.append(record)\n\n        try:\n            dataset.records.log(records=records)\n        except (RecordsIngestionError, UnprocessableEntityError) as e:\n            raise SettingsError(\n                message=f\"Failed to load records from Hugging Face dataset. Defined settings do not match dataset schema. Hugging face dataset features: {hf_dataset.features}. Argilla dataset settings : {dataset.settings}\"\n            ) from e\n\n    @staticmethod\n    def _get_dataset_split(hf_dataset: \"HFDataset\", split: Optional[str] = None, **kwargs: Dict) -> \"HFDataset\":\n        \"\"\"Get a single dataset from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a single dataset from.\n\n        Returns:\n            HFDataset: The single dataset.\n        \"\"\"\n\n        if isinstance(hf_dataset, DatasetDict) and split is None:\n            split = next(iter(hf_dataset.keys()))\n            if len(hf_dataset.keys()) > 1:\n                warnings.warn(\n                    message=f\"Multiple splits found in Hugging Face dataset. Using the first split: {split}. \"\n                    f\"Available splits are: {', '.join(hf_dataset.keys())}.\"\n                )\n            hf_dataset = hf_dataset[split]\n        return hf_dataset\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin.to_hub","title":"to_hub(repo_id, *, with_records=True, generate_card=True, **kwargs)","text":"

    Pushes the Dataset to the Hugging Face Hub. If the dataset has been previously pushed to the Hugging Face Hub, it will be updated instead of creating a new dataset repo.

    Parameters:

    Name Type Description Default repo_id str

    the ID of the Hugging Face Hub repo to push the Dataset to.

    required with_records bool

whether to push the records to the Hugging Face Hub. Defaults to True.

    True generate_card Optional[bool]

    whether to generate a dataset card for the Dataset in the Hugging Face Hub. Defaults to True.

    True **kwargs

    the kwargs to pass to datasets.Dataset.push_to_hub.

    {} Source code in src/argilla/datasets/_export/_hub.py
def to_hub(\n    self: \"Dataset\",\n    repo_id: str,\n    *,\n    with_records: bool = True,\n    generate_card: Optional[bool] = True,\n    **kwargs,\n) -> None:\n    \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n    Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n        with_records: whether to push the records to the Hugging Face Hub. Defaults to `True`.\n        generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n            to `True`.\n        **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n    \"\"\"\n\n    from huggingface_hub import DatasetCardData, HfApi\n\n    from argilla.datasets._export.card import (\n        ArgillaDatasetCard,\n        size_categories_parser,\n    )\n\n    hf_api = HfApi(token=kwargs.get(\"token\"))\n\n    hfds = False\n    if with_records:\n        hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n        hfds.push_to_hub(repo_id, **kwargs)\n    else:\n        hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n    with TemporaryDirectory() as tmpdirname:\n        config_dir = os.path.join(tmpdirname)\n\n        self.to_disk(path=config_dir, with_records=False)\n\n        if generate_card:\n            sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n            if hfds:\n                sample_huggingface_record = hfds[0]\n                size_categories = len(hfds)\n            else:\n                sample_huggingface_record = \"No sample records provided\"\n                size_categories = 0\n            card = ArgillaDatasetCard.from_template(\n                card_data=DatasetCardData(\n                    size_categories=size_categories_parser(size_categories),\n                    tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                ),\n                repo_id=repo_id,\n                argilla_fields=self.settings.fields,\n                argilla_questions=self.settings.questions,\n                argilla_guidelines=self.settings.guidelines or None,\n                argilla_vectors_settings=self.settings.vectors or None,\n                argilla_metadata_properties=self.settings.metadata,\n                argilla_record=sample_argilla_record.to_dict(),\n                huggingface_record=sample_huggingface_record,\n            )\n            card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n        hf_api.upload_folder(\n            folder_path=tmpdirname,\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n        )\n
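
    A sketch of pushing a dataset to the Hub; the repo id and token are assumptions, and the token is forwarded to the Hugging Face API via kwargs:

    dataset.to_hub(\n    repo_id=\"my-org/my-dataset\",\n    with_records=True,\n    generate_card=True,\n    token=\"hf_...\",  # hypothetical token placeholder\n)\n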
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin.from_hub","title":"from_hub(repo_id, *, name=None, workspace=None, client=None, with_records=True, settings=None, **kwargs) classmethod","text":"

    Loads a Dataset from the Hugging Face Hub.

    Parameters:

    Name Type Description Default repo_id str

    the ID of the Hugging Face Hub repo to load the Dataset from.

    required name str

    The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

    None workspace Union[Workspace, str]

    The workspace to import the dataset to. Defaults to None and default workspace is used.

    None client Optional[Argilla]

    the client to use to load the Dataset. If not provided, the default client will be used.

    None with_records bool

    whether to load the records from the Hugging Face dataset. Defaults to True.

    True **kwargs Any

the kwargs to pass to datasets.load_dataset.

    {}

    Returns:

    Type Description

    A Dataset loaded from the Hugging Face Hub.

    Source code in src/argilla/datasets/_export/_hub.py
@classmethod\ndef from_hub(\n    cls: Type[\"Dataset\"],\n    repo_id: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n    settings: Optional[\"Settings\"] = None,\n    **kwargs: Any,\n):\n    \"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        **kwargs: the kwargs to pass to `datasets.load_dataset`.\n\n    Returns:\n        A `Dataset` loaded from the Hugging Face Hub.\n    \"\"\"\n    from datasets import load_dataset\n    from huggingface_hub import snapshot_download\n\n    if name is None:\n        name = repo_id.replace(\"/\", \"_\")\n\n    if settings is not None:\n        dataset = cls(name=name, settings=settings)\n        dataset.create()\n    else:\n        # download configuration files from the hub\n        folder_path = snapshot_download(\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n            allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n            token=kwargs.get(\"token\"),\n        )\n\n        dataset = cls.from_disk(\n            path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n        )\n\n    if with_records:\n        try:\n            hf_dataset = load_dataset(path=repo_id, **kwargs)  # type: ignore\n            hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, **kwargs)\n            cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n        except EmptyDatasetError:\n            warnings.warn(\n                message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                category=UserWarning,\n            )\n\n    return dataset\n
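
    A sketch of loading a Hub dataset into Argilla; the repo id and workspace are assumptions:

    dataset = rg.Dataset.from_hub(\n    repo_id=\"my-org/my-dataset\",\n    workspace=\"my_workspace\",\n    with_records=True,\n)\n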
    "},{"location":"reference/argilla/records/metadata/","title":"metadata","text":"

    Metadata in Argilla is a dictionary that can be attached to a record. It is used to store additional information about the record that is not part of the record's fields or responses: for example, the source of the record, the date it was created, or any other information that is relevant to the record. Metadata can be added to a record directly or as values within a dictionary.

    "},{"location":"reference/argilla/records/metadata/#usage-examples","title":"Usage Examples","text":"

    To use metadata within a dataset, you must define a metadata property in the dataset settings. The metadata parameter of the Settings class takes a list of metadata properties that can be attached to a record. The following example demonstrates how to add metadata to a dataset and how to access metadata from a record object:

    import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_metadata\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        metadata=[\n            rg.TermsMetadataProperty(name=\"category\", options=[\"A\", \"B\", \"C\"]),\n        ],\n    ),\n)\ndataset.create()\n

    Then, you can add records to the dataset with metadata that corresponds to the metadata property defined in the dataset settings:

    dataset.records.log(\n    [\n        {\"text\": \"text\", \"label\": \"positive\", \"category\": \"A\"},\n        {\"text\": \"text\", \"label\": \"negative\", \"category\": \"B\"},\n    ]\n)\n
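
    Once logged, metadata can be read back by iterating over the records with metadata enabled, for example:

    for record in dataset.records(with_metadata=True):\n    print(record.metadata)\n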
    "},{"location":"reference/argilla/records/metadata/#format-per-metadataproperty-type","title":"Format per MetadataProperty type","text":"

    Depending on the MetadataProperty type, metadata might need to be formatted in a slightly different way.

    For TermsMetadataPropertyFor FloatMetadataPropertyFor IntegerMetadataProperty
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": \"A\"}\n)\n\n# with multiple terms\n\nrg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": [\"A\", \"B\"]}\n)\n
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 2.1}\n)\n
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 42}\n)\n
    "},{"location":"reference/argilla/records/records/","title":"rg.Record","text":"

    The Record object is used to represent a single record in Argilla. It contains fields, suggestions, responses, metadata, and vectors.

    "},{"location":"reference/argilla/records/records/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/records/#creating-a-record","title":"Creating a Record","text":"

    To create records, you can use the Record class and pass it to the Dataset.records.log method. The Record class requires a fields parameter, which is a dictionary of field names and values. The field names must match the field names in the dataset's Settings object to be accepted.

    dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n        ),\n    ]\n) # (1)\n
    1. The Argilla dataset contains a field named text matching the key here.
    "},{"location":"reference/argilla/records/records/#accessing-record-attributes","title":"Accessing Record Attributes","text":"

    The Record object has suggestions, responses, metadata, and vectors attributes that can be accessed directly whilst iterating over records in a dataset.

    for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_metadata=True,\n    with_vectors=True\n    ):\n    print(record.suggestions)\n    print(record.responses)\n    print(record.metadata)\n    print(record.vectors)\n

    Record properties can also be updated whilst iterating over records in a dataset.

    for record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}\n

    For changes to take effect, the user must call the update method on the Dataset object, or pass the updated records to Dataset.records.log. All core record attributes can be updated in this way. Check their respective documentation for more information: Suggestions, Responses, Metadata, Vectors.
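
    As a sketch, one way to persist the changes is to collect the updated records and pass them back to Dataset.records.log:

    updated_records = []\nfor record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n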

    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record","title":"Record","text":"

    Bases: Resource

    The class for interacting with Argilla Records. A Record is a single sample in a dataset. Records receive feedback in the form of responses and suggestions. Records contain fields, metadata, and vectors.

    Attributes:

    Name Type Description id Union[str, UUID]

    The id of the record.

    fields RecordFields

    The fields of the record.

    metadata RecordMetadata

    The metadata of the record.

    vectors RecordVectors

    The vectors of the record.

    responses RecordResponses

    The responses of the record.

    suggestions RecordSuggestions

    The suggestions of the record.

    dataset Dataset

    The dataset to which the record belongs.

    _server_id UUID

    An id for the record generated by the Argilla server.

    Source code in src/argilla/records/_resource.py
    class Record(Resource):\n    \"\"\"The class for interacting with Argilla Records. A `Record` is a single sample\n    in a dataset. Records receives feedback in the form of responses and suggestions.\n    Records contain fields, metadata, and vectors.\n\n    Attributes:\n        id (Union[str, UUID]): The id of the record.\n        fields (RecordFields): The fields of the record.\n        metadata (RecordMetadata): The metadata of the record.\n        vectors (RecordVectors): The vectors of the record.\n        responses (RecordResponses): The responses of the record.\n        suggestions (RecordSuggestions): The suggestions of the record.\n        dataset (Dataset): The dataset to which the record belongs.\n        _server_id (UUID): An id for the record generated by the Argilla server.\n    \"\"\"\n\n    _model: RecordModel\n\n    def __init__(\n        self,\n        id: Optional[Union[UUID, str]] = None,\n        fields: Optional[Dict[str, FieldValue]] = None,\n        metadata: Optional[Dict[str, MetadataValue]] = None,\n        vectors: Optional[Dict[str, VectorValue]] = None,\n        responses: Optional[List[Response]] = None,\n        suggestions: Optional[List[Suggestion]] = None,\n        _server_id: Optional[UUID] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ):\n        \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n        Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n        and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n        Args:\n            id: An id for the record. If not provided, a UUID will be generated.\n            fields: A dictionary of fields for the record.\n            metadata: A dictionary of metadata for the record.\n            vectors: A dictionary of vectors for the record.\n            responses: A list of Response objects for the record.\n            suggestions: A list of Suggestion objects for the record.\n            _server_id: An id for the record. 
(Read-only and set by the server)\n            _dataset: The dataset object to which the record belongs.\n        \"\"\"\n        if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n            raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n        if fields is None and id is None:\n            raise ValueError(\"If fields are not provided, an id must be provided.\")\n        if fields == {} and id is None:\n            raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n        self._dataset = _dataset\n        self._model = RecordModel(external_id=id, id=_server_id)\n        self.__fields = RecordFields(fields=fields)\n        self.__vectors = RecordVectors(vectors=vectors)\n        self.__metadata = RecordMetadata(metadata=metadata)\n        self.__responses = RecordResponses(responses=responses, record=self)\n        self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n\n    def __repr__(self) -> str:\n        return (\n            f\"Record(id={self.id},status={self.status},fields={self.fields},metadata={self.metadata},\"\n            f\"suggestions={self.suggestions},responses={self.responses})\"\n        )\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def id(self) -> str:\n        return self._model.external_id\n\n    @id.setter\n    def id(self, value: str) -> None:\n        self._model.external_id = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n\n    @property\n    def fields(self) -> \"RecordFields\":\n        return self.__fields\n\n    @property\n    def responses(self) -> \"RecordResponses\":\n        return self.__responses\n\n    @property\n    def suggestions(self) -> \"RecordSuggestions\":\n        return self.__suggestions\n\n    @property\n    def metadata(self) -> \"RecordMetadata\":\n        return self.__metadata\n\n    @property\n    def vectors(self) -> \"RecordVectors\":\n        return self.__vectors\n\n    @property\n    def status(self) -> str:\n        return self._model.status\n\n    @property\n    def _server_id(self) -> Optional[UUID]:\n        return self._model.id\n\n    ############################\n    # Public methods\n    ############################\n\n    def api_model(self) -> RecordModel:\n        return RecordModel(\n            id=self._model.id,\n            external_id=self._model.external_id,\n            fields=self.fields.to_dict(),\n            metadata=self.metadata.api_models(),\n            vectors=self.vectors.api_models(),\n            responses=self.responses.api_models(),\n            suggestions=self.suggestions.api_models(),\n            status=self.status,\n        )\n\n    def serialize(self) -> Dict[str, Any]:\n        \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n        serialized_model = self._model.model_dump()\n        serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n        serialized_responses = [response.serialize() for response in self.__responses]\n        serialized_model[\"responses\"] = serialized_responses\n        serialized_model[\"suggestions\"] = serialized_suggestions\n        return serialized_model\n\n    def to_dict(self) -> Dict[str, Dict]:\n        
\"\"\"Converts a Record object to a dictionary for export.\n        Returns:\n            A dictionary representing the record where the keys are \"fields\",\n            \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n            represented as a key-value pair in the dictionary of the respective key. i.e.\n            `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n        \"\"\"\n        id = str(self.id) if self.id else None\n        server_id = str(self._model.id) if self._model.id else None\n        status = self.status\n        fields = self.fields.to_dict()\n        metadata = self.metadata.to_dict()\n        suggestions = self.suggestions.to_dict()\n        responses = self.responses.to_dict()\n        vectors = self.vectors.to_dict()\n\n        return {\n            \"id\": id,\n            \"fields\": fields,\n            \"metadata\": metadata,\n            \"suggestions\": suggestions,\n            \"responses\": responses,\n            \"vectors\": vectors,\n            \"status\": status,\n            \"_server_id\": server_id,\n        }\n\n    @classmethod\n    def from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n        \"\"\"Converts a dictionary to a Record object.\n        Args:\n            data: A dictionary representing the record.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        fields = data.get(\"fields\", {})\n        metadata = data.get(\"metadata\", {})\n        suggestions = data.get(\"suggestions\", {})\n        responses = data.get(\"responses\", {})\n        vectors = data.get(\"vectors\", {})\n        record_id = data.get(\"id\", None)\n        _server_id = data.get(\"_server_id\", None)\n\n        suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n        responses = [\n            Response(question_name=question_name, **value)\n            for question_name, _responses in responses.items()\n            for value in _responses\n        ]\n\n        return cls(\n            id=record_id,\n            fields=fields,\n            suggestions=suggestions,\n            responses=responses,\n            vectors=vectors,\n            metadata=metadata,\n            _dataset=dataset,\n            _server_id=_server_id,\n        )\n\n    @classmethod\n    def from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n        \"\"\"Converts a RecordModel object to a Record object.\n        Args:\n            model: A RecordModel object.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        instance = cls(\n            id=model.external_id,\n            fields=model.fields,\n            metadata={meta.name: meta.value for meta in model.metadata},\n            vectors={vector.name: vector.vector_values for vector in model.vectors},\n            # Responses and their models are not aligned 1-1.\n            responses=[\n                response\n                for response_model in model.responses\n                for response in UserResponse.from_model(response_model, dataset=dataset)\n            ],\n            suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],\n        )\n\n        # set private attributes\n        
instance._dataset = dataset\n        instance._model.id = model.id\n        instance._model.status = model.status\n\n        return instance\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.__init__","title":"__init__(id=None, fields=None, metadata=None, vectors=None, responses=None, suggestions=None, _server_id=None, _dataset=None)","text":"

    Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id. Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions and passed to Dataset.DatasetRecords.add() as a list of dictionaries.

    Parameters:

    Name Type Description Default id Optional[Union[UUID, str]]

    An id for the record. If not provided, a UUID will be generated.

    None fields Optional[Dict[str, FieldValue]]

    A dictionary of fields for the record.

    None metadata Optional[Dict[str, MetadataValue]]

    A dictionary of metadata for the record.

    None vectors Optional[Dict[str, VectorValue]]

    A dictionary of vectors for the record.

    None responses Optional[List[Response]]

    A list of Response objects for the record.

    None suggestions Optional[List[Suggestion]]

    A list of Suggestion objects for the record.

    None _server_id Optional[UUID]

    An id for the record. (Read-only and set by the server)

    None _dataset Optional[Dataset]

    The dataset object to which the record belongs.

    None Source code in src/argilla/records/_resource.py
    def __init__(\n    self,\n    id: Optional[Union[UUID, str]] = None,\n    fields: Optional[Dict[str, FieldValue]] = None,\n    metadata: Optional[Dict[str, MetadataValue]] = None,\n    vectors: Optional[Dict[str, VectorValue]] = None,\n    responses: Optional[List[Response]] = None,\n    suggestions: Optional[List[Suggestion]] = None,\n    _server_id: Optional[UUID] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n):\n    \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n    Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n    and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n    Args:\n        id: An id for the record. If not provided, a UUID will be generated.\n        fields: A dictionary of fields for the record.\n        metadata: A dictionary of metadata for the record.\n        vectors: A dictionary of vectors for the record.\n        responses: A list of Response objects for the record.\n        suggestions: A list of Suggestion objects for the record.\n        _server_id: An id for the record. (Read-only and set by the server)\n        _dataset: The dataset object to which the record belongs.\n    \"\"\"\n    if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n        raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n    if fields is None and id is None:\n        raise ValueError(\"If fields are not provided, an id must be provided.\")\n    if fields == {} and id is None:\n        raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n    self._dataset = _dataset\n    self._model = RecordModel(external_id=id, id=_server_id)\n    self.__fields = RecordFields(fields=fields)\n    self.__vectors = RecordVectors(vectors=vectors)\n    self.__metadata = RecordMetadata(metadata=metadata)\n    self.__responses = RecordResponses(responses=responses, record=self)\n    self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n
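
    The validation checks in the source above imply, for example, that a record without fields is only accepted when an id is provided; a short sketch:

    rg.Record(fields={\"text\": \"example\"})  # accepted: fields provided\nrg.Record(id=\"rec-1\", metadata={\"category\": \"A\"})  # accepted: no fields, but an id is given\n# rg.Record(metadata={\"category\": \"A\"})  # raises ValueError: no fields and no id\n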
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.serialize","title":"serialize()","text":"

    Serializes the Record to a dictionary for interaction with the API

    Source code in src/argilla/records/_resource.py
    def serialize(self) -> Dict[str, Any]:\n    \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n    serialized_model = self._model.model_dump()\n    serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n    serialized_responses = [response.serialize() for response in self.__responses]\n    serialized_model[\"responses\"] = serialized_responses\n    serialized_model[\"suggestions\"] = serialized_suggestions\n    return serialized_model\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.to_dict","title":"to_dict()","text":"

    Converts a Record object to a dictionary for export. Returns: A dictionary representing the record where the keys are \"fields\", \"metadata\", \"suggestions\", and \"responses\". Each field and question is represented as a key-value pair in the dictionary of the respective key, i.e. `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"}}`.

    Source code in src/argilla/records/_resource.py
    def to_dict(self) -> Dict[str, Dict]:\n    \"\"\"Converts a Record object to a dictionary for export.\n    Returns:\n        A dictionary representing the record where the keys are \"fields\",\n        \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n        represented as a key-value pair in the dictionary of the respective key. i.e.\n        `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n    \"\"\"\n    id = str(self.id) if self.id else None\n    server_id = str(self._model.id) if self._model.id else None\n    status = self.status\n    fields = self.fields.to_dict()\n    metadata = self.metadata.to_dict()\n    suggestions = self.suggestions.to_dict()\n    responses = self.responses.to_dict()\n    vectors = self.vectors.to_dict()\n\n    return {\n        \"id\": id,\n        \"fields\": fields,\n        \"metadata\": metadata,\n        \"suggestions\": suggestions,\n        \"responses\": responses,\n        \"vectors\": vectors,\n        \"status\": status,\n        \"_server_id\": server_id,\n    }\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_dict","title":"from_dict(data, dataset=None) classmethod","text":"

    Converts a dictionary to a Record object. Args: data: A dictionary representing the record. dataset: The dataset object to which the record belongs. Returns: A Record object.
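
    For instance, a dictionary produced by to_dict can be passed back to from_dict to rebuild the record; a minimal round-trip sketch:

    record_dict = record.to_dict()\nsame_record = rg.Record.from_dict(record_dict)\n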

    Source code in src/argilla/records/_resource.py
    @classmethod\ndef from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n    \"\"\"Converts a dictionary to a Record object.\n    Args:\n        data: A dictionary representing the record.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    fields = data.get(\"fields\", {})\n    metadata = data.get(\"metadata\", {})\n    suggestions = data.get(\"suggestions\", {})\n    responses = data.get(\"responses\", {})\n    vectors = data.get(\"vectors\", {})\n    record_id = data.get(\"id\", None)\n    _server_id = data.get(\"_server_id\", None)\n\n    suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n    responses = [\n        Response(question_name=question_name, **value)\n        for question_name, _responses in responses.items()\n        for value in _responses\n    ]\n\n    return cls(\n        id=record_id,\n        fields=fields,\n        suggestions=suggestions,\n        responses=responses,\n        vectors=vectors,\n        metadata=metadata,\n        _dataset=dataset,\n        _server_id=_server_id,\n    )\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_model","title":"from_model(model, dataset) classmethod","text":"

    Converts a RecordModel object to a Record object. Args: model: A RecordModel object. dataset: The dataset object to which the record belongs. Returns: A Record object.

    Source code in src/argilla/records/_resource.py
    @classmethod\ndef from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n    \"\"\"Converts a RecordModel object to a Record object.\n    Args:\n        model: A RecordModel object.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    instance = cls(\n        id=model.external_id,\n        fields=model.fields,\n        metadata={meta.name: meta.value for meta in model.metadata},\n        vectors={vector.name: vector.vector_values for vector in model.vectors},\n        # Responses and their models are not aligned 1-1.\n        responses=[\n            response\n            for response_model in model.responses\n            for response in UserResponse.from_model(response_model, dataset=dataset)\n        ],\n        suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],\n    )\n\n    # set private attributes\n    instance._dataset = dataset\n    instance._model.id = model.id\n    instance._model.status = model.status\n\n    return instance\n
    "},{"location":"reference/argilla/records/responses/","title":"rg.Response","text":"

    Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

    "},{"location":"reference/argilla/records/responses/#usage-examples","title":"Usage Examples","text":"

    Responses can be added to an instantiated Record directly or as a dictionary. The following examples demonstrate how to add responses to a record object and how to access responses from a record object:

    Instantiate the Record and related Response objects:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            responses=[rg.Response(\"label\", \"negative\", user_id=user.id)],\n            id=str(uuid.uuid4()),\n        )\n    ]\n)\n

    Or, add a response from a dictionary, where the key is the question name and the value is the response:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label.response\": \"negative\",\n        },\n    ]\n)\n

    Responses can be accessed from a Record via their question name, so if a question is named label, the responses can be accessed as record.responses[\"label\"]. The following example demonstrates how to access responses from a record object:

    # iterate over the records and responses\n\nfor record in dataset.records:\n    for response in record.responses[\"label\"]: # (1)\n        print(response.value)\n        print(response.user_id)\n\n# validate that the record has a response\n\nfor record in dataset.records:\n    if record.responses[\"label\"]:\n        for response in record.responses[\"label\"]:\n            print(response.value)\n            print(response.user_id)\n    else:\n        record.responses.add(\n            rg.Response(\"label\", \"positive\", user_id=user.id)\n        ) # (2)\n
    1. Access the responses for the question named label for each record like a dictionary containing a list of Response objects. 2. Add a response to the record if it does not already have one.

    "},{"location":"reference/argilla/records/responses/#format-per-question-type","title":"Format per Question type","text":"

    Depending on the Question type, responses might need to be formatted in a slightly different way.

    For LabelQuestionFor MultiLabelQuestionFor RankingQuestionFor RatingQuestionFor SpanQuestionFor TextQuestion
    rg.Response(\n    question_name=\"label\",\n    value=\"positive\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"rating\",\n    value=4,\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"text\",\n    value=\"value\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response","title":"Response","text":"

    Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

    Source code in src/argilla/responses.py
    class Response:\n    \"\"\"Class for interacting with Argilla Responses of records. Responses are answers to questions by a user.\n    Therefore, a record question can have multiple responses, one for each user that has answered the question.\n    A `Response` is typically created by a user in the UI or consumed from a data source as a label,\n    unlike a `Suggestion` which is typically created by a model prediction.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        user_id: UUID,\n        status: Optional[Union[ResponseStatus, str]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n        Attributes:\n            question_name (str): The name of the question that the suggestion is for.\n            value (str): The value of the response\n            user_id (UUID): The id of the user that submits the response\n            status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n        \"\"\"\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n        if user_id is None:\n            raise ValueError(\"user_id is required\")\n\n        if isinstance(status, str):\n            status = ResponseStatus(status)\n\n        self.record = _record\n        self.question_name = question_name\n        self.value = value\n        self.user_id = user_id\n        self.status = status\n\n    def serialize(self) -> dict[str, Any]:\n        \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n            but can be used for data wrangling or manual export.\n\n        Returns:\n            dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n        Examples:\n\n        ```python\n        response = rg.Response(\"label\", \"negative\", user_id=user.id)\n        response.serialize()\n        ```\n        \"\"\"\n        return {\n            \"question_name\": self.question_name,\n            \"value\": self.value,\n            \"user_id\": self.user_id,\n            \"status\": self.status,\n        }\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.__init__","title":"__init__(question_name, value, user_id, status=None, _record=None)","text":"

    Initializes a Response for a Record with a user_id and value

    Attributes:

    Name Type Description question_name str

    The name of the question that the suggestion is for.

    value str

    The value of the response

    user_id UUID

    The id of the user that submits the response

    status Union[ResponseStatus, str]

    The status of the response as \"draft\", \"submitted\", \"discarded\".

    Source code in src/argilla/responses.py
    def __init__(\n    self,\n    question_name: str,\n    value: Any,\n    user_id: UUID,\n    status: Optional[Union[ResponseStatus, str]] = None,\n    _record: Optional[\"Record\"] = None,\n) -> None:\n    \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the response\n        user_id (UUID): The id of the user that submits the response\n        status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n    \"\"\"\n\n    if question_name is None:\n        raise ValueError(\"question_name is required\")\n    if value is None:\n        raise ValueError(\"value is required\")\n    if user_id is None:\n        raise ValueError(\"user_id is required\")\n\n    if isinstance(status, str):\n        status = ResponseStatus(status)\n\n    self.record = _record\n    self.question_name = question_name\n    self.value = value\n    self.user_id = user_id\n    self.status = status\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.serialize","title":"serialize()","text":"

    Serializes the Response to a dictionary. This is principally used for sending the response to the API, but can be used for data wrangling or manual export.

    Returns:

    Type Description dict[str, Any]

    dict[str, Any]: The serialized response as a dictionary with keys question_name, value, user_id, and status.

    Examples:

    response = rg.Response(\"label\", \"negative\", user_id=user.id)\nresponse.serialize()\n
    Source code in src/argilla/responses.py
    def serialize(self) -> dict[str, Any]:\n    \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n        but can be used for data wrangling or manual export.\n\n    Returns:\n        dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n    Examples:\n\n    ```python\n    response = rg.Response(\"label\", \"negative\", user_id=user.id)\n    response.serialize()\n    ```\n    \"\"\"\n    return {\n        \"question_name\": self.question_name,\n        \"value\": self.value,\n        \"user_id\": self.user_id,\n        \"status\": self.status,\n    }\n
    "},{"location":"reference/argilla/records/suggestions/","title":"rg.Suggestion","text":"

    Class for interacting with Argilla Suggestions of records. Suggestions are typically created by a model prediction, unlike a Response which is typically created by a user in the UI or consumed from a data source as a label.

    "},{"location":"reference/argilla/records/suggestions/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/suggestions/#adding-records-with-suggestions","title":"Adding records with suggestions","text":"

    Suggestions can be added to a record directly or via a dictionary structure. The following examples demonstrate how to add suggestions to a record object and how to access suggestions from a record object:

    Add a suggestion from a dictionary, where the key is the question name and the value is the suggested value:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label\": \"negative\", # this will be used as a suggestion\n        },\n    ]\n)\n

    If your data contains scores for suggestions, you can add them as well via the mapping parameter. The following example demonstrates how to add a suggestion with a score to a record object:

    dataset.records.log(\n    [\n        {\n            \"prompt\": \"Hello World, how are you?\",\n            \"label\": \"negative\",  # this will be used as a suggestion\n            \"score\": 0.9,  # this will be used as the suggestion score\n            \"model\": \"model_name\",  # this will be used as the suggestion agent\n        },\n    ],\n    mapping={\n        \"score\": \"label.suggestion.score\",\n        \"model\": \"label.suggestion.agent\",\n    },  # `label` is the question name in the dataset settings\n)\n

    Or, instantiate the Record and related Suggestion objects directly, like this:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            suggestions=[rg.Suggestion(\"label\", \"negative\", score=0.9, agent=\"model_name\")],\n        )\n    ]\n)\n
    "},{"location":"reference/argilla/records/suggestions/#iterating-over-records-with-suggestions","title":"Iterating over records with suggestions","text":"

    Just like responses, suggestions can be accessed from a Record via their question name. So if a question is named label, the suggestion can be accessed as record.suggestions[\"label\"]. The following example demonstrates how to access suggestions from a record object:

    for record in dataset.records(with_suggestions=True):\n    print(record.suggestions[\"label\"].value)\n

    We can also add suggestions to records as we iterate over them using the add method:

    for record in dataset.records(with_suggestions=True):\n    if not record.suggestions[\"label\"]: # (1)\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"positive\", score=0.9, agent=\"model_name\")\n        ) # (2)\n
    1. Validate that the record has a suggestion
    2. Add a suggestion to the record if it does not already have one
    "},{"location":"reference/argilla/records/suggestions/#format-per-question-type","title":"Format per Question type","text":"

    Depending on the Question type, suggestions might need to be formatted in a slightly different way.

    For LabelQuestionFor MultiLabelQuestionFor RankingQuestionFor RatingQuestionFor SpanQuestionFor TextQuestion
    rg.Suggestion(\n    question_name=\"label\",\n    value=\"positive\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"rating\",\n    value=4,\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"text\",\n    value=\"value\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion","title":"Suggestion","text":"

    Bases: Resource

    Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records. Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.

    Attributes:

    Name Type Description question_name str

    The name of the question that the suggestion is for.

    value str

    The value of the suggestion

    score float

    The score of the suggestion. For example, the probability of the model prediction.

    agent str

    The agent that created the suggestion. For example, the model name.

    type str

    The type of suggestion, either 'model' or 'human'.

    Source code in src/argilla/suggestions.py
    class Suggestion(Resource):\n    \"\"\"Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records.\n    Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the suggestion\n        score (float): The score of the suggestion. For example, the probability of the model prediction.\n        agent (str): The agent that created the suggestion. For example, the model name.\n        type (str): The type of suggestion, either 'model' or 'human'.\n    \"\"\"\n\n    _model: SuggestionModel\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        score: Union[float, List[float], None] = None,\n        agent: Optional[str] = None,\n        type: Optional[Literal[\"model\", \"human\"]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        super().__init__()\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n\n        self.record = _record\n        self._model = SuggestionModel(\n            question_name=question_name,\n            value=value,\n            type=type,\n            score=score,\n            agent=agent,\n        )\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def value(self) -> Any:\n        \"\"\"The value of the suggestion.\"\"\"\n        return self._model.value\n\n    @property\n    def question_name(self) -> Optional[str]:\n        \"\"\"The name of the question that the suggestion is for.\"\"\"\n        return self._model.question_name\n\n    @question_name.setter\n    def question_name(self, value: str) -> None:\n        self._model.question_name = value\n\n    @property\n    def type(self) -> Optional[Literal[\"model\", \"human\"]]:\n        \"\"\"The type of suggestion, either 'model' or 'human'.\"\"\"\n        return self._model.type\n\n    @property\n    def score(self) -> Optional[Union[float, List[float]]]:\n        \"\"\"The score of the suggestion.\"\"\"\n        return self._model.score\n\n    @score.setter\n    def score(self, value: float) -> None:\n        self._model.score = value\n\n    @property\n    def agent(self) -> Optional[str]:\n        \"\"\"The agent that created the suggestion.\"\"\"\n        return self._model.agent\n\n    @agent.setter\n    def agent(self, value: str) -> None:\n        self._model.agent = value\n\n    @classmethod\n    def from_model(cls, model: SuggestionModel, dataset: \"Dataset\") -> \"Suggestion\":\n        question = dataset.settings.questions[model.question_id]\n        model.question_name = question.name\n        model.value = cls.__from_model_value(model.value, question)\n\n        instance = cls(question.name, model.value)\n        instance._model = model\n\n        return instance\n\n    def api_model(self) -> SuggestionModel:\n        if self.record is None or self.record.dataset is None:\n            return self._model\n\n        question = self.record.dataset.settings.questions[self.question_name]\n        if question:\n            return SuggestionModel(\n                value=self.__to_model_value(self.value, question),\n                question_name=None if not question else question.name,\n                
question_id=None if not question else question.id,\n                type=self._model.type,\n                score=self._model.score,\n                agent=self._model.agent,\n                id=self._model.id,\n            )\n        else:\n            raise RecordSuggestionsError(\n                f\"Record suggestion is invalid because question with name={self.question_name} does not exist in the dataset ({self.record.dataset.name}). Available questions are: {list(self.record.dataset.settings.questions._properties_by_name.keys())}\"\n            )\n\n    @classmethod\n    def __to_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_to_model_value(value)\n        return value\n\n    @classmethod\n    def __from_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_from_model_value(value)\n        return value\n\n    @classmethod\n    def __ranking_from_model_value(cls, value: List[Dict[str, Any]]) -> List[str]:\n        return [v[\"value\"] for v in value]\n\n    @classmethod\n    def __ranking_to_model_value(cls, value: List[str]) -> List[Dict[str, str]]:\n        return [{\"value\": str(v)} for v in value]\n
    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.value","title":"value: Any property","text":"

    The value of the suggestion.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.question_name","title":"question_name: Optional[str] property writable","text":"

    The name of the question that the suggestion is for.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.type","title":"type: Optional[Literal['model', 'human']] property","text":"

    The type of suggestion, either 'model' or 'human'.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.score","title":"score: Optional[Union[float, List[float]]] property writable","text":"

    The score of the suggestion.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.agent","title":"agent: Optional[str] property writable","text":"

    The agent that created the suggestion.

    "},{"location":"reference/argilla/records/vectors/","title":"rg.Vector","text":"

    A vector is a numerical representation of a Record field or attribute, usually the record's text. Vectors can be used to search for similar records via the UI or SDK. Vectors can be added to a record directly or as a dictionary with a key that matches the rg.VectorField name.

    "},{"location":"reference/argilla/records/vectors/#usage-examples","title":"Usage Examples","text":"

    To use vectors within a dataset, you must define a vector field in the dataset settings. The vectors parameter of the Settings class takes a list of vector fields that can be attached to a record. The following example demonstrates how to add vectors to a dataset and how to access vectors from a record object:

    import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_vectors\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        vectors=[\n            rg.VectorField(name=\"vector_name\"),\n        ],\n    ),\n)\ndataset.create()\n

    Then, you can add records to the dataset with vectors that correspond to the vector field defined in the dataset settings:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"vector_name\": [0.1, 0.2, 0.3]\n        }\n    ]\n)\n

    Vectors can be passed using a mapping, where the key is the key in the data source and the value is the name of the rg.VectorField object in the dataset's settings. For example, the following code adds a record with a vector using a mapping:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"x\": [0.1, 0.2, 0.3]\n        }\n    ],\n    mapping={\"x\": \"vector_name\"}\n)\n

    Or, vectors can be instantiated and added to a record directly, like this:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            vectors=[rg.Vector(\"vector_name\", [0.1, 0.2, 0.3])],\n        )\n    ]\n)\n
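
    After logging, vectors can be read back by iterating over the records with vectors enabled, for example:

    for record in dataset.records(with_vectors=True):\n    print(record.vectors)\n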
    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector","title":"Vector","text":"

    Bases: Resource

    Class for interacting with Argilla Vectors. Vectors are typically used to represent embeddings or features of records. The Vector class is used to deliver vectors to the Argilla server.

    Attributes:

    Name Type Description name str

    The name of the vector.

    values list[float]

    The values of the vector.

    Source code in src/argilla/vectors.py
    class Vector(Resource):\n    \"\"\" Class for interacting with Argilla Vectors. Vectors are typically used to represent \\\n        embeddings or features of records. The `Vector` class is used to deliver vectors to the Argilla server.\n\n    Attributes:\n        name (str): The name of the vector.\n        values (list[float]): The values of the vector.\n    \"\"\"\n\n    _model: VectorModel\n\n    def __init__(\n        self,\n        name: str,\n        values: list[float],\n    ) -> None:\n        \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n        Parameters:\n            name (str): Name of the vector\n            values (list[float]): List of float values\n\n        \"\"\"\n        self._model = VectorModel(\n            name=name,\n            vector_values=values,\n        )\n\n    def __repr__(self) -> str:\n        return repr(f\"{self.__class__.__name__}({self._model})\")\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def name(self) -> str:\n        \"\"\"Name of the vector that corresponds to the name of the vector in the dataset's `Settings`\"\"\"\n        return self._model.name\n\n    @property\n    def values(self) -> list[float]:\n        \"\"\"List of float values that represent the vector.\"\"\"\n        return self._model.vector_values\n\n    ##############################\n    # Methods\n    ##############################\n\n    @classmethod\n    def from_model(cls, model: VectorModel) -> \"Vector\":\n        return cls(\n            name=model.name,\n            values=model.vector_values,\n        )\n\n    def serialize(self) -> dict[str, Any]:\n        dumped_model = self._model.model_dump()\n        name = dumped_model.pop(\"name\")\n        values = dumped_model.pop(\"vector_values\")\n        return {name: values}\n
    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.name","title":"name: str property","text":"

    Name of the vector that corresponds to the name of the vector in the dataset's Settings

    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.values","title":"values: list[float] property","text":"

    List of float values that represent the vector.

    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.__init__","title":"__init__(name, values)","text":"

    Initializes a Vector with a name and values that can be used to search in the Argilla UI.

    Parameters:

    Name Type Description Default name str

    Name of the vector

    required values list[float]

    List of float values

    required Source code in src/argilla/vectors.py
    def __init__(\n    self,\n    name: str,\n    values: list[float],\n) -> None:\n    \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n    Parameters:\n        name (str): Name of the vector\n        values (list[float]): List of float values\n\n    \"\"\"\n    self._model = VectorModel(\n        name=name,\n        vector_values=values,\n    )\n
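
    Based on the serialize method in the source above, the serialized payload maps the vector name to its values; a quick sketch:

    vector = rg.Vector(name=\"vector_name\", values=[0.1, 0.2, 0.3])\nprint(vector.serialize())  # {\"vector_name\": [0.1, 0.2, 0.3]}\n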
    "},{"location":"reference/argilla/settings/fields/","title":"Fields","text":"

    Fields in Argilla define the content of a record that will be reviewed by a user.

    "},{"location":"reference/argilla/settings/fields/#usage-examples","title":"Usage Examples","text":"

    To define a field, instantiate the TextField class and pass it to the fields parameter of the Settings class.

    text_field = rg.TextField(name=\"text\")\nmarkdown_field = rg.TextField(name=\"markdown\", use_markdown=True)\n\nsettings = rg.Settings(\n    fields=[\n        text_field,\n        markdown_field,\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n)\n\ndata = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

    To add records with values for fields, refer to the rg.Dataset.records documentation.
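
    As a sketch, assuming the dataset above has been created, each record then provides one value per field name:

    data.create()\ndata.records.log(\n    [\n        {\"text\": \"Hello World\", \"markdown\": \"**Hello World**\"},\n    ]\n)\n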

    "},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField","title":"TextField","text":"

    Bases: SettingsPropertyBase

    Text field for use in Argilla Dataset Settings

    Source code in src/argilla/settings/_field.py
    class TextField(SettingsPropertyBase):\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: FieldModel\n    _api: FieldsAPI\n\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Text field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str]): The name of the field, as it will be displayed in the UI.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n            required (bool): Whether the field is required. At least one field must be required.\n            description (Optional[str]): The description of the field.\n        \"\"\"\n        client = client or Argilla._get_default()\n\n        super().__init__(api=client.api.fields, client=client)\n\n        self._model = FieldModel(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=TextFieldSettings(use_markdown=use_markdown),\n        )\n\n        self._dataset = None\n\n    @classmethod\n    def from_model(cls, model: FieldModel) -> \"TextField\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextField\":\n        model = FieldModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def _with_client(self, client: \"Argilla\") -> \"Self\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.fields\n\n        return self\n
    "},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField.__init__","title":"__init__(name, title=None, use_markdown=False, required=True, description=None, client=None)","text":"

    Text field for use in Argilla Dataset Settings

    Parameters:

    Name Type Description Default name str

    The name of the field

    required title Optional[str]

    The name of the field, as it will be displayed in the UI.

    None use_markdown Optional[bool]

    Whether to render the markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs.

    False required bool

    Whether the field is required. At least one field must be required.

    True description Optional[str]

    The description of the field.

    None Source code in src/argilla/settings/_field.py
    def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str]): The name of the field, as it will be displayed in the UI.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        required (bool): Whether the field is required. At least one field must be required.\n        description (Optional[str]): The description of the field.\n    \"\"\"\n    client = client or Argilla._get_default()\n\n    super().__init__(api=client.api.fields, client=client)\n\n    self._model = FieldModel(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=TextFieldSettings(use_markdown=use_markdown),\n    )\n\n    self._dataset = None\n
    "},{"location":"reference/argilla/settings/metadata_property/","title":"Metadata Properties","text":"

    Metadata properties are used to define metadata fields in a dataset. Metadata fields are used to store additional information about the records in the dataset. For example, the category of a record, the price of a product, or any other information that is relevant to the record.

    "},{"location":"reference/argilla/settings/metadata_property/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/metadata_property/#defining-metadata-property-for-a-dataset","title":"Defining Metadata Property for a dataset","text":"

    We define metadata properties via type-specific classes. The following example demonstrates how to define metadata properties as either a float, integer, or terms metadata property and pass them to the Settings.

    TermsMetadataProperty is used to define a metadata field with a list of options. For example, a color field with the options red, blue, and green. FloatMetadataProperty and IntegerMetadataProperty are used to define metadata fields with numeric values. For example, a price field with a minimum value of 0.0 and a maximum value of 100.0.

    metadata_field = rg.TermsMetadataProperty(\n    name=\"color\",\n    options=[\"red\", \"blue\", \"green\"],\n    title=\"Color\",\n)\n\nfloat_metadata_field = rg.FloatMetadataProperty(\n    name=\"price\",\n    min=0.0,\n    max=100.0,\n    title=\"Price\",\n)\n\nint_metadata_field = rg.IntegerMetadataProperty(\n    name=\"quantity\",\n    min=0,\n    max=100,\n    title=\"Quantity\",\n)\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n    metadata=[\n        metadata_field,\n        float_metadata_field,\n        int_metadata_field,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

    To add records with metadata, refer to the rg.Metadata class documentation.
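
    As a minimal sketch (assuming the dataset defined above has been created on the server), metadata values are passed as plain key-value pairs that must match the configured metadata properties:

    dataset.create()\n\nrecord = rg.Record(\n    fields={\"text\": \"A vintage red bicycle in good condition.\"},\n    metadata={\"color\": \"red\", \"price\": 49.99, \"quantity\": 1},\n)\n\ndataset.records.log([record])\n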

    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty","title":"FloatMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class FloatMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[float] = None,\n        max: Optional[float] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with float settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n\n        super().__init__(client=client)\n\n        try:\n            settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.float,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"FloatMetadataProperty\":\n        instance = FloatMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with float settings.

    Parameters:

    name (str): The name of the metadata field. Required.

    min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records. Default: None.

    max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records. Default: None.

    title (Optional[str]): The title of the metadata to be shown in the UI. Default: None.

    visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators. Default: True.

    Raises:

    MetadataError: If an error occurs while defining metadata settings.

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    min: Optional[float] = None,\n    max: Optional[float] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with float settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n\n    super().__init__(client=client)\n\n    try:\n        settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.float,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty","title":"IntegerMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class IntegerMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[int] = None,\n        max: Optional[int] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with integer settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.integer,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"IntegerMetadataProperty\":\n        instance = IntegerMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with integer settings.

    Parameters:

    name (str): The name of the metadata field. Required.

    min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records. Default: None.

    max (Optional[int]): The maximum valid value. If none is provided, it will be computed from the values provided in the records. Default: None.

    title (Optional[str]): The title of the metadata to be shown in the UI. Default: None.

    visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators. Default: True.

    Raises:

    MetadataError: If an error occurs while defining metadata settings.

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    min: Optional[int] = None,\n    max: Optional[int] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with integer settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.integer,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty","title":"TermsMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class TermsMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        options: Optional[List[str]] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with terms settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            options (Optional[List[str]]): The list of options\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.terms,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def options(self) -> Optional[List[str]]:\n        return self._model.settings.values\n\n    @options.setter\n    def options(self, value: list[str]) -> None:\n        self._model.settings.values = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"TermsMetadataProperty\":\n        instance = TermsMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty.__init__","title":"__init__(name, options=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with terms settings.

    Parameters:

    name (str): The name of the metadata field. Required.

    options (Optional[List[str]]): The list of options. Default: None.

    title (Optional[str]): The title of the metadata to be shown in the UI. Default: None.

    visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators. Default: True.

    Raises:

    MetadataError: If an error occurs while defining metadata settings.

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    options: Optional[List[str]] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with terms settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        options (Optional[List[str]]): The list of options\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.terms,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
    "},{"location":"reference/argilla/settings/questions/","title":"Questions","text":"

    Argilla uses questions to gather feedback. The questions are answered by users or models.

    "},{"location":"reference/argilla/settings/questions/#usage-examples","title":"Usage Examples","text":"

    To define a label question, for example, instantiate the LabelQuestion class and pass it to the Settings class.

    label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n    ],\n)\n

    Questions can be combined flexibly, based on the type of feedback you want to collect. For example, you can combine a label question with a text question to collect both a label and a text response.

    label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\ntext_question = rg.TextQuestion(name=\"response\")\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n        text_question,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

    To add records with responses to questions, refer to the rg.Response class documentation.
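
    As a minimal sketch (assuming an authenticated rg.Argilla client, from which client.me.id provides a user id), a record with a pre-filled response could look like this:

    client = rg.Argilla()\n\nrecord = rg.Record(\n    fields={\"text\": \"I love this product!\"},\n    responses=[\n        rg.Response(question_name=\"label\", value=\"positive\", user_id=client.me.id),\n    ],\n)\n\ndataset.records.log([record])\n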

    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion","title":"LabelQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class LabelQuestion(QuestionPropertyBase):\n    _model: LabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        visible_labels: Optional[int] = None,\n    ) -> None:\n        \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n            question is a question where the user can select one label from \\\n            a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n        \"\"\"\n        self._model = LabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=LabelQuestionSettings(\n                options=self._render_values_as_options(labels), visible_options=visible_labels\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: LabelQuestionModel) -> \"LabelQuestion\":\n        instance = cls(name=model.name, labels=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"LabelQuestion\":\n        model = LabelQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    ##############################\n    # Public properties\n    ##############################\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion.__init__","title":"__init__(name, labels, title=None, description=None, required=True, visible_labels=None)","text":"

    Define a new label question for Settings of a Dataset. A label question is a question where the user can select one label from a list of available labels.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. Required.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options. Default: None.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    visible_labels: Optional[int] = None,\n) -> None:\n    \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n        question is a question where the user can select one label from \\\n        a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n    \"\"\"\n    self._model = LabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=LabelQuestionSettings(\n            options=self._render_values_as_options(labels), visible_options=visible_labels\n        ),\n    )\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion","title":"MultiLabelQuestion","text":"

    Bases: LabelQuestion

    Source code in src/argilla/settings/_question.py
    class MultiLabelQuestion(LabelQuestion):\n    _model: MultiLabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        visible_labels: Optional[int] = None,\n        labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n            multi-label question is a question where the user can select multiple \\\n            labels from a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n                Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n                The score of the suggestion will be taken into account for ordering if available.\n            title (Optional[str]: The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = MultiLabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=MultiLabelQuestionSettings(\n                options=self._render_values_as_options(labels),\n                visible_options=visible_labels,\n                options_order=labels_order,\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: MultiLabelQuestionModel) -> \"MultiLabelQuestion\":\n        instance = cls(\n            name=model.name,\n            labels=cls._render_options_as_values(model.settings.options),\n            labels_order=model.settings.options_order,\n        )\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"MultiLabelQuestion\":\n        model = MultiLabelQuestionModel(**data)\n        return cls.from_model(model=model)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion.__init__","title":"__init__(name, labels, visible_labels=None, labels_order='natural', title=None, description=None, required=True)","text":"

    Create a new multi-label question for Settings of a Dataset. A multi-label question is a question where the user can select multiple labels from a list of available labels.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. Required.

    visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options. Default: None.

    labels_order (Literal['natural', 'suggestion']): The order of the labels in the UI. Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). The score of the suggestion will be taken into account for ordering if available. Default: 'natural'.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    visible_labels: Optional[int] = None,\n    labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n        multi-label question is a question where the user can select multiple \\\n        labels from a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n        labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n            Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n            The score of the suggestion will be taken into account for ordering if available.\n        title (Optional[str]: The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = MultiLabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=MultiLabelQuestionSettings(\n            options=self._render_values_as_options(labels),\n            visible_options=visible_labels,\n            options_order=labels_order,\n        ),\n    )\n
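
    For example, an illustrative multi-label question (the label names are arbitrary) could be defined as:

    topics_question = rg.MultiLabelQuestion(\n    name=\"topics\",\n    labels={\"politics\": \"Politics\", \"sports\": \"Sports\", \"tech\": \"Technology\"},\n    visible_labels=3,\n    labels_order=\"natural\",\n)\n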
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion","title":"RankingQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class RankingQuestion(QuestionPropertyBase):\n    _model: RankingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n            is a question where the user can rank a list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RankingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RankingQuestionModel) -> \"RankingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RankingQuestion\":\n        model = RankingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.settings.options = self._render_values_as_options(values)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

    Create a new ranking question for Settings of a Dataset. A ranking question is a question where the user can rank a list of options.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. Required.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    values: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n        is a question where the user can rank a list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RankingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
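
    For example, an illustrative ranking question (the option names are arbitrary) could be defined as:

    preference_question = rg.RankingQuestion(\n    name=\"preference\",\n    values={\"reply-1\": \"Reply 1\", \"reply-2\": \"Reply 2\", \"reply-3\": \"Reply 3\"},\n    title=\"Rank the replies from best to worst\",\n)\n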
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion","title":"TextQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class TextQuestion(QuestionPropertyBase):\n    _model: TextQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        use_markdown: bool = False,\n    ) -> None:\n        \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n            is a question where the user can input text.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        \"\"\"\n        self._model = TextQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=TextQuestionSettings(use_markdown=use_markdown),\n        )\n\n    @classmethod\n    def from_model(cls, model: TextQuestionModel) -> \"TextQuestion\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextQuestion\":\n        model = TextQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> bool:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, use_markdown: bool) -> None:\n        self._model.settings.use_markdown = use_markdown\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion.__init__","title":"__init__(name, title=None, description=None, required=True, use_markdown=False)","text":"

    Create a new text question for Settings of a Dataset. A text question is a question where the user can input text.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs. Default: False.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    use_markdown: bool = False,\n) -> None:\n    \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n        is a question where the user can input text.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n    \"\"\"\n    self._model = TextQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=TextQuestionSettings(use_markdown=use_markdown),\n    )\n
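
    For example, an illustrative markdown-enabled text question could be defined as:

    response_question = rg.TextQuestion(\n    name=\"response\",\n    title=\"Write a response\",\n    use_markdown=True,\n)\n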
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion","title":"RatingQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class RatingQuestion(QuestionPropertyBase):\n    _model: RatingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: List[int],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n            is a question where the user can select a value from a sequential list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RatingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            values=values,\n            settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RatingQuestionModel) -> \"RatingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RatingQuestion\":\n        model = RatingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[int]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.values = self._render_values_as_options(values)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

    Create a new rating question for Settings of a Dataset. A rating question is a question where the user can select a value from a sequential list of options.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    values (List[int]): The list of selectable values. It should be defined in the range [0, 10]. Required.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    values: List[int],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n        is a question where the user can select a value from a sequential list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RatingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        values=values,\n        settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
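
    For example, an illustrative five-point rating question could be defined as:

    quality_question = rg.RatingQuestion(\n    name=\"quality\",\n    values=[1, 2, 3, 4, 5],\n    title=\"Rate the quality of the response\",\n)\n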
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion","title":"SpanQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class SpanQuestion(QuestionPropertyBase):\n    _model: SpanQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        field: str,\n        labels: Union[List[str], Dict[str, str]],\n        allow_overlapping: bool = False,\n        visible_labels: Optional[int] = None,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ):\n        \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n            is a question where the user can select a section of text within a text field \\\n            and assign it a label.\n\n            Parameters:\n                name (str): The name of the question to be used as a reference.\n                field (str): The name of the text field where the span question will be applied.\n                labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                    dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n                allow_overlapping (bool) This value specifies whether overlapped spans are allowed or not.\n                visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                    Setting it to None show all options.\n                title (Optional[str]:) The title of the question to be shown in the UI.\n                description (Optional[str]): The description of the question to be shown in the UI.\n                required (bool): If the question is required for a record to be valid. At least one question must be required.\n            \"\"\"\n        self._model = SpanQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=SpanQuestionSettings(\n                field=field,\n                allow_overlapping=allow_overlapping,\n                visible_options=visible_labels,\n                options=self._render_values_as_options(labels),\n            ),\n        )\n\n    @property\n    def name(self):\n        return self._model.name\n\n    @property\n    def field(self):\n        return self._model.settings.field\n\n    @field.setter\n    def field(self, field: str):\n        self._model.settings.field = field\n\n    @property\n    def allow_overlapping(self):\n        return self._model.settings.allow_overlapping\n\n    @allow_overlapping.setter\n    def allow_overlapping(self, allow_overlapping: bool):\n        self._model.settings.allow_overlapping = allow_overlapping\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @classmethod\n    def from_model(cls, model: SpanQuestionModel) -> \"SpanQuestion\":\n        instance = cls(\n            name=model.name,\n            field=model.settings.field,\n            labels=cls._render_options_as_values(model.settings.options),\n        )\n        instance._model = model\n\n        return 
instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"SpanQuestion\":\n        model = SpanQuestionModel(**data)\n        return cls.from_model(model=model)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion.__init__","title":"__init__(name, field, labels, allow_overlapping=False, visible_labels=None, title=None, description=None, required=True)","text":"

    Create a new span question for Settings of a Dataset. A span question is a question where the user can select a section of text within a text field and assign it a label.

    Parameters:

    name (str): The name of the question to be used as a reference. Required.

    field (str): The name of the text field where the span question will be applied. Required.

    labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI. Required.

    allow_overlapping (bool): This value specifies whether overlapped spans are allowed or not. Default: False.

    visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. Setting it to None shows all options. Default: None.

    title (Optional[str]): The title of the question to be shown in the UI. Default: None.

    description (Optional[str]): The description of the question to be shown in the UI. Default: None.

    required (bool): If the question is required for a record to be valid. At least one question must be required. Default: True.

    Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    field: str,\n    labels: Union[List[str], Dict[str, str]],\n    allow_overlapping: bool = False,\n    visible_labels: Optional[int] = None,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n):\n    \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n        is a question where the user can select a section of text within a text field \\\n        and assign it a label.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            field (str): The name of the text field where the span question will be applied.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            allow_overlapping (bool) This value specifies whether overlapped spans are allowed or not.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n    self._model = SpanQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=SpanQuestionSettings(\n            field=field,\n            allow_overlapping=allow_overlapping,\n            visible_options=visible_labels,\n            options=self._render_values_as_options(labels),\n        ),\n    )\n
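
    For example, an illustrative span question could be defined as follows, where the field argument must match the name of a text field defined in the same settings:

    entities_question = rg.SpanQuestion(\n    name=\"entities\",\n    field=\"text\",\n    labels={\"PER\": \"Person\", \"ORG\": \"Organization\"},\n    allow_overlapping=False,\n)\n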
    "},{"location":"reference/argilla/settings/settings/","title":"rg.Settings","text":"

    rg.Settings is used to define the settings of an Argilla Dataset. The settings can be used to configure the behavior of the dataset, such as the fields, questions, guidelines, metadata, and vectors. The Settings class is passed to the Dataset class and used to create the dataset on the server. Once created, the settings of a dataset cannot be changed.

    "},{"location":"reference/argilla/settings/settings/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/settings/#creating-a-new-dataset-with-settings","title":"Creating a new dataset with settings","text":"

    To create a new dataset with settings, instantiate the Settings class and pass it to the Dataset class.

    import argilla as rg\n\nsettings = rg.Settings(\n    guidelines=\"Select the sentiment of the prompt.\",\n    fields=[rg.TextField(name=\"prompt\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n)\n\ndataset = rg.Dataset(name=\"sentiment_analysis\", settings=settings)\n\n# Create the dataset on the server\ndataset.create()\n

    To define the settings for fields, questions, metadata, vectors, or distribution, refer to the rg.TextField, rg.LabelQuestion, rg.TermsMetadataProperty, rg.VectorField, and rg.TaskDistribution class documentation.

    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings","title":"Settings","text":"

    Bases: Resource

    Settings class for Argilla Datasets.

    This class is used to define the representation of a Dataset within the UI.

    Source code in src/argilla/settings/_resource.py
    class Settings(Resource):\n    \"\"\"\n    Settings class for Argilla Datasets.\n\n    This class is used to define the representation of a Dataset within the UI.\n    \"\"\"\n\n    def __init__(\n        self,\n        fields: Optional[List[TextField]] = None,\n        questions: Optional[List[QuestionType]] = None,\n        vectors: Optional[List[VectorField]] = None,\n        metadata: Optional[List[MetadataType]] = None,\n        guidelines: Optional[str] = None,\n        allow_extra_metadata: bool = False,\n        distribution: Optional[TaskDistribution] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ) -> None:\n        \"\"\"\n        Args:\n            fields (List[TextField]): A list of TextField objects that represent the fields in the Dataset.\n            questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n                A list of Question objects that represent the questions in the Dataset.\n            vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n            metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n            guidelines (str): A string containing the guidelines for the Dataset.\n            allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n                Dataset. Defaults to False.\n            distribution (TaskDistribution): The annotation task distribution configuration.\n                Default to DEFAULT_TASK_DISTRIBUTION\n        \"\"\"\n        super().__init__(client=_dataset._client if _dataset else None)\n\n        self._dataset = _dataset\n        self._distribution = distribution\n        self.__guidelines = self.__process_guidelines(guidelines)\n        self.__allow_extra_metadata = allow_extra_metadata\n\n        self.__questions = QuestionsProperties(self, questions)\n        self.__fields = SettingsProperties(self, fields)\n        self.__vectors = SettingsProperties(self, vectors)\n        self.__metadata = SettingsProperties(self, metadata)\n\n    #####################\n    # Properties        #\n    #####################\n\n    @property\n    def fields(self) -> \"SettingsProperties\":\n        return self.__fields\n\n    @fields.setter\n    def fields(self, fields: List[TextField]):\n        self.__fields = SettingsProperties(self, fields)\n\n    @property\n    def questions(self) -> \"SettingsProperties\":\n        return self.__questions\n\n    @questions.setter\n    def questions(self, questions: List[QuestionType]):\n        self.__questions = QuestionsProperties(self, questions)\n\n    @property\n    def vectors(self) -> \"SettingsProperties\":\n        return self.__vectors\n\n    @vectors.setter\n    def vectors(self, vectors: List[VectorField]):\n        self.__vectors = SettingsProperties(self, vectors)\n\n    @property\n    def metadata(self) -> \"SettingsProperties\":\n        return self.__metadata\n\n    @metadata.setter\n    def metadata(self, metadata: List[MetadataType]):\n        self.__metadata = SettingsProperties(self, metadata)\n\n    @property\n    def guidelines(self) -> str:\n        return self.__guidelines\n\n    @guidelines.setter\n    def guidelines(self, guidelines: str):\n        self.__guidelines = self.__process_guidelines(guidelines)\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.__allow_extra_metadata\n\n    
@allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool):\n        self.__allow_extra_metadata = value\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self._distribution or TaskDistribution.default()\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self._distribution = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, dataset: \"Dataset\"):\n        self._dataset = dataset\n        self._client = dataset._client\n\n    @cached_property\n    def schema(self) -> dict:\n        schema_dict = {}\n\n        for field in self.fields:\n            schema_dict[field.name] = field\n\n        for question in self.questions:\n            schema_dict[question.name] = question\n\n        for vector in self.vectors:\n            schema_dict[vector.name] = vector\n\n        for metadata in self.metadata:\n            schema_dict[metadata.name] = metadata\n\n        return schema_dict\n\n    @cached_property\n    def schema_by_id(self) -> Dict[UUID, Union[TextField, QuestionType, MetadataType, VectorField]]:\n        return {v.id: v for v in self.schema.values()}\n\n    def validate(self) -> None:\n        self._validate_empty_settings()\n        self._validate_duplicate_names()\n\n    #####################\n    #  Public methods   #\n    #####################\n\n    def get(self) -> \"Settings\":\n        self.fields = self._fetch_fields()\n        self.questions = self._fetch_questions()\n        self.vectors = self._fetch_vectors()\n        self.metadata = self._fetch_metadata()\n        self.__fetch_dataset_related_attributes()\n\n        self._update_last_api_call()\n        return self\n\n    def create(self) -> \"Settings\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.create()\n        self.__questions.create()\n        self.__vectors.create()\n        self.__metadata.create()\n\n        self._update_last_api_call()\n        return self\n\n    def update(self) -> \"Resource\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.update()\n        self.__vectors.update()\n        self.__metadata.update()\n        # self.questions.update()\n\n        self._update_last_api_call()\n        return self\n\n    def serialize(self):\n        try:\n            return {\n                \"guidelines\": self.guidelines,\n                \"questions\": self.__questions.serialize(),\n                \"fields\": self.__fields.serialize(),\n                \"vectors\": self.vectors.serialize(),\n                \"metadata\": self.metadata.serialize(),\n                \"allow_extra_metadata\": self.allow_extra_metadata,\n                \"distribution\": self.distribution.to_dict(),\n            }\n        except Exception as e:\n            raise ArgillaSerializeError(f\"Failed to serialize the settings. 
{e.__class__.__name__}\") from e\n\n    def to_json(self, path: Union[Path, str]) -> None:\n        \"\"\"Save the settings to a file on disk\n\n        Parameters:\n            path (str): The path to save the settings to\n        \"\"\"\n        if not isinstance(path, Path):\n            path = Path(path)\n        if path.exists():\n            raise FileExistsError(f\"File {path} already exists\")\n        with open(path, \"w\") as file:\n            json.dump(self.serialize(), file)\n\n    @classmethod\n    def from_json(cls, path: Union[Path, str]) -> \"Settings\":\n        \"\"\"Load the settings from a file on disk\"\"\"\n\n        with open(path, \"r\") as file:\n            settings_dict = json.load(file)\n            return cls._from_dict(settings_dict)\n\n    def __eq__(self, other: \"Settings\") -> bool:\n        return self.serialize() == other.serialize()  # TODO: Create proper __eq__ methods for fields and questions\n\n    #####################\n    #  Repr Methods     #\n    #####################\n\n    def __repr__(self) -> str:\n        return (\n            f\"Settings(guidelines={self.guidelines}, allow_extra_metadata={self.allow_extra_metadata}, \"\n            f\"distribution={self.distribution}, \"\n            f\"fields={self.fields}, questions={self.questions}, vectors={self.vectors}, metadata={self.metadata})\"\n        )\n\n    #####################\n    #  Private methods  #\n    #####################\n\n    @classmethod\n    def _from_dict(cls, settings_dict: dict) -> \"Settings\":\n        fields = settings_dict.get(\"fields\", [])\n        vectors = settings_dict.get(\"vectors\", [])\n        metadata = settings_dict.get(\"metadata\", [])\n        guidelines = settings_dict.get(\"guidelines\")\n        distribution = settings_dict.get(\"distribution\")\n        allow_extra_metadata = settings_dict.get(\"allow_extra_metadata\")\n\n        questions = [question_from_dict(question) for question in settings_dict.get(\"questions\", [])]\n        fields = [TextField.from_dict(field) for field in fields]\n        vectors = [VectorField.from_dict(vector) for vector in vectors]\n        metadata = [MetadataField.from_dict(metadata) for metadata in metadata]\n\n        if distribution:\n            distribution = TaskDistribution.from_dict(distribution)\n\n        return cls(\n            questions=questions,\n            fields=fields,\n            vectors=vectors,\n            metadata=metadata,\n            guidelines=guidelines,\n            allow_extra_metadata=allow_extra_metadata,\n            distribution=distribution,\n        )\n\n    def _copy(self) -> \"Settings\":\n        instance = self.__class__._from_dict(self.serialize())\n        return instance\n\n    def _fetch_fields(self) -> List[TextField]:\n        models = self._client.api.fields.list(dataset_id=self._dataset.id)\n        return [TextField.from_model(model) for model in models]\n\n    def _fetch_questions(self) -> List[QuestionType]:\n        models = self._client.api.questions.list(dataset_id=self._dataset.id)\n        return [question_from_model(model) for model in models]\n\n    def _fetch_vectors(self) -> List[VectorField]:\n        models = self.dataset._client.api.vectors.list(self.dataset.id)\n        return [VectorField.from_model(model) for model in models]\n\n    def _fetch_metadata(self) -> List[MetadataType]:\n        models = self._client.api.metadata.list(dataset_id=self._dataset.id)\n        return [MetadataField.from_model(model) for model in models]\n\n    def 
__fetch_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = self._client.api.datasets.get(self._dataset.id)\n\n        self.guidelines = dataset_model.guidelines\n        self.allow_extra_metadata = dataset_model.allow_extra_metadata\n\n        if dataset_model.distribution:\n            self.distribution = TaskDistribution.from_model(dataset_model.distribution)\n\n    def _update_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = DatasetModel(\n            id=self._dataset.id,\n            name=self._dataset.name,\n            guidelines=self.guidelines,\n            allow_extra_metadata=self.allow_extra_metadata,\n            distribution=self.distribution._api_model(),\n        )\n        self._client.api.datasets.update(dataset_model)\n\n    def _validate_empty_settings(self):\n        if not all([self.fields, self.questions]):\n            message = \"Fields and questions are required\"\n            raise SettingsError(message=message)\n\n    def _validate_duplicate_names(self) -> None:\n        dataset_properties_by_name = {}\n\n        for properties in [self.fields, self.questions, self.vectors, self.metadata]:\n            for property in properties:\n                if property.name in dataset_properties_by_name:\n                    raise SettingsError(\n                        f\"names of dataset settings must be unique, \"\n                        f\"but the name {property.name!r} is used by {type(property).__name__!r} and {type(dataset_properties_by_name[property.name]).__name__!r} \"\n                    )\n                dataset_properties_by_name[property.name] = property\n\n    def __process_guidelines(self, guidelines):\n        if guidelines is None:\n            return guidelines\n\n        if not isinstance(guidelines, str):\n            raise SettingsError(\"Guidelines must be a string or a path to a file\")\n\n        if os.path.exists(guidelines):\n            with open(guidelines, \"r\") as file:\n                return file.read()\n\n        return guidelines\n
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.__init__","title":"__init__(fields=None, questions=None, vectors=None, metadata=None, guidelines=None, allow_extra_metadata=False, distribution=None, _dataset=None)","text":"

    Parameters:

    Name Type Description Default fields List[TextField]

    A list of TextField objects that represent the fields in the Dataset.

    None questions List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]

    A list of Question objects that represent the questions in the Dataset.

    None vectors List[VectorField]

    A list of VectorField objects that represent the vectors in the Dataset.

    None metadata List[MetadataField]

    A list of MetadataField objects that represent the metadata in the Dataset.

    None guidelines str

    A string containing the guidelines for the Dataset.

    None allow_extra_metadata bool

    A boolean that determines whether or not extra metadata is allowed in the Dataset. Defaults to False.

    False distribution TaskDistribution

    The annotation task distribution configuration. Defaults to DEFAULT_TASK_DISTRIBUTION.

    None Source code in src/argilla/settings/_resource.py
    def __init__(\n    self,\n    fields: Optional[List[TextField]] = None,\n    questions: Optional[List[QuestionType]] = None,\n    vectors: Optional[List[VectorField]] = None,\n    metadata: Optional[List[MetadataType]] = None,\n    guidelines: Optional[str] = None,\n    allow_extra_metadata: bool = False,\n    distribution: Optional[TaskDistribution] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n) -> None:\n    \"\"\"\n    Args:\n        fields (List[TextField]): A list of TextField objects that represent the fields in the Dataset.\n        questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n            A list of Question objects that represent the questions in the Dataset.\n        vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n        metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n        guidelines (str): A string containing the guidelines for the Dataset.\n        allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n            Dataset. Defaults to False.\n        distribution (TaskDistribution): The annotation task distribution configuration.\n            Default to DEFAULT_TASK_DISTRIBUTION\n    \"\"\"\n    super().__init__(client=_dataset._client if _dataset else None)\n\n    self._dataset = _dataset\n    self._distribution = distribution\n    self.__guidelines = self.__process_guidelines(guidelines)\n    self.__allow_extra_metadata = allow_extra_metadata\n\n    self.__questions = QuestionsProperties(self, questions)\n    self.__fields = SettingsProperties(self, fields)\n    self.__vectors = SettingsProperties(self, vectors)\n    self.__metadata = SettingsProperties(self, metadata)\n
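
    As a quick reference, a minimal instantiation could look like the following sketch; the guidelines, field name, question name, and labels are illustrative:

    import argilla as rg\n\n# A minimal Settings object with one field and one question (illustrative names)\nsettings = rg.Settings(\n    guidelines=\"Classify the text.\",\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"yes\", \"no\"])],\n)\n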
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.to_json","title":"to_json(path)","text":"

    Save the settings to a file on disk

    Parameters:

    Name Type Description Default path str

    The path to save the settings to

    required Source code in src/argilla/settings/_resource.py
    def to_json(self, path: Union[Path, str]) -> None:\n    \"\"\"Save the settings to a file on disk\n\n    Parameters:\n        path (str): The path to save the settings to\n    \"\"\"\n    if not isinstance(path, Path):\n        path = Path(path)\n    if path.exists():\n        raise FileExistsError(f\"File {path} already exists\")\n    with open(path, \"w\") as file:\n        json.dump(self.serialize(), file)\n
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_json","title":"from_json(path) classmethod","text":"

    Load the settings from a file on disk

    Source code in src/argilla/settings/_resource.py
    @classmethod\ndef from_json(cls, path: Union[Path, str]) -> \"Settings\":\n    \"\"\"Load the settings from a file on disk\"\"\"\n\n    with open(path, \"r\") as file:\n        settings_dict = json.load(file)\n        return cls._from_dict(settings_dict)\n
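
    Putting the two methods together, a minimal round-trip sketch could look like this; the file name is illustrative, and note that to_json raises FileExistsError if the file already exists:

    settings.to_json(\"my_settings.json\")\nloaded_settings = rg.Settings.from_json(\"my_settings.json\")\n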
    "},{"location":"reference/argilla/settings/task_distribution/","title":"Distribution","text":"

    Distribution settings define the criteria the tool uses to automatically manage records in the dataset, based on the expected number of submitted responses per record.

    "},{"location":"reference/argilla/settings/task_distribution/#usage-examples","title":"Usage Examples","text":"

    The default minimum submitted responses per record is 1. If you wish to increase this value, you can define it through the TaskDistribution class and pass it to the Settings class.

    settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings\n)\n
    "},{"location":"reference/argilla/settings/task_distribution/#src.argilla.settings._task_distribution.OverlapTaskDistribution","title":"OverlapTaskDistribution","text":"

    The task distribution settings class.

    This task distribution defines a number of submitted responses required to complete a record.

    Parameters:

    Name Type Description Default min_submitted int

    The number of min. submitted responses to complete the record

    required Source code in src/argilla/settings/_task_distribution.py
    class OverlapTaskDistribution:\n    \"\"\"The task distribution settings class.\n\n    This task distribution defines a number of submitted responses required to complete a record.\n\n    Parameters:\n        min_submitted (int): The number of min. submitted responses to complete the record\n    \"\"\"\n\n    strategy: Literal[\"overlap\"] = \"overlap\"\n\n    def __init__(self, min_submitted: int):\n        self._model = OverlapTaskDistributionModel(min_submitted=min_submitted, strategy=self.strategy)\n\n    def __repr__(self) -> str:\n        return f\"OverlapTaskDistribution(min_submitted={self.min_submitted})\"\n\n    def __eq__(self, other) -> bool:\n        if not isinstance(other, self.__class__):\n            return False\n\n        return self._model == other._model\n\n    @classmethod\n    def default(cls) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=1)\n\n    @property\n    def min_submitted(self):\n        return self._model.min_submitted\n\n    @min_submitted.setter\n    def min_submitted(self, value: int):\n        self._model.min_submitted = value\n\n    @classmethod\n    def from_model(cls, model: OverlapTaskDistributionModel) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=model.min_submitted)\n\n    @classmethod\n    def from_dict(cls, dict: Dict[str, Any]) -> \"OverlapTaskDistribution\":\n        return cls.from_model(OverlapTaskDistributionModel.model_validate(dict))\n\n    def to_dict(self):\n        return self._model.model_dump()\n\n    def _api_model(self) -> OverlapTaskDistributionModel:\n        return self._model\n
    "},{"location":"reference/argilla/settings/vectors/","title":"Vectors","text":"

    Vector fields in Argilla are used to define the vector form of a record that will be reviewed by a user.

    "},{"location":"reference/argilla/settings/vectors/#usage-examples","title":"Usage Examples","text":"

    To define a vector field, instantiate the VectorField class with a name and dimensions, then pass it to the vectors parameter of the Settings class.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    vectors=[\n        rg.VectorField(\n            name=\"my_vector\",\n            dimensions=768,\n            title=\"Document Embedding\",\n        ),\n    ],\n)\n

    To add records with vectors, refer to the rg.Vector class documentation.
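
    For instance, a record carrying a vector for the field defined above could be logged like this sketch; the text and vector values are illustrative, and dataset is assumed to be a Dataset created with these settings:

    # Log one record whose \"my_vector\" entry matches the 768 dimensions defined above\nrecord = rg.Record(\n    fields={\"text\": \"Hello world\"},\n    vectors=[rg.Vector(\"my_vector\", [0.1] * 768)],\n)\ndataset.records.log([record])\n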

    "},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField","title":"VectorField","text":"

    Bases: Resource

    Vector field for use in Argilla Dataset Settings

    Source code in src/argilla/settings/_vector.py
    class VectorField(Resource):\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: VectorFieldModel\n    _api: VectorsAPI\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        dimensions: int,\n        title: Optional[str] = None,\n        _client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the vector field\n            dimensions (int): The number of dimensions in the vector\n            title (Optional[str]): The title of the vector to be shown in the UI.\n        \"\"\"\n        client = _client or Argilla._get_default()\n        super().__init__(api=client.api.vectors, client=client)\n        self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n        self._dataset = None\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def title(self) -> Optional[str]:\n        return self._model.title\n\n    @title.setter\n    def title(self, value: Optional[str]) -> None:\n        self._model.title = value\n\n    @property\n    def dimensions(self) -> int:\n        return self._model.dimensions\n\n    @dimensions.setter\n    def dimensions(self, value: int) -> None:\n        self._model.dimensions = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(name={self.name}, title={self.title}, dimensions={self.dimensions})\"\n\n    @classmethod\n    def from_model(cls, model: VectorFieldModel) -> \"VectorField\":\n        instance = cls(name=model.name, dimensions=model.dimensions)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"VectorField\":\n        model = VectorFieldModel(**data)\n        return cls.from_model(model=model)\n\n    def _with_client(self, client: \"Argilla\") -> \"VectorField\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.vectors\n\n        return self\n
    "},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField.__init__","title":"__init__(name, dimensions, title=None, _client=None)","text":"

    Vector field for use in Argilla Dataset Settings

    Parameters:

    Name Type Description Default name str

    The name of the vector field

    required dimensions int

    The number of dimensions in the vector

    required title Optional[str]

    The title of the vector to be shown in the UI.

    None Source code in src/argilla/settings/_vector.py
    def __init__(\n    self,\n    name: str,\n    dimensions: int,\n    title: Optional[str] = None,\n    _client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the vector field\n        dimensions (int): The number of dimensions in the vector\n        title (Optional[str]): The title of the vector to be shown in the UI.\n    \"\"\"\n    client = _client or Argilla._get_default()\n    super().__init__(api=client.api.vectors, client=client)\n    self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n    self._dataset = None\n
    "},{"location":"reference/argilla-server/configuration/","title":"Server configuration","text":"

    This section explains advanced operations and settings for running the Argilla Server and Argilla Python Client.

    By default, the Argilla Server will look for your Elasticsearch (ES) endpoint at http://localhost:9200. You can customize this by setting the ARGILLA_ELASTICSEARCH environment variable. Have a look at the list of available environment variables to further configure the Argilla server.

    Since Argilla version 1.19.0, you must set up the search engine manually to work with datasets. Set the environment variable ARGILLA_SEARCH_ENGINE=opensearch or ARGILLA_SEARCH_ENGINE=elasticsearch, depending on the backend you're using. The default value for this variable is elasticsearch. The minimum supported version is 8.5.0 for Elasticsearch and 2.4.0 for OpenSearch. Please review your backend and upgrade it if necessary.

    Warning

    For vector search in OpenSearch, filtering is applied using a post_filter step, since there is a bug that makes queries combining filtering and knn fail when issued from Argilla. See https://github.com/opensearch-project/k-NN/issues/1286

    This may lead to unexpected results when combining filtering with vector search on this engine.

    "},{"location":"reference/argilla-server/configuration/#launching","title":"Launching","text":""},{"location":"reference/argilla-server/configuration/#using-a-proxy","title":"Using a proxy","text":"

    If you run Argilla behind a proxy by adding some extra prefix to expose the service, you should set the ARGILLA_BASE_URL environment variable to properly route requests to the server application.

    For example, if your proxy exposes Argilla in the URL https://my-proxy/custom-path-for-argilla, you should launch the Argilla server with ARGILLA_BASE_URL=/custom-path-for-argilla.
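
    On the client side, you would then point the SDK at the proxied URL. A sketch reusing the example path above; the API key is illustrative:

    import argilla as rg\n\n# Connect through the proxy-exposed path\nclient = rg.Argilla(\n    api_url=\"https://my-proxy/custom-path-for-argilla\",\n    api_key=\"owner.apikey\",\n)\n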

    NGINX and Traefik have been tested and are known to work with Argilla:

    • NGINX example
    • Traefik example
    "},{"location":"reference/argilla-server/configuration/#environment-variables","title":"Environment variables","text":"

    You can set the following environment variables to further configure your server and client.

    "},{"location":"reference/argilla-server/configuration/#server","title":"Server","text":""},{"location":"reference/argilla-server/configuration/#fastapi","title":"FastAPI","text":"
    • ARGILLA_HOME_PATH: The directory where Argilla will store all the files needed to run. If the path doesn't exist it will be automatically created (Default: ~/.argilla).

    • ARGILLA_BASE_URL: If you want to launch the Argilla server in a specific base path other than /, you should set up this environment variable. This can be useful when running Argilla behind a proxy that adds a prefix path to route the service (Default: \"/\").

    • ARGILLA_CORS_ORIGINS: List of host patterns for CORS origin access.

    • ARGILLA_DOCS_ENABLED: If False, disables the OpenAPI docs endpoint at /api/docs.

    • ARGILLA_ENABLE_TELEMETRY: If False, disables telemetry for usage metrics.

    "},{"location":"reference/argilla-server/configuration/#authentication","title":"Authentication","text":"
    • ARGILLA_AUTH_SECRET_KEY: The secret key used to sign the API token data. You can use openssl rand -hex 32 to generate a 32 character string to use with this environment variable. By default a random value is generated, so if you are using more than one server worker (or more than one Argilla server) you will need to set the same value for all of them.
    "},{"location":"reference/argilla-server/configuration/#database","title":"Database","text":"
    • ARGILLA_DATABASE_URL: A URL string that contains the necessary information to connect to a database. Argilla uses SQLite by default; PostgreSQL is also officially supported (Default: sqlite:///$ARGILLA_HOME_PATH/argilla.db?check_same_thread=False).
    "},{"location":"reference/argilla-server/configuration/#sqlite","title":"SQLite","text":"

    The following environment variables are useful only when SQLite is used:

    • ARGILLA_DATABASE_SQLITE_TIMEOUT: How many seconds the connection should wait before raising an OperationalError when a table is locked. If another connection opens a transaction to modify a table, that table will be locked until the transaction is committed (Default: 15 seconds).
    "},{"location":"reference/argilla-server/configuration/#postgresql","title":"PostgreSQL","text":"

    The following environment variables are useful only when PostgreSQL is used:

    • ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE: The number of connections to keep open inside the database connection pool (Default: 15).

    • ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW: The number of connections that can be opened above and beyond the ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE setting (Default: 10).

    "},{"location":"reference/argilla-server/configuration/#search-engine","title":"Search engine","text":"
    • ARGILLA_ELASTICSEARCH: URL of the connection endpoint of the Elasticsearch instance (Default: http://localhost:9200).

    • ARGILLA_SEARCH_ENGINE: Search engine to use. Valid values are \"elasticsearch\" and \"opensearch\" (Default: \"elasticsearch\").

    • ARGILLA_ELASTICSEARCH_SSL_VERIFY: If \"False\", disables SSL certificate verification when connecting to the Elasticsearch backend.

    • ARGILLA_ELASTICSEARCH_CA_PATH: Path to CA cert for ES host. For example: /full/path/to/root-ca.pem (Optional)

    "},{"location":"reference/argilla-server/configuration/#datasets","title":"Datasets","text":"
    • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for label and multi-label questions (Default: 500).

    • ARGILLA_SPAN_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for span questions (Default: 500).

    "},{"location":"reference/argilla-server/configuration/#hugging-face","title":"Hugging Face","text":"
    • ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING: When Argilla is running on Hugging Face Spaces, you can use this environment variable to disable the warning message shown when persistent storage is disabled for the space (Default: true).
    "},{"location":"reference/argilla-server/configuration/#docker-images-only","title":"Docker images only","text":"
    • REINDEX_DATASET: If true or 1, the datasets will be reindexed in the search engine. This is needed when the search configuration has changed or the data must be refreshed (Default: 0).

    • USERNAME: If provided, the owner username. This can be combined with HF OAuth to define the Argilla server owner (Default: \"\").

    • PASSWORD: If provided, the owner password. If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup (Default: \"\").

    • API_KEY: The API key to use for the default user. If API_KEY is not provided, a new random API key will be generated (Default: \"\").

    "},{"location":"reference/argilla-server/configuration/#rest-api-docs","title":"REST API docs","text":"

    FastAPI also provides beautiful REST API docs that you can check at http://localhost:6900/api/v1/docs.

    "},{"location":"reference/argilla-server/telemetry/","title":"Server Telemetry","text":"

    Argilla uses telemetry to report anonymous usage and error information. As open-source software, this type of information is important to improve and understand how the product is used.

    "},{"location":"reference/argilla-server/telemetry/#how-to-opt-out","title":"How to opt out","text":"

    You can opt out of telemetry reporting using the ENV variable ARGILLA_ENABLE_TELEMETRY before launching the server. Setting this variable to 0 will completely disable telemetry reporting.

    If you are a Linux/macOS user, you should run:

    export ARGILLA_ENABLE_TELEMETRY=0\n

    If you are a Windows user, you should run:

    set ARGILLA_ENABLE_TELEMETRY=0\n

    To opt in again, you can set the variable to 1.

    "},{"location":"reference/argilla-server/telemetry/#why-reporting-telemetry","title":"Why reporting telemetry","text":"

    Anonymous telemetry information enables us to continuously improve the product and detect recurring problems to better serve all users. We collect aggregated information about general usage and errors. We do NOT collect any information on users' data records, datasets, or metadata information.

    "},{"location":"reference/argilla-server/telemetry/#sensitive-data","title":"Sensitive data","text":"

    We do not collect any piece of information related to the source data you store in Argilla. We don't identify individual users. Your data does not leave your server at any time:

    • No dataset record is collected.
    • No dataset names or metadata are collected.
    "},{"location":"reference/argilla-server/telemetry/#information-reported","title":"Information reported","text":"

    The following usage and error information is reported:

    • The code of the raised error and the entity type related to the error, if any (Dataset, Workspace,...)
    • The user-agent and accept-language HTTP headers
    • Task name and number of records for bulk operations
    • An anonymous generated user uuid
    • The Argilla version running the server
    • The Python version, e.g. 3.8.13
    • The system/OS name, such as Linux, Darwin, Windows
    • The system\u2019s release version, e.g. Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020
    • The machine type, e.g. AMD64
    • The underlying platform spec with as much useful information as possible (e.g. macOS-10.16-x86_64-i386-64bit)
    • The type of deployment: huggingface_space or server
    • The dockerized deployment flag: True or False

    For transparency, you can inspect the source code where this is performed here.

    If you have any doubts, don't hesitate to join our Discord channel or open a GitHub issue. We'd be very happy to discuss how we can improve this.

    "},{"location":"tutorials/","title":"Tutorials","text":"

    These are the tutorials for the Argilla SDK. They provide step-by-step instructions for common tasks.

    • Text classification

      Learn about a standard workflow to improve data quality for a text classification task. Tutorial

    • Token classification

      Learn about a standard workflow to improve data quality for a token classification task. Tutorial

    "},{"location":"tutorials/text_classification/","title":"Text classification","text":"

    In this tutorial, we will show a standard workflow for a text classification task, in this case, using SetFit and Argilla.

    We will follow these steps:

    • Configure the Argilla dataset
    • Add initial model suggestions
    • Evaluate with Argilla
    • Train your model
    • Update the suggestions with the new model

    If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

    To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

    !pip install argilla\n
    !pip install setfit==1.0.3 transformers==4.40.2\n

    Let's make the required imports:

    import argilla as rg\n\nfrom datasets import load_dataset, Dataset\nfrom setfit import SetFitModel, Trainer, get_templated_dataset, sample_dataset\n

    You also need to connect to the Argilla server using the api_url and api_key.

    # Replace api_url with your url if using Docker\n# Replace api_key if you configured a custom API key\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"owner.apikey\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

    Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a label question.

    Note

    Check this how-to guide to know more about configuring and creating a dataset.

    labels = [\"positive\", \"negative\"]\n\nsettings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"sentiment_label\",\n            title=\"In which category does this article fit?\",\n            labels=labels,\n        )\n    ],\n)\n

    Let's create the dataset with the name and the defined settings:

    dataset = rg.Dataset(\n    name=\"text_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

    Even though we have created the dataset, it still lacks the information to be annotated (you can check this in the UI). We will use the imdb dataset from the Hugging Face Hub. Specifically, we will use 100 samples from the train split.

    hf_dataset = load_dataset(\"imdb\", split=\"train[:100]\")\n

    We can easily add them to the dataset using log, with a mapping that indicates that the column text should be added to the field review.

    dataset.records.log(records=hf_dataset, mapping={\"text\": \"review\"})\n

    The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot SetFit model. However, you can use a framework or technique of your choice.

    We will start by defining an example training set with the required labels: positive and negative. Using get_templated_dataset will create sentences from the default template: \"This sentence is {label}.\"

    zero_ds = get_templated_dataset(\n    candidate_labels=labels,\n    sample_size=8,\n)\n

    Now, we will prepare a function to train the SetFit model.

    Note

    For further customization, you can check the SetFit documentation.

    def train_model(model_name, dataset):\n    model = SetFitModel.from_pretrained(model_name)\n\n    trainer = Trainer(\n        model=model,\n        train_dataset=dataset,\n    )\n\n    trainer.train()\n\n    return model\n

    Let's train the model. We will use TaylorAI/bge-micro-v2, available in the Hugging Face Hub.

    model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=zero_ds)\n

    You can save it locally or push it to the Hub, and then load it from there.

    # Save and load locally\n# model.save_pretrained(\"text_classification_model\")\n# model = SetFitModel.from_pretrained(\"text_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/text_classification_model\")\n# model = SetFitModel.from_pretrained(\"[username]/text_classification_model\")\n

    It's time to make the predictions! We will set a function that uses the predict method to get the suggested label. The model will infer the label based on the text.

    def predict(model, input, labels):\n    model.labels = labels\n\n    prediction = model.predict([input])\n\n    return prediction[0]\n

    To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id always needs to be provided, as it is the record's identifier; it is used to update an existing record rather than create a new one.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    Voil\u00e0! We have added the suggestions to the dataset, and they will appear in the UI marked with a \u2728.

    Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

    Note

    Check this how-to guide to know more about annotating in the UI.

    After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using SetFit. However, you can select the one that best fits your requirements. So, let's start by retrieving the annotated records.

    Note

    Check this how-to guide to know more about filtering and querying in Argilla.

    dataset = client.datasets(\"text_classification_dataset\")\n
    status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

    As we have a single response per record, we can retrieve the selected label directly and create a training set with 8 samples per label, which gives us a balanced dataset for few-shot learning.

    train_records = [\n    {\n        \"text\": r[\"review\"],\n        \"label\": r[\"sentiment_label.responses\"][0],\n    }\n    for r in submitted\n]\ntrain_dataset = Dataset.from_list(train_records)\ntrain_dataset = sample_dataset(train_dataset, label_column=\"label\", num_samples=8)\n

    We can train the model using our previous function, but this time with a high-quality human-annotated training set.

    model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=train_dataset)\n

    As the training data is of better quality, we can expect a better model. So, we can update the remaining non-annotated records with the new model's suggestions.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    In this tutorial, we present an end-to-end example of a text classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

    We started by configuring the dataset, adding records, and training a zero-shot SetFit model, as an example, to add suggestions. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

    "},{"location":"tutorials/text_classification/#text-classification","title":"Text classification","text":""},{"location":"tutorials/text_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/text_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/text_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/text_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/text_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/text_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/text_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/text_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/text_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/token_classification/","title":"Token classification","text":"

    In this tutorial, we will show a standard workflow for a token classification task, in this case, using GLiNER, SpanMarker and Argilla.

    We will follow these steps:

    • Configure the Argilla dataset
    • Add initial model suggestions
    • Evaluate with Argilla
    • Train your model
    • Update the suggestions with the new model

    If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

    To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

    !pip install argilla\n
    !pip install gliner==0.2.6 transformers==4.40.2 span_marker==1.5.0\n

    Let's make the needed imports:

    import re\n\nimport argilla as rg\n\nfrom datasets import load_dataset, Dataset, DatasetDict\nfrom gliner import GLiNER\nfrom span_marker import SpanMarkerModel, Trainer\nfrom transformers import TrainingArguments\n

    You also need to connect to the Argilla server with the api_url and api_key.

    # Replace api_url with your url if using Docker\n# Replace api_key if you configured a custom API key\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"owner.apikey\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

    Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a span question. We will focus on Named Entity Recognition, but this workflow can also be applied to Span Classification, which differs in that the spans are less clearly defined and often overlap.

    labels = [\n    \"CARDINAL\",\n    \"DATE\",\n    \"PERSON\",\n    \"NORP\",\n    \"GPE\",\n    \"LAW\",\n    \"PERCENT\",\n    \"ORDINAL\",\n    \"MONEY\",\n    \"WORK_OF_ART\",\n    \"FAC\",\n    \"TIME\",\n    \"QUANTITY\",\n    \"PRODUCT\",\n    \"LANGUAGE\",\n    \"ORG\",\n    \"LOC\",\n    \"EVENT\",\n]\n\nsettings = rg.Settings(\n    guidelines=\"Classify individual tokens according to the specified categories, ensuring that any overlapping or nested entities are accurately captured.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n            title=\"Text\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.SpanQuestion(\n            name=\"span_label\",\n            field=\"text\",\n            labels=labels,\n            title=\"Classify the tokens according to the specified categories.\",\n            allow_overlapping=False,\n        )\n    ],\n)\n

    Let's create the dataset with the name and the defined settings:

    dataset = rg.Dataset(\n    name=\"token_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

    We have created the dataset (you can check it in the UI), but we still need to add the data for annotation. In this case, we will use the ontonotes5 dataset from the Hugging Face Hub. Specifically, we will use 2100 samples from the test split.

    hf_dataset = load_dataset(\"tner/ontonotes5\", split=\"test[:2100]\")\n

    We will iterate over the Hugging Face dataset, adding data to the corresponding field in the Record object for the Argilla dataset. Then, we will easily add them to the dataset using log.

    records = [rg.Record(fields={\"text\": \" \".join(row[\"tokens\"])}) for row in hf_dataset]\n\ndataset.records.log(records)\n

    The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a GLiNER model. However, you can use a framework or technique of your choice.

    Note

    For further information, you can check the GLiNER repository and the original paper.

    We will start by loading the pre-trained GLiNER model. Specifically, we will use gliner_mediumv2, available in the Hugging Face Hub.

    gliner_model = GLiNER.from_pretrained(\"urchade/gliner_mediumv2.1\")\n

    Next, we will create a function to generate predictions using this general model, which can identify the specified labels without being pre-trained on them. The function will return a list of dictionaries formatted with the necessary schema to add entities to our Argilla dataset. This schema includes the keys 'start' and 'end' to indicate the indices where the span begins and ends, as well as 'label' for the entity label.

    def predict_gliner(model, text, labels, threshold):\n    entities = model.predict_entities(text, labels, threshold)\n    return [\n        {k: v for k, v in ent.items() if k not in {\"score\", \"text\"}} for ent in entities\n    ]\n

    To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id always needs to be provided, as it is the record's identifier; it is used to update an existing record rather than create a new one.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_gliner(\n            model=gliner_model, text=sample[\"text\"], labels=labels, threshold=0.70\n        ),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    Voil\u00e0! We have added the suggestions to the dataset and they will appear in the UI marked with \u2728.

    Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

    Note

    Check this how-to guide to know more about annotating in the UI.

    After the annotation, we will have a robust dataset to train our model for entity recognition. For our case, we will train a SpanMarker model, but you can select any model of your choice. So, let's start by retrieving the annotated records.

    Note

    Check this how-to guide to learn more about filtering and querying in Argilla.

    dataset = client.datasets(\"token_classification_dataset\")\n

    In our case, we submitted 2000 annotations using the bulk view.

    status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

    SpanMarker accepts any dataset as long as it has the tokens and ner_tags columns. The ner_tags can be annotated using the IOB, IOB2, BIOES or BILOU labeling scheme, as well as regular unschemed labels. In our case, we have chosen to use the IOB format, where the first token of an entity is tagged B-{label}, any following tokens of the same entity are tagged I-{label}, and tokens outside any entity are tagged O. Thus, we will define a function to extract the annotated NER tags according to this schema.

    Note

    For further information, you can check the SpanMarker documentation.

    def get_iob_tag_for_token(token_start, token_end, ner_spans):\n    for span in ner_spans:\n        if token_start >= span[\"start\"] and token_end <= span[\"end\"]:\n            if token_start == span[\"start\"]:\n                return f\"B-{span['label']}\"\n            else:\n                return f\"I-{span['label']}\"\n    return \"O\"\n\n\ndef extract_ner_tags(text, responses):\n    tokens = re.split(r\"(\\s+)\", text)\n    ner_tags = []\n\n    current_position = 0\n    for token in tokens:\n        if token.strip():\n            token_start = current_position\n            token_end = current_position + len(token)\n            tag = get_iob_tag_for_token(token_start, token_end, responses)\n            ner_tags.append(tag)\n        current_position += len(token)\n\n    return ner_tags\n

    Let's now extract them and save two lists with the tokens and NER tags, which will help us build our dataset to train the SpanMarker model.

    tokens = []\nner_tags = []\nfor r in submitted:\n    tags = extract_ner_tags(r[\"text\"], r[\"span_label.responses\"][0])\n    tks = r[\"text\"].split()\n    tokens.append(tks)\n    ner_tags.append(tags)\n

    In addition, we will have to indicate the labels and they should be formatted as integers. So, we will retrieve them and map them.

    labels = list(set([item for sublist in ner_tags for item in sublist]))\n\nid2label = {i: label for i, label in enumerate(labels)}\nlabel2id = {label: id_ for id_, label in id2label.items()}\n\nmapped_ner_tags = [[label2id[label] for label in ner_tag] for ner_tag in ner_tags]\n

    Finally, we will create a dataset with the train and validation sets.

    records = [\n    {\n        \"tokens\": token,\n        \"ner_tags\": ner_tag,\n    }\n    for token, ner_tag in zip(tokens, mapped_ner_tags)\n]\nspan_dataset = DatasetDict(\n    {\n        \"train\": Dataset.from_list(records[:1500]),\n        \"validation\": Dataset.from_list(records[1500:2000]),\n    }\n)\n

    Now, let's prepare to train our model. For this, it is recommended to use a GPU. You can check if one is available as shown below.

    import torch\n\nif torch.cuda.is_available():\n    device = torch.device(\"cuda\")\n    print(f\"Using {torch.cuda.get_device_name(0)}\")\nelif torch.backends.mps.is_available():\n    device = torch.device(\"mps\")\n    print(\"Using MPS device\")\nelse:\n    device = torch.device(\"cpu\")\n    print(\"No GPU available, using CPU instead.\")\n

    We will define our model and arguments. In this case, we will use bert-base-cased, available in the Hugging Face Hub, but other models can be applied.

    Note

    The training arguments are inherited from the Transformers library. You can check more information here.

    encoder_id = \"bert-base-cased\"\nmodel = SpanMarkerModel.from_pretrained(\n    encoder_id,\n    labels=labels,\n    model_max_length=256,\n    entity_max_length=8,\n)\n\nargs = TrainingArguments(\n    output_dir=\"models/span-marker\",\n    learning_rate=5e-5,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    num_train_epochs=1,\n    weight_decay=0.01,\n    warmup_ratio=0.1,\n    fp16=False,  # Set to True if available\n    logging_first_step=True,\n    logging_steps=50,\n    evaluation_strategy=\"steps\",\n    save_strategy=\"steps\",\n    eval_steps=500,\n    save_total_limit=2,\n    dataloader_num_workers=2,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=span_dataset[\"train\"],\n    eval_dataset=span_dataset[\"validation\"],\n)\n

    Let's train it! This time, we use a high-quality human-annotated training set, so the results are expected to improve.

    trainer.train()\n
    trainer.evaluate()\n

    You can save it locally or push it to the Hub, and then load it from there.

    # Save and load locally\n# model.save_pretrained(\"token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"token_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"[username]/token_classification_model\")\n

    It's time to make the predictions! We will set a function that uses the predict method to get the suggested label. The model will infer the label based on the text. The function will return the spans in the corresponding structure for the Argilla dataset.

    def predict_spanmarker(model, text):\n    entities = model.predict(text)\n    return [\n        {\n            \"start\": ent[\"char_start_index\"],\n            \"end\": ent[\"char_end_index\"],\n            \"label\": ent[\"label\"],\n        }\n        for ent in entities\n    ]\n

    As the training data was of better quality, we can expect a better model. So we can update the remaining non-annotated records with the new model's suggestions.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_spanmarker(model=model, text=sample[\"text\"]),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    In this tutorial, we present an end-to-end example of a token classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

    We started by configuring the dataset, adding records, and adding suggestions based on the GLiNER predictions. After the annotation process, we trained a SpanMarker model with the annotated data and updated the remaining records with the new suggestions.

    "},{"location":"tutorials/token_classification/#token-classification","title":"Token classification","text":""},{"location":"tutorials/token_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/token_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/token_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/token_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/token_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/token_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/token_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/token_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/token_classification/#conclusions","title":"Conclusions","text":""}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to Argilla","text":"

    Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.

    To get started:

    • Get started in 5 minutes!

      Deploy Argilla for free on the Hugging Face Hub or with Docker. Install the Python SDK with pip and create your first project.

      Quickstart

    • How-to guides

      Get familiar with the basic workflows of Argilla. Learn how to manage Users, Workspaces, Datasets, and Records to set up your data annotation projects.

      Learn more

    Or, play with the Argilla UI by signing in with your Hugging Face account:

    Looking for Argilla 1.x?

    Looking for documentation for Argilla 1.x? Visit the latest release.

    Migrate to Argilla 2.x

    Want to learn how to migrate from Argilla 1.x to 2.x? Take a look at our dedicated Migration Guide.

    "},{"location":"#why-use-argilla","title":"Why use Argilla?","text":"

    Argilla can be used for collecting human feedback for a wide variety of AI projects like traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text to image, etc.).

    Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

    Improve your AI output quality through data quality

    Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both of these problems at once. Argilla helps you to achieve and keep high-quality standards for your data. This means you can improve the quality of your AI outputs.

    Take control of your data and models

    Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

    Improve efficiency by quickly iterating on the right data and models

    Gathering data is a time-consuming process. Argilla helps by providing a tool that allows you to interact with your data in a more engaging way. This means you can quickly and easily label your data with filters, AI feedback suggestions and semantic search. So you can focus on training your models and monitoring their performance.

    "},{"location":"#what-do-people-build-with-argilla","title":"What do people build with Argilla?","text":"

    Datasets and models

    Argilla is a tool that can be used to achieve and keep high-quality data standards with a focus on NLP and LLMs. The community uses Argilla to create amazing open-source datasets and models, and we love contributions to open-source too.

    • cleaned UltraFeedback dataset and the Notus and Notux models, where we improved benchmark and empirical human judgment for the Mistral and Mixtral models with cleaner data using human feedback.
    • distilabeled Intel Orca DPO dataset and the improved OpenHermes model, show how we improve model performance by filtering out 50% of the original dataset through human and AI feedback.

    Projects and pipelines

    AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in the AI community meetup.

    • AI for good: the Red Cross presentation showcases how their experts and AI team collaborate by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
    • Customer support: during the Loris meetup, they showed how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a large number of multi-label classifiers.
    • Research studies: the showcase from Prolific announced their integration with Argilla. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.
    "},{"location":"community/","title":"Community","text":"

    We are an open-source community-driven project not only focused on building a great product but also on building a great community, where you can get support, share your experiences, and contribute to the project! We would love to hear from you and help you get started with Argilla.

    • Discord

      In our Discord channels (#argilla-distilabel-general and #argilla-distilabel-help), you can get direct support from the community.

      Discord \u2197

    • Community Meetup

      We host bi-weekly community meetups where you can listen in or present your work.

      Community Meetup \u2197

    • Changelog

      The changelog is where you can find the latest updates and changes to the Argilla project.

      Changelog \u2197

    • Roadmap

      We love to discuss our plans with the community. Feel encouraged to participate in our roadmap discussions.

      Roadmap \u2197

    "},{"location":"community/changelog/","title":"Changelog","text":"

    All notable changes to this project will be documented in this file.

    The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

    "},{"location":"community/changelog/#201","title":"2.0.1","text":""},{"location":"community/changelog/#fixed","title":"Fixed","text":"
    • Fixed error when creating optional fields. (#5362)
    • Fixed error creating integer and float metadata with visible_for_annotators. (#5364)
    • Fixed error when logging records with suggestions or responses for non-existent questions. (#5396 by @maxserras)
    • Fixed error from conflicts in testing suite when running tests in parallel. (#5349)
    • Fixed error in response model when creating a response with a None value. (#5343)
    "},{"location":"community/changelog/#changed","title":"Changed","text":"
    • Changed from_hub method to raise an error when a dataset with the same name exists. (#5258)
    • Changed log method when ingesting records with no known keys to raise a descriptive error. (#5356)
    • Changed code snippets to add new datasets (#5395)
    "},{"location":"community/changelog/#added","title":"Added","text":"
    • Added Google Analytics to the documentation site. (#5366)
    • Added frontend skeletons to progress metrics to optimise load time and improve user experience. (#5391)
• Added documentation to methods in the API references for the Python SDK. (#5400)
    "},{"location":"community/changelog/#fixed_1","title":"Fixed","text":"
• Fixed a bug where submitting the last record sometimes navigated to a non-existent page. (#5419)
    "},{"location":"community/changelog/#200","title":"2.0.0","text":""},{"location":"community/changelog/#added_1","title":"Added","text":"
    • Added core class refactors. For an overview, see this blog post
• Added TaskDistribution to define the distribution of records to users.
    • Added new documentation site and structure and migrated legacy documentation.
    "},{"location":"community/changelog/#changed_1","title":"Changed","text":"
    • Changed FeedbackDataset to Dataset.
    • Changed rg.init into rg.Argilla class to interact with Argilla server.
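A minimal sketch of the client change described above; the URL and API key are placeholders:

```python
import argilla as rg

# Pre-2.0 (deprecated):
# rg.init(api_url="http://localhost:6900", api_key="owner.apikey")

# From 2.0 onwards, instantiate an explicit client object instead:
client = rg.Argilla(api_url="http://localhost:6900", api_key="owner.apikey")
```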
    "},{"location":"community/changelog/#deprecated","title":"Deprecated","text":"
    • Deprecated task specific dataset classes like TextClassification and TokenClassification. To migrate legacy datasets to rg.Dataset class, see the how-to-guide.
    • Deprecated use case extensions like listeners and ArgillaTrainer.
    "},{"location":"community/changelog/#200rc1","title":"2.0.0rc1","text":"

    [!NOTE] This release for 2.0.0rc1 does not contain any changelog entries because it is the first release candidate for the 2.0.0 version. The following versions will contain the changelog entries again. For a general overview of the changes in the 2.0.0 version, please refer to our blog or our new documentation.

    "},{"location":"community/changelog/#1290","title":"1.29.0","text":""},{"location":"community/changelog/#added_2","title":"Added","text":"
    • Added support for rating questions to include 0 as a valid value. (#4860)
    • Added support for Python 3.12. (#4837)
    • Added search by field in the FeedbackDataset UI search. (#4746)
    • Added record metadata info in the FeedbackDataset UI. (#4851)
    • Added highlight on search results in the FeedbackDataset UI. (#4747)
    "},{"location":"community/changelog/#fixed_2","title":"Fixed","text":"
• Fixed wildcard import for the whole argilla module. (#4874)
• Fixed issue when a record has no related vectors. (#4856)
• Fixed issue at the character level. (#4836)
    "},{"location":"community/changelog/#1280","title":"1.28.0","text":""},{"location":"community/changelog/#added_3","title":"Added","text":"
• Added suggestion multi-score attribute. (#4730)
• Added ordering by suggestion first. (#4731)
• Added multi-selection entity dropdown for span annotation overlap. (#4735)
• Added pre-selection highlight for span annotation. (#4726)
    • Added banner when persistent storage is not enabled. (#4744)
    • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)
    "},{"location":"community/changelog/#changed_2","title":"Changed","text":"
• Changed how the Hugging Face space and user are shown at sign-in. (#4748)
    "},{"location":"community/changelog/#fixed_3","title":"Fixed","text":"
• Fixed reversed Korean characters. (#4753)
    "},{"location":"community/changelog/#fixed_4","title":"Fixed","text":"
    • Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)
    "},{"location":"community/changelog/#1270","title":"1.27.0","text":""},{"location":"community/changelog/#added_4","title":"Added","text":"
• Added support for overlapping spans in the FeedbackDataset. (#4668)
    • Added allow_overlapping parameter for span questions. (#4697)
    • Added overall progress bar on Datasets table. (#4696)
    • Added German language translation. (#4688)
    "},{"location":"community/changelog/#changed_3","title":"Changed","text":"
    • New UI design for suggestions. (#4682)
    "},{"location":"community/changelog/#fixed_5","title":"Fixed","text":"
• Improved performance for datasets with more than 250 labels. (#4702)
    "},{"location":"community/changelog/#1261","title":"1.26.1","text":""},{"location":"community/changelog/#added_5","title":"Added","text":"
    • Added support for automatic detection of RTL languages. (#4686)
    "},{"location":"community/changelog/#1260","title":"1.26.0","text":""},{"location":"community/changelog/#added_6","title":"Added","text":"
• If you expand the labels of a single- or multi-label question, the state is now maintained during the entire annotation process. (#4630)
    • Added support for span questions in the Python SDK. (#4617)
    • Added support for span values in suggestions and responses. (#4623)
    • Added span questions for FeedbackDataset. (#4622)
    • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
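A minimal sketch of the span question support added in this release, assuming the 1.26 feedback schemas; the field, question, and label names are placeholders:

```python
import argilla as rg

# A span question annotated over the "text" field (names are placeholders).
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            field="text",
            labels=["PERSON", "ORG", "LOC"],
        )
    ],
)
```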
    "},{"location":"community/changelog/#fixed_6","title":"Fixed","text":"
    • Fixed contextualized workspaces. (#4665)
    • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
    • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
    • Fixed reading description from API response payload. (#4632)
    • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
    • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)
    "},{"location":"community/changelog/#1250","title":"1.25.0","text":"

    [!NOTE] For changes in the argilla-server module, visit the argilla-server release notes

    "},{"location":"community/changelog/#added_7","title":"Added","text":"
• Added the ability to reorder labels on the dataset settings page for single/multi-label questions (#4598)
• Added pandas v2 support in the Python SDK. (#4600)
    "},{"location":"community/changelog/#removed","title":"Removed","text":"
    • Removed missing response for status filter. Use pending instead. (#4533)
    "},{"location":"community/changelog/#fixed_7","title":"Fixed","text":"
    • Fixed FloatMetadataProperty: value is not a valid float (#4570)
    • Fixed redirect to user-settings instead of 404 user_settings (#4609)
    "},{"location":"community/changelog/#1240","title":"1.24.0","text":"

    [!NOTE] This release does not contain any new features, but it includes a major change in the argilla-server dependency. The package is using the argilla-server dependency defined here. (#4537)

    "},{"location":"community/changelog/#changed_4","title":"Changed","text":"
    • The package is using the argilla-server dependency defined here. (#4537)
    "},{"location":"community/changelog/#1231","title":"1.23.1","text":""},{"location":"community/changelog/#fixed_8","title":"Fixed","text":"
    • Fixed Responsive view for Feedback Datasets. (#4579)
    "},{"location":"community/changelog/#1230","title":"1.23.0","text":""},{"location":"community/changelog/#added_8","title":"Added","text":"
    • Added bulk annotation by filter criteria. (#4516)
    • Automatically fetch new datasets on focus tab. (#4514)
    • API v1 responses returning Record schema now always include dataset_id as attribute. (#4482)
    • API v1 responses returning Response schema now always include record_id as attribute. (#4482)
    • API v1 responses returning Question schema now always include dataset_id attribute. (#4487)
    • API v1 responses returning Field schema now always include dataset_id attribute. (#4488)
    • API v1 responses returning MetadataProperty schema now always include dataset_id attribute. (#4489)
    • API v1 responses returning VectorSettings schema now always include dataset_id attribute. (#4490)
• Added pdf_to_html function to the .html_utils module that converts PDFs to data URLs so they can be rendered in the Argilla UI. (#4481)
    • Added ARGILLA_AUTH_SECRET_KEY environment variable. (#4539)
    • Added ARGILLA_AUTH_ALGORITHM environment variable. (#4539)
    • Added ARGILLA_AUTH_TOKEN_EXPIRATION environment variable. (#4539)
    • Added ARGILLA_AUTH_OAUTH_CFG environment variable. (#4546)
    • Added OAuth2 support for HuggingFace Hub. (#4546)
    "},{"location":"community/changelog/#deprecated_1","title":"Deprecated","text":"
    • Deprecated ARGILLA_LOCAL_AUTH_* environment variables. Will be removed in the release v1.25.0. (#4539)
    "},{"location":"community/changelog/#changed_5","title":"Changed","text":"
    • Changed regex pattern for username attribute in UserCreate. Now uppercase letters are allowed. (#4544)
    "},{"location":"community/changelog/#removed_1","title":"Removed","text":"
• Removed sending the Authorization header from Python SDK requests. (#4535)
    "},{"location":"community/changelog/#fixed_9","title":"Fixed","text":"
    • Fixed keyboard shortcut for label questions. (#4530)
    "},{"location":"community/changelog/#1220","title":"1.22.0","text":""},{"location":"community/changelog/#added_9","title":"Added","text":"
    • Added Bulk annotation support. (#4333)
• Restored filters from feedback dataset settings. (#4461)
• Added a warning on feedback dataset settings when leaving the page with unsaved changes. (#4461)
    • Added pydantic v2 support using the python SDK. (#4459)
    • Added vector_settings to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4454)
    • Added integration for sentence-transformers using SentenceTransformersExtractor to configure vector_settings in FeedbackDataset and FeedbackRecord. (#4454)
    "},{"location":"community/changelog/#changed_6","title":"Changed","text":"
    • Module argilla.cli.server definitions have been moved to argilla.server.cli module. (#4472)
    • [breaking] Changed vector_settings_by_name for generic property_by_name usage, which will return None instead of raising an error. (#4454)
    • The constant definition ES_INDEX_REGEX_PATTERN in module argilla._constants is now private. (#4472)
    • nan values in metadata properties will raise a 422 error when creating/updating records. (#4300)
    • None values are now allowed in metadata properties. (#4300)
• Refactored to_html functions and added width, height, autoplay, and loop as optional arguments. (#4481)
    "},{"location":"community/changelog/#fixed_10","title":"Fixed","text":"
• Paginating to a new record automatically scrolls down to the selected form area. (#4333)
    "},{"location":"community/changelog/#deprecated_2","title":"Deprecated","text":"
    • The missing response status for filtering records is deprecated and will be removed in the release v1.24.0. Use pending instead. (#4433)
    "},{"location":"community/changelog/#removed_2","title":"Removed","text":"
    • The deprecated python -m argilla database command has been removed. (#4472)
    "},{"location":"community/changelog/#1210","title":"1.21.0","text":""},{"location":"community/changelog/#added_10","title":"Added","text":"
    • Added new draft queue for annotation view (#4334)
    • Added annotation metrics module for the FeedbackDataset (argilla.client.feedback.metrics). (#4175).
• Added a strategy to handle and translate server errors for the 401 HTTP status code (#4362)
    • Added integration for textdescriptives using TextDescriptivesExtractor to configure metadata_properties in FeedbackDataset and FeedbackRecord. (#4400). Contributed by @m-newhauser
    • Added POST /api/v1/me/responses/bulk endpoint to create responses in bulk for current user. (#4380)
    • Added list support for term metadata properties. (Closes #4359)
    • Added new CLI task to reindex datasets and records into the search engine. (#4404)
    • Added httpx_extra_kwargs argument to rg.init and Argilla to allow passing extra arguments to httpx.Client used by Argilla. (#4440)
    • Added ResponseStatusFilter enum in __init__ imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
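For example, a sketch of the new httpx_extra_kwargs argument from the entry above; the URL, key, and timeout value are placeholders:

```python
import argilla as rg

# Forward extra keyword arguments to the underlying httpx.Client,
# e.g. a longer timeout (values are placeholders).
rg.init(
    api_url="http://localhost:6900",
    api_key="owner.apikey",
    httpx_extra_kwargs={"timeout": 60.0},
)
```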
    "},{"location":"community/changelog/#changed_7","title":"Changed","text":"
    • More productive and simpler shortcut system (#4215)
• Moved ArgillaSingleton, init and active_client to a new singleton module. (#4347)
    • Updated argilla.load functions to also work with FeedbackDatasets. (#4347)
    • [breaking] Updated argilla.delete functions to also work with FeedbackDatasets. It now raises an error if the dataset does not exist. (#4347)
    • Updated argilla.list_datasets functions to also work with FeedbackDatasets. (#4347)
    "},{"location":"community/changelog/#fixed_11","title":"Fixed","text":"
    • Fixed error in TextClassificationSettings.from_dict method in which the label_schema created was a list of dict instead of a list of str. (#4347)
    • Fixed total records on pagination component (#4424)
    "},{"location":"community/changelog/#removed_3","title":"Removed","text":"
    • Removed draft auto save for annotation view (#4334)
    "},{"location":"community/changelog/#1200","title":"1.20.0","text":""},{"location":"community/changelog/#added_11","title":"Added","text":"
    • Added GET /api/v1/datasets/:dataset_id/records/search/suggestions/options endpoint to return suggestion available options for searching. (#4260)
    • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset.(#4192).
• Added get_model_kwargs, get_trainer_kwargs, get_trainer_model, get_trainer_tokenizer and get_trainer methods to the ArgillaTrainer to improve interoperability across frameworks. (#4214).
    • Added additional formatting checks to the ArgillaTrainer to allow for better interoperability of defaults and formatting_func usage. (#4214).
    • Added a warning to the update_config-method of ArgillaTrainer to emphasize if the kwargs were updated correctly. (#4214).
• Added argilla.client.feedback.utils module with html_utils and assignments helpers (see the sketch below). html_utils mainly includes video/audio/image_to_html, which convert media to data URLs so they can be rendered in the Argilla UI, and create_token_highlights to highlight tokens in a custom way; both work on TextQuestion and TextField with use_markdown=True. assignments mainly includes assign_records, to assign records across a number of annotators with an overlap and a shuffle option, and assign_workspace, to assign (and create if needed) a workspace according to the record assignment. (#4121)
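A sketch of the new utils helpers; the image path, usernames, and overlap are placeholders, and the exact argument types accepted by assign_records may differ:

```python
import argilla as rg
from argilla.client.feedback.utils import assign_records, image_to_html

# Render a local image inside a markdown-enabled TextField (path is a placeholder).
html_snippet = image_to_html("figures/example.png")
record = rg.FeedbackRecord(fields={"text": html_snippet})

# Split records across annotators with one record of overlap
# (argument names follow the entry above; user objects may be required).
assignments = assign_records(users=["alice", "bob"], records=[record], overlap=1, shuffle=True)
```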
    "},{"location":"community/changelog/#fixed_12","title":"Fixed","text":"
• Fixed error in ArgillaTrainer when using RatingQuestion instead of RankingQuestion with numerical labels (#4171)
• Fixed error in ArgillaTrainer: training for extractive_question_answering now works with a validation sample (#4204)
• Fixed error in ArgillaTrainer: training for sentence-similarity now works with a list of values per record (#4211)
    • Fixed error in the unification strategy for RankingQuestion (#4295)
    • Fixed TextClassificationSettings.labels_schema order was not being preserved. Closes #3828 (#4332)
    • Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
    • Fixed error when passing draft responses to create records endpoint. (#4354)
    "},{"location":"community/changelog/#changed_8","title":"Changed","text":"
• [breaking] The suggestions agent field now only accepts specific characters and a limited length. (#4265)
• [breaking] The suggestions score field now only accepts float values in the range 0 to 1. (#4266)
    • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support optional query attribute. (#4327)
    • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support filter and sort attributes. (#4327)
    • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support optional query attribute. (#4270)
    • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support filter and sort attributes. (#4270)
    • Changed the logging style while pulling and pushing FeedbackDataset to Argilla from tqdm style to rich. (#4267). Contributed by @zucchini-nlp.
    • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)
    • Changed models and tokenizer for the ArgillaTrainer to explicitly allow for changing them when needed. (#4214).
    "},{"location":"community/changelog/#1190","title":"1.19.0","text":""},{"location":"community/changelog/#added_12","title":"Added","text":"
    • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
    • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
    • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
    • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
    • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
    • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
    • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
    • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
    • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
• Added show_progress argument to the from_huggingface() method to make the progress bar for the record-parsing process optional. (#4132)
• Added a progress bar for the record-parsing process to the from_huggingface() method using trange from tqdm. (#4132)
• Added support for sorting by inserted_at or updated_at for datasets with no metadata. (#4147)
• Added max_records argument to the pull() method of RemoteFeedbackDataset. (#4074)
    • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
    • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
    • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
    • Added max_records argument to ArgillaTrainer to limit record used for training (#4120).
    • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
    • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
    • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
    • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
    • Added find_similar_records method to local and remote FeedbackDataset. (#4023)
    • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)
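A minimal sketch of the vector-related methods listed above, assuming the 1.19 Python SDK; the dataset, vector name, and dimensions are placeholders, and the exact find_similar_records signature may differ:

```python
import argilla as rg

ds = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")

# Register vector settings on the dataset (name and dimensions are placeholders).
ds.add_vector_settings(rg.VectorSettings(name="embedding", dimensions=384))

# Retrieve the records most similar to an existing record.
reference = next(iter(ds.records))
similar = ds.find_similar_records(
    vector_name="embedding",
    record=reference,
    max_results=5,
)
```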
    "},{"location":"community/changelog/#changed_9","title":"Changed","text":"
    • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
    • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
    • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
• [breaking] limit query parameter for GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values between 1 and 1000, inclusive. (#4143)
• [breaking] limit query parameter for GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values between 1 and 1000, inclusive. (#4143)
• Updated GET /api/v1/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
• Updated GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
• Updated POST /api/v1/datasets/:dataset_id/records endpoint to allow creating records with vectors (#4022)
• Updated PATCH /api/v1/datasets/:dataset_id endpoint to allow updating the allow_extra_metadata attribute. (#4112)
• Updated PATCH /api/v1/datasets/:dataset_id/records endpoint to allow updating records with vectors. (#4062)
• Updated PATCH /api/v1/records/:record_id endpoint to allow updating records with vectors. (#4062)
• Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow searching records with vectors. (#4019)
• Updated BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
• Updated FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
• Updated FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
• Updated FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
    • Using cosine similarity to compute similarity between vectors. (#4124)
    "},{"location":"community/changelog/#fixed_13","title":"Fixed","text":"
    • Fixed svg images out of screen with too large images (#4047)
    • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
• Fixed owners deleting or updating responses created by annotators. (Commit 403a66d)
    • Fixed passing user_id when getting records by id. (Commit 98c7927)
    • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
    "},{"location":"community/changelog/#1180","title":"1.18.0","text":""},{"location":"community/changelog/#added_13","title":"Added","text":"
    • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
    • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
    • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
    • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
    • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
    • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
    • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
• Added missing validations to PATCH /api/v1/questions/:question_id. Now title and description use the same validations used to create questions. (#3967)
    • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
    • Added metadata_filters to filter_by method in RemoteFeedbackDataset to filter based on metadata i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. (#3834)
    • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
• Added sort_by query parameter to record-listing endpoints, allowing records to be sorted by inserted_at, updated_at, or a metadata property. (#3843)
    • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
    • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
    • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
    • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
• Added support for updating records (metadata) from the Python SDK. (#3946)
    • Added delete_metadata_properties method to delete metadata properties. (#3932)
    • Added update_metadata_properties method to update metadata_properties. (#3961)
    • Added automatic model card generation through ArgillaTrainer.save (#3857)
    • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
• Added a maximum limit of 50 options that a ranking question can accept. (#3975)
• New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurred. (#3992)
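A minimal sketch of the metadata properties and filters introduced in the entries above, assuming the 1.18 Python SDK; all names and values are placeholders:

```python
import argilla as rg

# Declare metadata properties when building the dataset.
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="answer")],
    metadata_properties=[
        rg.TermsMetadataProperty(name="source", values=["news", "wiki"]),
        rg.IntegerMetadataProperty(name="tokens", min=0, max=512),
    ],
)

# Filter a remote dataset on a metadata property.
remote = rg.FeedbackDataset.from_argilla(name="my_dataset")
filtered = remote.filter_by(
    metadata_filters=rg.TermsMetadataFilter(name="source", values=["news"])
)
```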
    "},{"location":"community/changelog/#changed_10","title":"Changed","text":"
    • GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
    • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
• Increased the default max result window for Elasticsearch indexes created for Feedback datasets. (#3929)
• Forced Elasticsearch index refresh after record creation. (#3929)
    • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
    • Using metadata property name instead of id for indexing data in search engine index. (#3994)
    "},{"location":"community/changelog/#fixed_14","title":"Fixed","text":"
• Fixed response schemas to allow values to be None, i.e. when a record is discarded, response.values are set to None. (#3926)
    "},{"location":"community/changelog/#1170","title":"1.17.0","text":""},{"location":"community/changelog/#added_14","title":"Added","text":"
    • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
    • Added automatic model card generation through ArgillaTrainer.save (#3857).
    • Added task templates to the FeedbackDataset (#3973).
    "},{"location":"community/changelog/#changed_11","title":"Changed","text":"
    • Updated Dockerfile to use multi stage build (#3221 and #3793).
    • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
• Changed the Argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the Hugging Face Spaces (#3831).
• FeedbackDataset API methods have been aligned to be accessible across the several implementations (#3937).
• Added unify_responses support for remote datasets (#3937).
    "},{"location":"community/changelog/#fixed_15","title":"Fixed","text":"
• Fixed fields not being shown in the order defined in the dataset settings. Closes #3959 (#3984)
    • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
• Fixed record fields validation that was preventing logging records with optional fields (i.e. required=False) when the field value was None (#3846).
    • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
• The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945)
    • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
    • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
    • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
    • Fixed wrong __repr__ problem for TrainingTask. (#3969)
    • Fixed wrong key return error prepare_for_training_with_* for TrainingTask. (#3969)
    "},{"location":"community/changelog/#deprecated_3","title":"Deprecated","text":"
    • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0
    "},{"location":"community/changelog/#1160","title":"1.16.0","text":""},{"location":"community/changelog/#added_15","title":"Added","text":"
    • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
    • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
• Added auto-save to automatically save the current record that you are working on (#3541)
    • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
    • Added workspaces list command to list Argilla workspaces (#3594).
    • Added datasets list command to list Argilla datasets (#3658).
    • Added users create command to create users (#3667).
    • Added whoami command to get current user (#3673).
    • Added users delete command to delete users (#3671).
    • Added users list command to list users (#3688).
    • Added workspaces delete-user command to remove a user from a workspace (#3699).
    • Added workspaces create command to create an Argilla workspace (#3676).
    • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
    • Added info command to get info about the used Argilla client and server (#3707).
    • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
    • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
• Added handling of PermissionError when executing a command as a logged-in user without enough permissions (#3717).
    • Added workspaces add-user command to add a user to workspace (#3712).
    • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
    • Added workspace_id arg to list_datasets in the Python SDK (#3727).
• Added argilla script that allows executing the Argilla CLI using the argilla command (#3730).
    • Added support for passing already initialized model and tokenizer instances to the ArgillaTrainer (#3751)
    • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).
    "},{"location":"community/changelog/#changed_12","title":"Changed","text":"
• Moved database commands under the server group of commands (#3710)
• server commands are only included in the CLI app when the server extra requirements are installed (#3710).
    • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
• Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
    • Rename tasks sub-package to cli (#3723).
    • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
• Changed visible_options validation (of label and multi-label selection questions) in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
    "},{"location":"community/changelog/#fixed_16","title":"Fixed","text":"
• Fixed user modifications being removed in the text component when clearing answers (#3775)
• Fixed highlighting of the raw text field in the dataset feedback task (#3731)
• Fixed field titles that were too long (#3734)
• Fixed error messages when deleting a DatasetForTextClassification (#3652)
• Fixed pending queue pagination problems during data annotation (#3677)
• Fixed visible_labels default value to be 20 only when visible_labels is not provided and len(labels) > 20; otherwise it will be either the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
    • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
• Added missing draft status in ResponseSchema, as there can now be responses with draft status when annotating via the UI (#3749).
• Fixed searches when queried words are distributed across the record fields (#3759).
    • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
    • Fixed RankingValueSchema and FeedbackRankingValueModel schemas to allow rank=None when status=draft (#3781).
    "},{"location":"community/changelog/#1151","title":"1.15.1","text":""},{"location":"community/changelog/#fixed_17","title":"Fixed","text":"
• Fixed Text component content sanitization for markdown so that text no longer disappears (#3738)
• Fixed Text component so that you now need to press Escape to exit the text area (#3733)
• Fixed SearchEngine creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).
    "},{"location":"community/changelog/#1150","title":"1.15.0","text":""},{"location":"community/changelog/#added_16","title":"Added","text":"
• Added the ability to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
    • Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
• Added formatting_func to ArgillaTrainer to apply custom formatting to FeedbackDataset data (#3599).
    • Added login function in argilla.client.login to login into an Argilla server and store the credentials locally (#3582).
    • Added login command to login into an Argilla server (#3600).
    • Added logout command to logout from an Argilla server (#3605).
    • Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
    • Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
    • Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
    • Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
    • Added filter_by method in RemoteFeedbackDataset to filter based on response_status (#3610).
• Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces of a user in Argilla (#3641).
    • Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
• Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
    • Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).
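A sketch combining the login helper and the new filter_by method from the entries above; the URL, key, and dataset name are placeholders:

```python
import argilla as rg
from argilla.client.login import login

# Persist credentials locally (values are placeholders).
login(api_url="http://localhost:6900", api_key="owner.apikey")

# List datasets and filter a remote one by response status.
datasets = rg.FeedbackDataset.list()
ds = rg.FeedbackDataset.from_argilla(name="my_dataset")
submitted_only = ds.filter_by(response_status="submitted")
```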
    "},{"location":"community/changelog/#changed_13","title":"Changed","text":"
• Changed the Optional label to a * mark for required questions (#3608)
    • Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
    • Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
    • Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
• Moved the users CLI app under the database CLI app (#3593).
• Moved server Enum classes to the argilla.server.enums module (#3620).
    "},{"location":"community/changelog/#fixed_18","title":"Fixed","text":"
• Fixed filtering by workspace in breadcrumbs (#3577)
• Fixed filtering by workspace in the datasets table (#3604)
• Fixed query search highlighting for Text2Text and TextClassification (#3621)
    • Fixed RatingQuestion.values validation to raise a ValidationError when values are out of range i.e. [1, 10] (#3626).
    "},{"location":"community/changelog/#removed_4","title":"Removed","text":"
• Removed multi_task_text_token_classification from TaskType as it was not used (#3640).
    • Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
    • Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
    • Removed push_to_argilla from RemoteFeedbackDataset, as it just works when calling it through a FeedbackDataset locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
    • Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any \"updateable\" attribute of a record will go through update instead (#3663).
• Removed unused owner attribute from the client Dataset data model (#3665)
    "},{"location":"community/changelog/#1141","title":"1.14.1","text":""},{"location":"community/changelog/#fixed_19","title":"Fixed","text":"
    • Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
    "},{"location":"community/changelog/#fixed_20","title":"Fixed","text":"
    • Fixed settings could not be provided when updating a rating or ranking question (#3552).
    "},{"location":"community/changelog/#1140","title":"1.14.0","text":""},{"location":"community/changelog/#added_17","title":"Added","text":"
    • Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
    • Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
    • Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
    • Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
• Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all its records and return them as a local FeedbackDataset copy (see the sketch below) (#3465).
    • Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
    • Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
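A minimal sketch of the pull and delete_records methods from the entries above, assuming the 1.14 Python SDK; the dataset name is a placeholder and the exact way records are accessed may differ slightly:

```python
import argilla as rg

remote = rg.FeedbackDataset.from_argilla(name="my_dataset")

# Pull the remote records into a local FeedbackDataset copy.
local_copy = remote.pull()

# Delete a batch of records from Argilla (here, the first ten).
first_ten = [record for _, record in zip(range(10), remote.records)]
remote.delete_records(first_ten)
```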
    "},{"location":"community/changelog/#changed_14","title":"Changed","text":"
    • Improved efficiency of weak labeling when dataset contains vectors (#3444).
    • Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
    • Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
    • Update CLI to use database async connection (#3450).
    • Limit rating questions values to the positive range [1, 10] (#3451).
• Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
• Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
    • Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
    • Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
    • Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
    • After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
    • Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via \ud83e\udd17datasets (#3539).
    "},{"location":"community/changelog/#fixed_21","title":"Fixed","text":"
    • Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
    • Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
    • Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
    • TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
    • Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
    • Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
    • Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
    "},{"location":"community/changelog/#deprecated_4","title":"Deprecated","text":"
    • After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
    • After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
    • After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
    "},{"location":"community/changelog/#1133","title":"1.13.3","text":""},{"location":"community/changelog/#fixed_22","title":"Fixed","text":"
    • Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
    • Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).
    "},{"location":"community/changelog/#1132","title":"1.13.2","text":""},{"location":"community/changelog/#fixed_23","title":"Fixed","text":"
    • The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).
    "},{"location":"community/changelog/#1131","title":"1.13.1","text":""},{"location":"community/changelog/#fixed_24","title":"Fixed","text":"
    • Fix database migration for PostgreSQL (See #3438)
    "},{"location":"community/changelog/#1130","title":"1.13.0","text":""},{"location":"community/changelog/#added_18","title":"Added","text":"
    • Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
    • Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
    • Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
    • Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
    • Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
• Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
    • Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
    • Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
    • Added API and Python Client support for workspace deletion (Closes #3260)
    • Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
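A minimal sketch of attaching a suggestion to a FeedbackRecord, per the entry above; the question name, value, and agent are placeholders and assume a matching question exists in the dataset:

```python
import argilla as rg

record = rg.FeedbackRecord(
    fields={"text": "Argilla makes data annotation collaborative."},
    suggestions=[
        {
            "question_name": "sentiment",  # must match a question in the dataset
            "value": "positive",
            "agent": "my-model-v1",
        }
    ],
)
```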
    "},{"location":"community/changelog/#changed_15","title":"Changed","text":"
    • Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
    • Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
• The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
    • Added Telemetry support for ArgillaTrainer (closes #3325)
    • User.workspaces is no longer an attribute but a property, and is calling list_user_workspaces to list all the workspace names for a given user ID (#3334)
    • Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
• The protected metadata fields now support non-textual info; existing datasets must be reindexed. See the docs for more detail (Closes #3332).
    • Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
    • Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
    "},{"location":"community/changelog/#removed_5","title":"Removed","text":"
    • Removed support to non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
    "},{"location":"community/changelog/#fixed_25","title":"Fixed","text":"
    • Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint returning always the responses for the records even if responses was not provided via the include query parameter (#3304).
    • Values for protected metadata fields are not truncated (Closes #3331).
    • Big number ids are properly rendered in UI (Closes #3265)
    • Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)
    "},{"location":"community/changelog/#deprecated_5","title":"Deprecated","text":"
• Deprecated integer support for record id in text classification, token classification and text2text datasets.
    "},{"location":"community/changelog/#1121","title":"1.12.1","text":""},{"location":"community/changelog/#fixed_26","title":"Fixed","text":"
    • Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
    • Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
• Pinned pydantic dependency to version < 2 (Closes #3348)
    "},{"location":"community/changelog/#1120","title":"1.12.0","text":""},{"location":"community/changelog/#added_19","title":"Added","text":"
    • Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
    • Added RankingQuestion in the Python client to create ranking questions (#3275).
    • Added Ranking component in feedback task question form (#3177 & #3246).
• Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
    • Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
    "},{"location":"community/changelog/#docs","title":"Docs","text":"
    • Added instructions for how to run the Argilla frontend in the developer docs (#3314).
    "},{"location":"community/changelog/#changed_16","title":"Changed","text":"
    • All docker related files have been moved into the docker folder (#3053).
• release.Dockerfile has been renamed to Dockerfile (#3133).
• Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
    • Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
    "},{"location":"community/changelog/#fixed_27","title":"Fixed","text":"
    • Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
    "},{"location":"community/changelog/#1110","title":"1.11.0","text":""},{"location":"community/changelog/#fixed_28","title":"Fixed","text":"
    • Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
• Fixed format_as(\"datasets\") when there are no responses or optional responses in FeedbackRecord, to set their value to what \ud83e\udd17 Datasets expects instead of just None (#3224).
    • Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
    • Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
    • Refactored usage of import argilla as rg to clarify package navigation (#3279).
    "},{"location":"community/changelog/#docs_1","title":"Docs","text":"
• Fixed URLs in the Weak Supervision with Sentence Transformers tutorial (#3243).
    • Fixed library buttons' formatting on Tutorials page (#3255).
    • Modified styling of error code outputs in notebooks (#3270).
    • Added ElasticSearch and OpenSearch versions (#3280).
    • Removed template notebook from table of contents (#3271).
    • Fixed tutorials with pip install argilla to not use older versions of the package (#3282).
    "},{"location":"community/changelog/#added_20","title":"Added","text":"
    • Added metadata attribute to the Record of the FeedbackDataset (#3194)
    • New users update command to update the role for an existing user (#3188)
• New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
    • Added User class to let users manage their Argilla users via the Python client (#3169).
    • Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
    "},{"location":"community/changelog/#changed_17","title":"Changed","text":"
• The role system now supports three different roles: owner, admin and annotator (#3104)
    • admin role is scoped to workspace-level operations (#3115)
    • The owner user is created among the default pool of users in the quickstart, and the default user in the server has now owner role (#3248), reverting (#3188).
    "},{"location":"community/changelog/#deprecated_6","title":"Deprecated","text":"
    • As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
    "},{"location":"community/changelog/#1100","title":"1.10.0","text":""},{"location":"community/changelog/#added_21","title":"Added","text":"
    • Added search component for feedback datasets (#3138)
    • Added markdown support for feedback dataset guidelines (#3153)
    • Added Train button for feedback datasets (#3170)
    "},{"location":"community/changelog/#changed_18","title":"Changed","text":"
    • Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
    "},{"location":"community/changelog/#fixed_29","title":"Fixed","text":"
    • Replaced Enum for string value in URLs for client API calls (Closes #3149)
• Resolved a breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
    • Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
    • Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
    "},{"location":"community/changelog/#docs_2","title":"Docs","text":"
    • Resolved typos in the docs (#3240).
    • Fixed mention of master branch (#3254).
    "},{"location":"community/changelog/#190","title":"1.9.0","text":""},{"location":"community/changelog/#added_22","title":"Added","text":"
    • Added boolean use_markdown property to TextFieldSettings model.
    • Added boolean use_markdown property to TextQuestionSettings model.
    • Added new status draft for the Response model.
• Added LabelSelectionQuestionSettings class allowing the creation of label selection (single-choice) questions in the API (#3005)
• Added MultiLabelSelectionQuestionSettings class allowing the creation of multi-label selection (multi-choice) questions in the API (#3010).
    • Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
    • Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
    • Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
• Added information about executing tests to the developer documentation (#3143).
    "},{"location":"community/changelog/#changed_19","title":"Changed","text":"
    • Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
    • Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
    • Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
    • Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)
    "},{"location":"community/changelog/#fixed_30","title":"Fixed","text":"
    • Disallow fields and questions in FeedbackDataset with the same name (#3126).
• Fixed broken links in the documentation and updated the development branch name from development to develop (#3145).
    "},{"location":"community/changelog/#180","title":"1.8.0","text":""},{"location":"community/changelog/#added_23","title":"Added","text":"
    • /api/v1/datasets new endpoint to list and create datasets (#2615).
    • /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
    • /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
    • /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
    • /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
    • /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
    • /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
    • /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
    • /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
    • /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
    • /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
    • /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
    • /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
    • /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
    • showing new feedback task datasets in datasets list ([#2719])
    • new page for feedback task ([#2680])
    • show feedback task metrics ([#2822])
    • user can delete dataset in dataset settings page ([#2792])
    • Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
    • Integration with the HuggingFace Hub ([#2949])
    • Added ArgillaPeftTrainer for text and token classification #2854
    • Added predict_proba() method to ArgillaSetFitTrainer
    • Added ArgillaAutoTrainTrainer for Text Classification #2664
    • New database revisions command showing database revisions info
    "},{"location":"community/changelog/#fixes","title":"Fixes","text":"
    • Avoid rendering html for invalid html strings in Text2text (#2911, https://github.com/argilla-io/argilla/issues/2911)
    "},{"location":"community/changelog/#changed_20","title":"Changed","text":"
    • The database migrate command accepts a --revision param to provide specific revision id
    • tokens_length metrics function returns empty data (#3045)
    • token_length metrics function returns empty data (#3045)
    • mention_length metrics function returns empty data (#3045)
    • entity_density metrics function returns empty data (#3045)
    "},{"location":"community/changelog/#deprecated_7","title":"Deprecated","text":"
    • Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
    • tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    • entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
    "},{"location":"community/changelog/#removed_6","title":"Removed","text":"
    • Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
    • Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
    • Removed tags-related metrics from token classification metrics storage (#3045)
    "},{"location":"community/changelog/#170","title":"1.7.0","text":""},{"location":"community/changelog/#added_24","title":"Added","text":"
    • add max_retries and num_threads parameters to rg.log to run data logging requests concurrently with a backoff retry policy. See #2458 and #2533
    • rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
    • Added settings param to prepare_for_training (#2689)
    • Added prepare_for_training for openai (#2658)
    • Added ArgillaOpenAITrainer (#2659)
    • Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
    • Added ArgillaTrainer CLI support. Closes (#2809)
    "},{"location":"community/changelog/#fixes_1","title":"Fixes","text":"
    • fix image alignment on token classification
    "},{"location":"community/changelog/#changed_21","title":"Changed","text":"
    • Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
    • bulk endpoints will upsert data when record id is present. Closes #2535
    • moved from click to typer CLI support. Closes (#2815)
    • Argilla server docker image is built with PostgreSQL support. Closes #2686
    • The rg.log computes all batches and raises an error for all failed batches.
    • The default batch size for rg.log is now 100.
    "},{"location":"community/changelog/#fixed_31","title":"Fixed","text":"
    • argilla.training bugfixes and unification (#2665)
    • Resolved several small bugs in the ArgillaTrainer.
    "},{"location":"community/changelog/#deprecated_8","title":"Deprecated","text":"
    • The rg.log_async function is deprecated and will be removed in the next minor release.
    "},{"location":"community/changelog/#160","title":"1.6.0","text":""},{"location":"community/changelog/#added_25","title":"Added","text":"
    • ARGILLA_HOME_PATH new environment variable (#2564).
    • ARGILLA_DATABASE_URL new environment variable (#2564).
    • Basic support for user roles with admin and annotator (#2564).
    • id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
    • /api/users new endpoint to list and create users (#2564).
    • /api/users/{user_id} new endpoint to delete users (#2564).
    • /api/workspaces new endpoint to list and create workspaces (#2564).
    • /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
    • /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
    • argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
    • argilla.tasks.users.create new task to create a user (#2564).
    • argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
    • argilla.tasks.database.migrate new task to execute database migrations (#2564).
    • release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
    • Add user settings page. Closes #2496
    • Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504
    "},{"location":"community/changelog/#fixes_2","title":"Fixes","text":"
    • Now the prepare_for_training method is working when multi_label=True. Closes #2606
    "},{"location":"community/changelog/#changed_22","title":"Changed","text":"
    • ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from the YAML file to the database (#2564).
    • full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
    • password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
    • quickstart.Dockerfile image default users changed from team and argilla to admin and annotator, including new passwords and API keys (#2564).
    • Datasets to be managed only by users with admin role (#2564).
    • The list of rules is now accessible while metrics are computed. Closes #2117
    • Style updates for weak labeling and adding feedback toast when deleting rules. See #2626 and #2648
    "},{"location":"community/changelog/#removed_7","title":"Removed","text":"
    • email user field (#2564).
    • disabled user field (#2564).
    • Support for private workspaces (#2564).
    • ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
    • The old headers for API Key and workspace from python client
    • The default value for old API Key constant. Closes #2251
    "},{"location":"community/changelog/#151-2023-03-30","title":"1.5.1 - 2023-03-30","text":""},{"location":"community/changelog/#fixes_3","title":"Fixes","text":"
    • Copying datasets between workspaces with proper owner/workspace info. Closes #2562
    • Copy dataset with empty workspace to the default user workspace 905d4de
    • Using elasticsearch config to request backend version. Closes #2311
    • Remove sorting by score in labels. Closes #2622
    "},{"location":"community/changelog/#changed_23","title":"Changed","text":"
    • Update field name in metadata for image url. See #2609
    • Improvements in tutorial doc cards. Closes #2216
    "},{"location":"community/changelog/#150-2023-03-21","title":"1.5.0 - 2023-03-21","text":""},{"location":"community/changelog/#added_26","title":"Added","text":"
    • Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
    • Add new page and components for dataset settings. Closes #2442
    • Add ability to show image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url
    • Non-searchable fields support in metadata. #2570
    • Add record ID references to the prepare for training methods. Closes #2483
    • Add tutorial on Image Classification. #2420
    • Add Train button, visible for \"admin\" role, with code snippets from a selection of libraries. Closes #2591 (https://github.com/argilla-io/argilla/pull/2591)
    "},{"location":"community/changelog/#changed_24","title":"Changed","text":"
    • Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see https://github.com/argilla-io/argilla/issues/2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
    • The shortcuts improvement for labels #2339 has been moved to the vuex ORM in the dataset settings feature #2444
    • Update \"Define a labeling schema\" section in docs.
    • The record inputs are sorted alphabetically in UI by default. #2581
    • The record inputs are fully visible when pagination size is one, and the height of the collapsed area is bigger for laptop screens. #2587
    "},{"location":"community/changelog/#fixes_4","title":"Fixes","text":"
    • Allow URL to be clickable in Jupyter notebook again. Closes #2527
    "},{"location":"community/changelog/#removed_8","title":"Removed","text":"
    • Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client <v1.3.0
    • Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version <1.3.0
    • Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.
    "},{"location":"community/contributor/","title":"How to contribute?","text":"

    Thank you for investing your time in contributing to the project! Any contribution you make will be reflected in the most recent version of Argilla 🤩.

    New to contributing in general?

    If you're a new contributor, read the README to get an overview of the project. In addition, here are some resources to help you get started with open-source contributions:

    • Discord: You are welcome to join the Argilla Discord community, where you can keep in touch with other users, contributors and the Argilla team. In the following section, you can find more information on how to get started in Discord.
    • Git: This is a very useful tool to keep track of the changes in your files. Using the command-line interface (CLI), you can make your contributions easily. For that, you need to have it installed and updated on your computer.
    • GitHub: It is a platform and cloud-based service that uses git and allows developers to collaborate on projects. To contribute to Argilla, you'll need to create an account. Check the Contributor Workflow with Git and Github for more info.
    • Developer Documentation: To collaborate, you'll need to set up an efficient environment. Check the Server and Frontend READMEs to know how to do it.
    • Schedule a meeting with our developer advocate: If you have more questions, do not hesitate to contact our developer advocate and schedule a meeting.
    "},{"location":"community/contributor/#first-contact-in-discord","title":"First Contact in Discord","text":"

    Discord is a handy tool for more casual conversations and to answer day-to-day questions. As part of Hugging Face, we have set up some Argilla channels on the server. Click here to join the Hugging Face Discord community effortlessly.

    When part of the Hugging Face Discord, you can select \"Channels & roles\" and select \"Argilla\" along with any of the other groups that are interesting to you. \"Argilla\" will cover anything about argilla and distilabel. You can join the following channels:

    • #argilla-distilabel-general: 📣 Stay up-to-date and general discussions.
    • #argilla-distilabel-help: 🙋‍♀️ Need assistance? We're always here to help. Select the appropriate label (argilla or distilabel) for your issue and post it.

    So now there is only one thing left to do: introduce yourself and talk to the community. You'll always be welcome! 🤗👋

    "},{"location":"community/contributor/#contributor-workflow-with-git-and-github","title":"Contributor Workflow with Git and GitHub","text":"

    If you're working with Argilla and suddenly a new idea comes to your mind or you find an issue that can be improved, it's time to actively participate and contribute to the project!

    "},{"location":"community/contributor/#report-an-issue","title":"Report an issue","text":"

    If you spot a problem, search if an issue already exists. You can use the Label filter. If that is the case, participate in the conversation. If it does not exist, create an issue by clicking on New Issue.

    This will show various templates, choose the one that best suits your issue.

    Below, you can see an example of the Feature request template. Once you choose one, you will need to fill it in following the guidelines. Try to be as clear as possible. In addition, you can assign yourself to the issue and add or choose the right labels. Finally, click on Submit new issue.

    "},{"location":"community/contributor/#work-with-a-fork","title":"Work with a fork","text":""},{"location":"community/contributor/#fork-the-argilla-repository","title":"Fork the Argilla repository","text":"

    After having reported the issue, you can start working on it. For that, you will need to create a fork of the project. To do that, click on the Fork button.

    Now, fill in the information. Remember to uncheck the Copy develop branch only option if you are going to work in or from another branch (for instance, the main branch is used to fix documentation). Then, click on Create fork.

    Now, you will be redirected to your fork. You can see that you are in your fork because the name of the repository will be your username/argilla, and it will indicate forked from argilla-io/argilla.

    "},{"location":"community/contributor/#clone-your-forked-repository","title":"Clone your forked repository","text":"

    In order to make the required adjustments, clone the forked repository to your local machine. Choose the destination folder and run the following command:

    git clone https://github.com/[your-github-username]/argilla.git
    cd argilla

    To keep your fork's main/develop branch up to date with our repo, add it as an upstream remote branch.

    git remote add upstream https://github.com/argilla-io/argilla.git
    "},{"location":"community/contributor/#create-a-new-branch","title":"Create a new branch","text":"

    For each issue you're addressing, it's advisable to create a new branch. GitHub offers a straightforward method to streamline this process.

    ⚠️ Never work directly on the main or develop branch. Always create a new branch for your changes.

    Navigate to your issue and on the right column, select Create a branch.

    After the new window pops up, the branch will be named after the issue; include a prefix such as feature/, bug/, or docs/ to facilitate quick recognition of the issue type. In the Repository destination, pick your fork ([your-github-username]/argilla), and then select Change branch source to specify the source branch for creating the new one. Complete the process by clicking Create branch.

    🤔 Remember that the main branch is only used to work with the documentation. For any other changes, use the develop branch.

    Now, locally change to the new branch you just created.

    git fetch origin
    git checkout [branch-name]
    "},{"location":"community/contributor/#use-changelogmd","title":"Use CHANGELOG.md","text":"

    If you are working on a new feature, it is a good practice to make note of it for others to keep up with the changes. For that, we utilize the CHANGELOG.md file in the root directory. This file is used to list changes made in each version of the project and there are headers that we use to denote each type of change.

    • Added: for new features.
    • Changed: for changes in existing functionality.
    • Deprecated: for soon-to-be removed features.
    • Removed: for now removed features.
    • Fixed: for any bug fixes.
    • Security: in case of vulnerabilities.

    A sample addition would be:

    - Fixed the key errors for the `init` method ([#NUMBER_OF_PR](LINK_TO_PR)). Contributed by @github_handle.

    You can have a look at the CHANGELOG.md file to see more cases and examples.

    "},{"location":"community/contributor/#make-changes-and-push-them","title":"Make changes and push them","text":"

    Make the changes you want in your local repository, and test that everything works and you are following the guidelines.

    Once you have finished, you can check the status of your repository and synchronize it with the upstream repo with the following commands:

    # Check the status of your repository
    git status

    # Synchronize with the upstream repo
    git checkout [branch-name]
    git rebase [default-branch]

    If everything is right, we need to commit and push the changes to your fork. For that, run the following commands:

    # Add the changes to the staging area
    git add filename

    # Commit the changes by writing a proper message
    git commit -m "commit-message"

    # Push the changes to your fork
    git push origin [branch-name]

    When pushing, you will be asked to enter your GitHub login credentials. Once the push is complete, all local commits will be on your GitHub repository.

    "},{"location":"community/contributor/#create-a-pull-request","title":"Create a pull request","text":"

    Come back to GitHub, navigate to the original repository where you created your fork, and click on Compare & pull request.

    First, click on compare across forks and select the right repositories and branches.

    In the base repository, remember to select either main or develop based on the modifications made. In the head repository, indicate your forked repository and the branch corresponding to the issue.

    Then, fill in the pull request template. You should add a prefix to the PR name as we did with the branch above. If you are working on a new feature, you can name your PR as feat: TITLE. If your PR consists of a solution for a bug, you can name your PR as bug: TITLE. And, if your work is for improving the documentation, you can name your PR as docs: TITLE.

    In addition, on the right side, you can select a reviewer (for instance, if you discussed the issue with a member of the Argilla team) and assign the pull request to yourself. It is highly advisable to add labels to the PR as well. You can do this in the labels section on the right side of the screen. For instance, if you are addressing a bug, add the bug label, or if the PR is related to the documentation, add the documentation label. This way, PRs can be easily filtered.

    Finally, fill in the template carefully and follow the guidelines. Remember to link the original issue and enable the checkbox to allow maintainer edits so the branch can be updated for a merge. Then, click on Create pull request.

    "},{"location":"community/contributor/#review-your-pull-request","title":"Review your pull request","text":"

    Once you submit your PR, a team member will review your proposal. We may ask questions, request additional information or ask for changes to be made before a PR can be merged, either using suggested changes or pull request comments.

    You can apply the changes directly through the UI (check the files changed and click on the right-corner three dots, see image below) or from your fork, and then commit them to your branch. The PR will be updated automatically and the suggestions will appear as outdated.

    If you run into any merge issues, check out this git tutorial to help you resolve merge conflicts and other issues.

    "},{"location":"community/contributor/#your-pr-is-merged","title":"Your PR is merged!","text":"

    Congratulations 🎉🎊 We thank you 🤩

    Once your PR is merged, your contributions will be publicly visible on the Argilla GitHub.

    Additionally, we will include your changes in the next release based on our development branch.

    "},{"location":"community/contributor/#additional-resources","title":"Additional resources","text":"

    Here are some helpful resources for your reference.

    • Configuring Discord, a guide to learn how to get started with Discord.
    • Pro Git, a book to learn Git.
    • Git in VSCode, a guide to learn how to easily use Git in VSCode.
    • GitHub Skills, an interactive course to learn GitHub.
    "},{"location":"community/popular_issues/","title":"Issue dashboard","text":"Most engaging open issuesLatest issues open by the communityPlanned issues for upcoming releases Rank Issue Reactions Comments 1 4637 - [FEATURE] Label breakdown in Feedback dataset stats \ud83d\udc4d 6 \ud83d\udcac 3 2 1607 - Support for hierarchical multilabel text classification (taxonomy) \ud83d\udc4d 4 \ud83d\udcac 15 3 4658 - Active listeners for Feedback Dataset \ud83d\udc4d 4 \ud83d\udcac 5 4 4867 - [FEATURE] Add dependencies to support dev mode on HF Spaces \ud83d\udc4d 3 \ud83d\udcac 5 5 4964 - [DOCS] Improvements to docker-compose getting started installation guide \ud83d\udc4d 3 \ud83d\udcac 4 6 3338 - [FEATURE] Add conversation support to fields in Argilla dataset (ChatField) \ud83d\udc4d 2 \ud83d\udcac 11 7 1800 - Add comments/notes to annotation datasets to share with teammates. \ud83d\udc4d 2 \ud83d\udcac 6 8 1837 - Custom Record UI Templates \ud83d\udc4d 2 \ud83d\udcac 6 9 1630 - Accepting several predictions/annotations for the same record \ud83d\udc4d 2 \ud83d\udcac 2 10 4823 - [FEATURE] ImageField \ud83d\udc4d 2 \ud83d\udcac 1 Rank Issue Author 1 \ud83d\udfe2 5442 - [BUG-python/deployment] by nicolamassarenti 2 \ud83d\udfe2 5438 - [FEATURE] Make text box size of TextQuestion adjustable by MoritzLaurer 3 \ud83d\udfe3 5424 - [BUG-python/deployment]The status of all the dataset.records.to_dict(orient='index') records are pending by Huarong 4 \ud83d\udfe2 5414 - docker download failed by njhouse365 5 \ud83d\udfe3 5357 - [BUG-python/deployment] Response sanity check not working due to variable renaming by maxserras 6 \ud83d\udfe2 5348 - [FEATURE] Ability to create new labels on-the-fly by uahmad235 7 \ud83d\udfe2 5338 - [BUG-UI/UX] CSS is being stripped from TextQuestion by paulbauriegel 8 \ud83d\udfe2 5318 - [BUG-python/deployment] filter_by returning unexpected results for response_status by bertozzivill 9 \ud83d\udfe2 5302 - [FEATURE]Auto-annotation of Repeated Tokens by bikash119 10 \ud83d\udfe3 5290 - [BUG-python/deployment] Docker deployment issues by zhongze-fish Rank Issue Milestone 1 \ud83d\udfe2 5204 - [FEATURE] add huggingface_hub.utils.send_telemetry to the argilla-server v2.1.0 2 \ud83d\udfe2 4952 - [DOCS] Add contributing pages v2.1.0 3 \ud83d\udfe2 4951 - [DOCS] Add tutorials pages to new documentation v2.1.0 4 \ud83d\udfe2 4950 - [DOCS] Add integrations to new documentation v2.1.0 5 \ud83d\udfe2 4949 - FEAT Implement external integrations v2.1.0 6 \ud83d\udfe2 4823 - [FEATURE] ImageField v2.1.0 7 \ud83d\udfe2 4944 - [REFACTOR] Simplify naming of serialize, to_dict, and to_json methods v2.1.0 8 \ud83d\udfe2 5361 - [BUG-UI/UX] required/optional differentiation forFields are not represented in the dataset settings v2.1.0 9 \ud83d\udfe2 3338 - [FEATURE] Add conversation support to fields in Argilla dataset (ChatField) v2.1.0 10 \ud83d\udfe2 5371 - [UI/UX] Implement dark theme v2.1.0

    Last update: 2024-08-30

    "},{"location":"getting_started/faq/","title":"FAQs","text":"What is Argilla?

    Argilla is a collaboration tool for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency. It is designed to help you achieve and keep high-quality data standards, store your training data, store the results of your models, evaluate their performance, and improve the data through human and AI feedback.

    Does Argilla cost money?

    No. Argilla is an open-source project and is free to use. You can deploy Argilla on your own infrastructure or use our cloud offering.

    What data types does Argilla support?

    Text data, mostly. Argilla natively supports textual data; however, we do support rich text, which means you can represent different types of data in Argilla as long as you can convert it to text. For example, you can store images, audio, video, and any other type of data as long as you can convert it to its base64 representation or render it as HTML in, for example, an IFrame.
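    For instance, here is a minimal sketch of embedding a local image as base64-encoded HTML, which a markdown-enabled field could render (the file name and field key are hypothetical):

    import base64

    # Read a local image (hypothetical file name) and encode it as base64.
    with open("photo.png", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")

    # Wrap it in an HTML tag so a field with use_markdown=True can render it.
    image_html = f'<img src="data:image/png;base64,{encoded}" />'
    record = {"image": image_html}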

    Does Argilla train models?

    No. Argilla is a collaboration tool to achieve and keep high-quality data standards. You can use Argilla to store your training data, store the results of your models, evaluate their performance and improve the data. For training models, you can use any machine learning framework or library that you prefer, even though we recommend starting with Hugging Face Transformers.

    Does Argilla provide annotation workforces?

    Yes, kind of. We don't provide an annotation workforce in-house, but we do have partnerships with workforce providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting here or contact us via email.

    How does Argilla differ from competitors like Lilac, Snorkel, Prodigy and Scale?

    Argilla distinguishes itself through its focus on specific use cases and human-in-the-loop approaches. While it does offer programmatic features, Argilla's core value lies in actively involving human experts in the tool-building process, setting it apart from other competitors.

    Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within the realms of MLOps and NLP. So, its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

    Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a tool within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing tool from the outset.

    What is the difference between Argilla 2.0 and the legacy datasets in 1.0?

    Argilla 1.0 relied on 3 main task datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These tasks were designed to be simple, easy to use and high in functionality but they were limited in adaptability. With the introduction of Large Language Models (LLMs) and the increasing complexity of NLP tasks, we realized that we needed to expand the capabilities of Argilla to support more advanced feedback mechanisms which led to the introduction of the FeedbackDataset. Compared to its predecessor it was high in adaptability but still limited in functionality. After having ported all of the functionality of the legacy tasks to the new FeedbackDataset, we decided to deprecate the legacy tasks in favor of a brand new SDK with the FeedbackDataset at its core.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/","title":"Hugging Face Spaces Settings","text":"

    This section details how to configure and deploy Argilla on Hugging Face Spaces. It covers:

    • Persistent storage
    • How to deploy Argilla under a Hugging Face Organization
    • How to configure and disable HF OAuth access
    • How to use Private Spaces

    Looking to get started easily?

    If you just discovered Argilla and want to get started quickly, go to the Quickstart guide.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#persistent-storage","title":"Persistent storage","text":"

    In the Space creation UI, persistent storage is set to Small PAID, which is a paid service, charged per hour of usage.

    Spaces get restarted due to maintenance, inactivity, and every time you change your Spaces settings. Persistent storage enables Argilla to save your datasets and configurations to disk across restarts.

    Ephemeral FREE persistent storage

    Not setting persistent storage to Small means that you will lose your data when the Space restarts.

    If you plan to use the Argilla Space beyond testing, it's highly recommended to set persistent storage to Small.

    If you just want to quickly test or use Argilla for a few hours with the risk of losing your datasets, choose Ephemeral FREE. Ephemeral FREE means your datasets and configuration will not be saved to disk; when the Space is restarted, your datasets, workspaces, and users will be lost.

    If you want to disable the persistent storage warning, you can set the environment variable ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING=false.

    Read this if you have datasets and want to enable persistent storage

    If you want to enable persistent storage Small PAID and you have created datasets, users, or workspaces, follow this process:

    • First, make a local or remote copy of your datasets, following the Import and Export guide. This is the most important step, because changing the settings of your Space leads to a restart and thus a data loss.
    • If you have created users (not signed in with Hugging Face login), consider storing a copy of users following the manage users guide.
    • Once you have stored all your data safely, go to your Space Settings Tab and select Small.
    • Your Space will be restarted and existing data will be lost. From now on, all the new data you create in Argilla will be kept safely.
    • Recover your data by following the above-mentioned guides.
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-configure-and-disable-oauth-access","title":"How to configure and disable OAuth access","text":"

    By default, Argilla Spaces are configured with Hugging Face OAuth, in the following way:

    • Any Hugging Face user that can see your Space can use the Sign in button, join as an annotator, and contribute to the datasets available under the argilla workspace. This workspace is created during the deployment process.
    • These users can only explore and annotate datasets in the argilla workspace but can't perform any critical operation like create, delete, update, or configure datasets. By default, any other workspace you create won't be visible to these users.

    To restrict access or change the default behaviour, there are two options:

    Set your Space to private. This is especially useful if your Space is under an organization. This will only allow members within your organization to see and join your Argilla Space. It can also be used for personal, solo projects.

    Modify the .oauth.yml configuration file. You can find and modify this file under the Files tab of your Space. The default file looks like this:

    # Change to `false` to disable HF oauth integration
    #enabled: false

    providers:
      - name: huggingface

    # Allowed workspaces must exist
    allowed_workspaces:
      - name: argilla
    You can modify two things:

    • Uncomment enabled: false to completely disable the Sign in with Hugging Face. If you disable it, make sure to set the USERNAME and PASSWORD Space secrets to be able to log in as an owner.
    • Change the list of allowed workspaces.

    For example, if you want to let users join a new workspace community-initiative:

    allowed_workspaces:
      - name: argilla
      - name: community-initiative
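    As the comment in the file says, allowed workspaces must already exist. A minimal sketch of creating that workspace with the Python SDK, assuming placeholders for the API URL and key:

    import argilla as rg

    client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

    # Create the workspace referenced in allowed_workspaces.
    workspace = rg.Workspace(name="community-initiative", client=client)
    workspace.create()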
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-deploy-argilla-under-a-hugging-face-organization","title":"How to deploy Argilla under a Hugging Face Organization","text":"

    Creating an Argilla Space within an organization is useful for several scenarios:

    • You want to only enable members of your organization to join your Space. You can achieve this by setting your Space to private.
    • You want to manage the Space together with other users (e.g., Space settings, etc.). Note that if you just want to manage your Argilla datasets and workspaces, you can achieve this by adding other users with the owner role to your Argilla Server.
    • More generally, you want to make your Space available under an organization/community umbrella.

    The steps are very similar to the Quickstart guide, with two important differences:

    Setup USERNAME

    You need to set up the USERNAME Space Secret with your Hugging Face username. This way, the first time you enter with the Hugging Face Sign in button, you'll be granted the owner role.

    Enable Persistent Storage SMALL

    Not setting persistent storage to Small means that you will lose your data when the Space restarts.

    For Argilla Spaces with many users, it's strongly recommended to set persistent storage to Small.

    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#how-to-use-private-spaces","title":"How to use Private Spaces","text":"

    Setting your Space visibility to private can be useful if:

    • You want to work on your personal, solo project.
    • You want your Argilla to be available only to members of the organization where you deploy the Argilla Space.

    You can set the visibility of the Space during the Space creation process or afterwards under the Settings Tab.

    To use the Python SDK with private Spaces, you need to specify your HF_TOKEN, which can be found here, when creating the client:

    import argilla as rg

    HF_TOKEN = "..."

    client = rg.Argilla(
        api_url="<api_url>",
        api_key="<api_key>",
        headers={"Authorization": f"Bearer {HF_TOKEN}"}
    )
    "},{"location":"getting_started/how-to-configure-argilla-on-huggingface/#space-secrets-overview","title":"Space Secrets overview","text":"

    There are two optional secrets to set up the USERNAME and PASSWORD of the owner of the Argilla Space. Remember that, by default, Argilla Spaces are configured with a Sign in with Hugging Face button, which is also used to grant the owner role to the creator of personal Spaces.

    The USERNAME and PASSWORD are only useful in a couple of scenarios:

    • You have disabled Hugging Face OAuth.
    • You want to set up Argilla under an organization and want your Hugging Face username to be granted the owner role.

    In summary, when setting up a Space:

    Creating a Space under your personal account

    If you are creating the Space under your personal account, don't insert any value for USERNAME and PASSWORD. Once you launch the Space you will be able to Sign in with your Hugging Face username and the owner role.

    Creating a Space under an organization

    If you are creating the Space under an organization make sure to insert your Hugging Face username in the secret USERNAME. In this way, you'll be able to Sign in with your Hugging Face user.

    "},{"location":"getting_started/how-to-deploy-argilla-with-docker/","title":"Deploy with Docker","text":"

    This guide describes how to deploy the Argilla Server with docker compose. This is useful if you want to deploy Argilla locally and/or have full control over the configuration of the server, database, and search engine (Elasticsearch).

    First, you need to install docker on your machine and make sure you can run docker compose.

    Then, create a folder (you can modify the folder name):

    mkdir argilla && cd argilla

    Download docker-compose.yaml:

    wget -O docker-compose.yaml https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml

    or using curl:

    curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml

    Run the following to deploy the server on http://localhost:6900:

    docker compose up -d

    Once it is completed, open http://localhost:6900 in your browser and you should see the Argilla login page.

    If it's not available, check the logs:

    docker compose logs -f

    Most deployment issues are related to Elasticsearch. Join the Hugging Face Discord server and ask for support on the Argilla channel.
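    If you prefer to verify the deployment from code rather than the browser, here is a small sketch with the Python SDK (replace the API key placeholder with your own credentials):

    import argilla as rg

    # Point the client at the local deployment started with docker compose.
    client = rg.Argilla(api_url="http://localhost:6900", api_key="<api_key>")

    # If the server is reachable and the key is valid, this returns your user.
    print(client.me)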

    "},{"location":"getting_started/quickstart/","title":"Quickstart","text":"

    Argilla is a free, open-source, self-hosted tool. This means you need to deploy its UI to start using it. There are two main ways to deploy Argilla:

    Deploy on the Hugging Face Hub

    The recommended choice to get started. You can get up and running in under 5 minutes and don't need to maintain a server or run any commands.

    If you're just getting started with Argilla, click the deploy button below:

    You can use the default values following these steps:

    • Leave the default Space owner (your personal account)
    • Leave USERNAME and PASSWORD secrets empty since you'll sign in with your HF user as the Argilla Space owner.
    • Click create Space to launch Argilla 🚀.
    • Once you see the Argilla UI, go to the Sign in into the Argilla UI section. If you see the Building message for longer than 2-3 minutes, refresh the page.

    Persistent storage SMALL

    Not setting persistent storage to SMALL means that you will lose your data when the Space restarts. Spaces get restarted due to maintenance, inactivity, and every time you change your Spaces settings. If you want to use the Space just for testing, you can use FREE temporarily.

    If you want to deploy Argilla within a Hugging Face organization, set up a more stable Space, or understand the settings, check out the HF Spaces settings guide.

    Deploy with Docker

    If you want to run Argilla locally on your machine or a server, or tune the server configuration, choose this option. To use this option, check this guide.

    "},{"location":"getting_started/quickstart/#sign-in-into-the-argilla-ui","title":"Sign in into the Argilla UI","text":"

    If everything went well, you should see the Argilla sign in page that looks like this:

    Building errors

    If you get a build error, sometimes restarting the Space from the Settings page works, otherwise check the HF Spaces settings guide.

    In the sign in page:

    1. Click on Sign in with Hugging Face
    2. Authorize the application and you will be logged in to Argilla as an owner.

    Unauthorized error

    Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button solves the issue.

    Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.

    "},{"location":"getting_started/quickstart/#install-the-python-sdk","title":"Install the Python SDK","text":"

    To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:

    pip install argilla
    "},{"location":"getting_started/quickstart/#create-your-first-dataset","title":"Create your first dataset","text":"

    For getting started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab.

    To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

    • The <api_key> is in the My Settings page of your Argilla Space.

    • The <api_url> is the URL shown in your browser if it ends with *.hf.space.

    import argilla as rg

    client = rg.Argilla(
        api_url="<api_url>",
        api_key="<api_key>"
    )

    You can't find your API URL

    If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options:
    1. Click on the three points menu at the top of the Space, select "Embed this Space", and open the direct URL.
    2. Use this pattern: https://[your-owner-name]-[your_space_name].hf.space.

    To create a dataset with a simple text classification task, first, you need to define the dataset settings.

    settings = rg.Settings(
        guidelines="Classify the reviews as positive or negative.",
        fields=[
            rg.TextField(
                name="review",
                title="Text from the review",
                use_markdown=False,
            ),
        ],
        questions=[
            rg.LabelQuestion(
                name="my_label",
                title="In which category does this article fit?",
                labels=["positive", "negative"],
            )
        ],
    )

    Now you can create the dataset with these settings. Publish the dataset to make it available in the UI and add the records.

    About workspaces

    Workspaces in Argilla group datasets and user access rights. The workspace parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace argilla.

    By default, this workspace will be visible to users joining with the Sign in with Hugging Face button. You can create other workspaces and decide to grant access to users either with the SDK or by changing the OAuth configuration.

    dataset = rg.Dataset(
        name="my_first_dataset",
        settings=settings,
        client=client,
        #workspace="argilla"
    )
    dataset.create()

    Now you can add records to your dataset. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The mapping parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.

    from datasets import load_dataset

    data = load_dataset("imdb", split="train[:100]").to_list()

    dataset.records.log(records=data, mapping={"text": "review"})

    🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.

    "},{"location":"getting_started/quickstart/#next-steps","title":"Next steps","text":"
    • To learn how to create your datasets, workspace, and manage users, check the how-to guides.

    • To learn Argilla with hands-on examples, check the Tutorials section.

    • To further configure your Argilla Space, check the Hugging Face Spaces settings guide.

    "},{"location":"how_to_guides/","title":"How-to guides","text":"

    These guides provide step-by-step instructions for common scenarios, including detailed explanations and code samples. They are divided into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.

    "},{"location":"how_to_guides/#basic","title":"Basic","text":"
    • Manage users and credentials

      Learn what they are and how to manage (create, read and delete) Users in Argilla.

      How-to guide

    • Manage workspaces

      Learn what they are and how to manage (create, read and delete) Workspaces in Argilla.

      How-to guide

    • Create, update, and delete datasets

      Learn what they are and how to manage (create, read and delete) Datasets and customize them using the Settings for Fields, Questions, Metadata and Vectors.

      How-to guide

    • Add, update, and delete records

      Learn what they are and how to add, update and delete the values for a Record, which are made up of Metadata, Vectors, Suggestions and Responses.

      How-to guide

    • Distribute the annotation

      Learn how to use Argilla's automatic task distribution to annotate as a team efficiently.

      How-to guide

    • Annotate a dataset

      Learn how to use the Argilla UI to navigate datasets and submit responses.

      How-to guide

    • Query and filter a dataset

      Learn how to query and filter a Dataset.

      How-to guide

    • Import and export datasets and records

    Learn how to export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

      How-to guide

    "},{"location":"how_to_guides/#advanced","title":"Advanced","text":"
    • Use Markdown to format rich content

      Learn how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

      How-to guide

    • Migrate to Argilla V2

      Learn how to migrate your legacy datasets from Argilla 1.x to 2.x.

      How-to guide

    "},{"location":"how_to_guides/annotate/","title":"Annotate your dataset","text":"

    To experience the UI features firsthand, you can take a look at the Demo ↗.

    Argilla UI offers many functions to help you manage your annotation workflow, aiming to provide the most flexible approach to fit the wide variety of use cases handled by the community.

    "},{"location":"how_to_guides/annotate/#annotation-interface-overview","title":"Annotation interface overview","text":""},{"location":"how_to_guides/annotate/#flexible-layout","title":"Flexible layout","text":"

    The UI is responsive with two columns for larger devices and one column for smaller devices. This enables you to annotate data using your mobile phone for simple datasets (i.e., not very long text and 1-2 questions) or resize your screen to get a more compact UI.

    • Header: At the right side of the navigation breadcrumb, you can customize the dataset settings and edit your profile.

    • Left pane: This area displays the control panel on the top. The control panel is used for performing keyword-based search, applying filters, and sorting the results. Below the control panel, the record card(s) are displayed one by one (Focus view) or in a vertical list (Bulk view).

    • Right pane: This is where you annotate your dataset. Simply fill it out as a form, then choose to Submit, Save as Draft, or Discard.

    • Left bottom panel: This expandable area displays the annotation guidelines. The annotation guidelines can be edited by owner and admin roles in the dataset settings.

    • Right bottom panel: This expandable area displays your annotation progress.

    "},{"location":"how_to_guides/annotate/#shortcuts","title":"Shortcuts","text":"

    The Argilla UI includes a range of shortcuts. For the main actions (submit, discard, save as draft and selecting labels), the keys are shown on the corresponding button.

    To learn how to move from one question to another or between records using the keyboard, take a look at the table below.

    Shortcuts provide a smoother annotation experience, especially with datasets using a single question (Label, MultiLabel, Rating, or Ranking).

    Available shortcuts:

    • Activate form: ⇥ Tab
    • Move between questions: ↓ Down arrow or ↑ Up arrow
    • Select and unselect label: 1, 2, 3
    • Move between labels or ranking options: ⇥ Tab or ⇧ Shift ⇥ Tab
    • Select rating and rank: 1, 2, 3
    • Fit span to character selection: Hold ⇧ Shift
    • Activate text area: ⇧ Shift ↵ Enter
    • Exit text area: Esc
    • Discard: ⌫ Backspace
    • Save draft (macOS): ⌘ Cmd S
    • Save draft (Other): Ctrl S
    • Submit: ↵ Enter
    • Move between pages: → Right arrow or ← Left arrow

    "},{"location":"how_to_guides/annotate/#view-by-status","title":"View by status","text":"

    The view selector is set by default on Pending.

    If you are starting an annotation effort, all the records are initially kept in the Pending view. Once you start annotating, the records will move to the other queues: Draft, Submitted, Discarded.

    • Pending: The records without a response.
    • Draft: The records with partial responses. They can be submitted or discarded later. You can't move them back to the pending queue.
    • Discarded: The records may or may not have responses. They can be edited but you can't move them back to the pending queue.
    • Submitted: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue.

    Note

    If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed.

    Tip

    If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response.

    "},{"location":"how_to_guides/annotate/#suggestions","title":"Suggestions","text":"

    If your dataset includes model predictions, you will see them represented by a sparkle icon ✨ in the label or value button. We call them "Suggestions" and they appear in the form as pre-filled responses. If confidence scores have been included by the dataset admin, they will be shown alongside the label. Additionally, admins can choose to always show suggested labels at the beginning of the list. This can be configured from the dataset settings.

    If you agree with the suggestions, you just need to click on the Submit button, and they will be considered as your response. If the suggestion is incorrect, you can modify it and submit your final response.

    "},{"location":"how_to_guides/annotate/#focus-view","title":"Focus view","text":"

    This is the default view to annotate your dataset linearly, displaying one record after another.

    Tip

    You should use this view if you have a large number of required questions or need a strong focus on the record content to be labelled. This is also the recommended view for annotating a dataset sample to avoid potential biases introduced by using filters, search, sorting and bulk labelling.

    Once you submit your first response, the next record will appear automatically. To see your submitted response again, just click on Prev.

    Navigating through the records

    To navigate through the records, you can use the Prev (<) and Next (>) buttons on top of the record card.

    Each time the page is fully refreshed, the records with modified statuses (Pending to Discarded, Pending to Save as Draft, Pending to Submitted) are sent to the corresponding queue. The control panel displays the status selector, which is set to Pending by default.

    "},{"location":"how_to_guides/annotate/#bulk-view","title":"Bulk view","text":"

    The bulk view is designed to speed up the annotation and get a quick overview of the whole dataset.

    The bulk view displays the records in a vertical list. Once this view is active, some functions from the control panel will activate to optimize the view. You can define the number of records to display per page (10, 25, 50, or 100) and whether records are shown with a fixed height (Collapse records) or their natural height (Expand records).

    Tip

    You should use this to quickly explore a dataset. This view is also recommended if you have a good understanding of the domain and want to apply your knowledge based on things like similarity and keyword search, filters, and suggestion score thresholds. For datasets with a large number of required questions or very long fields, the focus view would be more suitable.

    With multiple questions, consider using the bulk view to annotate one question at scale. Then, you can complete the annotation per record from the draft queue.

    Note

    Please note that suggestions are not shown in bulk view (except for Spans) and that you will need to save as a draft when you are not providing responses to all required questions.

    "},{"location":"how_to_guides/annotate/#annotation-progress","title":"Annotation progress","text":"

    You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed.

    You can also track your own progress in real time by expanding the right-bottom panel inside the dataset page. There you can see the number of records for which you have Pending, Draft, Submitted and Discarded responses.

    "},{"location":"how_to_guides/annotate/#use-search-filters-and-sort","title":"Use search, filters, and sort","text":"

    The UI offers various features designed for data exploration and understanding. Combining these features with bulk labelling can save you and your team hours of time.

    Tip

    You should use this when you are familiar with your data and have large volumes to annotate based on verified beliefs and experience.

    "},{"location":"how_to_guides/annotate/#search","title":"Search","text":"

    From the control panel at the top of the left pane, you can search by keyword across the entire dataset. If you have more than one field in your records, you may specify if the search is to be performed on "All" fields or on a specific one. Matched results are highlighted in color.

    Note

    If you introduce more than one keyword, the search will return results where all keywords have a match.

    Tip

    For more advanced searches, take a look at the advanced queries DSL.
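    On the SDK side, the equivalent of a keyword search might look like this sketch (the dataset name is hypothetical, and rg.Query is described in the query guide):

    import argilla as rg

    client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")
    dataset = client.datasets(name="my_dataset")

    # Iterate over the records that match a keyword query.
    for record in dataset.records(query=rg.Query(query="positive review")):
        print(record.fields)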

    "},{"location":"how_to_guides/annotate/#order-by-record-semantic-similarity","title":"Order by record semantic similarity","text":"

    You can retrieve records based on their similarity to another record if vectors have been added to the dataset.

    Note

    Check these guides to know how to add vectors to your dataset and records.

    To use the search by semantic similarity function, click on Find similar within the record you wish to use as a reference. If multiple vectors are available, select the desired vector. You can also choose whether to retrieve the most or least similar records.

    The retrieved records are then ordered by similarity, with the similarity score displayed on each record card.

    While the semantic search is active, you can update the selected vector or adjust the order of similarity, and specify the number of desired results.

    To cancel the search, click on the cross icon next to the reference record.

    "},{"location":"how_to_guides/annotate/#filter-and-sort-by-metadata-responses-and-suggestions","title":"Filter and sort by metadata, responses, and suggestions","text":""},{"location":"how_to_guides/annotate/#filter","title":"Filter","text":"

    If the dataset contains metadata, responses and suggestions, click on Filter in the control panel to display the available filters. You can select multiple filters and combine them.

    Note

    Record info including metadata is visible from the ellipsis menu in the record card.

    From the Metadata dropdown, type and select the property. You can set a range for integer and float properties, and select specific values for term metadata.

    Note

    Note that if a metadata property was set to visible_for_annotators=False, this metadata property will only appear in the metadata filter for users with the admin or owner role.

    From the Responses dropdown, type and select the question. You can set a range for rating questions and select specific values for label, multi-label, and span questions.

    Note

    The text and ranking questions are not available for filtering.

    From the Suggestions dropdown, filter the suggestions by Suggestion values, Score, or Agent.

    "},{"location":"how_to_guides/annotate/#sort","title":"Sort","text":"

    You can sort your records according to one or several attributes.

    The insertion time and last update are available for all records.

    Suggestion scores, response and suggestion values for rating questions, and metadata properties are available for sorting only when they have been provided.

    "},{"location":"how_to_guides/dataset/","title":"Dataset management","text":"

    This guide provides an overview of datasets, explaining the basics of how to set them up and manage them in Argilla.

    A dataset is a collection of records that you can configure for labelers to provide feedback using the UI. Depending on the specific requirements of your task, you may need various types of feedback. You can customize the dataset to include different kinds of questions, so the first step will be to define the aim of your project and the kind of data and feedback you will need. With this information, you can start configuring a dataset by defining fields, questions, metadata, vectors, and guidelines through settings.

    Question: Who can manage datasets?

    Only users with the owner role can manage (create, retrieve, update and delete) all the datasets.

    The users with the admin role can manage (create, retrieve, update and delete) the datasets in the workspaces they have access to.

    Main Classes

    rg.Datasetrg.Settings
    rg.Dataset(\n    name=\"name\",\n    workspace=\"workspace\",\n    settings=settings,\n    client=client\n)\n

    Check the Dataset - Python Reference to see the attributes, arguments, and methods of the Dataset class in detail.

    rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        )\n    ],\n    metadata=[rg.TermsMetadataProperty(name=\"metadata\")],\n    vectors=[rg.VectorField(name=\"vector\", dimensions=10)],\n    guidelines=\"guidelines\",\n    allow_extra_metadata=True,\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n

    Check the Settings - Python Reference to see the attributes, arguments, and methods of the Settings class in detail.

    "},{"location":"how_to_guides/dataset/#create-a-dataset","title":"Create a dataset","text":"

    To create a dataset, define it using the Dataset class and then call the create method, which sends the dataset to the server so that it can be visualized in the UI. If the dataset does not appear in the UI, you may need to click the refresh button to update the view. For further configuration of the dataset, you can refer to the settings section.

    Info

    If you have deployed Argilla with Hugging Face Spaces and HF Sign in, you can use argilla as a workspace name. Otherwise, you might need to create a workspace following this guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    settings=settings,\n)\n\ndataset.create()\n

    The created dataset will be empty. To add records, go to this how-to guide.

    Accessing attributes

    Access the attributes of a dataset by calling them directly on the dataset object. For example, dataset.id, dataset.name or dataset.settings. You can similarly access the fields, questions, metadata, vectors and guidelines. For instance, dataset.fields or dataset.questions.
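    For example, a minimal sketch (assuming a client is already instantiated and a dataset named my_dataset exists):

    dataset = client.datasets(\"my_dataset\")\n\nprint(dataset.id)\nprint(dataset.name)\nprint(dataset.settings)\n\n# Settings components can be accessed directly\nfor field in dataset.fields:\n    print(field.name)\n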

    "},{"location":"how_to_guides/dataset/#create-multiple-datasets-with-the-same-settings","title":"Create multiple datasets with the same settings","text":"

    To create multiple datasets with the same settings, define the settings once and pass them to each dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nsettings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[rg.TextField(name=\"text\", use_markdown=True)],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=[\"label_1\", \"label_2\", \"label_3\"])\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3),\n)\n\ndataset1 = rg.Dataset(name=\"my_dataset_1\", settings=settings)\ndataset2 = rg.Dataset(name=\"my_dataset_2\", settings=settings)\n\n# Create the datasets on the server\ndataset1.create()\ndataset2.create()\n
    "},{"location":"how_to_guides/dataset/#create-a-dataset-from-an-existing-dataset","title":"Create a dataset from an existing dataset","text":"

    To create a new dataset from an existing dataset, get the settings from the existing dataset and pass them to the new dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nexisting_dataset = client.datasets(\"my_dataset\")\n\nnew_dataset = rg.Dataset(name=\"my_dataset_copy\", settings=existing_dataset.settings)\n\nnew_dataset.create()\n

    Info

    You can also copy the records from the original dataset to the new one:

    records = list(existing_dataset.records)\nnew_dataset.records.log(records)\n
    "},{"location":"how_to_guides/dataset/#define-dataset-settings","title":"Define dataset settings","text":""},{"location":"how_to_guides/dataset/#fields","title":"Fields","text":"

    The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla only supports plain text and markdown through the TextField, though we plan to introduce additional field types in future updates.

    Note

    The order of the fields in the UI follows the order in which these are added to the fields attribute in the Python SDK.

    Check the Field - Python Reference to see the field classes in detail.

    rg.TextField(\n    name=\"text\",\n    title=\"Text\",\n    use_markdown=False,\n    required=True,\n    description=\"Field description\",\n)\n

    "},{"location":"how_to_guides/dataset/#questions","title":"Questions","text":"

    To collect feedback for your dataset, you need to formulate questions that annotators will be asked to answer.

    Check the Questions - Python Reference to see the question classes in detail.

    LabelMulti-labelRankingRatingSpanText

    A LabelQuestion asks annotators to choose a unique label from a list of options. This type is useful for text classification tasks. In the UI, they will have a rounded shape.

    rg.LabelQuestion(\n    name=\"label\",\n    labels={\"YES\": \"Yes\", \"NO\": \"No\"}, # or [\"YES\", \"NO\"]\n    title=\"Is the response relevant for the given prompt?\",\n    description=\"Select the one that applies.\",\n    required=True,\n    visible_labels=10\n)\n

    A MultiLabelQuestion asks annotators to choose all applicable labels from a list of options. This type is useful for multi-label text classification tasks. In the UI, they will have a squared shape.

    rg.MultiLabelQuestion(\n    name=\"multi_label\",\n    labels={\n        \"hate\": \"Hate Speech\",\n        \"sexual\": \"Sexual content\",\n        \"violent\": \"Violent content\",\n        \"pii\": \"Personal information\",\n        \"untruthful\": \"Untruthful info\",\n        \"not_english\": \"Not English\",\n        \"inappropriate\": \"Inappropriate content\"\n    }, # or [\"hate\", \"sexual\", \"violent\", \"pii\", \"untruthful\", \"not_english\", \"inappropriate\"]\n    title=\"Does the response include any of the following?\",\n    description=\"Select all that apply.\",\n    required=True,\n    visible_labels=10,\n    labels_order=\"natural\"\n)\n

    A RankingQuestion asks annotators to order a list of options. It is useful to gather information on the preference or relevance of a set of options.

    rg.RankingQuestion(\n    name=\"ranking\",\n    values={\n        \"reply-1\": \"Reply 1\",\n        \"reply-2\": \"Reply 2\",\n        \"reply-3\": \"Reply 3\"\n    }, # or [\"reply-1\", \"reply-2\", \"reply-3\"]\n    title=\"Order replies based on your preference\",\n    description=\"1 = best, 3 = worst. Ties are allowed.\",\n    required=True,\n)\n

    A RatingQuestion asks annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.

    rg.RatingQuestion(\n    name=\"rating\",\n    values=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n    title=\"How satisfied are you with the response?\",\n    description=\"1 = very unsatisfied, 10 = very satisfied\",\n    required=True,\n)\n

    A SpanQuestion asks annotators to select a portion of the text of a specific field and apply a label to it. This type of question is useful for named entity recognition or information extraction tasks.

    rg.SpanQuestion(\n    name=\"span\",\n    field=\"text\",\n    labels={\n        \"PERSON\": \"Person\",\n        \"ORG\": \"Organization\",\n        \"LOC\": \"Location\",\n        \"MISC\": \"Miscellaneous\"\n    }, # or [\"PERSON\", \"ORG\", \"LOC\", \"MISC\"]\n    title=\"Select the entities in the text\",\n    description=\"Select the entities in the text\",\n    required=True,\n    allow_overlapping=False,\n    visible_labels=10\n)\n

    A TextQuestion offers annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

    rg.TextQuestion(\n    name=\"text\",\n    title=\"Please provide feedback on the response\",\n    description=\"Please provide feedback on the response\",\n    required=True,\n    use_markdown=True\n)\n

    "},{"location":"how_to_guides/dataset/#metadata","title":"Metadata","text":"

    Metadata properties allow you to configure the use of metadata information for the filtering and sorting features available in the UI and Python SDK.

    Check the Metadata - Python Reference to see the metadata classes in detail.

    TermsIntegerFloat

    A TermsMetadataProperty allows you to add a list of strings as metadata options.

    rg.TermsMetadataProperty(\n    name=\"terms\",\n    options=[\"group-a\", \"group-b\", \"group-c\"],\n    title=\"Annotation groups\",\n    visible_for_annotators=True,\n)\n

    An IntegerMetadataProperty allows you to add integer values as metadata.

    rg.IntegerMetadataProperty(\n    name=\"integer\",\n    title=\"length-input\",\n    min=42,\n    max=1984,\n)\n

    A FloatMetadataProperty allows you to add float values as metadata.

    rg.FloatMetadataProperty(\n    name=\"float\",\n    title=\"Reading ease\",\n    min=-92.29914,\n    max=119.6975,\n)\n

    Note

    You can also set the allow_extra_metadata argument of the dataset to True to allow records to include metadata fields other than those specified under metadata. Note that these extra fields will not be visible in the UI for any user, only retrievable using the Python SDK.
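    For example, a minimal sketch of settings that allow extra metadata (the field, question, and metadata names here are illustrative):

    settings = rg.Settings(\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"answer\")],\n    metadata=[rg.TermsMetadataProperty(name=\"group\")],\n    # Records may include metadata keys beyond \"group\"; these extra keys\n    # are only retrievable via the Python SDK, not visible in the UI\n    allow_extra_metadata=True,\n)\n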

    "},{"location":"how_to_guides/dataset/#vectors","title":"Vectors","text":"

    To use the similarity search in the UI and the Python SDK, you will need to configure vectors using the VectorField class.

    Check the Vector - Python Reference to see the VectorField class in detail.

    rg.VectorField(\n    name=\"my_vector\",\n    title=\"My Vector\",\n    dimensions=768\n)\n

    "},{"location":"how_to_guides/dataset/#guidelines","title":"Guidelines","text":"

    Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

    • In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK. They will appear in the annotation interface.

    guidelines = \"In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. [...]\"\n

    • As question descriptions: these are added as an argument when you create questions in the Python SDK. This text will appear in a tooltip next to the question in the UI.

    It is good practice to use at least the dataset guidelines if not both methods. Question descriptions should be short and provide context to a specific question. They can be a summary of the guidelines to that question, but often that is not sufficient to align the whole annotation team. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc.
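    As a sketch (the guideline and description texts are illustrative), both methods are set through the dataset settings:

    settings = rg.Settings(\n    # Shown in the annotation interface\n    guidelines=\"Classify the response. Discard records that are not in English. [...]\",\n    fields=[rg.TextField(name=\"text\")],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\"],\n            # Shown as a tooltip next to the question in the UI\n            description=\"Select the one that applies.\",\n        ),\n    ],\n)\n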

    Tip

    If you want further guidance on good practices for guidelines during the project development, check our blog post.

    "},{"location":"how_to_guides/dataset/#distribution","title":"Distribution","text":"

    When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the\u00a0TaskDistribution settings to configure the number of minimum submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues.

    Check the Task Distribution - Python Reference to see the TaskDistribution class in detail.

    rg.TaskDistribution(\n    min_submitted = 2\n)\n

    You can learn more about how to distribute the task among team members in the Distribute the annotation guide.

    "},{"location":"how_to_guides/dataset/#list-datasets","title":"List datasets","text":"

    You can list all the datasets available in a workspace using the datasets attribute of the Workspace class. You can also use len(workspace.datasets) to get the number of datasets in a workspace.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndatasets = workspace.datasets\n\nfor dataset in datasets:\n    print(dataset)\n

    When you list datasets, dataset settings are not preloaded, since this can introduce extra requests to the server. If you want to work with settings when listing datasets, you need to load them:

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nfor dataset in client.datasets:\n    dataset.settings.get() # this will get the dataset settings from the server\n    print(dataset.settings)\n

    Notebooks

    When using a notebook, executing client.datasets will display a table with the name of the existing datasets, their id, the workspace_id to which they belong, and their last update as updated_at.

    "},{"location":"how_to_guides/dataset/#retrieve-a-dataset","title":"Retrieve a dataset","text":"

    You can retrieve a dataset by calling the datasets method on the Argilla class and passing the name or id of the dataset as an argument. If the dataset does not exist, a warning message will be raised and None will be returned.

    By nameBy id

    By default, this method attempts to retrieve the dataset from the first workspace. If the dataset is in a different workspace, you must specify the workspace object or the workspace name as an argument.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\n# Retrieve the dataset from the first workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\")\n\n# Retrieve the dataset from the specified workspace\nretrieved_dataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/dataset/#check-dataset-existence","title":"Check dataset existence","text":"

    You can check if a dataset exists. The client.datasets method will return None if the dataset was not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nif dataset is not None:\n    pass\n
    "},{"location":"how_to_guides/dataset/#update-a-dataset","title":"Update a dataset","text":"

    Once a dataset is published, there are limited things you can update. Here is a summary of the attributes you can change for each setting:

    Fields (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Required \u274c/\u274c, Use markdown \u2705/\u2705

    Questions (From SDK / From UI): Name \u274c/\u274c, Title \u274c/\u2705, Description \u274c/\u2705, Required \u274c/\u274c, Labels \u274c/\u274c, Values \u274c/\u274c, Label order \u274c/\u2705, Suggestions first \u274c/\u2705, Visible labels \u274c/\u2705, Field \u274c/\u274c, Allow overlapping \u274c/\u274c, Use markdown \u274c/\u2705

    Metadata (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Options \u274c/\u274c, Minimum value \u274c/\u274c, Maximum value \u274c/\u274c, Visible for annotators \u2705/\u2705, Allow extra metadata \u2705/\u2705

    Vectors (From SDK / From UI): Name \u274c/\u274c, Title \u2705/\u2705, Dimensions \u274c/\u274c

    Guidelines (From SDK / From UI): \u2705/\u2705

    Distribution (From SDK / From UI): Minimum submitted \u2705*/\u2705*

    * Can be changed as long as the dataset doesn't have any responses.

    To modify these attributes, you can simply set the new value of the attributes you wish to change and call the update method on the Dataset object.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.fields[\"text\"].use_markdown = True\ndataset.settings.metadata[\"my_metadata\"].visible_for_annotators = False\n\ndataset.update()\n

    You can also add and delete metadata properties and vector fields using the add and delete methods.

    AddDelete
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors.add(rg.VectorField(name=\"my_new_vector\", dimensions=123))\ndataset.settings.metadata.add(\n    rg.TermsMetadataProperty(\n        name=\"my_new_metadata\",\n        options=[\"option_1\", \"option_2\", \"option_3\"],\n    ),\n)\ndataset.update()\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.vectors[\"my_old_vector\"].delete()\ndataset.settings.metadata[\"my_old_metadata\"].delete()\n\ndataset.update()\n
    "},{"location":"how_to_guides/dataset/#delete-a-dataset","title":"Delete a dataset","text":"

    You can delete an existing dataset by calling the delete method on the Dataset class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset_to_delete = client.datasets(name=\"my_dataset\")\n\ndataset_deleted = dataset_to_delete.delete()\n
    "},{"location":"how_to_guides/distribution/","title":"Distribute the annotation task among the team","text":"

    This guide explains how you can use Argilla\u2019s automatic task distribution to efficiently divide the task of annotating a dataset among multiple team members.

    Owners and admins can define the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle the records shown in the pending queues of all users with access to the dataset.

    When a record has met the minimum number of submissions, the status of the record will change to completed, and the record will be removed from the Pending queue of all team members so they can focus on providing responses where they are most needed. The dataset\u2019s annotation task will be fully completed once all records have the completed status.

    Note

    The status of a record can be either completed, when it has the required number of responses with submitted status, or pending, when it doesn\u2019t meet this requirement.

    Each record can have multiple responses, and each of those can have the status submitted, discarded, or draft.
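    For example, a minimal sketch (reusing the status filter described in the query guide) to check how many records have already reached the minimum number of submissions:

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\n# Count the records whose status is already completed\ncompleted_records = dataset.records(\n    query=rg.Query(filter=rg.Filter([(\"status\", \"==\", \"completed\")]))\n).to_list(flatten=True)\n\nprint(len(completed_records))\n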

    Main Class

    rg.TaskDistribution(\n    min_submitted = 2\n)\n

    Check the Task Distribution - Python Reference to see the attributes, arguments, and methods of the TaskDistribution class in detail.

    "},{"location":"how_to_guides/distribution/#configure-task-distribution-settings","title":"Configure task distribution settings","text":"

    By default, Argilla sets the required minimum submitted responses to 1. This means that whenever a record has at least 1 response with the status submitted, the record's status will change to completed and it will be removed from the Pending queue of other team members.

    Tip

    Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

    If you wish to set a different number, you can do so through the distribution setting in your dataset settings:

    settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n

    Learn more about configuring dataset settings in the Dataset management guide.

    Tip

    Increase the number of minimum submissions if you\u2019d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.

    Note

    Some records may end up with more responses than expected if multiple team members submit responses on the same record simultaneously.

    "},{"location":"how_to_guides/distribution/#change-task-distribution-settings","title":"Change task distribution settings","text":"

    If you wish to change the minimum submitted responses required in a dataset, you can do so as long as the annotation hasn\u2019t started, i.e., the dataset has no responses for any records.

    Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(\"my_dataset\")\n\ndataset.settings.distribution.min_submitted = 4\n\ndataset.update()\n
    "},{"location":"how_to_guides/import_export/","title":"Importing and exporting datasets and records","text":"

    This guide provides an overview of how to import and export your dataset or its records to Python, your local disk, or the Hugging Face Hub.

    In Argilla, you can import/export two main components of a dataset:

    • The dataset's complete configuration is defined in rg.Settings. This is useful if you want to share your feedback task or restore it later in Argilla.
    • The records stored in the dataset, including Metadata, Vectors, Suggestions, and Responses. This is useful if you want to use your dataset's records outside of Argilla.

    Check the Dataset - Python Reference to see the export methods of the Dataset class in detail.

    Main Classes

    rg.Dataset.to_hubrg.Dataset.from_hubrg.Dataset.to_diskrg.Dataset.from_diskrg.Dataset.records.to_datasets()rg.Dataset.records.to_dict()rg.Dataset.records.to_list()
    rg.Dataset.to_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    with_records=True,\n    generate_card=True\n)\n
    rg.Dataset.from_hub(\n    repo_id=\"<my_org>/<my_dataset>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
    rg.Dataset.to_disk(\n    path=\"<path-empty-directory>\",\n    with_records=True\n)\n
    rg.Dataset.from_disk(\n    path=\"<path-dataset-directory>\",\n    name=\"my_dataset\",\n    workspace=\"my_workspace\",\n    client=rg.Client(),\n    with_records=True\n)\n
    rg.Dataset.records.to_datasets()\n
    rg.Dataset.records.to_dict()\n
    rg.Dataset.records.to_list()\n


    Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

    "},{"location":"how_to_guides/import_export/#importing-and-exporting-datasets","title":"Importing and exporting datasets","text":"

    First, we will go through exporting a complete dataset from Argilla. This includes the dataset's settings and records. All of these methods use the rg.Dataset.from_* and rg.Dataset.to_* methods.

    "},{"location":"how_to_guides/import_export/#hugging-face-hub","title":"Hugging Face Hub","text":""},{"location":"how_to_guides/import_export/#export-to-hub","title":"Export to Hub","text":"

    You can push a dataset from Argilla to the Hugging Face Hub. This is useful if you want to share your dataset with the community or version control it. You can push the dataset to the Hugging Face Hub using the rg.Dataset.to_hub method.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_hub(repo_id=\"<my_org>/<my_dataset>\")\n

    With or without records

    The example above will push the dataset's Settings and records to the hub. If you only want to push the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

    dataset.to_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n
    "},{"location":"how_to_guides/import_export/#import-from-hub","title":"Import from Hub","text":"

    You can pull a dataset from the Hugging Face Hub to Argilla. This is useful if you want to restore a dataset and its configuration. You can pull the dataset from the Hugging Face Hub using the rg.Dataset.from_hub method.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\")\n

    The rg.Dataset.from_hub method loads the configuration and records from the dataset repo. If you only want to load records, you can pass a datasets.Dataset object to the rg.Dataset.records.log method. This enables you to configure your own dataset and reuse existing Hub datasets. See the guide on records for more information.

    With or without records

    The example above will pull the dataset's Settings and records from the hub. If you only want to pull the dataset's configuration, you can set the with_records parameter to False. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.

    dataset = rg.Dataset.from_hub(repo_id=\"<my_org>/<my_dataset>\", with_records=False)\n

    With the dataset's configuration, you could then make changes to the dataset. For example, you could adapt the dataset's settings for a different task:

    dataset.settings.questions = [rg.TextQuestion(name=\"answer\")]\ndataset.update()\n

    You could then load the records with the load_dataset function of the datasets package and pass the dataset to the rg.Dataset.records.log method.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"<my_org>/<my_dataset>\")\ndataset.records.log(hf_dataset)\n
    "},{"location":"how_to_guides/import_export/#local-disk","title":"Local Disk","text":""},{"location":"how_to_guides/import_export/#export-to-disk","title":"Export to Disk","text":"

    You can save a dataset from Argilla to your local disk. This is useful if you want to back up your dataset. You can use the rg.Dataset.to_disk method. We recommend using an empty directory.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\ndataset.to_disk(path=\"<path-empty-directory>\")\n

    This will save the dataset's configuration and records to the specified path. If you only want to save the dataset's configuration, you can set the with_records parameter to False.

    dataset.to_disk(path=\"<path-empty-directory>\", with_records=False)\n
    "},{"location":"how_to_guides/import_export/#import-from-disk","title":"Import from Disk","text":"

    You can load a dataset from your local disk to Argilla. This is useful if you want to restore a dataset's configuration. You can use the rg.Dataset.from_disk method.

    import argilla as rg\n\ndataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\")\n

    Directing the dataset to a name and workspace

    You can also specify the name and workspace of the dataset when loading it from the disk.

    dataset = rg.Dataset.from_disk(path=\"<path-dataset-directory>\", name=\"my_dataset\", workspace=\"my_workspace\")\n
    "},{"location":"how_to_guides/import_export/#importing-and-exporting-records","title":"Importing and exporting records","text":"

    The records alone can be exported from a dataset in Argilla. This is useful if you want to process the records in Python, export them to a different platform, or use them in model training. All of these methods use the rg.Dataset.records attribute.

    "},{"location":"how_to_guides/import_export/#export-records","title":"Export records","text":"

    The records can be exported as a dictionary, a list of dictionaries, or a Dataset of the datasets package.

    To a python dictionaryTo a python listTo the datasets package

    Records can be exported from Dataset.records as a dictionary using the to_dict method. You can specify the orientation of the dictionary output and decide whether to flatten it.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a dictionary\nexported_records = dataset.records.to_dict()\n# {'fields': [{'text': 'Hello'}, {'text': 'World'}], 'suggestions': [{'label': {'value': 'positive'}}, {'label': {'value': 'negative'}}]}\n\n# Export records as a dictionary with orient=index\nexported_records = dataset.records.to_dict(orient=\"index\")\n# {\"uuid1\": {'fields': {'text': 'Hello'}, 'suggestions': {'label': {'value': 'positive'}}}, \"uuid2\": {'fields': {'text': 'World'}, 'suggestions': {'label': {'value': 'negative'}}}}\n\n# Export records as a dictionary with flatten=True\nexported_records = dataset.records.to_dict(flatten=True)\n# {\"text\": [\"Hello\", \"World\"], \"label.suggestion\": [\"greeting\", \"greeting\"]}\n

    Records can be exported from Dataset.records as a list of dictionaries using the to_list method. You can decide whether to flatten it.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=workspace)\n\n# Export records as a list of dictionaries\nexported_records = dataset.records.to_list()\n# [{'fields': {'text': 'Hello'}, 'suggestion': {'label': {value: 'greeting'}}}, {'fields': {'text': 'World'}, 'suggestion': {'label': {value: 'greeting'}}}]\n\n# Export records as a list of dictionaries with flatten=True\nexported_records = dataset.records.to_list(flatten=True)\n# [{\"text\": \"Hello\", \"label\": \"greeting\"}, {\"text\": \"World\", \"label\": \"greeting\"}]\n

    Records can be exported from Dataset.records to the datasets package using the to_datasets method. You can specify the name of the dataset and the split to export the records.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\")\n\n# Export records as a Dataset from the datasets package\nexported_dataset = dataset.records.to_datasets()\n
    "},{"location":"how_to_guides/import_export/#import-records","title":"Import records","text":"

    To import records to a dataset, use the rg.Dataset.records.log method. There is a guide on how to do this in How-to guides - Record, or you can check the Record - Python Reference.
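    As a minimal sketch (assuming a dataset configured with a single text field named text):

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Log records whose keys match the dataset field names\ndataset.records.log([{\"text\": \"Hello\"}, {\"text\": \"World\"}])\n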

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/","title":"Migrate your legacy datasets to Argilla V2","text":"

    This guide will help you migrate task-specific datasets to Argilla V2. These do not include the FeedbackDataset, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the SDK migration blog post.

    Note

    Legacy Datasets include: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

    FeedbackDatasets do not need to be migrated as they are already in the Argilla V2 format.

    To follow this guide, you will need to have the following prerequisites:

    • An argilla 1.* server instance running with legacy datasets.
    • An argilla >=1.29 server instance running. If you don't have one, you can create one by following this Argilla guide.
    • The argilla sdk package installed in your environment.

    If your current legacy datasets are on a server with an Argilla release later than 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#steps","title":"Steps","text":"

    The guide will take you through three steps:

    1. Retrieve the legacy dataset from the Argilla V1 server using the new argilla package.
    2. Define the new dataset in the Argilla V2 format.
    3. Upload the dataset records to the new Argilla V2 dataset format and attributes.
    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-1-retrieve-the-legacy-dataset","title":"Step 1: Retrieve the legacy dataset","text":"

    Connect to the Argilla V1 server via the new argilla package. First, you should install an extra dependency:

    pip install \"argilla[legacy]\"\n

    Now, you can use the v1 module to connect to the Argilla V1 server.

    import argilla.v1 as rg_v1\n\n# Initialize the API with an Argilla server less than 2.0\napi_url = \"<your-url>\"\napi_key = \"<your-api-key>\"\nrg_v1.init(api_url, api_key)\n

    Next, load the dataset settings and records from the Argilla V1 server:

    dataset_name = \"news-programmatic-labeling\"\nworkspace = \"demo\"\n\nsettings_v1 = rg_v1.load_dataset_settings(dataset_name, workspace)\nrecords_v1 = rg_v1.load(dataset_name, workspace)\nhf_dataset = records_v1.to_datasets()\n

    Your legacy dataset is now loaded into the hf_dataset object.

    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-2-define-the-new-dataset","title":"Step 2: Define the new dataset","text":"

    Define the new dataset in the Argilla V2 format. The new dataset format is defined in the argilla package. You can create a new dataset with the Settings and Dataset classes:

    First, instantiate the Argilla class to connect to the Argilla V2 server:

    import argilla as rg\n\nclient = rg.Argilla()\n

    Next, define the new dataset settings:

    For single-label classificationFor multi-label classificationFor token classificationFor text generation
    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.LabelQuestion(name=\"label\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
    1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.

    2. Make sure you provide all relevant metadata fields available in the dataset.

    3. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"), # (1)\n    ],\n    questions=[\n        rg.MultiLabelQuestion(name=\"labels\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (2)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (3)\n    ],\n)\n
    1. The default field in DatasetForTextClassification is text, but make sure you provide all fields included in record.inputs.

    2. Make sure you provide all relevant metadata fields available in the dataset.

    3. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.SpanQuestion(name=\"spans\", labels=settings_v1.label_schema),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
    1. Make sure you provide all relevant metadata fields available in the dataset.

    2. Make sure you provide all relevant vectors available in the dataset.

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"text_generation\"),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"split\"), # (1)\n    ],\n    vectors=[\n        rg.VectorField(name='mini-lm-sentence-transformers', dimensions=384), # (2)\n    ],\n)\n
    1. Make sure you provide all relevant metadata fields available in the dataset.

    2. Make sure you provide all relevant vectors available in the dataset.

    Finally, create the new dataset on the Argilla V2 server:

    dataset = rg.Dataset(name=dataset_name, settings=settings)\ndataset.create()\n

    Note

    If a dataset with the same name already exists, the create method will raise an exception. You can check if the dataset exists and delete it before creating a new one.

    dataset = client.datasets(name=dataset_name)\n\nif dataset is not None:\n    dataset.delete()\n
    "},{"location":"how_to_guides/migrate_from_legacy_datasets/#step-3-upload-the-dataset-records","title":"Step 3: Upload the dataset records","text":"

    To upload the records to the new server, we will need to convert the records from the Argilla V1 format to the Argilla V2 format. The new argilla sdk package uses a generic Record class, but legacy datasets have specific record classes. We will need to convert the records to the generic Record class.

    Here are a set of example functions to convert the records for single-label and multi-label classification. You can modify these functions to suit your dataset.

    For single-label classificationFor multi-label classificationFor token classificationFor text generation
    def map_to_record_for_single_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        label, score = prediction[0].values()\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"label\", # (1)\n                value=label,\n                score=score,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"label\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_multi_label(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        labels, scores = zip(*[(pred[\"label\"], pred[\"score\"]) for pred in prediction])\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"labels\", # (1)\n                value=labels,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"labels\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields=data[\"inputs\"],\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_span(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a token classification record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        scores = [span[\"score\"] for span in prediction]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"spans\", # (1)\n                value=prediction,\n                score=scores,\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"spans\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    def map_to_record_for_text_generation(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:\n    \"\"\" This function maps a text2text record dictionary to the new Argilla record.\"\"\"\n    suggestions = []\n    responses = []\n\n    if prediction := data.get(\"prediction\"):\n        first = prediction[0]\n        agent = data[\"prediction_agent\"]\n        suggestions.append(\n            rg.Suggestion(\n                question_name=\"text_generation\", # (1)\n                value=first[\"text\"],\n                score=first[\"score\"],\n                agent=agent\n            )\n        )\n\n    if annotation := data.get(\"annotation\"):\n        # From data[annotation]\n        user_id = users_by_name.get(data[\"annotation_agent\"], current_user).id\n        responses.append(\n            rg.Response(\n                question_name=\"text_generation\", # (2)\n                value=annotation,\n                user_id=user_id\n            )\n        )\n\n    return rg.Record(\n        id=data[\"id\"],\n        fields={\"text\": data[\"text\"]},\n        # The inputs field should be a dictionary with the same keys as the `fields` in the settings\n        metadata=data[\"metadata\"],\n        # The metadata field should be a dictionary with the same keys as the `metadata` in the settings\n        vectors=data.get(\"vectors\") or {},\n        # The vectors field should be a dictionary with the same keys as the `vectors` in the settings\n        suggestions=suggestions,\n        responses=responses,\n    )\n
    1. Make sure the question_name matches the name of the question in question settings.

    2. Make sure the question_name matches the name of the question in question settings.

    The functions above depend on the users_by_name dictionary and the current_user object to assign responses to users, so we need to load the existing users. You can retrieve the users from the Argilla V2 server and the current user as follows:

    users_by_name = {user.username: user for user in client.users}\ncurrent_user = client.me\n

    Finally, upload the records to the new dataset using the log method and map functions.

    records = []\n\n# Iterate over the dataset loaded from the V1 server in step 1\nfor data in hf_dataset:\n    records.append(map_to_record_for_single_label(data, users_by_name, current_user))\n\n# Upload the records to the new dataset\ndataset.records.log(records)\n

    You have now successfully migrated your legacy dataset to Argilla V2. For more guides on how to use the Argilla SDK, please refer to the How to guides.

    "},{"location":"how_to_guides/query/","title":"Query and filter records","text":"

    This guide provides an overview of how to query and filter a dataset in Argilla.

    You can search for records in your dataset by querying or filtering. The query focuses on the content of the text field, while the filter is used to filter the records based on conditions. You can use them independently or combine multiple filters to create complex search queries. You can also export records from a dataset either as a single dictionary or a list of dictionaries.

    Main Classes

    rg.Queryrg.Filter
    rg.Query(\n    query=\"query\",\n    filter=filter\n)\n

    Check the Query - Python Reference to see the attributes, arguments, and methods of the Query class in detail.

    rg.Filter(\n    [\n        (\"field\", \"==\", \"value\"),\n    ]\n)\n

    Check the Filter - Python Reference to see the attributes, arguments, and methods of the Filter class in detail.

    "},{"location":"how_to_guides/query/#query-with-search-terms","title":"Query with search terms","text":"

    To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field. You can search for a single term or multiple terms; in the latter case, all of them must appear in the record for it to be retrieved.

    Single term searchMultiple terms search
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery = rg.Query(query=\"my_term1 my_term2\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#advanced-queries","title":"Advanced queries","text":"

    If you need more complex searches, you can use Elasticsearch's simple query string syntax. Here is a summary of the different available operators:

    • + or space (AND): search both terms. argilla + distilabel or argilla distilabel returns records that include the terms \"argilla\" and \"distilabel\".
    • | (OR): search either term. argilla | distilabel returns records that include the term \"argilla\" or \"distilabel\".
    • - (negation): exclude a term. argilla -distilabel returns records that contain the term \"argilla\" and don't contain the term \"distilabel\".
    • * (prefix): search a prefix. arg* returns records with any words starting with \"arg-\".
    • \" (phrase): search a phrase. \"argilla and distilabel\" returns records that contain the phrase \"argilla and distilabel\".
    • ( and ) (precedence): group terms. (argilla | distilabel) rules returns records that contain either \"argilla\" or \"distilabel\", and \"rules\".
    • ~N (edit distance): search a term or phrase with an edit distance. argilla~1 returns records that contain the term \"argilla\" with an edit distance of 1, e.g. \"argila\".

    Tip

    To use one of these characters literally, escape it with a preceding backslash \\, e.g. \"1 \\+ 2\" would match records where the phrase \"1 + 2\" is found.
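    These operators can be passed in the query string of the Query class. For example, a minimal sketch combining precedence and negation (dataset and workspace names are placeholders):

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\n# Records that contain \"argilla\" or \"distilabel\" but not the term \"rules\"\nquery = rg.Query(query=\"(argilla | distilabel) -rules\")\n\nqueried_records = dataset.records(query=query).to_list(flatten=True)\n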

    "},{"location":"how_to_guides/query/#filter-by-conditions","title":"Filter by conditions","text":"

    You can use the Filter class to define conditions and pass them to the Dataset.records attribute to fetch records based on those conditions. Available conditions are \"==\", \">=\", \"<=\", and \"in\", and they can use dot notation to target metadata, suggestions, or responses. You can apply a single condition or combine multiple conditions to filter records.

    • ==: the field value is equal to the given value
    • >=: the field value is greater than or equal to the given value
    • <=: the field value is less than or equal to the given value
    • in: the field value is included in a list of values

    Single conditionMultiple conditions
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilter_label = rg.Filter((\"label\", \"==\", \"positive\"))\n\nfiltered_records = dataset.records(query=rg.Query(filter=filter_label)).to_list(\n    flatten=True\n)\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nfilters = rg.Filter(\n    [\n        (\"label.suggestion\", \"==\", \"positive\"),\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20),\n        (\"label\", \"in\", [\"positive\", \"negative\"])\n    ]\n)\n\nfiltered_records = dataset.records(\n    query=rg.Query(filter=filters), with_suggestions=True\n).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#filter-by-status","title":"Filter by status","text":"

    You can filter records based on record or response status. Record status can be pending or completed, and response status can be pending, draft, submitted, or discarded.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nstatus_filter = rg.Query(\n    filter=rg.Filter(\n        [\n            (\"status\", \"==\", \"completed\"),\n            (\"response.status\", \"==\", \"discarded\")\n        ]\n    )\n)\n\nfiltered_records = dataset.records(status_filter).to_list(flatten=True)\n
    "},{"location":"how_to_guides/query/#query-and-filter-a-dataset","title":"Query and filter a dataset","text":"

    As mentioned, you can use a query with a search term and a filter or various filters to create complex search queries.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\", workspace=\"my_workspace\")\n\nquery_filter = rg.Query(\n    query=\"my_term\",\n    filter=rg.Filter(\n        [\n            (\"label.suggestion\", \"==\", \"positive\"),\n            (\"metadata.count\", \">=\", 10),\n        ]\n    )\n)\n\nqueried_filtered_records = dataset.records(\n    query=query_filter,\n    with_metadata=True,\n    with_suggestions=True\n).to_list(flatten=True)\n
    "},{"location":"how_to_guides/record/","title":"Add, update, and delete records","text":"

    This guide provides an overview of records, explaining the basics of how to define and manage them in Argilla.

    A record in Argilla is a data item that requires annotation, consisting of one or more fields. These are the pieces of information displayed to the user in the UI to facilitate the completion of the annotation task. Each record also includes questions that annotators are required to answer, with the option of adding suggestions and responses to assist them. Guidelines are also provided to help annotators effectively complete their tasks.

    A record is part of a dataset, so you will need to create a dataset before adding records. Check this guide to learn how to create a dataset.

    Main Class

    rg.Record(\n    external_id=\"1234\",\n    fields={\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\"\n    },\n    metadata={\n        \"category\": \"A\"\n    },\n    vectors={\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    suggestions=[\n        rg.Suggestion(\"my_label\", \"positive\", score=0.9, agent=\"model_name\")\n    ],\n    responses=[\n        rg.Response(\"label\", \"positive\", user_id=user_id)\n    ],\n)\n

    Check the Record - Python Reference to see the attributes, arguments, and methods of the Record class in detail.

    "},{"location":"how_to_guides/record/#add-records","title":"Add records","text":"

    You can add records to a dataset in two different ways: either by using a dictionary or by directly initializing a Record object. You should ensure that fields, metadata, and vectors match those configured in the dataset settings. In both cases, records are added via the Dataset.records.log method. As soon as you add the records, they will be available in the Argilla UI. If they do not appear in the UI, you may need to click the refresh button to update the view.

    Tip

    Take some time to inspect the data before adding it to the dataset in case this triggers changes in the questions or fields.

    Note

    If you are planning to use public data, the Datasets page of the Hugging Face Hub is a good place to start. Remember to always check the license to make sure you can legally use it for your specific use case.

    As Record objectsFrom a generic data structureFrom a Hugging Face dataset

    You can add records to a dataset by initializing a Record object directly. This is ideal if you need to apply logic to the data before defining the record. If the data is already structured, you should consider adding it directly as a dictionary or Hugging Face dataset.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ), # (1)\n]\n\ndataset.records.log(records)\n
    1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.

    You can add the data directly as a dictionary-like structure, where the keys correspond to the names of fields, questions, metadata, or vectors in the dataset and the values are the data to be added.

    If your data structure does not correspond to your Argilla dataset names, you can use a mapping to indicate which keys in the source data correspond to the dataset fields, metadata, vectors, suggestions, or responses. If you need to add the same data to multiple attributes, you can also use a list with the name of the attributes.

    We illustrate this with Python dictionaries that represent your data, but we would not advise defining raw dictionaries by hand. Instead, use the Record object to instantiate records.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ndataset = client.datasets(name=\"my_dataset\")\n\n# Add records to the dataset with the fields 'question' and 'answer'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    }, # (1)\n]\ndataset.records.log(data)\n\n# Add records to the dataset with a mapping of the fields 'question' and 'answer'\ndata = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n]\ndataset.records.log(data, mapping={\"query\": \"question\", \"response\": \"answer\"}) # (2)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    2. The data structure has keys query and response, and the Argilla dataset has fields question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.
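
    As an illustrative sketch, a single source key can also be mapped to several dataset attributes by passing a list of names (here assuming that the metadata property my_metadata is defined in the dataset settings):

    data = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n]\n\n# 'query' fills the 'question' field and is also stored as the 'my_metadata' value\ndataset.records.log(\n    data,\n    mapping={\"query\": [\"question\", \"my_metadata\"], \"response\": \"answer\"},\n)\n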

    You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

    You can add the dataset where the column names correspond to the names of fields, metadata or vectors in the Argilla dataset.

    import argilla as rg\nfrom datasets import load_dataset\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\ndataset = client.datasets(name=\"my_dataset\") # (1)\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (2)\n\ndataset.records.log(records=hf_dataset)\n
    1. In this case, we are using the my_dataset dataset from the Argilla workspace. The dataset has a text field and a label question.

    2. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset.

    If the Hugging Face dataset's schema does not correspond to your Argilla dataset field names, you can use a mapping to specify the relationship. You should indicate as key the column name of the Hugging Face dataset and, as value, the field name of the Argilla dataset.

    dataset.records.log(\n    records=hf_dataset, mapping={\"text\": \"review\", \"label\": \"sentiment\"}\n) # (1)\n
    1. In this case, the text key in the Hugging Face dataset would correspond to the review field in the Argilla dataset, and the label key in the Hugging Face dataset would correspond to the sentiment field in the Argilla dataset.
    "},{"location":"how_to_guides/record/#metadata","title":"Metadata","text":"

    Record metadata can include, in the form of a dictionary, any information about the record that is not part of the fields. To use metadata for filtering and sorting records, make sure that the keys of the dictionary correspond with the metadata property names. When a key doesn't correspond, it will be considered extra metadata that gets stored with the record (as long as allow_extra_metadata is set to True for the dataset), but it will not be usable for filtering and sorting.

    Note

    Remember that to use metadata within a dataset, you must define a metadata property in the dataset settings.

    Check the Metadata - Python Reference to see the attributes, arguments, and methods for using metadata in detail.
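
    As a minimal sketch, a terms-style metadata property matching the examples below could be declared in the dataset settings like this (the names are assumptions):

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"question\"),\n        rg.TextField(name=\"answer\"),\n    ],\n    questions=[\n        rg.LabelQuestion(name=\"my_label\", labels=[\"positive\", \"negative\"]),\n    ],\n    metadata=[\n        rg.TermsMetadataProperty(name=\"my_metadata\", options=[\"option_1\", \"option_2\"]),\n    ],\n)\n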

    As Record objectsFrom a generic data structure

    You can add metadata to a record in an initialized Record object.

    # Add records to the dataset with the metadata 'my_metadata'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        metadata={\"my_metadata\": \"option_1\"},\n    ),\n]\ndataset.records.log(records)\n

    You can add metadata to a record directly as a dictionary structure, where the keys correspond to the names of metadata properties in the dataset and the values are the metadata to be added. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the metadata 'my_metadata'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_metadata\": \"option_1\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_metadata\": \"option_1\",\n    },\n]\ndataset.records.log(data)\n
    "},{"location":"how_to_guides/record/#vectors","title":"Vectors","text":"

    You can associate vectors, like text embeddings, with your records. They can be used for semantic search in the UI and the Python SDK. Make sure that the length of the list corresponds to the dimensions set in the vector settings.

    Note

    Remember that to use vectors within a dataset, you must define them in the dataset settings.

    Check the Vector - Python Reference to see the attributes, arguments, and methods of the Vector class in detail.
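
    A minimal sketch of declaring such a vector in the dataset settings, assuming the name my_vector and three dimensions as in the examples below:

    settings = rg.Settings(\n    fields=[rg.TextField(name=\"question\"), rg.TextField(name=\"answer\")],\n    questions=[rg.LabelQuestion(name=\"my_label\", labels=[\"positive\", \"negative\"])],\n    vectors=[\n        # dimensions must match the length of the vectors logged with the records\n        rg.VectorField(name=\"my_vector\", dimensions=3),\n    ],\n)\n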

    As Record objectsFrom a generic data structure

    You can also add vectors to a record in an initialized Record object.

    # Add records to the dataset with the vector 'my_vector' and dimension=3\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        vectors={\n            \"my_vector\": [0.1, 0.2, 0.3]\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        vectors={\n            \"my_vector\": [0.2, 0.5, 0.3]\n        },\n    ),\n]\ndataset.records.log(records)\n

    You can add vectors from a dictionary-like structure, where the keys correspond to the names of the vector settings that were configured for your dataset and the value is a list of floats. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the vector 'my_vector' and dimension=3\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"my_vector\": [0.1, 0.2, 0.3],\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"my_vector\": [0.2, 0.5, 0.3],\n    },\n]\ndataset.records.log(data)\n
    "},{"location":"how_to_guides/record/#suggestions","title":"Suggestions","text":"

    Suggestions refer to suggested responses (e.g. model predictions) that you can add to your records to make the annotation process faster. These can be added during the creation of the record or at a later stage. Only one suggestion can be provided for each question, and suggestion values must be compliant with the pre-defined questions e.g. if we have a RatingQuestion between 1 and 5, the suggestion should have a valid value within that range.

    Check the Suggestions - Python Reference to see the attributes, arguments, and methods of the Suggestion class in detail.

    Tip

    Check the Suggestions - Python Reference for different formats per Question type.

    As Record objectsFrom a generic data structure

    You can also add suggestions to a record in an initialized Record object.

    # Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"positive\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        suggestions=[\n            rg.Suggestion(\n                \"my_label\",\n                \"negative\",\n                score=0.9,\n                agent=\"model_name\"\n            )\n        ],\n    ),\n]\ndataset.records.log(records)\n

    You can add suggestions as a dictionary, where the keys correspond to the names of the labels that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure.

    # Add records to the dataset with the label question 'my_label'\ndata =  [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n        \"score\": 0.9,\n        \"agent\": \"model_name\",\n    },\n]\ndataset.records.log(\n    data=data,\n    mapping={\n        \"label\": \"my_label\",\n        \"score\": \"my_label.suggestion.score\",\n        \"agent\": \"my_label.suggestion.agent\",\n    },\n)\n
    "},{"location":"how_to_guides/record/#responses","title":"Responses","text":"

    If your dataset includes some annotations, you can add those to the records as you create them. Make sure that the responses adhere to the same format as Argilla's output and meet the schema requirements for the specific type of question being answered. Make sure to include the user_id if you plan to add more than one response for the same question; otherwise, the responses will apply to all the annotators.

    Check the Responses - Python Reference to see the attributes, arguments, and methods of the Response class in detail.

    Note

    Keep in mind that records with responses will be displayed as \"Draft\" in the UI.

    Tip

    Check the Responses - Python Reference for different formats per Question type.

    As Record objectsFrom a generic data structure

    You can also add responses to a record in an initialized Record object.

    # Add records to the dataset with the label 'my_label'\nrecords = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"positive\", user_id=user.id)\n        ]\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n        responses=[\n            rg.Response(\"my_label\", \"negative\", user_id=user.id)\n        ]\n    ),\n]\ndataset.records.log(records)\n

    You can add responses as a dictionary, where the keys correspond to the names of the questions that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure. If you want to specify the user that added the response, you can use the user_id parameter.

    # Add records to the dataset with the label 'my_label'\ndata = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n        \"label\": \"positive\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n        \"label\": \"negative\",\n    },\n]\ndataset.records.log(data, user_id=user.id, mapping={\"label\": \"my_label.response\"})\n
    "},{"location":"how_to_guides/record/#list-records","title":"List records","text":"

    To list records in a dataset, you can use the records method on the Dataset object. This method returns a list of Record objects that can be iterated over to access the record properties.

    for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_vectors=True\n):\n\n    # Access the record properties\n    print(record.metadata)\n    print(record.vectors)\n    print(record.suggestions)\n    print(record.responses)\n\n    # Access the responses of the record\n    for response in record.responses:\n        print(response.value)\n
    "},{"location":"how_to_guides/record/#update-records","title":"Update records","text":"

    You can update records in a dataset by calling the log method on the Dataset object. To update a record, you need to provide the record id and the new data to be updated.

    data = dataset.records.to_list(flatten=True)\n\nupdated_data = [\n    {\n        \"text\": sample[\"text\"],\n        \"label\": \"positive\",\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n
    Update the metadataUpdate vectorsUpdate suggestionsUpdate responses

    The metadata of the Record object is a python dictionary. To update it, you can iterate over the records and update the metadata by key. After that, you should update the records in the dataset.

    Tip

    Check the Metadata - Python Reference for different formats per MetadataProperty type.

    updated_records = []\n\nfor record in dataset.records():\n\n    record.metadata[\"my_metadata\"] = \"new_value\"\n    record.metadata[\"my_new_metadata\"] = \"new_value\"\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If a new vector field is added to the dataset settings or some value for the existing record vectors must be updated, you can iterate over the records and update the vectors by key. After that, you should update the records in the dataset.

    updated_records = []\n\nfor record in dataset.records(with_vectors=True):\n\n    record.vectors[\"my_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n    record.vectors[\"my_new_vector\"] = [ 0, 1, 2, 3, 4, 5 ]\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If the values of existing record suggestions must be updated, you can iterate over the records and update the suggestions by key. You can also add a suggestion using the add method. After that, you should update the records in the dataset.

    Tip

    Check the Suggestions - Python Reference for different formats per Question type.

    updated_records = []\n\nfor record in dataset.records(with_suggestions=True):\n\n    # We can update existing suggestions\n    record.suggestions[\"label\"].value = \"new_value\"\n    record.suggestions[\"label\"].score = 0.9\n    record.suggestions[\"label\"].agent = \"model_name\"\n\n    # We can also add new suggestions with the `add` method:\n    if not record.suggestions[\"label\"]:\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"new_value\", score=0.9, agent=\"model_name\")\n        )\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n

    If the values of existing record responses must be updated, you can iterate over the records and update the responses by key. You can also add a response using the add method. After that, you should update the records in the dataset.

    Tip

    Check the Responses - Python Reference for different formats per Question type.

    updated_records = []\n\nfor record in dataset.records(with_responses=True):\n\n    for response in record.responses[\"label\"]:\n\n        if response:\n            response.value = \"new_value\"\n            response.user_id = \"existing_user_id\"\n        else:\n            record.responses.add(rg.Response(\"label\", \"YES\", user_id=user.id))\n\n    updated_records.append(record)\n\ndataset.records.log(records=updated_records)\n
    "},{"location":"how_to_guides/record/#delete-records","title":"Delete records","text":"

    You can delete records in a dataset by calling the delete method on the Dataset object. To delete records, you need to retrieve them from the server and get a list with those that you want to delete.

    records_to_delete = list(dataset.records)[:5]\ndataset.records.delete(records=records_to_delete)\n

    Delete records based on a query

    Deleting records based on a query can be very useful, for example, to avoid deleting records that already have responses.

    For more information about the query syntax, check this how-to guide.

    status_filter = rg.Query(\n    filter = rg.Filter((\"response.status\", \"==\", \"pending\"))\n)\nrecords_to_delete = list(dataset.records(status_filter))\n\ndataset.records.delete(records_to_delete)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/","title":"Use Markdown to format rich content","text":"

    This guide provides an overview of how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

    The TextField and TextQuestion provide the option to enable Markdown and therefore HTML by setting use_markdown=True. Given the flexibility of HTML, we can get great control over the presentation of data to our annotators. We provide some out-of-the-box methods for multi-modality and chat templates in the examples below.
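
    As a minimal sketch, enabling Markdown on a field and a question in the dataset settings could look like this (the names are illustrative):

    settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"markdown_enabled_field\", use_markdown=True),\n    ],\n    questions=[\n        rg.TextQuestion(name=\"comment\", use_markdown=True),\n    ],\n)\n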

    Main Methods

    image_to_htmlaudio_to_htmlvideo_to_htmlpdf_to_htmlchat_to_html
    image_to_html(\"local_image_file.png\")\n
    audio_to_html(\"local_audio_file.mp3\")\n
    audio_to_html(\"local_video_file.mp4\")\n
    pdf_to_html(\"local_pdf_file.pdf\")\n
    chat_to_html([{\"role\": \"user\", \"content\": \"hello\"}])\n

    Check the Markdown - Python Reference to see the arguments of the rg.markdown methods in detail.

    Tip

    You can get pretty creative with HTML. For example, think about visualizing graphs and tables. You can use some interesting Python packages methods like pandas.DataFrame.to_html and plotly.io.to_html.
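
    For example, a pandas DataFrame could be rendered as an HTML table in a Markdown-enabled field (a sketch; the field name is an assumption):

    import pandas as pd\nimport argilla as rg\n\ndf = pd.DataFrame({\"city\": [\"Paris\", \"Madrid\"], \"population_m\": [2.1, 3.3]})\n\n# to_html produces a plain HTML table that renders in Markdown-enabled fields\nhtml = df.to_html(index=False)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n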

    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#multi-modal-support-images-audio-video-pdfs-and-more","title":"Multi-modal support: images, audio, video, PDFs and more","text":"

    Argilla has basic multi-modal support in two different ways, each with pros and cons, but both offer the same UI experience because both rely on HTML.

    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#local-content-through-dataurls","title":"Local content through DataURLs","text":"

    A DataURL is a scheme that allows data to be encoded into a base64-encoded string and then embedded directly into HTML. To facilitate this, we offer some functions: image_to_html, audio_to_html, video_to_html, and pdf_to_html. These functions accept either the file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. Additionally, you can set the width and height in pixels or as a percentage for video and image (defaults to the original dimensions) and set the autoplay and loop attributes to True for audio and video (defaults to False).

    Warning

    DataURLs increase memory usage beyond the original file size. Additionally, different browsers enforce different size limitations for rendering DataURLs, which might block the visualization experience per user.

    ImageAudioVideoPDF
    from argilla.markdown import image_to_html\n\nhtml = image_to_html(\n    \"local_image_file.png\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import audio_to_html\n\nhtml = audio_to_html(\n    \"local_audio_file.mp3\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import video_to_html\n\nhtml = video_to_html(\n    \"local_video_file.mp4\",\n    width=\"300px\",\n    height=\"300px\",\n    autoplay=True,\n    loop=True\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    from argilla.markdown import pdf_to_html\n\nhtml = pdf_to_html(\n    \"local_pdf_file.pdf\",\n    width=\"300px\",\n    height=\"300px\"\n)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#hosted-content","title":"Hosted content","text":"

    Instead of uploading local files through DataURLs, we can also visualize URLs directly linking to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to visualize content available on platforms like Google Drive or decide to configure a private media server.

    Warning

    When trying to access content from a private media server you have to ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.

    ImageAudioVideoPDF
    html = \"<img src='https://example.com/public-image-file.jpg'>\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<audio controls>\n    <source src=\"https://example.com/public-audio-file.mp3\" type=\"audio/mpeg\">\n</audio>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<video width=\"320\" height=\"240\" controls>\n    <source src=\"https://example.com/public-video-file.mp4\" type=\"video/mp4\">\n</video>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    html = \"\"\"\n<iframe\n    src=\"https://example.com/public-pdf-file.pdf\"\n    width=\"600\"\n    height=\"500\">\n</iframe>\n\"\"\"\"\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n
    "},{"location":"how_to_guides/use_markdown_to_format_rich_content/#chat-and-conversation-support","title":"Chat and conversation support","text":"

    When working with chat data from multi-turn interactions with a Large Language Model, it can be helpful to visualize the conversation as in a common chat interface. To facilitate this, we offer the chat_to_html function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.

    OpenAI chat format

    The OpenAI chat format is a way to structure a list of messages as input from users and returns a model-generated message as output. These messages can only contain the roles \"user\" for human messages and \"assistant\", \"system\" or \"model\" for model-generated messages.

    from argilla.markdown import chat_to_html\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Hello! How are you?\"},\n    {\"role\": \"assistant\", \"content\": \"I'm good, thank you!\"}\n]\n\nhtml = chat_to_html(messages)\n\nrg.Record(\n    fields={\"markdown_enabled_field\": html}\n)\n

    "},{"location":"how_to_guides/user/","title":"User management","text":"

    This guide provides an overview of user roles and credentials, explaining how to set up and manage users in Argilla.

    A user in Argilla is an authorized person who, depending on their role, can use the Python SDK and access the UI in a running Argilla instance. We differentiate between three types of users depending on their role, permissions and needs: owner, admin and annotator.

    OverviewOwnerAdminAnnotator

    | | Owner | Admin | Annotator |
    |---|---|---|---|
    | Number | Unlimited | Unlimited | Unlimited |
    | Create and delete workspaces | Yes | No | No |
    | Assign users to workspaces | Yes | No | No |
    | Create, configure, update, and delete datasets | Yes | Only within assigned workspaces | No |
    | Create, update, and delete users | Yes | No | No |
    | Provide feedback with Argilla UI | Yes | Yes | Yes |

    The owner refers to the root user who created the Argilla instance. Using workspaces within Argilla is highly beneficial for organizing tasks efficiently, and the owner has full access to all workspaces and their functionalities:

    • Workspace management: It can create, read and delete a workspace.
    • User management: It can create a new user, assign it to a workspace, and delete it. It can also list them and search for a specific one.
    • Dataset management: It can create, configure, retrieve, update, and delete datasets.
    • Annotation: It can annotate datasets in the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.

    An admin user can only access the workspaces it has been assigned to and cannot assign other users to it. An admin user has the following permissions:

    • Dataset management: It can create, configure, retrieve, update, and delete datasets only on the assigned workspaces.
    • Annotation: It can annotate datasets in the assigned workspaces via the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.

    An annotator user is limited to accessing only the datasets assigned to it within the workspace. It has two specific permissions:

    • Annotation: It can annotate the assigned datasets in the Argilla UI.
    • Feedback: It can provide feedback with the Argilla UI.
    Question: Who can manage users?

    Only users with the owner role can manage (create, retrieve, delete) other users.

    "},{"location":"how_to_guides/user/#initial-users-and-credentials","title":"Initial users and credentials","text":"

    Depending on your Argilla deployment, the initial user with the owner role will vary.

    • If you deploy on the Hugging Face Hub, the initial user will correspond to the Space owner (your personal account). The API key is automatically generated and can be copied from the \"Settings\" section of the UI.
    • If you deploy with Docker, the default values for the environment variables are: USERNAME: argilla, PASSWORD: 12345678, API_KEY: argilla.apikey.

    For new users, the username and password are set during the creation process. The API key can be copied from the \"Settings\" section of the UI.
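
    For example, with the default Docker values above, a first connection could look like this (a sketch, assuming a local deployment on the default port 6900):

    import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"http://localhost:6900\",\n    api_key=\"argilla.apikey\",  # default API key from the Docker environment variables\n)\n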

    Main Class

    rg.User(\n    username=\"username\",\n    first_name=\"first_name\",\n    last_name=\"last_name\",\n    role=\"owner\",\n    password=\"password\",\n    client=client\n)\n

    Check the User - Python Reference to see the attributes, arguments, and methods of the User class in detail.

    "},{"location":"how_to_guides/user/#get-current-user","title":"Get current user","text":"

    To ensure you're using the correct credentials for managing users, you can get the current user in Argilla using the me attribute of the Argilla class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\ncurrent_user = client.me\n
    "},{"location":"how_to_guides/user/#create-a-user","title":"Create a user","text":"

    To create a new user in Argilla, you can define it in the User class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_create = rg.User(\n    username=\"my_username\",\n    password=\"12345678\",\n)\n\ncreated_user = user_to_create.create()\n

    Accessing attributes

    Access the attributes of a user by calling them directly on the User object. For example, user.id or user.username.

    "},{"location":"how_to_guides/user/#list-users","title":"List users","text":"

    You can list all the existing users in Argilla by accessing the users attribute on the Argilla class and iterating over them. You can also use len(client.users) to get the number of users.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nusers = client.users\n\nfor user in users:\n    print(user)\n

    Notebooks

    When using a notebook, executing client.users will display a table with username, id, role, and the last update as updated_at.

    "},{"location":"how_to_guides/user/#retrieve-a-user","title":"Retrieve a user","text":"

    You can retrieve an existing user from Argilla by accessing the users attribute on the Argilla class and passing the username or id as an argument. If the user does not exist, a warning message will be raised and None will be returned.

    By usernameBy id
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(\"my_username\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_user = client.users(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/user/#check-user-existence","title":"Check user existence","text":"

    You can check if a user exists. The client.users method will return None if the user was not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users(\"my_username\")\n\nif user is not None:\n    pass\n
    "},{"location":"how_to_guides/user/#list-users-in-a-workspace","title":"List users in a workspace","text":"

    You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
    "},{"location":"how_to_guides/user/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

    You can add an existing user to a workspace in Argilla by calling the add_to_workspace method on the User class.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nadded_user = user.add_to_workspace(workspace)\n
    "},{"location":"how_to_guides/user/#remove-a-user-from-a-workspace","title":"Remove a user from a workspace","text":"

    You can remove an existing user from a workspace in Argilla by calling the remove_from_workspace method on the User class.

    For further information on how to manage workspaces, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser = client.users('my_username')\nworkspace = client.workspaces('my_workspace')\n\nremoved_user = user.remove_from_workspace(workspace)\n
    "},{"location":"how_to_guides/user/#delete-a-user","title":"Delete a user","text":"

    You can delete an existing user from Argilla by calling the delete method on the User class.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nuser_to_delete = client.users('my_username')\n\ndeleted_user = user_to_delete.delete()\n
    "},{"location":"how_to_guides/workspace/","title":"Workspace management","text":"

    This guide provides an overview of workspaces, explaining how to set up and manage workspaces in Argilla.

    A workspace is a space inside your Argilla instance where authorized users can collaborate on datasets. It is accessible through the Python SDK and the UI.

    Question: Who can manage workspaces?

    Only users with the owner role can manage (create, read and delete) workspaces.

    A user with the admin role can only read the workspace to which it belongs.

    "},{"location":"how_to_guides/workspace/#initial-workspaces","title":"Initial workspaces","text":"

    Depending on your Argilla deployment, the initial workspace will vary.

    • If you deploy on the Hugging Face Hub, the initial workspace will be the one indicated in the .oauth.yaml file. By default, argilla.
    • If you deploy with Docker, you will need to create a workspace as shown in the next section.

    Main Class

    rg.Workspace(\n    name=\"name\",\n    client=client\n)\n

    Check the Workspace - Python Reference to see the attributes, arguments, and methods of the Workspace class in detail.

    "},{"location":"how_to_guides/workspace/#create-a-new-workspace","title":"Create a new workspace","text":"

    To create a new workspace in Argilla, you can define it in the Workspace class and then call the create method. This method is inherited from the Resource base class and operates without modifications.

    When you create a new workspace, it will be empty. To create and add a new dataset, check these guides.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_create = rg.Workspace(name=\"my_workspace\")\n\ncreated_workspace = workspace_to_create.create()\n

    Accessing attributes

    Access the attributes of a workspace by calling them directly on the Workspace object. For example, workspace.id or workspace.name.

    "},{"location":"how_to_guides/workspace/#list-workspaces","title":"List workspaces","text":"

    You can list all the existing workspaces in Argilla by calling the workspaces attribute on the Argilla class and iterating over them. You can also use len(client.workspaces) to get the number of workspaces.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspaces = client.workspaces\n\nfor workspace in workspaces:\n    print(workspace)\n

    Notebooks

    When using a notebook, executing client.workspaces will display a table with the number of datasets in each workspace, name, id, and the last update as updated_at.

    "},{"location":"how_to_guides/workspace/#retrieve-a-workspace","title":"Retrieve a workspace","text":"

    You can retrieve a workspace by accessing the workspaces method on the Argilla class and passing the name or id of the workspace as an argument. If the workspace does not exist, a warning message will be raised and None will be returned.

    By nameBy id
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(\"my_workspace\")\n
    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nretrieved_workspace = client.workspaces(id=\"<uuid-or-uuid-string>\")\n
    "},{"location":"how_to_guides/workspace/#check-workspace-existence","title":"Check workspace existence","text":"

    You can check if a workspace exists. The client.workspaces method will return None if the workspace is not found.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nif workspace is not None:\n    pass\n
    "},{"location":"how_to_guides/workspace/#list-users-in-a-workspace","title":"List users in a workspace","text":"

    You can list all the users in a workspace by accessing the users attribute on the Workspace class and iterating over them. You can also use len(workspace.users) to get the number of users by workspace.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces('my_workspace')\n\nfor user in workspace.users:\n    print(user)\n
    "},{"location":"how_to_guides/workspace/#add-a-user-to-a-workspace","title":"Add a user to a workspace","text":"

    You can also add a user to a workspace by calling the add_user method on the Workspace class.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nadded_user = workspace.add_user(\"my_username\")\n
    "},{"location":"how_to_guides/workspace/#remove-a-user-from-workspace","title":"Remove a user from workspace","text":"

    You can also remove a user from a workspace by calling the remove_user method on the Workspace class.

    For further information on how to manage users, check this how-to guide.

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace = client.workspaces(\"my_workspace\")\n\nremoved_user = workspace.remove_user(\"my_username\")\n
    "},{"location":"how_to_guides/workspace/#delete-a-workspace","title":"Delete a workspace","text":"

    A workspace can only be deleted if no datasets are associated with it; if the workspace contains any dataset, deletion will fail. You can delete a workspace by calling the delete method on the Workspace class.

    To clear a workspace and delete all of its datasets, refer to this how-to guide.
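
    A minimal sketch of emptying a workspace first, using the workspace.datasets collection shown in the client reference, before running the deletion below:

    workspace = client.workspaces(\"my_workspace\")\n\n# Delete every dataset in the workspace; otherwise the workspace deletion fails\nfor dataset in list(workspace.datasets):\n    dataset.delete()\n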

    import argilla as rg\n\nclient = rg.Argilla(api_url=\"<api_url>\", api_key=\"<api_key>\")\n\nworkspace_to_delete = client.workspaces(\"my_workspace\")\n\ndeleted_workspace = workspace_to_delete.delete()\n
    "},{"location":"reference/argilla/SUMMARY/","title":"SUMMARY","text":"
    • rg.Argilla
    • rg.Workspace
    • rg.User
    • rg.Dataset
      • rg.Dataset.records
    • rg.Settings
      • Fields
      • Questions
      • Metadata
      • Vectors
      • Distribution
    • rg.Record
      • rg.Response
      • rg.Suggestion
      • rg.Vector
      • rg.Metadata
    • rg.Query
    • rg.markdown
    "},{"location":"reference/argilla/client/","title":"rg.Argilla","text":"

    To interact with the Argilla server from Python you can use the Argilla class. The Argilla client is used to create, get, update, and delete all Argilla resources, such as workspaces, users, datasets, and records.

    "},{"location":"reference/argilla/client/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/client/#connecting-to-an-argilla-server","title":"Connecting to an Argilla server","text":"

    To connect to an Argilla server, instantiate the Argilla class and pass the api_url of the server and the api_key to authenticate.

    import argilla as rg\n\nclient = rg.Argilla(\n    api_url=\"https://argilla.example.com\",\n    api_key=\"my_api_key\",\n)\n
    "},{"location":"reference/argilla/client/#accessing-dataset-workspace-and-user-objects","title":"Accessing Dataset, Workspace, and User objects","text":"

    The Argilla client provides access to the Dataset, Workspace, and User objects of the Argilla server.

    my_dataset = client.datasets(\"my_dataset\")\n\nmy_workspace = client.workspaces(\"my_workspace\")\n\nmy_user = client.users(\"my_user\")\n

    These resources can then be interacted with to access their properties and methods. For example, to list all datasets in a workspace:

    for dataset in my_workspace.datasets:\n    print(dataset.name)\n
    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla","title":"Argilla","text":"

    Bases: APIClient

    Argilla API client. This is the main entry point to interact with the API.

    Attributes:

    | Name | Type | Description |
    |---|---|---|
    | workspaces | Workspaces | A collection of workspaces. |
    | datasets | Datasets | A collection of datasets. |
    | users | Users | A collection of users. |
    | me | User | The current user. |

    Source code in src/argilla/client.py
    class Argilla(_api.APIClient):\n    \"\"\"Argilla API client. This is the main entry point to interact with the API.\n\n    Attributes:\n        workspaces: A collection of workspaces.\n        datasets: A collection of datasets.\n        users: A collection of users.\n        me: The current user.\n\n    \"\"\"\n\n    workspaces: \"Workspaces\"\n    datasets: \"Datasets\"\n    users: \"Users\"\n    me: \"User\"\n\n    # Default instance of Argilla\n    _default_client: Optional[\"Argilla\"] = None\n\n    def __init__(\n        self,\n        api_url: Optional[str] = DEFAULT_HTTP_CONFIG.api_url,\n        api_key: Optional[str] = DEFAULT_HTTP_CONFIG.api_key,\n        timeout: int = DEFAULT_HTTP_CONFIG.timeout,\n        **http_client_args,\n    ) -> None:\n        super().__init__(api_url=api_url, api_key=api_key, timeout=timeout, **http_client_args)\n\n        self._set_default(self)\n\n    @property\n    def workspaces(self) -> \"Workspaces\":\n        \"\"\"A collection of workspaces on the server.\"\"\"\n        return Workspaces(client=self)\n\n    @property\n    def datasets(self) -> \"Datasets\":\n        \"\"\"A collection of datasets on the server.\"\"\"\n        return Datasets(client=self)\n\n    @property\n    def users(self) -> \"Users\":\n        \"\"\"A collection of users on the server.\"\"\"\n        return Users(client=self)\n\n    @cached_property\n    def me(self) -> \"User\":\n        from argilla.users import User\n\n        return User(client=self, _model=self.api.users.get_me())\n\n    ############################\n    # Private methods\n    ############################\n\n    @classmethod\n    def _set_default(cls, client: \"Argilla\") -> None:\n        \"\"\"Set the default instance of Argilla.\"\"\"\n        cls._default_client = client\n\n    @classmethod\n    def _get_default(cls) -> \"Argilla\":\n        \"\"\"Get the default instance of Argilla. If it doesn't exist, create a new one.\"\"\"\n        if cls._default_client is None:\n            cls._default_client = Argilla()\n        return cls._default_client\n
    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.workspaces","title":"workspaces: Workspaces property","text":"

    A collection of workspaces on the server.

    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.datasets","title":"datasets: Datasets property","text":"

    A collection of datasets on the server.

    "},{"location":"reference/argilla/client/#src.argilla.client.Argilla.users","title":"users: Users property","text":"

    A collection of users on the server.

    "},{"location":"reference/argilla/markdown/","title":"rg.markdown","text":"

    To support the usage of Markdown within Argilla, we've created some helper functions to ease the use of DataURL conversions and chat message visualizations.

    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media","title":"media","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.video_to_html","title":"video_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

    Convert a video file to an HTML tag with embedded base64 data.

    Parameters:

    | Name | Type | Description | Default |
    |---|---|---|---|
    | file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
    | file_type | Optional[str] | The type of the video file. If not provided, it will be inferred from the file extension. | None |
    | width | Optional[str] | Display width in HTML. Defaults to None. | None |
    | height | Optional[str] | Display height in HTML. Defaults to None. | None |
    | autoplay | bool | True to autoplay media. Defaults to False. | False |
    | loop | bool | True to loop media. Defaults to False. | False |

    Returns:

    | Type | Description |
    |---|---|
    | str | The HTML tag with embedded base64 data. |

    Examples:

    from argilla.markdown import video_to_html\nhtml = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
    Source code in src/argilla/markdown/media.py
    def video_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert a video file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the video file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import video_to_html\n        html = video_to_html(\"my_video.mp4\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"video\", file_source, file_type, width, height, autoplay, loop)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.audio_to_html","title":"audio_to_html(file_source, file_type=None, width=None, height=None, autoplay=False, loop=False)","text":"

    Convert an audio file to an HTML tag with embedded base64 data.

    Parameters:

    | Name | Type | Description | Default |
    |---|---|---|---|
    | file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
    | file_type | Optional[str] | The type of the audio file. If not provided, it will be inferred from the file extension. | None |
    | width | Optional[str] | Display width in HTML. Defaults to None. | None |
    | height | Optional[str] | Display height in HTML. Defaults to None. | None |
    | autoplay | bool | True to autoplay media. Defaults to False. | False |
    | loop | bool | True to loop media. Defaults to False. | False |

    Returns:

    | Type | Description |
    |---|---|
    | str | The HTML tag with embedded base64 data. |

    Examples:

    from argilla.markdown import audio_to_html\nhtml = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n
    Source code in src/argilla/markdown/media.py
    def audio_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n    autoplay: bool = False,\n    loop: bool = False,\n) -> str:\n    \"\"\"\n    Convert an audio file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the audio file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n        autoplay: True to autoplay media. Defaults to False.\n        loop: True to loop media. Defaults to False.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import audio_to_html\n        html = audio_to_html(\"my_audio.mp3\", width=\"300px\", height=\"300px\", autoplay=True, loop=True)\n        ```\n    \"\"\"\n    return _media_to_html(\"audio\", file_source, file_type, width, height, autoplay, loop)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.image_to_html","title":"image_to_html(file_source, file_type=None, width=None, height=None)","text":"

    Convert an image file to an HTML tag with embedded base64 data.

    Parameters:

    | Name | Type | Description | Default |
    |---|---|---|---|
    | file_source | Union[str, bytes] | The path to the media file or a non-b64 encoded byte string. | required |
    | file_type | Optional[str] | The type of the image file. If not provided, it will be inferred from the file extension. | None |
    | width | Optional[str] | Display width in HTML. Defaults to None. | None |
    | height | Optional[str] | Display height in HTML. Defaults to None. | None |

    Returns:

    | Type | Description |
    |---|---|
    | str | The HTML tag with embedded base64 data. |

    Examples:

    from argilla.markdown import image_to_html\nhtml = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n
    Source code in src/argilla/markdown/media.py
    def image_to_html(\n    file_source: Union[str, bytes],\n    file_type: Optional[str] = None,\n    width: Optional[str] = None,\n    height: Optional[str] = None,\n) -> str:\n    \"\"\"\n    Convert an image file to an HTML tag with embedded base64 data.\n\n    Args:\n        file_source: The path to the media file or a non-b64 encoded byte string.\n        file_type: The type of the image file. If not provided, it will be inferred from the file extension.\n        width: Display width in HTML. Defaults to None.\n        height: Display height in HTML. Defaults to None.\n\n    Returns:\n        The HTML tag with embedded base64 data.\n\n    Examples:\n        ```python\n        from argilla.markdown import image_to_html\n        html = image_to_html(\"my_image.png\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    return _media_to_html(\"image\", file_source, file_type, width, height)\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.media.pdf_to_html","title":"pdf_to_html(file_source, width='1000px', height='1000px')","text":"

    Convert a pdf file to an HTML tag with embedded data.

    Parameters:

    | Name | Type | Description | Default |
    |---|---|---|---|
    | file_source | Union[str, bytes] | The path to the PDF file, a bytes object with PDF data, or a URL. | required |
    | width | Optional[str] | Display width in HTML. Defaults to \"1000px\". | '1000px' |
    | height | Optional[str] | Display height in HTML. Defaults to \"1000px\". | '1000px' |

    Returns:

    | Type | Description |
    |---|---|
    | str | HTML string embedding the PDF. |

    Raises:

    | Type | Description |
    |---|---|
    | ValueError | If the width and height are not pixel or percentage values. |

    Examples:

    from argilla.markdown import pdf_to_html\nhtml = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n
    Source code in src/argilla/markdown/media.py
    def pdf_to_html(\n    file_source: Union[str, bytes], width: Optional[str] = \"1000px\", height: Optional[str] = \"1000px\"\n) -> str:\n    \"\"\"\n    Convert a pdf file to an HTML tag with embedded data.\n\n    Args:\n        file_source: The path to the PDF file, a bytes object with PDF data, or a URL.\n        width: Display width in HTML. Defaults to \"1000px\".\n        height: Display height in HTML. Defaults to \"1000px\".\n\n    Returns:\n        HTML string embedding the PDF.\n\n    Raises:\n        ValueError: If the width and height are not pixel or percentage.\n\n    Examples:\n        ```python\n        from argilla.markdown import pdf_to_html\n        html = pdf_to_html(\"my_pdf.pdf\", width=\"300px\", height=\"300px\")\n        ```\n    \"\"\"\n    if not _is_valid_dimension(width) or not _is_valid_dimension(height):\n        raise ValueError(\"Width and height must be valid pixel (e.g., '300px') or percentage (e.g., '50%') values.\")\n\n    if isinstance(file_source, str) and urlparse(file_source).scheme in [\"http\", \"https\"]:\n        return f'<embed src=\"{file_source}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></embed>'\n\n    file_data, _ = _get_file_data(file_source, \"pdf\")\n    pdf_base64 = base64.b64encode(file_data).decode(\"utf-8\")\n    data_url = f\"data:application/pdf;base64,{pdf_base64}\"\n    return f'<object id=\"pdf\" data=\"{data_url}\" type=\"application/pdf\" width=\"{width}\" height=\"{height}\"></object>'\n
    "},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat","title":"chat","text":""},{"location":"reference/argilla/markdown/#src.argilla.markdown.chat.chat_to_html","title":"chat_to_html(messages)","text":"

    Converts a list of chat messages in the OpenAI format to HTML.

    Parameters:

    | Name | Type | Description | Default |
    |---|---|---|---|
    | messages | List[Dict[str, str]] | A list of dictionaries where each dictionary represents a chat message. Each dictionary should have the keys \"role\" (a string indicating the role of the sender, e.g., \"user\", \"model\", \"assistant\", \"system\") and \"content\" (the content of the message). | required |

    Returns:

    | Name | Type | Description |
    |---|---|---|
    | str | str | An HTML string that represents the chat conversation. |

    Raises:

    | Type | Description |
    |---|---|
    | ValueError | If an invalid role is passed. |

    Examples:

    from argilla.markdown import chat_to_html\nhtml = chat_to_html([\n    {\"role\": \"user\", \"content\": \"hello\"},\n    {\"role\": \"assistant\", \"content\": \"goodbye\"}\n])\n
    Source code in src/argilla/markdown/chat.py
    def chat_to_html(messages: List[Dict[str, str]]) -> str:\n    \"\"\"\n    Converts a list of chat messages in the OpenAI format to HTML.\n\n    Args:\n        messages (List[Dict[str, str]]): A list of dictionaries where each dictionary represents a chat message.\n            Each dictionary should have the keys:\n                - \"role\": A string indicating the role of the sender (e.g., \"user\", \"model\", \"assistant\", \"system\").\n                - \"content\": The content of the message.\n\n    Returns:\n        str: An HTML string that represents the chat conversation.\n\n    Raises:\n        ValueError: If the an invalid role is passed.\n\n    Examples:\n        ```python\n        from argilla.markdown import chat_to_html\n        html = chat_to_html([\n            {\"role\": \"user\", \"content\": \"hello\"},\n            {\"role\": \"assistant\", \"content\": \"goodbye\"}\n        ])\n        ```\n    \"\"\"\n    chat_html = \"\"\n    for message in messages:\n        role = message[\"role\"]\n        content = message[\"content\"]\n        content_html = markdown.markdown(content)\n\n        if role == \"user\":\n            html = '<div class=\"user-message\">' + '<div class=\"message-content\">'\n        elif role in [\"model\", \"assistant\", \"system\"]:\n            html = '<div class=\"system-message\">' + '<div class=\"message-content\">'\n        else:\n            raise ValueError(f\"Invalid role: {role}\")\n\n        html += f\"{content_html}\"\n        html += \"</div></div>\"\n        chat_html += html\n\n    return f\"<body>{CHAT_CSS_STYLE}{chat_html}</body>\"\n
    "},{"location":"reference/argilla/search/","title":"rg.Query","text":"

    To collect records based on searching criteria, you can use the Query and Filter classes. The Query class is used to define the search criteria, while the Filter class is used to filter the search results. Filter is passed to a Query object so you can combine multiple filters to create complex search queries. A Query object can also be passed to Dataset.records to fetch records based on the search criteria.

    "},{"location":"reference/argilla/search/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/search/#searching-for-records-with-terms","title":"Searching for records with terms","text":"

    To search for records with terms, you can use the Dataset.records attribute with a query string. The search terms are used to search for records that contain the terms in the text field.

    for record in dataset.records(query=\"paris\"):\n    print(record)\n
    "},{"location":"reference/argilla/search/#filtering-records-by-conditions","title":"Filtering records by conditions","text":"

    Argilla allows you to filter records based on conditions. You can use the Filter class to define the conditions and pass them to the Dataset.records attribute to fetch records based on the conditions. Conditions include \"==\", \">=\", \"<=\", or \"in\". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses.

    # create a range from 10 to 20\nrange_filter = rg.Filter(\n    [\n        (\"metadata.count\", \">=\", 10),\n        (\"metadata.count\", \"<=\", 20)\n    ]\n)\n\n# query records with a metadata count between 10 and 20 (inclusive)\nquery = rg.Query(filter=range_filter, query=\"paris\")\n\n# iterate over the results\nfor record in dataset.records(query=query):\n    print(record)\n
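
    Dot notation also works for suggestions and responses; here is a sketch combining the two, reusing condition fields that appear elsewhere in this documentation:

    # filter by a suggested label and by response status\ncombined_filter = rg.Filter(\n    [\n        (\"label.suggestion\", \"==\", \"positive\"),\n        (\"response.status\", \"==\", \"pending\"),\n    ]\n)\n\nfor record in dataset.records(query=rg.Query(filter=combined_filter)):\n    print(record)\n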
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Query","title":"Query","text":"

    This class is used to map user queries to the internal query models

    Source code in src/argilla/records/_search.py
    class Query:\n    \"\"\"This class is used to map user queries to the internal query models\"\"\"\n\n    query: Optional[str] = None\n\n    def __init__(self, *, query: Union[str, None] = None, filter: Union[Filter, None] = None):\n        \"\"\"Create a query object for use in Argilla search requests.\n\n        Parameters:\n            query (Union[str, None], optional): The query string that will be used to search.\n            filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n        \"\"\"\n\n        self.query = query\n        self.filter = filter\n\n    def api_model(self) -> SearchQueryModel:\n        model = SearchQueryModel()\n\n        if self.query is not None:\n            text_query = TextQueryModel(q=self.query)\n            model.query = QueryModel(text=text_query)\n\n        if self.filter is not None:\n            model.filters = self.filter.api_model()\n\n        return model\n
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Query.__init__","title":"__init__(*, query=None, filter=None)","text":"

    Create a query object for use in Argilla search requests.

    Parameters:

    Name Type Description Default query Union[str, None]

    The query string that will be used to search.

    None filter Union[Filter, None]

    The filter object that will be used to filter the search results.

    None Source code in src/argilla/records/_search.py
    def __init__(self, *, query: Union[str, None] = None, filter: Union[Filter, None] = None):\n    \"\"\"Create a query object for use in Argilla search requests.\n\n    Parameters:\n        query (Union[str, None], optional): The query string that will be used to search.\n        filter (Union[Filter, None], optional): The filter object that will be used to filter the search results.\n    \"\"\"\n\n    self.query = query\n    self.filter = filter\n
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Filter","title":"Filter","text":"

    This class is used to map user filters to the internal filter models

    Source code in src/argilla/records/_search.py
    class Filter:\n    \"\"\"This class is used to map user filters to the internal filter models\"\"\"\n\n    def __init__(self, conditions: Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None] = None):\n        \"\"\" Create a filter object for use in Argilla search requests.\n\n        Parameters:\n            conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n                The conditions that will be used to filter the search results. \\\n                The conditions should be a list of tuples where each tuple contains \\\n                the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n\n        \"\"\"\n\n        if isinstance(conditions, tuple):\n            conditions = [conditions]\n        self.conditions = [Condition(condition) for condition in conditions]\n\n    def api_model(self) -> AndFilterModel:\n        return AndFilterModel.model_validate({\"and\": [condition.api_model() for condition in self.conditions]})\n
    "},{"location":"reference/argilla/search/#src.argilla.records._search.Filter.__init__","title":"__init__(conditions=None)","text":"

    Create a filter object for use in Argilla search requests.

    Parameters:

    Name Type Description Default conditions Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None]

    The conditions that will be used to filter the search results. The conditions should be a list of tuples where each tuple contains the field, operator, and value. For example (\"label\", \"in\", [\"positive\",\"happy\"]).

    None Source code in src/argilla/records/_search.py
    def __init__(self, conditions: Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None] = None):\n    \"\"\" Create a filter object for use in Argilla search requests.\n\n    Parameters:\n        conditions (Union[List[Tuple[str, str, Any]], Tuple[str, str, Any], None], optional): \\\n            The conditions that will be used to filter the search results. \\\n            The conditions should be a list of tuples where each tuple contains \\\n            the field, operator, and value. For example `(\"label\", \"in\", [\"positive\",\"happy\"])`.\\\n\n    \"\"\"\n\n    if isinstance(conditions, tuple):\n        conditions = [conditions]\n    self.conditions = [Condition(condition) for condition in conditions]\n
    "},{"location":"reference/argilla/users/","title":"rg.User","text":"

    A user in Argilla is a profile for a person who uses the SDK or UI. User profiles can be used to track feedback activity and to manage access to the Argilla server.

    "},{"location":"reference/argilla/users/#usage-examples","title":"Usage Examples","text":"

    To create a new user, instantiate the User object with a username and password:

    user = rg.User(username=\"my_username\", password=\"my_password\")\nuser.create()\n

    Existing users can be retrieved by their username:

    user = client.users(\"my_username\")\n
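
    You can also iterate over the users on the server through the client's users collection (a sketch assuming the collection is iterable, like other client resources):

    for user in client.users:\n    print(user.username)\n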

    The current user of the rg.Argilla client can be accessed using the me attribute:

    client.me\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User","title":"User","text":"

    Bases: Resource

    Class for interacting with Argilla users in the Argilla server. User profiles are used to manage access to the Argilla server and track responses to records.

    Attributes:

    Name Type Description username str

    The username of the user.

    first_name str

    The first name of the user.

    last_name str

    The last name of the user.

    role str

    The role of the user: 'annotator', 'admin', or 'owner'.

    password str

    The password of the user.

    id UUID

    The ID of the user.

    Source code in src/argilla/users/_resource.py
    class User(Resource):\n    \"\"\"Class for interacting with Argilla users in the Argilla server. User profiles \\\n        are used to manage access to the Argilla server and track responses to records.\n\n    Attributes:\n        username (str): The username of the user.\n        first_name (str): The first name of the user.\n        last_name (str): The last name of the user.\n        role (str): The role of the user, either 'annotator' or 'admin'.\n        password (str): The password of the user.\n        id (UUID): The ID of the user.\n    \"\"\"\n\n    _model: UserModel\n    _api: UsersAPI\n\n    def __init__(\n        self,\n        username: Optional[str] = None,\n        first_name: Optional[str] = None,\n        last_name: Optional[str] = None,\n        role: Optional[str] = None,\n        password: Optional[str] = None,\n        client: Optional[\"Argilla\"] = None,\n        id: Optional[UUID] = None,\n        _model: Optional[UserModel] = None,\n    ) -> None:\n        \"\"\"Initializes a User object with a client and a username\n\n        Parameters:\n            username (str): The username of the user\n            first_name (str): The first name of the user\n            last_name (str): The last name of the user\n            role (str): The role of the user, either 'annotator', admin, or 'owner'\n            password (str): The password of the user\n            client (Argilla): The client used to interact with Argilla\n\n        Returns:\n            User: The initialized user object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.users)\n\n        if _model is None:\n            _model = UserModel(\n                username=username,\n                password=password,\n                first_name=first_name or username,\n                last_name=last_name,\n                role=role or Role.annotator,\n                id=id,\n            )\n            self._log_message(f\"Initialized user with username {username}\")\n        self._model = _model\n\n    def create(self) -> \"User\":\n        \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n        Returns:\n            User: The user that was created in Argilla.\n        \"\"\"\n        model_create = self.api_model()\n        model = self._api.create(model_create)\n        # The password is not returned in the response\n        model.password = model_create.password\n        self._model = model\n        return self\n\n    def delete(self) -> None:\n        \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n        super().delete()\n        # exists relies on the id, so we need to set it to None\n        self._model = UserModel(username=self.username)\n\n    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to add the user to.\n\n        Returns:\n            User: The user that was added to the workspace.\n        \"\"\"\n        self._model = self._api.add_to_workspace(workspace.id, self.id)\n        return self\n\n    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n        \"\"\"Removes the user from a workspace. 
After removing a user from a workspace, it will no longer have access to\n        the datasets in the workspace.\n\n        Args:\n            workspace (Workspace): The workspace to remove the user from.\n\n        Returns:\n            User: The user that was removed from the workspace.\n\n        \"\"\"\n        self._model = self._api.delete_from_workspace(workspace.id, self.id)\n        return self\n\n    ############################\n    # Properties\n    ############################\n    @property\n    def username(self) -> str:\n        return self._model.username\n\n    @username.setter\n    def username(self, value: str) -> None:\n        self._model.username = value\n\n    @property\n    def password(self) -> str:\n        return self._model.password\n\n    @password.setter\n    def password(self, value: str) -> None:\n        self._model.password = value\n\n    @property\n    def first_name(self) -> str:\n        return self._model.first_name\n\n    @first_name.setter\n    def first_name(self, value: str) -> None:\n        self._model.first_name = value\n\n    @property\n    def last_name(self) -> str:\n        return self._model.last_name\n\n    @last_name.setter\n    def last_name(self, value: str) -> None:\n        self._model.last_name = value\n\n    @property\n    def role(self) -> Role:\n        return self._model.role\n\n    @role.setter\n    def role(self, value: Role) -> None:\n        self._model.role = value\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.__init__","title":"__init__(username=None, first_name=None, last_name=None, role=None, password=None, client=None, id=None, _model=None)","text":"

    Initializes a User object with a client and a username

    Parameters:

    Name Type Description Default username str

    The username of the user

    None first_name str

    The first name of the user

    None last_name str

    The last name of the user

    None role str

    The role of the user, either 'annotator', 'admin', or 'owner'

    None password str

    The password of the user

    None client Argilla

    The client used to interact with Argilla

    None

    Returns:

    Name Type Description User None

    The initialized user object

    Source code in src/argilla/users/_resource.py
    def __init__(\n    self,\n    username: Optional[str] = None,\n    first_name: Optional[str] = None,\n    last_name: Optional[str] = None,\n    role: Optional[str] = None,\n    password: Optional[str] = None,\n    client: Optional[\"Argilla\"] = None,\n    id: Optional[UUID] = None,\n    _model: Optional[UserModel] = None,\n) -> None:\n    \"\"\"Initializes a User object with a client and a username\n\n    Parameters:\n        username (str): The username of the user\n        first_name (str): The first name of the user\n        last_name (str): The last name of the user\n        role (str): The role of the user, either 'annotator', admin, or 'owner'\n        password (str): The password of the user\n        client (Argilla): The client used to interact with Argilla\n\n    Returns:\n        User: The initialized user object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.users)\n\n    if _model is None:\n        _model = UserModel(\n            username=username,\n            password=password,\n            first_name=first_name or username,\n            last_name=last_name,\n            role=role or Role.annotator,\n            id=id,\n        )\n        self._log_message(f\"Initialized user with username {username}\")\n    self._model = _model\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.create","title":"create()","text":"

    Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.

    Returns:

    Name Type Description User User

    The user that was created in Argilla.

    Source code in src/argilla/users/_resource.py
    def create(self) -> \"User\":\n    \"\"\"Creates the user in Argilla. After creating a user, it will be able to log in to the Argilla server.\n\n    Returns:\n        User: The user that was created in Argilla.\n    \"\"\"\n    model_create = self.api_model()\n    model = self._api.create(model_create)\n    # The password is not returned in the response\n    model.password = model_create.password\n    self._model = model\n    return self\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.delete","title":"delete()","text":"

    Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.

    Source code in src/argilla/users/_resource.py
    def delete(self) -> None:\n    \"\"\"Deletes the user from Argilla. After deleting a user, it will no longer be able to log in to the Argilla server.\"\"\"\n    super().delete()\n    # exists relies on the id, so we need to set it to None\n    self._model = UserModel(username=self.username)\n
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.add_to_workspace","title":"add_to_workspace(workspace)","text":"

    Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default workspace Workspace

    The workspace to add the user to.

    required

    Returns:

    Name Type Description User User

    The user that was added to the workspace.

    Source code in src/argilla/users/_resource.py
    def add_to_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Adds the user to a workspace. After adding a user to a workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to add the user to.\n\n    Returns:\n        User: The user that was added to the workspace.\n    \"\"\"\n    self._model = self._api.add_to_workspace(workspace.id, self.id)\n    return self\n
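
    For example, combining retrieval with workspace assignment (the names here are illustrative):

    user = client.users(\"my_username\")\nworkspace = client.workspaces(\"my_workspace\")\nuser.add_to_workspace(workspace)\n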
    "},{"location":"reference/argilla/users/#src.argilla.users._resource.User.remove_from_workspace","title":"remove_from_workspace(workspace)","text":"

    Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default workspace Workspace

    The workspace to remove the user from.

    required

    Returns:

    Name Type Description User User

    The user that was removed from the workspace.

    Source code in src/argilla/users/_resource.py
    def remove_from_workspace(self, workspace: \"Workspace\") -> \"User\":\n    \"\"\"Removes the user from a workspace. After removing a user from a workspace, it will no longer have access to\n    the datasets in the workspace.\n\n    Args:\n        workspace (Workspace): The workspace to remove the user from.\n\n    Returns:\n        User: The user that was removed from the workspace.\n\n    \"\"\"\n    self._model = self._api.delete_from_workspace(workspace.id, self.id)\n    return self\n
    "},{"location":"reference/argilla/workspaces/","title":"rg.Workspace","text":"

    In Argilla, workspaces are used to organize datasets into groups. For example, you might have a workspace for each project or team.

    "},{"location":"reference/argilla/workspaces/#usage-examples","title":"Usage Examples","text":"

    To create a new workspace, instantiate the Workspace object with a name:

    workspace = rg.Workspace(name=\"my_workspace\")\nworkspace.create()\n

    To retrieve an existing workspace, use the client.workspaces attribute:

    workspace = client.workspaces(\"my_workspace\")\n
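
    From there, you can list the datasets it contains through the datasets property documented below:

    workspace = client.workspaces(\"my_workspace\")\nfor dataset in workspace.datasets:\n    print(dataset.name)\n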
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace","title":"Workspace","text":"

    Bases: Resource

    Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.

    Attributes:

    Name Type Description name str

    The name of the workspace.

    id UUID

    The ID of the workspace. This is a unique identifier for the workspace in the server.

    datasets List[Dataset]

    A list of all datasets in the workspace.

    users WorkspaceUsers

    A list of all users in the workspace.

    Source code in src/argilla/workspaces/_resource.py
    class Workspace(Resource):\n    \"\"\"Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.\n\n    Attributes:\n        name (str): The name of the workspace.\n        id (UUID): The ID of the workspace. This is a unique identifier for the workspace in the server.\n        datasets (List[Dataset]): A list of all datasets in the workspace.\n        users (WorkspaceUsers): A list of all users in the workspace.\n    \"\"\"\n\n    name: Optional[str]\n\n    _api: \"WorkspacesAPI\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        id: Optional[UUID] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a Workspace object with a client and a name or id\n\n        Parameters:\n            client (Argilla): The client used to interact with Argilla\n            name (str): The name of the workspace\n            id (UUID): The id of the workspace\n\n        Returns:\n            Workspace: The initialized workspace object\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.workspaces)\n\n        self._model = WorkspaceModel(name=name, id=id)\n\n    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n        in the workspace.\n\n        Args:\n            user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n        Returns:\n            User: The user that was added to the workspace\n        \"\"\"\n        return self.users.add(user)\n\n    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n        \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n\n        Args:\n            user (Union[User, str]): The user to remove from the workspace. 
Can be a User object or a username.\n\n        Returns:\n            User: The user that was removed from the workspace.\n        \"\"\"\n        return self.users.delete(user)\n\n    # TODO: Make this method private\n    def list_datasets(self) -> List[\"Dataset\"]:\n        from argilla.datasets import Dataset\n\n        datasets = self._client.api.datasets.list(self.id)\n        self._log_message(f\"Got {len(datasets)} datasets for workspace {self.id}\")\n        return [Dataset.from_model(model=dataset, client=self._client) for dataset in datasets]\n\n    @classmethod\n    def from_model(cls, model: WorkspaceModel, client: Argilla) -> \"Workspace\":\n        instance = cls(name=model.name, id=model.id, client=client)\n        instance._model = model\n\n        return instance\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def name(self) -> Optional[str]:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def datasets(self) -> List[\"Dataset\"]:\n        \"\"\"List all datasets in the workspace\n\n        Returns:\n            List[Dataset]: A list of all datasets in the workspace\n        \"\"\"\n        return self.list_datasets()\n\n    @property\n    def users(self) -> \"WorkspaceUsers\":\n        \"\"\"List all users in the workspace\n\n        Returns:\n            WorkspaceUsers: A list of all users in the workspace\n        \"\"\"\n        return WorkspaceUsers(workspace=self)\n
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.datasets","title":"datasets: List[Dataset] property","text":"

    List all datasets in the workspace

    Returns:

    Type Description List[Dataset]

    List[Dataset]: A list of all datasets in the workspace

    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.users","title":"users: WorkspaceUsers property","text":"

    List all users in the workspace

    Returns:

    Name Type Description WorkspaceUsers WorkspaceUsers

    A list of all users in the workspace

    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.__init__","title":"__init__(name=None, id=None, client=None)","text":"

    Initializes a Workspace object with a client and a name or id

    Parameters:

    Name Type Description Default client Argilla

    The client used to interact with Argilla

    None name str

    The name of the workspace

    None id UUID

    The id of the workspace

    None

    Returns:

    Name Type Description Workspace None

    The initialized workspace object

    Source code in src/argilla/workspaces/_resource.py
    def __init__(\n    self,\n    name: Optional[str] = None,\n    id: Optional[UUID] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a Workspace object with a client and a name or id\n\n    Parameters:\n        client (Argilla): The client used to interact with Argilla\n        name (str): The name of the workspace\n        id (UUID): The id of the workspace\n\n    Returns:\n        Workspace: The initialized workspace object\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.workspaces)\n\n    self._model = WorkspaceModel(name=name, id=id)\n
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.add_user","title":"add_user(user)","text":"

    Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default user Union[User, str]

    The user to add to the workspace. Can be a User object or a username.

    required

    Returns:

    Name Type Description User User

    The user that was added to the workspace

    Source code in src/argilla/workspaces/_resource.py
    def add_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Adds a user to the workspace. After adding a user to the workspace, it will have access to the datasets\n    in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to add to the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was added to the workspace\n    \"\"\"\n    return self.users.add(user)\n
    "},{"location":"reference/argilla/workspaces/#src.argilla.workspaces._resource.Workspace.remove_user","title":"remove_user(user)","text":"

    Removes a user from the workspace. After removing a user from the workspace, it will no longer have access to the datasets in the workspace.

    Parameters:

    Name Type Description Default user Union[User, str]

    The user to remove from the workspace. Can be a User object or a username.

    required

    Returns:

    Name Type Description User User

    The user that was removed from the workspace.

    Source code in src/argilla/workspaces/_resource.py
    def remove_user(self, user: Union[\"User\", str]) -> \"User\":\n    \"\"\"Removes a user from the workspace. After removing a user from the workspace, it will no longer have access\n    to the datasets in the workspace.\n\n    Args:\n        user (Union[User, str]): The user to remove from the workspace. Can be a User object or a username.\n\n    Returns:\n        User: The user that was removed from the workspace.\n    \"\"\"\n    return self.users.delete(user)\n
    "},{"location":"reference/argilla/datasets/dataset_records/","title":"rg.Dataset.records","text":""},{"location":"reference/argilla/datasets/dataset_records/#usage-examples","title":"Usage Examples","text":"

    In most cases, you will not need to create a DatasetRecords object directly. Instead, you can access it via the Dataset object:

    dataset.records\n

    For users familiar with legacy approaches

    1. The Dataset.records object is used to interact with the records in a dataset. It iteratively fetches records from the server in batches rather than keeping a local copy of the records.
    2. The log method of Dataset.records is used to both add and update records in a dataset. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added.
    "},{"location":"reference/argilla/datasets/dataset_records/#adding-records-to-a-dataset","title":"Adding records to a dataset","text":"

    To add records to a dataset, use the log method. Records can be added as dictionaries or as Record objects; a single record can also be passed on its own as a dictionary or a Record.

    As a Record objectFrom a data structureFrom a data structure with a mappingFrom a Hugging Face dataset

    You can also add records to a dataset by initializing a Record object directly.

    records = [\n    rg.Record(\n        fields={\n            \"question\": \"Do you need oxygen to breathe?\",\n            \"answer\": \"Yes\"\n        },\n    ),\n    rg.Record(\n        fields={\n            \"question\": \"What is the boiling point of water?\",\n            \"answer\": \"100 degrees Celsius\"\n        },\n    ),\n] # (1)\n\ndataset.records.log(records)\n
    1. This is an illustrative definition. In a real-world scenario, you would iterate over a data structure and create a Record object for each item.
    data = [\n    {\n        \"question\": \"Do you need oxygen to breathe?\",\n        \"answer\": \"Yes\",\n    },\n    {\n        \"question\": \"What is the boiling point of water?\",\n        \"answer\": \"100 degrees Celsius\",\n    },\n] # (1)\n\ndataset.records.log(data)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    data = [\n    {\n        \"query\": \"Do you need oxygen to breathe?\",\n        \"response\": \"Yes\",\n    },\n    {\n        \"query\": \"What is the boiling point of water?\",\n        \"response\": \"100 degrees Celsius\",\n    },\n] # (1)\ndataset.records.log(\n    records=data,\n    mapping={\"query\": \"question\", \"response\": \"answer\"} # (2)\n)\n
    1. The data structure's keys must match the fields or questions in the Argilla dataset. In this case, there are fields named question and answer.
    2. The data structure has keys query and response and the Argilla dataset has question and answer. You can use the mapping parameter to map the keys in the data structure to the fields in the Argilla dataset.
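
    A single incoming key can also be assigned to several targets by mapping it to a list of field or question names, as the log method documents below. A minimal sketch with illustrative names:

    dataset.records.log(\n    records=data,\n    mapping={\"response\": [\"answer\", \"answer_suggestion\"]}  # one key feeds a field and a question\n)\n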

    You can also add records to a dataset using a Hugging Face dataset. This is useful when you want to use a dataset from the Hugging Face Hub and add it to your Argilla dataset.

    You can add the dataset directly when its column names correspond to the names of fields, questions, metadata, or vectors in the Argilla dataset.

    If the dataset's schema does not correspond to your Argilla dataset names, you can use a mapping to indicate which columns in the dataset correspond to the Argilla dataset fields.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset)\n
    1. In this example, the Hugging Face dataset matches the Argilla dataset schema. If that is not the case, you could use the .map method of the datasets library to prepare the data before adding it to the Argilla dataset.

    Here we use the mapping parameter to specify the relationship between the Hugging Face dataset and the Argilla dataset.

    dataset.records.log(records=hf_dataset, mapping={\"txt\": \"text\", \"y\": \"label\"}) # (1)\n
    1. In this case, the txt key in the Hugging Face dataset corresponds to the text field in the Argilla dataset, and the y key in the Hugging Face dataset corresponds to the label field in the Argilla dataset.
    "},{"location":"reference/argilla/datasets/dataset_records/#updating-records-in-a-dataset","title":"Updating records in a dataset","text":"

    Records can also be updated using the log method with records that contain an id to identify the records to be updated. As above, records can be added as dictionaries or as Record objects.

    As a Record objectFrom a data structureFrom a data structure with a mappingFrom a Hugging Face dataset

    You can update records in a dataset by initializing a Record object directly and providing the id field.

    records = [\n    rg.Record(\n        metadata={\"department\": \"toys\"},\n        id=\"2\" # (1)\n    ),\n]\n\ndataset.records.log(records)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

    You can also update records in a dataset by providing the id field in the data structure.

    data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(data)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record.

    You can also update records in a dataset by providing the id field in the data structure and using a mapping to map the keys in the data structure to the fields in the dataset.

    data = [\n    {\n        \"metadata\": {\"department\": \"toys\"},\n        \"my_id\": \"2\" # (1)\n    },\n]\n\ndataset.records.log(\n    records=data,\n    mapping={\"my_id\": \"id\"} # (2)\n)\n
    1. The id field is required to identify the record to be updated. The id field must be unique for each record in the dataset. If the id field is not provided, the record will be added as a new record. 2. Let's say that your data structure has the key my_id instead of id. You can use the mapping parameter to map the keys in the data structure to the fields in the dataset.

    You can also update records in an Argilla dataset using a Hugging Face dataset. To update records, the Hugging Face dataset must contain an id field to identify the records to be updated, or you can use a mapping to map the keys in the Hugging Face dataset to the fields in the Argilla dataset.

    from datasets import load_dataset\n\nhf_dataset = load_dataset(\"imdb\", split=\"train[:100]\") # (1)\n\ndataset.records.log(records=hf_dataset, mapping={\"uuid\": \"id\"}) # (2)\n
    1. In this example, the Hugging Face dataset matches the Argilla dataset schema.
    2. The uuid key in the Hugging Face dataset corresponds to the id field in the Argilla dataset.
    "},{"location":"reference/argilla/datasets/dataset_records/#iterating-over-records-in-a-dataset","title":"Iterating over records in a dataset","text":"

    Dataset.records can be used to iterate over records in a dataset. The records will be fetched in batches from the server:

    for record in dataset.records:\n    print(record)\n\n# Fetch records with suggestions and responses\nfor record in dataset.records(with_suggestions=True, with_responses=True):\n    print(record.suggestions)\n    print(record.responses)\n\n# Filter records by a query and fetch records with vectors\nfor record in dataset.records(query=\"capital\", with_vectors=True):\n    print(record.vectors)\n
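
    Fetching can also be tuned with the batch_size and start_offset parameters documented below:

    # fetch in batches of 100, skipping the first 1000 records\nfor record in dataset.records(batch_size=100, start_offset=1000):\n    print(record.id)\n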

    Check out the rg.Record class reference for more information on the properties and methods available on a record and the rg.Query class reference for more information on the query syntax.

    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords","title":"DatasetRecords","text":"

    Bases: Iterable[Record], LoggingMixin

    This class is used to work with records from a dataset and is accessed via Dataset.records. The responsibility of this class is to provide an interface to interact with records in a dataset, by adding, updating, fetching, querying, deleting, and exporting records.

    Attributes:

    Name Type Description client Argilla

    The Argilla client object.

    dataset Dataset

    The dataset object.

    Source code in src/argilla/records/_dataset_records.py
    class DatasetRecords(Iterable[Record], LoggingMixin):\n    \"\"\"This class is used to work with records from a dataset and is accessed via `Dataset.records`.\n    The responsibility of this class is to provide an interface to interact with records in a dataset,\n    by adding, updating, fetching, querying, deleting, and exporting records.\n\n    Attributes:\n        client (Argilla): The Argilla client object.\n        dataset (Dataset): The dataset object.\n    \"\"\"\n\n    _api: RecordsAPI\n\n    DEFAULT_BATCH_SIZE = 256\n    DEFAULT_DELETE_BATCH_SIZE = 64\n\n    def __init__(self, client: \"Argilla\", dataset: \"Dataset\"):\n        \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n        Args:\n            client: An Argilla client object.\n            dataset: A Dataset object.\n        \"\"\"\n        self.__client = client\n        self.__dataset = dataset\n        self._api = self.__client.api.records\n\n    def __iter__(self):\n        return DatasetRecordsIterator(self.__dataset, self.__client, with_suggestions=True, with_responses=True)\n\n    def __call__(\n        self,\n        query: Optional[Union[str, Query]] = None,\n        batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n        start_offset: int = 0,\n        with_suggestions: bool = True,\n        with_responses: bool = True,\n        with_vectors: Optional[Union[List, bool, str]] = None,\n    ) -> DatasetRecordsIterator:\n        \"\"\"Returns an iterator over the records in the dataset on the server.\n\n        Parameters:\n            query: A string or a Query object to filter the records.\n            batch_size: The number of records to fetch in each batch. The default is 256.\n            start_offset: The offset from which to start fetching records. The default is 0.\n            with_suggestions: Whether to include suggestions in the records. The default is True.\n            with_responses: Whether to include responses in the records. The default is True.\n            with_vectors: A list of vector names to include in the records. 
The default is None.\n                If a list is provided, only the specified vectors will be included.\n                If True is provided, all vectors will be included.\n\n        Returns:\n            An iterator over the records in the dataset on the server.\n\n        \"\"\"\n        if query and isinstance(query, str):\n            query = Query(query=query)\n\n        if with_vectors:\n            self._validate_vector_names(vector_names=with_vectors)\n\n        return DatasetRecordsIterator(\n            dataset=self.__dataset,\n            client=self.__client,\n            query=query,\n            batch_size=batch_size,\n            start_offset=start_offset,\n            with_suggestions=with_suggestions,\n            with_responses=with_responses,\n            with_vectors=with_vectors,\n        )\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}({self.__dataset})\"\n\n    ############################\n    # Public methods\n    ############################\n\n    def log(\n        self,\n        records: Union[List[dict], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n        batch_size: int = DEFAULT_BATCH_SIZE,\n    ) -> \"DatasetRecords\":\n        \"\"\"Add or update records in a dataset on the server using the provided records.\n        If the record includes a known `id` field, the record will be updated.\n        If the record does not include a known `id` field, the record will be added as a new record.\n        See `rg.Record` for more information on the record definition.\n\n        Parameters:\n            records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                     If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                     fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n            mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                     To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n            user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n            batch_size: The number of records to send in each batch. 
The default is 256.\n\n        Returns:\n            A list of Record objects representing the updated records.\n        \"\"\"\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n        )\n\n        created_or_updated = []\n        records_updated = 0\n\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n            created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n            records_updated += updated\n\n        records_created = len(created_or_updated) - records_updated\n        self._log_message(\n            message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return self\n\n    def delete(\n        self,\n        records: List[Record],\n        batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n    ) -> List[Record]:\n        \"\"\"Delete records in a dataset on the server using the provided records\n            and matching based on the id.\n\n        Parameters:\n            records: A list of `Record` objects representing the records to be deleted.\n            batch_size: The number of records to send in each batch. The default is 64.\n\n        Returns:\n            A list of Record objects representing the deleted records.\n\n        \"\"\"\n        mapping = None\n        user_id = self.__client.me.id\n        record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n        batch_size = self._normalize_batch_size(\n            batch_size=batch_size,\n            records_length=len(record_models),\n            max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n        )\n\n        records_deleted = 0\n        for batch in tqdm(\n            iterable=range(0, len(records), batch_size),\n            desc=\"Sending records...\",\n            total=len(records) // batch_size,\n            unit=\"batch\",\n        ):\n            self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n            batch_records = record_models[batch : batch + batch_size]\n            self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n            records_deleted += len(batch_records)\n\n        self._log_message(\n            message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n            level=\"info\",\n        )\n\n        return records\n\n    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n        \"\"\"\n        Return the records as a dictionary. 
This is a convenient shortcut for dataset.records(...).to_dict().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionary.\n                - True: The record fields, metadata, suggestions and responses will be flattened.\n                - False: The record fields, metadata, suggestions and responses will be nested.\n            orient (str): The orientation of the exported dictionary.\n                - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n                - \"index\": The keys of the dictionary will be the id of the records.\n        Returns:\n            A dictionary of records.\n\n        \"\"\"\n        return self().to_dict(flatten=flatten, orient=orient)\n\n    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n        \"\"\"\n        Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n        Parameters:\n            flatten (bool): The structure of the exported dictionaries in the list.\n                - True: The record keys are flattened and a dot notation is used to record attributes and their attributes . For example, `label.suggestion` and `label.response`. Records responses are spread across multiple columns for values and users.\n                - False: The record fields, metadata, suggestions and responses will be nested dictionary with keys for record attributes.\n        Returns:\n            A list of dictionaries of records.\n        \"\"\"\n        data = self().to_list(flatten=flatten)\n        return data\n\n    def to_json(self, path: Union[Path, str]) -> Path:\n        \"\"\"\n        Export the records to a file on disk.\n\n        Parameters:\n            path (str): The path to the file to save the records.\n\n        Returns:\n            The path to the file where the records were saved.\n\n        \"\"\"\n        return self().to_json(path=path)\n\n    def from_json(self, path: Union[Path, str]) -> List[Record]:\n        \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n            The JSON file should be defined by `DatasetRecords.to_json`.\n\n        Args:\n            path (str): The path to the file containing the records.\n\n        Returns:\n            DatasetRecords: The DatasetRecords object created from the disk path.\n\n        \"\"\"\n        records = JsonIO._records_from_json(path=path)\n        return self.log(records=records)\n\n    def to_datasets(self) -> HFDataset:\n        \"\"\"\n        Export the records to a HFDataset.\n\n        Returns:\n            The dataset containing the records.\n\n        \"\"\"\n\n        return self().to_datasets()\n\n    ############################\n    # Private methods\n    ############################\n\n    def _ingest_records(\n        self,\n        records: Union[List[Dict[str, Any]], List[Record], HFDataset],\n        mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n        user_id: Optional[UUID] = None,\n    ) -> List[RecordModel]:\n        \"\"\"Ingests records from a list of dictionaries, a Hugging Face Dataset, or a list of Record objects.\"\"\"\n\n        if len(records) == 0:\n            raise ValueError(\"No records provided to ingest.\")\n\n        if HFDatasetsIO._is_hf_dataset(dataset=records):\n            records = HFDatasetsIO._record_dicts_from_datasets(dataset=records)\n\n        ingested_records = []\n        record_mapper = 
IngestedRecordMapper(mapping=mapping, dataset=self.__dataset, user_id=user_id)\n        for record in records:\n            try:\n                if isinstance(record, dict):\n                    record = record_mapper(data=record)\n                elif isinstance(record, Record):\n                    record.dataset = self.__dataset\n                else:\n                    raise ValueError(\n                        \"Records should be a a list Record instances, \"\n                        \"a Hugging Face Dataset, or a list of dictionaries representing the records.\"\n                        f\"Found a record of type {type(record)}: {record}.\"\n                    )\n            except Exception as e:\n                raise RecordsIngestionError(f\"Failed to ingest record from dict {record}: {e}\")\n            ingested_records.append(record.api_model())\n        return ingested_records\n\n    def _normalize_batch_size(self, batch_size: int, records_length, max_value: int):\n        norm_batch_size = min(batch_size, records_length, max_value)\n\n        if batch_size != norm_batch_size:\n            self._log_message(\n                message=f\"The provided batch size {batch_size} was normalized. Using value {norm_batch_size}.\",\n                level=\"warning\",\n            )\n\n        return norm_batch_size\n\n    def _validate_vector_names(self, vector_names: Union[List[str], str]) -> None:\n        if not isinstance(vector_names, list):\n            vector_names = [vector_names]\n        for vector_name in vector_names:\n            if isinstance(vector_name, bool):\n                continue\n            if vector_name not in self.__dataset.schema:\n                raise ValueError(f\"Vector field {vector_name} not found in dataset schema.\")\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__init__","title":"__init__(client, dataset)","text":"

    Initializes a DatasetRecords object with a client and a dataset. Args: client: An Argilla client object. dataset: A Dataset object.

    Source code in src/argilla/records/_dataset_records.py
    def __init__(self, client: \"Argilla\", dataset: \"Dataset\"):\n    \"\"\"Initializes a DatasetRecords object with a client and a dataset.\n    Args:\n        client: An Argilla client object.\n        dataset: A Dataset object.\n    \"\"\"\n    self.__client = client\n    self.__dataset = dataset\n    self._api = self.__client.api.records\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.__call__","title":"__call__(query=None, batch_size=DEFAULT_BATCH_SIZE, start_offset=0, with_suggestions=True, with_responses=True, with_vectors=None)","text":"

    Returns an iterator over the records in the dataset on the server.

    Parameters:

    Name Type Description Default query Optional[Union[str, Query]]

    A string or a Query object to filter the records.

    None batch_size Optional[int]

    The number of records to fetch in each batch. The default is 256.

    DEFAULT_BATCH_SIZE start_offset int

    The offset from which to start fetching records. The default is 0.

    0 with_suggestions bool

    Whether to include suggestions in the records. The default is True.

    True with_responses bool

    Whether to include responses in the records. The default is True.

    True with_vectors Optional[Union[List, bool, str]]

    A list of vector names to include in the records. The default is None. If a list is provided, only the specified vectors will be included. If True is provided, all vectors will be included.

    None

    Returns:

    Type Description DatasetRecordsIterator

    An iterator over the records in the dataset on the server.

    Source code in src/argilla/records/_dataset_records.py
    def __call__(\n    self,\n    query: Optional[Union[str, Query]] = None,\n    batch_size: Optional[int] = DEFAULT_BATCH_SIZE,\n    start_offset: int = 0,\n    with_suggestions: bool = True,\n    with_responses: bool = True,\n    with_vectors: Optional[Union[List, bool, str]] = None,\n) -> DatasetRecordsIterator:\n    \"\"\"Returns an iterator over the records in the dataset on the server.\n\n    Parameters:\n        query: A string or a Query object to filter the records.\n        batch_size: The number of records to fetch in each batch. The default is 256.\n        start_offset: The offset from which to start fetching records. The default is 0.\n        with_suggestions: Whether to include suggestions in the records. The default is True.\n        with_responses: Whether to include responses in the records. The default is True.\n        with_vectors: A list of vector names to include in the records. The default is None.\n            If a list is provided, only the specified vectors will be included.\n            If True is provided, all vectors will be included.\n\n    Returns:\n        An iterator over the records in the dataset on the server.\n\n    \"\"\"\n    if query and isinstance(query, str):\n        query = Query(query=query)\n\n    if with_vectors:\n        self._validate_vector_names(vector_names=with_vectors)\n\n    return DatasetRecordsIterator(\n        dataset=self.__dataset,\n        client=self.__client,\n        query=query,\n        batch_size=batch_size,\n        start_offset=start_offset,\n        with_suggestions=with_suggestions,\n        with_responses=with_responses,\n        with_vectors=with_vectors,\n    )\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.log","title":"log(records, mapping=None, user_id=None, batch_size=DEFAULT_BATCH_SIZE)","text":"

    Add or update records in a dataset on the server using the provided records. If the record includes a known id field, the record will be updated. If the record does not include a known id field, the record will be added as a new record. See rg.Record for more information on the record definition.

    Parameters:

    Name Type Description Default records Union[List[dict], List[Record], HFDataset]

    A list of Record objects, a Hugging Face Dataset, or a list of dictionaries representing the records. If records are defined as dictionaries or a dataset, the keys/column names should correspond to the fields and questions in the Argilla dataset. id should be provided to identify the records when updating.

    required mapping Optional[Dict[str, Union[str, Sequence[str]]]]

    A dictionary that maps the keys/column names in the records to the fields or questions in the Argilla dataset. To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.

    None user_id Optional[UUID]

    The user id to be associated with the records' response. If not provided, the current user id is used.

    None batch_size int

    The number of records to send in each batch. The default is 256.

    DEFAULT_BATCH_SIZE

    Returns:

    Type Description DatasetRecords

    The DatasetRecords object containing the added or updated records.

    Source code in src/argilla/records/_dataset_records.py
    def log(\n    self,\n    records: Union[List[dict], List[Record], HFDataset],\n    mapping: Optional[Dict[str, Union[str, Sequence[str]]]] = None,\n    user_id: Optional[UUID] = None,\n    batch_size: int = DEFAULT_BATCH_SIZE,\n) -> \"DatasetRecords\":\n    \"\"\"Add or update records in a dataset on the server using the provided records.\n    If the record includes a known `id` field, the record will be updated.\n    If the record does not include a known `id` field, the record will be added as a new record.\n    See `rg.Record` for more information on the record definition.\n\n    Parameters:\n        records: A list of `Record` objects, a Hugging Face Dataset, or a list of dictionaries representing the records.\n                 If records are defined as a dictionaries or a dataset, the keys/ column names should correspond to the\n                 fields in the Argilla dataset's fields and questions. `id` should be provided to identify the records when updating.\n        mapping: A dictionary that maps the keys/ column names in the records to the fields or questions in the Argilla dataset.\n                 To assign an incoming key or column to multiple fields or questions, provide a list or tuple of field or question names.\n        user_id: The user id to be associated with the records' response. If not provided, the current user id is used.\n        batch_size: The number of records to send in each batch. The default is 256.\n\n    Returns:\n        A list of Record objects representing the updated records.\n    \"\"\"\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id or self.__client.me.id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_UPSERT_BULK,\n    )\n\n    created_or_updated = []\n    records_updated = 0\n\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)\n        created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])\n        records_updated += updated\n\n    records_created = len(created_or_updated) - records_updated\n    self._log_message(\n        message=f\"Updated {records_updated} records and added {records_created} records to dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return self\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.delete","title":"delete(records, batch_size=DEFAULT_DELETE_BATCH_SIZE)","text":"

    Delete records in a dataset on the server using the provided records and matching based on the id.

    Parameters:

    Name Type Description Default records List[Record]

    A list of Record objects representing the records to be deleted.

    required batch_size int

    The number of records to send in each batch. The default is 64.

    DEFAULT_DELETE_BATCH_SIZE

    Returns:

    Type Description List[Record]

    A list of Record objects representing the deleted records.

    Source code in src/argilla/records/_dataset_records.py
    def delete(\n    self,\n    records: List[Record],\n    batch_size: int = DEFAULT_DELETE_BATCH_SIZE,\n) -> List[Record]:\n    \"\"\"Delete records in a dataset on the server using the provided records\n        and matching based on the id.\n\n    Parameters:\n        records: A list of `Record` objects representing the records to be deleted.\n        batch_size: The number of records to send in each batch. The default is 64.\n\n    Returns:\n        A list of Record objects representing the deleted records.\n\n    \"\"\"\n    mapping = None\n    user_id = self.__client.me.id\n    record_models = self._ingest_records(records=records, mapping=mapping, user_id=user_id)\n    batch_size = self._normalize_batch_size(\n        batch_size=batch_size,\n        records_length=len(record_models),\n        max_value=self._api.MAX_RECORDS_PER_DELETE_BULK,\n    )\n\n    records_deleted = 0\n    for batch in tqdm(\n        iterable=range(0, len(records), batch_size),\n        desc=\"Sending records...\",\n        total=len(records) // batch_size,\n        unit=\"batch\",\n    ):\n        self._log_message(message=f\"Sending records from {batch} to {batch + batch_size}.\")\n        batch_records = record_models[batch : batch + batch_size]\n        self._api.delete_many(dataset_id=self.__dataset.id, records=batch_records)\n        records_deleted += len(batch_records)\n\n    self._log_message(\n        message=f\"Deleted {len(record_models)} records from dataset {self.__dataset.name}\",\n        level=\"info\",\n    )\n\n    return records\n
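
    For example, records matching a query can be fetched with the iterator shown above and then deleted:

    records_to_delete = list(dataset.records(query=\"paris\"))\ndataset.records.delete(records=records_to_delete)\n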
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_dict","title":"to_dict(flatten=False, orient='names')","text":"

    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().

    Parameters:

    flatten (bool, default False): The structure of the exported dictionary. - True: The record fields, metadata, suggestions and responses will be flattened. - False: The record fields, metadata, suggestions and responses will be nested.

    orient (str, default 'names'): The orientation of the exported dictionary. - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses. - \"index\": The keys of the dictionary will be the id of the records.

    Returns: A dictionary of records.
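
    For example, a sketch (assuming dataset is an existing rg.Dataset with a question named "label"):

    # Keys are field/question names\ndata_by_name = dataset.records.to_dict(orient=\"names\")\n\n# Keys are record ids, with flattened attributes\ndata_by_id = dataset.records.to_dict(flatten=True, orient=\"index\")\n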

    Source code in src/argilla/records/_dataset_records.py
    def to_dict(self, flatten: bool = False, orient: str = \"names\") -> Dict[str, Any]:\n    \"\"\"\n    Return the records as a dictionary. This is a convenient shortcut for dataset.records(...).to_dict().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionary.\n            - True: The record fields, metadata, suggestions and responses will be flattened.\n            - False: The record fields, metadata, suggestions and responses will be nested.\n        orient (str): The orientation of the exported dictionary.\n            - \"names\": The keys of the dictionary will be the names of the fields, metadata, suggestions and responses.\n            - \"index\": The keys of the dictionary will be the id of the records.\n    Returns:\n        A dictionary of records.\n\n    \"\"\"\n    return self().to_dict(flatten=flatten, orient=orient)\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_list","title":"to_list(flatten=False)","text":"

    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().

    Parameters:

    flatten (bool, default False): The structure of the exported dictionaries in the list. - True: The record keys are flattened and dot notation is used for record attributes and their sub-attributes. For example, label.suggestion and label.response. Record responses are spread across multiple columns for values and users. - False: The record fields, metadata, suggestions and responses are nested dictionaries with keys for record attributes.

    Returns: A list of dictionaries of records.
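
    For example, a sketch (assuming the dataset has a question named "label"):

    records_as_dicts = dataset.records.to_list(flatten=True)\nprint(records_as_dicts[0][\"label.suggestion\"])  # suggested label of the first record\n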

    Source code in src/argilla/records/_dataset_records.py
    def to_list(self, flatten: bool = False) -> List[Dict[str, Any]]:\n    \"\"\"\n    Return the records as a list of dictionaries. This is a convenient shortcut for dataset.records(...).to_list().\n\n    Parameters:\n        flatten (bool): The structure of the exported dictionaries in the list.\n            - True: The record keys are flattened and dot notation is used for record attributes and their sub-attributes. For example, `label.suggestion` and `label.response`. Record responses are spread across multiple columns for values and users.\n            - False: The record fields, metadata, suggestions and responses will be nested dictionaries with keys for record attributes.\n    Returns:\n        A list of dictionaries of records.\n    \"\"\"\n    data = self().to_list(flatten=flatten)\n    return data\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_json","title":"to_json(path)","text":"

    Export the records to a file on disk.

    Parameters:

    path (str, required): The path to the file to save the records.

    Returns:

    Path: The path to the file where the records were saved.
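
    For example, a sketch (the file name is a placeholder):

    export_path = dataset.records.to_json(path=\"my_records.json\")\nprint(export_path)  # pathlib.Path to the written file\n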

    Source code in src/argilla/records/_dataset_records.py
    def to_json(self, path: Union[Path, str]) -> Path:\n    \"\"\"\n    Export the records to a file on disk.\n\n    Parameters:\n        path (str): The path to the file to save the records.\n\n    Returns:\n        The path to the file where the records were saved.\n\n    \"\"\"\n    return self().to_json(path=path)\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.from_json","title":"from_json(path)","text":"

    Creates a DatasetRecords object from a disk path to a JSON file. The JSON file should have been created with DatasetRecords.to_json.

    Parameters:

    path (str, required): The path to the file containing the records.

    Returns:

    DatasetRecords (List[Record]): The DatasetRecords object created from the disk path.
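
    For example, a sketch of a round trip with to_json (the file name is a placeholder):

    dataset.records.to_json(path=\"my_records.json\")\n# ...later, or against another dataset with compatible settings\ndataset.records.from_json(path=\"my_records.json\")  # re-logs the saved records\n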

    Source code in src/argilla/records/_dataset_records.py
    def from_json(self, path: Union[Path, str]) -> List[Record]:\n    \"\"\"Creates a DatasetRecords object from a disk path to a JSON file.\n        The JSON file should be defined by `DatasetRecords.to_json`.\n\n    Args:\n        path (str): The path to the file containing the records.\n\n    Returns:\n        DatasetRecords: The DatasetRecords object created from the disk path.\n\n    \"\"\"\n    records = JsonIO._records_from_json(path=path)\n    return self.log(records=records)\n
    "},{"location":"reference/argilla/datasets/dataset_records/#src.argilla.records._dataset_records.DatasetRecords.to_datasets","title":"to_datasets()","text":"

    Export the records to an HFDataset.

    Returns:

    HFDataset: The dataset containing the records.
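
    For example, a sketch:

    hf_dataset = dataset.records.to_datasets()\nprint(hf_dataset.features)  # a standard datasets.Dataset from here on\n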

    Source code in src/argilla/records/_dataset_records.py
    def to_datasets(self) -> HFDataset:\n    \"\"\"\n    Export the records to an HFDataset.\n\n    Returns:\n        The dataset containing the records.\n\n    \"\"\"\n\n    return self().to_datasets()\n
    "},{"location":"reference/argilla/datasets/datasets/","title":"rg.Dataset","text":"

    Dataset is a class that represents a collection of records. It is used to store and manage records in Argilla.

    "},{"location":"reference/argilla/datasets/datasets/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/datasets/datasets/#creating-a-dataset","title":"Creating a Dataset","text":"

    To create a new dataset, you need to define its name and settings. The workspace and client parameters are optional; use them to create the dataset in a specific workspace or on a specific Argilla instance.

    dataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n    ),\n)\ndataset.create()\n

    For a detailed guide to the dataset creation and publication process, see the Dataset how-to guide.

    "},{"location":"reference/argilla/datasets/datasets/#retrieving-an-existing-dataset","title":"Retrieving an existing Dataset","text":"

    To retrieve an existing dataset, use client.datasets(\"my_dataset\") instead of creating a new one.

    dataset = client.datasets(\"my_dataset\")\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset","title":"Dataset","text":"

    Bases: Resource, HubImportExportMixin, DiskImportExportMixin

    Class for interacting with Argilla Datasets

    Attributes:

    name (str): Name of the dataset.

    records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.

    settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.

    fields (list): The fields of the dataset, for example the rg.TextField of the dataset. Defined in the settings.

    questions (list): The questions of the dataset defined in the settings. For example, the rg.TextQuestion that you want labelers to answer.

    guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.

    allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.

    Source code in src/argilla/datasets/_resource.py
    class Dataset(Resource, HubImportExportMixin, DiskImportExportMixin):\n    \"\"\"Class for interacting with Argilla Datasets\n\n    Attributes:\n        name: Name of the dataset.\n        records (DatasetRecords): The records object for the dataset. Used to interact with the records of the dataset by iterating, searching, etc.\n        settings (Settings): The settings object of the dataset. Used to configure the dataset with fields, questions, guidelines, etc.\n        fields (list): The fields of the dataset, for example the `rg.TextField` of the dataset. Defined in the settings.\n        questions (list): The questions of the dataset defined in the settings. For example, the `rg.TextQuestion` that you want labelers to answer.\n        guidelines (str): The guidelines of the dataset defined in the settings. Used to provide instructions to labelers.\n        allow_extra_metadata (bool): True if extra metadata is allowed, False otherwise.\n    \"\"\"\n\n    name: str\n    id: Optional[UUID]\n\n    _api: \"DatasetsAPI\"\n    _model: \"DatasetModel\"\n\n    def __init__(\n        self,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n        settings: Optional[Settings] = None,\n        client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n        Parameters:\n            name (str): Name of the dataset. Replaced by random UUID if not assigned.\n            workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n            settings (Settings): Settings class to be used to configure the dataset.\n            client (Argilla): Instance of Argilla to connect with the server. 
Default is the default client.\n        \"\"\"\n        client = client or Argilla._get_default()\n        super().__init__(client=client, api=client.api.datasets)\n        if name is None:\n            name = f\"dataset_{uuid4()}\"\n            self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n        self._workspace = workspace\n        self._model = DatasetModel(name=name)\n        self._settings = settings._copy() if settings else Settings(_dataset=self)\n        self._settings.dataset = self\n        self.__records = DatasetRecords(client=self._client, dataset=self)\n\n    #####################\n    #  Properties       #\n    #####################\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def records(self) -> \"DatasetRecords\":\n        return self.__records\n\n    @property\n    def settings(self) -> Settings:\n        return self._settings\n\n    @settings.setter\n    def settings(self, value: Settings) -> None:\n        settings_copy = value._copy()\n        settings_copy.dataset = self\n        self._settings = settings_copy\n\n    @property\n    def fields(self) -> list:\n        return self.settings.fields\n\n    @property\n    def questions(self) -> list:\n        return self.settings.questions\n\n    @property\n    def guidelines(self) -> str:\n        return self.settings.guidelines\n\n    @guidelines.setter\n    def guidelines(self, value: str) -> None:\n        self.settings.guidelines = value\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.settings.allow_extra_metadata\n\n    @allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool) -> None:\n        self.settings.allow_extra_metadata = value\n\n    @property\n    def schema(self) -> dict:\n        return self.settings.schema\n\n    @property\n    def workspace(self) -> Workspace:\n        self._workspace = self._resolve_workspace()\n        return self._workspace\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self.settings.distribution\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self.settings.distribution = value\n\n    #####################\n    #  Core methods     #\n    #####################\n\n    def get(self) -> \"Dataset\":\n        super().get()\n        self.settings.get()\n        return self\n\n    def create(self) -> \"Dataset\":\n        \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n        Returns:\n            Dataset: The created dataset object.\n        \"\"\"\n        super().create()\n        try:\n            return self._publish()\n        except Exception as e:\n            self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n            self._rollback_dataset_creation()\n            raise SettingsError from e\n\n    def update(self) -> \"Dataset\":\n        \"\"\"Updates the dataset on the server with the current settings.\n\n        Returns:\n            Dataset: The updated dataset object.\n        \"\"\"\n        self.settings.update()\n        return self\n\n    @classmethod\n    def from_model(cls, model: DatasetModel, client: \"Argilla\") -> \"Dataset\":\n        instance = cls(client=client, workspace=model.workspace_id, name=model.name)\n        instance._model = model\n\n        return instance\n\n    
#####################\n    #  Utility methods  #\n    #####################\n\n    def api_model(self) -> DatasetModel:\n        self._model.workspace_id = self.workspace.id\n        return self._model\n\n    def _publish(self) -> \"Dataset\":\n        self._settings.create()\n        self._api.publish(dataset_id=self._model.id)\n\n        return self.get()\n\n    def _resolve_workspace(self) -> Workspace:\n        workspace = self._workspace\n\n        if workspace is None:\n            workspace = self._client.workspaces.default\n            warnings.warn(f\"Workspace not provided. Using default workspace: {workspace.name} id: {workspace.id}\")\n        elif isinstance(workspace, str):\n            workspace = self._client.workspaces(workspace)\n            if workspace is None:\n                available_workspace_names = [ws.name for ws in self._client.workspaces]\n                raise NotFoundError(\n                    f\"Workspace with name {workspace} not found. Available workspaces: {available_workspace_names}\"\n                )\n        elif isinstance(workspace, UUID):\n            ws_model = self._client.api.workspaces.get(workspace)\n            workspace = Workspace.from_model(ws_model, client=self._client)\n        elif not isinstance(workspace, Workspace):\n            raise ValueError(f\"Wrong workspace value found {workspace}\")\n\n        return workspace\n\n    def _rollback_dataset_creation(self):\n        if not self._is_published():\n            self.delete()\n\n    def _is_published(self) -> bool:\n        return self._model.status == \"ready\"\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.__init__","title":"__init__(name=None, workspace=None, settings=None, client=None)","text":"

    Initializes a new Argilla Dataset object with the given parameters.

    Parameters:

    name (str, default None): Name of the dataset. Replaced by a random UUID if not assigned.

    workspace (UUID, default None): Workspace of the dataset. Default is the first workspace found in the server.

    settings (Settings, default None): Settings class to be used to configure the dataset.

    client (Argilla, default None): Instance of Argilla to connect with the server. Default is the default client.

    Source code in src/argilla/datasets/_resource.py
    def __init__(\n    self,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str, UUID]] = None,\n    settings: Optional[Settings] = None,\n    client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Initializes a new Argilla Dataset object with the given parameters.\n\n    Parameters:\n        name (str): Name of the dataset. Replaced by random UUID if not assigned.\n        workspace (UUID): Workspace of the dataset. Default is the first workspace found in the server.\n        settings (Settings): Settings class to be used to configure the dataset.\n        client (Argilla): Instance of Argilla to connect with the server. Default is the default client.\n    \"\"\"\n    client = client or Argilla._get_default()\n    super().__init__(client=client, api=client.api.datasets)\n    if name is None:\n        name = f\"dataset_{uuid4()}\"\n        self._log_message(f\"Settings dataset name to unique UUID: {name}\")\n\n    self._workspace = workspace\n    self._model = DatasetModel(name=name)\n    self._settings = settings._copy() if settings else Settings(_dataset=self)\n    self._settings.dataset = self\n    self.__records = DatasetRecords(client=self._client, dataset=self)\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.create","title":"create()","text":"

    Creates the dataset on the server with the Settings configuration.

    Returns:

    Dataset: The created dataset object.

    Source code in src/argilla/datasets/_resource.py
    def create(self) -> \"Dataset\":\n    \"\"\"Creates the dataset on the server with the `Settings` configuration.\n\n    Returns:\n        Dataset: The created dataset object.\n    \"\"\"\n    super().create()\n    try:\n        return self._publish()\n    except Exception as e:\n        self._log_message(message=f\"Error creating dataset: {e}\", level=\"error\")\n        self._rollback_dataset_creation()\n        raise SettingsError from e\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._resource.Dataset.update","title":"update()","text":"

    Updates the dataset on the server with the current settings.

    Returns:

    Dataset: The updated dataset object.
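
    For example, a sketch (assuming an existing dataset whose guidelines are being revised):

    dataset = client.datasets(\"my_dataset\")\ndataset.guidelines = \"Label the sentiment of the customer request.\"\ndataset.update()\n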

    Source code in src/argilla/datasets/_resource.py
    def update(self) -> \"Dataset\":\n    \"\"\"Updates the dataset on the server with the current settings.\n\n    Returns:\n        Dataset: The updated dataset object.\n    \"\"\"\n    self.settings.update()\n    return self\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin","title":"DiskImportExportMixin","text":"

    Bases: ABC

    A mixin for exporting and importing datasets to and from disk.

    Source code in src/argilla/datasets/_export/_disk.py
    class DiskImportExportMixin(ABC):\n    \"\"\"A mixin for exporting and importing datasets to and from disk.\"\"\"\n\n    _model: DatasetModel\n    _DEFAULT_RECORDS_PATH = \"records.json\"\n    _DEFAULT_CONFIG_REPO_DIR = \".argilla\"\n    _DEFAULT_SETTINGS_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/settings.json\"\n    _DEFAULT_DATASET_PATH = f\"{_DEFAULT_CONFIG_REPO_DIR}/dataset.json\"\n    _DEFAULT_CONFIGURATION_FILES = [_DEFAULT_SETTINGS_PATH, _DEFAULT_DATASET_PATH]\n\n    def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n        \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n        Parameters:\n            path (str): The path to export the dataset to. Must be an empty directory.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n        dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n        # Export the dataset model, settings and records\n        self._persist_dataset_model(path=dataset_path)\n        self.settings.to_json(path=settings_path)\n        if with_records:\n            self.records.to_json(path=records_path)\n\n        return path\n\n    @classmethod\n    def from_disk(\n        cls: Type[\"Dataset\"],\n        path: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n    ) -> \"Dataset\":\n        \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n        The directory should be defined using the `to_disk` method.\n\n        Parameters:\n            path (str): The path to the directory containing the dataset model, settings and records.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        \"\"\"\n\n        client = client or Argilla._get_default()\n\n        dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n        logging.info(f\"Loading dataset from {dataset_path}\")\n        logging.info(f\"Loading settings from {settings_path}\")\n        logging.info(f\"Loading records from {records_path}\")\n        dataset_model = cls._load_dataset_model(path=dataset_path)\n\n        # Get the relevant workspace_id of the incoming dataset\n        if isinstance(workspace, str):\n            workspace = client.workspaces(workspace)\n            if not workspace:\n                raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n        else:\n            warnings.warn(\"Workspace not provided. 
Using default workspace.\")\n            workspace = client.workspaces.default\n        dataset_model.workspace_id = workspace.id\n\n        if name and (name != dataset_model.name):\n            logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n            dataset_model.name = name\n\n        if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n            warnings.warn(\n                f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n            )\n            dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n                name=dataset_model.name, workspace_id=workspace.id\n            )\n            dataset = cls.from_model(model=dataset_model, client=client)\n        else:\n            # Create a new dataset and load the settings and records\n            dataset = cls.from_model(model=dataset_model, client=client)\n            dataset.settings = Settings.from_json(path=settings_path)\n            dataset.create()\n\n        if os.path.exists(records_path) and with_records:\n            try:\n                dataset.records.from_json(path=records_path)\n            except RecordsIngestionError as e:\n                raise RecordsIngestionError(\n                    message=\"Error importing dataset records from disk. Records and datasets settings are not compatible.\"\n                ) from e\n        return dataset\n\n    ############################\n    # Utility methods\n    ############################\n\n    def _persist_dataset_model(self, path: Path):\n        \"\"\"Persists the dataset model to disk.\"\"\"\n        if path.exists():\n            raise FileExistsError(f\"Dataset already exists at {path}\")\n        with open(file=path, mode=\"w\") as f:\n            json.dump(self.api_model().model_dump(), f)\n\n    @classmethod\n    def _load_dataset_model(cls, path: Path):\n        \"\"\"Loads the dataset model from disk.\"\"\"\n        if not os.path.exists(path):\n            raise FileNotFoundError(f\"Dataset model not found at {path}\")\n        with open(file=path, mode=\"r\") as f:\n            dataset_model = json.load(f)\n            dataset_model = DatasetModel(**dataset_model)\n        return dataset_model\n\n    @classmethod\n    def _define_child_paths(cls, path: Union[Path, str]) -> Tuple[Path, Path, Path]:\n        path = Path(path)\n        if not path.is_dir():\n            raise NotADirectoryError(f\"Path {path} is not a directory\")\n        main_path = path / cls._DEFAULT_CONFIG_REPO_DIR\n        main_path.mkdir(exist_ok=True)\n        dataset_path = path / cls._DEFAULT_DATASET_PATH\n        settings_path = path / cls._DEFAULT_SETTINGS_PATH\n        records_path = path / cls._DEFAULT_RECORDS_PATH\n        return dataset_path, settings_path, records_path\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin.to_disk","title":"to_disk(path, *, with_records=True)","text":"

    Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.

    Parameters:

    path (str, required): The path to export the dataset to. Must be an empty directory.

    with_records (bool, default True): whether to export the dataset's records to disk. Defaults to True.

    Source code in src/argilla/datasets/_export/_disk.py
    def to_disk(self: \"Dataset\", path: str, *, with_records: bool = True) -> str:\n    \"\"\"Exports the dataset to disk in the given path. The dataset is exported as a directory containing the dataset model, settings and records as json files.\n\n    Parameters:\n        path (str): The path to export the dataset to. Must be an empty directory.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n    dataset_path, settings_path, records_path = self._define_child_paths(path=path)\n    logging.info(f\"Loading dataset from {dataset_path}\")\n    logging.info(f\"Loading settings from {settings_path}\")\n    logging.info(f\"Loading records from {records_path}\")\n    # Export the dataset model, settings and records\n    self._persist_dataset_model(path=dataset_path)\n    self.settings.to_json(path=settings_path)\n    if with_records:\n        self.records.to_json(path=records_path)\n\n    return path\n
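
    For example, a sketch (the directory name is a placeholder; the target must be an empty directory):

    import os\n\nos.makedirs(\"my_dataset_dir\", exist_ok=True)  # to_disk expects an existing, empty directory\ndataset.to_disk(path=\"my_dataset_dir\", with_records=True)\n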
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._disk.DiskImportExportMixin.from_disk","title":"from_disk(path, *, name=None, workspace=None, client=None, with_records=True) classmethod","text":"

    Imports a dataset from disk as a directory containing the dataset model, settings and records. The directory should be defined using the to_disk method.

    Parameters:

    path (str, required): The path to the directory containing the dataset model, settings and records.

    name (str, default None): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

    workspace (Union[Workspace, str], default None): The workspace to import the dataset to. Defaults to None and the default workspace is used.

    client (Argilla, default None): The client to use for the import. Defaults to None and the default client is used.

    with_records (bool, default True): whether to load the records stored on disk. Defaults to True.

    Source code in src/argilla/datasets/_export/_disk.py
    @classmethod\ndef from_disk(\n    cls: Type[\"Dataset\"],\n    path: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n) -> \"Dataset\":\n    \"\"\"Imports a dataset from disk as a directory containing the dataset model, settings and records.\n    The directory should be defined using the `to_disk` method.\n\n    Parameters:\n        path (str): The path to the directory containing the dataset model, settings and records.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client (Argilla, optional): The client to use for the import. Defaults to None and the default client is used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n    \"\"\"\n\n    client = client or Argilla._get_default()\n\n    dataset_path, settings_path, records_path = cls._define_child_paths(path=path)\n    logging.info(f\"Loading dataset from {dataset_path}\")\n    logging.info(f\"Loading settings from {settings_path}\")\n    logging.info(f\"Loading records from {records_path}\")\n    dataset_model = cls._load_dataset_model(path=dataset_path)\n\n    # Get the relevant workspace_id of the incoming dataset\n    if isinstance(workspace, str):\n        workspace = client.workspaces(workspace)\n        if not workspace:\n            raise ArgillaError(f\"Workspace {workspace} not found on the server.\")\n    else:\n        warnings.warn(\"Workspace not provided. Using default workspace.\")\n        workspace = client.workspaces.default\n    dataset_model.workspace_id = workspace.id\n\n    if name and (name != dataset_model.name):\n        logging.info(f\"Changing dataset name from {dataset_model.name} to {name}\")\n        dataset_model.name = name\n\n    if client.api.datasets.name_exists(name=dataset_model.name, workspace_id=workspace.id):\n        warnings.warn(\n            f\"Loaded dataset name {dataset_model.name} already exists in the workspace {workspace.name} so using it. To create a new dataset, provide a unique name to the `name` parameter.\"\n        )\n        dataset_model = client.api.datasets.get_by_name_and_workspace_id(\n            name=dataset_model.name, workspace_id=workspace.id\n        )\n        dataset = cls.from_model(model=dataset_model, client=client)\n    else:\n        # Create a new dataset and load the settings and records\n        dataset = cls.from_model(model=dataset_model, client=client)\n        dataset.settings = Settings.from_json(path=settings_path)\n        dataset.create()\n\n    if os.path.exists(records_path) and with_records:\n        try:\n            dataset.records.from_json(path=records_path)\n        except RecordsIngestionError as e:\n            raise RecordsIngestionError(\n                message=\"Error importing dataset records from disk. Records and datasets settings are not compatible.\"\n            ) from e\n    return dataset\n
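
    For example, a sketch (assuming the directory was previously written with to_disk; names are placeholders):

    dataset = rg.Dataset.from_disk(\n    path=\"my_dataset_dir\",\n    name=\"my_dataset_copy\",\n    workspace=\"my_workspace\",\n)\n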
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin","title":"HubImportExportMixin","text":"

    Bases: DiskImportExportMixin

    Source code in src/argilla/datasets/_export/_hub.py
    class HubImportExportMixin(DiskImportExportMixin):\n    def to_hub(\n        self: \"Dataset\",\n        repo_id: str,\n        *,\n        with_records: bool = True,\n        generate_card: Optional[bool] = True,\n        **kwargs,\n    ) -> None:\n        \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n        Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n                to `True`.\n            **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n        \"\"\"\n\n        from huggingface_hub import DatasetCardData, HfApi\n\n        from argilla.datasets._export.card import (\n            ArgillaDatasetCard,\n            size_categories_parser,\n        )\n\n        hf_api = HfApi(token=kwargs.get(\"token\"))\n\n        hfds = False\n        if with_records:\n            hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n            hfds.push_to_hub(repo_id, **kwargs)\n        else:\n            hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n        with TemporaryDirectory() as tmpdirname:\n            config_dir = os.path.join(tmpdirname)\n\n            self.to_disk(path=config_dir, with_records=False)\n\n            if generate_card:\n                sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n                if hfds:\n                    sample_huggingface_record = hfds[0]\n                    size_categories = len(hfds)\n                else:\n                    sample_huggingface_record = \"No sample records provided\"\n                    size_categories = 0\n                card = ArgillaDatasetCard.from_template(\n                    card_data=DatasetCardData(\n                        size_categories=size_categories_parser(size_categories),\n                        tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                    ),\n                    repo_id=repo_id,\n                    argilla_fields=self.settings.fields,\n                    argilla_questions=self.settings.questions,\n                    argilla_guidelines=self.settings.guidelines or None,\n                    argilla_vectors_settings=self.settings.vectors or None,\n                    argilla_metadata_properties=self.settings.metadata,\n                    argilla_record=sample_argilla_record.to_dict(),\n                    huggingface_record=sample_huggingface_record,\n                )\n                card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n            hf_api.upload_folder(\n                folder_path=tmpdirname,\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n            )\n\n    @classmethod\n    def from_hub(\n        cls: Type[\"Dataset\"],\n        repo_id: str,\n        *,\n        name: Optional[str] = None,\n        workspace: Optional[Union[\"Workspace\", str]] = None,\n        client: Optional[\"Argilla\"] = None,\n        with_records: bool = True,\n        settings: Optional[\"Settings\"] = None,\n        **kwargs: Any,\n    ):\n        \"\"\"Loads a `Dataset` 
from the Hugging Face Hub.\n\n        Parameters:\n            repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n            name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n            workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n            client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n            with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n            **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n        Returns:\n            A `Dataset` loaded from the Hugging Face Hub.\n        \"\"\"\n        from datasets import load_dataset\n        from huggingface_hub import snapshot_download\n\n        if name is None:\n            name = repo_id.replace(\"/\", \"_\")\n\n        if settings is not None:\n            dataset = cls(name=name, settings=settings)\n            dataset.create()\n        else:\n            # download configuration files from the hub\n            folder_path = snapshot_download(\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n                token=kwargs.get(\"token\"),\n            )\n\n            dataset = cls.from_disk(\n                path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n            )\n\n        if with_records:\n            try:\n                hf_dataset = load_dataset(path=repo_id, **kwargs)  # type: ignore\n                hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, **kwargs)\n                cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n            except EmptyDatasetError:\n                warnings.warn(\n                    message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                    category=UserWarning,\n                )\n\n        return dataset\n\n    @staticmethod\n    def _log_dataset_records(hf_dataset: \"HFDataset\", dataset: \"Dataset\"):\n        \"\"\"This method extracts the responses from a Hugging Face dataset and returns a list of `Record` objects\"\"\"\n\n        # Identify columns that colunms that contain responses\n        responses_columns = [col for col in hf_dataset.column_names if \".responses\" in col]\n        response_questions = defaultdict(dict)\n        user_ids = {}\n        for col in responses_columns:\n            question_name = col.split(\".\")[0]\n            if col.endswith(\"users\"):\n                response_questions[question_name][\"users\"] = hf_dataset[col]\n                user_ids.update({UUID(user_id): UUID(user_id) for user_id in set(sum(hf_dataset[col], []))})\n            elif col.endswith(\"responses\"):\n                response_questions[question_name][\"responses\"] = hf_dataset[col]\n            elif col.endswith(\"status\"):\n                response_questions[question_name][\"status\"] = hf_dataset[col]\n\n        # Check if all user ids are known to this Argilla client\n        known_users_ids = [user.id for user in dataset._client.users]\n        unknown_user_ids = set(user_ids.keys()) - set(known_users_ids)\n        my_user = dataset._client.me\n        if len(unknown_user_ids) > 1:\n      
      warnings.warn(\n                message=f\"\"\"Found unknown user ids in dataset repo: {unknown_user_ids}.\n                    Assigning first response for each record to current user ({my_user.username}) and discarding the rest.\"\"\"\n            )\n        for unknown_user_id in unknown_user_ids:\n            user_ids[unknown_user_id] = my_user.id\n\n        # Create a mapper to map the Hugging Face dataset to a Record object\n        mapping = {col: col for col in hf_dataset.column_names if \".suggestion\" in col}\n        mapper = IngestedRecordMapper(dataset=dataset, mapping=mapping, user_id=my_user.id)\n\n        # Extract responses and create Record objects\n        records = []\n        for idx, row in enumerate(hf_dataset):\n            record = mapper(row)\n            for question_name, values in response_questions.items():\n                response_values = values[\"responses\"][idx]\n                response_users = values[\"users\"][idx]\n                response_status = values[\"status\"][idx]\n                for value, user_id, status in zip(response_values, response_users, response_status):\n                    user_id = user_ids[UUID(user_id)]\n                    if user_id in response_users:\n                        continue\n                    response_users[user_id] = True\n                    response = Response(\n                        user_id=user_id,\n                        question_name=question_name,\n                        value=value,\n                        status=status,\n                    )\n                    record.responses.add(response)\n            records.append(record)\n\n        try:\n            dataset.records.log(records=records)\n        except (RecordsIngestionError, UnprocessableEntityError) as e:\n            raise SettingsError(\n                message=f\"Failed to load records from Hugging Face dataset. Defined settings do not match dataset schema. Hugging face dataset features: {hf_dataset.features}. Argilla dataset settings : {dataset.settings}\"\n            ) from e\n\n    @staticmethod\n    def _get_dataset_split(hf_dataset: \"HFDataset\", split: Optional[str] = None, **kwargs: Dict) -> \"HFDataset\":\n        \"\"\"Get a single dataset from a Hugging Face dataset.\n\n        Parameters:\n            hf_dataset (HFDataset): The Hugging Face dataset to get a single dataset from.\n\n        Returns:\n            HFDataset: The single dataset.\n        \"\"\"\n\n        if isinstance(hf_dataset, DatasetDict) and split is None:\n            split = next(iter(hf_dataset.keys()))\n            if len(hf_dataset.keys()) > 1:\n                warnings.warn(\n                    message=f\"Multiple splits found in Hugging Face dataset. Using the first split: {split}. \"\n                    f\"Available splits are: {', '.join(hf_dataset.keys())}.\"\n                )\n            hf_dataset = hf_dataset[split]\n        return hf_dataset\n
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin.to_hub","title":"to_hub(repo_id, *, with_records=True, generate_card=True, **kwargs)","text":"

    Pushes the Dataset to the Hugging Face Hub. If the dataset has been previously pushed to the Hugging Face Hub, it will be updated instead of creating a new dataset repo.

    Parameters:

    repo_id (str, required): the ID of the Hugging Face Hub repo to push the Dataset to.

    with_records (bool, default True): whether to push the dataset's records to the Hugging Face Hub. Defaults to True.

    generate_card (Optional[bool], default True): whether to generate a dataset card for the Dataset in the Hugging Face Hub. Defaults to True.

    **kwargs (default {}): the kwargs to pass to datasets.Dataset.push_to_hub.

    Source code in src/argilla/datasets/_export/_hub.py
    def to_hub(\n    self: \"Dataset\",\n    repo_id: str,\n    *,\n    with_records: bool = True,\n    generate_card: Optional[bool] = True,\n    **kwargs,\n) -> None:\n    \"\"\"Pushes the `Dataset` to the Hugging Face Hub. If the dataset has been previously pushed to the\n    Hugging Face Hub, it will be updated instead of creating a new dataset repo.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to push the `Dataset` to.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        generate_card: whether to generate a dataset card for the `Dataset` in the Hugging Face Hub. Defaults\n            to `True`.\n        **kwargs: the kwargs to pass to `datasets.Dataset.push_to_hub`.\n    \"\"\"\n\n    from huggingface_hub import DatasetCardData, HfApi\n\n    from argilla.datasets._export.card import (\n        ArgillaDatasetCard,\n        size_categories_parser,\n    )\n\n    hf_api = HfApi(token=kwargs.get(\"token\"))\n\n    hfds = False\n    if with_records:\n        hfds = self.records(with_vectors=True, with_responses=True, with_suggestions=True).to_datasets()\n        hfds.push_to_hub(repo_id, **kwargs)\n    else:\n        hf_api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=kwargs.get(\"exist_ok\") or True)\n\n    with TemporaryDirectory() as tmpdirname:\n        config_dir = os.path.join(tmpdirname)\n\n        self.to_disk(path=config_dir, with_records=False)\n\n        if generate_card:\n            sample_argilla_record = next(iter(self.records(with_suggestions=True, with_responses=True)))\n            if hfds:\n                sample_huggingface_record = hfds[0]\n                size_categories = len(hfds)\n            else:\n                sample_huggingface_record = \"No sample records provided\"\n                size_categories = 0\n            card = ArgillaDatasetCard.from_template(\n                card_data=DatasetCardData(\n                    size_categories=size_categories_parser(size_categories),\n                    tags=[\"rlfh\", \"argilla\", \"human-feedback\"],\n                ),\n                repo_id=repo_id,\n                argilla_fields=self.settings.fields,\n                argilla_questions=self.settings.questions,\n                argilla_guidelines=self.settings.guidelines or None,\n                argilla_vectors_settings=self.settings.vectors or None,\n                argilla_metadata_properties=self.settings.metadata,\n                argilla_record=sample_argilla_record.to_dict(),\n                huggingface_record=sample_huggingface_record,\n            )\n            card.save(filepath=os.path.join(tmpdirname, \"README.md\"))\n\n        hf_api.upload_folder(\n            folder_path=tmpdirname,\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n        )\n
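
    For example, a sketch (the repo id and token are placeholders):

    dataset.to_hub(repo_id=\"my-org/my-dataset\", with_records=True, token=\"hf_...\")\n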
    "},{"location":"reference/argilla/datasets/datasets/#src.argilla.datasets._export._hub.HubImportExportMixin.from_hub","title":"from_hub(repo_id, *, name=None, workspace=None, client=None, with_records=True, settings=None, **kwargs) classmethod","text":"

    Loads a Dataset from the Hugging Face Hub.

    Parameters:

    repo_id (str, required): the ID of the Hugging Face Hub repo to load the Dataset from.

    name (str, default None): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.

    workspace (Union[Workspace, str], default None): The workspace to import the dataset to. Defaults to None and the default workspace is used.

    client (Optional[Argilla], default None): the client to use to load the Dataset. If not provided, the default client will be used.

    with_records (bool, default True): whether to load the records from the Hugging Face dataset. Defaults to True.

    settings (Optional[Settings], default None): the settings to use for the new dataset. If provided, the dataset is created with these settings instead of the configuration stored on the Hub.

    **kwargs (Any, default {}): the kwargs to pass to datasets.load_dataset.

    Returns:

    A Dataset loaded from the Hugging Face Hub.

    Source code in src/argilla/datasets/_export/_hub.py
    @classmethod\ndef from_hub(\n    cls: Type[\"Dataset\"],\n    repo_id: str,\n    *,\n    name: Optional[str] = None,\n    workspace: Optional[Union[\"Workspace\", str]] = None,\n    client: Optional[\"Argilla\"] = None,\n    with_records: bool = True,\n    settings: Optional[\"Settings\"] = None,\n    **kwargs: Any,\n):\n    \"\"\"Loads a `Dataset` from the Hugging Face Hub.\n\n    Parameters:\n        repo_id: the ID of the Hugging Face Hub repo to load the `Dataset` from.\n        name (str, optional): The name to assign to the new dataset. Defaults to None and the dataset's source name is used, unless it already exists, in which case a unique UUID is appended.\n        workspace (Union[Workspace, str], optional): The workspace to import the dataset to. Defaults to None and default workspace is used.\n        client: the client to use to load the `Dataset`. If not provided, the default client will be used.\n        with_records: whether to load the records from the Hugging Face dataset. Defaults to `True`.\n        **kwargs: the kwargs to pass to `datasets.Dataset.load_from_hub`.\n\n    Returns:\n        A `Dataset` loaded from the Hugging Face Hub.\n    \"\"\"\n    from datasets import load_dataset\n    from huggingface_hub import snapshot_download\n\n    if name is None:\n        name = repo_id.replace(\"/\", \"_\")\n\n    if settings is not None:\n        dataset = cls(name=name, settings=settings)\n        dataset.create()\n    else:\n        # download configuration files from the hub\n        folder_path = snapshot_download(\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n            allow_patterns=cls._DEFAULT_CONFIGURATION_FILES,\n            token=kwargs.get(\"token\"),\n        )\n\n        dataset = cls.from_disk(\n            path=folder_path, workspace=workspace, name=name, client=client, with_records=with_records\n        )\n\n    if with_records:\n        try:\n            hf_dataset = load_dataset(path=repo_id, **kwargs)  # type: ignore\n            hf_dataset = cls._get_dataset_split(hf_dataset=hf_dataset, **kwargs)\n            cls._log_dataset_records(hf_dataset=hf_dataset, dataset=dataset)\n        except EmptyDatasetError:\n            warnings.warn(\n                message=\"Trying to load a dataset `with_records=True` but dataset does not contain any records.\",\n                category=UserWarning,\n            )\n\n    return dataset\n
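
    For example, a sketch (the repo id and workspace name are placeholders):

    dataset = rg.Dataset.from_hub(repo_id=\"my-org/my-dataset\", workspace=\"my_workspace\", with_records=True)\n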
    "},{"location":"reference/argilla/records/metadata/","title":"metadata","text":"

    Metadata in Argilla is a dictionary that can be attached to a record. It is used to store additional information about the record that is not part of the record's fields or responses. For example, the source of the record, the date it was created, or any other information that is relevant to the record. Metadata can be added to a record directly or as values within a dictionary.

    "},{"location":"reference/argilla/records/metadata/#usage-examples","title":"Usage Examples","text":"

    To use metadata within a dataset, you must define a metadata property in the dataset settings. The metadata setting is a list of metadata properties that can be attached to records. The following example demonstrates how to add metadata to a dataset and how to access metadata from a record object:

    import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_metadata\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        metadata=[\n            rg.TermsMetadataProperty(name=\"category\", options=[\"A\", \"B\", \"C\"]),\n        ],\n    ),\n)\ndataset.create()\n

    Then, you can add records to the dataset with metadata that corresponds to the metadata property defined in the dataset settings:

    dataset.records.log(\n    [\n        {\"text\": \"text\", \"label\": \"positive\", \"category\": \"A\"},\n        {\"text\": \"text\", \"label\": \"negative\", \"category\": \"B\"},\n    ]\n)\n
    "},{"location":"reference/argilla/records/metadata/#format-per-metadataproperty-type","title":"Format per MetadataProperty type","text":"

    Depending on the MetadataProperty type, metadata might need to be formatted in a slightly different way.

    For TermsMetadataProperty:
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": \"A\"}\n)\n\n# with multiple terms\n\nrg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": [\"A\", \"B\"]}\n)\n
    For FloatMetadataProperty:
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 2.1}\n)\n
    For IntegerMetadataProperty:
    rg.Record(\n    fields={\"text\": \"example\"},\n    metadata={\"category\": 42}\n)\n
    "},{"location":"reference/argilla/records/records/","title":"rg.Record","text":"

    The Record object is used to represent a single record in Argilla. It contains fields, suggestions, responses, metadata, and vectors.

    "},{"location":"reference/argilla/records/records/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/records/#creating-a-record","title":"Creating a Record","text":"

    To create records, you can use the Record class and pass it to the Dataset.records.log method. The Record class requires a fields parameter, which is a dictionary of field names and values. The field names must match the field names in the dataset's Settings object to be accepted.

    dataset.records.log(\n    records=[\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n        ),\n    ]\n) # (1)\n
    1. The Argilla dataset contains a field named text matching the key here.
    "},{"location":"reference/argilla/records/records/#accessing-record-attributes","title":"Accessing Record Attributes","text":"

    The Record object has suggestions, responses, metadata, and vectors attributes that can be accessed directly whilst iterating over records in a dataset.

    for record in dataset.records(\n    with_suggestions=True,\n    with_responses=True,\n    with_metadata=True,\n    with_vectors=True\n    ):\n    print(record.suggestions)\n    print(record.responses)\n    print(record.metadata)\n    print(record.vectors)\n

    Record properties can also be updated whilst iterating over records in a dataset.

    for record in dataset.records(with_metadata=True):\n    record.metadata = {\"department\": \"toys\"}\n

    For changes to take effect, the user must call the update method on the Dataset object, or pass the updated records to Dataset.records.log. All core record attributes can be updated in this way. Check their respective documentation for more information: Suggestions, Responses, Metadata, Vectors.
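
    For example, a sketch that persists the metadata change shown above back to the server:

    updated_records = list(dataset.records(with_metadata=True))\nfor record in updated_records:\n    record.metadata = {\"department\": \"toys\"}\n\ndataset.records.log(records=updated_records)\n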

    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record","title":"Record","text":"

    Bases: Resource

    The class for interacting with Argilla Records. A Record is a single sample in a dataset. Records receive feedback in the form of responses and suggestions. Records contain fields, metadata, and vectors.

    Attributes:

    id (Union[str, UUID]): The id of the record.

    fields (RecordFields): The fields of the record.

    metadata (RecordMetadata): The metadata of the record.

    vectors (RecordVectors): The vectors of the record.

    responses (RecordResponses): The responses of the record.

    suggestions (RecordSuggestions): The suggestions of the record.

    dataset (Dataset): The dataset to which the record belongs.

    _server_id (UUID): An id for the record generated by the Argilla server.

    Source code in src/argilla/records/_resource.py
    class Record(Resource):\n    \"\"\"The class for interacting with Argilla Records. A `Record` is a single sample\n    in a dataset. Records receives feedback in the form of responses and suggestions.\n    Records contain fields, metadata, and vectors.\n\n    Attributes:\n        id (Union[str, UUID]): The id of the record.\n        fields (RecordFields): The fields of the record.\n        metadata (RecordMetadata): The metadata of the record.\n        vectors (RecordVectors): The vectors of the record.\n        responses (RecordResponses): The responses of the record.\n        suggestions (RecordSuggestions): The suggestions of the record.\n        dataset (Dataset): The dataset to which the record belongs.\n        _server_id (UUID): An id for the record generated by the Argilla server.\n    \"\"\"\n\n    _model: RecordModel\n\n    def __init__(\n        self,\n        id: Optional[Union[UUID, str]] = None,\n        fields: Optional[Dict[str, FieldValue]] = None,\n        metadata: Optional[Dict[str, MetadataValue]] = None,\n        vectors: Optional[Dict[str, VectorValue]] = None,\n        responses: Optional[List[Response]] = None,\n        suggestions: Optional[List[Suggestion]] = None,\n        _server_id: Optional[UUID] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ):\n        \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n        Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n        and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n        Args:\n            id: An id for the record. If not provided, a UUID will be generated.\n            fields: A dictionary of fields for the record.\n            metadata: A dictionary of metadata for the record.\n            vectors: A dictionary of vectors for the record.\n            responses: A list of Response objects for the record.\n            suggestions: A list of Suggestion objects for the record.\n            _server_id: An id for the record. 
(Read-only and set by the server)\n            _dataset: The dataset object to which the record belongs.\n        \"\"\"\n        if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n            raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n        if fields is None and id is None:\n            raise ValueError(\"If fields are not provided, an id must be provided.\")\n        if fields == {} and id is None:\n            raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n        self._dataset = _dataset\n        self._model = RecordModel(external_id=id, id=_server_id)\n        self.__fields = RecordFields(fields=fields)\n        self.__vectors = RecordVectors(vectors=vectors)\n        self.__metadata = RecordMetadata(metadata=metadata)\n        self.__responses = RecordResponses(responses=responses, record=self)\n        self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n\n    def __repr__(self) -> str:\n        return (\n            f\"Record(id={self.id},status={self.status},fields={self.fields},metadata={self.metadata},\"\n            f\"suggestions={self.suggestions},responses={self.responses})\"\n        )\n\n    ############################\n    # Properties\n    ############################\n\n    @property\n    def id(self) -> str:\n        return self._model.external_id\n\n    @id.setter\n    def id(self, value: str) -> None:\n        self._model.external_id = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n\n    @property\n    def fields(self) -> \"RecordFields\":\n        return self.__fields\n\n    @property\n    def responses(self) -> \"RecordResponses\":\n        return self.__responses\n\n    @property\n    def suggestions(self) -> \"RecordSuggestions\":\n        return self.__suggestions\n\n    @property\n    def metadata(self) -> \"RecordMetadata\":\n        return self.__metadata\n\n    @property\n    def vectors(self) -> \"RecordVectors\":\n        return self.__vectors\n\n    @property\n    def status(self) -> str:\n        return self._model.status\n\n    @property\n    def _server_id(self) -> Optional[UUID]:\n        return self._model.id\n\n    ############################\n    # Public methods\n    ############################\n\n    def api_model(self) -> RecordModel:\n        return RecordModel(\n            id=self._model.id,\n            external_id=self._model.external_id,\n            fields=self.fields.to_dict(),\n            metadata=self.metadata.api_models(),\n            vectors=self.vectors.api_models(),\n            responses=self.responses.api_models(),\n            suggestions=self.suggestions.api_models(),\n            status=self.status,\n        )\n\n    def serialize(self) -> Dict[str, Any]:\n        \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n        serialized_model = self._model.model_dump()\n        serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n        serialized_responses = [response.serialize() for response in self.__responses]\n        serialized_model[\"responses\"] = serialized_responses\n        serialized_model[\"suggestions\"] = serialized_suggestions\n        return serialized_model\n\n    def to_dict(self) -> Dict[str, Dict]:\n        
\"\"\"Converts a Record object to a dictionary for export.\n        Returns:\n            A dictionary representing the record where the keys are \"fields\",\n            \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n            represented as a key-value pair in the dictionary of the respective key. i.e.\n            `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n        \"\"\"\n        id = str(self.id) if self.id else None\n        server_id = str(self._model.id) if self._model.id else None\n        status = self.status\n        fields = self.fields.to_dict()\n        metadata = self.metadata.to_dict()\n        suggestions = self.suggestions.to_dict()\n        responses = self.responses.to_dict()\n        vectors = self.vectors.to_dict()\n\n        return {\n            \"id\": id,\n            \"fields\": fields,\n            \"metadata\": metadata,\n            \"suggestions\": suggestions,\n            \"responses\": responses,\n            \"vectors\": vectors,\n            \"status\": status,\n            \"_server_id\": server_id,\n        }\n\n    @classmethod\n    def from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n        \"\"\"Converts a dictionary to a Record object.\n        Args:\n            data: A dictionary representing the record.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        fields = data.get(\"fields\", {})\n        metadata = data.get(\"metadata\", {})\n        suggestions = data.get(\"suggestions\", {})\n        responses = data.get(\"responses\", {})\n        vectors = data.get(\"vectors\", {})\n        record_id = data.get(\"id\", None)\n        _server_id = data.get(\"_server_id\", None)\n\n        suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n        responses = [\n            Response(question_name=question_name, **value)\n            for question_name, _responses in responses.items()\n            for value in _responses\n        ]\n\n        return cls(\n            id=record_id,\n            fields=fields,\n            suggestions=suggestions,\n            responses=responses,\n            vectors=vectors,\n            metadata=metadata,\n            _dataset=dataset,\n            _server_id=_server_id,\n        )\n\n    @classmethod\n    def from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n        \"\"\"Converts a RecordModel object to a Record object.\n        Args:\n            model: A RecordModel object.\n            dataset: The dataset object to which the record belongs.\n        Returns:\n            A Record object.\n        \"\"\"\n        instance = cls(\n            id=model.external_id,\n            fields=model.fields,\n            metadata={meta.name: meta.value for meta in model.metadata},\n            vectors={vector.name: vector.vector_values for vector in model.vectors},\n            # Responses and their models are not aligned 1-1.\n            responses=[\n                response\n                for response_model in model.responses\n                for response in UserResponse.from_model(response_model, dataset=dataset)\n            ],\n            suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],\n        )\n\n        # set private attributes\n        
instance._dataset = dataset\n        instance._model.id = model.id\n        instance._model.status = model.status\n\n        return instance\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.__init__","title":"__init__(id=None, fields=None, metadata=None, vectors=None, responses=None, suggestions=None, _server_id=None, _dataset=None)","text":"

    Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id. Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions and passed to Dataset.DatasetRecords.add() as a list of dictionaries.

    Parameters:

    Name Type Description Default id Optional[Union[UUID, str]]

    An id for the record. If not provided, a UUID will be generated.

    None fields Optional[Dict[str, FieldValue]]

    A dictionary of fields for the record.

    None metadata Optional[Dict[str, MetadataValue]]

    A dictionary of metadata for the record.

    None vectors Optional[Dict[str, VectorValue]]

    A dictionary of vectors for the record.

    None responses Optional[List[Response]]

    A list of Response objects for the record.

    None suggestions Optional[List[Suggestion]]

    A list of Suggestion objects for the record.

    None _server_id Optional[UUID]

    An id for the record. (Read-only and set by the server)

    None _dataset Optional[Dataset]

    The dataset object to which the record belongs.

    None Source code in src/argilla/records/_resource.py
    def __init__(\n    self,\n    id: Optional[Union[UUID, str]] = None,\n    fields: Optional[Dict[str, FieldValue]] = None,\n    metadata: Optional[Dict[str, MetadataValue]] = None,\n    vectors: Optional[Dict[str, VectorValue]] = None,\n    responses: Optional[List[Response]] = None,\n    suggestions: Optional[List[Suggestion]] = None,\n    _server_id: Optional[UUID] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n):\n    \"\"\"Initializes a Record with fields, metadata, vectors, responses, suggestions, external_id, and id.\n    Records are typically defined as flat dictionary objects with fields, metadata, vectors, responses, and suggestions\n    and passed to Dataset.DatasetRecords.add() as a list of dictionaries.\n\n    Args:\n        id: An id for the record. If not provided, a UUID will be generated.\n        fields: A dictionary of fields for the record.\n        metadata: A dictionary of metadata for the record.\n        vectors: A dictionary of vectors for the record.\n        responses: A list of Response objects for the record.\n        suggestions: A list of Suggestion objects for the record.\n        _server_id: An id for the record. (Read-only and set by the server)\n        _dataset: The dataset object to which the record belongs.\n    \"\"\"\n    if fields is None and metadata is None and vectors is None and responses is None and suggestions is None:\n        raise ValueError(\"At least one of fields, metadata, vectors, responses, or suggestions must be provided.\")\n    if fields is None and id is None:\n        raise ValueError(\"If fields are not provided, an id must be provided.\")\n    if fields == {} and id is None:\n        raise ValueError(\"If fields are an empty dictionary, an id must be provided.\")\n\n    self._dataset = _dataset\n    self._model = RecordModel(external_id=id, id=_server_id)\n    self.__fields = RecordFields(fields=fields)\n    self.__vectors = RecordVectors(vectors=vectors)\n    self.__metadata = RecordMetadata(metadata=metadata)\n    self.__responses = RecordResponses(responses=responses, record=self)\n    self.__suggestions = RecordSuggestions(suggestions=suggestions, record=self)\n
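
    For illustration, a minimal sketch of instantiating a Record directly (the id and metadata values here are hypothetical):

    record = rg.Record(\n    id=\"my-external-id\",  # optional; a UUID is generated if omitted\n    fields={\"text\": \"Hello World, how are you?\"},\n    metadata={\"source\": \"example\"},\n)\n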
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.serialize","title":"serialize()","text":"

    Serializes the Record to a dictionary for interaction with the API

    Source code in src/argilla/records/_resource.py
    def serialize(self) -> Dict[str, Any]:\n    \"\"\"Serializes the Record to a dictionary for interaction with the API\"\"\"\n    serialized_model = self._model.model_dump()\n    serialized_suggestions = [suggestion.serialize() for suggestion in self.__suggestions]\n    serialized_responses = [response.serialize() for response in self.__responses]\n    serialized_model[\"responses\"] = serialized_responses\n    serialized_model[\"suggestions\"] = serialized_suggestions\n    return serialized_model\n
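
    As a minimal usage sketch, calling serialize on the record instantiated above returns a plain dictionary:

    serialized = record.serialize()  # includes \"responses\" and \"suggestions\" keys\n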
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.to_dict","title":"to_dict()","text":"

    Converts a Record object to a dictionary for export. Returns: A dictionary representing the record where the keys are \"fields\", \"metadata\", \"suggestions\", and \"responses\". Each field and question is represented as a key-value pair in the dictionary of the respective key, e.g. `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"}}`.

    Source code in src/argilla/records/_resource.py
    def to_dict(self) -> Dict[str, Dict]:\n    \"\"\"Converts a Record object to a dictionary for export.\n    Returns:\n        A dictionary representing the record where the keys are \"fields\",\n        \"metadata\", \"suggestions\", and \"responses\". Each field and question is\n        represented as a key-value pair in the dictionary of the respective key. i.e.\n        `{\"fields\": {\"prompt\": \"...\", \"response\": \"...\"}, \"responses\": {\"rating\": \"...\"},\n    \"\"\"\n    id = str(self.id) if self.id else None\n    server_id = str(self._model.id) if self._model.id else None\n    status = self.status\n    fields = self.fields.to_dict()\n    metadata = self.metadata.to_dict()\n    suggestions = self.suggestions.to_dict()\n    responses = self.responses.to_dict()\n    vectors = self.vectors.to_dict()\n\n    return {\n        \"id\": id,\n        \"fields\": fields,\n        \"metadata\": metadata,\n        \"suggestions\": suggestions,\n        \"responses\": responses,\n        \"vectors\": vectors,\n        \"status\": status,\n        \"_server_id\": server_id,\n    }\n
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_dict","title":"from_dict(data, dataset=None) classmethod","text":"

    Converts a dictionary to a Record object. Args: data: A dictionary representing the record. dataset: The dataset object to which the record belongs. Returns: A Record object.

    Source code in src/argilla/records/_resource.py
    @classmethod\ndef from_dict(cls, data: Dict[str, Dict], dataset: Optional[\"Dataset\"] = None) -> \"Record\":\n    \"\"\"Converts a dictionary to a Record object.\n    Args:\n        data: A dictionary representing the record.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    fields = data.get(\"fields\", {})\n    metadata = data.get(\"metadata\", {})\n    suggestions = data.get(\"suggestions\", {})\n    responses = data.get(\"responses\", {})\n    vectors = data.get(\"vectors\", {})\n    record_id = data.get(\"id\", None)\n    _server_id = data.get(\"_server_id\", None)\n\n    suggestions = [Suggestion(question_name=question_name, **value) for question_name, value in suggestions.items()]\n    responses = [\n        Response(question_name=question_name, **value)\n        for question_name, _responses in responses.items()\n        for value in _responses\n    ]\n\n    return cls(\n        id=record_id,\n        fields=fields,\n        suggestions=suggestions,\n        responses=responses,\n        vectors=vectors,\n        metadata=metadata,\n        _dataset=dataset,\n        _server_id=_server_id,\n    )\n
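
    A small round-trip sketch using to_dict and from_dict (the dataset argument is omitted here, so the restored record is not attached to a dataset):

    exported = record.to_dict()\nrestored = rg.Record.from_dict(exported)\nassert restored.id == record.id\n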
    "},{"location":"reference/argilla/records/records/#src.argilla.records._resource.Record.from_model","title":"from_model(model, dataset) classmethod","text":"

    Converts a RecordModel object to a Record object. Args: model: A RecordModel object. dataset: The dataset object to which the record belongs. Returns: A Record object.

    Source code in src/argilla/records/_resource.py
    @classmethod\ndef from_model(cls, model: RecordModel, dataset: \"Dataset\") -> \"Record\":\n    \"\"\"Converts a RecordModel object to a Record object.\n    Args:\n        model: A RecordModel object.\n        dataset: The dataset object to which the record belongs.\n    Returns:\n        A Record object.\n    \"\"\"\n    instance = cls(\n        id=model.external_id,\n        fields=model.fields,\n        metadata={meta.name: meta.value for meta in model.metadata},\n        vectors={vector.name: vector.vector_values for vector in model.vectors},\n        # Responses and their models are not aligned 1-1.\n        responses=[\n            response\n            for response_model in model.responses\n            for response in UserResponse.from_model(response_model, dataset=dataset)\n        ],\n        suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],\n    )\n\n    # set private attributes\n    instance._dataset = dataset\n    instance._model.id = model.id\n    instance._model.status = model.status\n\n    return instance\n
    "},{"location":"reference/argilla/records/responses/","title":"rg.Response","text":"

    Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

    "},{"location":"reference/argilla/records/responses/#usage-examples","title":"Usage Examples","text":"

    Responses can be added to an instantiated Record directly or as a dictionary. The following examples demonstrate how to add responses to a record object and how to access responses from a record object:

    Instantiate the Record and related Response objects:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            responses=[rg.Response(\"label\", \"negative\", user_id=user.id)],\n            id=str(uuid.uuid4()),\n        )\n    ]\n)\n

    Or, add a response from a dictionary, where the key is the question name (suffixed with .response) and the value is the response:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label.response\": \"negative\",\n        },\n    ]\n)\n

    Responses can be accessed from a Record via their question name. So if a question is named label, its responses can be accessed as record.responses[\"label\"]. The following example demonstrates how to access responses from a record object:

    # iterate over the records and responses\n\nfor record in dataset.records:\n    for response in record.responses[\"label\"]: # (1)\n        print(response.value)\n        print(response.user_id)\n\n# validate that the record has a response\n\nfor record in dataset.records:\n    if record.responses[\"label\"]:\n        for response in record.responses[\"label\"]:\n            print(response.value)\n            print(response.user_id)\n    else:\n        record.responses.add(\n            rg.Response(\"label\", \"positive\", user_id=user.id)\n        ) # (2)\n
    1. Access the responses for the question named label for each record like a dictionary containing a list of Response objects. 2. Add a response to the record if it does not already have one.

    "},{"location":"reference/argilla/records/responses/#format-per-question-type","title":"Format per Question type","text":"

    Depending on the Question type, responses might need to be formatted in a slightly different way.

    For LabelQuestion Β· For MultiLabelQuestion Β· For RankingQuestion Β· For RatingQuestion Β· For SpanQuestion Β· For TextQuestion
    rg.Response(\n    question_name=\"label\",\n    value=\"positive\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"rating\",\n    value=4,\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    user_id=user.id,\n    status=\"draft\"\n)\n
    rg.Response(\n    question_name=\"text\",\n    value=\"value\",\n    user_id=user.id,\n    status=\"draft\"\n)\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response","title":"Response","text":"

    Class for interacting with Argilla Responses of records. Responses are answers to questions by a user. Therefore, a record question can have multiple responses, one for each user that has answered the question. A Response is typically created by a user in the UI or consumed from a data source as a label, unlike a Suggestion which is typically created by a model prediction.

    Source code in src/argilla/responses.py
    class Response:\n    \"\"\"Class for interacting with Argilla Responses of records. Responses are answers to questions by a user.\n    Therefore, a record question can have multiple responses, one for each user that has answered the question.\n    A `Response` is typically created by a user in the UI or consumed from a data source as a label,\n    unlike a `Suggestion` which is typically created by a model prediction.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        user_id: UUID,\n        status: Optional[Union[ResponseStatus, str]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n        Attributes:\n            question_name (str): The name of the question that the suggestion is for.\n            value (str): The value of the response\n            user_id (UUID): The id of the user that submits the response\n            status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n        \"\"\"\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n        if user_id is None:\n            raise ValueError(\"user_id is required\")\n\n        if isinstance(status, str):\n            status = ResponseStatus(status)\n\n        self.record = _record\n        self.question_name = question_name\n        self.value = value\n        self.user_id = user_id\n        self.status = status\n\n    def serialize(self) -> dict[str, Any]:\n        \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n            but can be used for data wrangling or manual export.\n\n        Returns:\n            dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n        Examples:\n\n        ```python\n        response = rg.Response(\"label\", \"negative\", user_id=user.id)\n        response.serialize()\n        ```\n        \"\"\"\n        return {\n            \"question_name\": self.question_name,\n            \"value\": self.value,\n            \"user_id\": self.user_id,\n            \"status\": self.status,\n        }\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.__init__","title":"__init__(question_name, value, user_id, status=None, _record=None)","text":"

    Initializes a Response for a Record with a user_id and value

    Attributes:

    Name Type Description question_name str

    The name of the question that the suggestion is for.

    value str

    The value of the response

    user_id UUID

    The id of the user that submits the response

    status Union[ResponseStatus, str]

    The status of the response as \"draft\", \"submitted\", \"discarded\".

    Source code in src/argilla/responses.py
    def __init__(\n    self,\n    question_name: str,\n    value: Any,\n    user_id: UUID,\n    status: Optional[Union[ResponseStatus, str]] = None,\n    _record: Optional[\"Record\"] = None,\n) -> None:\n    \"\"\"Initializes a `Response` for a `Record` with a user_id and value\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the response\n        user_id (UUID): The id of the user that submits the response\n        status (Union[ResponseStatus, str]): The status of the response as \"draft\", \"submitted\", \"discarded\".\n    \"\"\"\n\n    if question_name is None:\n        raise ValueError(\"question_name is required\")\n    if value is None:\n        raise ValueError(\"value is required\")\n    if user_id is None:\n        raise ValueError(\"user_id is required\")\n\n    if isinstance(status, str):\n        status = ResponseStatus(status)\n\n    self.record = _record\n    self.question_name = question_name\n    self.value = value\n    self.user_id = user_id\n    self.status = status\n
    "},{"location":"reference/argilla/records/responses/#src.argilla.responses.Response.serialize","title":"serialize()","text":"

    Serializes the Response to a dictionary. This is principally used for sending the response to the API, but can be used for data wrangling or manual export.

    Returns:

    Type Description dict[str, Any]

    dict[str, Any]: The serialized response as a dictionary with keys question_name, value, user_id, and status.

    Examples:

    response = rg.Response(\"label\", \"negative\", user_id=user.id)\nresponse.serialize()\n
    Source code in src/argilla/responses.py
    def serialize(self) -> dict[str, Any]:\n    \"\"\"Serializes the Response to a dictionary. This is principally used for sending the response to the API, \\\n        but can be used for data wrangling or manual export.\n\n    Returns:\n        dict[str, Any]: The serialized response as a dictionary with keys `question_name`, `value`, and `user_id`.\n\n    Examples:\n\n    ```python\n    response = rg.Response(\"label\", \"negative\", user_id=user.id)\n    response.serialize()\n    ```\n    \"\"\"\n    return {\n        \"question_name\": self.question_name,\n        \"value\": self.value,\n        \"user_id\": self.user_id,\n        \"status\": self.status,\n    }\n
    "},{"location":"reference/argilla/records/suggestions/","title":"rg.Suggestion","text":"

    Class for interacting with Argilla Suggestions of records. Suggestions are typically created by a model prediction, unlike a Response which is typically created by a user in the UI or consumed from a data source as a label.

    "},{"location":"reference/argilla/records/suggestions/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/records/suggestions/#adding-records-with-suggestions","title":"Adding records with suggestions","text":"

    Suggestions can be added to a record directly or via a dictionary structure. The following examples demonstrate how to add suggestions to a record object and how to access suggestions from a record object:

    Add a suggestion from a dictionary, where the key is the question name and the value is the suggested value:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"label\": \"negative\", # this will be used as a suggestion\n        },\n    ]\n)\n

    If your data contains scores for suggestions, you can add them as well via the mapping parameter. The following example demonstrates how to add a suggestion with a score to a record object:

    dataset.records.log(\n    [\n        {\n            \"prompt\": \"Hello World, how are you?\",\n            \"label\": \"negative\",  # this will be used as a suggestion\n            \"score\": 0.9,  # this will be used as the suggestion score\n            \"model\": \"model_name\",  # this will be used as the suggestion agent\n        },\n    ],\n    mapping={\n        \"score\": \"label.suggestion.score\",\n        \"model\": \"label.suggestion.agent\",\n    },  # `label` is the question name in the dataset settings\n)\n

    Or, instantiate the Record and related Suggestions objects directly, like this:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            suggestions=[rg.Suggestion(\"label\", \"negative\", score=0.9, agent=\"model_name\")],\n        )\n    ]\n)\n
    "},{"location":"reference/argilla/records/suggestions/#iterating-over-records-with-suggestions","title":"Iterating over records with suggestions","text":"

    Just like responses, suggestions can be accessed from a Record via their question name. So if a question is named label, its suggestions can be accessed as record.suggestions[\"label\"]. The following example demonstrates how to access suggestions from a record object:

    for record in dataset.records(with_suggestions=True):\n    print(record.suggestions[\"label\"].value)\n

    We can also add suggestions to records as we iterate over them using the add method:

    for record in dataset.records(with_suggestions=True):\n    if not record.suggestions[\"label\"]: # (1)\n        record.suggestions.add(\n            rg.Suggestion(\"label\", \"positive\", score=0.9, agent=\"model_name\")\n        ) # (2)\n
    1. Validate that the record has a suggestion
    2. Add a suggestion to the record if it does not already have one
    "},{"location":"reference/argilla/records/suggestions/#format-per-question-type","title":"Format per Question type","text":"

    Depending on the Question type, suggestions might need to be formatted in a slightly different way.

    For LabelQuestion Β· For MultiLabelQuestion Β· For RankingQuestion Β· For RatingQuestion Β· For SpanQuestion Β· For TextQuestion
    rg.Suggestion(\n    question_name=\"label\",\n    value=\"positive\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"multi-label\",\n    value=[\"positive\", \"negative\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"rank\",\n    value=[\"1\", \"3\", \"2\"],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"rating\",\n    value=4,\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"span\",\n    value=[{\"start\": 0, \"end\": 9, \"label\": \"MISC\"}],\n    score=0.9,\n    agent=\"model_name\"\n)\n
    rg.Suggestion(\n    question_name=\"text\",\n    value=\"value\",\n    score=0.9,\n    agent=\"model_name\"\n)\n
    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion","title":"Suggestion","text":"

    Bases: Resource

    Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records. Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.

    Attributes:

    Name Type Description question_name str

    The name of the question that the suggestion is for.

    value str

    The value of the suggestion

    score float

    The score of the suggestion. For example, the probability of the model prediction.

    agent str

    The agent that created the suggestion. For example, the model name.

    type str

    The type of suggestion, either 'model' or 'human'.

    Source code in src/argilla/suggestions.py
    class Suggestion(Resource):\n    \"\"\"Class for interacting with Argilla Suggestions. Suggestions are typically model predictions for records.\n    Suggestions are rendered in the user interfaces as 'hints' or 'suggestions' for the user to review and accept or reject.\n\n    Attributes:\n        question_name (str): The name of the question that the suggestion is for.\n        value (str): The value of the suggestion\n        score (float): The score of the suggestion. For example, the probability of the model prediction.\n        agent (str): The agent that created the suggestion. For example, the model name.\n        type (str): The type of suggestion, either 'model' or 'human'.\n    \"\"\"\n\n    _model: SuggestionModel\n\n    def __init__(\n        self,\n        question_name: str,\n        value: Any,\n        score: Union[float, List[float], None] = None,\n        agent: Optional[str] = None,\n        type: Optional[Literal[\"model\", \"human\"]] = None,\n        _record: Optional[\"Record\"] = None,\n    ) -> None:\n        super().__init__()\n\n        if question_name is None:\n            raise ValueError(\"question_name is required\")\n        if value is None:\n            raise ValueError(\"value is required\")\n\n        self.record = _record\n        self._model = SuggestionModel(\n            question_name=question_name,\n            value=value,\n            type=type,\n            score=score,\n            agent=agent,\n        )\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def value(self) -> Any:\n        \"\"\"The value of the suggestion.\"\"\"\n        return self._model.value\n\n    @property\n    def question_name(self) -> Optional[str]:\n        \"\"\"The name of the question that the suggestion is for.\"\"\"\n        return self._model.question_name\n\n    @question_name.setter\n    def question_name(self, value: str) -> None:\n        self._model.question_name = value\n\n    @property\n    def type(self) -> Optional[Literal[\"model\", \"human\"]]:\n        \"\"\"The type of suggestion, either 'model' or 'human'.\"\"\"\n        return self._model.type\n\n    @property\n    def score(self) -> Optional[Union[float, List[float]]]:\n        \"\"\"The score of the suggestion.\"\"\"\n        return self._model.score\n\n    @score.setter\n    def score(self, value: float) -> None:\n        self._model.score = value\n\n    @property\n    def agent(self) -> Optional[str]:\n        \"\"\"The agent that created the suggestion.\"\"\"\n        return self._model.agent\n\n    @agent.setter\n    def agent(self, value: str) -> None:\n        self._model.agent = value\n\n    @classmethod\n    def from_model(cls, model: SuggestionModel, dataset: \"Dataset\") -> \"Suggestion\":\n        question = dataset.settings.questions[model.question_id]\n        model.question_name = question.name\n        model.value = cls.__from_model_value(model.value, question)\n\n        instance = cls(question.name, model.value)\n        instance._model = model\n\n        return instance\n\n    def api_model(self) -> SuggestionModel:\n        if self.record is None or self.record.dataset is None:\n            return self._model\n\n        question = self.record.dataset.settings.questions[self.question_name]\n        if question:\n            return SuggestionModel(\n                value=self.__to_model_value(self.value, question),\n                question_name=None if not question else question.name,\n                
question_id=None if not question else question.id,\n                type=self._model.type,\n                score=self._model.score,\n                agent=self._model.agent,\n                id=self._model.id,\n            )\n        else:\n            raise RecordSuggestionsError(\n                f\"Record suggestion is invalid because question with name={self.question_name} does not exist in the dataset ({self.record.dataset.name}). Available questions are: {list(self.record.dataset.settings.questions._properties_by_name.keys())}\"\n            )\n\n    @classmethod\n    def __to_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_to_model_value(value)\n        return value\n\n    @classmethod\n    def __from_model_value(cls, value: Any, question: \"QuestionType\") -> Any:\n        if isinstance(question, RankingQuestion):\n            return cls.__ranking_from_model_value(value)\n        return value\n\n    @classmethod\n    def __ranking_from_model_value(cls, value: List[Dict[str, Any]]) -> List[str]:\n        return [v[\"value\"] for v in value]\n\n    @classmethod\n    def __ranking_to_model_value(cls, value: List[str]) -> List[Dict[str, str]]:\n        return [{\"value\": str(v)} for v in value]\n
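
    A minimal sketch of constructing a Suggestion with explicit keyword arguments, using the values from the format examples above:

    suggestion = rg.Suggestion(\n    question_name=\"label\",\n    value=\"positive\",\n    score=0.9,\n    agent=\"model_name\",\n)\n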
    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.value","title":"value: Any property","text":"

    The value of the suggestion.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.question_name","title":"question_name: Optional[str] property writable","text":"

    The name of the question that the suggestion is for.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.type","title":"type: Optional[Literal['model', 'human']] property","text":"

    The type of suggestion, either 'model' or 'human'.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.score","title":"score: Optional[Union[float, List[float]]] property writable","text":"

    The score of the suggestion.

    "},{"location":"reference/argilla/records/suggestions/#src.argilla.suggestions.Suggestion.agent","title":"agent: Optional[str] property writable","text":"

    The agent that created the suggestion.

    "},{"location":"reference/argilla/records/vectors/","title":"rg.Vector","text":"

    A vector is a numerical representation of a Record field or attribute, usually the record's text. Vectors can be used to search for similar records via the UI or SDK. Vectors can be added to a record directly or as a dictionary with a key that matches the name of an rg.VectorField.

    "},{"location":"reference/argilla/records/vectors/#usage-examples","title":"Usage Examples","text":"

    To use vectors within a dataset, you must define their vector fields in the dataset settings. The vectors attribute of the settings is a list of vector fields that can be attached to records. The following example demonstrates how to add vectors to a dataset and how to access vectors from a record object:

    import argilla as rg\n\ndataset = rg.Dataset(\n    name=\"dataset_with_vectors\",\n    settings=rg.Settings(\n        fields=[rg.TextField(name=\"text\")],\n        questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n        vectors=[\n            rg.VectorField(name=\"vector_name\"),\n        ],\n    ),\n)\ndataset.create()\n

    Then, you can add records to the dataset with vectors that correspond to the vector field defined in the dataset settings:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"vector_name\": [0.1, 0.2, 0.3]\n        }\n    ]\n)\n

    Vectors can be passed using a mapping, where the key is the key in the data source and the value is the name of the rg.VectorField object in the dataset's settings. For example, the following code adds a record with a vector using a mapping:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"x\": [0.1, 0.2, 0.3]\n        }\n    ],\n    mapping={\"x\": \"vector_name\"}\n)\n

    Or, vectors can be instantiated and added to a record directly, like this:

    dataset.records.log(\n    [\n        rg.Record(\n            fields={\"text\": \"Hello World, how are you?\"},\n            vectors=[rg.Vector(\"vector_name\", [0.1, 0.2, 0.3])],\n        )\n    ]\n)\n
    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector","title":"Vector","text":"

    Bases: Resource

    Class for interacting with Argilla Vectors. Vectors are typically used to represent embeddings or features of records. The Vector class is used to deliver vectors to the Argilla server.

    Attributes:

    Name Type Description name str

    The name of the vector.

    values list[float]

    The values of the vector.

    Source code in src/argilla/vectors.py
    class Vector(Resource):\n    \"\"\" Class for interacting with Argilla Vectors. Vectors are typically used to represent \\\n        embeddings or features of records. The `Vector` class is used to deliver vectors to the Argilla server.\n\n    Attributes:\n        name (str): The name of the vector.\n        values (list[float]): The values of the vector.\n    \"\"\"\n\n    _model: VectorModel\n\n    def __init__(\n        self,\n        name: str,\n        values: list[float],\n    ) -> None:\n        \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n        Parameters:\n            name (str): Name of the vector\n            values (list[float]): List of float values\n\n        \"\"\"\n        self._model = VectorModel(\n            name=name,\n            vector_values=values,\n        )\n\n    def __repr__(self) -> str:\n        return repr(f\"{self.__class__.__name__}({self._model})\")\n\n    ##############################\n    # Properties\n    ##############################\n\n    @property\n    def name(self) -> str:\n        \"\"\"Name of the vector that corresponds to the name of the vector in the dataset's `Settings`\"\"\"\n        return self._model.name\n\n    @property\n    def values(self) -> list[float]:\n        \"\"\"List of float values that represent the vector.\"\"\"\n        return self._model.vector_values\n\n    ##############################\n    # Methods\n    ##############################\n\n    @classmethod\n    def from_model(cls, model: VectorModel) -> \"Vector\":\n        return cls(\n            name=model.name,\n            values=model.vector_values,\n        )\n\n    def serialize(self) -> dict[str, Any]:\n        dumped_model = self._model.model_dump()\n        name = dumped_model.pop(\"name\")\n        values = dumped_model.pop(\"vector_values\")\n        return {name: values}\n
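
    For illustration, serialize returns a single-key dictionary mapping the vector name to its values:

    vector = rg.Vector(name=\"vector_name\", values=[0.1, 0.2, 0.3])\nvector.serialize()  # {\"vector_name\": [0.1, 0.2, 0.3]}\n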
    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.name","title":"name: str property","text":"

    Name of the vector that corresponds to the name of the vector in the dataset's Settings

    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.values","title":"values: list[float] property","text":"

    List of float values that represent the vector.

    "},{"location":"reference/argilla/records/vectors/#src.argilla.vectors.Vector.__init__","title":"__init__(name, values)","text":"

    Initializes a Vector with a name and values that can be used to search in the Argilla UI.

    Parameters:

    Name Type Description Default name str

    Name of the vector

    required values list[float]

    List of float values

    required Source code in src/argilla/vectors.py
    def __init__(\n    self,\n    name: str,\n    values: list[float],\n) -> None:\n    \"\"\"Initializes a Vector with a name and values that can be used to search in the Argilla ui.\n\n    Parameters:\n        name (str): Name of the vector\n        values (list[float]): List of float values\n\n    \"\"\"\n    self._model = VectorModel(\n        name=name,\n        vector_values=values,\n    )\n
    "},{"location":"reference/argilla/settings/fields/","title":"Fields","text":"

    Fields in Argilla define the content of a record that will be reviewed by a user.

    "},{"location":"reference/argilla/settings/fields/#usage-examples","title":"Usage Examples","text":"

    To define a field, instantiate the TextField class and pass it to the fields parameter of the Settings class.

    text_field = rg.TextField(name=\"text\")\nmarkdown_field = rg.TextField(name=\"markdown\", use_markdown=True)\n\nsettings = rg.Settings(\n    fields=[\n        text_field,\n        markdown_field,\n    ],\n    questions=[\n        rg.TextQuestion(name=\"response\"),\n    ],\n)\n\ndata = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

    To add records with values for fields, refer to the rg.Dataset.records documentation.
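
    As a quick sketch (assuming the dataset defined above has been created), field values are passed as flat dictionary keys that match the field names:

    data.create()\ndata.records.log(\n    [\n        {\"text\": \"Hello World, how are you?\"},\n    ]\n)\n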

    "},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField","title":"TextField","text":"

    Bases: SettingsPropertyBase

    Text field for use in Argilla Dataset Settings

    Source code in src/argilla/settings/_field.py
    class TextField(SettingsPropertyBase):\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: FieldModel\n    _api: FieldsAPI\n\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        use_markdown: Optional[bool] = False,\n        required: bool = True,\n        description: Optional[str] = None,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Text field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the field\n            title (Optional[str]): The name of the field, as it will be displayed in the UI.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n            required (bool): Whether the field is required. At least one field must be required.\n            description (Optional[str]): The description of the field.\n        \"\"\"\n        client = client or Argilla._get_default()\n\n        super().__init__(api=client.api.fields, client=client)\n\n        self._model = FieldModel(\n            name=name,\n            title=title,\n            required=required,\n            description=description,\n            settings=TextFieldSettings(use_markdown=use_markdown),\n        )\n\n        self._dataset = None\n\n    @classmethod\n    def from_model(cls, model: FieldModel) -> \"TextField\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextField\":\n        model = FieldModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> Optional[bool]:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, value: bool) -> None:\n        self._model.settings.use_markdown = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def _with_client(self, client: \"Argilla\") -> \"Self\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.fields\n\n        return self\n
    "},{"location":"reference/argilla/settings/fields/#src.argilla.settings._field.TextField.__init__","title":"__init__(name, title=None, use_markdown=False, required=True, description=None, client=None)","text":"

    Text field for use in Argilla Dataset Settings

    Parameters:

    Name Type Description Default name str

    The name of the field

    required title Optional[str]

    The name of the field, as it will be displayed in the UI.

    None use_markdown Optional[bool]

    Whether to render the markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs.

    False required bool

    Whether the field is required. At least one field must be required.

    True description Optional[str]

    The description of the field.

    None Source code in src/argilla/settings/_field.py
    def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    use_markdown: Optional[bool] = False,\n    required: bool = True,\n    description: Optional[str] = None,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Text field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the field\n        title (Optional[str]): The name of the field, as it will be displayed in the UI.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        required (bool): Whether the field is required. At least one field must be required.\n        description (Optional[str]): The description of the field.\n    \"\"\"\n    client = client or Argilla._get_default()\n\n    super().__init__(api=client.api.fields, client=client)\n\n    self._model = FieldModel(\n        name=name,\n        title=title,\n        required=required,\n        description=description,\n        settings=TextFieldSettings(use_markdown=use_markdown),\n    )\n\n    self._dataset = None\n
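
    A small sketch of the writable use_markdown property, which can be updated on an existing field object:

    field = rg.TextField(name=\"text\")\nfield.use_markdown = True  # render this field as Markdown in the UI\n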
    "},{"location":"reference/argilla/settings/metadata_property/","title":"Metadata Properties","text":"

    Metadata properties are used to define metadata fields in a dataset. Metadata fields are used to store additional information about the records in the dataset. For example, the category of a record, the price of a product, or any other information that is relevant to the record.

    "},{"location":"reference/argilla/settings/metadata_property/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/metadata_property/#defining-metadata-property-for-a-dataset","title":"Defining Metadata Property for a dataset","text":"

    We define metadata properties via type specific classes. The following example demonstrates how to define metadata properties as either a float, integer, or terms metadata property and pass them to the Settings.

    TermsMetadataProperty is used to define a metadata field with a list of options. For example, a color field with the options red, blue, and green. FloatMetadataProperty and IntegerMetadataProperty are used to define metadata fields with float or integer values, respectively. For example, a price field with a minimum value of 0.0 and a maximum value of 100.0.

    metadata_field = rg.TermsMetadataProperty(\n    name=\"color\",\n    options=[\"red\", \"blue\", \"green\"],\n    title=\"Color\",\n)\n\nfloat_metadata_field = rg.FloatMetadataProperty(\n    name=\"price\",\n    min=0.0,\n    max=100.0,\n    title=\"Price\",\n)\n\nint_metadata_field = rg.IntegerMetadataProperty(\n    name=\"quantity\",\n    min=0,\n    max=100,\n    title=\"Quantity\",\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=rg.Settings(\n        fields=[\n            rg.TextField(name=\"text\"),\n        ],\n        questions=[\n            rg.TextQuestion(name=\"response\"),\n        ],\n        metadata=[\n            metadata_field,\n            float_metadata_field,\n            int_metadata_field,\n        ],\n    ),\n)\n

    To add records with metadata, refer to the rg.Metadata class documentation.
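
    As a hedged sketch (assuming the dataset defined above has been created; the values shown are illustrative), metadata values can be passed as flat dictionary keys that match the metadata property names:

    dataset.records.log(\n    [\n        {\n            \"text\": \"Hello World, how are you?\",\n            \"color\": \"red\",\n            \"price\": 19.9,\n            \"quantity\": 5,\n        }\n    ]\n)\n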

    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty","title":"FloatMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class FloatMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[float] = None,\n        max: Optional[float] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with float settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n\n        super().__init__(client=client)\n\n        try:\n            settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.float,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"FloatMetadataProperty\":\n        instance = FloatMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.FloatMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with float settings.

    Parameters:

    Name Type Description Default name str

    The name of the metadata field

    required min Optional[float]

    The minimum valid value. If none is provided, it will be computed from the values provided in the records.

    None max Optional[float]

    The maximum valid value. If none is provided, it will be computed from the values provided in the records.

    None title Optional[str]

    The title of the metadata to be shown in the UI

    None visible_for_annotators Optional[bool]

    Whether the metadata field is visible for annotators.

    True

    Raises:

    Type Description MetadataError

    If an error occurs while defining metadata settings.

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    min: Optional[float] = None,\n    max: Optional[float] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with float settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[float]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[float]): The maximum valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n\n    super().__init__(client=client)\n\n    try:\n        settings = FloatMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.float)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.float,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty","title":"IntegerMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class IntegerMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        min: Optional[int] = None,\n        max: Optional[int] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with integer settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n            max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings.\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.integer,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def min(self) -> Optional[int]:\n        return self._model.settings.min\n\n    @min.setter\n    def min(self, value: Optional[int]) -> None:\n        self._model.settings.min = value\n\n    @property\n    def max(self) -> Optional[int]:\n        return self._model.settings.max\n\n    @max.setter\n    def max(self, value: Optional[int]) -> None:\n        self._model.settings.max = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"IntegerMetadataProperty\":\n        instance = IntegerMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.IntegerMetadataProperty.__init__","title":"__init__(name, min=None, max=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with integer settings.

    Parameters:

    Name Type Description Default name str

    The name of the metadata field

    required min Optional[int]

    The minimum valid value. If none is provided, it will be computed from the values provided in the records.

    None max Optional[int]

    The maximum valid value. If none is provided, it will be computed from the values provided in the records.

    None title Optional[str]

    The title of the metadata to be shown in the UI

    None visible_for_annotators Optional[bool]

    Whether the metadata field is visible for annotators.

    True

    Raises:

    Type Description MetadataError

    If an error occurs while defining metadata settings.

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    min: Optional[int] = None,\n    max: Optional[int] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with integer settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        min (Optional[int]): The minimum valid value. If none is provided, it will be computed from the values provided in the records.\n        max (Optional[int]): The maximum  valid value. If none is provided, it will be computed from the values provided in the records.\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings.\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = IntegerMetadataPropertySettings(min=min, max=max, type=MetadataPropertyType.integer)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.integer,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty","title":"TermsMetadataProperty","text":"

    Bases: MetadataPropertyBase

    Source code in src/argilla/settings/_metadata.py
    class TermsMetadataProperty(MetadataPropertyBase):\n    def __init__(\n        self,\n        name: str,\n        options: Optional[List[str]] = None,\n        title: Optional[str] = None,\n        visible_for_annotators: Optional[bool] = True,\n        client: Optional[Argilla] = None,\n    ) -> None:\n        \"\"\"Create a metadata field with terms settings.\n\n        Parameters:\n            name (str): The name of the metadata field\n            options (Optional[List[str]]): The list of options\n            title (Optional[str]): The title of the metadata to be shown in the UI\n            visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n        Raises:\n            MetadataError: If an error occurs while defining metadata settings\n        \"\"\"\n        super().__init__(client=client)\n\n        try:\n            settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n        except ValueError as e:\n            raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n        self._model = MetadataFieldModel(\n            name=name,\n            type=MetadataPropertyType.terms,\n            title=title,\n            settings=settings,\n            visible_for_annotators=visible_for_annotators,\n        )\n\n    @property\n    def options(self) -> Optional[List[str]]:\n        return self._model.settings.values\n\n    @options.setter\n    def options(self, value: list[str]) -> None:\n        self._model.settings.values = value\n\n    @classmethod\n    def from_model(cls, model: MetadataFieldModel) -> \"TermsMetadataProperty\":\n        instance = TermsMetadataProperty(name=model.name)\n        instance._model = model\n\n        return instance\n
    "},{"location":"reference/argilla/settings/metadata_property/#src.argilla.settings._metadata.TermsMetadataProperty.__init__","title":"__init__(name, options=None, title=None, visible_for_annotators=True, client=None)","text":"

    Create a metadata field with terms settings.

    Parameters:

    Name Type Description Default name str

    The name of the metadata field

    required options Optional[List[str]]

    The list of options

    None title Optional[str]

    The title of the metadata to be shown in the UI

    None visible_for_annotators Optional[bool]

    Whether the metadata field is visible for annotators.

    True

    Raises:

    Type Description MetadataError

    If an error occurs while defining metadata settings

    Source code in src/argilla/settings/_metadata.py
    def __init__(\n    self,\n    name: str,\n    options: Optional[List[str]] = None,\n    title: Optional[str] = None,\n    visible_for_annotators: Optional[bool] = True,\n    client: Optional[Argilla] = None,\n) -> None:\n    \"\"\"Create a metadata field with terms settings.\n\n    Parameters:\n        name (str): The name of the metadata field\n        options (Optional[List[str]]): The list of options\n        title (Optional[str]): The title of the metadata to be shown in the UI\n        visible_for_annotators (Optional[bool]): Whether the metadata field is visible for annotators.\n\n    Raises:\n        MetadataError: If an error occurs while defining metadata settings\n    \"\"\"\n    super().__init__(client=client)\n\n    try:\n        settings = TermsMetadataPropertySettings(values=options, type=MetadataPropertyType.terms)\n    except ValueError as e:\n        raise MetadataError(f\"Error defining metadata settings for {name}\") from e\n\n    self._model = MetadataFieldModel(\n        name=name,\n        type=MetadataPropertyType.terms,\n        title=title,\n        settings=settings,\n        visible_for_annotators=visible_for_annotators,\n    )\n
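
    For illustration, a terms metadata field with a fixed set of options could be defined as follows (the names are just examples):

    group_metadata = rg.TermsMetadataProperty(\n    name=\"group\",\n    options=[\"group-a\", \"group-b\", \"group-c\"],\n    title=\"Annotation group\",\n)\n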
    "},{"location":"reference/argilla/settings/questions/","title":"Questions","text":"

    Argilla uses questions to gather feedback. The questions are answered by users or models.

    "},{"location":"reference/argilla/settings/questions/#usage-examples","title":"Usage Examples","text":"

    To define a label question, for example, instantiate the LabelQuestion class and pass it to the Settings class.

    label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n    ],\n)\n

    Questions can be combined flexibly based on the type of feedback you want to collect. For example, you can combine a label question with a text question to collect both a label and a text response.

    label_question = rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])\ntext_question = rg.TextQuestion(name=\"response\")\n\nsettings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    questions=[\n        label_question,\n        text_question,\n    ],\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings,\n)\n

    To add records with responses to questions, refer to the rg.Response class documentation.
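
    As a quick sketch of that flow (see the rg.Response documentation for the exact API; the record content below is illustrative, and client is assumed to be an rg.Argilla instance):

    record = rg.Record(\n    fields={\"text\": \"Hello world, how are you?\"},\n    responses=[rg.Response(question_name=\"label\", value=\"positive\", user_id=client.me.id)],\n)\n\ndataset.records.log([record])\n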

    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion","title":"LabelQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class LabelQuestion(QuestionPropertyBase):\n    _model: LabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        visible_labels: Optional[int] = None,\n    ) -> None:\n        \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n            question is a question where the user can select one label from \\\n            a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n        \"\"\"\n        self._model = LabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=LabelQuestionSettings(\n                options=self._render_values_as_options(labels), visible_options=visible_labels\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: LabelQuestionModel) -> \"LabelQuestion\":\n        instance = cls(name=model.name, labels=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"LabelQuestion\":\n        model = LabelQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    ##############################\n    # Public properties\n    ##############################\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.LabelQuestion.__init__","title":"__init__(name, labels, title=None, description=None, required=True, visible_labels=None)","text":"

    Define a new label question for Settings of a Dataset. A label question is a question where the user can select one label from a list of available labels.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required labels Union[List[str], Dict[str, str]]

    The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.

    required title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True visible_labels Optional[int]

    The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.

    None Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    visible_labels: Optional[int] = None,\n) -> None:\n    \"\"\" Define a new label question for `Settings` of a `Dataset`. A label \\\n        question is a question where the user can select one label from \\\n        a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n    \"\"\"\n    self._model = LabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=LabelQuestionSettings(\n            options=self._render_values_as_options(labels), visible_options=visible_labels\n        ),\n    )\n
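
    Labels can also be passed as a dictionary that maps each label to the name displayed in the UI, for example (the values are illustrative):

    label_question = rg.LabelQuestion(\n    name=\"sentiment\",\n    labels={\"pos\": \"Positive\", \"neg\": \"Negative\"},\n    title=\"Sentiment\",\n    description=\"Select the sentiment of the text.\",\n)\n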
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion","title":"MultiLabelQuestion","text":"

    Bases: LabelQuestion

    Source code in src/argilla/settings/_question.py
    class MultiLabelQuestion(LabelQuestion):\n    _model: MultiLabelQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        labels: Union[List[str], Dict[str, str]],\n        visible_labels: Optional[int] = None,\n        labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n            multi-label question is a question where the user can select multiple \\\n            labels from a list of available labels.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n                Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n                The score of the suggestion will be taken into account for ordering if available.\n            title (Optional[str]: The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = MultiLabelQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=MultiLabelQuestionSettings(\n                options=self._render_values_as_options(labels),\n                visible_options=visible_labels,\n                options_order=labels_order,\n            ),\n        )\n\n    @classmethod\n    def from_model(cls, model: MultiLabelQuestionModel) -> \"MultiLabelQuestion\":\n        instance = cls(\n            name=model.name,\n            labels=cls._render_options_as_values(model.settings.options),\n            labels_order=model.settings.options_order,\n        )\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"MultiLabelQuestion\":\n        model = MultiLabelQuestionModel(**data)\n        return cls.from_model(model=model)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.MultiLabelQuestion.__init__","title":"__init__(name, labels, visible_labels=None, labels_order='natural', title=None, description=None, required=True)","text":"

    Create a new multi-label question for Settings of a Dataset. A multi-label question is a question where the user can select multiple labels from a list of available labels.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required labels Union[List[str], Dict[str, str]]

    The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.

    required visible_labels Optional[int]

    The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.

    None labels_order Literal['natural', 'suggestion']

    The order of the labels in the UI. Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). The score of the suggestion will be taken into account for ordering if available.

    'natural' title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    labels: Union[List[str], Dict[str, str]],\n    visible_labels: Optional[int] = None,\n    labels_order: Literal[\"natural\", \"suggestion\"] = \"natural\",\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new multi-label question for `Settings` of a `Dataset`. A \\\n        multi-label question is a question where the user can select multiple \\\n        labels from a list of available labels.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n            Setting it to None show all options.\n        labels_order (Literal[\"natural\", \"suggestion\"]): The order of the labels in the UI. \\\n            Can be either \"natural\" (order in which they were specified) or \"suggestion\" (order prioritizing those associated with a suggestion). \\\n            The score of the suggestion will be taken into account for ordering if available.\n        title (Optional[str]: The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = MultiLabelQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=MultiLabelQuestionSettings(\n            options=self._render_values_as_options(labels),\n            visible_options=visible_labels,\n            options_order=labels_order,\n        ),\n    )\n
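
    For illustration, a multi-label question that prioritizes suggested labels in the UI could look like this (the names are illustrative):

    topics_question = rg.MultiLabelQuestion(\n    name=\"topics\",\n    labels=[\"politics\", \"sports\", \"technology\"],\n    visible_labels=3,\n    labels_order=\"suggestion\",\n)\n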
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion","title":"RankingQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class RankingQuestion(QuestionPropertyBase):\n    _model: RankingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: Union[List[str], Dict[str, str]],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n            is a question where the user can rank a list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RankingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RankingQuestionModel) -> \"RankingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RankingQuestion\":\n        model = RankingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.settings.options = self._render_values_as_options(values)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RankingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

    Create a new ranking question for Settings of a Dataset. A ranking question is a question where the user can rank a list of options.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required values Union[List[str], Dict[str, str]]

    The list of options to be ranked, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.

    required title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    values: Union[List[str], Dict[str, str]],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new ranking question for `Settings` of a `Dataset`. A ranking question \\\n        is a question where the user can rank a list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (Union[List[str], Dict[str, str]]): The list of options to be ranked, or a \\\n            dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RankingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=RankingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
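
    For illustration, a ranking question whose options carry display names could be defined as follows (the names are illustrative):

    ranking_question = rg.RankingQuestion(\n    name=\"preference\",\n    values={\"reply-1\": \"Reply 1\", \"reply-2\": \"Reply 2\"},\n    description=\"Rank the replies from best to worst.\",\n)\n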
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion","title":"TextQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class TextQuestion(QuestionPropertyBase):\n    _model: TextQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n        use_markdown: bool = False,\n    ) -> None:\n        \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n            is a question where the user can input text.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            title (Optional[str]): The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n            use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n                to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n        \"\"\"\n        self._model = TextQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=TextQuestionSettings(use_markdown=use_markdown),\n        )\n\n    @classmethod\n    def from_model(cls, model: TextQuestionModel) -> \"TextQuestion\":\n        instance = cls(name=model.name)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"TextQuestion\":\n        model = TextQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def use_markdown(self) -> bool:\n        return self._model.settings.use_markdown\n\n    @use_markdown.setter\n    def use_markdown(self, use_markdown: bool) -> None:\n        self._model.settings.use_markdown = use_markdown\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.TextQuestion.__init__","title":"__init__(name, title=None, description=None, required=True, use_markdown=False)","text":"

    Create a new text question for Settings of a Dataset. A text question is a question where the user can input text.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True use_markdown Optional[bool]

    Whether to render Markdown in the UI. When True, you will be able to use all the Markdown features for text formatting, including LaTeX formulas and embedding multimedia content and PDFs.

    False Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n    use_markdown: bool = False,\n) -> None:\n    \"\"\"Create a new text question for `Settings` of a `Dataset`. A text question \\\n        is a question where the user can input text.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        title (Optional[str]): The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n        use_markdown (Optional[bool]): Whether to render the markdown in the UI. When True, you will be able \\\n            to use all the Markdown features for text formatting, including LaTex formulas and embedding multimedia content and PDFs.\n    \"\"\"\n    self._model = TextQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=TextQuestionSettings(use_markdown=use_markdown),\n    )\n
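
    For illustration, a text question that renders Markdown responses could be defined as follows (the names are illustrative):

    text_question = rg.TextQuestion(\n    name=\"corrected_text\",\n    description=\"Provide a corrected version of the text.\",\n    use_markdown=True,\n)\n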
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion","title":"RatingQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class RatingQuestion(QuestionPropertyBase):\n    _model: RatingQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        values: List[int],\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ) -> None:\n        \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n            is a question where the user can select a value from a sequential list of options.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n        self._model = RatingQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            values=values,\n            settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n        )\n\n    @classmethod\n    def from_model(cls, model: RatingQuestionModel) -> \"RatingQuestion\":\n        instance = cls(name=model.name, values=cls._render_options_as_values(model.settings.options))\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"RatingQuestion\":\n        model = RatingQuestionModel(**data)\n        return cls.from_model(model=model)\n\n    @property\n    def values(self) -> List[int]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @values.setter\n    def values(self, values: List[int]) -> None:\n        self._model.values = self._render_values_as_options(values)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.RatingQuestion.__init__","title":"__init__(name, values, title=None, description=None, required=True)","text":"

    Create a new rating question for Settings of a Dataset. A rating question is a question where the user can select a value from a sequential list of options.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required values List[int]

    The list of selectable values. It should be defined in the range [0, 10].

    required title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    values: List[int],\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n) -> None:\n    \"\"\"Create a new rating question for `Settings` of a `Dataset`. A rating question \\\n        is a question where the user can select a value from a sequential list of options.\n\n    Parameters:\n        name (str): The name of the question to be used as a reference.\n        values (List[int]): The list of selectable values. It should be defined in the range [0, 10].\n        title (Optional[str]:) The title of the question to be shown in the UI.\n        description (Optional[str]): The description of the question to be shown in the UI.\n        required (bool): If the question is required for a record to be valid. At least one question must be required.\n    \"\"\"\n    self._model = RatingQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        values=values,\n        settings=RatingQuestionSettings(options=self._render_values_as_options(values)),\n    )\n
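
    For illustration, a 1-5 rating question could be defined as follows (the names are illustrative):

    rating_question = rg.RatingQuestion(\n    name=\"quality\",\n    values=[1, 2, 3, 4, 5],\n    description=\"Rate the quality of the response.\",\n)\n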
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion","title":"SpanQuestion","text":"

    Bases: QuestionPropertyBase

    Source code in src/argilla/settings/_question.py
    class SpanQuestion(QuestionPropertyBase):\n    _model: SpanQuestionModel\n\n    def __init__(\n        self,\n        name: str,\n        field: str,\n        labels: Union[List[str], Dict[str, str]],\n        allow_overlapping: bool = False,\n        visible_labels: Optional[int] = None,\n        title: Optional[str] = None,\n        description: Optional[str] = None,\n        required: bool = True,\n    ):\n        \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n            is a question where the user can select a section of text within a text field \\\n            and assign it a label.\n\n            Parameters:\n                name (str): The name of the question to be used as a reference.\n                field (str): The name of the text field where the span question will be applied.\n                labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                    dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n                allow_overlapping (bool) This value specifies whether overlapped spans are allowed or not.\n                visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                    Setting it to None show all options.\n                title (Optional[str]:) The title of the question to be shown in the UI.\n                description (Optional[str]): The description of the question to be shown in the UI.\n                required (bool): If the question is required for a record to be valid. At least one question must be required.\n            \"\"\"\n        self._model = SpanQuestionModel(\n            name=name,\n            title=title,\n            description=description,\n            required=required,\n            settings=SpanQuestionSettings(\n                field=field,\n                allow_overlapping=allow_overlapping,\n                visible_options=visible_labels,\n                options=self._render_values_as_options(labels),\n            ),\n        )\n\n    @property\n    def name(self):\n        return self._model.name\n\n    @property\n    def field(self):\n        return self._model.settings.field\n\n    @field.setter\n    def field(self, field: str):\n        self._model.settings.field = field\n\n    @property\n    def allow_overlapping(self):\n        return self._model.settings.allow_overlapping\n\n    @allow_overlapping.setter\n    def allow_overlapping(self, allow_overlapping: bool):\n        self._model.settings.allow_overlapping = allow_overlapping\n\n    @property\n    def visible_labels(self) -> Optional[int]:\n        return self._model.settings.visible_options\n\n    @visible_labels.setter\n    def visible_labels(self, visible_labels: Optional[int]) -> None:\n        self._model.settings.visible_options = visible_labels\n\n    @property\n    def labels(self) -> List[str]:\n        return self._render_options_as_labels(self._model.settings.options)\n\n    @labels.setter\n    def labels(self, labels: List[str]) -> None:\n        self._model.settings.options = self._render_values_as_options(labels)\n\n    @classmethod\n    def from_model(cls, model: SpanQuestionModel) -> \"SpanQuestion\":\n        instance = cls(\n            name=model.name,\n            field=model.settings.field,\n            labels=cls._render_options_as_values(model.settings.options),\n        )\n        instance._model = model\n\n        return 
instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"SpanQuestion\":\n        model = SpanQuestionModel(**data)\n        return cls.from_model(model=model)\n
    "},{"location":"reference/argilla/settings/questions/#src.argilla.settings._question.SpanQuestion.__init__","title":"__init__(name, field, labels, allow_overlapping=False, visible_labels=None, title=None, description=None, required=True)","text":"

    Create a new span question for Settings of a Dataset. A span question is a question where the user can select a section of text within a text field and assign it a label.

    Parameters:

    Name Type Description Default name str

    The name of the question to be used as a reference.

    required field str

    The name of the text field where the span question will be applied.

    required labels Union[List[str], Dict[str, str]]

    The list of available labels for the question, or a dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.

    required visible_labels Optional[int]

    The number of visible labels for the question to be shown in the UI. Setting it to None shows all options.

    None title Optional[str]

    The title of the question to be shown in the UI.

    None description Optional[str]

    The description of the question to be shown in the UI.

    None required bool

    If the question is required for a record to be valid. At least one question must be required.

    True Source code in src/argilla/settings/_question.py
    def __init__(\n    self,\n    name: str,\n    field: str,\n    labels: Union[List[str], Dict[str, str]],\n    allow_overlapping: bool = False,\n    visible_labels: Optional[int] = None,\n    title: Optional[str] = None,\n    description: Optional[str] = None,\n    required: bool = True,\n):\n    \"\"\" Create a new span question for `Settings` of a `Dataset`. A span question \\\n        is a question where the user can select a section of text within a text field \\\n        and assign it a label.\n\n        Parameters:\n            name (str): The name of the question to be used as a reference.\n            field (str): The name of the text field where the span question will be applied.\n            labels (Union[List[str], Dict[str, str]]): The list of available labels for the question, or a \\\n                dictionary of key-value pairs where the key is the label and the value is the label name displayed in the UI.\n            allow_overlapping (bool) This value specifies whether overlapped spans are allowed or not.\n            visible_labels (Optional[int]): The number of visible labels for the question to be shown in the UI. \\\n                Setting it to None show all options.\n            title (Optional[str]:) The title of the question to be shown in the UI.\n            description (Optional[str]): The description of the question to be shown in the UI.\n            required (bool): If the question is required for a record to be valid. At least one question must be required.\n        \"\"\"\n    self._model = SpanQuestionModel(\n        name=name,\n        title=title,\n        description=description,\n        required=required,\n        settings=SpanQuestionSettings(\n            field=field,\n            allow_overlapping=allow_overlapping,\n            visible_options=visible_labels,\n            options=self._render_values_as_options(labels),\n        ),\n    )\n
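
    For illustration, a span question could be defined as follows; note that field must match the name of a text field in the same settings (the names are illustrative):

    span_question = rg.SpanQuestion(\n    name=\"entities\",\n    field=\"text\",\n    labels=[\"PERSON\", \"ORG\", \"LOC\"],\n    allow_overlapping=False,\n)\n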
    "},{"location":"reference/argilla/settings/settings/","title":"rg.Settings","text":"

    rg.Settings is used to define the settings of an Argilla Dataset. The settings configure the behavior of the dataset, such as its fields, questions, guidelines, metadata, and vectors. The Settings class is passed to the Dataset class and used to create the dataset on the server. Once created, the settings of a dataset cannot be changed.

    "},{"location":"reference/argilla/settings/settings/#usage-examples","title":"Usage Examples","text":""},{"location":"reference/argilla/settings/settings/#creating-a-new-dataset-with-settings","title":"Creating a new dataset with settings","text":"

    To create a new dataset with settings, instantiate the Settings class and pass it to the Dataset class.

    import argilla as rg\n\nsettings = rg.Settings(\n    guidelines=\"Select the sentiment of the prompt.\",\n    fields=[rg.TextField(name=\"prompt\", use_markdown=True)],\n    questions=[rg.LabelQuestion(name=\"sentiment\", labels=[\"positive\", \"negative\"])],\n)\n\ndataset = rg.Dataset(name=\"sentiment_analysis\", settings=settings)\n\n# Create the dataset on the server\ndataset.create()\n

    To define the settings for fields, questions, metadata, vectors, or distribution, refer to the rg.TextField, rg.LabelQuestion, rg.TermsMetadataProperty, rg.VectorField, and rg.TaskDistribution class documentation.

    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings","title":"Settings","text":"

    Bases: Resource

    Settings class for Argilla Datasets.

    This class is used to define the representation of a Dataset within the UI.

    Source code in src/argilla/settings/_resource.py
    class Settings(Resource):\n    \"\"\"\n    Settings class for Argilla Datasets.\n\n    This class is used to define the representation of a Dataset within the UI.\n    \"\"\"\n\n    def __init__(\n        self,\n        fields: Optional[List[TextField]] = None,\n        questions: Optional[List[QuestionType]] = None,\n        vectors: Optional[List[VectorField]] = None,\n        metadata: Optional[List[MetadataType]] = None,\n        guidelines: Optional[str] = None,\n        allow_extra_metadata: bool = False,\n        distribution: Optional[TaskDistribution] = None,\n        _dataset: Optional[\"Dataset\"] = None,\n    ) -> None:\n        \"\"\"\n        Args:\n            fields (List[TextField]): A list of TextField objects that represent the fields in the Dataset.\n            questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n                A list of Question objects that represent the questions in the Dataset.\n            vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n            metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n            guidelines (str): A string containing the guidelines for the Dataset.\n            allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n                Dataset. Defaults to False.\n            distribution (TaskDistribution): The annotation task distribution configuration.\n                Default to DEFAULT_TASK_DISTRIBUTION\n        \"\"\"\n        super().__init__(client=_dataset._client if _dataset else None)\n\n        self._dataset = _dataset\n        self._distribution = distribution\n        self.__guidelines = self.__process_guidelines(guidelines)\n        self.__allow_extra_metadata = allow_extra_metadata\n\n        self.__questions = QuestionsProperties(self, questions)\n        self.__fields = SettingsProperties(self, fields)\n        self.__vectors = SettingsProperties(self, vectors)\n        self.__metadata = SettingsProperties(self, metadata)\n\n    #####################\n    # Properties        #\n    #####################\n\n    @property\n    def fields(self) -> \"SettingsProperties\":\n        return self.__fields\n\n    @fields.setter\n    def fields(self, fields: List[TextField]):\n        self.__fields = SettingsProperties(self, fields)\n\n    @property\n    def questions(self) -> \"SettingsProperties\":\n        return self.__questions\n\n    @questions.setter\n    def questions(self, questions: List[QuestionType]):\n        self.__questions = QuestionsProperties(self, questions)\n\n    @property\n    def vectors(self) -> \"SettingsProperties\":\n        return self.__vectors\n\n    @vectors.setter\n    def vectors(self, vectors: List[VectorField]):\n        self.__vectors = SettingsProperties(self, vectors)\n\n    @property\n    def metadata(self) -> \"SettingsProperties\":\n        return self.__metadata\n\n    @metadata.setter\n    def metadata(self, metadata: List[MetadataType]):\n        self.__metadata = SettingsProperties(self, metadata)\n\n    @property\n    def guidelines(self) -> str:\n        return self.__guidelines\n\n    @guidelines.setter\n    def guidelines(self, guidelines: str):\n        self.__guidelines = self.__process_guidelines(guidelines)\n\n    @property\n    def allow_extra_metadata(self) -> bool:\n        return self.__allow_extra_metadata\n\n    
@allow_extra_metadata.setter\n    def allow_extra_metadata(self, value: bool):\n        self.__allow_extra_metadata = value\n\n    @property\n    def distribution(self) -> TaskDistribution:\n        return self._distribution or TaskDistribution.default()\n\n    @distribution.setter\n    def distribution(self, value: TaskDistribution) -> None:\n        self._distribution = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, dataset: \"Dataset\"):\n        self._dataset = dataset\n        self._client = dataset._client\n\n    @cached_property\n    def schema(self) -> dict:\n        schema_dict = {}\n\n        for field in self.fields:\n            schema_dict[field.name] = field\n\n        for question in self.questions:\n            schema_dict[question.name] = question\n\n        for vector in self.vectors:\n            schema_dict[vector.name] = vector\n\n        for metadata in self.metadata:\n            schema_dict[metadata.name] = metadata\n\n        return schema_dict\n\n    @cached_property\n    def schema_by_id(self) -> Dict[UUID, Union[TextField, QuestionType, MetadataType, VectorField]]:\n        return {v.id: v for v in self.schema.values()}\n\n    def validate(self) -> None:\n        self._validate_empty_settings()\n        self._validate_duplicate_names()\n\n    #####################\n    #  Public methods   #\n    #####################\n\n    def get(self) -> \"Settings\":\n        self.fields = self._fetch_fields()\n        self.questions = self._fetch_questions()\n        self.vectors = self._fetch_vectors()\n        self.metadata = self._fetch_metadata()\n        self.__fetch_dataset_related_attributes()\n\n        self._update_last_api_call()\n        return self\n\n    def create(self) -> \"Settings\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.create()\n        self.__questions.create()\n        self.__vectors.create()\n        self.__metadata.create()\n\n        self._update_last_api_call()\n        return self\n\n    def update(self) -> \"Resource\":\n        self.validate()\n\n        self._update_dataset_related_attributes()\n        self.__fields.update()\n        self.__vectors.update()\n        self.__metadata.update()\n        # self.questions.update()\n\n        self._update_last_api_call()\n        return self\n\n    def serialize(self):\n        try:\n            return {\n                \"guidelines\": self.guidelines,\n                \"questions\": self.__questions.serialize(),\n                \"fields\": self.__fields.serialize(),\n                \"vectors\": self.vectors.serialize(),\n                \"metadata\": self.metadata.serialize(),\n                \"allow_extra_metadata\": self.allow_extra_metadata,\n                \"distribution\": self.distribution.to_dict(),\n            }\n        except Exception as e:\n            raise ArgillaSerializeError(f\"Failed to serialize the settings. 
{e.__class__.__name__}\") from e\n\n    def to_json(self, path: Union[Path, str]) -> None:\n        \"\"\"Save the settings to a file on disk\n\n        Parameters:\n            path (str): The path to save the settings to\n        \"\"\"\n        if not isinstance(path, Path):\n            path = Path(path)\n        if path.exists():\n            raise FileExistsError(f\"File {path} already exists\")\n        with open(path, \"w\") as file:\n            json.dump(self.serialize(), file)\n\n    @classmethod\n    def from_json(cls, path: Union[Path, str]) -> \"Settings\":\n        \"\"\"Load the settings from a file on disk\"\"\"\n\n        with open(path, \"r\") as file:\n            settings_dict = json.load(file)\n            return cls._from_dict(settings_dict)\n\n    def __eq__(self, other: \"Settings\") -> bool:\n        return self.serialize() == other.serialize()  # TODO: Create proper __eq__ methods for fields and questions\n\n    #####################\n    #  Repr Methods     #\n    #####################\n\n    def __repr__(self) -> str:\n        return (\n            f\"Settings(guidelines={self.guidelines}, allow_extra_metadata={self.allow_extra_metadata}, \"\n            f\"distribution={self.distribution}, \"\n            f\"fields={self.fields}, questions={self.questions}, vectors={self.vectors}, metadata={self.metadata})\"\n        )\n\n    #####################\n    #  Private methods  #\n    #####################\n\n    @classmethod\n    def _from_dict(cls, settings_dict: dict) -> \"Settings\":\n        fields = settings_dict.get(\"fields\", [])\n        vectors = settings_dict.get(\"vectors\", [])\n        metadata = settings_dict.get(\"metadata\", [])\n        guidelines = settings_dict.get(\"guidelines\")\n        distribution = settings_dict.get(\"distribution\")\n        allow_extra_metadata = settings_dict.get(\"allow_extra_metadata\")\n\n        questions = [question_from_dict(question) for question in settings_dict.get(\"questions\", [])]\n        fields = [TextField.from_dict(field) for field in fields]\n        vectors = [VectorField.from_dict(vector) for vector in vectors]\n        metadata = [MetadataField.from_dict(metadata) for metadata in metadata]\n\n        if distribution:\n            distribution = TaskDistribution.from_dict(distribution)\n\n        return cls(\n            questions=questions,\n            fields=fields,\n            vectors=vectors,\n            metadata=metadata,\n            guidelines=guidelines,\n            allow_extra_metadata=allow_extra_metadata,\n            distribution=distribution,\n        )\n\n    def _copy(self) -> \"Settings\":\n        instance = self.__class__._from_dict(self.serialize())\n        return instance\n\n    def _fetch_fields(self) -> List[TextField]:\n        models = self._client.api.fields.list(dataset_id=self._dataset.id)\n        return [TextField.from_model(model) for model in models]\n\n    def _fetch_questions(self) -> List[QuestionType]:\n        models = self._client.api.questions.list(dataset_id=self._dataset.id)\n        return [question_from_model(model) for model in models]\n\n    def _fetch_vectors(self) -> List[VectorField]:\n        models = self.dataset._client.api.vectors.list(self.dataset.id)\n        return [VectorField.from_model(model) for model in models]\n\n    def _fetch_metadata(self) -> List[MetadataType]:\n        models = self._client.api.metadata.list(dataset_id=self._dataset.id)\n        return [MetadataField.from_model(model) for model in models]\n\n    def 
__fetch_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = self._client.api.datasets.get(self._dataset.id)\n\n        self.guidelines = dataset_model.guidelines\n        self.allow_extra_metadata = dataset_model.allow_extra_metadata\n\n        if dataset_model.distribution:\n            self.distribution = TaskDistribution.from_model(dataset_model.distribution)\n\n    def _update_dataset_related_attributes(self):\n        # This flow may be a bit weird, but it's the only way to update the dataset related attributes\n        # Everything is point that we should have several settings-related endpoints in the API to handle this.\n        # POST /api/v1/datasets/{dataset_id}/settings\n        # {\n        #   \"guidelines\": ....,\n        #   \"allow_extra_metadata\": ....,\n        # }\n        # But this is not implemented yet, so we need to update the dataset model directly\n        dataset_model = DatasetModel(\n            id=self._dataset.id,\n            name=self._dataset.name,\n            guidelines=self.guidelines,\n            allow_extra_metadata=self.allow_extra_metadata,\n            distribution=self.distribution._api_model(),\n        )\n        self._client.api.datasets.update(dataset_model)\n\n    def _validate_empty_settings(self):\n        if not all([self.fields, self.questions]):\n            message = \"Fields and questions are required\"\n            raise SettingsError(message=message)\n\n    def _validate_duplicate_names(self) -> None:\n        dataset_properties_by_name = {}\n\n        for properties in [self.fields, self.questions, self.vectors, self.metadata]:\n            for property in properties:\n                if property.name in dataset_properties_by_name:\n                    raise SettingsError(\n                        f\"names of dataset settings must be unique, \"\n                        f\"but the name {property.name!r} is used by {type(property).__name__!r} and {type(dataset_properties_by_name[property.name]).__name__!r} \"\n                    )\n                dataset_properties_by_name[property.name] = property\n\n    def __process_guidelines(self, guidelines):\n        if guidelines is None:\n            return guidelines\n\n        if not isinstance(guidelines, str):\n            raise SettingsError(\"Guidelines must be a string or a path to a file\")\n\n        if os.path.exists(guidelines):\n            with open(guidelines, \"r\") as file:\n                return file.read()\n\n        return guidelines\n
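
    One detail worth noting from __process_guidelines in the source above: guidelines may be passed either as a literal string or as a path to a file on disk; when the path exists, the file contents are read and used as the guidelines. A minimal sketch (the file name is illustrative):

    settings = rg.Settings(\n    guidelines=\"guidelines.md\",  # read from disk if this path exists\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.TextQuestion(name=\"response\")],\n)\n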
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.__init__","title":"__init__(fields=None, questions=None, vectors=None, metadata=None, guidelines=None, allow_extra_metadata=False, distribution=None, _dataset=None)","text":"

    Parameters:

    Name Type Description Default fields List[TextField]

    A list of TextField objects that represent the fields in the Dataset.

    None questions List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]

    A list of Question objects that represent the questions in the Dataset.

    None vectors List[VectorField]

    A list of VectorField objects that represent the vectors in the Dataset.

    None metadata List[MetadataField]

    A list of MetadataField objects that represent the metadata in the Dataset.

    None guidelines str

    A string containing the guidelines for the Dataset.

    None allow_extra_metadata bool

    A boolean that determines whether or not extra metadata is allowed in the Dataset. Defaults to False.

    False distribution TaskDistribution

    The annotation task distribution configuration. Defaults to DEFAULT_TASK_DISTRIBUTION.

    None Source code in src/argilla/settings/_resource.py
    def __init__(\n    self,\n    fields: Optional[List[TextField]] = None,\n    questions: Optional[List[QuestionType]] = None,\n    vectors: Optional[List[VectorField]] = None,\n    metadata: Optional[List[MetadataType]] = None,\n    guidelines: Optional[str] = None,\n    allow_extra_metadata: bool = False,\n    distribution: Optional[TaskDistribution] = None,\n    _dataset: Optional[\"Dataset\"] = None,\n) -> None:\n    \"\"\"\n    Args:\n        fields (List[TextField]): A list of TextField objects that represent the fields in the Dataset.\n        questions (List[Union[LabelQuestion, MultiLabelQuestion, RankingQuestion, TextQuestion, RatingQuestion]]):\n            A list of Question objects that represent the questions in the Dataset.\n        vectors (List[VectorField]): A list of VectorField objects that represent the vectors in the Dataset.\n        metadata (List[MetadataField]): A list of MetadataField objects that represent the metadata in the Dataset.\n        guidelines (str): A string containing the guidelines for the Dataset.\n        allow_extra_metadata (bool): A boolean that determines whether or not extra metadata is allowed in the\n            Dataset. Defaults to False.\n        distribution (TaskDistribution): The annotation task distribution configuration.\n            Default to DEFAULT_TASK_DISTRIBUTION\n    \"\"\"\n    super().__init__(client=_dataset._client if _dataset else None)\n\n    self._dataset = _dataset\n    self._distribution = distribution\n    self.__guidelines = self.__process_guidelines(guidelines)\n    self.__allow_extra_metadata = allow_extra_metadata\n\n    self.__questions = QuestionsProperties(self, questions)\n    self.__fields = SettingsProperties(self, fields)\n    self.__vectors = SettingsProperties(self, vectors)\n    self.__metadata = SettingsProperties(self, metadata)\n
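
    Pulling the parameters above together, a fuller settings object might look like this (all names and values are illustrative):

    settings = rg.Settings(\n    guidelines=\"Classify the text by sentiment.\",\n    fields=[rg.TextField(name=\"text\")],\n    questions=[rg.LabelQuestion(name=\"label\", labels=[\"positive\", \"negative\"])],\n    vectors=[rg.VectorField(name=\"embedding\", dimensions=384)],\n    metadata=[rg.TermsMetadataProperty(name=\"group\", options=[\"a\", \"b\"])],\n    allow_extra_metadata=True,\n    distribution=rg.TaskDistribution(min_submitted=2),\n)\n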
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.to_json","title":"to_json(path)","text":"

    Save the settings to a file on disk

    Parameters:

    Name Type Description Default path str

    The path to save the settings to

    required Source code in src/argilla/settings/_resource.py
    def to_json(self, path: Union[Path, str]) -> None:\n    \"\"\"Save the settings to a file on disk\n\n    Parameters:\n        path (str): The path to save the settings to\n    \"\"\"\n    if not isinstance(path, Path):\n        path = Path(path)\n    if path.exists():\n        raise FileExistsError(f\"File {path} already exists\")\n    with open(path, \"w\") as file:\n        json.dump(self.serialize(), file)\n
    "},{"location":"reference/argilla/settings/settings/#src.argilla.settings._resource.Settings.from_json","title":"from_json(path) classmethod","text":"

    Load the settings from a file on disk

    Source code in src/argilla/settings/_resource.py
    @classmethod\ndef from_json(cls, path: Union[Path, str]) -> \"Settings\":\n    \"\"\"Load the settings from a file on disk\"\"\"\n\n    with open(path, \"r\") as file:\n        settings_dict = json.load(file)\n        return cls._from_dict(settings_dict)\n
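Taken together, to_json and from_json give a simple way to persist and restore dataset settings. A minimal sketch, assuming a settings object like the ones created elsewhere in this reference and that settings.json does not yet exist:

import argilla as rg

settings = rg.Settings(guidelines="These are some guidelines.")
settings.to_json("settings.json")  # raises FileExistsError if the file already exists

restored = rg.Settings.from_json("settings.json")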
    "},{"location":"reference/argilla/settings/task_distribution/","title":"Distribution","text":"

Distribution settings define the criteria that Argilla uses to automatically manage records in a dataset, based on the expected number of submitted responses per record.

    "},{"location":"reference/argilla/settings/task_distribution/#usage-examples","title":"Usage Examples","text":"

The default minimum number of submitted responses per record is 1. If you wish to increase this value, you can define it through the TaskDistribution class and pass it to the Settings class.

    settings = rg.Settings(\n    guidelines=\"These are some guidelines.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"label\",\n            labels=[\"label_1\", \"label_2\", \"label_3\"]\n        ),\n    ],\n    distribution=rg.TaskDistribution(min_submitted=3)\n)\n\ndataset = rg.Dataset(\n    name=\"my_dataset\",\n    settings=settings\n)\n
    "},{"location":"reference/argilla/settings/task_distribution/#src.argilla.settings._task_distribution.OverlapTaskDistribution","title":"OverlapTaskDistribution","text":"

    The task distribution settings class.

This task distribution defines the number of submitted responses required to complete a record.

    Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min_submitted | int | The minimum number of submitted responses required to complete the record. | required |

Source code in src/argilla/settings/_task_distribution.py
    class OverlapTaskDistribution:\n    \"\"\"The task distribution settings class.\n\n    This task distribution defines a number of submitted responses required to complete a record.\n\n    Parameters:\n        min_submitted (int): The number of min. submitted responses to complete the record\n    \"\"\"\n\n    strategy: Literal[\"overlap\"] = \"overlap\"\n\n    def __init__(self, min_submitted: int):\n        self._model = OverlapTaskDistributionModel(min_submitted=min_submitted, strategy=self.strategy)\n\n    def __repr__(self) -> str:\n        return f\"OverlapTaskDistribution(min_submitted={self.min_submitted})\"\n\n    def __eq__(self, other) -> bool:\n        if not isinstance(other, self.__class__):\n            return False\n\n        return self._model == other._model\n\n    @classmethod\n    def default(cls) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=1)\n\n    @property\n    def min_submitted(self):\n        return self._model.min_submitted\n\n    @min_submitted.setter\n    def min_submitted(self, value: int):\n        self._model.min_submitted = value\n\n    @classmethod\n    def from_model(cls, model: OverlapTaskDistributionModel) -> \"OverlapTaskDistribution\":\n        return cls(min_submitted=model.min_submitted)\n\n    @classmethod\n    def from_dict(cls, dict: Dict[str, Any]) -> \"OverlapTaskDistribution\":\n        return cls.from_model(OverlapTaskDistributionModel.model_validate(dict))\n\n    def to_dict(self):\n        return self._model.model_dump()\n\n    def _api_model(self) -> OverlapTaskDistributionModel:\n        return self._model\n
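As a quick illustration of this class (a sketch, assuming rg.TaskDistribution is the exported alias of OverlapTaskDistribution, as the usage example above suggests):

import argilla as rg

dist = rg.TaskDistribution(min_submitted=2)
dist.min_submitted = 3  # the property setter updates the underlying model

print(dist)            # OverlapTaskDistribution(min_submitted=3)
print(dist.to_dict())  # e.g. {'strategy': 'overlap', 'min_submitted': 3}
print(rg.TaskDistribution.default())  # OverlapTaskDistribution(min_submitted=1)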
    "},{"location":"reference/argilla/settings/vectors/","title":"Vectors","text":"

Vector fields in Argilla are used to store the vector (embedding) representation of a record that will be reviewed by a user.

    "},{"location":"reference/argilla/settings/vectors/#usage-examples","title":"Usage Examples","text":"

    To define a vector field, instantiate the VectorField class with a name and dimensions, then pass it to the vectors parameter of the Settings class.

settings = rg.Settings(\n    fields=[\n        rg.TextField(name=\"text\"),\n    ],\n    vectors=[\n        rg.VectorField(\n            name=\"my_vector\",\n            dimensions=768,\n            title=\"Document Embedding\",\n        ),\n    ],\n)\n

    To add records with vectors, refer to the rg.Vector class documentation.
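For orientation, here is a minimal sketch of logging a record whose vector matches the my_vector field defined above; the embedding values are placeholders, and dataset is assumed to be a Dataset created with these settings:

record = rg.Record(
    fields={"text": "Hello world, I am a vector record!"},
    vectors={"my_vector": [0.1] * 768},  # placeholder 768-dimensional embedding
)
dataset.records.log([record])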

    "},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField","title":"VectorField","text":"

    Bases: Resource

    Vector field for use in Argilla Dataset Settings

    Source code in src/argilla/settings/_vector.py
    class VectorField(Resource):\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\"\"\"\n\n    _model: VectorFieldModel\n    _api: VectorsAPI\n    _dataset: Optional[\"Dataset\"]\n\n    def __init__(\n        self,\n        name: str,\n        dimensions: int,\n        title: Optional[str] = None,\n        _client: Optional[\"Argilla\"] = None,\n    ) -> None:\n        \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n        Parameters:\n            name (str): The name of the vector field\n            dimensions (int): The number of dimensions in the vector\n            title (Optional[str]): The title of the vector to be shown in the UI.\n        \"\"\"\n        client = _client or Argilla._get_default()\n        super().__init__(api=client.api.vectors, client=client)\n        self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n        self._dataset = None\n\n    @property\n    def name(self) -> str:\n        return self._model.name\n\n    @name.setter\n    def name(self, value: str) -> None:\n        self._model.name = value\n\n    @property\n    def title(self) -> Optional[str]:\n        return self._model.title\n\n    @title.setter\n    def title(self, value: Optional[str]) -> None:\n        self._model.title = value\n\n    @property\n    def dimensions(self) -> int:\n        return self._model.dimensions\n\n    @dimensions.setter\n    def dimensions(self, value: int) -> None:\n        self._model.dimensions = value\n\n    @property\n    def dataset(self) -> \"Dataset\":\n        return self._dataset\n\n    @dataset.setter\n    def dataset(self, value: \"Dataset\") -> None:\n        self._dataset = value\n        self._model.dataset_id = self._dataset.id\n        self._with_client(self._dataset._client)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(name={self.name}, title={self.title}, dimensions={self.dimensions})\"\n\n    @classmethod\n    def from_model(cls, model: VectorFieldModel) -> \"VectorField\":\n        instance = cls(name=model.name, dimensions=model.dimensions)\n        instance._model = model\n\n        return instance\n\n    @classmethod\n    def from_dict(cls, data: dict) -> \"VectorField\":\n        model = VectorFieldModel(**data)\n        return cls.from_model(model=model)\n\n    def _with_client(self, client: \"Argilla\") -> \"VectorField\":\n        # TODO: Review and simplify. Maybe only one of them is required\n        self._client = client\n        self._api = self._client.api.vectors\n\n        return self\n
    "},{"location":"reference/argilla/settings/vectors/#src.argilla.settings._vector.VectorField.__init__","title":"__init__(name, dimensions, title=None, _client=None)","text":"

    Vector field for use in Argilla Dataset Settings

    Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name of the vector field. | required |
| dimensions | int | The number of dimensions in the vector. | required |
| title | Optional[str] | The title of the vector to be shown in the UI. | None |

Source code in src/argilla/settings/_vector.py
    def __init__(\n    self,\n    name: str,\n    dimensions: int,\n    title: Optional[str] = None,\n    _client: Optional[\"Argilla\"] = None,\n) -> None:\n    \"\"\"Vector field for use in Argilla `Dataset` `Settings`\n\n    Parameters:\n        name (str): The name of the vector field\n        dimensions (int): The number of dimensions in the vector\n        title (Optional[str]): The title of the vector to be shown in the UI.\n    \"\"\"\n    client = _client or Argilla._get_default()\n    super().__init__(api=client.api.vectors, client=client)\n    self._model = VectorFieldModel(name=name, title=title, dimensions=dimensions)\n    self._dataset = None\n
    "},{"location":"reference/argilla-server/configuration/","title":"Server configuration","text":"

    This section explains advanced operations and settings for running the Argilla Server and Argilla Python Client.

    By default, the Argilla Server will look for your Elasticsearch (ES) endpoint at http://localhost:9200. You can customize this by setting the ARGILLA_ELASTICSEARCH environment variable. Have a look at the list of available environment variables to further configure the Argilla server.

From Argilla version 1.19.0 onwards, you must set up the search engine manually for datasets to work. Set the environment variable ARGILLA_SEARCH_ENGINE=opensearch or ARGILLA_SEARCH_ENGINE=elasticsearch, depending on the backend you're using. The default value for this variable is elasticsearch. The minimal supported version is 8.5.0 for Elasticsearch and 2.4.0 for OpenSearch. Please review your backend and upgrade it if necessary.

    Warning

For vector search in OpenSearch, filtering is applied as a post_filter step, since there is a bug that makes queries combining filtering with knn fail from Argilla. See https://github.com/opensearch-project/k-NN/issues/1286

This may lead to unexpected results when combining filtering with vector search on this engine.

    "},{"location":"reference/argilla-server/configuration/#launching","title":"Launching","text":""},{"location":"reference/argilla-server/configuration/#using-a-proxy","title":"Using a proxy","text":"

If you run Argilla behind a proxy that adds an extra path prefix to expose the service, you should set the ARGILLA_BASE_URL environment variable so that requests are properly routed to the server application.

    For example, if your proxy exposes Argilla in the URL https://my-proxy/custom-path-for-argilla, you should launch the Argilla server with ARGILLA_BASE_URL=/custom-path-for-argilla.
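A client would then connect through that proxied path. A minimal sketch, reusing the example proxy URL above (replace the API key with your own):

import argilla as rg

client = rg.Argilla(
    api_url="https://my-proxy/custom-path-for-argilla",
    api_key="owner.apikey",
)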

    NGINX and Traefik have been tested and are known to work with Argilla:

    • NGINX example
    • Traefik example
    "},{"location":"reference/argilla-server/configuration/#environment-variables","title":"Environment variables","text":"

    You can set the following environment variables to further configure your server and client.

    "},{"location":"reference/argilla-server/configuration/#server","title":"Server","text":""},{"location":"reference/argilla-server/configuration/#fastapi","title":"FastAPI","text":"
• ARGILLA_HOME_PATH: The directory where Argilla will store all the files needed to run. If the path doesn't exist, it will be created automatically (Default: ~/.argilla).

    • ARGILLA_BASE_URL: If you want to launch the Argilla server in a specific base path other than /, you should set up this environment variable. This can be useful when running Argilla behind a proxy that adds a prefix path to route the service (Default: \"/\").

    • ARGILLA_CORS_ORIGINS: List of host patterns for CORS origin access.

• ARGILLA_DOCS_ENABLED: If False, disables the OpenAPI docs endpoint at /api/docs.

    • ARGILLA_ENABLE_TELEMETRY: If False, disables telemetry for usage metrics.

    "},{"location":"reference/argilla-server/configuration/#authentication","title":"Authentication","text":"
• ARGILLA_AUTH_SECRET_KEY: The secret key used to sign the API token data. You can use openssl rand -hex 32 to generate a 32-character string for this environment variable. By default, a random value is generated, so if you are using more than one server worker (or more than one Argilla server), you will need to set the same value for all of them.
    "},{"location":"reference/argilla-server/configuration/#database","title":"Database","text":"
• ARGILLA_DATABASE_URL: A URL string that contains the necessary information to connect to a database. Argilla uses SQLite by default; PostgreSQL is also officially supported (Default: sqlite:///$ARGILLA_HOME_PATH/argilla.db?check_same_thread=False).
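For example, a PostgreSQL connection URL typically takes the form postgresql://username:password@host:5432/argilla, where the credentials, host, port, and database name are placeholders for your own values.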
    "},{"location":"reference/argilla-server/configuration/#sqlite","title":"SQLite","text":"

    The following environment variables are useful only when SQLite is used:

• ARGILLA_DATABASE_SQLITE_TIMEOUT: How many seconds the connection should wait before raising an OperationalError when a table is locked. If another connection opens a transaction to modify a table, that table will be locked until the transaction is committed (Default: 15 seconds).
    "},{"location":"reference/argilla-server/configuration/#postgresql","title":"PostgreSQL","text":"

    The following environment variables are useful only when PostgreSQL is used:

    • ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE: The number of connections to keep open inside the database connection pool (Default: 15).

• ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW: The number of connections that can be opened beyond the ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE setting (Default: 10).

    "},{"location":"reference/argilla-server/configuration/#search-engine","title":"Search engine","text":"
    • ARGILLA_ELASTICSEARCH: URL of the connection endpoint of the Elasticsearch instance (Default: http://localhost:9200).

    • ARGILLA_SEARCH_ENGINE: Search engine to use. Valid values are \"elasticsearch\" and \"opensearch\" (Default: \"elasticsearch\").

    • ARGILLA_ELASTICSEARCH_SSL_VERIFY: If \"False\", disables SSL certificate verification when connecting to the Elasticsearch backend.

    • ARGILLA_ELASTICSEARCH_CA_PATH: Path to CA cert for ES host. For example: /full/path/to/root-ca.pem (Optional)

    "},{"location":"reference/argilla-server/configuration/#datasets","title":"Datasets","text":"
• ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for label and multi-label questions (Default: 500).

• ARGILLA_SPAN_OPTIONS_MAX_ITEMS: Sets the maximum number of items allowed for span questions (Default: 500).

    "},{"location":"reference/argilla-server/configuration/#hugging-face","title":"Hugging Face","text":"
• ARGILLA_SHOW_HUGGINGFACE_SPACE_PERSISTENT_STORAGE_WARNING: When Argilla is running on Hugging Face Spaces, you can use this environment variable to disable the warning message shown when persistent storage is disabled for the space (Default: true).
    "},{"location":"reference/argilla-server/configuration/#docker-images-only","title":"Docker images only","text":"
• REINDEX_DATASET: If true or 1, the datasets will be reindexed in the search engine. This is needed when the search configuration has changed or the data must be refreshed (Default: 0).

• USERNAME: If provided, the owner username. This can be combined with HF OAuth to define the Argilla server owner (Default: "").

    • PASSWORD: If provided, the owner password. If USERNAME and PASSWORD are provided, the owner user will be created with these credentials on the server startup (Default: \"\").

• API_KEY: The API key for the default user. If API_KEY is not provided, a new random API key will be generated (Default: "").

    "},{"location":"reference/argilla-server/configuration/#rest-api-docs","title":"REST API docs","text":"

    FastAPI also provides beautiful REST API docs that you can check at http://localhost:6900/api/v1/docs.

    "},{"location":"reference/argilla-server/telemetry/","title":"Server Telemetry","text":"

Argilla uses telemetry to report anonymous usage and error information. As open-source software, this type of information is important for understanding how the product is used and for improving it.

    "},{"location":"reference/argilla-server/telemetry/#how-to-opt-out","title":"How to opt out","text":"

You can opt out of telemetry reporting by setting the environment variable ARGILLA_ENABLE_TELEMETRY before launching the server. Setting this variable to 0 completely disables telemetry reporting.

If you are a Linux/macOS user, you should run:

    export ARGILLA_ENABLE_TELEMETRY=0\n

    If you are a Windows user, you should run:

    set ARGILLA_ENABLE_TELEMETRY=0\n

    To opt in again, you can set the variable to 1.

    "},{"location":"reference/argilla-server/telemetry/#why-reporting-telemetry","title":"Why reporting telemetry","text":"

    Anonymous telemetry information enables us to continuously improve the product and detect recurring problems to better serve all users. We collect aggregated information about general usage and errors. We do NOT collect any information on users' data records, datasets, or metadata information.

    "},{"location":"reference/argilla-server/telemetry/#sensitive-data","title":"Sensitive data","text":"

We do not collect any information related to the source data you store in Argilla. We do not identify individual users. Your data does not leave your server at any time:

    • No dataset record is collected.
    • No dataset names or metadata are collected.
    "},{"location":"reference/argilla-server/telemetry/#information-reported","title":"Information reported","text":"

    The following usage and error information is reported:

    • The code of the raised error and the entity type related to the error, if any (Dataset, Workspace,...)
• The user-agent and accept-language HTTP headers
    • Task name and number of records for bulk operations
    • An anonymous generated user uuid
    • The Argilla version running the server
    • The Python version, e.g. 3.8.13
    • The system/OS name, such as Linux, Darwin, Windows
• The system's release version, e.g. Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020
    • The machine type, e.g. AMD64
• The underlying platform spec with as much useful information as possible (e.g. macOS-10.16-x86_64-i386-64bit)
    • The type of deployment: huggingface_space or server
    • The dockerized deployment flag: True or False

    For transparency, you can inspect the source code where this is performed here.

    If you have any doubts, don't hesitate to join our Discord channel or open a GitHub issue. We'd be very happy to discuss how we can improve this.

    "},{"location":"tutorials/","title":"Tutorials","text":"

    These are the tutorials for the Argilla SDK. They provide step-by-step instructions for common tasks.

    • Text classification

      Learn about a standard workflow to improve data quality for a text classification task. Tutorial

    • Token classification

      Learn about a standard workflow to improve data quality for a token classification task. Tutorial

    "},{"location":"tutorials/text_classification/","title":"Text classification","text":"

    In this tutorial, we will show a standard workflow for a text classification task, in this case, using SetFit and Argilla.

    We will follow these steps:

    • Configure the Argilla dataset
    • Add initial model suggestions
    • Evaluate with Argilla
    • Train your model
    • Update the suggestions with the new model

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

    To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

    !pip install argilla\n
    !pip install setfit==1.0.3 transformers==4.40.2\n

    Let's make the required imports:

    import argilla as rg\n\nfrom datasets import load_dataset, Dataset\nfrom setfit import SetFitModel, Trainer, get_templated_dataset, sample_dataset\n

    You also need to connect to the Argilla server using the api_url and api_key.

    # Replace api_url with your url if using Docker\n# Replace api_key if you configured a custom API key\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"owner.apikey\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

    Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a label question.

    Note

Check this how-to guide to learn more about configuring and creating a dataset.

    labels = [\"positive\", \"negative\"]\n\nsettings = rg.Settings(\n    guidelines=\"Classify the reviews as positive or negative.\",\n    fields=[\n        rg.TextField(\n            name=\"review\",\n            title=\"Text from the review\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.LabelQuestion(\n            name=\"sentiment_label\",\n            title=\"In which category does this article fit?\",\n            labels=labels,\n        )\n    ],\n)\n

    Let's create the dataset with the name and the defined settings:

    dataset = rg.Dataset(\n    name=\"text_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

Although we have created the dataset, it still lacks the data to be annotated (you can check this in the UI). We will use the imdb dataset from the Hugging Face Hub. Specifically, we will use 100 samples from the train split.

    hf_dataset = load_dataset(\"imdb\", split=\"train[:100]\")\n

We can easily add them to the dataset using log and a mapping, where we indicate that the column text contains the data that should be added to the field review.

    dataset.records.log(records=hf_dataset, mapping={\"text\": \"review\"})\n

    The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a zero-shot SetFit model. However, you can use a framework or technique of your choice.

    We will start by defining an example training set with the required labels: positive and negative. Using get_templated_dataset will create sentences from the default template: \"This sentence is {label}.\"

    zero_ds = get_templated_dataset(\n    candidate_labels=labels,\n    sample_size=8,\n)\n

    Now, we will prepare a function to train the SetFit model.

    Note

    For further customization, you can check the SetFit documentation.

    def train_model(model_name, dataset):\n    model = SetFitModel.from_pretrained(model_name)\n\n    trainer = Trainer(\n        model=model,\n        train_dataset=dataset,\n    )\n\n    trainer.train()\n\n    return model\n

Let's train the model. We will use TaylorAI/bge-micro-v2, available on the Hugging Face Hub.

    model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=zero_ds)\n

You can save it locally or push it to the Hub, and then load it from there.

    # Save and load locally\n# model.save_pretrained(\"text_classification_model\")\n# model = SetFitModel.from_pretrained(\"text_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/text_classification_model\")\n# model = SetFitModel.from_pretrained(\"[username]/text_classification_model\")\n

    It's time to make the predictions! We will set a function that uses the predict method to get the suggested label. The model will infer the label based on the text.

    def predict(model, input, labels):\n    model.labels = labels\n\n    prediction = model.predict([input])\n\n    return prediction[0]\n

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it is the identifier used to update an existing record rather than create a new one.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voilà! We have added the suggestions to the dataset, and they will appear in the UI marked with a ✨.

    Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

    Note

Check this how-to guide to learn more about annotating in the UI.

    After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using SetFit. However, you can select the one that best fits your requirements. So, let's start by retrieving the annotated records.

    Note

Check this how-to guide to learn more about filtering and querying in Argilla.

    dataset = client.datasets(\"text_classification_dataset\")\n
    status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

As we have a single response per record, we can retrieve the selected label directly and create a training set with 8 samples per label, which gives us a balanced dataset for few-shot learning.

    train_records = [\n    {\n        \"text\": r[\"review\"],\n        \"label\": r[\"sentiment_label.responses\"][0],\n    }\n    for r in submitted\n]\ntrain_dataset = Dataset.from_list(train_records)\ntrain_dataset = sample_dataset(train_dataset, label_column=\"label\", num_samples=8)\n

    We can train the model using our previous function, but this time with a high-quality human-annotated training set.

    model = train_model(model_name=\"TaylorAI/bge-micro-v2\", dataset=train_dataset)\n

As the training data is of better quality, we can expect a better model, so we can update the remaining non-annotated records with the new model's suggestions.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"sentiment_label\": predict(model, sample[\"review\"], labels),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    In this tutorial, we present an end-to-end example of a text classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

    We started by configuring the dataset, adding records, and training a zero-shot SetFit model, as an example, to add suggestions. After the annotation process, we trained a new model with the annotated data and updated the remaining records with the new suggestions.

    "},{"location":"tutorials/text_classification/#text-classification","title":"Text classification","text":""},{"location":"tutorials/text_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/text_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/text_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/text_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/text_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/text_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/text_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/text_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/text_classification/#conclusions","title":"Conclusions","text":""},{"location":"tutorials/token_classification/","title":"Token classification","text":"

    In this tutorial, we will show a standard workflow for a token classification task, in this case, using GLiNER, SpanMarker and Argilla.

    We will follow these steps:

    • Configure the Argilla dataset
    • Add initial model suggestions
    • Evaluate with Argilla
    • Train your model
    • Update the suggestions with the new model

If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following this guide.

    To complete this tutorial, you need to install the Argilla SDK and a few third-party libraries via pip.

    !pip install argilla\n
    !pip install gliner==0.2.6 transformers==4.40.2 span_marker==1.5.0\n

    Let's make the needed imports:

    import re\n\nimport argilla as rg\n\nfrom datasets import load_dataset, Dataset, DatasetDict\nfrom gliner import GLiNER\nfrom span_marker import SpanMarkerModel, Trainer\nfrom transformers import TrainingArguments\n

    You also need to connect to the Argilla server with the api_url and api_key.

    # Replace api_url with your url if using Docker\n# Replace api_key if you configured a custom API key\n# Uncomment the last line and set your HF_TOKEN if your space is private\nclient = rg.Argilla(\n    api_url=\"https://[your-owner-name]-[your_space_name].hf.space\",\n    api_key=\"owner.apikey\",\n    # headers={\"Authorization\": f\"Bearer {HF_TOKEN}\"}\n)\n

Now, we will need to configure the dataset. In the settings, we can specify the guidelines, fields, and questions. If needed, you can also add metadata and vectors. However, for our use case, we just need a text field and a span question. We will focus on Named Entity Recognition, but this workflow can also be applied to Span Classification, which differs in that the spans are less clearly defined and often overlap.

    labels = [\n    \"CARDINAL\",\n    \"DATE\",\n    \"PERSON\",\n    \"NORP\",\n    \"GPE\",\n    \"LAW\",\n    \"PERCENT\",\n    \"ORDINAL\",\n    \"MONEY\",\n    \"WORK_OF_ART\",\n    \"FAC\",\n    \"TIME\",\n    \"QUANTITY\",\n    \"PRODUCT\",\n    \"LANGUAGE\",\n    \"ORG\",\n    \"LOC\",\n    \"EVENT\",\n]\n\nsettings = rg.Settings(\n    guidelines=\"Classify individual tokens according to the specified categories, ensuring that any overlapping or nested entities are accurately captured.\",\n    fields=[\n        rg.TextField(\n            name=\"text\",\n            title=\"Text\",\n            use_markdown=False,\n        ),\n    ],\n    questions=[\n        rg.SpanQuestion(\n            name=\"span_label\",\n            field=\"text\",\n            labels=labels,\n            title=\"Classify the tokens according to the specified categories.\",\n            allow_overlapping=False,\n        )\n    ],\n)\n

    Let's create the dataset with the name and the defined settings:

    dataset = rg.Dataset(\n    name=\"token_classification_dataset\",\n    settings=settings,\n)\ndataset.create()\n

We have created the dataset (you can check it in the UI), but we still need to add the data for annotation. In this case, we will use the ontonotes5 dataset from the Hugging Face Hub. Specifically, we will use 2100 samples from the test split.

    hf_dataset = load_dataset(\"tner/ontonotes5\", split=\"test[:2100]\")\n

    We will iterate over the Hugging Face dataset, adding data to the corresponding field in the Record object for the Argilla dataset. Then, we will easily add them to the dataset using log.

    records = [rg.Record(fields={\"text\": \" \".join(row[\"tokens\"])}) for row in hf_dataset]\n\ndataset.records.log(records)\n

    The next step is to add suggestions to the dataset. This will make things easier and faster for the annotation team. Suggestions will appear as preselected options, so annotators will only need to correct them. In our case, we will generate them using a GLiNER model. However, you can use a framework or technique of your choice.

    Note

    For further information, you can check the GLiNER repository and the original paper.

We will start by loading the pre-trained GLiNER model. Specifically, we will use gliner_mediumv2, available on the Hugging Face Hub.

    gliner_model = GLiNER.from_pretrained(\"urchade/gliner_mediumv2.1\")\n

Next, we will create a function to generate predictions using this general model, which can identify the specified labels without being pre-trained on them. The function will return a list of dictionaries in the schema required to add entities to our Argilla dataset: the keys 'start' and 'end' indicate the indices where the span begins and ends, and 'label' holds the entity label.

    def predict_gliner(model, text, labels, threshold):\n    entities = model.predict_entities(text, labels, threshold)\n    return [\n        {k: v for k, v in ent.items() if k not in {\"score\", \"text\"}} for ent in entities\n    ]\n

To update the records, we will need to retrieve them from the server and update them with the new suggestions. The id must always be provided, as it is the identifier used to update an existing record rather than create a new one.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_gliner(\n            model=gliner_model, text=sample[\"text\"], labels=labels, threshold=0.70\n        ),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

Voilà! We have added the suggestions to the dataset, and they will appear in the UI marked with a ✨.

    Now, we can start the annotation process. Just open the dataset in the Argilla UI and start annotating the records. If the suggestions are correct, you can just click on Submit. Otherwise, you can select the correct label.

    Note

Check this how-to guide to learn more about annotating in the UI.

    After the annotation, we will have a robust dataset to train our model for entity recognition. For our case, we will train a SpanMarker model, but you can select any model of your choice. So, let's start by retrieving the annotated records.

    Note

    Check this how-to guide to learn more about filtering and querying in Argilla.

    dataset = client.datasets(\"token_classification_dataset\")\n

    In our case, we submitted 2000 annotations using the bulk view.

    status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n\nsubmitted = dataset.records(status_filter).to_list(flatten=True)\n

    SpanMarker accepts any dataset as long as it has the tokens and ner_tags columns. The ner_tags can be annotated using the IOB, IOB2, BIOES or BILOU labeling scheme, as well as regular unschemed labels. In our case, we have chosen to use the IOB format. Thus, we will define a function to extract the annotated NER tags according to this schema.

    Note

    For further information, you can check the SpanMarker documentation.

def get_iob_tag_for_token(token_start, token_end, ner_spans):\n    for span in ner_spans:\n        if token_start >= span[\"start\"] and token_end <= span[\"end\"]:\n            if token_start == span[\"start\"]:\n                return f\"B-{span['label']}\"\n            else:\n                return f\"I-{span['label']}\"\n    return \"O\"\n\n\ndef extract_ner_tags(text, responses):\n    tokens = re.split(r\"(\\s+)\", text)\n    ner_tags = []\n\n    current_position = 0\n    for token in tokens:\n        if token.strip():\n            token_start = current_position\n            token_end = current_position + len(token)\n            tag = get_iob_tag_for_token(token_start, token_end, responses)\n            ner_tags.append(tag)\n        current_position += len(token)\n\n    return ner_tags\n

    Let's now extract them and save two lists with the tokens and NER tags, which will help us build our dataset to train the SpanMarker model.

    tokens = []\nner_tags = []\nfor r in submitted:\n    tags = extract_ner_tags(r[\"text\"], r[\"span_label.responses\"][0])\n    tks = r[\"text\"].split()\n    tokens.append(tks)\n    ner_tags.append(tags)\n

In addition, we will have to indicate the labels, and they should be formatted as integers. So, we will retrieve and map them.

    labels = list(set([item for sublist in ner_tags for item in sublist]))\n\nid2label = {i: label for i, label in enumerate(labels)}\nlabel2id = {label: id_ for id_, label in id2label.items()}\n\nmapped_ner_tags = [[label2id[label] for label in ner_tag] for ner_tag in ner_tags]\n
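For intuition, here is a toy run of the same mapping on made-up tags (a sketch; sorting the label set, unlike list(set(...)) above, makes the assigned ids deterministic across runs):

ner_tags = [["O", "B-PERSON", "I-PERSON"], ["O"]]
labels = sorted(set(tag for seq in ner_tags for tag in seq))
label2id = {label: i for i, label in enumerate(labels)}
mapped_ner_tags = [[label2id[tag] for tag in seq] for seq in ner_tags]
# labels == ['B-PERSON', 'I-PERSON', 'O']
# mapped_ner_tags == [[2, 0, 1], [2]]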

    Finally, we will create a dataset with the train and validation sets.

    records = [\n    {\n        \"tokens\": token,\n        \"ner_tags\": ner_tag,\n    }\n    for token, ner_tag in zip(tokens, mapped_ner_tags)\n]\nspan_dataset = DatasetDict(\n    {\n        \"train\": Dataset.from_list(records[:1500]),\n        \"validation\": Dataset.from_list(records[1501:2000]),\n    }\n)\n

Now, let's prepare to train our model. For this, it is recommended to use a GPU. You can check whether one is available as shown below.

    import torch\n\nif torch.cuda.is_available():\n    device = torch.device(\"cuda\")\n    print(f\"Using {torch.cuda.get_device_name(0)}\")\nelif torch.backends.mps.is_available():\n    device = torch.device(\"mps\")\n    print(\"Using MPS device\")\nelse:\n    device = torch.device(\"cpu\")\n    print(\"No GPU available, using CPU instead.\")\n

We will define our model and arguments. In this case, we will use bert-base-cased, available on the Hugging Face Hub, but others can be applied.

    Note

    The training arguments are inherited from the Transformers library. You can check more information here.

    encoder_id = \"bert-base-cased\"\nmodel = SpanMarkerModel.from_pretrained(\n    encoder_id,\n    labels=labels,\n    model_max_length=256,\n    entity_max_length=8,\n)\n\nargs = TrainingArguments(\n    output_dir=\"models/span-marker\",\n    learning_rate=5e-5,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    num_train_epochs=1,\n    weight_decay=0.01,\n    warmup_ratio=0.1,\n    fp16=False,  # Set to True if available\n    logging_first_step=True,\n    logging_steps=50,\n    evaluation_strategy=\"steps\",\n    save_strategy=\"steps\",\n    eval_steps=500,\n    save_total_limit=2,\n    dataloader_num_workers=2,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=args,\n    train_dataset=span_dataset[\"train\"],\n    eval_dataset=span_dataset[\"validation\"],\n)\n

Let's train it! This time, we use a high-quality human-annotated training set, so the results are expected to improve.

    trainer.train()\n
    trainer.evaluate()\n

You can save it locally or push it to the Hub, and then load it from there.

    # Save and load locally\n# model.save_pretrained(\"token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"token_classification_model\")\n\n# Push and load in HF\n# model.push_to_hub(\"[username]/token_classification_model\")\n# model = SpanMarkerModel.from_pretrained(\"[username]/token_classification_model\")\n

    It's time to make the predictions! We will set a function that uses the predict method to get the suggested label. The model will infer the label based on the text. The function will return the spans in the corresponding structure for the Argilla dataset.

    def predict_spanmarker(model, text):\n    entities = model.predict(text)\n    return [\n        {\n            \"start\": ent[\"char_start_index\"],\n            \"end\": ent[\"char_end_index\"],\n            \"label\": ent[\"label\"],\n        }\n        for ent in entities\n    ]\n

As the training data was of better quality, we can expect a better model, so we can update the remaining non-annotated records with the new model's suggestions.

    data = dataset.records.to_list(flatten=True)\nupdated_data = [\n    {\n        \"span_label\": predict_spanmarker(model=model, text=sample[\"text\"]),\n        \"id\": sample[\"id\"],\n    }\n    for sample in data\n]\ndataset.records.log(records=updated_data)\n

    In this tutorial, we present an end-to-end example of a token classification task. This serves as the base, but it can be performed iteratively and seamlessly integrated into your workflow to ensure high-quality curation of your data and improved results.

We started by configuring the dataset, adding records, and adding suggestions based on the GLiNER predictions. After the annotation process, we trained a SpanMarker model with the annotated data and updated the remaining records with the new suggestions.

    "},{"location":"tutorials/token_classification/#token-classification","title":"Token classification","text":""},{"location":"tutorials/token_classification/#getting-started","title":"Getting started","text":""},{"location":"tutorials/token_classification/#deploy-the-argilla-server","title":"Deploy the Argilla server","text":""},{"location":"tutorials/token_classification/#set-up-the-environment","title":"Set up the environment","text":""},{"location":"tutorials/token_classification/#configure-and-create-the-argilla-dataset","title":"Configure and create the Argilla dataset","text":""},{"location":"tutorials/token_classification/#add-records","title":"Add records","text":""},{"location":"tutorials/token_classification/#add-initial-model-suggestions","title":"Add initial model suggestions","text":""},{"location":"tutorials/token_classification/#evaluate-with-argilla","title":"Evaluate with Argilla","text":""},{"location":"tutorials/token_classification/#train-your-model","title":"Train your model","text":""},{"location":"tutorials/token_classification/#conclusions","title":"Conclusions","text":""}]} \ No newline at end of file diff --git a/dev/sitemap.xml b/dev/sitemap.xml index 60e3901a74..8b77099d1e 100644 --- a/dev/sitemap.xml +++ b/dev/sitemap.xml @@ -2,222 +2,222 @@ https://docs.argilla.io/dev/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/community/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/community/changelog/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/community/contributor/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/community/popular_issues/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/getting_started/faq/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/getting_started/how-to-configure-argilla-on-huggingface/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/getting_started/how-to-deploy-argilla-with-docker/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/getting_started/quickstart/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/annotate/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/dataset/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/distribution/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/import_export/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/migrate_from_legacy_datasets/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/query/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/record/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/use_markdown_to_format_rich_content/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/user/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/how_to_guides/workspace/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/SUMMARY/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/client/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/markdown/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/search/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/users/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/workspaces/ - 2024-08-29 + 2024-08-30 
daily https://docs.argilla.io/dev/reference/argilla/datasets/dataset_records/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/datasets/datasets/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/records/metadata/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/records/records/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/records/responses/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/records/suggestions/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/records/vectors/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/fields/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/metadata_property/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/questions/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/settings/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/task_distribution/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla/settings/vectors/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla-server/configuration/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/reference/argilla-server/telemetry/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/tutorials/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/tutorials/text_classification/ - 2024-08-29 + 2024-08-30 daily https://docs.argilla.io/dev/tutorials/token_classification/ - 2024-08-29 + 2024-08-30 daily \ No newline at end of file diff --git a/dev/sitemap.xml.gz b/dev/sitemap.xml.gz index 4e5ca49b23da08b66c52eab17c6cb924aee15a5a..1fdeede022a1753e1cf66bfa934728da66a8a73a 100644 GIT binary patch delta 279 zcmV+y0qFkm1n~q1ABzYGfD6%)2OWR(+TmIxIdtiEOWPYd393xXLS)e)>E!bEl`lU{ zQQ&|sh3UnzOvxVwlA`F<<4c!jdmxUO*YkIa<$MO&V;yr?&!4}4sUGH!o4eIOi9u#X zIP-emiu9k1IUEiPjsl(Qd4UuR`?fr=Id_`n;^TaCH!IgT!Kw9ntrGgpI14nFX=?Pm za1>%nZV}Pcus16+O56@`EFIRH`{n(IYWYyTUlx&b-;J1WT@KI?>^8NF>9{hH*JS~( zlRW_|0gRJr0Zapadis;N0bmwyXi!dwx-9)ler;$2FPrDqPQP^XhLbJ=BOn*CzI>l! d6a9oO;RYGEkpDaL>#aEa<`>cpMM)%O%g~*~q(#hrRD_?$^ zqQC)N3e$^anUX&WBt_Bdr`Imc_COplKh8famh%~8k9Ex9asKlCOLaeg+T5)NN(?e9 z!kHiEtw{gLn8V?);3&|!o)<{Luy4x)n{%gGE*|EayIHx$2~MrIYn{+<##uDDOjD!h zg`*Hta*K$jhP_#vQQ~%hW9hKotd^^f)$+btJrt31-;J1WT@KI?>^8NF>9{tL*JS~3 zlRW_|0gjVt0ZapVe*TlV0bmyIXi!dwx-9)ler;$2FPrDqPQP^Xj*~6|BOq6?zI>l! d6a9oO;RYGEkpDaL+pReK<`-k!1|=sS0084