Docs: advanced queries dsl #5435

Merged: 9 commits, Aug 30, 2024
6 changes: 6 additions & 0 deletions argilla/docs/how_to_guides/annotate.md
@@ -136,6 +136,12 @@ The UI offers various features designed for data exploration and understanding.

From the **control panel** at the top of the left pane, you can search by keyword across the entire dataset. If your records have more than one field, you can specify whether the search runs on “All” fields or on a specific one. Matched results are highlighted in color.

!!! note
If you enter more than one keyword, the search returns only the records where **all** keywords have a match.

!!! tip
For more advanced searches, take a look at the [advanced queries DSL](query.md#advanced-queries).

### Order by record semantic similarity

You can retrieve records based on their similarity to another record if vectors have been added to the dataset.
23 changes: 20 additions & 3 deletions argilla/docs/how_to_guides/query.md
@@ -35,7 +35,7 @@ You can search for records in your dataset by **querying** or **filtering**. The

To search for records with terms, you can use the `Dataset.records` attribute with a query string. The search terms are matched against the text fields of the records. You can search for a single term or for several terms; in the latter case, a record is retrieved only if all of the terms appear in it.

=== "Single search term"
=== "Single term search"

```python
import argilla as rg
@@ -49,7 +49,7 @@ To search for records with terms, you can use the `Dataset.records` attribute wi
queried_records = dataset.records(query=query).to_list(flatten=True)
```

=== "Multiple search term"
=== "Multiple terms search"

```python
import argilla as rg
@@ -63,6 +63,23 @@ To search for records with terms, you can use the `Dataset.records` attribute wi
queried_records = dataset.records(query=query).to_list(flatten=True)
```

### Advanced queries

If you need more complex searches, you can use [Elasticsearch's simple query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax). Here is a summary of the different available operators:

| operator | description | example |
| ------------ | --------------------------- | --------------------------------------------------------------------- |
|`+` or `space`| **AND**: search both terms | `argilla + distilabel` or `argilla distilabel`</br> returns records that include the terms "argilla" and "distilabel"|
|`|` | **OR**: search either term | `argilla | distilabel` </br> returns records that include the term "argilla" or "distilabel"|
|`-` | **Negation**: exclude a term| `argilla -distilabel` </br> returns records that contain the term "argilla" and don't have the term "distilabel"|
|`*` | **Prefix**: search a prefix | `arg*`</br> returns records with any word starting with "arg"|
|`"` | **Phrase**: search a phrase | `"argilla and distilabel"` </br> returns records that contain the phrase "argilla and distilabel"|
|`(` and `)` | **Precedence**: group terms | `(argilla | distilabel) rules` </br> returns records that contain either "argilla" or "distilabel" and "rules"|
|`~N` | **Edit distance**: search a term or phrase with an edit distance| `argilla~1` </br> returns records that contain the term "argilla" with an edit distance of 1, e.g. "argila"|

!!! tip
To use one of these characters literally, escape it with a preceding backslash `\`, e.g. `"1 \+ 2"` would match records where the phrase "1 + 2" is found.
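
As a rough illustration of the operators and the escaping rule above, the sketch below builds query strings in plain Python. The helpers `escape_term`, `phrase` and `any_of` are hypothetical names introduced here for illustration and are not part of Argilla. The set of escaped characters is an assumption based on the operator table, and it is deliberately conservative: it also escapes hyphens inside words.

```python
import re

# Assumption: the operator characters from the table above, plus the
# backslash itself, are the ones that need escaping.
_RESERVED = re.compile(r'([+|\-"*()~\\])')


def escape_term(text: str) -> str:
    """Hypothetical helper: put a backslash before every operator character."""
    return _RESERVED.sub(r"\\\1", text)


def phrase(text: str) -> str:
    """Hypothetical helper: wrap escaped text in double quotes (phrase search)."""
    return f'"{escape_term(text)}"'


def any_of(*terms: str) -> str:
    """Hypothetical helper: group escaped terms with the OR operator."""
    return "(" + " | ".join(escape_term(term) for term in terms) + ")"


# Either "argilla" or "distilabel", and also "rules".
query_string = f'{any_of("argilla", "distilabel")} rules'
print(query_string)  # (argilla | distilabel) rules

# The literal phrase "1 + 2", with the plus sign escaped.
print(phrase("1 + 2"))  # "1 \+ 2"

# Assuming `dataset` was retrieved as in the examples above, the string
# could then be passed on like any other query:
# query = rg.Query(query=query_string)
# queried_records = dataset.records(query=query).to_list(flatten=True)
```

Building the string separately keeps user-provided text from being interpreted as operators by accident; whether that matters depends on where your search terms come from.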

## Filter by conditions

You can use the `Filter` class to define the conditions and pass them to the `Dataset.records` attribute to fetch records based on the conditions. Conditions include "==", ">=", "<=", or "in". Conditions can be combined with dot notation to filter records based on metadata, suggestions, or responses. You can use a single condition or multiple conditions to filter records.
@@ -72,7 +89,7 @@ You can use the `Filter` class to define the conditions and pass them to the `Da
| `==` | The `field` value is equal to the `value` |
| `>=` | The `field` value is greater than or equal to the `value` |
| `<=` | The `field` value is less than or equal to the `value` |
| `in` | TThe `field` value is included in a list of values |
| `in` | The `field` value is included in a list of values |

=== "Single condition"

4 changes: 3 additions & 1 deletion argilla/mkdocs.yml
@@ -121,6 +121,7 @@ plugins:
- docs/scripts/gen_changelog.py
- docs/scripts/gen_popular_issues.py
# - docs/scripts/gen_ref_pages.py
enabled: !ENV [CI, false] # enables the plugin only during continuous integration (CI), disabled on local build
- literate-nav:
nav_file: SUMMARY.md
- section-index
@@ -148,7 +149,8 @@ plugins:
# Signature
separate_signature: false
show_signature_annotations: false
- social
- social:
enabled: !ENV [CI, false] # enables the plugin only during continuous integration (CI), disabled on local build
- mknotebooks
- material-plausible
