Add documentation for markdown helper functions
davidberenstein1957 committed Jul 9, 2024
1 parent cd713c6 commit 0779bce
Showing 5 changed files with 207 additions and 8 deletions.
18 changes: 17 additions & 1 deletion argilla/docs/how_to_guides/index.md
@@ -5,7 +5,9 @@ hide: toc

# How-to guides

These are the how-to guides for *the Argilla SDK*. They provide step-by-step instructions for common scenarios, including detailed explanations and code samples.
These are the how-to guides for *the Argilla SDK*. They provide step-by-step instructions for common scenarios, including detailed explanations and code samples. We have divided the guides into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.

## Basic

<div class="grid cards" markdown>

@@ -49,6 +51,20 @@ These are the how-to guides for *the Argilla SDK*. They provide step-by-step ins

[:octicons-arrow-right-24: How-to guide](query_export.md)

</div>

## Advanced

<div class="grid cards" markdown>

- __Making the most of Markdown__

---

Learn how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

[:octicons-arrow-right-24: How-to guide](making_most_of_markdown.md)

- __Migrate to Argilla V2__

---
168 changes: 168 additions & 0 deletions argilla/docs/how_to_guides/making_most_of_markdown.md
@@ -0,0 +1,168 @@
# Making the most of Markdown

This guide provides an overview of how to use Markdown and HTML in `TextFields` to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.

`TextField` and `TextQuestion` provide the option to enable Markdown, and therefore HTML. The flexibility of HTML gives us fine-grained control over how data is presented to annotators. The examples below cover the out-of-the-box methods we provide for multi-modality and chat templates, but we encourage everyone to use them as inspiration for making the most of Markdown.

!!! note
We assume that a `TextField` or `TextQuestion` has been configured with `use_markdown=True`. Take a look at [`Datasets`](dataset.md) to learn more about this configuration.

## Multi-modal data: images, audio, video or PDFs

### Local content through DataURLs

A DataURL is a scheme that allows data to be encoded into a base64-encoded string and then embedded directly into HTML. To facilitate this, we offer some functions: `image_to_html`, `audio_to_html`, `video_to_html` and `pdf_to_html`. These functions accept either the file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. Additionally, you can set the `width` and `height` in pixels or as a percentage for video and image (defaults to the original dimensions) and set the `autoplay` and `loop` attributes to `True` for audio and video (both default to `False`).

!!! warning
DataURLs are larger than the original file, because base64 encoding adds roughly a third to its size, so they increase memory usage. Additionally, different browsers enforce different size limits for rendering DataURLs, which might prevent some users from seeing the content.
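
To make the DataURL idea and the size warning more concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the general scheme, not the implementation behind `image_to_html`; the helper name `to_data_url` and the placeholder bytes are assumptions made for this example.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 DataURL (hypothetical helper, not part of argilla)."""
    # Guess the MIME type from the file extension; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for the contents of a real image file.
payload = bytes(range(256)) * 12
data_url = to_data_url(payload, "local_image_file.png")

# The DataURL can be used anywhere a regular URL would go in an HTML attribute.
html = f'<img src="{data_url}" width="300px" height="300px">'

# base64 maps every 3 input bytes to 4 output characters, hence the size increase.
overhead = len(base64.b64encode(payload)) / len(payload)
```

Because every record carries the full encoded file, large media files are usually better served as hosted content, as described further down.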

=== "Image"

```python
import argilla as rg
from argilla.markdown import image_to_html

# Encode the local image as a DataURL wrapped in HTML
html = image_to_html(
    "local_image_file.png",
    width="300px",
    height="300px",
)

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "Audio"

```python
import argilla as rg
from argilla.markdown import audio_to_html

# Encode the local audio file as a DataURL wrapped in HTML
html = audio_to_html(
    "local_audio_file.mp3",
    width="300px",
    height="300px",
    autoplay=True,
    loop=True,
)

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "Video"

```python
import argilla as rg
from argilla.markdown import video_to_html

# Encode the local video file as a DataURL wrapped in HTML
html = video_to_html(
    "my_video.mp4",
    width="300px",
    height="300px",
    autoplay=True,
    loop=True,
)

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "PDF"

```python
import argilla as rg
from argilla.markdown import pdf_to_html

# Encode the local PDF as a DataURL wrapped in HTML
html = pdf_to_html(
    "local_pdf.pdf",
    width="300px",
    height="300px",
)

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

### Hosted content

Instead of uploading local files through DataURLs, we can also display media by linking directly to files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to display content that is available on something like Google Photos, or decide to configure a private media server.

!!! warning
When trying to access content from a private media server you have to ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.

=== "Image"

```python
import argilla as rg

html = "<img src='https://example.com/public-image-file.jpg'>"

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "Audio"

```python
import argilla as rg

html = """
<audio controls>
<source src="https://example.com/public-audio-file.mp3" type="audio/mpeg">
</audio>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "Video"

```python
import argilla as rg

html = """
<video width="320" height="240" controls>
<source src="https://example.com/public-video-file.mp4" type="video/mp4">
</video>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)
```

=== "PDF"

```python
import argilla as rg

html = """
<iframe
src="https://example.com/public-pdf-file.pdf"
width="600"
height="500">
</iframe>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)
```
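
The hosted-content snippets above all follow the same pattern, so it can be convenient to generate them from a URL. The following is a rough sketch under stated assumptions: `hosted_media_html` is a hypothetical helper written for this guide, not a function provided by `argilla.markdown`, and it escapes the URL so that characters such as `&` do not break the HTML attribute.

```python
from html import escape


def hosted_media_html(url: str, kind: str, mime_type: str = "") -> str:
    """Build a basic HTML snippet for a hosted media URL (hypothetical helper)."""
    # Escape the URL so quotes and ampersands stay inside the attribute value.
    safe_url = escape(url, quote=True)
    if kind == "image":
        return f'<img src="{safe_url}">'
    if kind in ("audio", "video"):
        # The type attribute is optional; include it only when a MIME type is given.
        type_attr = f' type="{escape(mime_type, quote=True)}"' if mime_type else ""
        return f'<{kind} controls><source src="{safe_url}"{type_attr}></{kind}>'
    if kind == "pdf":
        return f'<iframe src="{safe_url}" width="600" height="500"></iframe>'
    raise ValueError(f"Unsupported media kind: {kind!r}")


html = hosted_media_html(
    "https://example.com/public-audio-file.mp3",
    "audio",
    "audio/mpeg",
)
```

The resulting string can be passed to a Markdown-enabled field in the same way as the hand-written HTML in the examples above.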

## Visualize chat conversations

When working with chat data from multi-turn interactions with a Large Language Model, it helps to visualize the conversation in a way that resembles a common chat interface. To facilitate this, we offer the `chat_to_html` function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.

??? Question "OpenAI chat format"
The OpenAI chat format structures a conversation as a list of messages: user messages serve as input and model-generated messages are returned as output. The `role` of a message can only be "user" for human messages, or "assistant", "system" or "model" for model-generated messages.

```python
import argilla as rg
from argilla.markdown import chat_to_html

# Messages in OpenAI chat format: one dict per turn with a role and content
messages = [
    {"role": "user", "content": "Hello! How are you?"},
    {"role": "assistant", "content": "I'm good, thank you! How can I assist you today?"},
]

html = chat_to_html(messages)

rg.Record(
    fields={"markdown_enabled_field": html}
)
```
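
Since only a fixed set of roles is accepted, it may be useful to check messages before converting them. The sketch below is one possible way to do that with plain Python; `validate_messages` and `ALLOWED_ROLES` are hypothetical names introduced for this example and are not part of `argilla.markdown`, and the role list simply mirrors the one described in this guide.

```python
# Roles taken from the description of the chat format in this guide.
ALLOWED_ROLES = {"user", "assistant", "system", "model"}


def validate_messages(messages):
    """Raise ValueError if a message does not match the expected chat format."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Message {index} has unsupported role: {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"Message {index} is missing string content")
    return messages


messages = validate_messages(
    [
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant", "content": "I'm good, thank you!"},
    ]
)
```

Running a check like this up front surfaces malformed messages with a clear error before any records are created.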
1 change: 1 addition & 0 deletions argilla/docs/reference/argilla/SUMMARY.md
@@ -14,3 +14,4 @@
* [rg.Vector](records/vectors.md)
* [rg.Metadata](records/metadata.md)
* [rg.Query](search.md)
* [rg.markdown](markdown.md)
11 changes: 11 additions & 0 deletions argilla/docs/reference/argilla/markdown.md
@@ -0,0 +1,11 @@
---
hide: footer
---

# `rg.markdown`

To support the usage of Markdown within Argilla, we've created some helper functions to ease the usage of DataURL conversions and chat message visualizations.

::: src.argilla.markdown.media

::: src.argilla.markdown.chat
17 changes: 10 additions & 7 deletions argilla/mkdocs.yml
@@ -134,13 +134,16 @@ nav:
- FAQ: getting_started/faq.md
- How-to guides:
- how_to_guides/index.md
- Manage users and credentials: how_to_guides/user.md
- Manage workspaces: how_to_guides/workspace.md
- Manage and create datasets: how_to_guides/dataset.md
- Add, update, and delete records: how_to_guides/record.md
- Query, filter, and export records: how_to_guides/query_export.md
- Annotate a dataset: how_to_guides/annotate.md
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
- Basic:
- Manage users and credentials: how_to_guides/user.md
- Manage workspaces: how_to_guides/workspace.md
- Manage and create datasets: how_to_guides/dataset.md
- Add, update, and delete records: how_to_guides/record.md
- Query, filter, and export records: how_to_guides/query_export.md
- Annotate a dataset: how_to_guides/annotate.md
- Advanced:
- Making the most of Markdown: how_to_guides/making_most_of_markdown.md
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
- Tutorials:
- tutorials/index.md
- Text classification task: tutorials/text_classification.ipynb