Add documentation for markdown helper functions

argilla-io · Jul 9, 2024 · 0779bce
1 parent cd713c6
commit 0779bce
Showing 5 changed files with 207 additions and 8 deletions.
diff --git a/argilla/docs/how_to_guides/index.md b/argilla/docs/how_to_guides/index.md
@@ -5,7 +5,9 @@ hide: toc
 
 # How-to guides
 
-These are the how-to guides for *the Argilla SDK*. They provide step-by-step instructions for common scenarios, including detailed explanations and code samples.
+These are the how-to guides for *the Argilla SDK*. They provide step-by-step instructions for common scenarios, including detailed explanations and code samples. We have divided the guides into two categories: basic and advanced. The basic guides will help you get started with the core concepts of Argilla, while the advanced guides will help you explore more advanced features.
+
+## Basic
 
 <div class="grid cards" markdown>
 
@@ -49,6 +51,20 @@ These are the how-to guides for *the Argilla SDK*. They provide step-by-step ins
 
     [:octicons-arrow-right-24: How-to guide](query_export.md)
 
+</div>
+
+## Advanced
+
+<div class="grid cards" markdown>
+
+-   __Making the most of Markdown__
+
+    ---
+
+    Learn how to use Markdown and HTML in TextFields to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.
+
+    [:octicons-arrow-right-24: How-to guide](making_most_of_markdown.md)
+
 -   __Migrate to Argilla V2__
 
     ---

diff --git a/argilla/docs/how_to_guides/making_most_of_markdown.md b/argilla/docs/how_to_guides/making_most_of_markdown.md
@@ -0,0 +1,168 @@
+# Making the most of Markdown
+
+This guide provides an overview of how to use Markdown and HTML in `TextFields` to format chat conversations and allow for basic multi-modal support for images, audio, video and PDFs.
+
+The `TextField` and `TextQuestion` provide the option to enable Markdown and, with it, HTML. The flexibility of HTML gives us fine-grained control over how data is presented to annotators. We provide some out-of-the-box methods for multi-modality and chat templates in the examples below, and encourage you to use them as inspiration for your own formatting.
+
+!!! note
+    We assume that a `TextField` or `TextQuestion` has been configured with `use_markdown=True`. Take a look at [`Datasets`](dataset.md) to learn more about this configuration.
+
+## Multi-modal data: images, audio, video or PDFs
+
+### Local content through DataURLs
+
+A DataURL is a URL scheme that allows data, typically encoded as a base64 string, to be embedded directly into HTML. To facilitate this, we offer some functions: `image_to_html`, `audio_to_html`, `video_to_html` and `pdf_to_html`. These functions accept either the file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. You can also set the `width` and `height` in pixels or as a percentage for video and image (defaults to the original dimensions), and set the `autoplay` and `loop` attributes to `True` for audio and video (defaults to `False`).
+
+!!! warning
+    Base64-encoded DataURLs are roughly a third larger than the original file, which increases memory usage. Additionally, different browsers enforce different size limits for rendering DataURLs, which might prevent the content from displaying for some users.
+
+=== "Image"
+
+    ```python
+    from argilla.markdown import image_to_html
+
+    html = image_to_html(
+        "local_image_file.png",
+        width="300px",
+        height="300px"
+    )
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "Audio"
+
+    ```python
+    from argilla.markdown import audio_to_html
+
+    html = audio_to_html(
+        "local_audio_file.mp3",
+        width="300px",
+        height="300px",
+        autoplay=True,
+        loop=True
+    )
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "Video"
+
+    ```python
+    from argilla.markdown import video_to_html
+
+    html = video_to_html(
+        "my_video.mp4",
+        width="300px",
+        height="300px",
+        autoplay=True,
+        loop=True
+    )
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "PDF"
+
+    ```python
+    from argilla.markdown import pdf_to_html
+
+    html = pdf_to_html(
+        "local_pdf.pdf",
+        width="300px",
+        height="300px"
+    )
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
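To make the DataURL mechanism described above more concrete, here is a minimal sketch of how such a URL can be assembled with the Python standard library. The helpers `bytes_to_data_url` and `guess_mime_type` are hypothetical names introduced for illustration; they are not part of `argilla.markdown`, and the library's own functions may build their HTML differently.

```python
import base64
import mimetypes


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type or default


def bytes_to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 DataURL (hypothetical helper, not Argilla's API)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in for the bytes of a real file; 300 bytes keeps the arithmetic easy to follow.
payload = b"\x89PNG\r\n\x1a\n" + b"\x00" * 292
url = bytes_to_data_url(payload, guess_mime_type("local_image_file.png"))

# The DataURL can then be used as the `src` of an ordinary HTML tag.
html = f'<img src="{url}" width="300px" height="300px">'

# Base64 turns every 3 input bytes into 4 output characters.
print(len(payload), len(url.split(",", 1)[1]))
```

The printed lengths illustrate the size overhead mentioned in the warning above: the encoded part is about one third longer than the raw bytes.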
+
+### Hosted content
+
+Instead of uploading local files through DataURLs, we can also reference URLs that link directly to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to display content that is available on a service like Google Photos, or configure a private media server.
+
+!!! warning
+    When trying to access content from a private media server you have to ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.
+
+=== "Image"
+
+    ```python
+    html = "<img src='https://example.com/public-image-file.jpg'>"
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "Audio"
+
+    ```python
+    html = """
+    <audio controls>
+        <source src="https://example.com/public-audio-file.mp3" type="audio/mpeg">
+    </audio>
+    """
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "Video"
+
+    ```python
+    html = """
+    <video width="320" height="240" controls>
+        <source src="https://example.com/public-video-file.mp4" type="video/mp4">
+    </video>
+    """
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
+
+=== "PDF"
+
+    ```python
+    html = """
+    <iframe
+        src="https://example.com/public-pdf-file.pdf"
+        width="600"
+        height="500">
+    </iframe>
+    """
+
+    rg.Record(
+        fields={"markdown_enabled_field": html}
+    )
+    ```
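When the hosted URLs come from a dataset rather than being typed by hand, it can help to generate the HTML snippets above programmatically and to escape the URL before placing it in an attribute. The following is a minimal sketch under that assumption; `hosted_media_html` is a hypothetical helper and not part of Argilla, and the fixed PDF dimensions simply mirror the example above.

```python
from html import escape


def hosted_media_html(url: str, kind: str, mime_type: str = "") -> str:
    """Build an HTML snippet for a hosted media URL (hypothetical helper)."""
    # Escape the URL so characters like & and " cannot break the attribute.
    src = escape(url, quote=True)
    if kind == "image":
        return f'<img src="{src}">'
    if kind == "pdf":
        return f'<iframe src="{src}" width="600" height="500"></iframe>'
    if kind in ("audio", "video"):
        # The type attribute is optional; include it only when a MIME type is given.
        type_attr = f' type="{escape(mime_type, quote=True)}"' if mime_type else ""
        return f'<{kind} controls><source src="{src}"{type_attr}></{kind}>'
    raise ValueError(f"unsupported media kind: {kind!r}")


html = hosted_media_html(
    "https://example.com/public-audio-file.mp3", "audio", "audio/mpeg"
)
print(html)
```

The resulting string can be passed to a Markdown-enabled field in the same way as the hand-written snippets.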
+
+## Visualize chat conversations
+
+When working with chat data from multi-turn interactions with a Large Language Model, it helps to visualize the conversation in a way that resembles a common chat interface. To facilitate this, we offer the `chat_to_html` function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.
+
+??? Question "OpenAI chat format"
+    The OpenAI chat format structures a conversation as a list of messages: user messages are provided as input and the model returns a generated message as output. The `role` of each message can only be "user" for human messages, or "assistant", "system" or "model" for model-generated messages.
+
+```python
+from argilla.markdown import chat_to_html
+
+messages = [
+    {"role": "user", "content": "Hello! How are you?"},
+    {"role": "assistant", "content": "I'm good, thank you! How can I assist you today?"}
+]
+
+html = chat_to_html(messages)
+
+rg.Record(
+    fields={"markdown_enabled_field": html}
+)
+```
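Since the guide states that only certain roles are accepted, it may be useful to check a conversation before converting it. The sketch below is one possible pre-check; `validate_messages` and `ALLOWED_ROLES` are hypothetical names, the role set is taken from the description above, and the code makes no claim about how `chat_to_html` itself validates its input.

```python
# Roles listed in the guide above; treated here as an assumption, not as Argilla's API.
ALLOWED_ROLES = {"user", "assistant", "system", "model"}


def validate_messages(messages):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(
                f"message {index}: expected a dict, got {type(message).__name__}"
            )
            continue
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {index}: unsupported role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index}: 'content' must be a string")
    return problems


messages = [
    {"role": "user", "content": "Hello! How are you?"},
    {"role": "assistant", "content": "I'm good, thank you! How can I assist you today?"},
]

print(validate_messages(messages))
```

Returning a list of problems instead of raising on the first one makes it easier to report every malformed message in a large dataset in a single pass.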
diff --git a/argilla/docs/reference/argilla/SUMMARY.md b/argilla/docs/reference/argilla/SUMMARY.md
@@ -14,3 +14,4 @@
     * [rg.Vector](records/vectors.md)
     * [rg.Metadata](records/metadata.md)
 * [rg.Query](search.md)
+* [rg.markdown](markdown.md)
diff --git a/argilla/docs/reference/argilla/markdown.md b/argilla/docs/reference/argilla/markdown.md
@@ -0,0 +1,11 @@
+---
+hide: footer
+---
+
+# `rg.markdown`
+
+To support the usage of Markdown within Argilla, we've created some helper functions to simplify DataURL conversions and chat message visualizations.
+
+::: src.argilla.markdown.media
+
+::: src.argilla.markdown.chat
diff --git a/argilla/mkdocs.yml b/argilla/mkdocs.yml
@@ -134,13 +134,16 @@ nav:
       - FAQ: getting_started/faq.md
   - How-to guides:
       - how_to_guides/index.md
-      - Manage users and credentials: how_to_guides/user.md
-      - Manage workspaces: how_to_guides/workspace.md
-      - Manage and create datasets: how_to_guides/dataset.md
-      - Add, update, and delete records: how_to_guides/record.md
-      - Query, filter, and export records: how_to_guides/query_export.md
-      - Annotate a dataset: how_to_guides/annotate.md
-      - Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
+      - Basic:
+        - Manage users and credentials: how_to_guides/user.md
+        - Manage workspaces: how_to_guides/workspace.md
+        - Manage and create datasets: how_to_guides/dataset.md
+        - Add, update, and delete records: how_to_guides/record.md
+        - Query, filter, and export records: how_to_guides/query_export.md
+        - Annotate a dataset: how_to_guides/annotate.md
+      - Advanced:
+        - Making the most of Markdown: how_to_guides/making_most_of_markdown.md
+        - Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
   - Tutorials:
       - tutorials/index.md
       - Text classification task: tutorials/text_classification.ipynb