diff --git a/search.json b/search.json
index 46fea08..b9a3f2d 100644
--- a/search.json
+++ b/search.json
@@ -1,189 +1,116 @@
[
{
- "objectID": "concepts/user-interfaces.html",
- "href": "concepts/user-interfaces.html",
- "title": "User interfaces",
- "section": "",
- "text": "A user interface (UI) in our context is the mechanism by which a human interacts with software.\n\n\nA command-line interface (CLI) is a UI for interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). CLIs are great for automating tasks and often user-friendly, with a short learning curve (if designed well).\nCLIs are great for containing common data analysis tasks expressed in a single, configurable command. However, remembering the exact syntax of a command can be difficult, especially if the command is not used frequently.\n\n\n\nPython can be an interactive UI in a terminal, notebook, IDE, or other setup to serve as a user-interface for data analytics.\n\n\n\nIbis Birdbrain aims to implement a language user interface (LUI) for data analytics, providing additional AI assistance in both a CLI and Python interface. This enables users to interact with a bot through natural language that can perform tasks on their behalf.\n\n\n\n\n\n\nWarning\n\n\n\nThis LUI paradigm is highly experimental and should be put in production (automated) with caution.",
- "crumbs": [
- "User interfaces"
- ]
- },
- {
- "objectID": "concepts/user-interfaces.html#command-line-interface",
- "href": "concepts/user-interfaces.html#command-line-interface",
- "title": "User interfaces",
- "section": "",
- "text": "A command-line interface (CLI) is a UI for interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). CLIs are great for automating tasks and often user-friendly, with a short learning curve (if designed well).\nCLIs are great for containing common data analysis tasks expressed in a single, configurable command. However, remembering the exact syntax of a command can be difficult, especially if the command is not used frequently.",
- "crumbs": [
- "User interfaces"
- ]
- },
- {
- "objectID": "concepts/user-interfaces.html#python",
- "href": "concepts/user-interfaces.html#python",
- "title": "User interfaces",
- "section": "",
- "text": "Python can be an interactive UI in a terminal, notebook, IDE, or other setup to serve as a user-interface for data analytics.",
- "crumbs": [
- "User interfaces"
- ]
- },
- {
- "objectID": "concepts/user-interfaces.html#language-user-interface",
- "href": "concepts/user-interfaces.html#language-user-interface",
- "title": "User interfaces",
- "section": "",
- "text": "Ibis Birdbrain aims to implement a language user interface (LUI) for data analytics, providing additional AI assistance in both a CLI and Python interface. This enables users to interact with a bot through natural language that can perform tasks on their behalf.\n\n\n\n\n\n\nWarning\n\n\n\nThis LUI paradigm is highly experimental and should be put in production (automated) with caution.",
- "crumbs": [
- "User interfaces"
- ]
- },
- {
- "objectID": "concepts/llms.html",
- "href": "concepts/llms.html",
- "title": "Large language models",
- "section": "",
- "text": "Large language models (LLMs) represent decades of research and development of neural networks. While relatively impressive LLMs have been around for years, recent innovations have made it possible to create instruction-following, conversational bots that can perform tasks on behalf of the user.\nWe are primarily concerned with applying LLMs to data, but we’ll take a brief look at how they work and why we should use them.\n\n\nAn artificial neural network (ANN or often just NN) is a computational model that is loosely inspired by the biological neural networks in the brain. It is a collection of connected nodes, called neurons, that are organized into layers. Each neuron is connected to other neurons in the network, and each connection has a weight associated with it. The weights are adjusted during training to improve the model’s performance.\nAn instance of a neural network (and many other ML architectures) is called a model. A model has usually been trained on data to learn to represent a system. While they are amny machine learning model architectures and training algorithms, the fundamental innovation of (large/deep) neural networks is the ability to represent an arbitrary system.\n\n\n\nA large language model is a neural network trained on vast amounts of text data.\n\n\n\nA LLM takes text as input and produces text as output.\n\n\n\nText in, text out. Neural networks and LLMs by design are non-determinstic. Though there are many tricks and workarounds, relying on LLMs for determinstic behavior is a bad idea. Instead, LLMs are great for:\n\ntext-based ML tasks (like classification, clustering)\ntext-based entity extraction (named entity regognition)\ntext-based generation (like summarization, translation, and question answering)\nother text-based tasks\n\nLLMs today are decent, but flawed, at generating programming code (as text). 
We can again use clever tricks and program around the non-determinstic behavior (such as running code, checking for any errors, and making one or more attempts via LLM to the errors). Fundamentally, keep in mind that an input to LLM is always text and an output is always text.\n\n\n\nSome limitations include:\n\ncost\nlatency\naccuracy\n…",
- "crumbs": [
- "Large language models"
- ]
- },
- {
- "objectID": "concepts/llms.html#what-is-an-artificial-neural-network",
- "href": "concepts/llms.html#what-is-an-artificial-neural-network",
- "title": "Large language models",
- "section": "",
- "text": "An artificial neural network (ANN or often just NN) is a computational model that is loosely inspired by the biological neural networks in the brain. It is a collection of connected nodes, called neurons, that are organized into layers. Each neuron is connected to other neurons in the network, and each connection has a weight associated with it. The weights are adjusted during training to improve the model’s performance.\nAn instance of a neural network (and many other ML architectures) is called a model. A model has usually been trained on data to learn to represent a system. While they are amny machine learning model architectures and training algorithms, the fundamental innovation of (large/deep) neural networks is the ability to represent an arbitrary system.",
- "crumbs": [
- "Large language models"
- ]
- },
- {
- "objectID": "concepts/llms.html#what-is-a-large-language-model",
- "href": "concepts/llms.html#what-is-a-large-language-model",
- "title": "Large language models",
- "section": "",
- "text": "A large language model is a neural network trained on vast amounts of text data.",
- "crumbs": [
- "Large language models"
- ]
- },
- {
- "objectID": "concepts/llms.html#what-are-the-inputs-and-outputs",
- "href": "concepts/llms.html#what-are-the-inputs-and-outputs",
- "title": "Large language models",
+ "objectID": "install.html",
+ "href": "install.html",
+ "title": "Installation and setup",
"section": "",
- "text": "A LLM takes text as input and produces text as output.",
+ "text": "This page describes how to install and setup Ibis Birdbrain.\n\n\npip install ibis-birdbrain\n\n\n\n\n\n\n\n\n\nWarning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly DuckDB supported for now. All 18+ Ibis backends coming soon.\n\n\n\n\n\n\n\n\n\n\n\nWarning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly Azure OpenAI is supported for now. All Marvin providers coming soon.\n\n\nCreate the ~/.birdbrain/.env file:\nMARVIN_AZURE_OPENAI_API_KEY=\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\nMARVIN_AZURE_OPENAI_API_BASE=\"https://birdbrain.openai.azure.com\"\nMARVIN_AZURE_OPENAI_DEPLOYMENT_NAME=\"birdbrain-4-32k\"\nMARVIN_AZURE_OPENAI_API_TYPE=\"azure\"\n\n\n\nLearn how get started with Ibis Birdbrain’s CLI.",
"crumbs": [
- "Large language models"
+ "Installation and setup"
]
},
{
- "objectID": "concepts/llms.html#what-do-llms-work-well-for",
- "href": "concepts/llms.html#what-do-llms-work-well-for",
- "title": "Large language models",
+ "objectID": "install.html#install-from-pypi",
+ "href": "install.html#install-from-pypi",
+ "title": "Installation and setup",
"section": "",
- "text": "Text in, text out. Neural networks and LLMs by design are non-determinstic. Though there are many tricks and workarounds, relying on LLMs for determinstic behavior is a bad idea. Instead, LLMs are great for:\n\ntext-based ML tasks (like classification, clustering)\ntext-based entity extraction (named entity regognition)\ntext-based generation (like summarization, translation, and question answering)\nother text-based tasks\n\nLLMs today are decent, but flawed, at generating programming code (as text). We can again use clever tricks and program around the non-determinstic behavior (such as running code, checking for any errors, and making one or more attempts via LLM to the errors). Fundamentally, keep in mind that an input to LLM is always text and an output is always text.",
+ "text": "pip install ibis-birdbrain",
"crumbs": [
- "Large language models"
+ "Installation and setup"
]
},
{
- "objectID": "concepts/llms.html#what-are-the-limitations-and-considerations",
- "href": "concepts/llms.html#what-are-the-limitations-and-considerations",
- "title": "Large language models",
+ "objectID": "install.html#data-platform-setup",
+ "href": "install.html#data-platform-setup",
+ "title": "Installation and setup",
"section": "",
- "text": "Some limitations include:\n\ncost\nlatency\naccuracy\n…",
+ "text": "Warning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly DuckDB supported for now. All 18+ Ibis backends coming soon.",
"crumbs": [
- "Large language models"
+ "Installation and setup"
]
},
{
- "objectID": "why.html",
- "href": "why.html",
- "title": "Why Ibis Birdbrain?",
+ "objectID": "install.html#ai-platform-setup",
+ "href": "install.html#ai-platform-setup",
+ "title": "Installation and setup",
"section": "",
- "text": "Ibis is the portable Python dataframe library.\nIbis Birdbrain is the portable Python AI-powered data bot, built on Ibis with support for all data platforms Ibis’s 18+ backends support. Ibis Birdbrain is also built on Marvin with support for all AI platforms Marvin supports.\n\n\n\n\nWhen things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex\n\n\n\n\nComposable data systems are MICE: modular, interoperable, customizable, and extensible.\nIbis Birdbrain aims to bring these values to a data + AI framework that makes working with LLMs delightful and easy, for individual hobbiests or the more advanced technology organizations.",
+ "text": "Warning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly Azure OpenAI is supported for now. All Marvin providers coming soon.\n\n\nCreate the ~/.birdbrain/.env file:\nMARVIN_AZURE_OPENAI_API_KEY=\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\nMARVIN_AZURE_OPENAI_API_BASE=\"https://birdbrain.openai.azure.com\"\nMARVIN_AZURE_OPENAI_DEPLOYMENT_NAME=\"birdbrain-4-32k\"\nMARVIN_AZURE_OPENAI_API_TYPE=\"azure\"",
"crumbs": [
- "Why Ibis Birdbrain?"
+ "Installation and setup"
]
},
{
- "objectID": "why.html#standards",
- "href": "why.html#standards",
- "title": "Why Ibis Birdbrain?",
+ "objectID": "install.html#next-steps",
+ "href": "install.html#next-steps",
+ "title": "Installation and setup",
"section": "",
- "text": "When things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex",
+ "text": "Learn how get started with Ibis Birdbrain’s CLI.",
"crumbs": [
- "Why Ibis Birdbrain?"
+ "Installation and setup"
]
},
{
- "objectID": "why.html#composable-data-systems-are-mice",
- "href": "why.html#composable-data-systems-are-mice",
- "title": "Why Ibis Birdbrain?",
+ "objectID": "posts/llms-and-data-pt4/index.html",
+ "href": "posts/llms-and-data-pt4/index.html",
+ "title": "Auxiliary tools",
"section": "",
- "text": "Composable data systems are MICE: modular, interoperable, customizable, and extensible.\nIbis Birdbrain aims to bring these values to a data + AI framework that makes working with LLMs delightful and easy, for individual hobbiests or the more advanced technology organizations.",
- "crumbs": [
- "Why Ibis Birdbrain?"
- ]
+ "text": "As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. There are many auxiliary tasks that are required in doing many technical jobs. These include:\n\nsearching the Internet for information\nreading, summarizing, and synthesizing information\nperforming boring computer tasks\ntranslating between different languages (e.g. SQL and Python; English and Spanish)\ncopying and modifying existing code\nquerying some basic informatin from data platforms\n\nWhat if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?"
},
{
- "objectID": "release_notes.html",
- "href": "release_notes.html",
- "title": "Release notes",
+ "objectID": "posts/llms-and-data-pt4/index.html#introduction",
+ "href": "posts/llms-and-data-pt4/index.html#introduction",
+ "title": "Auxiliary tools",
"section": "",
- "text": "Release notes\nTBD\n\n\n\n\n Back to top"
+ "text": "As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. There are many auxiliary tasks that are required in doing many technical jobs. These include:\n\nsearching the Internet for information\nreading, summarizing, and synthesizing information\nperforming boring computer tasks\ntranslating between different languages (e.g. SQL and Python; English and Spanish)\ncopying and modifying existing code\nquerying some basic informatin from data platforms\n\nWhat if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?"
},
{
- "objectID": "posts.html",
- "href": "posts.html",
- "title": "Posts",
- "section": "",
- "text": "Order By\n Default\n \n Title\n \n \n Date - Oldest\n \n \n Date - Newest\n \n \n Author\n \n \n \n \n \n \n \n\n\n\n\n\n\n\n\nAuxiliary tools\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 16, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComposable data + AI systems\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 15, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComputations and control flow: it’s just programming\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 14, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nThree approaches\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 13, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nAn introduction to Marvin and Ibis\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 12, 2023\n\n\n\n\n\n\n\n\nNo matching items\n\n Back to top"
+ "objectID": "posts/llms-and-data-pt4/index.html#were-using-python-lets-use-python",
+ "href": "posts/llms-and-data-pt4/index.html#were-using-python-lets-use-python",
+ "title": "Auxiliary tools",
+ "section": "We’re using Python, let’s use Python",
+ "text": "We’re using Python, let’s use Python\nWe’re already using Python for Ibis and Marvin. Let’s use it for auxillary tools. We’ll setup our data and AI platform connections and some simple example data to work with.\n\n\nCode\n1import ibis\nimport marvin\n\nfrom dotenv import load_dotenv\n\n2load_dotenv()\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = ibis.examples.penguins.fetch()\nt = con.create_table(\"penguins\", t.to_pyarrow(), overwrite=True)\n\n\n\n1\n\nImport the libraries we need.\n\n2\n\nLoad the environment variable to setup Marvin to call our OpenAI account.\n\n3\n\nSetup the demo datain an Ibis backend.\n\n\n\n\n\n1import ibis\nimport marvin\n\nfrom ibis.expr.schema import Schema\nfrom ibis.expr.types.relations import Table\n\n2ibis.options.interactive = True\nmarvin.settings.llm_model = \"openai/gpt-4\"\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = con.table(\"penguins\")\n\n\n1\n\nImport Ibis and Marvin.\n\n2\n\nConfigure Ibis (interactive) and Marvin (GPT-4).\n\n3\n\nConnect to the data and load a table into a variable."
},
{
- "objectID": "posts/llms-and-data-pt0/index.html",
- "href": "posts/llms-and-data-pt0/index.html",
- "title": "An introduction to Marvin and Ibis",
- "section": "",
- "text": "In this “LLMs and data” series, we’ll explore how to apply large-language models (LLMs) to data analytics. We’ll walk through the steps to build Ibis Birdbrain.\nThroughout the series, we’ll be using Marvin and Ibis. A brief introduction to each is provided below."
+ "objectID": "posts/llms-and-data-pt4/index.html#filesystem-tools",
+ "href": "posts/llms-and-data-pt4/index.html#filesystem-tools",
+ "title": "Auxiliary tools",
+ "section": "Filesystem tools",
+ "text": "Filesystem tools"
},
{
- "objectID": "posts/llms-and-data-pt0/index.html#introduction",
- "href": "posts/llms-and-data-pt0/index.html#introduction",
- "title": "An introduction to Marvin and Ibis",
- "section": "",
- "text": "In this “LLMs and data” series, we’ll explore how to apply large-language models (LLMs) to data analytics. We’ll walk through the steps to build Ibis Birdbrain.\nThroughout the series, we’ll be using Marvin and Ibis. A brief introduction to each is provided below."
+ "objectID": "posts/llms-and-data-pt4/index.html#internet-tools",
+ "href": "posts/llms-and-data-pt4/index.html#internet-tools",
+ "title": "Auxiliary tools",
+ "section": "Internet tools",
+ "text": "Internet tools"
},
{
- "objectID": "posts/llms-and-data-pt0/index.html#marvin",
- "href": "posts/llms-and-data-pt0/index.html#marvin",
- "title": "An introduction to Marvin and Ibis",
- "section": "Marvin",
- "text": "Marvin\nMarvin is an AI engineering framework that makes it easy to build up to an interactive conversational application.\nMarvin makes calls to an AI platform. You typically use an API key set as an environment variable – in this case, we’ll load a .env file that contians secrets for the AI platform that Marvin will use. We also set the large language model model.\n\n1import marvin\n\nfrom rich import print\nfrom time import sleep\nfrom dotenv import load_dotenv\n\n2load_dotenv()\n\n# increase accuracy\n3marvin.settings.llm_model = \"openai/gpt-4\"\n# decrease cost\n# marvin.settings.llm_model = \"openai/gpt-3.5-turbo\"\n\n4test_str = \"working with data and LLMs on 18+ data platforms is easy!\"\ntest_str\n\n\n1\n\nImport the libraries we need.\n\n2\n\nLoad the environment variable to setup Marvin to call our OpenAI account.\n\n3\n\nConfigure the LLM model to use.\n\n4\n\nSome text to test on\n\n\n\n\n'working with data and LLMs on 18+ data platforms is easy!'\n\n\n\nFunctions\nAI functions are one of the building blocks in Marvin and allow yout to specify a typed python function with no code – only a docstring – to achieve a wide variety of tasks.\nWe’ll demonstrate this with an AI function that trnaslates text:\n\n@marvin.ai_fn\ndef translate(text: str, from_: str = \"English\", to: str = \"Spanish\") -> str:\n \"\"\"translates the text\"\"\"\n\ntranslate(test_str)\n\n'trabajar con datos y LLMs en más de 18 plataformas de datos es fácil!'\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\ntranslate(translate(test_str), from_=\"Spanish\", to=\"English\")\n\n'Working with data and LLMs on more than 18 data platforms is easy!'\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\n\nModels\nAI models are another building block for generating python classes from input text. 
It’s a great way to build structured data from unstructured data that can be customized for your needs.\nWe’ll demosntrate this with an AI model that translates text:\n\nfrom pydantic import BaseModel, Field\n\n# decrease cost\nmarvin.settings.llm_model = \"openai/gpt-3.5-turbo\"\n\n@marvin.ai_model\nclass ExtractParts(BaseModel):\n \"\"\"Extracts parts of a sentence\"\"\"\n subject: str = Field(..., description=\"The subject of the sentence.\")\n objects: list[str] = Field(..., description=\"The objects of the sentence.\")\n predicate: str = Field(..., description=\"The predicate of the sentence.\")\n modifiers: list[str] = Field(..., description=\"The modifiers of the sentence.\")\n\nExtractParts(test_str)\n\nExtractParts(subject='working', objects=['data', 'LLMs'], predicate='is', modifiers=['on 18+ data platforms', 'easy'])\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\n\nClassifiers\nAI classifiers are another building block for generating python classes from input text. It’s the most efficient (time and cost) method for applying LLMs as it only results in a single output token, selecting an output in a specified Enum.\nWe’ll demonstrate this by classifying the language of some text:\n\nfrom enum import Enum\n\n# increase accuracy\nmarvin.settings.llm_model = \"openai/gpt-4\"\n\n@marvin.ai_classifier\nclass IdentifyLanguage(Enum):\n \"\"\"Identifies the language of the text\"\"\"\n\n english = \"English\"\n spanish = \"Spanish\"\n\n\nIdentifyLanguage(test_str).value\n\n'English'\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\nIdentifyLanguage(translate(test_str)).value\n\n'Spanish'\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting."
+ "objectID": "posts/llms-and-data-pt4/index.html#ai-powered-tools",
+ "href": "posts/llms-and-data-pt4/index.html#ai-powered-tools",
+ "title": "Auxiliary tools",
+ "section": "AI-powered tools",
+ "text": "AI-powered tools"
},
{
- "objectID": "posts/llms-and-data-pt0/index.html#ibis",
- "href": "posts/llms-and-data-pt0/index.html#ibis",
- "title": "An introduction to Marvin and Ibis",
- "section": "Ibis",
- "text": "Ibis\nIbis is the portable Python dataframe library that enables Ibis Birdbrain to work on many data platforms at native scale.\nIbis makes calls to a data platform, providing an API but pushing the compute to (local or remote) query engines and storage. DuckDB is the default and we’ll typically use it for demo puroses. You can work with an in-memory instance, but we’ll often create a database file from example data:\n\n1import ibis\n\n2con = ibis.connect(\"duckdb://penguins.ddb\")\nt = ibis.examples.penguins.fetch()\nt = con.create_table(\"penguins\", t.to_pyarrow(), overwrite=True)\n\n\n1\n\nImport the libraries we need.\n\n2\n\nSetup the demo datain an Ibis backend.\n\n\n\n\nYou will typically connect to an existing data platform via your corresponding Ibis backend and have access to a number of tables:\n\n1import ibis\n\n2ibis.options.interactive = True\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = con.table(\"penguins\")\n\n\n1\n\nImport Ibis.\n\n2\n\nConfigure Ibis (interactive).\n\n3\n\nConnect to the data and load a table into a variable.\n\n\n\n\n\nBackend\nA backend provides the connection and basic management of the data platform. 
Above, we created the con variable that is an instance of a DuckDB backend:\n\ncon\n\n<ibis.backends.duckdb.Backend at 0x16a17af10>\n\n\nIt usually contains some tables:\n\ncon.list_tables()\n\n['penguins']\n\n\nWe can access some internals of Ibis to see what backends are available:\n\n\n\n\n\n\nTip\n\n\n\nDon’t rely on accessing internals of Ibis in production.\n\n\n\nbackends = [entrypoint.name for entrypoint in ibis.util.backend_entry_points()]\nbackends\n\n['bigquery',\n 'clickhouse',\n 'dask',\n 'datafusion',\n 'druid',\n 'duckdb',\n 'flink',\n 'impala',\n 'mssql',\n 'mysql',\n 'oracle',\n 'pandas',\n 'polars',\n 'postgres',\n 'pyspark',\n 'snowflake',\n 'sqlite',\n 'trino']\n\n\n\n\nTable\nYou typically work with a table, conventionally named t for demo or exploratory purposes:\n\nt\n\n┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │\n│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │\n│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │\n│ Adelie │ Torgersen │ nan │ nan │ NULL │ NULL │ NULL │ 2007 │\n│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │\n│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │\n│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │\n│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │\n│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │\n│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │\n│ … │ … │ … │ … │ … │ … │ … │ … 
│\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n\n\n\nWhen working with many tables, you should name them descriptively.\n\n\nSchema\nA table has a schema that Ibis maps to the data platform’s data types:\n\nt.schema()\n\nibis.Schema {\n species string\n island string\n bill_length_mm float64\n bill_depth_mm float64\n flipper_length_mm int64\n body_mass_g int64\n sex string\n year int64\n}"
+ "objectID": "posts/llms-and-data-pt4/index.html#introducing-ibis-birdbrain",
+ "href": "posts/llms-and-data-pt4/index.html#introducing-ibis-birdbrain",
+ "title": "Auxiliary tools",
+ "section": "Introducing Ibis Birdbrain",
+ "text": "Introducing Ibis Birdbrain\nIntroduce the bot. Need to overview the tools here I think, but should probably skip most details."
},
{
- "objectID": "posts/llms-and-data-pt0/index.html#llms-and-data-marvin-and-ibis",
- "href": "posts/llms-and-data-pt0/index.html#llms-and-data-marvin-and-ibis",
- "title": "An introduction to Marvin and Ibis",
- "section": "LLMs and data: Marvin and Ibis",
- "text": "LLMs and data: Marvin and Ibis\nYou can use Marvin and Ibis together to easily apply LLMs to data.\n\nfrom ibis.expr.schema import Schema\nfrom ibis.expr.types.relations import Table\n\n@marvin.ai_fn\ndef sql_select(\n text: str, table_name: str = t.get_name(), schema: Schema = t.schema()\n) -> str:\n \"\"\"writes the SQL SELECT statement to query the table according to the text\"\"\"\n\n\nquery = \"the unique combination of species and islands\"\nsql = sql_select(query).strip(\";\")\nsql\n\n'SELECT DISTINCT species, island FROM penguins'\n\n\n\nt.sql(sql)\n\n┏━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ species ┃ island ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ string │ string │\n├───────────┼───────────┤\n│ Adelie │ Torgersen │\n│ Adelie │ Biscoe │\n│ Adelie │ Dream │\n│ Gentoo │ Biscoe │\n│ Chinstrap │ Dream │\n└───────────┴───────────┘\n\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\nt.sql(sql_select(query + \" and include their counts in from highest to lowest\").strip(\";\"))\n\n┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓\n┃ species ┃ island ┃ count ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩\n│ string │ string │ int64 │\n├───────────┼───────────┼───────┤\n│ Gentoo │ Biscoe │ 124 │\n│ Chinstrap │ Dream │ 68 │\n│ Adelie │ Dream │ 56 │\n│ Adelie │ Torgersen │ 52 │\n│ Adelie │ Biscoe │ 44 │\n└───────────┴───────────┴───────┘"
+ "objectID": "posts/llms-and-data-pt4/index.html#a-comparison-with-mlops",
+ "href": "posts/llms-and-data-pt4/index.html#a-comparison-with-mlops",
+ "title": "Auxiliary tools",
+ "section": "A comparison with MLOps",
+ "text": "A comparison with MLOps\nTODO: point on how most of the work is not ML\nBefore “MLOps” was a standard term, the Sculley et al paper from 2015 described the key issues with building real-world ML systems.\n\n\n\nFigure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surround infrastructure is vast and complex.\n\n\nYou can substitute “LLM” for “ML” in the above figure.\n\nThe “toy problem” problem\nML and LLMs are cool! They’re fun to play with and it’s easy to get distracted with fun applications. Often, ML is learned through solving toy problems, and …\n\n\nThe application landscape is vast\n…and thus requires modular, interoperable, customizable, and extensible tools. TODO: more comparison to MLOps."
},
{
- "objectID": "posts/llms-and-data-pt0/index.html#next-steps",
- "href": "posts/llms-and-data-pt0/index.html#next-steps",
- "title": "An introduction to Marvin and Ibis",
+ "objectID": "posts/llms-and-data-pt4/index.html#next-steps",
+ "href": "posts/llms-and-data-pt4/index.html#next-steps",
+ "title": "Auxiliary tools",
"section": "Next steps",
- "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series."
+ "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data."
},
{
"objectID": "posts/llms-and-data-pt1/index.html",
@@ -221,54 +148,60 @@
"text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series."
},
{
- "objectID": "install.html",
- "href": "install.html",
- "title": "Installation and setup",
+ "objectID": "posts/llms-and-data-pt3/index.html",
+ "href": "posts/llms-and-data-pt3/index.html",
+ "title": "Composable data + AI systems",
"section": "",
- "text": "This page describes how to install and setup Ibis Birdbrain.\n\n\npip install ibis-birdbrain\n\n\n\n\n\n\n\n\n\nWarning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly DuckDB supported for now. All 18+ Ibis backends coming soon.\n\n\n\n\n\n\n\n\n\n\n\nWarning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly Azure OpenAI is supported for now. All Marvin providers coming soon.\n\n\nCreate the ~/.birdbrain/.env file:\nMARVIN_AZURE_OPENAI_API_KEY=\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\nMARVIN_AZURE_OPENAI_API_BASE=\"https://birdbrain.openai.azure.com\"\nMARVIN_AZURE_OPENAI_DEPLOYMENT_NAME=\"birdbrain-4-32k\"\nMARVIN_AZURE_OPENAI_API_TYPE=\"azure\"\n\n\n\nLearn how get started with Ibis Birdbrain’s CLI.",
- "crumbs": [
- "Installation and setup"
- ]
+ "text": "When writing (data and AI) code, it’s important to use the right abstraction for your job to be done. A modular and composable data and AI system is easier than ever with the new generation of LLM-powered tools, but lacks a standard, modular, interoperable, customizable, and extensible framework. Can Ibis Birdbrain be that standard?"
},
{
- "objectID": "install.html#install-from-pypi",
- "href": "install.html#install-from-pypi",
- "title": "Installation and setup",
+ "objectID": "posts/llms-and-data-pt3/index.html#introduction",
+ "href": "posts/llms-and-data-pt3/index.html#introduction",
+ "title": "Composable data + AI systems",
"section": "",
- "text": "pip install ibis-birdbrain",
- "crumbs": [
- "Installation and setup"
- ]
+ "text": "When writing (data and AI) code, it’s important to use the right abstraction for your job to be done. A modular and composable data and AI system is easier than ever with the new generation of LLM-powered tools, but lacks a standard, modular, interoperable, customizable, and extensible framework. Can Ibis Birdbrain be that standard?"
},
{
- "objectID": "install.html#data-platform-setup",
- "href": "install.html#data-platform-setup",
- "title": "Installation and setup",
- "section": "",
- "text": "Warning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly DuckDB supported for now. All 18+ Ibis backends coming soon.",
- "crumbs": [
- "Installation and setup"
- ]
+ "objectID": "posts/llms-and-data-pt3/index.html#standards",
+ "href": "posts/llms-and-data-pt3/index.html#standards",
+ "title": "Composable data + AI systems",
+ "section": "Standards",
+    "text": "Standards\n\n\nWhen things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex\n\nThe standard that has emerged from recent LLM applications is the power of conversation for interacting with a computer. If we can give that conversational interaction the ability to work with data platforms, we can achieve a lot through natural language."
},
{
- "objectID": "install.html#ai-platform-setup",
- "href": "install.html#ai-platform-setup",
- "title": "Installation and setup",
- "section": "",
- "text": "Warning\n\n\n\nDocumentation coming soon!\n\n\n\n\n\n\n\n\nWarning\n\n\n\nOnly Azure OpenAI is supported for now. All Marvin providers coming soon.\n\n\nCreate the ~/.birdbrain/.env file:\nMARVIN_AZURE_OPENAI_API_KEY=\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\nMARVIN_AZURE_OPENAI_API_BASE=\"https://birdbrain.openai.azure.com\"\nMARVIN_AZURE_OPENAI_DEPLOYMENT_NAME=\"birdbrain-4-32k\"\nMARVIN_AZURE_OPENAI_API_TYPE=\"azure\"",
- "crumbs": [
- "Installation and setup"
- ]
+ "objectID": "posts/llms-and-data-pt3/index.html#composable-data-and-ai-systems-are-mice",
+ "href": "posts/llms-and-data-pt3/index.html#composable-data-and-ai-systems-are-mice",
+ "title": "Composable data + AI systems",
+ "section": "Composable data (and AI) systems are MICE",
+    "text": "Composable data (and AI) systems are MICE\nComposable data systems are MICE: modular, interoperable, customizable, and extensible.\nWith any new tech, there’s a lot of duplication of effort and reinvention of wheels. Ibis Birdbrain aims to be an open-source standard for building data and AI systems in Python that adheres to MICE best practices and works with any data and AI platform. It achieves this by building on top of Ibis and Marvin, which handle calls to the data and AI platforms respectively. Ibis and Marvin are themselves built on many other open-source projects, but each provides the right level of abstraction for building composable data and AI systems. Ibis Birdbrain glues them together, enabling a natural language interface that’s interoperable across 18+ data platforms and any AI platform Marvin supports.\n\n\n\n\n\n\nImportant\n\n\n\nIbis Birdbrain is very early in development and looking for community input and contributors. We believe it’s important for there to be a standard, and we can see Ibis Birdbrain going in several directions:\n\nA useful template for building a language user interface with data tools\nA standard for building data and AI systems with MICE components\n\nIf you’re interested in either of these directions, please get involved with the project!\nIbis Birdbrain is under Ibis project governance."
},
{
- "objectID": "install.html#next-steps",
- "href": "install.html#next-steps",
- "title": "Installation and setup",
+ "objectID": "posts/llms-and-data-pt3/index.html#looking-ahead",
+ "href": "posts/llms-and-data-pt3/index.html#looking-ahead",
+ "title": "Composable data + AI systems",
+ "section": "Looking ahead",
+    "text": "Looking ahead\nI am not a software engineer; I just saw a cool application of Ibis and Marvin. The objective of releasing Ibis Birdbrain as an open-source project under Ibis governance is to enable the community to build a standard for composable data and AI systems. My hope is for the Ibis, Marvin, and broader data/AI Python communities to build the standard in the open, with proprietary applications built on top of those standards. This is important for a composable data ecosystem that can be modular, interoperable, customizable, and extensible."
+ },
+ {
+ "objectID": "posts/llms-and-data-pt3/index.html#how-is-ibis-birdbrain-mice-now-whats-next",
+ "href": "posts/llms-and-data-pt3/index.html#how-is-ibis-birdbrain-mice-now-whats-next",
+ "title": "Composable data + AI systems",
+ "section": "How is Ibis Birdbrain MICE now? What’s next?",
+ "text": "How is Ibis Birdbrain MICE now? What’s next?"
+ },
+ {
+ "objectID": "posts/llms-and-data-pt3/index.html#next-steps",
+ "href": "posts/llms-and-data-pt3/index.html#next-steps",
+ "title": "Composable data + AI systems",
+ "section": "Next steps",
+ "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series, building Ibis Birdbrain."
+ },
+ {
+ "objectID": "demo.html",
+ "href": "demo.html",
+ "title": "demo",
"section": "",
- "text": "Learn how get started with Ibis Birdbrain’s CLI.",
- "crumbs": [
- "Installation and setup"
- ]
+ "text": "Back to top"
},
{
"objectID": "contribute/contributing.html",
@@ -293,133 +226,233 @@
]
},
{
- "objectID": "tutorials/cli.html",
- "href": "tutorials/cli.html",
- "title": "Tutorial: CLI",
+ "objectID": "tutorials/python.html",
+ "href": "tutorials/python.html",
+ "title": "Tutorial: Python",
"section": "",
- "text": "Install Ibis Birdbrain\n\n\n\n\nWith Ibis Birdbrain installed, you can use the birdbrain command-line interface (CLI):\n$ birdbrain\n Usage: birdbrain [OPTIONS] COMMAND [ARGS]...\n\n╭─ Options ──────────────────────────────────────────────────────────────────────╮\n│ --version Show version. │\n│ --install-completion Install completion for the current shell. │\n│ --show-completion Show completion for the current shell, to copy │\n│ it or customize the installation. │\n│ --help Show this message and exit. │\n╰────────────────────────────────────────────────────────────────────────────────╯\n╭─ Commands ─────────────────────────────────────────────────────────────────────╮\n│ ipy ipy │\n│ test test │\n╰────────────────────────────────────────────────────────────────────────────────╯\n\n\n\nYou can use the ipy subcommand to start an interactive Python session with Ibis Birdbrain ready to use:\n$ birdbrain ipy\naccess to: birdbrain\nmodel: azure_openai/gpt-4-32k\nPython 3.11.5 (main, Sep 14 2023, 13:17:51) [Clang 14.0.3 (clang-1403.0.22.14.1)]\nType 'copyright', 'credits' or 'license' for more information\nIPython 8.16.0 -- An enhanced Interactive Python. Type '?' for help.\n\n[ins] In [1]: birdbrain\nOut[1]: <Bot: birdbrain>\n\n[ins] In [2]:\n\n\n\nLearn how to work with Ibis Birdbrain in Python.",
+ "text": "Install Ibis Birdbrain\n\n\n\n\nYou can use Ibis Birdbrain as a Python library, but it is often used through an interactive Python session. You can see how to start one with the birdbrain CLI with Ibis Birdbrain ready to use.\nIf you use the CLI, you’ll have access to the ai/bot/birdbrain object, which is an instance of the ibis_birdbrain.ai.Bot class. This bot has access to numerous tools and can be used to interact with data via Ibis.\n\n\n\nYou can ask the bot for help with its usage:\nbot(\"how do I use you?\")\n\nbirdbrain:\nI’m birdbrain, your assistant for data analysis projects using Ibis, AI, and many other data tools. Here’s how you can interact with me:\n\nAsk Questions: You can ask me anything related to Ibis, AI, and data analysis. For example, you can ask about how to use certain features in Ibis, how to code in Python, or ask for help in debugging your code.\nRun Analysis: You can also ask me to run data analysis for you. You can give me instructions to query data, analyze it, and even generate visualizations.\nLearning Resources: If you’re looking for resources to learn more about Ibis, AI, and data analysis, I can provide you with relevant links to documentation, tutorials, and more.\nFile Operations: I can help with file operations such as reading a file, writing to a file, listing files in a directory, etc.\nTranslation & Summarization: If you have a large piece of text or a document that you want to summarize, or if you want to translate text from one language to another, I can help with that too.\n\nRemember, I can only provide accurate information and perform tasks based on the data I have access to. I’m here to assist you. Let’s explore the world of data together!\n\n\n\n\nInteract via natural language to have the bot use tools on your behalf.\n\n\nIbis Birdbrain can read data files (limited to Delta Lake tables and Excel files, currently), registering them as tables in your Ibis backend. 
If you connect to a data platform with existing tables via Ibis, you will have access to them through the bot.\nIn this example, we’ll read in a local penguins.xlsx file and ask the bot about it:\nbot(\"read in the penguins excel\")\n\nbirdbrain:\nI’ve successfully read in the penguins Excel file. Here’s a preview of the data:\n\n\n\n\n\n\n\n\n\n\n\n\n\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\nyear\n\n\n\n\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n2007\n\n\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n2007\n\n\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n2007\n\n\nAdelie\nTorgersen\nNaN\nNaN\n-\n-\n-\n2007\n\n\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n2007\n\n\n\n(and so on…) Note: Null values are represented as NaN (Not a Number) or ‘-’ for numerical columns, and as NULL for string columns.\n\nbot(\"what tables do you have access to? give an overview of the data\")\n\nbirdbrain:\nCurrently, I have access to one table named ‘penguins’.\nHere’s an overview of the data in it:\n\nspecies: This is a string column that gives the species of the penguin.\nisland: This is a string column that mentions the island the penguin is from.\nbill_length_mm: This is a float64 column that specifies the length of the penguin’s bill in millimeters.\nbill_depth_mm: This is a float64 column that specifies the depth of the penguin’s bill in millimeters.\nflipper_length_mm: This is an int64 column that provides the length of the penguin’s flipper in millimeters.\nbody_mass_g: This is an int64 column that gives the body mass of the penguin in grams.\nsex: This is a string column that mentions the sex of the penguin.\nyear: This is an int64 column that gives the year of observation.\n\nPlease note that a string field can contain text, a float64 field can contain any real number (including decimal values), and an int64 field can contain any integer.\n\n\n\n\nbot(\"what is this dataset? 
search the internet and find out more about its origin for me\")\n\nbirdbrain:\nThe penguins dataset, a possible alternative to the classic Iris dataset, contains data on three species of penguins from three islands in the Palmer Archipelago, Antarctica [1][2][3].\nThe data comprises measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex [3]. It provides a good mix of numeric and categorical data for various data exploration and machine learning tasks.\nThe dataset was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER) [2].\nThe purpose of the Palmer LTER is to study the marine ecosystem in the western Antarctic Peninsula region as a microcosm of global ocean ecosystems, focusing on the processes that link the extent of annual sea ice to the biological dynamics of different trophic levels [2].\nSources: [1] Penguin dataset : The new Iris | Kaggle [2] (https://allisonhorst.github.io/palmerpenguins/) [3] Penguins Dataset Overview — iris alternative | by AbdulMajedRaja RS | Towards Data Science\n\n\n\n\nbot(\"read in the readme; I don't think it contains much, draft a better version for me based on this session\")\n\nbirdbrain:\nThe ‘penguins’ project now has a new and improved README:\n\nThis repository contains an exploration and analysis of the penguins dataset. The data was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER).\nThe penguins dataset provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica. It includes measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex. 
The dataset presents a rich mix of numeric and categorical data, making it an excellent resource for various data exploration and machine learning tasks.\n\nThe data for this project is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python. The data is then available for analysis and visualization.\n\nDetailed analysis on this data is yet to be performed. The objective is to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\nContributions are welcome. Please read the CONTRIBUTING.md for details.\n\nThis project is licensed under the terms of the MIT License.\nThis summary gives a good overview of the data involved in the project, the purpose of the project, and how others can contribute.\n\nbot(\"only capitalize the first word in headings; remove the contributing/license stuff, this is buried deep in a repo. simplify and just include basic info, then write the file out for me\")\n\nbirdbrain:\nHere’s a simplified version of the README:\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\nI’ll now write this content to the README file.\n\nbot(\"write away!\")\n\nbirdbrain:\nI’ve successfully written the updated README to the file ./README.md.\n\ncat README.md\n\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\n\n\n\n\nFrom a new session:\n[ins] In [1]: bot(\"read in the readme\")\n\nbirdbrain:\nThe README file contains information about a data analysis project involving the penguins dataset. This dataset, collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), includes details on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\nThe dataset contains measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\nThe project aims to use this data to derive meaningful insights about the penguin species, their physical characteristics, and their distribution across the islands.\n\n[ins] In [2]: bot.save_history()\n\n[ins] In [3]: bb_con = ibis.connect(\"duckdb://birdbrain.ddb\")\n\n[ins] In [4]: bb_con.list_tables()\nOut[4]: ['history']\n\n[ins] In [5]: t = bb_con.table(\"history\")\n\n[ins] In [6]: t\nOut[6]:\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓\n┃ convo_id ┃ convo_timestamp ┃ bot_name ┃ message_content ┃ message_timestamp ┃ function_name ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩\n│ string │ timestamp │ string │ string │ timestamp('UTC') │ string │\n├────────────────────────────┼────────────────────────────┼───────────┼───────────────────────────┼──────────────────────────────────┼─────────────────────┤\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ read in the readme │ 2023-10-01 16:40:31.961856+00:00 │ ~ │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ ['./requirements.txt', '… │ 2023-10-01 16:40:34.391470+00:00 │ list_files_and_dirs │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ # Penguins data analysis… │ 2023-10-01 16:40:36.201490+00:00 │ read_file │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ The README file contains… │ 2023-10-01 16:40:43.166277+00:00 │ ~ │\n└────────────────────────────┴────────────────────────────┴───────────┴───────────────────────────┴──────────────────────────────────┴─────────────────────┘\n\n\n\nYou can ask the bot what tools it has access to and you should get an accurate answer, but you can also 
directly check. The Bot object is a very thin wrapper around a Marvin AI Application. You can access the Bot.ai attribute to get the underlying AIApplication object.\n[ins] In [1]: print(bot.ai.tools)\n[\n Tool(\n name='open_browser',\n description='Opens the URL in a web browser.',\n fn=<function open_browser at 0x11f22fc40>\n ),\n Tool(\n name='search_internet',\n description='Searches the internet for the given query.',\n fn=<function search_internet at 0x104890400>\n ),\n Tool(\n name='webpage_to_str',\n description='Reads a webpage link into a string. Useful for summarizing\nwebpages.',\n fn=<function webpage_to_str at 0x11f22fba0>\n ),\n Tool(\n name='summarize_text',\n description='Summarizes text',\n fn=<function summarize_text at 0x11f22fe20>\n ),\n Tool(\n name='translate_text',\n description='Translates text',\n fn=<function translate_text at 0x11f22fec0>\n ),\n Tool(\n name='read_file',\n description='Reads a file and returns its content.',\n fn=<function read_file at 0x11f25d260>\n ),\n Tool(\n name='list_files_and_dirs',\n description='Lists all files and directories in a directory.',\n fn=<function list_files_and_dirs at 0x11f25d440>\n ),\n Tool(\n name='write_file',\n description='Writes a file.',\n fn=<function write_file at 0x11f25d4e0>\n ),\n Tool(\n name='text_to_python',\n description='Returns a Python code given text.',\n fn=<function text_to_python at 0x11f22e3e0>\n ),\n Tool(\n name='fix_python_error',\n description='Fixes a Python error in the code.',\n fn=<function fix_python_error at 0x11f25cb80>\n ),\n Tool(\n name='run_python_code',\n description='Execute Python code as a string and return the output',\n fn=<function run_python_code at 0x11f25ce00>\n ),\n Tool(\n name='list_tables',\n description='Returns a list of available tables to query',\n fn=<function list_tables at 0x12a16e840>\n ),\n Tool(\n name='query_table',\n description='Queries the table in the database to answer the question',\n fn=<function query_table at 0x12a16e980>\n 
),\n Tool(\n name='get_table_schema',\n description='Returns the schema of a table',\n fn=<function get_table_schema at 0x12a16e5c0>\n ),\n Tool(\n name='read_delta_table',\n description='Reads a Delta Lake table directory from the full filepath\\n\\n\nfilepath should be of the form: <path>/<table_name>\\n ',\n fn=<function read_delta_table at 0x12a16e7a0>\n ),\n Tool(\n name='read_excel_file',\n description='Reads an Excel file from the full filepath\\n\\n filepath\nshould be of the form: <path>/<table_name>.<extension>\\n ',\n fn=<function read_excel_file at 0x12a16e8e0>\n ),\n Tool(\n name='read_files_and_summarize',\n description='Read a list of files and summarize them.',\n fn=<function read_files_and_summarize at 0x12a16eb60>\n ),\n Tool(\n name='read_webpage_and_summarize',\n description='Read a webpage and summarize it.',\n fn=<function read_webpage_and_summarize at 0x12a16ea20>\n )\n]\n\n\n\nExplore some data with Ibis Birdbrain and let us know how it goes!",
"crumbs": [
- "Tutorial: CLI"
+ "Tutorial: Python"
]
},
{
- "objectID": "tutorials/cli.html#prerequisites",
- "href": "tutorials/cli.html#prerequisites",
- "title": "Tutorial: CLI",
+ "objectID": "tutorials/python.html#prerequisites",
+ "href": "tutorials/python.html#prerequisites",
+ "title": "Tutorial: Python",
"section": "",
"text": "Install Ibis Birdbrain",
"crumbs": [
- "Tutorial: CLI"
+ "Tutorial: Python"
]
},
{
- "objectID": "tutorials/cli.html#overview",
- "href": "tutorials/cli.html#overview",
- "title": "Tutorial: CLI",
+ "objectID": "tutorials/python.html#overview",
+ "href": "tutorials/python.html#overview",
+ "title": "Tutorial: Python",
"section": "",
- "text": "With Ibis Birdbrain installed, you can use the birdbrain command-line interface (CLI):\n$ birdbrain\n Usage: birdbrain [OPTIONS] COMMAND [ARGS]...\n\n╭─ Options ──────────────────────────────────────────────────────────────────────╮\n│ --version Show version. │\n│ --install-completion Install completion for the current shell. │\n│ --show-completion Show completion for the current shell, to copy │\n│ it or customize the installation. │\n│ --help Show this message and exit. │\n╰────────────────────────────────────────────────────────────────────────────────╯\n╭─ Commands ─────────────────────────────────────────────────────────────────────╮\n│ ipy ipy │\n│ test test │\n╰────────────────────────────────────────────────────────────────────────────────╯",
+    "text": "You can use Ibis Birdbrain as a Python library, but it is often used through an interactive Python session. The birdbrain CLI can start one for you, with Ibis Birdbrain ready to use.\nIf you use the CLI, you’ll have access to the ai/bot/birdbrain object, which is an instance of the ibis_birdbrain.ai.Bot class. This bot has access to numerous tools and can be used to interact with data via Ibis.",
"crumbs": [
- "Tutorial: CLI"
+ "Tutorial: Python"
]
},
{
- "objectID": "tutorials/cli.html#starting-an-interactive-python-session",
- "href": "tutorials/cli.html#starting-an-interactive-python-session",
- "title": "Tutorial: CLI",
+ "objectID": "tutorials/python.html#getting-help",
+ "href": "tutorials/python.html#getting-help",
+ "title": "Tutorial: Python",
"section": "",
- "text": "You can use the ipy subcommand to start an interactive Python session with Ibis Birdbrain ready to use:\n$ birdbrain ipy\naccess to: birdbrain\nmodel: azure_openai/gpt-4-32k\nPython 3.11.5 (main, Sep 14 2023, 13:17:51) [Clang 14.0.3 (clang-1403.0.22.14.1)]\nType 'copyright', 'credits' or 'license' for more information\nIPython 8.16.0 -- An enhanced Interactive Python. Type '?' for help.\n\n[ins] In [1]: birdbrain\nOut[1]: <Bot: birdbrain>\n\n[ins] In [2]:",
+ "text": "You can ask the bot for help with its usage:\nbot(\"how do I use you?\")\n\nbirdbrain:\nI’m birdbrain, your assistant for data analysis projects using Ibis, AI, and many other data tools. Here’s how you can interact with me:\n\nAsk Questions: You can ask me anything related to Ibis, AI, and data analysis. For example, you can ask about how to use certain features in Ibis, how to code in Python, or ask for help in debugging your code.\nRun Analysis: You can also ask me to run data analysis for you. You can give me instructions to query data, analyze it, and even generate visualizations.\nLearning Resources: If you’re looking for resources to learn more about Ibis, AI, and data analysis, I can provide you with relevant links to documentation, tutorials, and more.\nFile Operations: I can help with file operations such as reading a file, writing to a file, listing files in a directory, etc.\nTranslation & Summarization: If you have a large piece of text or a document that you want to summarize, or if you want to translate text from one language to another, I can help with that too.\n\nRemember, I can only provide accurate information and perform tasks based on the data I have access to. I’m here to assist you. Let’s explore the world of data together!",
"crumbs": [
- "Tutorial: CLI"
+ "Tutorial: Python"
]
},
{
- "objectID": "tutorials/cli.html#next-steps",
- "href": "tutorials/cli.html#next-steps",
- "title": "Tutorial: CLI",
+ "objectID": "tutorials/python.html#using-a-tool",
+ "href": "tutorials/python.html#using-a-tool",
+ "title": "Tutorial: Python",
"section": "",
- "text": "Learn how to work with Ibis Birdbrain in Python.",
+    "text": "Interact via natural language to have the bot use tools on your behalf.\n\n\nIbis Birdbrain can read data files (limited to Delta Lake tables and Excel files, currently), registering them as tables in your Ibis backend. If you connect to a data platform with existing tables via Ibis, you will have access to them through the bot.\nIn this example, we’ll read in a local penguins.xlsx file and ask the bot about it:\nbot(\"read in the penguins excel\")\n\nbirdbrain:\nI’ve successfully read in the penguins Excel file. Here’s a preview of the data:\n\n\n\n\n\n\n\n\n\n\n\n\n\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\nyear\n\n\n\n\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n2007\n\n\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n2007\n\n\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n2007\n\n\nAdelie\nTorgersen\nNaN\nNaN\n-\n-\n-\n2007\n\n\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n2007\n\n\n\n(and so on…) Note: Null values are represented as NaN (Not a Number) or ‘-’ for numerical columns, and as NULL for string columns.\n\nbot(\"what tables do you have access to? give an overview of the data\")\n\nbirdbrain:\nCurrently, I have access to one table named ‘penguins’.\nHere’s an overview of the data in it:\n\nspecies: This is a string column that gives the species of the penguin.\nisland: This is a string column that mentions the island the penguin is from.\nbill_length_mm: This is a float64 column that specifies the length of the penguin’s bill in millimeters.\n
bill_depth_mm: This is a float64 column that specifies the depth of the penguin’s bill in millimeters.\nflipper_length_mm: This is an int64 column that provides the length of the penguin’s flipper in millimeters.\nbody_mass_g: This is an int64 column that gives the body mass of the penguin in grams.\nsex: This is a string column that mentions the sex of the penguin.\nyear: This is an int64 column that gives the year of observation.\n\nPlease note that a string field can contain text, a float64 field can contain any real number (including decimal values), and an int64 field can contain any integer.\n\n\n\n\nbot(\"what is this dataset? search the internet and find out more about its origin for me\")\n\nbirdbrain:\nThe penguins dataset, a possible alternative to the classic Iris dataset, contains data on three species of penguins from three islands in the Palmer Archipelago, Antarctica [1][2][3].\nThe data comprises measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex [3]. It provides a good mix of numeric and categorical data for various data exploration and machine learning tasks.\nThe dataset was originally collected and made available by Dr. 
Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER) [2].\nThe purpose of the Palmer LTER is to study the marine ecosystem in the western Antarctic Peninsula region as a microcosm of global ocean ecosystems, focusing on the processes that link the extent of annual sea ice to the biological dynamics of different trophic levels [2].\nSources: [1] Penguin dataset : The new Iris | Kaggle [2] (https://allisonhorst.github.io/palmerpenguins/) [3] Penguins Dataset Overview — iris alternative | by AbdulMajedRaja RS | Towards Data Science\n\n\n\n\nbot(\"read in the readme; I don't think it contains much, draft a better version for me based on this session\")\n\nbirdbrain:\nThe ‘penguins’ project now has a new and improved README:\n\nThis repository contains an exploration and analysis of the penguins dataset. The data was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER).\nThe penguins dataset provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica. It includes measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex. The dataset presents a rich mix of numeric and categorical data, making it an excellent resource for various data exploration and machine learning tasks.\n\nThe data for this project is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python. The data is then available for analysis and visualization.\n\nDetailed analysis on this data is yet to be performed. The objective is to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\nContributions are welcome. 
Please read the CONTRIBUTING.md for details.\n\nThis project is licensed under the terms of the MIT License.\nThis summary gives a good overview of the data involved in the project, the purpose of the project, and how others can contribute.\n\nbot(\"only capitalize the first word in headings; remove the contributing/license stuff, this is buried deep in a repo. simplify and just include basic info, then write the file out for me\")\n\nbirdbrain:\nHere’s a simplified version of the README:\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\nI’ll now write this content to the README file.\n\nbot(\"write away!\")\n\nbirdbrain:\nI’ve successfully written the updated README to the file ./README.md.\n\ncat README.md\n\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.",
"crumbs": [
- "Tutorial: CLI"
+ "Tutorial: Python"
]
},
{
- "objectID": "tutorials/python.html",
- "href": "tutorials/python.html",
+ "objectID": "tutorials/python.html#saving-and-loading-conversation-history",
+ "href": "tutorials/python.html#saving-and-loading-conversation-history",
"title": "Tutorial: Python",
"section": "",
- "text": "Install Ibis Birdbrain\n\n\n\n\nYou can use Ibis Birdbrain as a Python library, but it is often used through an interactive Python session. You can see how to start one with the birdbrain CLI with Ibis Birdbrain ready to use.\nIf you use the CLI, you’ll have access to the ai/bot/birdbrain object, which is an instance of the ibis_birdbrain.ai.Bot class. This bot has access to numerous tools and can be used to interact with data via Ibis.\n\n\n\nYou can ask the bot for help with its usage:\nbot(\"how do I use you?\")\n\nbirdbrain:\nI’m birdbrain, your assistant for data analysis projects using Ibis, AI, and many other data tools. Here’s how you can interact with me:\n\nAsk Questions: You can ask me anything related to Ibis, AI, and data analysis. For example, you can ask about how to use certain features in Ibis, how to code in Python, or ask for help in debugging your code.\nRun Analysis: You can also ask me to run data analysis for you. You can give me instructions to query data, analyze it, and even generate visualizations.\nLearning Resources: If you’re looking for resources to learn more about Ibis, AI, and data analysis, I can provide you with relevant links to documentation, tutorials, and more.\nFile Operations: I can help with file operations such as reading a file, writing to a file, listing files in a directory, etc.\nTranslation & Summarization: If you have a large piece of text or a document that you want to summarize, or if you want to translate text from one language to another, I can help with that too.\n\nRemember, I can only provide accurate information and perform tasks based on the data I have access to. I’m here to assist you. Let’s explore the world of data together!\n\n\n\n\nInteract via natural language to have the bot use tools on your behalf.\n\n\nIbis Birdbrain can read data files (limited to Delta Lake tables and Excel files, currently), registering them as tables in your Ibis backend. 
If you connect to a data platform with existing tables via Ibis, you will have access to them through the bot.\nIn this example, we’ll read in a local penguins.xlsx file and ask the bot about it:\nbot(\"read in the penguins excel\")\n\nbirdbrain:\nI’ve successfully read in the penguins Excel file. Here’s a preview of the data:\n\n\n\n\n\n\n\n\n\n\n\n\n\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\nyear\n\n\n\n\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n2007\n\n\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n2007\n\n\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n2007\n\n\nAdelie\nTorgersen\nNaN\nNaN\n-\n-\n-\n2007\n\n\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n2007\n\n\n\n(and so on…) ) Note: Null values are represented as NaN (Not a Number) or ‘-’ for numerical columns, and as NULL for string columns.\n\nbot(\"what tables do you have access to? give an overview of the data\")\n\nbirdbrain:\nCurrently, I have access to one table named ‘penguins’.\nHere’s an overview of the data in it:\n\nspecies: This is a string column that gives the species of the penguin.\nisland: This is a string column that mentions the island the penguin is from.\nbill_length_mm: This is a float64 column that specifies the length of the penguin’s bill in millimeters. 4. bill_depth_mm: This is a float64 column that specifies the depth of the penguin’s bill in millimeters.\nflipper_length_mm: This is an int64 column that provides the length of the penguin’s flipper in millimeters.\nbody_mass_g: This is an int64 column that gives the body mass of the penguin in grams.\nsex: This is a string column that mentions the sex of the penguin.\nyear: This is an int64 column that gives the year of observation.\n\nPlease note that a string field can contain text, a float64 field can contain any real number (including decimal values), and an int64 field can contain any integer.\n\n\n\n\nbot(\"what is this dataset? 
search the internet and find out more about its origin for me\")\n\nbirdbrain:\nThe penguins dataset, a possible alternative to the classic Iris dataset, contains data on three species of penguins from three islands in the Palmer Archipelago, Antarctica [1][2][3].\nThe data comprises measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex [3]. It provides a good mix of numeric and categorical data for various data exploration and machine learning tasks.\nThe dataset was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER) [2].\nThe purpose of the Palmer LTER is to study the marine ecosystem in the western Antarctic Peninsula region as a microcosm of global ocean ecosystems, focusing on the processes that link the extent of annual sea ice to the biological dynamics of different trophic levels [2].\nSources: [1] Penguin dataset : The new Iris | Kaggle [2] (https://allisonhorst.github.io/palmerpenguins/) [3] Penguins Dataset Overview — iris alternative | by AbdulMajedRaja RS | Towards Data Science\n\n\n\n\nbot(\"read in the readme; I don't think it contians much, draft a better version for me based on this session\")\n\nbirdbrain:\nThe ‘penguins’ project now has a new and improved README:\n\nThis repository contains an exploration and analysis of the penguins dataset. The data was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER).\nThe penguins dataset provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica. It includes measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex. 
The dataset presents a rich mix of numeric and categorical data, making it an excellent resource for various data exploration and machine learning tasks.\n\nThe data for this project is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python. The data is then available for analysis and visualization.\n\nDetailed analysis on this data is yet to be performed. The objective is to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\nContributions are welcome. Please read the CONTRIBUTING.md for details.\n\nThis project is licensed under the terms of the MIT License.\nThis summary gives a good overview of the data involved in the project, the purpose of the project, and how others can contribute.\n\nbot(\"only capitalize the first word in headings; remove the contributing/license stuff, this is buried deep in a repo. simplify and just include basic info, then write the file out for me\")\n\nbirdbrain:\nHere’s a simplified version of the README:\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\nI’ll now write this content to the README file.\n\nbot(\"write away!\")\n\nbirdbrain:\nI’ve successfully written the updated README to the file ./README.md.\n\ncat README.md\n\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\n\n\n\n\nFrom a new session:\n[ins] In [1]: bot(\"read in the readme\")\n\nbirdbrain:\nThe README file contains information about a data analysis project involving the penguins dataset. This dataset, collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), includes details on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\nThe dataset contains measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\nThe project aims to use this data to derive meaningful insights about the penguin species, their physical characteristics, and their distribution across the islands.\n\n[ins] In [2]: bot.save_history()\n\n[ins] In [3]: bb_con = ibis.connect(\"duckdb://birdbrain.ddb\")\n\n[ins] In [4]: bb_con.list_tables()\nOut[4]: ['history']\n\n[ins] In [5]: t = bb_con.table(\"history\")\n\n[ins] In [6]: t\nOut[6]:\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓\n┃ convo_id ┃ convo_timestamp ┃ bot_name ┃ message_content ┃ message_timestamp ┃ function_name ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩\n│ string │ timestamp │ string │ string │ timestamp('UTC') │ string │\n├────────────────────────────┼────────────────────────────┼───────────┼───────────────────────────┼──────────────────────────────────┼─────────────────────┤\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ read in the readme │ 2023-10-01 16:40:31.961856+00:00 │ ~ │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ ['./requirements.txt', '… │ 2023-10-01 16:40:34.391470+00:00 │ list_files_and_dirs │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ # Penguins data analysis… │ 2023-10-01 16:40:36.201490+00:00 │ read_file │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ The README file contains… │ 2023-10-01 16:40:43.166277+00:00 │ ~ │\n└────────────────────────────┴────────────────────────────┴───────────┴───────────────────────────┴──────────────────────────────────┴─────────────────────┘\n\n\n\nYou can ask the bot what tools it has access to and you should get an accurate answer, but you can also 
directly check. The Bot object is a very thin wrapper around a Marvin AI Application. You can access the Bot.ai attribute to get the underlying AIApplication object.\n[ins] In [1]: print(bot.ai.tools)\n[\n Tool(\n name='open_browser',\n description='Opens the URL in a web browser.',\n fn=<function open_browser at 0x11f22fc40>\n ),\n Tool(\n name='search_internet',\n description='Searches the internet for the given query.',\n fn=<function search_internet at 0x104890400>\n ),\n Tool(\n name='webpage_to_str',\n description='Reads a webpage link into a string. Useful for summarizing\nwebpages.',\n fn=<function webpage_to_str at 0x11f22fba0>\n ),\n Tool(\n name='summarize_text',\n description='Summarizes text',\n fn=<function summarize_text at 0x11f22fe20>\n ),\n Tool(\n name='translate_text',\n description='Translates text',\n fn=<function translate_text at 0x11f22fec0>\n ),\n Tool(\n name='read_file',\n description='Reads a file and returns its content.',\n fn=<function read_file at 0x11f25d260>\n ),\n Tool(\n name='list_files_and_dirs',\n description='Lists all files and directories in a directory.',\n fn=<function list_files_and_dirs at 0x11f25d440>\n ),\n Tool(\n name='write_file',\n description='Writes a file.',\n fn=<function write_file at 0x11f25d4e0>\n ),\n Tool(\n name='text_to_python',\n description='Returns a Python code given text.',\n fn=<function text_to_python at 0x11f22e3e0>\n ),\n Tool(\n name='fix_python_error',\n description='Fixes a Python error in the code.',\n fn=<function fix_python_error at 0x11f25cb80>\n ),\n Tool(\n name='run_python_code',\n description='Execute Python code as a string and return the output',\n fn=<function run_python_code at 0x11f25ce00>\n ),\n Tool(\n name='list_tables',\n description='Returns a list of available tables to query',\n fn=<function list_tables at 0x12a16e840>\n ),\n Tool(\n name='query_table',\n description='Queries the table in the database to answer the question',\n fn=<function query_table at 0x12a16e980>\n 
),\n Tool(\n name='get_table_schema',\n description='Returns the schema of a table',\n fn=<function get_table_schema at 0x12a16e5c0>\n ),\n Tool(\n name='read_delta_table',\n description='Reads a Delta Lake table directory from the full filepath\\n\\n\nfilepath should be of the form: <path>/<table_name>\\n ',\n fn=<function read_delta_table at 0x12a16e7a0>\n ),\n Tool(\n name='read_excel_file',\n description='Reads an Excel file from the full filepath\\n\\n filepath\nshould be of the form: <path>/<table_name>.<extension>\\n ',\n fn=<function read_excel_file at 0x12a16e8e0>\n ),\n Tool(\n name='read_files_and_summarize',\n description='Read a list of files and summarize them.',\n fn=<function read_files_and_summarize at 0x12a16eb60>\n ),\n Tool(\n name='read_webpage_and_summarize',\n description='Read a webpage and summarize it.',\n fn=<function read_webpage_and_summarize at 0x12a16ea20>\n )\n]\n\n\n\nExplore some data with Ibis Birdbrain and let us know how it goes!",
+ "text": "From a new session:\n[ins] In [1]: bot(\"read in the readme\")\n\nbirdbrain:\nThe README file contains information about a data analysis project involving the penguins dataset. This dataset, collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), includes details on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\nThe dataset contains measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\nThe project aims to use this data to derive meaningful insights about the penguin species, their physical characteristics, and their distribution across the islands.\n\n[ins] In [2]: bot.save_history()\n\n[ins] In [3]: bb_con = ibis.connect(\"duckdb://birdbrain.ddb\")\n\n[ins] In [4]: bb_con.list_tables()\nOut[4]: ['history']\n\n[ins] In [5]: t = bb_con.table(\"history\")\n\n[ins] In [6]: t\nOut[6]:\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓\n┃ convo_id ┃ convo_timestamp ┃ bot_name ┃ message_content ┃ message_timestamp ┃ function_name ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩\n│ string │ timestamp │ string │ string │ timestamp('UTC') │ string │\n├────────────────────────────┼────────────────────────────┼───────────┼───────────────────────────┼──────────────────────────────────┼─────────────────────┤\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ read in the readme │ 2023-10-01 16:40:31.961856+00:00 │ ~ │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ ['./requirements.txt', '… │ 2023-10-01 16:40:34.391470+00:00 │ list_files_and_dirs 
│\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ # Penguins data analysis… │ 2023-10-01 16:40:36.201490+00:00 │ read_file │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ The README file contains… │ 2023-10-01 16:40:43.166277+00:00 │ ~ │\n└────────────────────────────┴────────────────────────────┴───────────┴───────────────────────────┴──────────────────────────────────┴─────────────────────┘",
"crumbs": [
"Tutorial: Python"
]
},
{
- "objectID": "tutorials/python.html#prerequisites",
- "href": "tutorials/python.html#prerequisites",
+ "objectID": "tutorials/python.html#understanding-and-accessing-bot-objects",
+ "href": "tutorials/python.html#understanding-and-accessing-bot-objects",
"title": "Tutorial: Python",
"section": "",
- "text": "Install Ibis Birdbrain",
+ "text": "You can ask the bot what tools it has access to and you should get an accurate answer, but you can also directly check. The Bot object is a very thin wrapper around a Marvin AI Application. You can access the Bot.ai attribute to get the underlying AIApplication object.\n[ins] In [1]: print(bot.ai.tools)\n[\n Tool(\n name='open_browser',\n description='Opens the URL in a web browser.',\n fn=<function open_browser at 0x11f22fc40>\n ),\n Tool(\n name='search_internet',\n description='Searches the internet for the given query.',\n fn=<function search_internet at 0x104890400>\n ),\n Tool(\n name='webpage_to_str',\n description='Reads a webpage link into a string. Useful for summarizing\nwebpages.',\n fn=<function webpage_to_str at 0x11f22fba0>\n ),\n Tool(\n name='summarize_text',\n description='Summarizes text',\n fn=<function summarize_text at 0x11f22fe20>\n ),\n Tool(\n name='translate_text',\n description='Translates text',\n fn=<function translate_text at 0x11f22fec0>\n ),\n Tool(\n name='read_file',\n description='Reads a file and returns its content.',\n fn=<function read_file at 0x11f25d260>\n ),\n Tool(\n name='list_files_and_dirs',\n description='Lists all files and directories in a directory.',\n fn=<function list_files_and_dirs at 0x11f25d440>\n ),\n Tool(\n name='write_file',\n description='Writes a file.',\n fn=<function write_file at 0x11f25d4e0>\n ),\n Tool(\n name='text_to_python',\n description='Returns a Python code given text.',\n fn=<function text_to_python at 0x11f22e3e0>\n ),\n Tool(\n name='fix_python_error',\n description='Fixes a Python error in the code.',\n fn=<function fix_python_error at 0x11f25cb80>\n ),\n Tool(\n name='run_python_code',\n description='Execute Python code as a string and return the output',\n fn=<function run_python_code at 0x11f25ce00>\n ),\n Tool(\n name='list_tables',\n description='Returns a list of available tables to query',\n fn=<function list_tables at 0x12a16e840>\n ),\n Tool(\n name='query_table',\n 
description='Queries the table in the database to answer the question',\n fn=<function query_table at 0x12a16e980>\n ),\n Tool(\n name='get_table_schema',\n description='Returns the schema of a table',\n fn=<function get_table_schema at 0x12a16e5c0>\n ),\n Tool(\n name='read_delta_table',\n description='Reads a Delta Lake table directory from the full filepath\\n\\n\nfilepath should be of the form: <path>/<table_name>\\n ',\n fn=<function read_delta_table at 0x12a16e7a0>\n ),\n Tool(\n name='read_excel_file',\n description='Reads an Excel file from the full filepath\\n\\n filepath\nshould be of the form: <path>/<table_name>.<extension>\\n ',\n fn=<function read_excel_file at 0x12a16e8e0>\n ),\n Tool(\n name='read_files_and_summarize',\n description='Read a list of files and summarize them.',\n fn=<function read_files_and_summarize at 0x12a16eb60>\n ),\n Tool(\n name='read_webpage_and_summarize',\n description='Read a webpage and summarize it.',\n fn=<function read_webpage_and_summarize at 0x12a16ea20>\n )\n]",
"crumbs": [
"Tutorial: Python"
]
},
{
- "objectID": "tutorials/python.html#overview",
- "href": "tutorials/python.html#overview",
+ "objectID": "tutorials/python.html#next-steps",
+ "href": "tutorials/python.html#next-steps",
"title": "Tutorial: Python",
"section": "",
- "text": "You can use Ibis Birdbrain as a Python library, but it is often used through an interactive Python session. You can see how to start one with the birdbrain CLI with Ibis Birdbrain ready to use.\nIf you use the CLI, you’ll have access to the ai/bot/birdbrain object, which is an instance of the ibis_birdbrain.ai.Bot class. This bot has access to numerous tools and can be used to interact with data via Ibis.",
+ "text": "Explore some data with Ibis Birdbrain and let us know how it goes!",
"crumbs": [
"Tutorial: Python"
]
},
{
- "objectID": "tutorials/python.html#getting-help",
- "href": "tutorials/python.html#getting-help",
- "title": "Tutorial: Python",
+ "objectID": "concepts/user-interfaces.html",
+ "href": "concepts/user-interfaces.html",
+ "title": "User interfaces",
+ "section": "",
+ "text": "A user interface (UI) in our context is the mechanism by which a human interacts with software.\n\n\nA command-line interface (CLI) is a UI for interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). CLIs are great for automating tasks and are often user-friendly, with a short learning curve (if designed well).\nCLIs also excel at packaging common data analysis tasks into a single, configurable command. However, remembering the exact syntax of a command can be difficult, especially if the command is not used frequently.\n\n\n\nPython itself can serve as an interactive UI for data analytics, whether in a terminal, notebook, IDE, or other setup.\n\n\n\nIbis Birdbrain aims to implement a language user interface (LUI) for data analytics, providing additional AI assistance in both a CLI and Python interface. This enables users to interact, through natural language, with a bot that can perform tasks on their behalf.\n\n\n\n\n\n\nWarning\n\n\n\nThis LUI paradigm is highly experimental and should be put into production (automated) with caution.",
+ "crumbs": [
+ "User interfaces"
+ ]
+ },
+ {
+ "objectID": "concepts/user-interfaces.html#command-line-interface",
+ "href": "concepts/user-interfaces.html#command-line-interface",
+ "title": "User interfaces",
+ "section": "",
+ "text": "A command-line interface (CLI) is a UI for interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). CLIs are great for automating tasks and are often user-friendly, with a short learning curve (if designed well).\nCLIs also excel at packaging common data analysis tasks into a single, configurable command. However, remembering the exact syntax of a command can be difficult, especially if the command is not used frequently.",
+ "crumbs": [
+ "User interfaces"
+ ]
+ },
+ {
+ "objectID": "concepts/user-interfaces.html#python",
+ "href": "concepts/user-interfaces.html#python",
+ "title": "User interfaces",
+ "section": "",
+ "text": "Python itself can serve as an interactive UI for data analytics, whether in a terminal, notebook, IDE, or other setup.",
+ "crumbs": [
+ "User interfaces"
+ ]
+ },
+ {
+ "objectID": "concepts/user-interfaces.html#language-user-interface",
+ "href": "concepts/user-interfaces.html#language-user-interface",
+ "title": "User interfaces",
+ "section": "",
+ "text": "Ibis Birdbrain aims to implement a language user interface (LUI) for data analytics, providing additional AI assistance in both a CLI and Python interface. This enables users to interact, through natural language, with a bot that can perform tasks on their behalf.\n\n\n\n\n\n\nWarning\n\n\n\nThis LUI paradigm is highly experimental and should be put into production (automated) with caution.",
+ "crumbs": [
+ "User interfaces"
+ ]
+ },
+ {
+ "objectID": "concepts/platforms.html",
+ "href": "concepts/platforms.html",
+ "title": "Data and AI platforms",
+ "section": "",
+ "text": "Data and AI platforms are…",
+ "crumbs": [
+ "Data and AI platforms"
+ ]
+ },
+ {
+ "objectID": "why.html",
+ "href": "why.html",
+ "title": "Why Ibis Birdbrain?",
+ "section": "",
+ "text": "Ibis is the portable Python dataframe library.\nIbis Birdbrain is the portable Python AI-powered data bot, built on Ibis with support for all data platforms Ibis’s 18+ backends support. Ibis Birdbrain is also built on Marvin, with support for all AI platforms Marvin supports.\n\n\n\n\nWhen things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex\n\n\n\n\nComposable data systems are MICE: modular, interoperable, customizable, and extensible.\nIbis Birdbrain aims to bring these values to a data + AI framework that makes working with LLMs delightful and easy, for individual hobbyists and advanced technology organizations alike.",
+ "crumbs": [
+ "Why Ibis Birdbrain?"
+ ]
+ },
+ {
+ "objectID": "why.html#standards",
+ "href": "why.html#standards",
+ "title": "Why Ibis Birdbrain?",
+ "section": "",
+ "text": "When things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex",
+ "crumbs": [
+ "Why Ibis Birdbrain?"
+ ]
+ },
+ {
+ "objectID": "why.html#composable-data-systems-are-mice",
+ "href": "why.html#composable-data-systems-are-mice",
+ "title": "Why Ibis Birdbrain?",
+ "section": "",
+ "text": "Composable data systems are MICE: modular, interoperable, customizable, and extensible.\nIbis Birdbrain aims to bring these values to a data + AI framework that makes working with LLMs delightful and easy, for individual hobbyists and advanced technology organizations alike.",
+ "crumbs": [
+ "Why Ibis Birdbrain?"
+ ]
+ },
+ {
+ "objectID": "concepts/ops.html",
+ "href": "concepts/ops.html",
+ "title": "LLMOps, MLOps, DevOps",
+ "section": "",
+ "text": "The ML landscape is plagued by “the toy problem” – building something cool is easy, putting something in production is hard.",
+ "crumbs": [
+ "LLMOps, MLOps, DevOps"
+ ]
+ },
+ {
+ "objectID": "concepts/llms.html",
+ "href": "concepts/llms.html",
+ "title": "Large language models",
+ "section": "",
+ "text": "Large language models (LLMs) represent decades of research and development of neural networks. While relatively impressive LLMs have been around for years, recent innovations have made it possible to create instruction-following, conversational bots that can perform tasks on behalf of the user.\nWe are primarily concerned with applying LLMs to data, but we’ll take a brief look at how they work and why we should use them.\n\n\nAn artificial neural network (ANN or often just NN) is a computational model that is loosely inspired by the biological neural networks in the brain. It is a collection of connected nodes, called neurons, that are organized into layers. Each neuron is connected to other neurons in the network, and each connection has a weight associated with it. The weights are adjusted during training to improve the model’s performance.\nAn instance of a neural network (and many other ML architectures) is called a model. A model has usually been trained on data to learn to represent a system. While there are many machine learning model architectures and training algorithms, the fundamental innovation of (large/deep) neural networks is the ability to represent an arbitrary system.\n\n\n\nA large language model is a neural network trained on vast amounts of text data.\n\n\n\nAn LLM takes text as input and produces text as output.\n\n\n\nText in, text out. Neural networks and LLMs are non-deterministic by design. Though there are many tricks and workarounds, relying on LLMs for deterministic behavior is a bad idea. Instead, LLMs are great for:\n\ntext-based ML tasks (like classification and clustering)\ntext-based entity extraction (named entity recognition)\ntext-based generation (like summarization, translation, and question answering)\nother text-based tasks\n\nLLMs today are decent, but flawed, at generating programming code (as text). We can again use clever tricks to program around the non-deterministic behavior (such as running the code, checking for errors, and making one or more attempts via the LLM to fix the errors). Fundamentally, keep in mind that the input to an LLM is always text and the output is always text.\n\n\n\nSome limitations include:\n\ncost\nlatency\naccuracy\n…",
+ "crumbs": [
+ "Large language models"
+ ]
+ },
+ {
+ "objectID": "concepts/llms.html#what-is-an-artificial-neural-network",
+ "href": "concepts/llms.html#what-is-an-artificial-neural-network",
+ "title": "Large language models",
"section": "",
- "text": "You can ask the bot for help with its usage:\nbot(\"how do I use you?\")\n\nbirdbrain:\nI’m birdbrain, your assistant for data analysis projects using Ibis, AI, and many other data tools. Here’s how you can interact with me:\n\nAsk Questions: You can ask me anything related to Ibis, AI, and data analysis. For example, you can ask about how to use certain features in Ibis, how to code in Python, or ask for help in debugging your code.\nRun Analysis: You can also ask me to run data analysis for you. You can give me instructions to query data, analyze it, and even generate visualizations.\nLearning Resources: If you’re looking for resources to learn more about Ibis, AI, and data analysis, I can provide you with relevant links to documentation, tutorials, and more.\nFile Operations: I can help with file operations such as reading a file, writing to a file, listing files in a directory, etc.\nTranslation & Summarization: If you have a large piece of text or a document that you want to summarize, or if you want to translate text from one language to another, I can help with that too.\n\nRemember, I can only provide accurate information and perform tasks based on the data I have access to. I’m here to assist you. Let’s explore the world of data together!",
+ "text": "An artificial neural network (ANN or often just NN) is a computational model that is loosely inspired by the biological neural networks in the brain. It is a collection of connected nodes, called neurons, that are organized into layers. Each neuron is connected to other neurons in the network, and each connection has a weight associated with it. The weights are adjusted during training to improve the model’s performance.\nAn instance of a neural network (and many other ML architectures) is called a model. A model has usually been trained on data to learn to represent a system. While there are many machine learning model architectures and training algorithms, the fundamental innovation of (large/deep) neural networks is the ability to represent an arbitrary system.",
"crumbs": [
- "Tutorial: Python"
+ "Large language models"
]
},
{
- "objectID": "tutorials/python.html#using-a-tool",
- "href": "tutorials/python.html#using-a-tool",
- "title": "Tutorial: Python",
+ "objectID": "concepts/llms.html#what-is-a-large-language-model",
+ "href": "concepts/llms.html#what-is-a-large-language-model",
+ "title": "Large language models",
"section": "",
- "text": "Interact via natural language to have the bot use tools on your behalf.\n\n\nIbis Birdbrain can read data files (limited to Delta Lake tables and Excel files, currently), registering them as tables in your Ibis backend. If you connect to a data platform with existing tables via Ibis, you will have access to them through the bot.\nIn this example, we’ll read in a local penguins.xlsx file and ask the bot about it:\nbot(\"read in the penguins excel\")\n\nbirdbrain:\nI’ve successfully read in the penguins Excel file. Here’s a preview of the data:\n\n\n\n\n\n\n\n\n\n\n\n\n\nspecies\nisland\nbill_length_mm\nbill_depth_mm\nflipper_length_mm\nbody_mass_g\nsex\nyear\n\n\n\n\nAdelie\nTorgersen\n39.1\n18.7\n181\n3750\nmale\n2007\n\n\nAdelie\nTorgersen\n39.5\n17.4\n186\n3800\nfemale\n2007\n\n\nAdelie\nTorgersen\n40.3\n18.0\n195\n3250\nfemale\n2007\n\n\nAdelie\nTorgersen\nNaN\nNaN\n-\n-\n-\n2007\n\n\nAdelie\nTorgersen\n36.7\n19.3\n193\n3450\nfemale\n2007\n\n\n\n(and so on…) ) Note: Null values are represented as NaN (Not a Number) or ‘-’ for numerical columns, and as NULL for string columns.\n\nbot(\"what tables do you have access to? give an overview of the data\")\n\nbirdbrain:\nCurrently, I have access to one table named ‘penguins’.\nHere’s an overview of the data in it:\n\nspecies: This is a string column that gives the species of the penguin.\nisland: This is a string column that mentions the island the penguin is from.\nbill_length_mm: This is a float64 column that specifies the length of the penguin’s bill in millimeters. 4. 
bill_depth_mm: This is a float64 column that specifies the depth of the penguin’s bill in millimeters.\nflipper_length_mm: This is an int64 column that provides the length of the penguin’s flipper in millimeters.\nbody_mass_g: This is an int64 column that gives the body mass of the penguin in grams.\nsex: This is a string column that mentions the sex of the penguin.\nyear: This is an int64 column that gives the year of observation.\n\nPlease note that a string field can contain text, a float64 field can contain any real number (including decimal values), and an int64 field can contain any integer.\n\n\n\n\nbot(\"what is this dataset? search the internet and find out more about its origin for me\")\n\nbirdbrain:\nThe penguins dataset, a possible alternative to the classic Iris dataset, contains data on three species of penguins from three islands in the Palmer Archipelago, Antarctica [1][2][3].\nThe data comprises measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex [3]. It provides a good mix of numeric and categorical data for various data exploration and machine learning tasks.\nThe dataset was originally collected and made available by Dr. 
Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER) [2].\nThe purpose of the Palmer LTER is to study the marine ecosystem in the western Antarctic Peninsula region as a microcosm of global ocean ecosystems, focusing on the processes that link the extent of annual sea ice to the biological dynamics of different trophic levels [2].\nSources: [1] Penguin dataset : The new Iris | Kaggle [2] (https://allisonhorst.github.io/palmerpenguins/) [3] Penguins Dataset Overview — iris alternative | by AbdulMajedRaja RS | Towards Data Science\n\n\n\n\nbot(\"read in the readme; I don't think it contians much, draft a better version for me based on this session\")\n\nbirdbrain:\nThe ‘penguins’ project now has a new and improved README:\n\nThis repository contains an exploration and analysis of the penguins dataset. The data was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER).\nThe penguins dataset provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica. It includes measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex. The dataset presents a rich mix of numeric and categorical data, making it an excellent resource for various data exploration and machine learning tasks.\n\nThe data for this project is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python. The data is then available for analysis and visualization.\n\nDetailed analysis on this data is yet to be performed. The objective is to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\n\nContributions are welcome. 
Please read the CONTRIBUTING.md for details.\n\nThis project is licensed under the terms of the MIT License.\nThis summary gives a good overview of the data involved in the project, the purpose of the project, and how others can contribute.\n\nbot(\"only capitalize the first word in headings; remove the contributing/license stuff, this is buried deep in a repo. simplify and just include basic info, then write the file out for me\")\n\nbirdbrain:\nHere’s a simplified version of the README:\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.\nI’ll now write this content to the README file.\n\nbot(\"write away!\")\n\nbirdbrain:\nI’ve successfully written the updated README to the file ./README.md.\n\ncat README.md\n\n\nThis project involves an exploration and analysis of the penguins dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\n\nThe penguins dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. 
The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\n\nThe project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.",
+ "text": "A large language model is a neural network trained on vast amounts of text data.",
"crumbs": [
- "Tutorial: Python"
+ "Large language models"
]
},
{
- "objectID": "tutorials/python.html#saving-and-loading-conversation-history",
- "href": "tutorials/python.html#saving-and-loading-conversation-history",
- "title": "Tutorial: Python",
+ "objectID": "concepts/llms.html#what-are-the-inputs-and-outputs",
+ "href": "concepts/llms.html#what-are-the-inputs-and-outputs",
+ "title": "Large language models",
"section": "",
- "text": "From a new session:\n[ins] In [1]: bot(\"read in the readme\")\n\nbirdbrain:\nThe README file contains information about a data analysis project involving the penguins dataset. This dataset, collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), includes details on three species of penguins from three islands in the Palmer Archipelago, Antarctica.\nThe dataset contains measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.\nThe project aims to use this data to derive meaningful insights about the penguin species, their physical characteristics, and their distribution across the islands.\n\n[ins] In [2]: bot.save_history()\n\n[ins] In [3]: bb_con = ibis.connect(\"duckdb://birdbrain.ddb\")\n\n[ins] In [4]: bb_con.list_tables()\nOut[4]: ['history']\n\n[ins] In [5]: t = bb_con.table(\"history\")\n\n[ins] In [6]: t\nOut[6]:\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓\n┃ convo_id ┃ convo_timestamp ┃ bot_name ┃ message_content ┃ message_timestamp ┃ function_name ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩\n│ string │ timestamp │ string │ string │ timestamp('UTC') │ string │\n├────────────────────────────┼────────────────────────────┼───────────┼───────────────────────────┼──────────────────────────────────┼─────────────────────┤\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ read in the readme │ 2023-10-01 16:40:31.961856+00:00 │ ~ │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ ['./requirements.txt', '… │ 2023-10-01 16:40:34.391470+00:00 │ list_files_and_dirs 
│\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ # Penguins data analysis… │ 2023-10-01 16:40:36.201490+00:00 │ read_file │\n│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ The README file contains… │ 2023-10-01 16:40:43.166277+00:00 │ ~ │\n└────────────────────────────┴────────────────────────────┴───────────┴───────────────────────────┴──────────────────────────────────┴─────────────────────┘",
+ "text": "A LLM takes text as input and produces text as output.",
"crumbs": [
- "Tutorial: Python"
+ "Large language models"
]
},
{
- "objectID": "tutorials/python.html#understanding-and-accessing-bot-objects",
- "href": "tutorials/python.html#understanding-and-accessing-bot-objects",
- "title": "Tutorial: Python",
+ "objectID": "concepts/llms.html#what-do-llms-work-well-for",
+ "href": "concepts/llms.html#what-do-llms-work-well-for",
+ "title": "Large language models",
"section": "",
- "text": "You can ask the bot what tools it has access to and you should get an accurate answer, but you can also directly check. The Bot object is a very thin wrapper around a Marvin AI Application. You can access the Bot.ai attribute to get the underlying AIApplication object.\n[ins] In [1]: print(bot.ai.tools)\n[\n Tool(\n name='open_browser',\n description='Opens the URL in a web browser.',\n fn=<function open_browser at 0x11f22fc40>\n ),\n Tool(\n name='search_internet',\n description='Searches the internet for the given query.',\n fn=<function search_internet at 0x104890400>\n ),\n Tool(\n name='webpage_to_str',\n description='Reads a webpage link into a string. Useful for summarizing\nwebpages.',\n fn=<function webpage_to_str at 0x11f22fba0>\n ),\n Tool(\n name='summarize_text',\n description='Summarizes text',\n fn=<function summarize_text at 0x11f22fe20>\n ),\n Tool(\n name='translate_text',\n description='Translates text',\n fn=<function translate_text at 0x11f22fec0>\n ),\n Tool(\n name='read_file',\n description='Reads a file and returns its content.',\n fn=<function read_file at 0x11f25d260>\n ),\n Tool(\n name='list_files_and_dirs',\n description='Lists all files and directories in a directory.',\n fn=<function list_files_and_dirs at 0x11f25d440>\n ),\n Tool(\n name='write_file',\n description='Writes a file.',\n fn=<function write_file at 0x11f25d4e0>\n ),\n Tool(\n name='text_to_python',\n description='Returns a Python code given text.',\n fn=<function text_to_python at 0x11f22e3e0>\n ),\n Tool(\n name='fix_python_error',\n description='Fixes a Python error in the code.',\n fn=<function fix_python_error at 0x11f25cb80>\n ),\n Tool(\n name='run_python_code',\n description='Execute Python code as a string and return the output',\n fn=<function run_python_code at 0x11f25ce00>\n ),\n Tool(\n name='list_tables',\n description='Returns a list of available tables to query',\n fn=<function list_tables at 0x12a16e840>\n ),\n Tool(\n name='query_table',\n 
description='Queries the table in the database to answer the question',\n fn=<function query_table at 0x12a16e980>\n ),\n Tool(\n name='get_table_schema',\n description='Returns the schema of a table',\n fn=<function get_table_schema at 0x12a16e5c0>\n ),\n Tool(\n name='read_delta_table',\n description='Reads a Delta Lake table directory from the full filepath\\n\\n\nfilepath should be of the form: <path>/<table_name>\\n ',\n fn=<function read_delta_table at 0x12a16e7a0>\n ),\n Tool(\n name='read_excel_file',\n description='Reads an Excel file from the full filepath\\n\\n filepath\nshould be of the form: <path>/<table_name>.<extension>\\n ',\n fn=<function read_excel_file at 0x12a16e8e0>\n ),\n Tool(\n name='read_files_and_summarize',\n description='Read a list of files and summarize them.',\n fn=<function read_files_and_summarize at 0x12a16eb60>\n ),\n Tool(\n name='read_webpage_and_summarize',\n description='Read a webpage and summarize it.',\n fn=<function read_webpage_and_summarize at 0x12a16ea20>\n )\n]",
+ "text": "Text in, text out. Neural networks and LLMs by design are non-determinstic. Though there are many tricks and workarounds, relying on LLMs for determinstic behavior is a bad idea. Instead, LLMs are great for:\n\ntext-based ML tasks (like classification, clustering)\ntext-based entity extraction (named entity regognition)\ntext-based generation (like summarization, translation, and question answering)\nother text-based tasks\n\nLLMs today are decent, but flawed, at generating programming code (as text). We can again use clever tricks and program around the non-determinstic behavior (such as running code, checking for any errors, and making one or more attempts via LLM to the errors). Fundamentally, keep in mind that an input to LLM is always text and an output is always text.",
"crumbs": [
- "Tutorial: Python"
+ "Large language models"
]
},
{
- "objectID": "tutorials/python.html#next-steps",
- "href": "tutorials/python.html#next-steps",
- "title": "Tutorial: Python",
+ "objectID": "concepts/llms.html#what-are-the-limitations-and-considerations",
+ "href": "concepts/llms.html#what-are-the-limitations-and-considerations",
+ "title": "Large language models",
"section": "",
- "text": "Explore some data with Ibis Birdbrain and let us know how it goes!",
+ "text": "Some limitations include:\n\ncost\nlatency\naccuracy\n…",
"crumbs": [
- "Tutorial: Python"
+ "Large language models"
]
},
{
@@ -430,116 +463,72 @@
"text": "Why Ibis Birdbrain?\n \n \n \n Installation\n \n \n \n Tutorial: getting started\n \n \n \n GitHub\n \n \n \n Chat\n \n \n \n RSS\n \n\n\n\n\n\n\n\n\n\nWarning\n\n\n\nIbis Birdbrain is highly experimental and currently under construction by slow humans. Please check back soon!\n\n\n\n\n\n\n Back to top"
},
{
- "objectID": "posts/llms-and-data-pt3/index.html",
- "href": "posts/llms-and-data-pt3/index.html",
- "title": "Composable data + AI systems",
+ "objectID": "tutorials/cli.html",
+ "href": "tutorials/cli.html",
+ "title": "Tutorial: CLI",
"section": "",
- "text": "When writing (data and AI) code, it’s important to use the right abstraction for your job to be done. A modular and composable data and AI system is easier than ever with the new generation of LLM-powered tools, but lacks a standard, modular, interoperable, customizable, and extensible framework. Can Ibis Birdbrain be that standard?"
+ "text": "Install Ibis Birdbrain\n\n\n\n\nWith Ibis Birdbrain installed, you can use the birdbrain command-line interface (CLI):\n$ birdbrain\n Usage: birdbrain [OPTIONS] COMMAND [ARGS]...\n\n╭─ Options ──────────────────────────────────────────────────────────────────────╮\n│ --version Show version. │\n│ --install-completion Install completion for the current shell. │\n│ --show-completion Show completion for the current shell, to copy │\n│ it or customize the installation. │\n│ --help Show this message and exit. │\n╰────────────────────────────────────────────────────────────────────────────────╯\n╭─ Commands ─────────────────────────────────────────────────────────────────────╮\n│ ipy ipy │\n│ test test │\n╰────────────────────────────────────────────────────────────────────────────────╯\n\n\n\nYou can use the ipy subcommand to start an interactive Python session with Ibis Birdbrain ready to use:\n$ birdbrain ipy\naccess to: birdbrain\nmodel: azure_openai/gpt-4-32k\nPython 3.11.5 (main, Sep 14 2023, 13:17:51) [Clang 14.0.3 (clang-1403.0.22.14.1)]\nType 'copyright', 'credits' or 'license' for more information\nIPython 8.16.0 -- An enhanced Interactive Python. Type '?' for help.\n\n[ins] In [1]: birdbrain\nOut[1]: <Bot: birdbrain>\n\n[ins] In [2]:\n\n\n\nLearn how to work with Ibis Birdbrain in Python.",
+ "crumbs": [
+ "Tutorial: CLI"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt3/index.html#introduction",
- "href": "posts/llms-and-data-pt3/index.html#introduction",
- "title": "Composable data + AI systems",
+ "objectID": "tutorials/cli.html#prerequisites",
+ "href": "tutorials/cli.html#prerequisites",
+ "title": "Tutorial: CLI",
"section": "",
- "text": "When writing (data and AI) code, it’s important to use the right abstraction for your job to be done. A modular and composable data and AI system is easier than ever with the new generation of LLM-powered tools, but lacks a standard, modular, interoperable, customizable, and extensible framework. Can Ibis Birdbrain be that standard?"
- },
- {
- "objectID": "posts/llms-and-data-pt3/index.html#standards",
- "href": "posts/llms-and-data-pt3/index.html#standards",
- "title": "Composable data + AI systems",
- "section": "Standards",
- "text": "Standards\n\n\nWhen things don’t work as they should, it often means that standards are absent.\n- The International Organization for Standardization (ISO)\n\n- The Composable Codex\n\nThe standard that has emerged from the recent LLM applications is the power of a conversation in interacting with a computer. If we can give that interaction tht ability to interact with data platforms, we can achieve a lot through natural language."
- },
- {
- "objectID": "posts/llms-and-data-pt3/index.html#composable-data-and-ai-systems-are-mice",
- "href": "posts/llms-and-data-pt3/index.html#composable-data-and-ai-systems-are-mice",
- "title": "Composable data + AI systems",
- "section": "Composable data (and AI) systems are MICE",
- "text": "Composable data (and AI) systems are MICE\nComposable data systems are MICE: modular, interoperable, customizable, and extensible.\nWith any new tech, there’s a lot of duplication of effort and reinvention of wheels. Ibis Birdbrain aims to be an open-source standard for building data and AI systems in Python that adheres to MICE best practices and works with any data and AI platform. It achieves this by building on top of Ibis and Marvin, handling calls to the data and AI platforms respectively. Ibis and Marvin are themselves built on many other open-source projects, but provide the right level of abstraction for building composable data and AI systems, independently. Ibis Birdbrain glues them together enables a natural language interface that’s interoperable between 18+ data platforms and any AI platform Marvin supports.\nglue that enables working via natural language with any data platform Ibis supports and any AI platform Marvin supports.\n\n\n\n\n\n\nImportant\n\n\n\nIbis Birdbrain is very early in development and looking for community input and contributors. We believe it’s important for there to be a standard, and I can see Ibis Birdbrain going in several directions:\n\nA useful template for building a language user interface with data tools\nA standard for building data and AI systems with MICE components\n\nIf you’re interested in either of these directions, please get involved with the project!\nIbis Birdbrain is under Ibis project governance."
- },
- {
- "objectID": "posts/llms-and-data-pt3/index.html#looking-ahead",
- "href": "posts/llms-and-data-pt3/index.html#looking-ahead",
- "title": "Composable data + AI systems",
- "section": "Looking ahead",
- "text": "Looking ahead\nI am not a software engineer, I just saw a cool application of Ibis and Marvin. The objective of releasing Ibis Birdbrain as an open-source project under Ibis governance is to enable the community to build a standard for composable data and AI systems. My hope is for the Ibis, Marvin, and broader data/AI Python communities to build the standard in the open, and properietary applications on top of those standards. This is important for a composable data ecosystem that can be modular, interoperable, customizable, and extensible."
- },
- {
- "objectID": "posts/llms-and-data-pt3/index.html#how-is-ibis-birdbrain-mice-now-whats-next",
- "href": "posts/llms-and-data-pt3/index.html#how-is-ibis-birdbrain-mice-now-whats-next",
- "title": "Composable data + AI systems",
- "section": "How is Ibis Birdbrain MICE now? What’s next?",
- "text": "How is Ibis Birdbrain MICE now? What’s next?"
- },
- {
- "objectID": "posts/llms-and-data-pt3/index.html#next-steps",
- "href": "posts/llms-and-data-pt3/index.html#next-steps",
- "title": "Composable data + AI systems",
- "section": "Next steps",
- "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series, building Ibis Birdbrain."
+ "text": "Install Ibis Birdbrain",
+ "crumbs": [
+ "Tutorial: CLI"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt4/index.html",
- "href": "posts/llms-and-data-pt4/index.html",
- "title": "Auxiliary tools",
+ "objectID": "tutorials/cli.html#overview",
+ "href": "tutorials/cli.html#overview",
+ "title": "Tutorial: CLI",
"section": "",
- "text": "As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. There are many auxiliary tasks that are required in doing many technical jobs. These include:\n\nsearching the Internet for information\nreading, summarizing, and synthesizing information\nperforming boring computer tasks\ntranslating between different languages (e.g. SQL and Python; English and Spanish)\ncopying and modifying existing code\nquerying some basic informatin from data platforms\n\nWhat if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?"
+ "text": "With Ibis Birdbrain installed, you can use the birdbrain command-line interface (CLI):\n$ birdbrain\n Usage: birdbrain [OPTIONS] COMMAND [ARGS]...\n\n╭─ Options ──────────────────────────────────────────────────────────────────────╮\n│ --version Show version. │\n│ --install-completion Install completion for the current shell. │\n│ --show-completion Show completion for the current shell, to copy │\n│ it or customize the installation. │\n│ --help Show this message and exit. │\n╰────────────────────────────────────────────────────────────────────────────────╯\n╭─ Commands ─────────────────────────────────────────────────────────────────────╮\n│ ipy ipy │\n│ test test │\n╰────────────────────────────────────────────────────────────────────────────────╯",
+ "crumbs": [
+ "Tutorial: CLI"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt4/index.html#introduction",
- "href": "posts/llms-and-data-pt4/index.html#introduction",
- "title": "Auxiliary tools",
+ "objectID": "tutorials/cli.html#starting-an-interactive-python-session",
+ "href": "tutorials/cli.html#starting-an-interactive-python-session",
+ "title": "Tutorial: CLI",
"section": "",
- "text": "As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. There are many auxiliary tasks that are required in doing many technical jobs. These include:\n\nsearching the Internet for information\nreading, summarizing, and synthesizing information\nperforming boring computer tasks\ntranslating between different languages (e.g. SQL and Python; English and Spanish)\ncopying and modifying existing code\nquerying some basic informatin from data platforms\n\nWhat if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?"
- },
- {
- "objectID": "posts/llms-and-data-pt4/index.html#were-using-python-lets-use-python",
- "href": "posts/llms-and-data-pt4/index.html#were-using-python-lets-use-python",
- "title": "Auxiliary tools",
- "section": "We’re using Python, let’s use Python",
- "text": "We’re using Python, let’s use Python\nWe’re already using Python for Ibis and Marvin. Let’s use it for auxillary tools. We’ll setup our data and AI platform connections and some simple example data to work with.\n\n\nCode\n1import ibis\nimport marvin\n\nfrom dotenv import load_dotenv\n\n2load_dotenv()\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = ibis.examples.penguins.fetch()\nt = con.create_table(\"penguins\", t.to_pyarrow(), overwrite=True)\n\n\n\n1\n\nImport the libraries we need.\n\n2\n\nLoad the environment variable to setup Marvin to call our OpenAI account.\n\n3\n\nSetup the demo datain an Ibis backend.\n\n\n\n\n\n1import ibis\nimport marvin\n\nfrom ibis.expr.schema import Schema\nfrom ibis.expr.types.relations import Table\n\n2ibis.options.interactive = True\nmarvin.settings.llm_model = \"openai/gpt-4\"\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = con.table(\"penguins\")\n\n\n1\n\nImport Ibis and Marvin.\n\n2\n\nConfigure Ibis (interactive) and Marvin (GPT-4).\n\n3\n\nConnect to the data and load a table into a variable."
- },
- {
- "objectID": "posts/llms-and-data-pt4/index.html#filesystem-tools",
- "href": "posts/llms-and-data-pt4/index.html#filesystem-tools",
- "title": "Auxiliary tools",
- "section": "Filesystem tools",
- "text": "Filesystem tools"
- },
- {
- "objectID": "posts/llms-and-data-pt4/index.html#internet-tools",
- "href": "posts/llms-and-data-pt4/index.html#internet-tools",
- "title": "Auxiliary tools",
- "section": "Internet tools",
- "text": "Internet tools"
- },
- {
- "objectID": "posts/llms-and-data-pt4/index.html#ai-powered-tools",
- "href": "posts/llms-and-data-pt4/index.html#ai-powered-tools",
- "title": "Auxiliary tools",
- "section": "AI-powered tools",
- "text": "AI-powered tools"
+ "text": "You can use the ipy subcommand to start an interactive Python session with Ibis Birdbrain ready to use:\n$ birdbrain ipy\naccess to: birdbrain\nmodel: azure_openai/gpt-4-32k\nPython 3.11.5 (main, Sep 14 2023, 13:17:51) [Clang 14.0.3 (clang-1403.0.22.14.1)]\nType 'copyright', 'credits' or 'license' for more information\nIPython 8.16.0 -- An enhanced Interactive Python. Type '?' for help.\n\n[ins] In [1]: birdbrain\nOut[1]: <Bot: birdbrain>\n\n[ins] In [2]:",
+ "crumbs": [
+ "Tutorial: CLI"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt4/index.html#introducing-ibis-birdbrain",
- "href": "posts/llms-and-data-pt4/index.html#introducing-ibis-birdbrain",
- "title": "Auxiliary tools",
- "section": "Introducing Ibis Birdbrain",
- "text": "Introducing Ibis Birdbrain\nIntroduce the bot. Need to overview the tools here I think, but should probably skip most details."
+ "objectID": "tutorials/cli.html#next-steps",
+ "href": "tutorials/cli.html#next-steps",
+ "title": "Tutorial: CLI",
+ "section": "",
+ "text": "Learn how to work with Ibis Birdbrain in Python.",
+ "crumbs": [
+ "Tutorial: CLI"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt4/index.html#a-comparison-with-mlops",
- "href": "posts/llms-and-data-pt4/index.html#a-comparison-with-mlops",
- "title": "Auxiliary tools",
- "section": "A comparison with MLOps",
- "text": "A comparison with MLOps\nTODO: point on how most of the work is not ML\nBefore “MLOps” was a standard term, the Sculley et al paper from 2015 described the key issues with building real-world ML systems.\n\n\n\nFigure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surround infrastructure is vast and complex.\n\n\nYou can substitute “LLM” for “ML” in the above figure.\n\nThe “toy problem” problem\nML and LLMs are cool! They’re fun to play with and it’s easy to get distracted with fun applications. Often, ML is learned through solving toy problems, and …\n\n\nThe application landscape is vast\n…and thus requires modular, interoperable, customizable, and extensible tools. TODO: more comparison to MLOps."
+ "objectID": "how-to/input-output/basics.html",
+ "href": "how-to/input-output/basics.html",
+ "title": "Basic input/output",
+ "section": "",
+ "text": "You can…",
+ "crumbs": [
+ "Input Output",
+ "Basic input/output"
+ ]
},
{
- "objectID": "posts/llms-and-data-pt4/index.html#next-steps",
- "href": "posts/llms-and-data-pt4/index.html#next-steps",
- "title": "Auxiliary tools",
- "section": "Next steps",
- "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data."
+ "objectID": "posts.html",
+ "href": "posts.html",
+ "title": "Posts",
+ "section": "",
+ "text": "Order By\n Default\n \n Title\n \n \n Date - Oldest\n \n \n Date - Newest\n \n \n Author\n \n \n \n \n \n \n \n\n\n\n\n\n\n\n\nAuxiliary tools\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 16, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComposable data + AI systems\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 15, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComputations and control flow: it’s just programming\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 14, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nThree approaches\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 13, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nAn introduction to Marvin and Ibis\n\n\n\nLLMs and data\n\n\n\n\n\n\n\nCody Peterson\n\n\nOct 12, 2023\n\n\n\n\n\n\n\n\nNo matching items\n\n Back to top"
},
{
"objectID": "posts/llms-and-data-pt2/index.html",
@@ -598,41 +587,52 @@
"text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series."
},
{
- "objectID": "demo.html",
- "href": "demo.html",
- "title": "demo",
+ "objectID": "posts/llms-and-data-pt0/index.html",
+ "href": "posts/llms-and-data-pt0/index.html",
+ "title": "An introduction to Marvin and Ibis",
"section": "",
- "text": "Back to top"
+ "text": "In this “LLMs and data” series, we’ll explore how to apply large-language models (LLMs) to data analytics. We’ll walk through the steps to build Ibis Birdbrain.\nThroughout the series, we’ll be using Marvin and Ibis. A brief introduction to each is provided below."
},
{
- "objectID": "how-to/input-output/basics.html",
- "href": "how-to/input-output/basics.html",
- "title": "Basic input/output",
+ "objectID": "posts/llms-and-data-pt0/index.html#introduction",
+ "href": "posts/llms-and-data-pt0/index.html#introduction",
+ "title": "An introduction to Marvin and Ibis",
"section": "",
- "text": "You can…",
- "crumbs": [
- "Input Output",
- "Basic input/output"
- ]
+ "text": "In this “LLMs and data” series, we’ll explore how to apply large-language models (LLMs) to data analytics. We’ll walk through the steps to build Ibis Birdbrain.\nThroughout the series, we’ll be using Marvin and Ibis. A brief introduction to each is provided below."
},
{
- "objectID": "concepts/ops.html",
- "href": "concepts/ops.html",
- "title": "LLMOps, MLOps, DevOps",
- "section": "",
- "text": "The ML landscape is plagued by “the toy problem” – building something cool is easy, putting something in production is hard.",
- "crumbs": [
- "LLMOps, MLOps, DevOps"
- ]
+ "objectID": "posts/llms-and-data-pt0/index.html#marvin",
+ "href": "posts/llms-and-data-pt0/index.html#marvin",
+ "title": "An introduction to Marvin and Ibis",
+ "section": "Marvin",
+ "text": "Marvin\nMarvin is an AI engineering framework that makes it easy to build up to an interactive conversational application.\nMarvin makes calls to an AI platform. You typically use an API key set as an environment variable – in this case, we’ll load a .env file that contians secrets for the AI platform that Marvin will use. We also set the large language model model.\n\n1import marvin\n\nfrom rich import print\nfrom time import sleep\nfrom dotenv import load_dotenv\n\n2load_dotenv()\n\n# increase accuracy\n3marvin.settings.llm_model = \"openai/gpt-4\"\n# decrease cost\n# marvin.settings.llm_model = \"openai/gpt-3.5-turbo\"\n\n4test_str = \"working with data and LLMs on 18+ data platforms is easy!\"\ntest_str\n\n\n1\n\nImport the libraries we need.\n\n2\n\nLoad the environment variable to setup Marvin to call our OpenAI account.\n\n3\n\nConfigure the LLM model to use.\n\n4\n\nSome text to test on\n\n\n\n\n'working with data and LLMs on 18+ data platforms is easy!'\n\n\n\nFunctions\nAI functions are one of the building blocks in Marvin and allow yout to specify a typed python function with no code – only a docstring – to achieve a wide variety of tasks.\nWe’ll demonstrate this with an AI function that trnaslates text:\n\n@marvin.ai_fn\ndef translate(text: str, from_: str = \"English\", to: str = \"Spanish\") -> str:\n \"\"\"translates the text\"\"\"\n\ntranslate(test_str)\n\n'trabajar con datos y LLMs en más de 18 plataformas de datos es fácil!'\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\ntranslate(translate(test_str), from_=\"Spanish\", to=\"English\")\n\n'Working with data and LLMs on more than 18 data platforms is easy!'\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\n\nModels\nAI models are another building block for generating python classes from input text. 
It’s a great way to build structured data from unstructured data that can be customized for your needs.\nWe’ll demonstrate this with an AI model that extracts the parts of a sentence:\n\nfrom pydantic import BaseModel, Field\n\n# decrease cost\nmarvin.settings.llm_model = \"openai/gpt-3.5-turbo\"\n\n@marvin.ai_model\nclass ExtractParts(BaseModel):\n \"\"\"Extracts parts of a sentence\"\"\"\n subject: str = Field(..., description=\"The subject of the sentence.\")\n objects: list[str] = Field(..., description=\"The objects of the sentence.\")\n predicate: str = Field(..., description=\"The predicate of the sentence.\")\n modifiers: list[str] = Field(..., description=\"The modifiers of the sentence.\")\n\nExtractParts(test_str)\n\nExtractParts(subject='working', objects=['data', 'LLMs'], predicate='is', modifiers=['on 18+ data platforms', 'easy'])\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\n\nClassifiers\nAI classifiers are another building block for classifying input text. It’s the most efficient (time and cost) method for applying LLMs as it only results in a single output token, selecting an output in a specified Enum.\nWe’ll demonstrate this by classifying the language of some text:\n\nfrom enum import Enum\n\n# increase accuracy\nmarvin.settings.llm_model = \"openai/gpt-4\"\n\n@marvin.ai_classifier\nclass IdentifyLanguage(Enum):\n \"\"\"Identifies the language of the text\"\"\"\n\n english = \"English\"\n spanish = \"Spanish\"\n\n\nIdentifyLanguage(test_str).value\n\n'English'\n\n\n\n\nCode\n1sleep(1)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\nIdentifyLanguage(translate(test_str)).value\n\n'Spanish'\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting."
},
{
- "objectID": "concepts/platforms.html",
- "href": "concepts/platforms.html",
- "title": "Data and AI platforms",
+ "objectID": "posts/llms-and-data-pt0/index.html#ibis",
+ "href": "posts/llms-and-data-pt0/index.html#ibis",
+ "title": "An introduction to Marvin and Ibis",
+ "section": "Ibis",
+ "text": "Ibis\nIbis is the portable Python dataframe library that enables Ibis Birdbrain to work on many data platforms at native scale.\nIbis makes calls to a data platform, providing an API but pushing the compute to (local or remote) query engines and storage. DuckDB is the default and we’ll typically use it for demo purposes. You can work with an in-memory instance, but we’ll often create a database file from example data:\n\n1import ibis\n\n2con = ibis.connect(\"duckdb://penguins.ddb\")\nt = ibis.examples.penguins.fetch()\nt = con.create_table(\"penguins\", t.to_pyarrow(), overwrite=True)\n\n\n1\n\nImport the libraries we need.\n\n2\n\nSet up the demo data in an Ibis backend.\n\n\n\n\nYou will typically connect to an existing data platform via your corresponding Ibis backend and have access to a number of tables:\n\n1import ibis\n\n2ibis.options.interactive = True\n\n3con = ibis.connect(\"duckdb://penguins.ddb\")\nt = con.table(\"penguins\")\n\n\n1\n\nImport Ibis.\n\n2\n\nConfigure Ibis (interactive).\n\n3\n\nConnect to the data and load a table into a variable.\n\n\n\n\n\nBackend\nA backend provides the connection and basic management of the data platform. 
Above, we created the con variable that is an instance of a DuckDB backend:\n\ncon\n\n<ibis.backends.duckdb.Backend at 0x16a17af10>\n\n\nIt usually contains some tables:\n\ncon.list_tables()\n\n['penguins']\n\n\nWe can access some internals of Ibis to see what backends are available:\n\n\n\n\n\n\nTip\n\n\n\nDon’t rely on accessing internals of Ibis in production.\n\n\n\nbackends = [entrypoint.name for entrypoint in ibis.util.backend_entry_points()]\nbackends\n\n['bigquery',\n 'clickhouse',\n 'dask',\n 'datafusion',\n 'druid',\n 'duckdb',\n 'flink',\n 'impala',\n 'mssql',\n 'mysql',\n 'oracle',\n 'pandas',\n 'polars',\n 'postgres',\n 'pyspark',\n 'snowflake',\n 'sqlite',\n 'trino']\n\n\n\n\nTable\nYou typically work with a table, conventionally named t for demo or exploratory purposes:\n\nt\n\n┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │\n│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │\n│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │\n│ Adelie │ Torgersen │ nan │ nan │ NULL │ NULL │ NULL │ 2007 │\n│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │\n│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │\n│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │\n│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │\n│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │\n│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │\n│ … │ … │ … │ … │ … │ … │ … │ … 
│\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n\n\n\nWhen working with many tables, you should name them descriptively.\n\n\nSchema\nA table has a schema that Ibis maps to the data platform’s data types:\n\nt.schema()\n\nibis.Schema {\n species string\n island string\n bill_length_mm float64\n bill_depth_mm float64\n flipper_length_mm int64\n body_mass_g int64\n sex string\n year int64\n}"
+ },
+ {
+ "objectID": "posts/llms-and-data-pt0/index.html#llms-and-data-marvin-and-ibis",
+ "href": "posts/llms-and-data-pt0/index.html#llms-and-data-marvin-and-ibis",
+ "title": "An introduction to Marvin and Ibis",
+ "section": "LLMs and data: Marvin and Ibis",
+ "text": "LLMs and data: Marvin and Ibis\nYou can use Marvin and Ibis together to easily apply LLMs to data.\n\nfrom ibis.expr.schema import Schema\nfrom ibis.expr.types.relations import Table\n\n@marvin.ai_fn\ndef sql_select(\n text: str, table_name: str = t.get_name(), schema: Schema = t.schema()\n) -> str:\n \"\"\"writes the SQL SELECT statement to query the table according to the text\"\"\"\n\n\nquery = \"the unique combination of species and islands\"\nsql = sql_select(query).strip(\";\")\nsql\n\n'SELECT DISTINCT species, island FROM penguins'\n\n\n\nt.sql(sql)\n\n┏━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ species ┃ island ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ string │ string │\n├───────────┼───────────┤\n│ Adelie │ Torgersen │\n│ Adelie │ Biscoe │\n│ Adelie │ Dream │\n│ Gentoo │ Biscoe │\n│ Chinstrap │ Dream │\n└───────────┴───────────┘\n\n\n\n\n\nCode\n1sleep(3)\n\n\n\n1\n\nAvoid rate-limiting by waiting.\n\n\n\n\n\nt.sql(sql_select(query + \" and include their counts in from highest to lowest\").strip(\";\"))\n\n┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓\n┃ species ┃ island ┃ count ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩\n│ string │ string │ int64 │\n├───────────┼───────────┼───────┤\n│ Gentoo │ Biscoe │ 124 │\n│ Chinstrap │ Dream │ 68 │\n│ Adelie │ Dream │ 56 │\n│ Adelie │ Torgersen │ 52 │\n│ Adelie │ Biscoe │ 44 │\n└───────────┴───────────┴───────┘"
+ },
+ {
+ "objectID": "posts/llms-and-data-pt0/index.html#next-steps",
+ "href": "posts/llms-and-data-pt0/index.html#next-steps",
+ "title": "An introduction to Marvin and Ibis",
+ "section": "Next steps",
+ "text": "Next steps\nYou can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.\nRead the next post in this series."
+ },
+ {
+ "objectID": "release_notes.html",
+ "href": "release_notes.html",
+ "title": "Release notes",
"section": "",
- "text": "Data and AI platforms are…",
- "crumbs": [
- "Data and AI platforms"
- ]
+ "text": "Release notes\nTBD\n\n\n\n\n Back to top"
}
]
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index 6b3e1a6..55403b6 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -1,79 +1,79 @@
- https://ibis-project.github.io/concepts/user-interfaces.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/install.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/concepts/llms.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/posts/llms-and-data-pt4/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/why.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/posts/llms-and-data-pt1/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/release_notes.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/posts/llms-and-data-pt3/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/posts.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/demo.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/posts/llms-and-data-pt0/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/contribute/contributing.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/posts/llms-and-data-pt1/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/tutorials/python.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/install.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/concepts/user-interfaces.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/contribute/contributing.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/concepts/platforms.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/tutorials/cli.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/why.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/tutorials/python.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/concepts/ops.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/concepts/llms.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/posts/llms-and-data-pt3/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/posts/llms-and-data-pt4/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/tutorials/cli.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/posts/llms-and-data-pt2/index.html
- 2024-02-13T02:03:21.828Z
+ https://ibis-project.github.io/how-to/input-output/basics.html
+ 2024-02-13T02:09:26.039Z
- https://ibis-project.github.io/demo.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/posts.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/how-to/input-output/basics.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/posts/llms-and-data-pt2/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/concepts/ops.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/posts/llms-and-data-pt0/index.html
+ 2024-02-13T02:09:26.055Z
- https://ibis-project.github.io/concepts/platforms.html
- 2024-02-13T02:03:21.812Z
+ https://ibis-project.github.io/release_notes.html
+ 2024-02-13T02:09:26.055Z