docs: Update nomic AI embeddings integration docs #25308

Merged
merged 4 commits into from
Aug 14, 2024
235 changes: 184 additions & 51 deletions docs/docs/integrations/text_embedding/nomic.ipynb
@@ -12,121 +12,254 @@
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"id": "9a3d6f34",
"metadata": {},
"source": [
"# NomicEmbeddings\n",
"\n",
"This notebook covers how to get started with Nomic embedding models.\n",
"This will help you get started with Nomic embedding models using LangChain. For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n",
"\n",
"## Installation"
"## Overview\n",
"### Integration details\n",
"\n",
"import { ItemTable } from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"text_embedding\" item=\"Nomic\" />\n",
"\n",
"## Setup\n",
"\n",
"To access Nomic embedding models you'll need to create a/an Nomic account, get an API key, and install the `langchain-nomic` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [https://atlas.nomic.ai/](https://atlas.nomic.ai/) to sign up to Nomic and generate an API key. Once you've done this set the `NOMIC_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"execution_count": 2,
"id": "36521c2a",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"!pip install -U langchain-nomic"
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"NOMIC_API_KEY\"):\n",
" os.environ[\"NOMIC_API_KEY\"] = getpass.getpass(\"Enter your Nomic API key: \")"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"id": "c84fb993",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"Make sure to set the following environment variables:\n",
"\n",
"- `NOMIC_API_KEY`\n",
"\n",
"## Usage"
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"execution_count": 3,
"id": "39a4953b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_nomic.embeddings import NomicEmbeddings\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "d9664366",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\")"
"The LangChain Nomic integration lives in the `langchain-nomic` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12fcfb4b",
"execution_count": 2,
"id": "64853226",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"embeddings.embed_query(\"My query to look up\")"
"%pip install -qU langchain-nomic"
]
},
{
"cell_type": "markdown",
"id": "45dd1724",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f2e6104",
"execution_count": 10,
"id": "9ea7a09b",
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
"from langchain_nomic import NomicEmbeddings\n",
"\n",
"embeddings = NomicEmbeddings(\n",
" model=\"nomic-embed-text-v1.5\",\n",
" # dimensionality=256,\n",
" # Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka)\n",
" # to enable variable-length embeddings with a single model.\n",
" # This means that you can specify the dimensionality of the embeddings at inference time.\n",
" # The model supports dimensionality from 64 to 768.\n",
" # inference_mode=\"remote\",\n",
" # One of `remote`, `local` (Embed4All), or `dynamic` (automatic). Defaults to `remote`.\n",
" # api_key=... , # if using remote inference,\n",
" # device=\"cpu\",\n",
" # The device to use for local embeddings. Choices include\n",
" # `cpu`, `gpu`, `nvidia`, `amd`, or a specific device name. See\n",
" # the docstring for `GPT4All.__init__` for more info. Typically\n",
" # defaults to CPU. Do not use on macOS.\n",
")"
]
},
{
"cell_type": "markdown",
"id": "77d271b6",
"metadata": {},
"source": [
"## Indexing and Retrieval\n",
"\n",
"Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
"\n",
"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46739f68",
"execution_count": 5,
"id": "d817716b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'LangChain is the framework for building context-aware reasoning applications'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a vector store with a sample text\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Use the vectorstore as a retriever\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Retrieve the most similar text\n",
"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
"\n",
"# show the retrieved document's content\n",
"retrieved_documents[0].page_content"
]
},
{
"cell_type": "markdown",
"id": "e02b9855",
"metadata": {},
"outputs": [],
"source": [
"# async embed query\n",
"await embeddings.aembed_query(\"My query to look up\")"
"## Direct Usage\n",
"\n",
"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
"\n",
"You can directly call these methods to get embeddings for your own use cases.\n",
"\n",
"### Embed single texts\n",
"\n",
"You can embed single texts or documents with `embed_query`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e48632ea",
"execution_count": 6,
"id": "0d2befcd",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.024642944, 0.029083252, -0.14013672, -0.09082031, 0.058898926, -0.07489014, -0.0138168335, 0.0037\n"
]
}
],
"source": [
"# async embed documents\n",
"await embeddings.aembed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
")"
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "7a331dc3",
"id": "1b5a7d03",
"metadata": {},
"source": [
"### Custom Dimensionality\n",
"### Embed multiple texts\n",
"\n",
"Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka) to enable variable-length embeddings with a single model. This means that you can specify the dimensionality of the embeddings at inference time. The model supports dimensionality from 64 to 768."
"You can embed multiple texts with `embed_documents`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "993f65c8",
"execution_count": 7,
"id": "2f4d6e97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.012771606, 0.023727417, -0.12365723, -0.083740234, 0.06530762, -0.07110596, -0.021896362, -0.0068\n",
"[-0.019058228, 0.04058838, -0.15222168, -0.06842041, -0.012130737, -0.07128906, -0.04534912, 0.00522\n"
]
}
],
"source": [
"text2 = (\n",
" \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
")\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
" print(str(vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "98785c12",
"metadata": {},
"outputs": [],
"source": [
"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\", dimensionality=256)\n",
"## API Reference\n",
"\n",
"embeddings.embed_query(\"My query to look up\")"
"For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n"
]
}
],
@@ -146,7 +279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.9.6"
}
},
"nbformat": 4,
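A note on the notebook's "Direct Usage" section: `embed_query` returns a list of floats and `embed_documents` returns a list of such lists, so the vectors can be compared directly. The sketch below is a minimal illustration of ranking documents against a query by cosine similarity. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical (they are not part of `langchain-nomic`), and the short vectors are stand-ins so the example runs without an API key.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float],
    document_vectors: Sequence[Sequence[float]],
) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vector, doc) for doc in document_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Stand-in vectors (assumption): in the notebook these would come from
# embeddings.embed_query(...) and embeddings.embed_documents([...]).
query_vector = [1.0, 0.0, 0.0]
document_vectors = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
]

print(rank_by_similarity(query_vector, document_vectors))  # [1, 0, 2]
```

With real embeddings, the same helpers could be applied to `single_vector` and `two_vectors` from the notebook cells; a vector store such as `InMemoryVectorStore` handles this kind of comparison for you, so the manual version is mainly useful for inspection and debugging.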
6 changes: 6 additions & 0 deletions docs/src/theme/FeatureTables.js
@@ -340,6 +340,12 @@ const FEATURE_TABLES = {
package: "langchain-cohere",
apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_cohere.embeddings.CohereEmbeddings.html#langchain_cohere.embeddings.CohereEmbeddings"
},
{
name: "Nomic",
link: "nomic",
package: "langchain-nomic",
apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html#langchain_nomic.embeddings.NomicEmbeddings"
},
]
},
document_retrievers: {