Learn how to integrate Azure OpenAI API with Azure SQL DB to create, store, and query embeddings for advanced similarity searches and LLM generation augmentation.
This Python notebook will teach you to:
- Create Embeddings: Generate embeddings from content using the Azure OpenAI API.
- Vector Database Utilization: Use Azure SQL DB to store embeddings and perform similarity searches.
- LLM Generation Augmentation: Enhance language model generation with embeddings from a vector database. In this case we use the embeddings to inform a GPT-4 chat model, enabling it to provide rich, context-aware answers about products based on past customer reviews.
We use the Fine Foods Review Dataset from Kaggle, which contains Amazon reviews of fine foods.
- For simplicity, this tutorial uses a smaller sample Fine Foods Review Dataset to demonstrate embedding generation.
- Alternatively, if you to wish bypass embedding generation and jump straight to similarity search in SQLDB. you can download the pre-generated FineFoodEmbeddings.csv
- Azure Subscription: Create one for free
- Azure SQL Database: Set up your database for free
- Azure Data Studio: Download here to manage your Azure SQL database and execute the notebook
- Azure OpenAI Access: Apply for access in the desired Azure subscription at https://aka.ms/oai/access
- Azure OpenAI Resource: Deploy an embeddings model (e.g.,
text-embedding-small
ortext-embedding-ada-002
) and aGPT-4
model for chat completion. Refer to the resource deployment guide - Python: Version 3.7.1 or later from Python.org.
- Python Libraries: Install the required libraries openai, num2words, matplotlib, plotly, scipy, scikit-learn, pandas, tiktoken, and pyodbc.
- Jupyter Notebooks: Use within Azure Data Studio or Visual Studio Code .
Code snippets are adapted from the Azure OpenAI Service embeddings Tutorial
- Database Setup: Execute SQL commands from the
createtable.sql
script to create the necessary table in your database. - Model Deployment: Deploy an embeddings model (
text-embedding-small
ortext-embedding-ada-002
) and aGPT-4
model for chat completion. Note the 2 model deployment names for later use. - Connection String: Find your Azure SQL DB connection string in the Azure portal under your database settings.
- Configuration: Populate the
.env
file with your SQL server connection details , Azure OpenAI key, and endpoint values.
You can retrieve the Azure OpenAI endpoint and key:
To execute the notebook, connect to your Azure SQL database using Azure Data Studio, which can be downloaded here.
Then open the notebook RetrievalAugmentedGeneration.ipynb