Add Vector Based Text2SQL Code and Approach (#14)
* Add deployment for text2sql index * Add entities for new version * Work in progress * Add deployment for text2sql index * Add entities for new version * Work in progress * refactor the location * Update envs * Update the envs * Update envs * Finish indexer building * Update the location of the dictionary * Update data dictionary * Update the plugin to load jsonl * Update content * Move to use a skillset for indexing * Update the scripts * Improve the readme * Update the naming * Fix bad replacement * Update the readmes * Update readme * Update the readme * Run the vector example * Update the env * Update the code * Update readme * Update main readme
1 parent 45f2023, commit f973c3e
Showing 31 changed files with 2,395 additions and 987 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
FunctionApp__ClientId=<clientId of the function app if using user assigned managed identity> | ||
IdentityType=<identityType> # system_assigned or user_assigned or key | ||
OpenAI__ApiKey=<openAIKey if using non managed identity> | ||
OpenAI__Endpoint=<openAIEndpoint> | ||
OpenAI__MultiModalDeployment=<openAIEmbeddingDeploymentId> | ||
OpenAI__ApiVersion=<openAIApiVersion> | ||
AIService__DocumentIntelligence__Endpoint=<documentIntelligenceEndpoint> | ||
AIService__DocumentIntelligence__Key=<documentIntelligenceKey if not using identity> | ||
AIService__Language__Endpoint=<languageEndpoint> | ||
AIService__Language__Key=<languageKey if not using identity> | ||
StorageAccount__Endpoint=<Endpoint if using identity based connections> | ||
StorageAccount__ConnectionString=<connectionString if using non managed identity> |
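
The placeholder notes in the template above suggest that the set of required values depends on `IdentityType`. The following is a minimal sketch of that check, not code from this commit: `parse_env`, `missing_settings`, and the exact grouping of required keys are assumptions inferred from the placeholder text.

```python
from typing import Dict, List

# Assumption: these keys are only needed for key-based authentication,
# inferred from the "if using non managed identity" placeholder notes.
KEY_ONLY = [
    "OpenAI__ApiKey",
    "AIService__DocumentIntelligence__Key",
    "AIService__Language__Key",
    "StorageAccount__ConnectionString",
]

# Assumption: endpoints are needed regardless of the identity type.
ALWAYS = [
    "OpenAI__Endpoint",
    "AIService__DocumentIntelligence__Endpoint",
    "AIService__Language__Endpoint",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments and trailing ' # ...' notes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_settings(env: Dict[str, str]) -> List[str]:
    """Return the names of settings that are required but empty or absent."""
    identity = env.get("IdentityType", "")
    if identity not in {"system_assigned", "user_assigned", "key"}:
        raise ValueError(f"unsupported IdentityType: {identity!r}")
    required = list(ALWAYS)
    if identity == "key":
        required += KEY_ONLY
    else:
        required.append("StorageAccount__Endpoint")
    if identity == "user_assigned":
        required.append("FunctionApp__ClientId")
    return [name for name in required if not env.get(name)]
```

Calling `missing_settings(parse_env(open(".env").read()))` before deployment would list the values still to be filled in for the chosen identity type.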
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
FunctionApp__Endpoint=<functionAppEndpoint> | ||
FunctionApp__Key=<functionAppKey> | ||
FunctionApp__PreEmbeddingCleaner__FunctionName=pre_embedding_cleaner | ||
FunctionApp__ADI__FunctionName=adi_2_ai_search | ||
FunctionApp__KeyPhraseExtractor__FunctionName=key_phrase_extractor | ||
FunctionApp__AppRegistrationResourceId=<App registration in form api://appRegistrationclientId if using identity based connections> | ||
IdentityType=<identityType> # system_assigned or user_assigned or key | ||
AIService__AzureSearchOptions__Endpoint=<searchServiceEndpoint> | ||
AIService__AzureSearchOptions__Identity__ClientId=<clientId if using user assigned identity> | ||
AIService__AzureSearchOptions__Key=<searchServiceKey if not using identity> | ||
AIService__AzureSearchOptions__UsePrivateEndpoint=<true/false> | ||
AIService__AzureSearchOptions__Identity__FQName=<fully qualified name of the identity if using user assigned identity> | ||
StorageAccount__FQEndpoint=<Fully qualified endpoint in form ResourceId=resourceId if using identity based connections> | ||
StorageAccount__ConnectionString=<connectionString if using non managed identity> | ||
StorageAccount__RagDocuments__Container=<containerName> | ||
StorageAccount__Text2Sql__Container=<containerName> | ||
OpenAI__ApiKey=<openAIKey if using non managed identity> | ||
OpenAI__Endpoint=<openAIEndpoint> | ||
OpenAI__EmbeddingModel=<openAIEmbeddingModelName> | ||
OpenAI__EmbeddingDeployment=<openAIEmbeddingDeploymentId> | ||
OpenAI__EmbeddingDimensions=1536 | ||
Text2Sql__DatabaseName=<databaseName> |
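
Several values in this template are not plain strings: `AIService__AzureSearchOptions__UsePrivateEndpoint` is a `true/false` flag and `OpenAI__EmbeddingDimensions` is a number. As a hedged illustration of reading them into typed settings, the sketch below uses a hypothetical `SearchSettings` dataclass and `load_search_settings` helper; neither name comes from the repository, and the defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def as_bool(value: str) -> bool:
    """Interpret common true/false spellings; reject anything else."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no", ""}:
        return False
    raise ValueError(f"expected true/false, got {value!r}")


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical typed view over a few of the variables in the template."""

    endpoint: str
    use_private_endpoint: bool
    embedding_dimensions: int
    database_name: str


def load_search_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    # The endpoint is treated as mandatory; the other defaults are assumptions.
    return SearchSettings(
        endpoint=env["AIService__AzureSearchOptions__Endpoint"],
        use_private_endpoint=as_bool(
            env.get("AIService__AzureSearchOptions__UsePrivateEndpoint", "false")
        ),
        embedding_dimensions=int(env.get("OpenAI__EmbeddingDimensions", "1536")),
        database_name=env.get("Text2Sql__DatabaseName", ""),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests or CI without touching the real environment.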
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,28 @@ | ||
# AI Search Indexing with Azure Document Intelligence - Pre-built Index Setup | ||
# AI Search Indexing Pre-built Index Setup | ||
|
||
This portion of the repository contains pre-built scripts to deploy the skillset with Azure Document Intelligence. | ||
|
||
## Steps | ||
## Steps for Rag Documents Index Deployment | ||
|
||
1. Update the `.env` file with the associated values. Not all values are required; which ones you need depends on whether you use System / User Assigned Identities or key-based authentication. | ||
2. Adjust `rag_documents.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here in the skills needed to enrich the data source. | ||
3. Run `deploy.py` with the following args: | ||
|
||
- `indexer_type rag`. This selects the `rag_documents` sub class. | ||
- `indexer_type rag`. This selects the `RagDocumentsAISearch` sub class. | ||
- `enable_page_chunking True`. This determines whether page-wise chunking is applied in ADI, or whether the inbuilt TextSplit skill is used instead. **Page-wise analysis in ADI is recommended, as it avoids splitting tables / figures across multiple chunks when chunking is performed.** | ||
- `rebuild`. Whether to delete and rebuild the index. | ||
- `suffix`. Optional parameter that applies a suffix to the deployed index and indexer. This is useful if you want to deploy a test version before overwriting the main version. | ||
|
||
## Steps for Text2SQL Index Deployment | ||
|
||
1. Update the `.env` file with the associated values. Not all values are required; which ones you need depends on whether you use System / User Assigned Identities or key-based authentication. | ||
2. Adjust `text_2_sql.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here in the skills needed to enrich the data source. | ||
3. Run `deploy.py` with the following args: | ||
|
||
- `indexer_type text_2_sql`. This selects the `Text2SQLAISearch` sub class. | ||
- `rebuild`. Whether to delete and rebuild the index. | ||
- `suffix`. Optional parameter that applies a suffix to the deployed index and indexer. This is useful if you want to deploy a test version before overwriting the main version. | ||
|
||
## ai_search.py & environment.py | ||
|
||
These files contain a variety of helpers and scripts for deploying the index setup. They are useful for CI/CD, because they avoid having to write JSON files manually or use the UI to deploy the pipeline. |
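
The README steps above describe `deploy.py` arguments that select a sub class and optionally suffix the deployed names. The sketch below shows one way such a command line could be wired up. It is not the repository's `deploy.py`: the `--` flag spelling, the `AISearchStandIn` base class, and the `<base>-index-<suffix>` naming scheme are all assumptions, and the two sub classes only borrow their names from the README.

```python
import argparse
from typing import Dict, List, Optional, Type


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False' style command-line text into a bool."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


class AISearchStandIn:
    """Hypothetical stand-in for the shared base class; not the repository's code."""

    base_name = "base"

    def __init__(self, suffix: Optional[str] = None, rebuild: bool = False) -> None:
        self.suffix = suffix
        self.rebuild = rebuild

    @property
    def index_name(self) -> str:
        # Assumed naming scheme: "<base>-index" plus "-<suffix>" when given.
        parts = [self.base_name, "index"]
        if self.suffix:
            parts.append(self.suffix)
        return "-".join(parts)


class RagDocumentsAISearch(AISearchStandIn):
    base_name = "rag-documents"


class Text2SQLAISearch(AISearchStandIn):
    base_name = "text-2-sql"


# Maps the documented indexer_type values to the sub classes named in the README.
INDEXER_TYPES: Dict[str, Type[AISearchStandIn]] = {
    "rag": RagDocumentsAISearch,
    "text_2_sql": Text2SQLAISearch,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a deploy.py-style CLI")
    parser.add_argument("--indexer_type", required=True, choices=sorted(INDEXER_TYPES))
    # Parsed for completeness; the stand-in classes here do not use it.
    parser.add_argument("--enable_page_chunking", type=str_to_bool, default=False)
    parser.add_argument("--rebuild", type=str_to_bool, default=False)
    parser.add_argument("--suffix", default=None)
    return parser


def select_indexer(argv: List[str]) -> AISearchStandIn:
    """Parse the arguments and instantiate the matching stand-in sub class."""
    args = build_parser().parse_args(argv)
    return INDEXER_TYPES[args.indexer_type](suffix=args.suffix, rebuild=args.rebuild)
```

Under these assumptions, `select_indexer(["--indexer_type", "text_2_sql", "--suffix", "test"])` returns a `Text2SQLAISearch` instance whose names carry the `test` suffix, which mirrors the README's advice to deploy a test version before overwriting the main one.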