pip install -r requirements.txt
export ES_CONNECTION_STRING=http://localhost:9200
export INDEX_NAME=${your_index_name}
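In most shells, ${your_index_name} expands to an empty string when that variable is unset, so INDEX_NAME can end up empty without any error. The following is a minimal Python sketch of a sanity check for the two variables. It is not part of the microservice, and load_es_settings is a hypothetical helper name.

```python
import os
from urllib.parse import urlparse


def load_es_settings(env=None):
    """Return (connection_string, index_name) or raise ValueError.

    Hypothetical helper: it only checks the two variables exported above.
    """
    env = os.environ if env is None else env
    conn = env.get("ES_CONNECTION_STRING", "")
    index = env.get("INDEX_NAME", "")

    # The connection string is expected to look like http://host:port.
    parsed = urlparse(conn)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"ES_CONNECTION_STRING is not an http(s) URL: {conn!r}")

    # Catch an empty value, or a placeholder that was copied literally.
    if not index or index.startswith("${"):
        raise ValueError(f"INDEX_NAME is empty or still a placeholder: {index!r}")

    return conn, index
```

Calling load_es_settings() with no argument reads the current process environment.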
Please refer to this readme.
Start the document preparation microservice for Elasticsearch with the command below.
python prepare_doc_elastic.py
Please refer to this readme.
export ES_CONNECTION_STRING=http://localhost:9200
export INDEX_NAME=${your_index_name}
cd GenAIComps
docker build -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
docker run --name="dataprep-elasticsearch" -p 6011:6011 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e ES_CONNECTION_STRING=$ES_CONNECTION_STRING -e INDEX_NAME=$INDEX_NAME -e TEI_ENDPOINT=$TEI_ENDPOINT -e DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_ELASTICSEARCH" opea/dataprep:latest
cd comps/dataprep/deployment/docker_compose/
docker compose -f compose_elasticsearch.yaml up -d
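Because the containers are started in detached mode, the command returns before the service is necessarily ready to accept requests. The following is a minimal Python sketch that waits for the port to open. It assumes the compose file publishes port 6011 as the docker run command does, and wait_for_port is a hypothetical helper name. An open port only shows that something is listening, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host="localhost", port=6011, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    Retries every `interval` seconds until `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```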
Once the document preparation microservice for Elasticsearch has started, use the command below to invoke it. The microservice converts the document to embeddings and saves them to the database.
curl -X POST \
-H "Content-Type: application/json" \
-d '{"path":"/path/to/document"}' \
http://localhost:6011/v1/dataprep/ingest
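The same ingest call can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the service is reachable at localhost:6011 as in the curl command. build_ingest_request and ingest are hypothetical helper names, and the response body is returned as raw text because its format is not specified here.

```python
import json
import urllib.request

# Assumed base URL, taken from the curl command above.
DATAPREP_URL = "http://localhost:6011/v1/dataprep"


def build_ingest_request(doc_path, base_url=DATAPREP_URL):
    """Build the POST request that mirrors the curl ingest command."""
    body = json.dumps({"path": doc_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(doc_path, base_url=DATAPREP_URL, timeout=300):
    """Send the request and return the raw response text."""
    req = build_ingest_request(doc_path, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service running, ingest("/path/to/document") sends the same payload as the curl command.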
To get the structure of the uploaded files, use the following command:
curl -X POST \
-H "Content-Type: application/json" \
http://localhost:6011/v1/dataprep/get
The response is a JSON array like this:
[
  {
    "name": "uploaded_file_1.txt",
    "id": "uploaded_file_1.txt",
    "type": "File",
    "parent": ""
  },
  {
    "name": "uploaded_file_2.txt",
    "id": "uploaded_file_2.txt",
    "type": "File",
    "parent": ""
  }
]
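A response of this shape is straightforward to consume from a script. The following is a minimal Python sketch that extracts the id of each entry, using the sample response above as input. list_ids is a hypothetical helper name, and the sketch assumes every entry carries an id field as in the sample.

```python
import json

# Sample response copied from the example above.
SAMPLE_RESPONSE = """
[
  {"name": "uploaded_file_1.txt", "id": "uploaded_file_1.txt", "type": "File", "parent": ""},
  {"name": "uploaded_file_2.txt", "id": "uploaded_file_2.txt", "type": "File", "parent": ""}
]
"""


def list_ids(response_text):
    """Parse the JSON array and return the id of every entry, in order."""
    entries = json.loads(response_text)
    return [entry["id"] for entry in entries]


print(list_ids(SAMPLE_RESPONSE))
# prints ['uploaded_file_1.txt', 'uploaded_file_2.txt']
```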
To delete an uploaded file or link, use the following commands. The file_path here should be the id returned by the /v1/dataprep/get API.
# delete link
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "https://www.ces.tech/.txt"}' \
http://localhost:6011/v1/dataprep/delete
# delete file
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "uploaded_file_1.txt"}' \
http://localhost:6011/v1/dataprep/delete
# delete all files and links
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "all"}' \
http://localhost:6011/v1/dataprep/delete
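The three delete calls differ only in the value of file_path. The following is a minimal Python sketch of the same request using only the standard library, assuming the service is reachable at localhost:6011 as in the curl commands. build_delete_request and delete are hypothetical helper names, and the response body is returned as raw text because its format is not specified here.

```python
import json
import urllib.request

# Assumed endpoint, taken from the curl commands above.
DELETE_URL = "http://localhost:6011/v1/dataprep/delete"


def build_delete_request(file_path, url=DELETE_URL):
    """Build the POST request for one id, one link, or the literal "all"."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete(file_path, url=DELETE_URL, timeout=60):
    """Send the delete request and return the raw response text."""
    req = build_delete_request(file_path, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service running, delete("uploaded_file_1.txt") removes a single file and delete("all") removes every file and link.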