# NP-Backend-Dockers
NP-Backend-Dockers powers the backend infrastructure of an application designed for efficient indexing, analysis, and retrieval of textual data, built on a Solr engine. It also integrates additional services, such as topic-based inference and classification, in a multi-container setup.
This multi-container application is orchestrated with a `docker-compose` script that connects all services through the `np-net` network.
1. Clone the repository:

   ```bash
   git clone https://github.com/nextprocurement/NP-Backend-Dockers.git
   ```

2. Initialize the submodules:

   ```bash
   git submodule init
   ```

3. Update the submodules' contents:

   ```bash
   git submodule update
   ```

4. Create a `data` folder and copy the model information into it. An example of what the `data` folder should contain is available here.

5. Create a network of your own and replace `ml4ds2_net` in the `docker-compose.yml` with the name of your new network:

   ```yaml
   networks:
     np-net:
       name: ml4ds2_net
       external: true
   ```

6. Start the services:

   ```bash
   docker-compose up -d
   ```

7. Check that all the services are running:

   ```bash
   docker ps
   ```

8. Check that the `NP-solr-dist-plugin` plugin has been mounted properly in Solr. To do this, go to Solr (it should be available at http://your_server_name:8984/solr/#/) and create a `test` collection using the `np_config` config set. If everything works, delete the test collection.
If you encounter any problems, write an email to [email protected].
This RESTful API serves as an entry point for indexing and performing a series of queries to retrieve information from the Solr search engine. It essentially acts as a Python wrapper that encapsulates Solr's fundamental functionalities within a Flask-based framework.
It depends on the `np_solr` and `np-tools` services and requires access to the following mounted volumes:

- `./data/source`
- `./np_config`
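As a rough illustration of the wrapper idea (the collection name, field name, and helper below are hypothetical, not the service's actual API), answering a query boils down to translating the incoming request into a Solr `/select` call:

```python
# Minimal sketch of a Python wrapper around Solr's /select handler.
# Collection and field names are made up for illustration.

def build_solr_select(base_url: str, collection: str, query: str,
                      rows: int = 10) -> tuple[str, dict]:
    """Return the Solr /select endpoint and its query parameters."""
    endpoint = f"{base_url}/solr/{collection}/select"
    params = {"q": query, "rows": rows, "wt": "json"}
    return endpoint, params

endpoint, params = build_solr_select(
    "http://localhost:8984", "test", "title:procurement")
print(endpoint)  # http://localhost:8984/solr/test/select
```

A Flask route would build the request this way and forward it to Solr, returning the JSON response to the caller.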
This service deploys an instance of the Solr search engine using the official Solr image from Docker Hub, relying on the zoo service. It mounts several volumes, including:

- The Solr data directory (`./db/data/solr:/var/solr`) for data persistence.
- The custom Solr plugin `NP-solr-dist-plugin`, which enables efficient distance calculations within Solr.
- The Solr configuration directory (`./solr_config:/opt/solr/server/solr`), which provides the Solr schemas specific to the NextProcurement project data.
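To give an intuition for what a distance plugin computes (a sketch only: the actual plugin is Java code running inside Solr, and its exact metric may differ), a typical vector comparison such as cosine similarity looks like this:

```python
import math

# Illustrative vector comparison of the kind a Solr distance plugin
# performs; the metric used by NP-solr-dist-plugin may differ.
def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
```

Running such a computation inside Solr, rather than in the API layer, avoids shipping every document vector over the network.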
This service is temporary and serves the sole purpose of initializing the mounted volume `/db/data` with the permissions required by Solr.
This service runs Zookeeper, which is essential for Solr to coordinate cluster nodes. It employs the official zookeeper image and mounts two volumes for data and logs.
This service handles the Solr configuration. It is built from the Dockerfile located in the `solr-config` directory. It depends on the Solr and zoo services and mounts the Docker socket and the `bash_scripts` directory, which contains a script that initializes the Solr configuration for the NextProcurement project.
This service deploys a RESTful API with a series of auxiliary endpoints. It currently offers endpoints to:

- Retrieve embeddings for a given document or word, based either on Word2Vec (a precalculated Word2Vec model is assumed) or on SBERT.
- Retrieve the document-topic representation of a given document based on a trained topic model.
- Retrieve the lemmas of a given document.
It has the same mounted volumes as the `np-solr-api` service.
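As a sketch of the idea behind the Word2Vec-based embedding endpoint (the vectors below are toy values, not the precalculated model the service loads), one common way to embed a whole document is to average the vectors of its words:

```python
# Toy 3-dimensional "word vectors", made up for illustration; the real
# service loads a precalculated Word2Vec (or SBERT) model instead.
TOY_VECTORS = {
    "public": [1.0, 3.0, 5.0],
    "tender": [3.0, 1.0, 1.0],
}

def document_embedding(doc: str) -> list[float]:
    """Average the vectors of the document's known words."""
    vecs = [TOY_VECTORS[w] for w in doc.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return []  # no known words, no embedding
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print(document_embedding("Public tender"))  # [2.0, 2.0, 3.0]
```

SBERT models instead encode the full sentence in one pass, which is why the endpoint supports both backends.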
Visit the Wiki page for instructions on indexing information into Solr, available queries, and their return formats.
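The Wiki has the authoritative details; as a generic illustration only (the collection and field names here are hypothetical, and the actual fields come from the `np_config` schema), indexing into Solr amounts to POSTing a JSON array of documents to the collection's `/update` handler:

```python
import json

# Hypothetical documents; real field names are defined by the np_config
# config set, not by this sketch.
docs = [
    {"id": "doc-1", "title": "Example tender", "text": "Body of the notice."},
]

payload = json.dumps(docs)  # body of the POST request to Solr
url = "http://localhost:8984/solr/test/update?commit=true"  # update handler
print(payload)
```

The API wraps this kind of request so that callers never talk to Solr directly.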
Python requirements files are available within each service folder. The requirements are installed directly in their respective services at build time.
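The per-service build pattern is roughly the following (a sketch; the actual Dockerfiles in each service folder may differ in base image and steps):

```dockerfile
# Hypothetical service Dockerfile: requirements installed at build time.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```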