LLMs Training Datasets Manager is a web application for creating and managing training datasets in different formats, either for training Large Language Models (LLMs) or for use in Retrieval Augmented Generation (RAG) systems.
- 🗂️ Create datasets and add instructions to them.
- ⚙️ Manage the instructions of a dataset (update or delete them).
- 📑 Browse the instructions of a dataset easily, page by page.
- 📥 Export datasets to your machine (download them as files).
- 🤗 Hugging Face integration via the Hugging Face OAuth flow, which enables the following features 👇
  - 📤 Link datasets to Hugging Face dataset repositories and push local datasets to them.
  - 🆕 Create a new Hugging Face dataset repository if needed.
  - 🖲️ Sync local datasets with their linked repositories after updates.
  - 🔌 Unlink a local dataset from its linked repository (with options to delete the entire repository or just the dataset file).
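The paginated browsing mentioned in the feature list can be pictured with a small sketch. This is not the application's actual code: `Page` and `paginate` are hypothetical names, and the sketch slices an in-memory array, whereas a MongoDB-backed server would more likely apply the equivalent skip and limit in the query itself.

```typescript
// Hypothetical shape of one page of results (not taken from the project).
interface Page<T> {
  items: T[];
  page: number; // 1-based page number that was actually served
  pageSize: number;
  totalPages: number;
}

// Returns the requested page of `all`, clamping `page` into [1, totalPages].
function paginate<T>(all: readonly T[], page: number, pageSize: number): Page<T> {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  // An empty collection still has one (empty) page.
  const totalPages = Math.max(1, Math.ceil(all.length / pageSize));
  // Non-numeric or zero page requests fall back to the first page.
  const requested = Math.trunc(page) || 1;
  const current = Math.min(Math.max(1, requested), totalPages);
  const start = (current - 1) * pageSize;
  return {
    items: all.slice(start, start + pageSize),
    page: current,
    pageSize,
    totalPages,
  };
}

// Example: five instructions, two per page, second page requested.
const secondPage = paginate(["a", "b", "c", "d", "e"], 2, 2);
console.log(secondPage.items, secondPage.totalPages);
```

Clamping out-of-range page numbers instead of throwing is a design choice made for this sketch; an API could just as reasonably return an empty page or an error.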
- Bun (the all-in-one JavaScript + TypeScript runtime) 🔥
- TypeScript
- Docker
- MongoDB (With Replication & Sharding for availability and scalability)
- Clerk (For authentication and managing users)
Before installing the application, make sure you have the following requirements:
- Bun (1.1.31 or later)
- Docker Engine or Docker Desktop
- Clone the repository:
git clone https://github.com/AbdulrhmanGoni/LLMs-TDM-server.git
cd LLMs-TDM-server
- Install dependencies:
bun install
- Set up the environment files:
Copy .env.example to .env.development, .env.test and .env.production, then modify the variables in each file according to your settings:
cp .env.example .env.development
cp .env.example .env.test
cp .env.example .env.production
  - .env.development is used for the development environment.
  - .env.test is used for running the tests correctly.
  - .env.production is used for the production environment.
Note: You can find more details about the environment variables inside the .env.example file.
- Start the server in the development environment:
bun dev
The server should now be available at http://localhost:9000 (or the port you specified).
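The setup steps above use one env file per environment. Bun itself loads `.env.development`, `.env.test` or `.env.production` depending on `NODE_ENV`, but how this server picks its file is not shown here, so the following is only a sketch of the idea. `envFileFor` and `missingVariables` are hypothetical helpers, and falling back to the development file is an assumption.

```typescript
// The three environments this README creates env files for.
type AppEnv = "development" | "test" | "production";

const KNOWN_ENVS: readonly AppEnv[] = ["development", "test", "production"];

// Hypothetical helper: maps a NODE_ENV value to the matching env file name.
// Unset or unknown values fall back to "development" (an assumption).
function envFileFor(nodeEnv: string | undefined): string {
  const env = KNOWN_ENVS.find((known) => known === nodeEnv) ?? "development";
  return `.env.${env}`;
}

// Hypothetical helper: returns the names of required variables that are
// missing or empty in the given environment object.
function missingVariables(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((name) => !env[name]);
}

console.log(envFileFor("test")); // .env.test

// With only PORT set, DB_NAME is reported as missing.
console.log(missingVariables({ PORT: "9100" }, ["PORT", "DB_NAME"]));
```

In a real startup script you would pass `process.env` as the first argument of `missingVariables` and stop with a clear error message when the returned list is not empty.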
I use Bun's built-in, Jest-compatible test runner for writing and running tests.
Note: Don't forget to set the environment variables in the .env.test file before running the tests.
You can copy the following template for a quick start 👇
NODE_ENV=test
PORT=9100
DB_NAME=test
DB_HOST=127.0.0.1 # Default host of testing database
DB_PORT=270111 # Default port of testing database
DB_URL="mongodb://$DB_HOST:$DB_PORT/$DB_NAME?directConnection=true"
TESTING_USER_ID="user_Xm3A5q9gd3ghR73oh975bA" # Random user id for test
MUTE_LOGS=true
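The `DB_URL` line in the template is assembled from the other variables, assuming the `$DB_HOST`, `$DB_PORT` and `$DB_NAME` references are expanded when the file is loaded. The sketch below reproduces that composition in TypeScript so the resulting shape is easy to see. `DbSettings` and `buildDbUrl` are hypothetical names, and the values passed in are placeholders (27017 is simply MongoDB's default port), not the project's settings.

```typescript
// Hypothetical container for the three values the template combines.
interface DbSettings {
  host: string;
  port: number;
  name: string;
}

// Mirrors the template line:
// DB_URL="mongodb://$DB_HOST:$DB_PORT/$DB_NAME?directConnection=true"
function buildDbUrl({ host, port, name }: DbSettings): string {
  return `mongodb://${host}:${port}/${name}?directConnection=true`;
}

// Placeholder values for illustration only.
const url = buildDbUrl({ host: "127.0.0.1", port: 27017, name: "test" });
console.log(url); // mongodb://127.0.0.1:27017/test?directConnection=true
```

Keeping the host, port and database name as separate variables, as the template does, lets you change one of them without rewriting the whole connection string.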
To run all tests, use the following command:
bun test
To run a specific type of test (unit, integration or e2e), use the following command:
bun test tests/<unit|integration|e2e>
To run a specific test file, pass the name of the file:
bun test <test-file-name>