Implementation of a Large Language Model based on the Llama 3 architecture, focusing on efficiency and scalability.
- Llama 3 Architecture: Implements key components of the Llama 3 architecture including RoPE (Rotary Position Embedding), RMSNorm, and Multi-Query Attention
- Mixed Precision Training: Utilizes PyTorch's automatic mixed precision for efficient training
- Flexible Data Processing: Supports large-scale dataset preparation with efficient memory mapping
- TensorBoard Integration: Built-in training monitoring and visualization
- Configurable Architecture: Easily adjustable model parameters including dimensions, layers, and attention heads
- Inference Pipeline: Ready-to-use inference system with temperature, top-k/top-p, and repetition-penalty sampling controls
Install from source:

```bash
git clone https://github.com/[your-username]/ZeusLLM.git
cd ZeusLLM
pip install -r requirements.txt
```
Dependencies (from requirements.txt):

```text
torch>=2.0.0
transformers
numpy
tqdm
tensorboard
datasets
```
```text
ZeusLLM/
├── llm.py           # Core model architecture implementation
├── preparedata.py   # Data preprocessing and tokenization
├── train.py         # Training loop and utilities
├── inference.py     # Inference and text generation
├── data/            # Directory for processed datasets
└── outputs/         # Model checkpoints and TensorBoard logs
```
Process and tokenize your training data:
```bash
python preparedata.py --model-name "meta-llama/Llama-3.1-8B" \
    --max-seq-length 2048 \
    --num-proc 8 \
    --output-dir "data" \
    --val-size 0.0005
```
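preparedata.py tokenizes the corpus with the tokenizer named by `--model-name` and writes the token IDs to binary files under `--output-dir` so they can be memory-mapped during training. The snippet below is a hedged sketch of how such a file could be inspected; the file name `data/train.bin` and the `uint16` dtype are assumptions and may not match what the script actually writes (a tokenizer with more than 65,535 entries would need `uint32`).

```python
import numpy as np

# Hypothetical output file and dtype -- check preparedata.py for the values
# your run actually produces.
tokens = np.memmap("data/train.bin", dtype=np.uint16, mode="r")

print(f"total tokens: {len(tokens):,}")
print("first 32 token ids:", tokens[:32].tolist())
```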
Train the model:
```bash
python train.py
```
Key training features (see the sketch after this list):
- Automatic mixed precision training
- TensorBoard logging
- Checkpoint saving
- Validation monitoring
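The loop below is a minimal, self-contained sketch of how these pieces typically fit together (autocast forward pass, scaled backward pass, per-step TensorBoard logging). It is not the code in train.py; the model, dataloader, hyperparameters, and log directory are placeholders.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

def train_loop(model, train_loader, epochs=1, lr=3e-4, device="cuda"):
    """Minimal mixed-precision training loop sketch with TensorBoard logging."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scaler = torch.cuda.amp.GradScaler()      # scales the loss to avoid fp16 underflow
    writer = SummaryWriter("outputs/logs")    # log directory is an assumption

    step = 0
    for _ in range(epochs):
        for x, y in train_loader:             # (input_ids, targets) batches
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad(set_to_none=True)

            with torch.cuda.amp.autocast():   # forward pass in mixed precision
                logits = model(x)
                loss = torch.nn.functional.cross_entropy(
                    logits.view(-1, logits.size(-1)), y.view(-1)
                )

            scaler.scale(loss).backward()     # backward on the scaled loss
            scaler.step(optimizer)            # unscales grads, then optimizer.step()
            scaler.update()

            writer.add_scalar("train/loss", loss.item(), step)
            step += 1
    writer.close()
```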
Generate text using a trained model:
```python
from inference import LLMInference

generator = LLMInference("outputs/best_model.pt")
response = generator.generate(
    prompt="En un futuro lejano,",
    max_new_tokens=20,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.1,
)
print(response)
```
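The `temperature`, `top_k`, and `top_p` arguments all reshape the next-token distribution before sampling. The function below is a hedged sketch of how temperature scaling, top-k, and top-p (nucleus) filtering are commonly applied to a logits vector; it illustrates the idea rather than reproducing inference.py.

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Sketch of temperature / top-k / top-p sampling for a 1-D logits tensor."""
    logits = logits / max(temperature, 1e-6)           # temperature scaling

    if top_k is not None and top_k > 0:                # keep only the k most likely tokens
        kth = torch.topk(logits, top_k).values[-1]
        logits[logits < kth] = float("-inf")

    if top_p is not None and top_p < 1.0:              # nucleus (top-p) filtering
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cumulative = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
        cutoff = cumulative > top_p
        cutoff[1:] = cutoff[:-1].clone()               # always keep the top token
        cutoff[0] = False
        logits[sorted_idx[cutoff]] = float("-inf")

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Lower temperatures sharpen the distribution, while `top_k` and `top_p` limit how much of the tail can be sampled.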
The model implements the following key components:
- Attention Mechanism:
  - Multi-query attention for efficient inference
  - Configurable number of heads and KV heads
  - Rotary Position Embeddings (RoPE)
- Feed Forward Network:
  - SwiGLU activation
  - Configurable hidden dimensions
  - Dropout for regularization
- Normalization:
  - RMSNorm for stable training (see the sketch after this list)
  - Configurable epsilon parameter
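As one concrete example of these components, the snippet below is a minimal RMSNorm module in the style used by Llama models. It mirrors the idea rather than the exact code in llm.py; the default epsilon is an assumption.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: scales by 1/RMS(x), with no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps                                 # configurable epsilon (assumed default)
        self.weight = nn.Parameter(torch.ones(dim))    # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # reciprocal RMS over the feature dimension
        rms_inv = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms_inv)
```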
Example configuration (ModelArgs):

```python
ModelArgs(
    dim=768,           # Model dimension
    n_layers=12,       # Number of transformer layers
    n_heads=12,        # Number of attention heads
    vocab_size=32000,  # Vocabulary size
    max_seq_len=2048,  # Maximum sequence length
    dropout=0.1,       # Dropout rate
)
```
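A quick way to sanity-check a configuration is to build the model and count its parameters. The snippet below assumes llm.py exposes `ModelArgs` and a `Transformer` class that accepts the config; both names are assumptions drawn from the project layout, so adjust them to whatever llm.py actually defines.

```python
# Hypothetical usage -- class names are assumed, check llm.py for the real ones.
from llm import ModelArgs, Transformer

args = ModelArgs(
    dim=768,
    n_layers=12,
    n_heads=12,
    vocab_size=32000,
    max_seq_len=2048,
    dropout=0.1,
)
model = Transformer(args)
print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")
```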
The training pipeline includes:
- Gradient scaling for mixed precision
- Configurable batch sizes and learning rates
- TensorBoard metrics tracking:
  - Training/validation loss
  - Learning rate schedules
  - Model gradients
- Automatic model checkpointing (see the sketch after this list)
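Logged metrics can be viewed with `tensorboard --logdir outputs` (adjust the path if your log directory differs). Checkpointing typically saves the model and optimizer state whenever validation loss improves, which is how a file such as `outputs/best_model.pt` comes to exist. The helper below is a hedged sketch of that pattern, not the exact logic in train.py.

```python
import torch

def save_checkpoint(model, optimizer, step, val_loss, path="outputs/best_model.pt"):
    """Sketch: persist everything needed to resume training or run inference."""
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "step": step,
            "val_loss": val_loss,
        },
        path,
    )

# Hypothetical usage inside the validation loop:
# if val_loss < best_val_loss:
#     best_val_loss = val_loss
#     save_checkpoint(model, optimizer, step, val_loss)
```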
This project is licensed under the RiveraAICloseLicense - see the LICENSE file for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- The Llama team at Meta AI for the original architecture
- The PyTorch team for their excellent framework
- The Hugging Face team for their transformers library