diff --git a/book.toml b/book.toml
index d5c380c..74572ee 100644
--- a/book.toml
+++ b/book.toml
@@ -3,7 +3,7 @@ authors = ["Mateusz Charytoniuk"]
 language = "en"
 multilingual = false
 src = "src"
-title = "LLMOps Handbook"
+title = "LLMOps Handbook (work in progress)"
 
 [output.html]
 additional-js = [
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 25fb9c4..0eade89 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -5,7 +5,7 @@
   - [Contributing](./introduction/contributing.md)
 - [General Concepts]()
   - [Continuous Batching](./general-concepts/continuous-batching/README.md)
-  - [Embeddings]()
+  - [Embedding](./general-concepts/embedding/README.md)
   - [Input/Output](./general-concepts/input-output/README.md)
   - [Large Language Model](./general-concepts/large-language-model/README.md)
   - [Load Balancing](./general-concepts/load-balancing/README.md)
@@ -13,6 +13,7 @@
   - [Reverse Proxy]()
   - [Model Parameters]()
   - [Supervisor]()
+  - [Vector Database]()
 - [Deployments]()
   - [llama.cpp](./deployments/llama.cpp/README.md)
   - [Production Deployment]()
diff --git a/src/fine-tuning/README.md b/src/fine-tuning/README.md
index f0af33f..16bbbdf 100644
--- a/src/fine-tuning/README.md
+++ b/src/fine-tuning/README.md
@@ -1 +1,7 @@
 # Fine-tuning
+
+Fine-tuning takes a pre-trained model and trains it further on a new task. This is typically useful when you want to repurpose a model trained on a large-scale dataset for a new task with less data available.
+
+In practice, that means fine-tuning allows the model to adapt to the new data without forgetting what it has learned before.
+
+A good example is [sqlcoder](https://github.com/defog-ai/sqlcoder), a [starcoder](https://github.com/bigcode-project/starcoder) model (a general-purpose coding model) fine-tuned to be exceptionally good at producing SQL.
diff --git a/src/general-concepts/embedding/README.md b/src/general-concepts/embedding/README.md
new file mode 100644
index 0000000..5d80093
--- /dev/null
+++ b/src/general-concepts/embedding/README.md
@@ -0,0 +1,40 @@
+# Embedding
+
+Formally, an embedding represents a word (or a phrase) as a point in a vector space. In this space, words with similar meanings are close to each other.
+
+For example, the words "dog" and "cat" might be close to each other in the vector space because they are both animals.
+
+## RGB Analogy
+
+Because embeddings can be vectors with 4096 or more dimensions, it might be hard to imagine them and get a good intuition for how they work in practice.
+
+A good way to build that intuition is to first imagine embeddings as points in 3D space.
+
+Let's assume a color represented by RGB is our embedding. It is a 3D vector with three values, red, green, and blue, each representing one dimension. Similar colors in that space are placed near each other: red is close to orange, blue and green are close to teal, etc.
+
+Embeddings work similarly. Words and phrases are represented by vectors, and similar words are placed close to each other in the vector space.
+
+Searching for embeddings similar to a given one means looking for vectors that are placed close to the given embedding in that space.
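+
+To make that intuition concrete, here is a minimal sketch in Python of cosine similarity, one common way to measure how close two embeddings are. The tiny 3D vectors below are made-up stand-ins for real, high-dimensional embeddings:
+
+```python
+from math import sqrt
+
+# Made-up 3D "embeddings"; real models produce hundreds or thousands of dimensions.
+embeddings = {
+    "dog": [0.9, 0.8, 0.1],
+    "cat": [0.8, 0.9, 0.2],
+    "car": [0.1, 0.2, 0.9],
+}
+
+def cosine_similarity(a: list[float], b: list[float]) -> float:
+    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
+    dot = sum(x * y for x, y in zip(a, b))
+    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))
+
+print(cosine_similarity(embeddings["dog"], embeddings["cat"]))  # high: similar meaning
+print(cosine_similarity(embeddings["dog"], embeddings["car"]))  # lower: different meaning
+```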
+
+![RGB Space](https://upload.wikimedia.org/wikipedia/commons/8/83/RGB_Cube_Show_lowgamma_cutout_b.png)
diff --git a/src/retrieval-augmented-generation/README.md b/src/retrieval-augmented-generation/README.md
index ea0ea96..1a8cf1b 100644
--- a/src/retrieval-augmented-generation/README.md
+++ b/src/retrieval-augmented-generation/README.md
@@ -1 +1,45 @@
 # Retrieval Augmented Generation
+
+Retrieval Augmented Generation (RAG) is a technique for improving the quality of generated text by grounding the model in external data.
+
+In practice, and in a significant simplification, RAG is about injecting data into the [Large Language Model](/general-concepts/large-language-model) prompt.
+
+For example, let's say the user asks the LLM:
+- `What are the latest articles on our website?`
+
+To augment the response, you need to intercept the user's question and tell the LLM to respond more or less like:
+- `You are a <persona>. Tell the user that the latest articles on our site are <list of the latest articles>`
+
+That is greatly simplified, but that is generally how it works. Along the way, [embeddings](/general-concepts/embedding) and [vector databases](/general-concepts/vector-database) are involved.
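+
+The sketch below is a toy illustration of that flow in Python: the article titles are made up, and naive keyword overlap stands in for a real embedding model and vector database.
+
+```python
+ARTICLES = [
+    "Continuous batching in llama.cpp",
+    "Load balancing LLM deployments",
+    "Fine-tuning sqlcoder on your schema",
+]
+
+def retrieve(question: str, limit: int = 2) -> list[str]:
+    # Real systems embed the question and search a vector database;
+    # here, naive word overlap fakes the "closest vectors" search.
+    question_words = set(question.lower().split())
+    ranked = sorted(
+        ARTICLES,
+        key=lambda article: len(question_words & set(article.lower().split())),
+        reverse=True,
+    )
+    return ranked[:limit]
+
+def build_prompt(question: str) -> str:
+    # The retrieved data is injected into the prompt before it reaches the LLM.
+    articles = "; ".join(retrieve(question))
+    return (
+        "You are a helpful assistant on our website.\n"
+        f"The latest articles on our site are: {articles}.\n"
+        f"Answer the user's question: {question}"
+    )
+
+print(build_prompt("What are the latest articles on our website?"))
+```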