embeddings, fine-tuning
mcharytoniuk committed Jul 4, 2024
1 parent c784944 commit 90c5451
Showing 5 changed files with 40 additions and 2 deletions.
2 changes: 1 addition & 1 deletion book.toml
```diff
@@ -3,7 +3,7 @@ authors = ["Mateusz Charytoniuk"]
 language = "en"
 multilingual = false
 src = "src"
-title = "LLMOps Handbook"
+title = "LLMOps Handbook (work in progress)"
 
 [output.html]
 additional-js = [
```
3 changes: 2 additions & 1 deletion src/SUMMARY.md
```diff
@@ -5,14 +5,15 @@
 - [Contributing](./introduction/contributing.md)
 - [General Concepts]()
 - [Continuous Batching](./general-concepts/continuous-batching/README.md)
-- [Embeddings]()
+- [Embedding](./general-concepts/embedding/README.md)
 - [Input/Output](./general-concepts/input-output/README.md)
 - [Large Language Model](./general-concepts/large-language-model/README.md)
 - [Load Balancing](./general-concepts/load-balancing/README.md)
 - [Forward Proxy]()
 - [Reverse Proxy]()
 - [Model Parameters]()
 - [Supervisor]()
 - [Vector Database]()
 - [Deployments]()
 - [llama.cpp](./deployments/llama.cpp/README.md)
 - [Production Deployment]()
```
6 changes: 6 additions & 0 deletions src/fine-tuning/README.md
```diff
@@ -1 +1,7 @@
 # Fine-tuning
+
+Fine-tuning takes a pre-trained model and trains it further on a new task. It is typically useful when you want to repurpose a model trained on a large-scale dataset for a new task with less data available.
+
+In practice, fine-tuning lets the model adapt to the new data without forgetting what it learned before.
+
+A good example is the [sqlcoder](https://github.com/defog-ai/sqlcoder) model, a fine-tuned [starcoder](https://github.com/bigcode-project/starcoder) model (a general coding model) made exceptionally good at producing SQL.
```
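The idea in the added paragraphs can be sketched with a dependency-free toy, not the handbook's actual method: a linear model is "pre-trained" on plenty of generic data, then briefly fine-tuned on a few task-specific samples with a smaller learning rate, so it adapts without drifting far from what it already learned. All data, rates, and names here are hypothetical.

```python
# Toy illustration of fine-tuning: "pre-train" y = w*x + b on a large generic
# dataset, then fine-tune on a small task-specific dataset with a lower
# learning rate so the pre-trained weights shift only slightly.

def train(w, b, data, lr, epochs):
    """Plain stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# "Pre-training": many samples from the generic relation y = 2x + 1.
pretrain_data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]
w, b = train(0.0, 0.0, pretrain_data, lr=0.1, epochs=50)

# "Fine-tuning": a handful of samples from a slightly different task
# (y = 2.5x + 1); the small learning rate keeps us near the pre-trained weights.
finetune_data = [(1.0, 3.5), (2.0, 6.0), (3.0, 8.5)]
w_ft, b_ft = train(w, b, finetune_data, lr=0.01, epochs=10)

print(f"pre-trained: w={w:.2f}, b={b:.2f}")
print(f"fine-tuned:  w={w_ft:.2f}, b={b_ft:.2f}")
```

The same trade-off drives real fine-tuning setups: few steps and a low learning rate adapt the model to the new data while preserving most of the pre-trained behavior.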
19 changes: 19 additions & 0 deletions src/general-concepts/embedding/README.md
```diff
@@ -0,0 +1,19 @@
+# Embedding
+
+Formally, an embedding represents a word (or a phrase) as a point in a vector space. In this space, words with similar meanings are close to each other.
+
+For example, the words "dog" and "cat" might be close to each other in the vector space because they are both animals.
+
+## RGB Analogy
+
+Because embeddings can be vectors with 4096 or more dimensions, it can be hard to visualize them and build an intuition for how they work in practice.
+
+A good way to build that intuition is to imagine embeddings as points in 3D space first.
+
+Let's assume a color represented by RGB is our embedding. It is a 3D vector with three values: red, green, and blue, representing three dimensions. Similar colors are placed near each other in that space: red is close to orange, blue and green are close to teal, and so on.
+
+Embeddings work similarly. Words and phrases are represented by vectors, and similar words are placed close to each other in the vector space.
+
+Searching for embeddings similar to a given one means looking for vectors placed close to it in the vector space.
+
+![RGB Space](https://upload.wikimedia.org/wikipedia/commons/8/83/RGB_Cube_Show_lowgamma_cutout_b.png)
```
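The RGB analogy from the new page can be put directly into code: each color is a 3-dimensional "embedding", and similarity search is just sorting by distance to the query vector. Real embeddings work the same way, only with thousands of dimensions; the color values below are hypothetical picks, not part of the handbook.

```python
import math

# Each "embedding" is a 3D vector: (red, green, blue).
colors = {
    "red":    (255, 0, 0),
    "orange": (255, 165, 0),
    "teal":   (0, 128, 128),
    "blue":   (0, 0, 255),
    "green":  (0, 128, 0),
}

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, vectors):
    """Return names sorted from most to least similar to the query vector."""
    return sorted(vectors, key=lambda name: distance(query, vectors[name]))

print(nearest(colors["red"], colors))
# "red" itself comes first, then "orange" -- just like in the analogy
```

A vector database does essentially this lookup at scale, with index structures that avoid comparing the query against every stored vector.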
12 changes: 12 additions & 0 deletions src/retrieval-augmented-generation/README.md
```diff
@@ -1 +1,13 @@
 # Retrieval Augmented Generation
+
+Retrieval Augmented Generation (RAG) is a technique to improve the quality of the generated text.
+
+In practice, and with significant simplification, RAG is about injecting data into the [Large Language Model](/general-concepts/large-language-model) prompt.
+
+For example, let's say the user asks the LLM:
+- `What are the latest articles on our website?`
+
+To augment the response, you intercept the user's question and tell the LLM to respond along the lines of:
+- `You are a <insert persona here>. Tell the user that the latest articles on our site are <insert latest articles metadata here>`
+
+That is greatly simplified, but generally, that is how it works. Along the way, [embeddings](/general-concepts/embedding) and [vector databases](/general-concepts/vector-database) are involved.
```
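The intercept-and-inject flow described in the new RAG page can be sketched in a few lines: retrieve relevant data, build an augmented prompt, then hand that prompt to the model. The article list, persona, and retrieval rule below are hypothetical stand-ins; in a real system the retrieval step would use embeddings and a vector database.

```python
# Hypothetical site metadata that the retrieval step can draw from.
articles = [
    {"title": "Continuous Batching Explained", "published": "2024-06-28"},
    {"title": "What Is an Embedding?", "published": "2024-07-01"},
]

def retrieve(question):
    """Stand-in for real retrieval (embedding the question, querying a vector DB)."""
    if "articles" in question.lower():
        return articles
    return []

def build_prompt(question, documents):
    """Inject the retrieved data into the prompt sent to the LLM."""
    context = "\n".join(f"- {d['title']} ({d['published']})" for d in documents)
    return (
        "You are a helpful website assistant.\n"
        "Use only this context to answer:\n"
        f"{context}\n"
        f"Question: {question}"
    )

question = "What are the latest articles on our website?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # this augmented prompt is what actually reaches the LLM
```

The user never sees the augmented prompt; they only see the model's answer, which now has the retrieved data available to it.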
