fastest way to create and store embeddings for data in a SQLite table #189

Open
punkish opened this issue Dec 26, 2024 · 2 comments
Labels: question (Further information is requested), stale

Comments

@punkish
punkish commented Dec 26, 2024

I have a SQLite table with ~950K rows (simplified schema below):

CREATE TABLE t (id INTEGER PRIMARY KEY, fulltext TEXT);

What is the quickest and easiest way to generate embeddings for these rows and store them in a vectors table? I am running Ollama locally with Llama 3.2 and the nomic-embed-text embedding model. I would like to do this once and then build a web interface to query the data. Additionally, as new rows are added to table t, I would like a TRIGGER to generate embeddings for them as well. Is that possible? If so, any hints would be very welcome.
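Stock SQLite triggers can only run SQL, so a trigger on t cannot call Ollama's HTTP API by itself. One common workaround, sketched here under assumptions, is to let triggers record which rows need embeddings and have the application drain that queue in the background. The `vectors` and `embed_queue` tables, both trigger names, `drainEmbedQueue`, and its default `limit` are all hypothetical; `db` is assumed to be a synchronous better-sqlite3-style handle (`exec`, `prepare().all()/run()`), and `embedTexts` is any async function returning one vector per input text.

```javascript
// Hypothetical schema additions: a vectors table, a queue of ids awaiting
// embeddings, and triggers that fill the queue whenever t changes.
const QUEUE_SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS vectors (
  id INTEGER PRIMARY KEY,   -- same id as the row in t
  embedding BLOB NOT NULL   -- Float32 values stored as raw bytes
);
CREATE TABLE IF NOT EXISTS embed_queue (
  id INTEGER PRIMARY KEY    -- id of a row in t that needs (re-)embedding
);
CREATE TRIGGER IF NOT EXISTS t_enqueue_on_insert AFTER INSERT ON t
BEGIN
  INSERT OR IGNORE INTO embed_queue (id) VALUES (NEW.id);
END;
CREATE TRIGGER IF NOT EXISTS t_enqueue_on_update AFTER UPDATE OF fulltext ON t
BEGIN
  INSERT OR IGNORE INTO embed_queue (id) VALUES (NEW.id);
END;
`;

// Embeds up to `limit` queued rows and stores the results. Vectors are written
// and queue entries removed in one transaction, so a crash leaves rows queued
// rather than half-processed. Assumes no other writer updates t while a batch
// is being embedded (an update during the await could otherwise be dequeued).
async function drainEmbedQueue(db, embedTexts, limit = 500) {
  const pending = db
    .prepare(
      "SELECT t.id, t.fulltext FROM embed_queue q JOIN t ON t.id = q.id ORDER BY q.id LIMIT ?"
    )
    .all(limit);
  if (pending.length === 0) return 0;

  const vectors = await embedTexts(pending.map((r) => r.fulltext));
  const upsert = db.prepare("INSERT OR REPLACE INTO vectors (id, embedding) VALUES (?, ?)");
  const dequeue = db.prepare("DELETE FROM embed_queue WHERE id = ?");

  db.exec("BEGIN");
  try {
    pending.forEach((row, i) => {
      upsert.run(row.id, Buffer.from(new Float32Array(vectors[i]).buffer));
      dequeue.run(row.id);
    });
    db.exec("COMMIT");
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
  return pending.length;
}
```

Running `db.exec(QUEUE_SCHEMA_SQL)` once, then calling `drainEmbedQueue` on a timer (for example with `setInterval`) or right after the web app writes to t, would keep `vectors` roughly in sync without making inserts wait on the embedding model.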

@punkish (author)

punkish commented Dec 28, 2024

Is there a way to use transactions when inserting a large number of embeddings into a libSQL database? Right now I am using the pattern below, and it is really slow (I have about a million documents that need embeddings). I'd like to insert them in transactions of 5000 at a time.

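// Fetch one batch of 5000 documents (synchronous, better-sqlite3-style API)
const rows = db.prepare(`SELECT fulltext FROM t LIMIT 5000`).all();

// Each awaited addLoader call finishes before the next one starts,
// so documents are embedded and stored strictly one at a time
for (const row of rows) {
    await app.addLoader(new TextLoader({ text: row.fulltext }));
}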
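For the one-off backfill, a minimal sketch of a different approach: it bypasses embedJs's `app.addLoader` entirely, so it will not populate whatever store `app` is configured with. It sends many texts per request to Ollama's `/api/embed` endpoint (which accepts an array `input`) and wraps each batch of inserts in an explicit `BEGIN`/`COMMIT`, so SQLite commits once per batch instead of once per row. Assumptions: Ollama on its default `http://localhost:11434`, a synchronous better-sqlite3-style `db` like the one the snippet above appears to use, positive integer ids in t, and a hypothetical `vectors` table holding Float32 BLOBs. If you use a vector search extension, check which input format it expects. With `@libsql/client` instead, its `batch(statements, "write")` method should be the transactional equivalent of the `BEGIN`/`COMMIT` block.

```javascript
// Sends one batch of texts to a local Ollama server and returns one vector per
// text, in input order. fetchFn is injectable so the function can be tested.
async function embedBatch(texts, fetchFn = fetch) {
  const res = await fetchFn("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // /api/embed accepts an array, so one HTTP round trip covers many texts
    body: JSON.stringify({ model: "nomic-embed-text", input: texts }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const { embeddings } = await res.json();
  return embeddings;
}

// Embeds every row of t in batches and stores the vectors in a hypothetical
// `vectors` table, one transaction per batch. Returns the number of rows done.
async function backfillEmbeddings(db, embed = embedBatch, batchSize = 5000) {
  db.exec(`CREATE TABLE IF NOT EXISTS vectors (
    id INTEGER PRIMARY KEY,
    embedding BLOB NOT NULL
  )`);
  // Keyset pagination: each batch starts after the last id already handled,
  // which stays fast deep into the table (unlike a growing OFFSET)
  const selectBatch = db.prepare(
    "SELECT id, fulltext FROM t WHERE id > ? ORDER BY id LIMIT ?"
  );
  const insert = db.prepare(
    "INSERT OR REPLACE INTO vectors (id, embedding) VALUES (?, ?)"
  );

  let lastId = 0; // assumes ids in t are positive
  let total = 0;
  for (;;) {
    const rows = selectBatch.all(lastId, batchSize);
    if (rows.length === 0) break;

    const vectors = await embed(rows.map((r) => r.fulltext));
    if (vectors.length !== rows.length) throw new Error("embedding count mismatch");

    // Without BEGIN/COMMIT, every INSERT would run as its own transaction
    db.exec("BEGIN");
    try {
      rows.forEach((row, i) => {
        insert.run(row.id, Buffer.from(new Float32Array(vectors[i]).buffer));
      });
      db.exec("COMMIT");
    } catch (err) {
      db.exec("ROLLBACK");
      throw err;
    }

    lastId = rows[rows.length - 1].id;
    total += rows.length;
  }
  return total;
}
```

Whether 5000 texts per `/api/embed` request is a good size depends on your hardware and text lengths, so it is worth measuring; splitting the `embed` call into smaller chunks is a small change if requests get too large. Rerunning the function starts over from the first id; `INSERT OR REPLACE` makes that safe, though not cheap.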

@adhityan added the question (Further information is requested) label Dec 30, 2024

This issue is stale because it has been open for 14 days with no activity.

github-actions bot added the stale label Jan 14, 2025

2 participants