Is your feature request related to a problem?
I was trying to rewrite a DuckDB query to Ibis, and ran into this error:
BinderException: Binder Error: No function matches the given name and argument types 'array_cosine_distance(DOUBLE[], DOUBLE[])'. You might need to add explicit type casts.
Here is the function and the attempt at rewriting it:
import duckdb
import ibis
from ibis import _
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

ibis.options.interactive = True

static_embedding = StaticEmbedding.from_model2vec("minishlab/potion-base-8M")
model = SentenceTransformer(modules=[static_embedding])


def similarity_search_duckdb(
    query: str,
    k: int = 5,
    dataset_name: str = "ai-blueprint/fineweb-bbc-news-embeddings",
    embedding_column: str = "embeddings",
):
    query_vector = model.encode(query)
    embedding_dim = model.get_sentence_embedding_dimension()
    sql = f"""
        SELECT
            *,
            array_cosine_distance(
                {embedding_column}::float[{embedding_dim}],
                {query_vector.tolist()}::float[{embedding_dim}]
            ) as distance
        FROM 'hf://datasets/{dataset_name}/**/*.parquet'
        ORDER BY distance
        LIMIT {k}
    """
    return ibis.memtable(duckdb.sql(sql).to_arrow_table())


t1 = similarity_search_duckdb("What is the future of AI?")
print(t1)


@ibis.udf.scalar.builtin
def array_cosine_distance(x, y) -> float:
    """Compute cosine similarity between two vectors."""


def similarity_search_ibis(
    query: str = "What is the future of AI?",
    k: int = 5,
    dataset_name: str = "ai-blueprint/fineweb-bbc-news-embeddings",
    embedding_column: str = "embeddings",
):
    # Use same model as used for indexing
    query_vector = model.encode(query)
    embedding_dim = model.get_sentence_embedding_dimension()
    return (
        ibis.read_parquet(f"hf://datasets/{dataset_name}/**/*.parquet")
        .mutate(
            distance=array_cosine_distance(
                _[embedding_column].cast("array<float64>"),
                ibis.array(query_vector).cast("array<float64>"),
            )
        )
        .order_by(_.distance.desc())
        .limit(k)
        .drop(embedding_column)
    )


t2 = similarity_search_ibis("What is the future of AI?")
print(t2)
I found a related issue: #7963
That issue ensures DuckDB tables with fixed-length arrays can be used in Ibis, but from what I can tell Ibis still treats such a column as a variable-length array. I haven't found a way to create a fixed-length array in a DuckDB table using Ibis, and the array_cosine_distance function only supports fixed-length arrays.
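To make the distinction concrete, here is a minimal sketch in plain Python (no DuckDB or Ibis needed). It assumes that cosine distance means one minus cosine similarity, and it shows how the two DuckDB cast spellings differ: `DOUBLE[]` for a variable-length list, as in the error message above, versus `FLOAT[n]` for a fixed-length array, as in the working SQL. The helper names `cosine_distance` and `duckdb_cast` and the length 3 are hypothetical and only for illustration; they are not part of either library.

```python
import math
from typing import Optional, Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(x) != len(y):
        # The computation is only defined when both vectors have the same length.
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def duckdb_cast(expr: str, dtype: str = "FLOAT", length: Optional[int] = None) -> str:
    """Build a DuckDB cast string (hypothetical helper).

    With a length, the target is a fixed-length ARRAY type such as FLOAT[3];
    without one, it is a variable-length LIST type such as FLOAT[].
    """
    suffix = f"[{length}]" if length is not None else "[]"
    return f"{expr}::{dtype}{suffix}"


# Fixed-length cast, the form used in the hand-written SQL query.
print(duckdb_cast("embeddings", "FLOAT", 3))
# Variable-length cast, the form that appears in the BinderException.
print(duckdb_cast("embeddings", "DOUBLE"))
# Orthogonal vectors have cosine similarity 0, so the distance is 1.0.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

The second cast is the shape that `.cast("array<float64>")` appears to produce in the failing query, judging by the `DOUBLE[]` arguments in the error, which is why the binder finds no matching overload.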
What is the motivation behind your request?
I'm trying to use Ibis wherever I would previously have used SQL or Pandas, both as a learning exercise and because I'm hoping Ibis can become my default data manipulation library.
Describe the solution you'd like
I'd like some way to create fixed-length array columns in DuckDB using Ibis.
What version of ibis are you running?
9.5.0
What backend(s) are you using, if any?
duckdb 1.1.3
Code of Conduct
I agree to follow this project's Code of Conduct