
Add LSH implementation for vector embedding indexing #568

Merged: 4 commits, merged Jan 9, 2025


beidu555 (Contributor) commented Jan 8, 2025

No description provided.

Signed-off-by: beidu555 <[email protected]>
<dependency>
    <groupId>io.jhdf</groupId>
    <artifactId>jhdf</artifactId>
    <version>0.6.10</version>
</dependency>
Review comment (Contributor):

Test scope is required: declare this dependency with `<scope>test</scope>`.

@@ -89,6 +89,10 @@ public void initializeWriter() throws IOException {
assert tokioRuntimeBuilder != null;
assert ioConfigBuilder != null;

setOption("is_lsh", "true");
Review comment (Contributor):

Do not hard-code this option; pass it in through the configuration instead.
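A minimal sketch of the reviewer's suggestion, written in Rust to match the native IO layer: the `is_lsh` flag is read from a caller-supplied options map instead of being set unconditionally in `initializeWriter()`. `WriterConfig`, `from_options`, and the fallback to `false` are hypothetical and are not LakeSoul's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical writer config: `is_lsh` comes from user-supplied options
/// rather than being hard-coded when the writer is initialized.
#[derive(Debug, Default)]
pub struct WriterConfig {
    options: HashMap<String, String>,
}

impl WriterConfig {
    /// Copy every caller-provided option into the config.
    pub fn from_options(options: &HashMap<String, String>) -> Self {
        WriterConfig { options: options.clone() }
    }

    /// Read `is_lsh`; a missing or unparsable value falls back to `false`
    /// (the default is an assumption of this sketch).
    pub fn is_lsh(&self) -> bool {
        self.options
            .get("is_lsh")
            .and_then(|v| v.parse::<bool>().ok())
            .unwrap_or(false)
    }
}

fn main() {
    // The caller (e.g. the Java side) decides whether LSH is enabled.
    let mut opts = HashMap::new();
    opts.insert("is_lsh".to_string(), "true".to_string());
    let cfg = WriterConfig::from_options(&opts);
    println!("is_lsh = {}", cfg.is_lsh());
}
```

Keeping the flag in the options map means jobs that never set it get the non-LSH behavior without any code change.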

@@ -348,6 +367,21 @@ impl LakeSoulIOConfigBuilder {
self
}

// pub fn with_nbits(mut self, mut nbits: u64) -> Self {
Review comment (Contributor):

Is this commented-out code still needed? If not, please remove it.
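If an `nbits` setting is kept rather than deleted, a conventional consuming builder setter looks like the sketch below. `IoConfig`, `IoConfigBuilder`, and the `nbits` field are hypothetical stand-ins for `LakeSoulIOConfigBuilder`; the point is the signature: only `self` needs `mut`, not the `nbits` argument.

```rust
/// Hypothetical stand-in for the IO config; only the field this sketch needs.
#[derive(Debug, Clone, Default)]
pub struct IoConfig {
    pub nbits: u64,
}

/// Hypothetical builder mirroring the `impl LakeSoulIOConfigBuilder` style.
#[derive(Debug, Default)]
pub struct IoConfigBuilder {
    config: IoConfig,
}

impl IoConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consuming setter: takes `self` by value, updates one field, and
    /// returns the builder so calls can be chained.
    pub fn with_nbits(mut self, nbits: u64) -> Self {
        self.config.nbits = nbits;
        self
    }

    /// Finish building and hand out the config.
    pub fn build(self) -> IoConfig {
        self.config
    }
}

fn main() {
    let config = IoConfigBuilder::new().with_nbits(64).build();
    println!("nbits = {}", config.nbits);
}
```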

@@ -0,0 +1,7 @@
use arrow::array::BinaryArray;
Review comment (Contributor):

Is this file still needed? If not, please remove it.

xuchen-plus changed the title from "add lsh poc" to "Add LSH implementation for vector embedding indexing" on Jan 8, 2025
xuchen-plus added the enhancement (New feature or request) label on Jan 8, 2025
use arrow::compute::sort_to_indices;
use arrow::compute::SortOptions;

// #[test]
Review comment (Contributor):

Roll back to the original test instead of commenting it out.

@@ -123,7 +144,32 @@ impl SyncSendableMutableLakeSoulWriter {
// for ffi callers
pub fn write_batch(&mut self, record_batch: RecordBatch) -> Result<()> {
let runtime = self.runtime.clone();
runtime.block_on(async move { self.write_batch_async(record_batch, false).await })
if record_batch.num_rows() == 0 {
Review comment (Contributor):

Please optimize this `if` condition.
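The comment does not say which optimization is meant. A minimal sketch of one common option is a guard clause that returns early for an empty batch, so the normal write path stays at the top indentation level. `Batch`, `Writer`, and `written_rows` are hypothetical stand-ins for arrow's `RecordBatch` and `SyncSendableMutableLakeSoulWriter`, and treating an empty batch as a successful no-op is an assumption.

```rust
/// Hypothetical stand-in for arrow's `RecordBatch`: only `num_rows()` matters here.
pub struct Batch {
    pub rows: Vec<Vec<f32>>,
}

impl Batch {
    pub fn num_rows(&self) -> usize {
        self.rows.len()
    }
}

/// Hypothetical writer that only counts the rows it has accepted.
#[derive(Debug, Default)]
pub struct Writer {
    pub written_rows: usize,
}

impl Writer {
    pub fn write_batch(&mut self, batch: Batch) -> Result<(), String> {
        // Guard clause: nothing to write, so return before doing any work.
        if batch.num_rows() == 0 {
            return Ok(());
        }
        // Main path, no extra nesting.
        self.written_rows += batch.num_rows();
        Ok(())
    }
}

fn main() {
    let mut writer = Writer::default();
    writer.write_batch(Batch { rows: Vec::new() }).unwrap();
    writer
        .write_batch(Batch { rows: vec![vec![0.1, 0.2], vec![0.3, 0.4]] })
        .unwrap();
    println!("rows written: {}", writer.written_rows);
}
```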

Ceng23333 merged commit 1d3ef1d into lakesoul-io:main on Jan 9, 2025 (17 checks passed)

3 participants