Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SQL backend to support SQLAlchemy 2.x #45

Merged
merged 3 commits into from
Apr 11, 2024
Merged

Update SQL backend to support SQLAlchemy 2.x #45

merged 3 commits into from
Apr 11, 2024

Conversation

davidmezzetti
Copy link
Contributor

Hi @j6k4m8

Once again thank you for this compelling series of graph-related libraries!

I am working to use grand as a NetworkX compatible backend with Postgres. The main issue I ran into is compatibility with SQLAlchemy 2.x. This PR adds support for that.

This PR also adds a few performance improvements.

  • New batch insert method for nodes and edges. A default implementation is provided for other backends.
  • Index on edges source and target. This helps with large graphs stored in the database.

A couple other performance improvements were possible using what's already provided in the library (i.e. caching). So overall, with these changes, I'm add to get solid performance with large graphs stored in Postgres.

Copy link
Member

@j6k4m8 j6k4m8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow @davidmezzetti this is really great stuff! Thank you for the great improvements here!

Running some basic workloads on this today and will merge asap!

grand/backends/_sqlbackend.py Show resolved Hide resolved
self._edge_table.create(self._engine, checkfirst=True)

# Create source and target index
sindex = Index("edge_source", source_column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 11, 2024

https://github.com/aplbrain/grand/actions/runs/8649244221/job/23717215080?pr=45#step:6:22

@davidmezzetti Some failing tests — looks like it's a matter of deduplicating edges?

@davidmezzetti
Copy link
Contributor Author

Let me take a look. I didn't see that tests were available. I'll reproduce locally and push a new commit with the fix.

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 11, 2024

Awesome, let me know if you run into any trouble! SHOULD be as simple as running pytest (and it should know to run sql queries in-memory during testing, etc)

@davidmezzetti
Copy link
Contributor Author

Ok, I just checked in a change that fixes the unit tests. One of the errors didn't properly convert the logic to SQLAlchemy 2.x.

The other one is related to the new edge source and target indexes. Previously, when that index wasn't there, the unit tests used the default sort order of the table when pulling edges, which typically uses the primary key. An order_by clause was added to sort by the primary key.

Alternatively, the bfs_successors calls could have a sort_neighbors argument applied in the tests to guarantee a consistent ordering. This might be a better way to ensure the test always works regardless of the edge ordering.

@davidmezzetti
Copy link
Contributor Author

Looks like this failure is due to not being able to build networkit for 3.7. Perhaps 3.7 should be removed given it's long been EOL?

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 11, 2024 via email

@davidmezzetti
Copy link
Contributor Author

Just made that change to the build script.

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 11, 2024

Whew — thank you!! Looking good, good to merge once these pass?

@davidmezzetti
Copy link
Contributor Author

Good from my end.

I'm putting together an example project for txtai that shows how to store all components in Postgres. This will help with the Graph piece and it's really cool how this integrates with the previous work with openCypher!

@j6k4m8 j6k4m8 merged commit 046b95a into aplbrain:master Apr 11, 2024
5 checks passed
@davidmezzetti
Copy link
Contributor Author

Hi @j6k4m8 - I wanted to check and see if there were plans to push a new release anytime soon. I'm leaning towards integrating the database graph component as a core feature but would need this change to add the dependency.

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 17, 2024

Sorry, have been traveling — will roll it today!

[edit] @davidmezzetti I just pushed it, thanks for the bump!

@davidmezzetti
Copy link
Contributor Author

Great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants