-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: moved link storage to a pure metadata-based method #650
base: main
Are you sure you want to change the base?
Conversation
@@ -482,39 +476,33 @@ def fetch_initial_candidates() -> None: | |||
# If the next nodes would not exceed the depth limit, find the | |||
# adjacent nodes. | |||
# | |||
# TODO: For a big performance win, we should track which tags we've | |||
# TODO: For a big performance win, we should track which links we've |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this may be a stale comment. Specifically, the difference_update
on line 487 seems to be doing this.
else: | ||
# don't add link search to original metadata dict | ||
metadata = metadata.copy() | ||
metadata[_metadata_s_link_key(link=outgoing_link)] = _metadata_s_link_value() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? It looks like it is doing equality. But, there may be multiple outgoing links, and we need to find the nodes with one of those as an incoming link. It seems like this is maybe going the wrong direction, and also likely missing the set-equality. Are we sure this works the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we add_nodes()
, we are storing all the incoming_links
for each chunk as dictionary keys in the metadata_s
column. There is an arbitrary, static value set with each key, so that it can be stored in the MAP<text,text>
type.
Here, when we search, we are finding all the chunks that have a matching outgoing-link key.
We are essentially doing a hybrid query on metadata keys... the values do not matter.
The code does a separate query for each outgoing link. This the same way the code in main
operates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. That's the part I missed. Could you add a comment to that effect where we store / query? It also seems like there is then a risk that the generated key collides with the user defined key? Is there a prefix or something we should tell people not to use in their metadata?
This makes CassandraGraphStore compatible with existing cassio.
Note:
traversal_search()
has been updated to use the new link storage.