-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/snowflake): support lineage via rename and swap using que… #11600
feat(ingest/snowflake): support lineage via rename and swap using que… #11600
Conversation
FROM deduplicated_queries q | ||
JOIN filtered_access_history a USING (query_id) | ||
-- This qualify ignores more than 1 rows from same query, such as for ddl swap | ||
QUALIFY | ||
ROW_NUMBER() OVER (PARTITION BY query_id ORDER BY start_time DESC) = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have the deduplicated_queries cte - should we push stuff down to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for after join with access_history. Table access_history may have more than one row for same query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are now ignoring duplicate swaps, we can in fact remove this qualify as it is fine to process same swap query multiple times.
metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py
Outdated
Show resolved
Hide resolved
"aspect": { | ||
"json": { | ||
"statement": { | ||
"value": "CREATE TABLE foo_staging AS\nSELECT\n f.a,\n d.b\nFROM foo AS f\nJOIN foo_dep AS d\n USING (c);\n\nCREATE TABLE foo_backup AS\nSELECT\n *\nFROM foo_staging", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably would be tricky to do, but it'd be super cool if the swap statement would show up in the composite query
@@ -185,6 +185,10 @@ class TableSwap: | |||
urn1: UrnStr | |||
urn2: UrnStr | |||
|
|||
def id(self) -> str: | |||
# TableSwap(A,B) is same as TableSwap(B,A) | |||
return str(hash(frozenset([self.urn1, self.urn2]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do dataclasses.dataclass(frozen=True)
to make it hashable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try that earlier but it does not ignore the order of urns. Having array instead of individual fields would work but then we would need to ensure somehow that array has only two entries. Also might make code less readable, so keeping this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look at the tests super closely, but this seems reasonable
…ries v2
Checklist