fix(ingestion/tableau): restructure the tableau graphql datasource query #11230

Merged
28 commits
fcc2450
cursor based pagination
sid-acryl Aug 23, 2024
752cd95
lint fix
sid-acryl Aug 23, 2024
ff54319
error message
sid-acryl Aug 23, 2024
7213497
add a debug log
sid-acryl Aug 23, 2024
d75912c
add fetch size
sid-acryl Aug 24, 2024
1dff5d8
commented upstreamFields and upstreamColumns of published and embedde…
sid-acryl Aug 26, 2024
f182099
existing test-cases are working
sid-acryl Aug 27, 2024
0d2431c
fix test case
sid-acryl Aug 27, 2024
fe70845
fix unit test case
sid-acryl Aug 27, 2024
8cb1d03
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Aug 27, 2024
464bb43
doc updated
sid-acryl Aug 27, 2024
902aa3c
Merge branch 'cus-2491-tableau-ingestion-hitting-20000-node-limit' of…
sid-acryl Aug 27, 2024
2ec4ca4
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Aug 27, 2024
dd533b4
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Aug 28, 2024
127391d
fix the retries_remaining
sid-acryl Aug 28, 2024
b25c6f1
change parent container emit sequence
sid-acryl Aug 28, 2024
a093b15
log message
sid-acryl Aug 29, 2024
02d4655
update sequence
sid-acryl Aug 29, 2024
9cce6c0
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Aug 30, 2024
439a375
address review comments
sid-acryl Aug 30, 2024
29f9406
generate container for parent first
sid-acryl Sep 3, 2024
dc28a3e
doc updates
sid-acryl Sep 4, 2024
3fcedb7
revert the file
sid-acryl Sep 4, 2024
b8f4428
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Sep 5, 2024
c5afb8a
address review comments
sid-acryl Sep 5, 2024
1559493
Update metadata-ingestion/src/datahub/ingestion/source/tableau/tablea…
sid-acryl Sep 9, 2024
45a2e07
Merge branch 'master' into cus-2491-tableau-ingestion-hitting-20000-n…
sid-acryl Sep 9, 2024
07baf75
address review comments
sid-acryl Sep 9, 2024
2 changes: 1 addition & 1 deletion metadata-ingestion/setup.py
@@ -721,7 +721,7 @@
     "snowflake-summary = datahub.ingestion.source.snowflake.snowflake_summary:SnowflakeSummarySource",
     "snowflake-queries = datahub.ingestion.source.snowflake.snowflake_queries:SnowflakeQueriesSource",
     "superset = datahub.ingestion.source.superset:SupersetSource",
-    "tableau = datahub.ingestion.source.tableau:TableauSource",
+    "tableau = datahub.ingestion.source.tableau.tableau:TableauSource",
     "openapi = datahub.ingestion.source.openapi:OpenApiSource",
     "metabase = datahub.ingestion.source.metabase:MetabaseSource",
     "teradata = datahub.ingestion.source.sql.teradata:TeradataSource",
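Note on the entry-point change above: an entry-point string has the form `name = module:attr`, and in this hunk only the module half changes (it gains an extra `tableau` segment). A minimal sketch of that split follows; `split_entry_point` is a hypothetical helper written for illustration and is not part of DataHub or setuptools.

```python
from typing import Tuple


def split_entry_point(spec: str) -> Tuple[str, str, str]:
    """Split 'name = module:attr' into (name, module, attr).

    Hypothetical helper for illustration only.
    """
    name, _, target = spec.partition("=")
    module, _, attr = target.partition(":")
    return name.strip(), module.strip(), attr.strip()


# The two strings are copied from the diff hunk above.
old = split_entry_point("tableau = datahub.ingestion.source.tableau:TableauSource")
new = split_entry_point(
    "tableau = datahub.ingestion.source.tableau.tableau:TableauSource"
)

print(old[1])  # module path before the change
print(new[1])  # module path after the change
```

The plugin name and the class name stay the same, so recipes that refer to the source as `tableau` should not need to change.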
Empty file.

Large diffs are not rendered by default.

@@ -10,7 +10,7 @@

 import datahub.emitter.mce_builder as builder
 from datahub.configuration.common import ConfigModel
-from datahub.ingestion.source import tableau_constant as c
+from datahub.ingestion.source.tableau import tableau_constant as c
 from datahub.metadata.com.linkedin.pegasus2avro.dataset import (
     DatasetLineageType,
     FineGrainedLineage,
@@ -223,19 +223,19 @@ class MetadataQueryException(Exception):
         description
         isHidden
         folderName
-        upstreamFields {
-            name
-            datasource {
-                id
-            }
-        }
-        upstreamColumns {
-            name
-            table {
-                __typename
-                id
-            }
-        }
+        # upstreamFields {
+        #     name
+        #     datasource {
+        #         id
+        #     }
+        # }
+        # upstreamColumns {
+        #     name
+        #     table {
+        #         __typename
+        #         id
+        #     }
+        # }
         ... on ColumnField {
             dataCategory
             role
@@ -336,6 +336,26 @@ class MetadataQueryException(Exception):
 }
 """

+
+datasource_upstream_fields_graphql_query = """
+{
+    id
+    upstreamFields {
+        name
+        datasource {
+            id
+        }
+    }
+    upstreamColumns {
+        name
+        table {
+            __typename
+            id
+        }
+    }
+}
+"""
+
 published_datasource_graphql_query = """
 {
     __typename
@@ -368,19 +388,19 @@ class MetadataQueryException(Exception):
         description
         isHidden
         folderName
-        upstreamFields {
-            name
-            datasource {
-                id
-            }
-        }
-        upstreamColumns {
-            name
-            table {
-                __typename
-                id
-            }
-        }
+        # upstreamFields {
+        #     name
+        #     datasource {
+        #         id
+        #     }
+        # }
+        # upstreamColumns {
+        #     name
+        #     table {
+        #         __typename
+        #         id
+        #     }
+        # }
         ... on ColumnField {
             dataCategory
             role
@@ -910,6 +930,40 @@ def make_filter(filter_dict: dict) -> str:
     return filter


+def query_metadata_cursor_based_pagination(
Collaborator:
Is this cursor stuff still relevant after your other GraphQL simplifications?

Contributor Author:
Yup, it is still relevant per the Tableau documentation.

+    server: Server,
+    main_query: str,
+    connection_name: str,
+    first: int,
+    after: Optional[str],
+    qry_filter: str = "",
+) -> dict:
+    query = f"""
+query GetItems(
+  $first: Int,
+  $after: String
+) {{
+  {connection_name} ( first: $first, after: $after, filter:{{ {qry_filter} }})
+  {{
+    nodes {main_query}
+    pageInfo {{
+      hasNextPage
+      endCursor
+    }}
+  }}
+}}"""  # {{ is to escape { character of f-string
+
+    result = server.metadata.query(
+        query=query,
+        variables={
+            "first": first,
+            "after": after,
+        },
+    )
+
+    return result
+
+
 def query_metadata(
     server: Server,
     main_query: str,
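Note on the cursor-based query above: the function fetches a single page, so a caller has to keep passing `endCursor` back as `after` until `hasNextPage` is false. The caller is not visible in this diff; the following is a minimal sketch of such a loop under stated assumptions. `iter_nodes` and `fake_fetch_page` are hypothetical names, and the response shape (`data` -> connection -> `nodes` / `pageInfo`) is assumed from the query text rather than confirmed by the PR.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A page fetcher takes (first, after) and returns the raw response dict.
PageFetcher = Callable[[int, Optional[str]], Dict[str, Any]]


def iter_nodes(
    fetch_page: PageFetcher, connection_name: str, page_size: int
) -> Iterator[dict]:
    """Yield every node of a connection by following endCursor (sketch)."""
    after: Optional[str] = None  # no cursor on the first request
    while True:
        result = fetch_page(page_size, after)
        # Assumed response shape: {"data": {<connection>: {"nodes", "pageInfo"}}}
        connection = result["data"][connection_name]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        after = page_info["endCursor"]


def fake_fetch_page(first: int, after: Optional[str]) -> Dict[str, Any]:
    """Stand-in for a server call: serves five fake nodes, cursor = offset."""
    items = [{"id": f"field-{i}"} for i in range(5)]
    start = int(after) if after else 0
    end = min(start + first, len(items))
    return {
        "data": {
            "fieldsConnection": {
                "nodes": items[start:end],
                "pageInfo": {"hasNextPage": end < len(items), "endCursor": str(end)},
            }
        }
    }


ids = [node["id"] for node in iter_nodes(fake_fetch_page, "fieldsConnection", 2)]
print(ids)
```

In real use, `fetch_page` could be a small wrapper that calls `query_metadata_cursor_based_pagination` with a fixed server, query, and filter. Error handling and retries are left out of the sketch.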
@@ -940,7 +994,7 @@ def query_metadata(

 def get_filter_pages(query_filter: dict, page_size: int) -> List[dict]:
     filter_pages = [query_filter]
-    # If this is primary id filter so we can use divide this query list into
+    # If this is primary id filter, so we can use divide this query list into
     # multiple requests each with smaller filter list (of order page_size).
     # It is observed in the past that if list of primary ids grow beyond
     # a few ten thousands then tableau server responds with empty response
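Note on the comment in `get_filter_pages`: the body of the function is cut off in this hunk, so the following is only a sketch of the chunking the comment describes, not the PR's implementation. `split_id_filter` is a hypothetical name, and the filter key `idWithin` is an assumption.

```python
from typing import List

ID_WITH_IN = "idWithin"  # assumed name of the primary-id filter key


def split_id_filter(query_filter: dict, page_size: int) -> List[dict]:
    """Split a pure id filter into several filters of at most page_size ids.

    Any other filter is returned unchanged as a single page.
    """
    ids = query_filter.get(ID_WITH_IN)
    if len(query_filter) == 1 and isinstance(ids, list) and ids:
        return [
            {ID_WITH_IN: ids[start : start + page_size]}
            for start in range(0, len(ids), page_size)
        ]
    return [query_filter]


pages = split_id_filter({ID_WITH_IN: ["a", "b", "c", "d", "e"]}, page_size=2)
print(pages)
```

Each resulting filter can then be sent as its own request, which keeps every id list short.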
@@ -66,6 +66,7 @@
 EXTRACT_LAST_INCREMENTAL_UPDATE_TIME = "extractLastIncrementalUpdateTime"
 EXTRACT_LAST_UPDATE_TIME = "extractLastUpdateTime"
 PUBLISHED_DATA_SOURCES_CONNECTION = "publishedDatasourcesConnection"
+FIELDS_CONNECTION = "fieldsConnection"
 DATA_SOURCE_FIELDS = "datasourceFields"
 SHEETS_CONNECTION = "sheetsConnection"
 CREATED_AT = "createdAt"
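Note on how the pieces of this diff fit together: the new `FIELDS_CONNECTION` constant and the `datasource_upstream_fields_graphql_query` fragment can be plugged into the f-string template from `query_metadata_cursor_based_pagination`. The sketch below only renders the query text so the effect of the doubled braces is visible. `render_query` is a hypothetical wrapper, the sample filter value is made up, and whether the PR combines these exact pieces this way is an assumption, since the calling code is not shown.

```python
FIELDS_CONNECTION = "fieldsConnection"

# Fragment copied from the diff above.
datasource_upstream_fields_graphql_query = """
{
    id
    upstreamFields {
        name
        datasource {
            id
        }
    }
    upstreamColumns {
        name
        table {
            __typename
            id
        }
    }
}
"""


def render_query(connection_name: str, main_query: str, qry_filter: str = "") -> str:
    """Render the paginated query text; {{ and }} become literal braces."""
    return f"""
query GetItems(
  $first: Int,
  $after: String
) {{
  {connection_name} ( first: $first, after: $after, filter:{{ {qry_filter} }})
  {{
    nodes {main_query}
    pageInfo {{
      hasNextPage
      endCursor
    }}
  }}
}}"""


query = render_query(
    FIELDS_CONNECTION,
    datasource_upstream_fields_graphql_query,
    qry_filter='idWithin: ["ds-1"]',  # made-up filter for illustration
)
print(query)
```

Printing the rendered text is a quick way to check that every brace in the template and in the fragment is balanced before sending the query to a server.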