-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingestion/tableau): restructure the tableau graphql datasource query #11230
fix(ingestion/tableau): restructure the tableau graphql datasource query #11230
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis update introduces enhancements to the Tableau ingestion module within the DataHub project. Key changes include the implementation of cursor-based pagination for metadata querying, the addition of new configuration fields, and the introduction of structured JSON files that represent data fields and connections. Multiple test files have also been updated to reflect changes in import paths and functionality. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant TableauAPI
participant TableauConfig
participant DataHub
User->>TableauAPI: Request Metadata
TableauAPI->>TableauConfig: Fetch Metadata with Pagination
TableauConfig->>TableauAPI: Send Paginated Query
TableauAPI-->>TableauConfig: Return Metadata Page
TableauConfig-->>DataHub: Send Retrieved Metadata
DataHub-->>User: Provide Data
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
… github.com:sid-acryl/datahub-fork into cus-2491-tableau-ingestion-hitting-20000-node-limit
@coderabbitai review |
Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (18)
- metadata-ingestion/src/datahub/ingestion/source/tableau/init.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau/tableau.py (15 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau/tableau_common.py (6 hunks)
- metadata-ingestion/src/datahub/ingestion/source/tableau/tableau_constant.py (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_04ed1dcc7090_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_1570e7f932f6_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_26675da44a38_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_69eb47587cc2_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_6f5f4cc0b6c6_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_a0fced25e056_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_a561c7beccd3_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_bda46be068e3_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_c651da2f6ad8_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesConnection_all.json (47 hunks)
- metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesFieldUpstream_17139d6e97ae_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesFieldUpstream_8e19660bb5dd_all.json (1 hunks)
- metadata-ingestion/tests/integration/tableau/test_tableau_ingest.py (19 hunks)
- metadata-ingestion/tests/unit/test_tableau_source.py (1 hunks)
Files skipped from review due to trivial changes (3)
- metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_6f5f4cc0b6c6_all.json
- metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesConnection_all.json
- metadata-ingestion/tests/unit/test_tableau_source.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/tableau/__init__.py
2-2:
datahub.ingestion.source.tableau.tableau.TableauConfig
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
3-3:
datahub.ingestion.source.tableau.tableau.TableauSiteSource
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
4-4:
datahub.ingestion.source.tableau.tableau.TableauSource
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
5-5:
datahub.ingestion.source.tableau.tableau.TableauSourceReport
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
Additional comments not posted (52)
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_bda46be068e3_all.json (1)
1-39
: LGTM!The JSON structure is well-formed and consistent with the expected schema.
The code changes are approved.
metadata-ingestion/src/datahub/ingestion/source/tableau/tableau_constant.py (1)
69-69
: LGTM!The new constant
FIELDS_CONNECTION
is correctly defined and consistent with the existing constants.The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_04ed1dcc7090_all.json (5)
1-4
: LGTM!The initial structure and nodes of the JSON file are correctly formatted.
The code changes are approved.
5-16
: LGTM!The first node is correctly formatted and follows the expected structure.
The code changes are approved.
17-87
: LGTM!The subsequent nodes are correctly formatted and follow the expected structure.
The code changes are approved.
90-93
: LGTM!The pageInfo structure is correctly formatted and follows the expected structure.
The code changes are approved.
94-94
: LGTM!The totalCount field is correctly formatted and follows the expected structure.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_26675da44a38_all.json (5)
1-4
: LGTM!The initial structure and nodes of the JSON file are correctly formatted.
The code changes are approved.
5-16
: LGTM!The first node is correctly formatted and follows the expected structure.
The code changes are approved.
17-100
: LGTM!The subsequent nodes are correctly formatted and follow the expected structure.
The code changes are approved.
102-105
: LGTM!The pageInfo structure is correctly formatted and follows the expected structure.
The code changes are approved.
106-106
: LGTM!The totalCount field is correctly formatted and follows the expected structure.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesFieldUpstream_8e19660bb5dd_all.json (5)
1-4
: LGTM!The initial structure and nodes of the JSON file are correctly formatted.
The code changes are approved.
5-16
: LGTM!The first node is correctly formatted and follows the expected structure.
The code changes are approved.
17-100
: LGTM!The subsequent nodes are correctly formatted and follow the expected structure.
The code changes are approved.
102-105
: LGTM!The pageInfo structure is correctly formatted and follows the expected structure.
The code changes are approved.
106-106
: LGTM!The totalCount field is correctly formatted and follows the expected structure.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/publishedDatasourcesFieldUpstream_17139d6e97ae_all.json (1)
1-281
: LGTM!The JSON structure is well-formed and consistent.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_69eb47587cc2_all.json (1)
1-421
: LGTM!The JSON structure is well-formed and consistent.
The code changes are approved.
metadata-ingestion/src/datahub/ingestion/source/tableau/tableau_common.py (4)
933-964
: LGTM!The function
query_metadata_cursor_based_pagination
is well-implemented and follows best practices for dynamic query generation and pagination.The code changes are approved.
Line range hint
966-996
: LGTM!The function
query_metadata
is correctly implemented.The code changes are approved.
Line range hint
997-1017
: LGTM!The function
get_filter_pages
is correctly implemented and the comment provides useful context.The code changes are approved.
226-238
: Verify the impact of commenting out these sections.The sections related to
upstreamFields
andupstreamColumns
have been commented out. Ensure this change does not affect other parts of the codebase.Run the following script to verify the impact:
Also applies to: 391-403
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_a561c7beccd3_all.json (3)
1-3
: LGTM!The top-level structure of the JSON file is correctly formatted.
The code changes are approved.
4-20
: LGTM!The "fieldsConnection" object and its nested "nodes" array are correctly formatted.
The code changes are approved.
Also applies to: 22-24, 26-33, 35-47, 49-59, 61-71, 73-85, 87-97, 99-109, 111-121, 123-133, 135-145, 147-157, 159-169, 171-181, 183-193, 195-205, 207-217, 219-229, 231-241, 243-253, 255-265, 267-277, 279-289, 291-301, 303-313, 315-325, 327-337, 339-349, 351-361, 363-373, 375-385, 387-397, 399-409, 411-421, 423-433, 435-445, 447-457, 459-469, 471-481, 483-493, 495-505, 507-517, 519-529, 531-541, 543-553, 555-565, 567-577, 579-589, 591-601, 603-613, 615-625, 627-637, 639-649, 651-661, 663-673, 675-685, 687-697, 699-709, 711-721, 723-733, 735-745, 747-757, 759-769, 771-781, 783-793, 795-805, 807-817, 819-829, 831-841, 843-853, 855-865, 867-877, 879-889, 891-901, 903-913, 915-925, 927-937, 939-949, 951-961, 963-973, 975-985, 987-997, 999-1004
581-584
: LGTM!The "pageInfo" object is correctly formatted.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_1570e7f932f6_all.json (3)
1-3
: LGTM!The top-level structure of the JSON file is correctly formatted.
The code changes are approved.
4-9
: LGTM!The "fieldsConnection" object and its nested "nodes" array are correctly formatted.
The code changes are approved.
Also applies to: 11-19, 21-29, 31-39, 41-49, 51-59, 61-69, 71-79, 81-101, 103-113, 115-123, 125-133, 135-143, 145-153, 155-163, 165-173, 175-183, 185-193, 195-203, 205-213, 215-223, 225-233, 235-243, 245-253, 255-263, 265-273, 275-283, 285-293, 295-303, 305-313, 315-323, 325-333, 335-343, 345-353, 355-363, 365-373, 375-383, 385-393, 395-403, 405-413, 415-423, 425-433, 435-443, 445-453, 455-463, 465-473, 475-483, 485-493, 495-503, 505-513, 515-523, 525-533, 535-543, 545-553, 555-563, 565-573, 575-583, 585-593, 595-603, 605-613, 615-623, 625-633, 635-643, 645-653, 655-663, 665-673, 675-683, 685-693, 695-703, 705-713, 715-723, 725-733, 735-743, 745-753, 755-763, 765-773, 775-783, 785-793, 795-803, 805-813, 815-823, 825-833, 835-843, 845-853, 855-863, 865-873, 875-883, 885-893, 895-903, 905-913, 915-923, 925-933, 935-943, 945-953, 955-963, 965-973, 975-985, 987-995, 997-1004
997-1000
: LGTM!The "pageInfo" object is correctly formatted.
The code changes are approved.
metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_c651da2f6ad8_all.json (2)
1-1246
: LGTM!The structure of the JSON file is well-organized and follows a logical hierarchy.
The structure of the JSON file is approved.
1-1246
: Verify thetotalCount
field.The
totalCount
field is set to 8, which seems inconsistent with the number of nodes provided. Please verify and correct if necessary.Ensure that the
totalCount
field accurately reflects the number of nodes in thenodes
array.metadata-ingestion/tests/integration/tableau/setup/embeddedDatasourcesFieldUpstream_a0fced25e056_all.json (3)
4-1539
: LGTM!The
nodes
array is correctly formatted, with uniqueid
values and appropriateupstreamFields
andupstreamColumns
.The code changes are approved.
1541-1544
: LGTM!The
pageInfo
object is correctly formatted with appropriate values forhasNextPage
andendCursor
.The code changes are approved.
1545-1545
: LGTM!The
totalCount
value is correctly set to 8.The code changes are approved.
metadata-ingestion/tests/integration/tableau/test_tableau_ingest.py (18)
29-29
: Update import path fortableau_common
module.The import path has been updated to reflect the new module structure. Ensure that the new path is correct and that the
tableau_common
module is accessible from the updated path.The code changes are approved.
288-288
: Update mock patch path forServer
class.The mock patch path has been updated to reflect the new module structure. Ensure that the new path is correct and that the
Server
class is being mocked correctly.The code changes are approved.
334-365
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
386-386
: Update mock patch path forServer
class.The mock patch path has been updated to reflect the new module structure. Ensure that the new path is correct and that the
Server
class is being mocked correctly.The code changes are approved.
425-457
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
490-522
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
555-587
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
621-653
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
686-718
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
750-785
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
853-885
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1001-1033
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1201-1234
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1352-1385
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1418-1491
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1523-1556
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
1588-1621
: Enhance test coverage for embedded datasource fields.The additional
read_response
calls introduce new JSON files that are read during the tests, enhancing the test coverage for embedded datasource fields. This ensures a broader range of scenarios are tested.The code changes are approved.
Line range hint
1643-1666
: Update mock patch path forServer
class.The mock patch path has been updated to reflect the new module structure. Ensure that the new path is correct and that the
Server
class is being mocked correctly.The code changes are approved.
from datahub.ingestion.source.tableau.tableau import ( | ||
TableauConfig, | ||
TableauSiteSource, | ||
TableauSource, | ||
TableauSourceReport, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports.
The following imports are unused and should be removed to maintain clean and efficient code:
TableauConfig
TableauSiteSource
TableauSource
TableauSourceReport
Apply this diff to remove the unused imports:
-from datahub.ingestion.source.tableau.tableau import (
- TableauConfig,
- TableauSiteSource,
- TableauSource,
- TableauSourceReport,
-)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from datahub.ingestion.source.tableau.tableau import ( | |
TableauConfig, | |
TableauSiteSource, | |
TableauSource, | |
TableauSourceReport, | |
) |
Tools
Ruff
2-2:
datahub.ingestion.source.tableau.tableau.TableauConfig
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
3-3:
datahub.ingestion.source.tableau.tableau.TableauSiteSource
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
4-4:
datahub.ingestion.source.tableau.tableau.TableauSource
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
5-5:
datahub.ingestion.source.tableau.tableau.TableauSourceReport
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
current_cursor: Optional[str], # initial value is None | ||
fetch_size: int = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error handling for "NODE_LIMIT_EXCEEDED".
The function is correctly implemented, but the error handling for "NODE_LIMIT_EXCEEDED" could be improved by suggesting actionable steps to the user.
Apply this diff to suggest actionable steps for "NODE_LIMIT_EXCEEDED" error:
if node_limit_errors:
logger.debug(f"Node Limit Error. query_data {query_data}")
self.report.warning(
title="Node Limit Errors",
message="Increase your tableau node limit or reduce the fetch size in the configuration.",
context=f"""{{
"errors": {node_limit_errors},
"connection_type": {connection_type},
"query_filter": {query_filter},
"query": {query},
}}""",
)
Also applies to: 986-987, 990-991, 995-1001, 1003-1003, 1013-1017, 1031-1035, 1037-1037, 1051-1051, 1059-1063, 1082-1093, 1131-1137
datasource = self.update_datasource_for_field_upstream( | ||
datasource=datasource, | ||
field_upstream_query=datasource_upstream_fields_graphql_query, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract logic to a helper function.
The function is correctly implemented, but the logic for updating the datasource fields with their upstream fields could be extracted to a helper function to improve readability.
Apply this diff to extract the logic to a helper function:
def update_datasource_with_upstream_fields(self, datasource: dict) -> dict:
return self.update_datasource_for_field_upstream(
datasource=datasource,
field_upstream_query=datasource_upstream_fields_graphql_query,
)
def emit_embedded_datasources(self) -> Iterable[MetadataWorkUnit]:
datasource_filter = {c.ID_WITH_IN: self.embedded_datasource_ids_being_used}
for datasource in self.get_connection_objects(
embedded_datasource_graphql_query,
c.EMBEDDED_DATA_SOURCES_CONNECTION,
datasource_filter,
):
datasource = self.update_datasource_with_upstream_fields(datasource)
yield from self.emit_datasource(
datasource,
datasource.get(c.WORKBOOK),
is_embedded_ds=True,
)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
datasource = self.update_datasource_for_field_upstream( | |
datasource=datasource, | |
field_upstream_query=datasource_upstream_fields_graphql_query, | |
) | |
def update_datasource_with_upstream_fields(self, datasource: dict) -> dict: | |
return self.update_datasource_for_field_upstream( | |
datasource=datasource, | |
field_upstream_query=datasource_upstream_fields_graphql_query, | |
) | |
def emit_embedded_datasources(self) -> Iterable[MetadataWorkUnit]: | |
datasource_filter = {c.ID_WITH_IN: self.embedded_datasource_ids_being_used} | |
for datasource in self.get_connection_objects( | |
embedded_datasource_graphql_query, | |
c.EMBEDDED_DATA_SOURCES_CONNECTION, | |
datasource_filter, | |
): | |
datasource = self.update_datasource_with_upstream_fields(datasource) | |
yield from self.emit_datasource( | |
datasource, | |
datasource.get(c.WORKBOOK), | |
is_embedded_ds=True, | |
) |
def update_datasource_for_field_upstream( | ||
self, | ||
datasource: dict, | ||
field_upstream_query: str, | ||
) -> dict: | ||
# Collect field ids to fetch field upstreams | ||
field_ids: List[str] = [] | ||
for field in datasource.get(c.FIELDS, []): | ||
if field.get(c.ID): | ||
field_ids.append(field.get(c.ID)) | ||
|
||
# Fetch field upstreams and arrange them in map | ||
field_vs_upstream: Dict[str, dict] = {} | ||
for field_upstream in self.get_connection_objects( | ||
field_upstream_query, | ||
c.FIELDS_CONNECTION, | ||
{c.ID_WITH_IN: field_ids}, | ||
): | ||
if field_upstream.get(c.ID): | ||
field_id = field_upstream[c.ID] | ||
# delete the field id, we don't need it for further processing | ||
del field_upstream[c.ID] | ||
field_vs_upstream[field_id] = field_upstream | ||
|
||
# update datasource's field for its upstream | ||
for field_dict in datasource.get(c.FIELDS, []): | ||
field_upstream_dict: Optional[dict] = field_vs_upstream.get( | ||
field_dict.get(c.ID) | ||
) | ||
if field_upstream_dict: | ||
# Add upstream fields to field | ||
field_dict.update(field_upstream_dict) | ||
|
||
return datasource | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify the logic using dictionary comprehension.
The function is correctly implemented, but the logic could be simplified by using a dictionary comprehension.
Apply this diff to simplify the logic:
# Fetch field upstreams and arrange them in map
field_vs_upstream: Dict[str, dict] = {
field_upstream[c.ID]: {k: v for k, v in field_upstream.items() if k != c.ID}
for field_upstream in self.get_connection_objects(
field_upstream_query,
c.FIELDS_CONNECTION,
{c.ID_WITH_IN: field_ids},
)
if field_upstream.get(c.ID)
}
# update datasource's field for its upstream
for field_dict in datasource.get(c.FIELDS, []):
field_upstream_dict: Optional[dict] = field_vs_upstream.get(field_dict.get(c.ID))
if field_upstream_dict:
# Add upstream fields to field
field_dict.update(field_upstream_dict)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def update_datasource_for_field_upstream( | |
self, | |
datasource: dict, | |
field_upstream_query: str, | |
) -> dict: | |
# Collect field ids to fetch field upstreams | |
field_ids: List[str] = [] | |
for field in datasource.get(c.FIELDS, []): | |
if field.get(c.ID): | |
field_ids.append(field.get(c.ID)) | |
# Fetch field upstreams and arrange them in map | |
field_vs_upstream: Dict[str, dict] = {} | |
for field_upstream in self.get_connection_objects( | |
field_upstream_query, | |
c.FIELDS_CONNECTION, | |
{c.ID_WITH_IN: field_ids}, | |
): | |
if field_upstream.get(c.ID): | |
field_id = field_upstream[c.ID] | |
# delete the field id, we don't need it for further processing | |
del field_upstream[c.ID] | |
field_vs_upstream[field_id] = field_upstream | |
# update datasource's field for its upstream | |
for field_dict in datasource.get(c.FIELDS, []): | |
field_upstream_dict: Optional[dict] = field_vs_upstream.get( | |
field_dict.get(c.ID) | |
) | |
if field_upstream_dict: | |
# Add upstream fields to field | |
field_dict.update(field_upstream_dict) | |
return datasource | |
def update_datasource_for_field_upstream( | |
self, | |
datasource: dict, | |
field_upstream_query: str, | |
) -> dict: | |
# Collect field ids to fetch field upstreams | |
field_ids: List[str] = [] | |
for field in datasource.get(c.FIELDS, []): | |
if field.get(c.ID): | |
field_ids.append(field.get(c.ID)) | |
# Fetch field upstreams and arrange them in map | |
field_vs_upstream: Dict[str, dict] = { | |
field_upstream[c.ID]: {k: v for k, v in field_upstream.items() if k != c.ID} | |
for field_upstream in self.get_connection_objects( | |
field_upstream_query, | |
c.FIELDS_CONNECTION, | |
{c.ID_WITH_IN: field_ids}, | |
) | |
if field_upstream.get(c.ID) | |
} | |
# update datasource's field for its upstream | |
for field_dict in datasource.get(c.FIELDS, []): | |
field_upstream_dict: Optional[dict] = field_vs_upstream.get( | |
field_dict.get(c.ID) | |
) | |
if field_upstream_dict: | |
# Add upstream fields to field | |
field_dict.update(field_upstream_dict) | |
return datasource |
metadata-ingestion/src/datahub/ingestion/source/tableau/__init__.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/tableau/tableau.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/tableau/tableau.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/tableau/tableau.py
Outdated
Show resolved
Hide resolved
|
||
parent_project: Optional[ | ||
TableauProject | ||
] = self.tableau_project_registry.get(project_.parent_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how exactly does this interact with project path filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here #L870, connector adding project into the registry if its allowed by pattern.
parent_container_key = self.gen_project_key(project.parent_id) | ||
elif self.config.add_site_container and self.site and self.site.id: | ||
parent_container_key = self.gen_site_key(self.site.id) | ||
visited_project_ids: Set[str] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visited_project_ids: Set[str] = set() | |
generated_project_ids: Set[str] = set() |
parent_container_key: Optional[ContainerKey] = None | ||
# set site as parent container | ||
if self.config.add_site_container and self.site and self.site.id: | ||
parent_container_key = self.gen_site_key(self.site.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this after the visited_project_ids
check?
also, why are we adding to visited_project_ids
in two places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reorganized the code
@@ -910,6 +930,40 @@ def make_filter(filter_dict: dict) -> str: | |||
return filter | |||
|
|||
|
|||
def query_metadata_cursor_based_pagination( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this cursor stuff still relevant after your other graphql simplifications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup it is relevant as per tableau documentation.
metadata-ingestion/tests/integration/tableau/test_tableau_ingest.py
Outdated
Show resolved
Hide resolved
…u.py Co-authored-by: Harshal Sheth <[email protected]>
…ery (datahub-project#11230) Co-authored-by: Harshal Sheth <[email protected]>
Hi @sid-acryl I just came across your changes regarding emitting projects in topological order. We solved this for us some time ago with these changes: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/tableau/tableau.py#L1103-L1107. Did you still have issues with this? I wonder if we really need both, or if we can keep one implementation. For us, the issue was solved with this small snippet. |
Fixes
emit_project_container
to emit project in topological order