Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingestion/oracle): Improved foreign key handling #11867

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

acrylJonny
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@acrylJonny acrylJonny marked this pull request as draft November 15, 2024 12:34
@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Nov 15, 2024
@anshbansal anshbansal removed the community-contribution PR or Issue raised by member(s) of DataHub Community label Dec 4, 2024
@datahub-cyborg datahub-cyborg bot added the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Dec 4, 2024
"\nac.constraint_name,"
"\nac.constraint_type,"
"\nacc.column_name AS local_column,"
"\nNULL AS remote_table,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you set these to NULL?
Aren't these NULL anyway if we omit to select R constraints? (if I guess right and R means remote tables`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, but this does end up being a little more efficient splitting this out into 2 UNION'ed queries on the Oracle side.

"\nON ac.owner = acc.owner"
"\nAND ac.constraint_name = acc.constraint_name"
"\nAND ac.table_name = acc.table_name"
"\nWHERE ac.table_name = :table_name"
"\nAND ac.constraint_type IN ('R','P', 'U', 'C')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't R be omitted from this as it will be UNION-ed anyway. This way it will generate duplicate rows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point - I'll adjust

)

text = text % {"dblink": dblink}
if schema is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do schema and owner come together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -552,8 +561,8 @@ def get_view_definition(
text = "SELECT text FROM dba_views WHERE view_name=:view_name"

if schema is not None:
text += " AND owner = :schema"
params["schema"] = schema
params["owner"] = schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is schema set as an owner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Oracle's DBA_VIEWS, the OWNER column represents the schema name that owns the view.

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 6, 2025
@datahub-cyborg datahub-cyborg bot removed the needs-review Label for PRs that need review from a maintainer. label Jan 6, 2025
@datahub-cyborg datahub-cyborg bot added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Jan 6, 2025
) # local_column
pkeys.append(col_name)
except Exception as e:
logger.error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a reporter hasn't been implemented at all in the Oracle connector I imagine that we should do that, right? There are a number of other areas that are currently logged but not reported.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oracle source uses our SQL common source internally, which does do this reporting in most places.

Overall, I'd like to opportunistically move towards structured logging whenever we make tweaks to connectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporter's errors/warnings have now been added.

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 65.21739% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...gestion/src/datahub/ingestion/source/sql/oracle.py 65.21% 8 Missing ⚠️
Files with missing lines Coverage Δ
...gestion/src/datahub/ingestion/source/sql/oracle.py 75.00% <65.21%> (-2.10%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efc5d31...f2ad6a1. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants