feat(great-expectations): add SqlAlchemyDataset support #9225

Open
wants to merge 1 commit into
base: master
from

@seuf seuf commented Nov 10, 2023

Hi,
This pull request adds support for the SqlAlchemyDataset data asset type in the Great Expectations integration.

Tested with the BigQuery engine.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Nov 10, 2023
@seuf seuf force-pushed the great-expectations-integration-sqlalchemy-dataset branch from f23b158 to 2d5894c Compare November 13, 2023 14:27
@maggiehays maggiehays added community-contribution PR or Issue raised by member(s) of DataHub Community product PR or Issue related to the DataHub UI/UX and removed product PR or Issue related to the DataHub UI/UX labels Nov 13, 2023
Collaborator

@hsheth2 hsheth2 left a comment

Can we add some tests for this?

if "." in data_asset._table.name:
    # bigquery case
    schema_name, table_name = data_asset._table.name.split(".")
    sqlalchemy_uri = f"{data_asset.engine.url}/{schema_name}"
Collaborator

Does the engine URL already have the project name in it?

Author

The full engine URL looks like bigquery://my-project.
It is then parsed to extract the project name and the database, which is why I added the /{schema_name} here.
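
For readers following the thread, here is a minimal standalone sketch of the URL handling described above. It is not the code in this PR: the function name `split_table_and_uri` and the sample dataset and table names are hypothetical, and plain strings stand in for `data_asset.engine.url` and `data_asset._table.name`.

```python
from typing import Optional, Tuple


def split_table_and_uri(
    engine_url: str, table_name: str
) -> Tuple[str, Optional[str], str]:
    """Return (sqlalchemy_uri, schema_name, table_name).

    Hypothetical helper mirroring the snippet under review: a dotted table
    name is treated as "schema.table" and the schema is appended to the URL.
    """
    if "." in table_name:
        # BigQuery case. Like the reviewed snippet, this uses a plain
        # split("."), so a name with more than one dot raises ValueError.
        schema_name, bare_table = table_name.split(".")
        return f"{engine_url}/{schema_name}", schema_name, bare_table
    # No schema in the table name: leave the engine URL untouched.
    return engine_url, None, table_name


uri, schema, table = split_table_and_uri("bigquery://my-project", "sales.orders")
print(uri, schema, table)
```

With an engine URL of bigquery://my-project, the project name is already present, so only the schema part needs to be appended before the URL is parsed.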

Author

seuf commented Nov 14, 2023

Should I also bump the great-expectations requirement? To use this, I currently have to install acryl-datahub[great-expectations] and then install my own great-expectations version to avoid conflicts.

Collaborator

hsheth2 commented Nov 20, 2023

@seuf if we can upgrade the great expectations action code to be compatible with both older and newer versions of GX, then go for it. I suspect it'll just need some conditional imports, e.g. try: from great_expectations import NewThing; except ImportError: NewThing = None
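
A minimal sketch of the conditional-import idea, written out as runnable Python. `NewThing` is the placeholder name from the comment above and is not a real GX symbol; `optional_attr` and `supports_new_thing` are hypothetical helpers introduced only for illustration.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr_name: str) -> Optional[Any]:
    """Return module_name.attr_name, or None if either one is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return None
    # The package is installed, but this version may lack the symbol.
    return getattr(module, attr_name, None)


# "NewThing" is a placeholder, not an actual great_expectations export.
NewThing = optional_attr("great_expectations", "NewThing")


def supports_new_thing() -> bool:
    """Feature check that callers can branch on instead of re-importing."""
    return NewThing is not None
```

Code paths that depend on the newer API can then be guarded with `supports_new_thing()`, so the same module imports cleanly whichever GX version is installed.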

In our SQL sources, we also use great-expectations purely for data profiling. A good interim outcome would be that we leave those as-is (and so things like acryl-datahub[snowflake] will still require GX v0.15), but acryl-datahub[great-expectations] allows you to install newer versions of GX.

Related to #8115.

Collaborator

hsheth2 commented Oct 31, 2024

@seuf I know it's been a while on this one, but we've made some progress on making it easier to support other GX versions in the plugin and also support additional datasource types. See my comment here #8115 (comment)

I think it makes sense to revisit this PR now that we have a separated gx-plugin package. Let me know if you're up for it!

@hsheth2 hsheth2 added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed poc-marathon-dec-2023 labels Oct 31, 2024
Contributor

ms32035 commented Nov 3, 2024

@hsheth2 this and the outdated OpenLineage version in Airflow are the two places where DataHub is holding all other packages back

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Dec 24, 2024
@hsheth2 hsheth2 added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Dec 25, 2024
Author

seuf commented Jan 7, 2025

Sorry, I didn't have time to answer. Now that GX 1.0 is out, I need to upgrade it in my stack; after that I'll try to update this MR with tests. Or feel free to do it if you have some bandwidth 😅

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 7, 2025
@hsheth2 hsheth2 added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 7, 2025
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata pending-submitter-response Issue/request has been reviewed but requires a response from the submitter

5 participants