feat(great-expectations): add SqlAlchemyDataset support #9225

Open · wants to merge 1 commit into master
metadata-ingestion/src/datahub/integrations/great_expectations/action.py

@@ -24,6 +24,7 @@
     ExpectationSuiteIdentifier,
     ValidationResultIdentifier,
 )
+from great_expectations.dataset.sqlalchemy_dataset import SqlAlchemyDataset
 from great_expectations.execution_engine.sqlalchemy_execution_engine import (
     SqlAlchemyExecutionEngine,
 )
@@ -686,10 +687,36 @@
                         batch_spec_type=type(ge_batch_spec)
                     )
                 )
+        elif isinstance(data_asset, SqlAlchemyDataset):
+            if "." in data_asset._table.name:
+                # bigquery case
+                schema_name, table_name = data_asset._table.name.split(".")
+                sqlalchemy_uri = f"{data_asset.engine.url}/{schema_name}"

Collaborator:
Does the engine URL already have the project name in it?

Author:
The engine's full URL looks like bigquery://my-project. It is then parsed to fetch the project name and the database; that's why I've added the /{schema_name} suffix here. A short sketch of that parsing follows.
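A minimal sketch of the parsing the author describes, assuming the illustrative engine URL bigquery://my-project from the comment above and a placeholder dataset name. It only shows how SQLAlchemy's make_url exposes the BigQuery project as the host and the appended segment as the database; it is not the PR's actual helper.

from sqlalchemy.engine import make_url

# Illustrative values; in the PR they come from data_asset.engine.url and
# the "<dataset>.<table>" split shown in the diff above.
engine_url = "bigquery://my-project"
schema_name = "my_dataset"  # placeholder dataset name

# Without the suffix, the dataset (database) part is missing from the URL.
url = make_url(engine_url)
print(url.host, url.database)  # -> my-project None

# With the /{schema_name} suffix, both project and dataset are recoverable.
url = make_url(f"{engine_url}/{schema_name}")
print(url.host, url.database)  # -> my-project my_dataset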

+            else:
+                schema_name = data_asset._table.schema
+                table_name = data_asset._table.name
+                sqlalchemy_uri = data_asset.engine.url

+            dataset_urn = make_dataset_urn_from_sqlalchemy_uri(
+                sqlalchemy_uri=sqlalchemy_uri,
+                schema_name=schema_name,
+                table_name=table_name,
+                env=self.env
+            )
+            partitionSpec = None
+            batchSpec = None
+            dataset_partitions.append(
+                {
+                    "dataset_urn": dataset_urn,
+                    "partitionSpec": partitionSpec,
+                    "batchSpec": batchSpec,
+                }
+            )
         else:
             # TODO - v2-spec - SqlAlchemyDataset support
             warn(
-                "DataHubValidationAction does not recognize this GE data asset type - {asset_type}. This is either using v2-api or execution engine other than sqlalchemy.".format(
+                """
+                DataHubValidationAction does not recognize this GE data asset type - {asset_type}.
+                This is either using v2-api or execution engine other than sqlalchemy.""".format(
                     asset_type=type(data_asset)
                 )
             )
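For reference, a rough sketch of the kind of URN the new SqlAlchemyDataset branch ends up producing. It uses make_dataset_urn from datahub.emitter.mce_builder as a simplified stand-in for make_dataset_urn_from_sqlalchemy_uri, and the project/dataset/table names are placeholders.

from datahub.emitter.mce_builder import make_dataset_urn

# Placeholder BigQuery coordinates in "<project>.<dataset>.<table>" form.
dataset_urn = make_dataset_urn(
    platform="bigquery",
    name="my-project.my_dataset.my_table",
    env="PROD",
)
print(dataset_urn)
# -> urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my_dataset.my_table,PROD)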
@@ -763,7 +790,8 @@
         return None
     # If data platform is snowflake, we artificially lowercase the Database name.
     # This is because DataHub also does this during ingestion.
-    # Ref: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_utils.py#L155
+    # Ref:
+    # https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_utils.py#L155
     database_name = (
         url_instance.database.lower()
         if data_platform == "snowflake"
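A small sketch of the Snowflake lowercasing rule in the hunk above, assuming an illustrative Snowflake URL; the non-Snowflake fallback shown here is an assumption, since the actual else branch is collapsed out of this hunk.

from sqlalchemy.engine import make_url

# Illustrative Snowflake URL; account and database names are made up.
url_instance = make_url("snowflake://user:password@my_account/MYDB")
data_platform = "snowflake"

database_name = (
    url_instance.database.lower()
    if data_platform == "snowflake"
    else url_instance.database  # assumed pass-through for other platforms
)
print(database_name)  # -> mydb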