metadata-ingestion: Update great-expectations dependency from 0.15 to 0.16 #8115
Comments
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io |
This issue was closed because it has been inactive for 30 days since being marked as stale. |
Any update on this? |
This issue is on our radar, but unfortunately isn't a simple fix because of the level of customization and patching we've done in our existing GX-based data profilers. We've had some conversations with the GX team around what it would take to get this done, and are working to scope it accordingly. |
Any updates on this issue? I mean, GX is at 0.18 in the meantime :) |
Any updates on this? They are about to move to 1.x.x :) |
to make datahub work with recent currently it clinches with and
|
Are there any loose timelines around when this can be resolved? |
I'm sorry, I know these sorts of "me too" comments are rarely of much help. I wanted to highlight that great-expectations at the pinned version has a variety of upper-bound constraints: https://raw.githubusercontent.com/great-expectations/great_expectations/0.15.50/requirements.txt
And at least for us the problem isn't so much that "great expectations is old" but that being on the lower side of these transitive dependencies -- like the pydantic v1-v2 transition -- has ever-increasing opportunity costs. (In our particular transitive set, pydantic <2 is also keeping us on pandas<2, which adds further to the expense.) I know this doesn't change anything about the difficulty of migration, but I hope it clarifies the "cost" somewhat when this issue is next triaged. |
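To make the upper-bound problem described above concrete, here is a minimal sketch of how a `<2` cap from one package and a `>=2` floor from another leave no installable version. It is not how pip resolves dependencies: the version parser only handles plain numeric versions, and the candidate versions and the `>=2` requirement are illustrative assumptions rather than values taken from any real requirements file.

```python
import operator

# Simplified comparison operators; real resolvers implement full PEP 440.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '1.10.13' into (1, 10, 13). Only plain numeric versions are handled."""
    parts = [int(part) for part in text.split(".")]
    # Drop trailing zeros so that '2', '2.0' and '2.0.0' compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, constraint):
    """Check one version against a comma-separated constraint such as '>=1.0,<2'."""
    for clause in constraint.split(","):
        clause = clause.strip()
        # Try two-character operators first so '<=' is not read as '<'.
        for symbol in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not _OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
    return True


def compatible_versions(candidates, constraints):
    """Return the candidates that satisfy every constraint in the list."""
    return [v for v in candidates if all(satisfies(v, c) for c in constraints)]


if __name__ == "__main__":
    # Hypothetical pydantic releases to choose from.
    candidates = ["1.10.13", "2.5.0"]
    # One dependency caps pydantic below 2; a second (assumed) one needs 2 or later.
    print(compatible_versions(candidates, ["<2"]))
    print(compatible_versions(candidates, ["<2", ">=2"]))
```

With only the `<2` cap, the older release is still selectable; once any other package in the same environment requires `>=2`, the list of compatible versions is empty, which is the conflict this comment describes.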
Any updates on this? The latest version of datahub_action for GX also needs to be updated to reflect the latest changes. It is a one-line change, though. |
Just want to clarify which of these issues people are trying to solve:
|
@shirshanka for the datahub action to work with the latest version of GX I managed to just modify a couple of lines of code to fix the class constructor function. But the bigger issue is that if we have airflow installed with the datahub plugin we cannot use the latest version of GX in our dags due to version conflict. |
This one. We use a monorepo and minimizing the number of transitive dependency sets we are juggling maximizes the usefulness of said monorepo. |
@shirshanka It looks like the changes that introduced pydantic v2 support in great-expectations will be easy to backport to |
If anyone wants it, I pushed it up to my fork, and here's the diff from |
We've done some work on our end in #11096. The main outcome of that is the GX validation action now lives in the For ingestion (e.g. snowflake/bigquery/redshift/other SQL sources), we still depend on GX 0.15.50 for profiling, and that remains a particularly tricky dependency to loosen given the extent of the monkey-patching we've done to improve query efficiency. If you're using only the Python SDKs, you usually can install We recommend not installing full ingestion sources into your main environment (e.g. avoid having a dependency on However, I recognize that this isn't a full fix yet, and so I'll be leaving this issue open for now. @jskrzypek those improvements sound great - we'd definitely be open to using the forked GX version that supports pydantic v2. The core acryl-datahub SDK already supports both pydantic 1 and 2, but many of our sources still require v1 because of the GX dependency. |
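One way to follow the recommendation above about keeping full ingestion sources out of the main environment is to give them their own virtual environment. The sketch below uses only the Python standard library; the directory name `ingest-env` and the requirement string `acryl-datahub[snowflake]` are illustrative assumptions, not names taken from this thread's missing code spans, and the install step is off by default.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path to the interpreter inside a virtual environment directory."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_install_command(env_dir, requirement):
    """Build the pip command that installs into the isolated environment only."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", requirement]


def create_ingestion_env(env_dir, requirement, run_install=False):
    """Create a separate environment for ingestion and optionally install into it."""
    venv.create(env_dir, with_pip=True)
    command = build_install_command(env_dir, requirement)
    if run_install:
        # Runs pip with the new environment's interpreter, so the pinned
        # GX and pydantic versions never reach the main environment.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical names; adjust the path and the extra to your own setup.
    print(build_install_command("ingest-env", "acryl-datahub[snowflake]"))
```

The main environment (for example, one that runs Airflow DAGs) can then invoke ingestion through the isolated interpreter instead of importing the ingestion sources directly.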
@hsheth2 cool! Please feel free to just take over my fork of GX if you want. It shouldn't require much ongoing maintenance, but I don't really have the time or bandwidth to keep up with it. I am not sure if GX would consider adopting it themselves, but I imagine a request to do so will be better received if it comes from a project like datahub – our company doesn't use GX directly. |
This is so confusing to still be pinned to pydantic-v1 and GE release happened on datahub/metadata-ingestion/setup.py Line 136 in bb4d6bc
I'm struggling to install both
|
Currently, DataHub depends on great-expectations <= 0.15.50, which is no longer actively maintained. The latest version is 0.16.13, which adds Fluent Datasources that make GX much more user-friendly.
However, the new releases remove deprecated code that is used by DataHub, e.g., SQLAlchemyDataset/Datasource in the data profiler and probably some data-asset related stuff in the GX action.
Please update the dependency to 0.16 so that our users can use the new GX version with the datahub action.