
Commit

mark as breaking change, update messages
mayurinehate committed Jan 15, 2024
1 parent f7f3d86 commit cbfe832
Showing 3 changed files with 27 additions and 8 deletions.
19 changes: 19 additions & 0 deletions docs/how/updating-datahub.md
@@ -10,6 +10,25 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
- Neo4j 5.x, may require migration from 4.x
- Build requires JDK17 (Runtime Java 11)
- Build requires Docker Compose > 2.20
- #9601 - The Unity Catalog (UC) ingestion source config `include_metastore` is now disabled by default. This change will affect the urns of all entities in the workspace.<br/>
Entity Hierarchy with `include_metastore: true` (Old)
```
- UC Metastore
- Catalog
- Schema
- Table
```

Entity Hierarchy with `include_metastore: false` (New)
```
- Catalog
- Schema
- Table
```
We recommend using `platform_instance` for differentiating across metastores.

If stateful ingestion is enabled, running ingestion with the latest CLI version will perform all required cleanup. Otherwise, we recommend soft-deleting all Databricks data via the DataHub CLI (`datahub delete --platform databricks --soft`) and then re-ingesting with the latest CLI version.
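
As a sketch, a recipe using `platform_instance` to tell metastores apart might look like this (the workspace URL, token variable, and instance name are placeholders, not values from this commit):

```yaml
source:
  type: unity-catalog
  config:
    workspace_url: https://my-workspace.cloud.databricks.com  # placeholder
    token: ${DATABRICKS_TOKEN}
    include_metastore: false
    # Distinguishes entities from different metastores now that the
    # UC Metastore container is gone from the hierarchy:
    platform_instance: my_metastore  # placeholder
```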

### Potential Downtime

10 changes: 5 additions & 5 deletions metadata-ingestion/docs/sources/databricks/unity-catalog_post.md
@@ -1,8 +1,8 @@


-#### Advanced
+### Advanced

-##### Multiple Databricks Workspaces
+#### Multiple Databricks Workspaces

If you have multiple Databricks workspaces **that point to the same Unity Catalog metastore**, our suggestion is to use separate recipes for ingesting the workspace-specific Hive Metastore catalog and the Unity Catalog metastore's information schema.

@@ -20,14 +20,14 @@ To ingest Unity Catalog information schema
- Use filters to ingest each catalog only once, though this shouldn't be necessary
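
The filter approach can be sketched as follows, assuming the source's allow/deny pattern option (`catalog_pattern`); the URL, token variable, and catalog name are placeholders:

```yaml
# Recipe for one workspace: ingest the shared metastore's catalogs once,
# restricted so another recipe doesn't ingest the same catalog again.
source:
  type: unity-catalog
  config:
    workspace_url: https://workspace-a.cloud.databricks.com  # placeholder
    token: ${DATABRICKS_TOKEN_A}
    catalog_pattern:
      allow:
        - "^sales$"  # placeholder catalog name
```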


-#### Troubleshooting
+### Troubleshooting

-##### No data lineage captured or missing lineage
+#### No data lineage captured or missing lineage

Check that you meet the [Unity Catalog lineage requirements](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html#requirements).

Also check the [Unity Catalog limitations](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html#limitations) to make sure that lineage would be expected to exist in this case.

-##### Lineage extraction is too slow
+#### Lineage extraction is too slow

Currently, there is no way to get table or column lineage in bulk from the Databricks Unity Catalog REST API. Table lineage requires one API call per table, and column lineage requires one API call per column. If you find metadata extraction taking too long, you can turn off column-level lineage extraction via the `include_column_lineage` config flag.
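
For example, a minimal config change that keeps table-level lineage (one call per table) while skipping the per-column calls:

```yaml
source:
  type: unity-catalog
  config:
    # Table lineage still costs one API call per table;
    # disabling this skips the per-column lineage calls.
    include_column_lineage: false
```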
@@ -139,8 +139,8 @@ class UnityCatalogSourceConfig(
description=(
"Whether to ingest the workspace's metastore as a container and include it in all urns."
" Changing this will affect the urns of all entities in the workspace."
-" This will be disabled by default in the future,"
-" so it is recommended to set this to `False` for new ingestions."
+" This config is deprecated and will be removed in the future,"
+" so it is recommended to not set this to `True` for new ingestions."
" If you have an existing unity catalog ingestion, you'll want to avoid duplicates by soft deleting existing data."
" If stateful ingestion is enabled, running with `include_metastore: false` should be sufficient."
" Otherwise, we recommend deleting via the cli: `datahub delete --platform databricks` and re-ingesting with `include_metastore: false`."
@@ -299,7 +299,7 @@ def include_metastore_warning(cls, v: bool) -> bool:
if v:
msg = (
"`include_metastore` is enabled."
-" This is not recommended and will be disabled by default in the future, which is a breaking change."
+" This is not recommended and this option will be removed in the future, which is a breaking change."
" All databricks urns will change if you re-ingest with this disabled."
" We recommend soft deleting all databricks data and re-ingesting with `include_metastore` set to `False`."
)
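
A minimal, self-contained sketch of the pattern this validator follows, using a plain dataclass rather than the project's pydantic config class (the class name here is hypothetical):

```python
import warnings
from dataclasses import dataclass


@dataclass
class UnityCatalogConfigSketch:
    """Hypothetical, minimal stand-in for UnityCatalogSourceConfig."""

    include_metastore: bool = False

    def __post_init__(self) -> None:
        # Mirror the validator above: warn whenever the deprecated
        # flag is turned on, but still accept the value.
        if self.include_metastore:
            warnings.warn(
                "`include_metastore` is enabled. This is not recommended and "
                "this option will be removed in the future, which is a "
                "breaking change.",
                DeprecationWarning,
                stacklevel=2,
            )
```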
