Merge branch 'master' into redshift-copy-lineage
mayurinehate authored Oct 22, 2024
2 parents 7e0ae7e + c09f13b commit 6913997
Showing 58 changed files with 1,566 additions and 461 deletions.
11 changes: 10 additions & 1 deletion datahub-web-react/src/app/ingest/source/builder/constants.ts
@@ -35,6 +35,7 @@ import csvLogo from '../../../../images/csv-logo.png';
import qlikLogo from '../../../../images/qliklogo.png';
import sigmaLogo from '../../../../images/sigmalogo.png';
import sacLogo from '../../../../images/saclogo.svg';
import datahubLogo from '../../../../images/datahublogo.png';

export const ATHENA = 'athena';
export const ATHENA_URN = `urn:li:dataPlatform:${ATHENA}`;
@@ -125,6 +126,11 @@ export const SIGMA = 'sigma';
export const SIGMA_URN = `urn:li:dataPlatform:${SIGMA}`;
export const SAC = 'sac';
export const SAC_URN = `urn:li:dataPlatform:${SAC}`;
export const DATAHUB = 'datahub';
export const DATAHUB_GC = 'datahub-gc';
export const DATAHUB_LINEAGE_FILE = 'datahub-lineage-file';
export const DATAHUB_BUSINESS_GLOSSARY = 'datahub-business-glossary';
export const DATAHUB_URN = `urn:li:dataPlatform:${DATAHUB}`;

export const PLATFORM_URN_TO_LOGO = {
[ATHENA_URN]: athenaLogo,
@@ -165,6 +171,7 @@ export const PLATFORM_URN_TO_LOGO = {
[QLIK_SENSE_URN]: qlikLogo,
[SIGMA_URN]: sigmaLogo,
[SAC_URN]: sacLogo,
[DATAHUB_URN]: datahubLogo,
};

export const SOURCE_TO_PLATFORM_URN = {
@@ -178,5 +185,7 @@ export const SOURCE_TO_PLATFORM_URN = {
[SNOWFLAKE_USAGE]: SNOWFLAKE_URN,
[STARBURST_TRINO_USAGE]: TRINO_URN,
[DBT_CLOUD]: DBT_URN,
[VERTICA]: VERTICA_URN,
[DATAHUB_GC]: DATAHUB_URN,
[DATAHUB_LINEAGE_FILE]: DATAHUB_URN,
[DATAHUB_BUSINESS_GLOSSARY]: DATAHUB_URN,
};
2 changes: 1 addition & 1 deletion docs-website/docusaurus.config.js
@@ -68,7 +68,7 @@ module.exports = {
announcementBar: {
id: "announcement-2",
content:
'<div style="display: flex; justify-content: center; align-items: center;width: 100%;"><!--img src="/img/acryl-logo-white-mark.svg" / --><div style="font-size: .8rem; font-weight: 600; background-color: white; color: #111; padding: 0px 8px; border-radius: 4px; margin-right:12px;">NEW</div><p><span>Join us at Metadata & AI Summit, Oct. 29 & 30!</span></p><a href="http://www.acryldata.io/conference?utm_source=datahub_web&utm_medium=metadata_ai_2024&utm_campaign=home_banner" target="_blank" class="button">Register</a></div>',
'<div style="display: flex; justify-content: center; align-items: center;width: 100%;"><!--img src="/img/acryl-logo-white-mark.svg" / --><div style="font-size: .8rem; font-weight: 600; background-color: white; color: #111; padding: 0px 8px; border-radius: 4px; margin-right:12px;">NEW</div><p>Join us at Metadata & AI Summit, Oct. 29 & 30!</p><a href="http://www.acryldata.io/conference?utm_source=datahub_web&utm_medium=metadata_ai_2024&utm_campaign=home_banner" target="_blank" class="button">Register<span> →</span></a></div>',
backgroundColor: "#111",
textColor: "#ffffff",
isCloseable: false,
29 changes: 21 additions & 8 deletions docs-website/src/styles/global.scss
@@ -116,21 +116,34 @@ div[class^="announcementBar"] {
>div {
display: flex;
align-items: center;
> div {
@media (max-width: 580px) {
display: none;
}
}
a>span {
@media (max-width: 580px) {
display: none;
}
}

>p {
text-align: left;
line-height: 1.1rem;
margin: 0;

>span {
@media (max-width: 780px) {
display: none;
}
}

@media (max-width: 480px) {
display: none;
@media (max-width: 580px) {
font-size: .9rem;
}
// >span {
// @media (max-width: 780px) {
// display: none;
// }
// }

// @media (max-width: 480px) {
// display: none;
// }
}
}

1 change: 0 additions & 1 deletion docs/businessattributes.md
@@ -28,7 +28,6 @@ Taking the example of "United States- Social Security Number", if an application
What you need to create/update business attributes and associate them with a dataset schema field:

* **Manage Business Attributes** platform privilege to create/update/delete business attributes.
* **Edit Dataset Column Business Attribute** metadata privilege to associate business attributes to dataset schema field.

## Using Business Attributes
As of now, Business Attributes can only be created through the UI.
6 changes: 5 additions & 1 deletion docs/lineage/airflow.md
@@ -132,7 +132,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
```

| Name | Default value | Description |
|----------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub connection you set in step 1. |
| cluster | prod | name of the airflow cluster |
@@ -191,6 +192,10 @@ These operators are supported by OpenLineage, but we haven't tested them yet:
There are also a few operators (e.g. BashOperator, PythonOperator) that have custom extractors, but those extractors don't generate lineage.
-->

Known limitations:

- We do not fully support operators that run multiple SQL statements at once. In these cases, we'll only capture lineage from the first SQL statement.
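As an illustration of this limitation, a hypothetical task like the one below would yield lineage for the `INSERT` only, not the `UPDATE` (the task, connection, and table names are invented):

```python
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Hypothetical multi-statement task: only the first statement's lineage is captured.
load_and_flag = SQLExecuteQueryOperator(
    task_id="load_and_flag",
    conn_id="my_snowflake",  # assumed connection name
    sql="""
        INSERT INTO analytics.orders_clean SELECT * FROM raw.orders;  -- lineage captured
        UPDATE raw.orders SET processed = TRUE;                       -- lineage missed
    """,
)
```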

## Manual Lineage Annotation

### Using `inlets` and `outlets`
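The body of this section is collapsed in the diff; as a minimal sketch of the idea (the `Dataset` helper comes from the plugin's `entities` module; the platform and table names are invented):

```python
from airflow.operators.bash import BashOperator
from datahub_airflow_plugin.entities import Dataset

transform = BashOperator(
    task_id="transform",
    bash_command="run_transform.sh",
    # Manually annotated lineage: upstream and downstream datasets.
    inlets=[Dataset("snowflake", "mydb.schema.tableA")],
    outlets=[Dataset("snowflake", "mydb.schema.tableB")],
)
```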
Changes to a Dagster plugin source file (file path not shown)
@@ -12,7 +12,13 @@
TableSchemaMetadataValue,
)
from dagster._core.execution.stats import RunStepKeyStatsSnapshot, StepEventStatus
from dagster._core.snap import JobSnapshot

try:
from dagster._core.snap import JobSnapshot # type: ignore[attr-defined]
except ImportError:
# Import changed since Dagster 1.8.12 to this -> https://github.com/dagster-io/dagster/commit/29a37d1f0260cfd112849633d1096ffc916d6c95
from dagster._core.snap import JobSnap as JobSnapshot

from dagster._core.snap.node import OpDefSnap
from dagster._core.storage.dagster_run import DagsterRun, DagsterRunStatsSnapshot
from datahub.api.entities.datajob import DataFlow, DataJob
Changes to a Prefect plugin source file (file path not shown)
@@ -155,12 +155,11 @@ async def _get_flow_run_graph(self, flow_run_id: str) -> Optional[List[Dict]]:
The flow run graph in json format.
"""
try:
response = orchestration.get_client()._client.get(
response_coroutine = orchestration.get_client()._client.get(
f"/flow_runs/{flow_run_id}/graph"
)

if asyncio.iscoroutine(response):
response = await response
response = await response_coroutine

if hasattr(response, "json"):
response_json = response.json()
@@ -410,10 +409,9 @@ async def get_flow_run(flow_run_id: UUID) -> FlowRun:
if not hasattr(client, "read_flow_run"):
raise ValueError("Client does not support async read_flow_run method")

response = client.read_flow_run(flow_run_id=flow_run_id)
response_coroutine = client.read_flow_run(flow_run_id=flow_run_id)

if asyncio.iscoroutine(response):
response = await response
response = await response_coroutine

return FlowRun.parse_obj(response)

@@ -477,10 +475,9 @@ async def get_task_run(task_run_id: UUID) -> TaskRun:
if not hasattr(client, "read_task_run"):
raise ValueError("Client does not support async read_task_run method")

response = client.read_task_run(task_run_id=task_run_id)
response_coroutine = client.read_task_run(task_run_id=task_run_id)

if asyncio.iscoroutine(response):
response = await response
response = await response_coroutine

return TaskRun.parse_obj(response)

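For context, the `iscoroutine` checks deleted above implemented a generic defense for client calls that might be sync or async; once the client methods are known to be async, a direct `await` is equivalent and simpler. A standalone sketch of the two styles (function names are illustrative):

```python
import asyncio

async def call_maybe_async(fn, *args):
    """Defensive style removed by this change: tolerates sync or async fn."""
    result = fn(*args)
    if asyncio.iscoroutine(result):
        result = await result
    return result

async def call_known_async(fn, *args):
    """Direct style adopted here: fn is known to return a coroutine."""
    return await fn(*args)
```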
28 changes: 15 additions & 13 deletions metadata-ingestion/docs/transformer/dataset_transformer.md
@@ -122,12 +122,13 @@ transformers:
```
## Simple Add Dataset ownership
### Config Details
| Field | Required | Type | Default | Description |
|--------------------|----------|--------------|-------------|---------------------------------------------------------------------|
| `owner_urns` | ✅ | list[string] | | List of owner urns. |
| `ownership_type` | | string | "DATAOWNER" | ownership type of the owners (either as enum or ownership type urn) |
| `replace_existing` | | boolean | `false` | Whether to remove ownership from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| Field | Required | Type | Default | Description |
|--------------------|----------|--------------|-------------|------------------------------------------------------------------------------------------------------------|
| `owner_urns` | ✅ | list[string] | | List of owner urns. |
| `ownership_type` | | string | "DATAOWNER" | ownership type of the owners (either as enum or ownership type urn) |
| `replace_existing` | | boolean | `false` | Whether to remove ownership from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| `on_conflict`      |          | enum         | `DO_UPDATE` | Whether to make changes if ownership already exists. If set to DO_NOTHING, the `semantics` setting is irrelevant.   |

For transformer behaviour on `replace_existing` and `semantics`, please refer to the section [Relationship Between replace_existing And semantics](#relationship-between-replace_existing-and-semantics).
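As a sketch, wiring this transformer into a recipe programmatically might look like the following (the source and sink configs are placeholders and the owner urn is an example, not taken from this commit):

```python
from datahub.ingestion.run.pipeline import Pipeline

# Sketch only: replace the source/sink placeholders with a real recipe.
pipeline = Pipeline.create(
    {
        "source": {"type": "mysql", "config": {"host_port": "localhost:3306"}},
        "transformers": [
            {
                "type": "simple_add_dataset_ownership",
                "config": {
                    "owner_urns": ["urn:li:corpuser:username1"],
                    "ownership_type": "PRODUCER",
                },
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
```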

@@ -191,13 +192,14 @@

## Pattern Add Dataset ownership
### Config Details
| Field | Required | Type | Default | Description |
|--------------------|----------|----------------------|-------------|-----------------------------------------------------------------------------------------|
| `owner_pattern`    | ✅        | map[regex, list[urn]] |             | Dataset urn regex patterns mapped to the owner urns to apply to matching datasets.        |
| `ownership_type` | | string | "DATAOWNER" | ownership type of the owners (either as enum or ownership type urn) |
| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| `is_container` | | bool | `false` | Whether to also consider a container or not. If true, then ownership will be attached to both the dataset and its container. |
| Field | Required | Type | Default | Description |
|--------------------|----------|----------------------|-------------|------------------------------------------------------------------------------------------------------------------------------|
| `owner_pattern`    | ✅        | map[regex, list[urn]] |             | Dataset urn regex patterns mapped to the owner urns to apply to matching datasets.                                             |
| `ownership_type` | | string | "DATAOWNER" | ownership type of the owners (either as enum or ownership type urn) |
| `replace_existing` | | boolean | `false` | Whether to remove owners from entity sent by ingestion source. |
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| `is_container` | | bool | `false` | Whether to also consider a container or not. If true, then ownership will be attached to both the dataset and its container. |
| `on_conflict`      |          | enum                 | `DO_UPDATE` | Whether to make changes if ownership already exists. If set to DO_NOTHING, the `semantics` setting is irrelevant.              |

Let's suppose we'd like to append a series of users who we know own a dataset but aren't detected during normal ingestion. To do so, we can use the `pattern_add_dataset_ownership` module included in the ingestion framework. It matches each dataset's `urn` against the configured patterns and assigns the corresponding owners.
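For instance, a transformer entry along these lines (the regexes and user urns are invented for illustration) could be spliced into a recipe like the one sketched earlier:

```python
# Hypothetical pattern config: owners are assigned per matching dataset urn regex.
pattern_ownership_transformer = {
    "type": "pattern_add_dataset_ownership",
    "config": {
        "owner_pattern": {
            "rules": {
                ".*example1.*": ["urn:li:corpuser:username1"],
                ".*example2.*": ["urn:li:corpuser:username2"],
            }
        },
        "ownership_type": "DEVELOPER",
    },
}
```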

2 changes: 1 addition & 1 deletion metadata-ingestion/setup.py
@@ -101,7 +101,7 @@
sqlglot_lib = {
# Using an Acryl fork of sqlglot.
# https://github.com/tobymao/sqlglot/compare/main...hsheth2:sqlglot:main?expand=1
"acryl-sqlglot[rs]==25.20.2.dev6",
"acryl-sqlglot[rs]==25.25.2.dev9",
}

classification_lib = {
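For context on this pin: DataHub's SQL lineage extraction is built on sqlglot's parser, and the Acryl fork keeps the upstream `sqlglot` module name (assumed below). A minimal sketch of the table extraction it enables:

```python
import sqlglot
from sqlglot import exp

# Parse one statement and list the tables it touches -- the raw material
# for table-level lineage.
statement = sqlglot.parse_one(
    "INSERT INTO analytics.orders_clean SELECT id, amount FROM raw.orders"
)
print({t.sql() for t in statement.find_all(exp.Table)})
# {'analytics.orders_clean', 'raw.orders'}
```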
(The remaining changed files are not shown.)
