Skip to content

Commit

Permalink
update docs
Browse files Browse the repository at this point in the history
  • Loading branch information
shirshanka committed Jan 9, 2025
1 parent 0a9a585 commit 527df2d
Showing 1 changed file with 249 additions and 33 deletions.
282 changes: 249 additions & 33 deletions docs/api/tutorials/structured-properties.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import TabItem from '@theme/TabItem';
## Why Would You Use Structured Properties?

Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
Structured properties have values that are types. Conceptually, they are like “field definitions”.
Structured properties have values that are typed and support constraints.

Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties/overview.md).

Expand All @@ -15,6 +15,7 @@ Learn more about structured properties in the [Structured Properties Feature Gui

This guide will show you how to execute the following actions with structured properties.
- Create structured properties
- List structured properties
- Read structured properties
- Delete structured properties
- Add structured properties to a dataset
Expand All @@ -32,7 +33,8 @@ Additionally, you need to have the following tools installed according to the me
<Tabs>
<TabItem value="CLI" label="CLI" default>

Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding DataHub Cloud release version is `v0.2.16.5`
Install the relevant CLI version.
Structured Properties were introduced in version `0.13.1`, but we continuously improve and add new functionality, so you should always [upgrade](https://datahubproject.io/docs/cli/#installation) to the latest cli for best results.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):

- Run `datahub init` to update the instance you want to load into.
Expand All @@ -56,33 +58,8 @@ Requirements for OpenAPI are:
The following code will create a structured property `io.acryl.privacy.retentionTime`.

<Tabs>
<TabItem value="graphql" label="graphQL" default>

```graphql
mutation createStructuredProperty {
createStructuredProperty(
input: {
id: "retentionTime",
qualifiedName:"retentionTime",
displayName: "Retention Time",
description: "Retention Time is used to figure out how long to retain records in a dataset",
valueType: "urn:li:dataType:datahub.number",
allowedValues: [
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
],
cardinality: SINGLE,
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
}
) {
urn
}
}
```

</TabItem>
<TabItem value="CLI" label="CLI">
<TabItem value="CLI" label="CLI" default>

Create a yaml file representing the properties you’d like to load.
For example, below file represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).
Expand All @@ -108,13 +85,41 @@ For example, below file represents a property `io.acryl.privacy.retentionTime`.
```
Use the CLI to create your properties:
```commandline
```shell
datahub properties upsert -f {properties_yaml}
```

If successful, you should see `Created structured property urn:li:structuredProperty:...`

</TabItem>

<TabItem value="Graphql" label="GraphQL" default>

```graphql
mutation createStructuredProperty {
createStructuredProperty(
input: {
id: "retentionTime",
qualifiedName:"retentionTime",
displayName: "Retention Time",
description: "Retention Time is used to figure out how long to retain records in a dataset",
valueType: "urn:li:dataType:datahub.number",
allowedValues: [
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
],
cardinality: SINGLE,
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
}
) {
urn
}
}
```

</TabItem>

<TabItem value="OpenAPI v2" label="OpenAPI v2">

```shell
Expand Down Expand Up @@ -236,9 +241,135 @@ Example Response:
</TabItem>
</Tabs>

## Read Structured Properties
## List Structured Properties

You can list all structured properties in your DataHub instance using the following methods:

<Tabs>
<TabItem value="CLI" label="CLI" default>

```shell
datahub properties list
```

This will show all properties with their full details.

Example Response:
```json
{
"urn": "urn:li:structuredProperty:clusterName",
"qualified_name": "clusterName",
"type": "urn:li:dataType:datahub.string",
"description": "Test Cluster Name Property",
"display_name": "Cluster's name",
"entity_types": [
"urn:li:entityType:datahub.dataset"
],
"cardinality": "SINGLE"
}
{
"urn": "urn:li:structuredProperty:projectNames",
"qualified_name": "projectNames",
"type": "urn:li:dataType:datahub.string",
"description": "Test property for project name",
"display_name": "Project Name",
"entity_types": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE",
"allowed_values": [
{
"value": "Tracking",
"description": "test value 1 for project"
},
{
"value": "DataHub",
"description": "test value 2 for project"
}
]
}
```


If you only want to see the URNs, you can use:

```shell
datahub properties list --no-details
```

Example Response:
```
[2025-01-08 22:23:00,625] INFO {datahub.cli.specific.structuredproperties_cli:134} - Listing structured property urns only, use --details for more information
urn:li:structuredProperty:clusterName
urn:li:structuredProperty:clusterType
urn:li:structuredProperty:io.acryl.dataManagement.deprecationDate
urn:li:structuredProperty:projectNames
```

</TabItem>

<TabItem value="OpenAPI v3" label="OpenAPI v3">

Example Request:
```bash
curl -X 'GET' \
'http://localhost:9002/openapi/v3/entity/structuredproperty?systemMetadata=false&includeSoftDelete=false&skipCache=false&aspects=structuredPropertySettings&aspects=propertyDefinition&aspects=institutionalMemory&aspects=structuredPropertyKey&aspects=status&count=10&sortCriteria=urn&sortOrder=ASCENDING&query=*' \
-H 'accept: application/json'
```

Example Response:
```json
{
"scrollId": "...",
"entities": [
{
"urn": "urn:li:structuredProperty:clusterName",
"propertyDefinition": {
"value": {
"immutable": false,
"qualifiedName": "clusterName",
"displayName": "Cluster's name",
"valueType": "urn:li:dataType:datahub.string",
"description": "Test Cluster Name Property",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
],
"cardinality": "SINGLE"
}
},
"structuredPropertyKey": {
"value": {
"id": "clusterName"
}
}
}
]
}
```

Key Query Parameters:
- `count`: Number of results to return per page (default: 10)
- `sortCriteria`: Field to sort by (default: urn)
- `sortOrder`: Sort order (ASCENDING or DESCENDING)
- `query`: Search query to filter properties (* for all)

</TabItem>
</Tabs>

The list endpoint returns all structured properties in your DataHub instance. Each property includes:
- URN: Unique identifier for the property
- Qualified Name: The property's qualified name
- Type: The data type of the property (string, number, date, etc.)
- Description: A description of the property's purpose
- Display Name: Human-readable name for the property
- Entity Types: The types of entities this property can be applied to
- Cardinality: Whether the property accepts single (SINGLE) or multiple (MULTIPLE) values
- Allowed Values: If specified, the list of allowed values for this property

## Read a single Structured Property

You can see the properties you created by running the following command:
You can read an individual property you created by running the following command:

<Tabs>
<TabItem value="CLI" label="CLI" default>
Expand Down Expand Up @@ -279,6 +410,91 @@ If successful, you should see metadata about your properties returned.
}
```

</TabItem>
<TabItem value="GraphQL" label="GraphQL">

Example Request:
```graphql
query {
structuredProperty(urn: "urn:li:structuredProperty:projectNames") {
urn
type
definition {
qualifiedName
displayName
description
cardinality
allowedValues {
value {
... on StringValue {
stringValue
}
... on NumberValue {
numberValue
}
}
description
}
entityTypes {
urn
info {
type
qualifiedName
}
}
}
}
}
```

Example Response:
```json
{
"data": {
"structuredProperty": {
"urn": "urn:li:structuredProperty:projectNames",
"type": "STRUCTURED_PROPERTY",
"definition": {
"qualifiedName": "projectNames",
"displayName": "Project Name",
"description": "Test property for project name",
"cardinality": "MULTIPLE",
"allowedValues": [
{
"value": {
"stringValue": "Tracking"
},
"description": "test value 1 for project"
},
{
"value": {
"stringValue": "DataHub"
},
"description": "test value 2 for project"
}
],
"entityTypes": [
{
"urn": "urn:li:entityType:datahub.dataset",
"info": {
"type": "DATASET",
"qualifiedName": "datahub.dataset"
}
},
{
"urn": "urn:li:entityType:datahub.dataFlow",
"info": {
"type": "DATA_FLOW",
"qualifiedName": "datahub.dataFlow"
}
}
]
}
}
},
"extensions": {}
}
```
</TabItem>

<TabItem value="OpenAPI v2" label="OpenAPI v2">
Expand Down Expand Up @@ -389,7 +605,7 @@ Example Response:
This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.

<Tabs>
<TabItem value="graphQL" label="GraphQL" default>
<TabItem value="GraphQL" label="GraphQL" default>

```graphql
mutation upsertStructuredProperties {
Expand Down Expand Up @@ -537,7 +753,7 @@ datahub dataset get --urn {urn}
For reading all structured properties from a dataset:

<Tabs>
<TabItem value="graphql" label="GraphQL" default>
<TabItem value="Graphql" label="GraphQL" default>

```graphql
query getDataset {
Expand Down

0 comments on commit 527df2d

Please sign in to comment.