Improved Neptune connector docs, CFN deploy for example (#1998)
Showing 9 changed files with 351 additions and 300 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Neptune Athena Connector Example | ||
|
||
To get started with the Neptune Athena Connector, follow these steps: | ||
|
||
1. Create an Amazon Neptune database cluster if you do not already have one, then populate it with the sample `air routes` dataset. The dataset is available in both Labeled Property Graph (LPG) and Resource Description Framework (RDF) formats; load both if you want to test the connector against both formats. For details, see [neptune-cluster-setup/README.md](neptune-cluster-setup/README.md). | ||
2. The connector requires you to define a table structure in AWS Glue. Follow [aws-glue-sample-scripts/README.md](aws-glue-sample-scripts/README.md) to set up the Glue tables for the `air routes` dataset. | ||
3. Deploy the connector following [neptune-connector-setup/README.md](neptune-connector-setup/README.md). To use both LPG and RDF, deploy two copies of the connector. | ||
|
143 changes: 33 additions & 110 deletions
athena-neptune/docs/aws-glue-sample-scripts/PropertyGraph.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,141 +1,64 @@ | ||
# Property Graph Glue Data Catalog Setup | ||
|
||
Column types for tables representing Property Graph nodes or edges are derived from the node or edge properties. For example, if we have a node labelled “country” with properties “type”, “code” and “desc”, we create a table named “country” in the Glue database with columns “type”, “code” and “desc”. Set the data types of the columns based on their data types in the property graph. | ||
To query property graph data using this connector, create a table in the Glue data catalog that maps to property graph data in the Neptune database. There are three styles of mapping available: | ||
|
||
Refer to the diagram below: | ||
- *Vertex-based*: The table represents a vertex with a specified label in the graph. Each row represents a specific vertex. Its columns include the vertex ID and vertex property values. Example tables include the `airport`, `country`, and `continent` tables. | ||
- *Edge-based*: The table represents an edge with a specified label in the graph. Each row represents a specific edge. Its columns include the edge ID, source and target vertex IDs, and edge property values. An example is the `route` table. | ||
- *Query-based*: The table represents the resultset of a Gremlin query. Each row is one result. An example is the `customairport` table. | ||
|
||
![](./assets/connector-propertygraph.png) | ||
Columns are named the same as their properties. Reserved column names are: | ||
- `id`: The vertex ID if `componenttype` is `vertex`, or the edge ID if `componenttype` is `edge`. | ||
- `out`: If `componenttype` is `edge`, this is the vertex ID of the *from* vertex. | ||
- `in`: If `componenttype` is `edge`, this is the vertex ID of the *to* vertex. | ||
|
||
## Create AWS Glue Catalog Database and Tables | ||
Advanced properties for the table are: | ||
|
||
AWS Glue Catalog Database and Tables can be created either by using [Amazon Neptune Export Configuration](#create-aws-glue-database-and-tables-using-amazon-neptune-export-configuration) or [Manually](#create-aws-glue-database-and-tables-manually). | ||
|Property|Values|Description| | ||
|--------|------|-----------| | ||
|componenttype|`vertex`, `edge`, or `view`|| | ||
|glabel|vertex label or edge type. If not specified, this is assumed to be the table name|| | ||
|query|Gremlin query if `componenttype` is `view`|| | ||
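
The advanced properties above can be sketched as a Glue table definition. The following is a minimal illustration in Python, not the connector's own tooling: `build_table_input` is a hypothetical helper, the `string` column types are an assumption, and the dictionary follows the general shape of a Glue `TableInput` (`Name`, `StorageDescriptor.Columns`, `Parameters`). Check your own graph's property types before reusing it.

```python
import json

VALID_COMPONENT_TYPES = ("vertex", "edge", "view")


def build_table_input(name, columns, componenttype, glabel=None, query=None):
    """Build a Glue TableInput-style dict for a Neptune property graph table.

    Hypothetical helper: `columns` is a list of (column name, Glue type) pairs.
    """
    if componenttype not in VALID_COMPONENT_TYPES:
        raise ValueError(f"componenttype must be one of {VALID_COMPONENT_TYPES}")
    if componenttype == "view" and not query:
        raise ValueError("a view table needs a Gremlin query")

    parameters = {"componenttype": componenttype}
    if glabel:
        # Optional: when omitted, the table name is assumed to be the label.
        parameters["glabel"] = glabel
    if query:
        parameters["query"] = query

    return {
        "Name": name,
        "StorageDescriptor": {
            "Columns": [{"Name": col, "Type": typ} for col, typ in columns]
        },
        "Parameters": parameters,
    }


# Vertex table for airports; column types here are assumed to be strings.
airport = build_table_input(
    "airport",
    [("id", "string"), ("type", "string"), ("code", "string"),
     ("icao", "string"), ("desc", "string")],
    componenttype="vertex",
)
print(json.dumps(airport, indent=2))
```

The printed JSON is the kind of document that could be supplied as a table definition when creating the table in the Glue data catalog.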
|
||
### Create AWS Glue Database and Tables using Amazon Neptune Export Configuration | ||
## Examples | ||
|
||
You can use the sample node.js script [here](./automation/script.js) to create a Glue Database by the name "graph-database" and tables: airport, country, continent and route corresponding to the Air Routes Property Graph sample dataset. The node.js script uses the Amazon Neptune export configuration file. There is a sample export configuration for the Air Routes sample dataset in the [folder](./automation). | ||
|
||
From inside the [folder](./automation), run these commands | ||
|
||
Install dependencies | ||
|
||
``` | ||
npm install | ||
``` | ||
|
||
Make sure you have access to your AWS environment via the CLI, then execute the script: | ||
|
||
``` | ||
node script.js | ||
``` | ||
If you are using a different dataset, replace config.json with the export output from your own database. See [neptune-export](https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-export) for how to export the configuration from an Amazon Neptune database; you have to download the source code and build it. Once you have built the neptune-export jar file, run the command below from a machine that can reach your Amazon Neptune cluster to generate the export configuration: | ||
|
||
``` | ||
bin/neptune-export.sh create-pg-config -e <neptuneclusterendpoint> -d <outputfolderpath> | ||
``` | ||
|
||
### Create AWS Glue Database and Tables manually | ||
|
||
|
||
If you want to create database and tables manually, you can use the sample shell script [here](./manual/sample-cli-script.sh) to create a Glue Database by the name "graph-database" and tables: airport, country, continent and route corresponding to the Air Routes Property Graph sample dataset. | ||
|
||
If you're planning to use your own data set instead of the Air Routes sample dataset, then you need to modify the script according to your data structure. | ||
|
||
Make sure the script has executable permissions once you download it. | ||
|
||
``` | ||
chmod 755 sample-cli-script.sh | ||
``` | ||
Make sure credentials are set up for your AWS CLI. | ||
|
||
Replace <aws-profile> with the AWS profile name that carries your credentials and replace <aws-region> with AWS region where you are creating the AWS Glue tables which should be the same as your Neptune Cluster's AWS region. | ||
|
||
``` | ||
./sample-cli-script.sh <aws-profile> <aws-region> | ||
``` | ||
|
||
|
||
If all goes well you now have the Glue Database and Tables that are required for your Athena Neptune Connector setup and you can move on to those steps mentioned [here](../neptune-connector-setup/). | ||
|
||
### Sample table post setup | ||
The next screenshot shows columns and advanced properties for the sample `airport` table that maps to airport vertices in Neptune. It is a vertex table, as indicated by its `componenttype` of `vertex`. Its columns include `id` (the airport vertex ID) plus `type`, `code`, `icao`, and `desc` (vertex properties). | ||
|
||
![](./assets/table.png) | ||
|
||
### Query examples | ||
|
||
##### Graph Query | ||
Here is an edge table for `route`. Columns include built-in `id`, `out`, and `in`. The `dist` column maps to an edge property of the `route` edge. | ||
|
||
``` | ||
g.V().hasLabel("airport").as("source").out("route").as("destination").select("source","destination").by(id()).limit(10) | ||
``` | ||
|
||
##### Equivalent Athena Query | ||
``` | ||
SELECT | ||
a.id as "source",b.id as "destination" FROM "graph-database"."airport" as a | ||
inner join "graph-database"."route" as b | ||
on a.id = b.out | ||
inner join "graph-database"."airport" as c | ||
on c.id = b."in" | ||
limit 10; | ||
``` | ||
![](./assets/table_route.png) | ||
|
||
## Custom query | ||
Finally, here is a table that presents a custom view. Notice that `componenttype` is `view`. | ||
|
||
Neptune connector custom query feature allows you to specify a custom Glue table, which matches response of a Gremlin Query. For example a gremlin query like | ||
![](./assets/table_custom.png) | ||
|
||
The `query` property is | ||
``` | ||
g.V().hasLabel("airport").as("source").out("route").as("destination").select("source","destination").by(id()).limit(10) | ||
g.V().hasLabel("airport").as("source").out("route").as("destination").select("source","destination").by("code").limit(10) | ||
``` | ||
|
||
matches to a Glue table | ||
|
||
![](./assets/customquery-exampletable.png) | ||
|
||
Refer example scripts on how to create a table [here](./manual/sample-cli-script.sh) | ||
Columns are `source` and `destination`, which are the values returned by the Gremlin query above. | ||
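
Because a view table's columns must line up with what the Gremlin query returns, a quick sanity check can catch mismatches before the table is created. The sketch below is a hypothetical helper, not part of the connector: it uses a simple regular expression to pull the string labels out of the last `select(...)` or `project(...)` step and compares them with the intended column names. It assumes the labels are plain double-quoted strings, as in the query above.

```python
import re

# Matches a top-level .select("a","b") or .project("a","b") step
# whose arguments are all double-quoted string labels.
STEP_PATTERN = re.compile(r'\.(?:select|project)\(((?:\s*"[^"]*"\s*,?)+)\)')


def selected_labels(gremlin):
    """Return the labels of the last select/project step, or [] if none."""
    steps = STEP_PATTERN.findall(gremlin)
    if not steps:
        return []
    return re.findall(r'"([^"]*)"', steps[-1])


query = (
    'g.V().hasLabel("airport").as("source").out("route").as("destination")'
    '.select("source","destination").by("code").limit(10)'
)
columns = ["source", "destination"]  # column names planned for the Glue table

labels = selected_labels(query)
if labels != columns:
    raise SystemExit(f"column mismatch: query returns {labels}, table has {columns}")
print("columns match:", labels)
```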
|
||
> **NOTE** | ||
> | ||
> The custom query feature allows projections of simple types (for example int, long, string, datetime) as query output. | ||
Run SQL queries against the Athena service to retrieve this property graph data. | ||
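
Since edge tables expose the reserved `out` and `in` columns, a route can be joined back to its endpoint airports in SQL. The sketch below, in Python, only assembles the query text; `route_join_sql` is a hypothetical helper, and the database name `graph-database` and the `code` and `dist` columns are taken from the sample tables described in this document. The reserved column names are double-quoted so they are treated as identifiers rather than keywords.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def route_join_sql(database="graph-database", limit=10):
    """Build an Athena query listing source/destination codes per route."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    db = quote_ident(database)
    return (
        "SELECT src.code AS source, dst.code AS destination, r.dist\n"
        f"FROM {db}.{quote_ident('route')} AS r\n"
        f"INNER JOIN {db}.{quote_ident('airport')} AS src ON src.id = r.{quote_ident('out')}\n"
        f"INNER JOIN {db}.{quote_ident('airport')} AS dst ON dst.id = r.{quote_ident('in')}\n"
        f"LIMIT {limit}"
    )


sql = route_join_sql()
print(sql)
```

The resulting string can be pasted into the Athena query editor or submitted through whichever Athena client you already use.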
|
||
|
||
### Example query patterns | ||
|
||
##### project node properties | ||
The following query retrieves 100 airports. | ||
|
||
``` | ||
g.V().hasLabel("airport").valueMap("code","city","country").limit(10000) | ||
select * from "graph-database"."airport" | ||
LIMIT 100 | ||
``` | ||
|
||
##### project edge properties | ||
The following query retrieves 100 routes. | ||
|
||
``` | ||
g.E().hasLabel("route").valueMap("dist").limit(10000) | ||
select * from "graph-database"."route" | ||
LIMIT 100 | ||
``` | ||
|
||
##### n hop query with select clause | ||
The following query uses the custom view to get source-destination routes: | ||
|
||
``` | ||
g.V().hasLabel("airport").as("source").out("route").as("destination").select("source","destination").by("code").limit(10) | ||
``` | ||
|
||
##### n hop query with project clause | ||
select * from "graph-database"."customairport" | ||
LIMIT 100 | ||
``` | ||
g.V().hasLabel("airport").as("s").out("route").as("d").project("source","destination").by(select("s").id()).by(select("d").id()).limit(10) | ||
``` | ||
|
||
### Sample table post setup | ||
|
||
![](./assets/customtable.png) | ||
|
||
### Benefits | ||
|
||
Using the custom query feature, you can project the output of a Gremlin query directly. This avoids the effort of writing a lengthy SQL query against the graph model. It also gives you more control over how the table schema is designed for analysis. You can limit the number of records to retrieve in the Gremlin query itself. | ||
|
||
|
||
|
||
|