- Old
lakehouse
catalog is nowdelta
- And
iceberg
also exists with this deployment.
- Docker Desktop (with k8s enabled)
- make
In this workshop I provide to you a setup for a MinIO with a Metastore (utilising postgres internally) to successfully query delta tables via trino.
If you are using M1 chip:
make build-image COMPONENT=metastore DOCKER_BUILD="buildx build --platform linux/amd64" CONTEXT="tools/docker"
Otherwise
make build-image COMPONENT=metastore CONTEXT="tools/docker"
make deploy-local-metastore ENV=local
To delete
make delete-local-metastore ENV=local
For Hive
, Postgres
, MinIO
we choose to use StatefulSet
as the deployment type over Deployment
.
A StatefulSet should be chosen for stateful components that require predictable Pod identities, ordered rollouts, and stable storage access.
Deployment should be chosen instead when your service is stateless, doesn't require persistent storage, and won't be affected by changes to Pod identities.
First add the trino helm repo!
helm repo add trino https://trinodb.github.io/charts
Make sure you use version >= 0.21.0
helm search repo trino
Should give:
NAME CHART VERSION APP VERSION DESCRIPTION
trino/trino 0.21.0 448 Fast distributed SQL query engine for big data ...
To get latest chart versions
helm repo update
NAME CHART VERSION APP VERSION DESCRIPTION
trino/trino 1.36.0 468 Fast distributed SQL query engine for big data ...
We use default values for the deployment, but please note section:
catalogs:
iceberg: |-
connector.name=iceberg
hive.metastore.uri=thrift://hive-service.metastore:9083
hive.s3.endpoint=http://minio-service.metastore:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
iceberg.file-format=parquet
delta: |-
connector.name=delta_lake
hive.metastore.uri=thrift://hive-service.metastore:9083
hive.s3.endpoint=http://minio-service.metastore:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
delta.enable-non-concurrent-writes=true
rdbms: |-
connector.name=postgresql
connection-url=jdbc:postgresql://postgres-service.metastore:5432/hivemetastore?allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC
connection-user=metastore
connection-password=password
This will ensure we can query our tables from our metastore.
make deploy-local-trino NAMESPACE=trino
And to delete
make delete-local-trino NAMESPACE=trino
The trino cluster is now up and running!
We will have to execute ourselves into the coordinator
and create a table.
kubectl get pods -n trino
kubectl exec -it datapains-trino-cluster-coordinator-747bc6ccff-b6tx2 -n trino -- /bin/bash
[trino@datapains-trino-cluster-coordinator-747bc6ccff-b6tx2 /]$ trino
trino> SHOW CATALOGS;
Catalog
-----------
delta
iceberg
system
tpcds
tpch
(4 rows)
Query 20240526_120408_00000_kyyrh, FINISHED, 2 nodes
Splits: 20 total, 20 done (100.00%)
2.57 [0 rows, 0B] [0 rows/s, 0B/s]
trino> CREATE SCHEMA delta.bronze;
trino> CREATE TABLE delta.bronze.products ( name VARCHAR, price DOUBLE, product_no BIGINT, ingest_date DATE, created TIMESTAMP WITH TIME ZONE);
trino> SELECT * FROM delta.bronze.products;
trino> INSERT INTO delta.bronze.products VALUES (CAST('Apple' AS VARCHAR), CAST(889.0 AS DOUBLE), CAST(1654523786273 AS BIGINT), CURRENT_DATE, CURRENT_TIMESTAMP);
trino> SELECT * FROM delta.bronze.products;
trino> CREATE SCHEMA iceberg.bronze;
trino> CREATE TABLE iceberg.bronze.products ( name VARCHAR, price DOUBLE, product_no BIGINT, ingest_date DATE, created TIMESTAMP WITH TIME ZONE);
trino> SELECT * FROM iceberg.bronze.products;
trino> INSERT INTO iceberg.bronze.products VALUES (CAST('Apple' AS VARCHAR), CAST(889.0 AS DOUBLE), CAST(1654523786273 AS BIGINT), CURRENT_DATE, CURRENT_TIMESTAMP);
trino> SELECT * FROM iceberg.bronze.products;