The ZetaSQL Toolkit is a library that helps users use the ZetaSQL Java API to perform SQL analysis for multiple query engines.
This toolkit offers built-in support for:
- Building catalogs using BigQuery and Cloud Spanner resources. Supports tables, views, functions, table-valued functions and procedures. Connections and models are coming soon.
- Analyzing queries and scripts using the BigQuery or Cloud Spanner feature sets.
- Analyzing scripts that perform DDL.
- Analyzing scripts that declare and use variables.
It also includes tooling to understand column-level lineage in analyzed queries.
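To make "column-level lineage" concrete: for each output column, lineage records which source columns it ultimately derives from, possibly through intermediate views or tables. The sketch below is a toy model in plain Java and does not use the toolkit's lineage API. The class `LineageSketch`, the method `resolveSources`, and the table names `report` and `recent_articles` are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LineageSketch {
  /**
   * Resolves a column to the base columns it ultimately derives from.
   * directParents maps a derived column to the columns it reads directly;
   * a column with no entry is treated as a base (source) column.
   * Assumes the dependency graph is acyclic.
   */
  static Set<String> resolveSources(String column, Map<String, Set<String>> directParents) {
    Set<String> parents = directParents.get(column);
    if (parents == null || parents.isEmpty()) {
      return Set.of(column);
    }
    Set<String> sources = new TreeSet<>();
    for (String parent : parents) {
      sources.addAll(resolveSources(parent, directParents));
    }
    return sources;
  }

  public static void main(String[] args) {
    // Hypothetical script: a view selects title from wikipedia,
    // and a report table is then built from that view.
    Map<String, Set<String>> directParents = Map.of(
        "report.title", Set.of("recent_articles.title"),
        "recent_articles.title", Set.of("wikipedia.title"));
    System.out.println(resolveSources("report.title", directParents)); // prints [wikipedia.title]
  }
}
```

In a real analysis the direct dependencies would come from the resolved statements rather than from a hand-written map.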
When analyzing queries using BigQuery semantics, you need to:
- Create a BigQueryCatalog and add resources to it. The BigQueryCatalog supports tables, views, functions, table-valued functions and procedures. Connections and models are coming soon.
- Configure the ZetaSQL AnalyzerOptions using the BigQuery feature set.
- Use ZetaSQLToolkitAnalyzer to perform analysis.
String query =
"INSERT INTO `bigquery-public-data.samples.wikipedia` (title) VALUES ('random title');\n"
+ "SELECT title, language FROM `bigquery-public-data.samples.wikipedia` WHERE title = 'random title';";
// Create a BigQueryCatalog
// By default, it will use the BigQuery API with application-default credentials
// to fetch BigQuery resources.
BigQueryCatalog catalog = new BigQueryCatalog(/*bqProjectId=*/"bigquery-public-data");
// Add resources to the catalog
// After a resource is added, it will be available when ZetaSQL performs analysis
catalog.addTable("bigquery-public-data.samples.wikipedia");
// Configure the analyzer options using the BigQuery feature set
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(BigQueryLanguageOptions.get());
// Use the ZetaSQLToolkitAnalyzer to run the analyzer
// It returns an iterator over the resulting AnalyzedStatements
ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> statementIterator = analyzer.analyzeStatements(query, catalog);
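// Use the resulting AnalyzedStatements
statementIterator.forEachRemaining(analyzedStatement -> {
  // Inspect each AnalyzedStatement here, for example by printing its resolved tree
});
Output (excerpt):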
| +-TableScan(table=bigquery-public-data.samples.wikipedia, ...)
| +-InsertRow
| +-value_list=
| +-DMLValue
| +-value=
| +-Literal(type=STRING, value=string_value: "random title")
| +-bigquery-public-data.samples.wikipedia.title#1 AS `title` [STRING]
| +-bigquery-public-data.samples.wikipedia.language#3 AS `language` [STRING]
+-column_list=bigquery-public-data.samples.wikipedia.[title#1, language#3]
| +-TableScan(table=bigquery-public-data.samples.wikipedia, ...)
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL)
+-ColumnRef(type=STRING, column=bigquery-public-data.samples.wikipedia.title#1)
+-Literal(type=STRING, value=string_value: "random title")
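The example above analyzes a two-statement script. When a script also contains DDL, later statements can only be analyzed if the objects created by earlier statements are visible to them. The following toy model illustrates that idea in plain Java. It is an assumption about the general approach, not the toolkit's implementation: `ScriptCatalogSketch`, `analyzeScript`, and the regular expressions are hypothetical stand-ins for a real parser and catalog.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptCatalogSketch {
  private static final Pattern CREATE_TABLE =
      Pattern.compile("^CREATE\\s+TABLE\\s+(\\S+)", Pattern.CASE_INSENSITIVE);
  private static final Pattern FROM_TABLE =
      Pattern.compile("\\bFROM\\s+(\\S+)", Pattern.CASE_INSENSITIVE);

  /**
   * Checks statements in order against a set of known table names.
   * A CREATE TABLE statement registers its table, so later statements can use it.
   * Returns one entry per statement: "ok" or "unknown table: <name>".
   * Splitting on ';' is naive and would break on literals that contain ';'.
   */
  static List<String> analyzeScript(String script, Set<String> knownTables) {
    Set<String> catalog = new HashSet<>(knownTables);
    List<String> results = new ArrayList<>();
    for (String raw : script.split(";")) {
      String statement = raw.trim();
      if (statement.isEmpty()) {
        continue;
      }
      Matcher create = CREATE_TABLE.matcher(statement);
      Matcher from = FROM_TABLE.matcher(statement);
      if (create.find()) {
        catalog.add(create.group(1)); // DDL updates the catalog
        results.add("ok");
      } else if (from.find() && !catalog.contains(from.group(1))) {
        results.add("unknown table: " + from.group(1));
      } else {
        results.add("ok");
      }
    }
    return results;
  }

  public static void main(String[] args) {
    String script = "CREATE TABLE t (x INT64); SELECT x FROM t; SELECT y FROM u;";
    System.out.println(analyzeScript(script, Set.of()));
  }
}
```

The second statement succeeds only because the first one registered `t`; the third fails because `u` was never created or added.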
Similarly, when analyzing queries using Spanner semantics, you need to:
- Create a SpannerCatalog and add resources to it. The SpannerCatalog supports tables and views.
- Configure the ZetaSQL AnalyzerOptions using the Spanner feature set.
- Use ZetaSQLToolkitAnalyzer to perform analysis.
String query = "UPDATE MyTable SET column2 = 5 WHERE column1 = ''; SELECT * FROM MyTable;";
// Configure your Cloud Spanner project, instance and database
String spannerProjectId = "projectId";
String spannerInstanceName = "instanceName";
String spannerDatabaseName = "databaseName";
// Create your SpannerCatalog
// By default, it will use the Spanner database client with application-default
// credentials to fetch resources.
SpannerCatalog catalog = new SpannerCatalog(
    spannerProjectId, spannerInstanceName, spannerDatabaseName);
// Add your tables to the catalog
// After a resource is added, it will be available when ZetaSQL performs analysis
catalog.addTable("MyTable");
// Configure the analyzer options using the Spanner feature set
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(SpannerLanguageOptions.get());
// Use the ZetaSQLToolkitAnalyzer to run the analyzer
// It returns an iterator over the resulting AnalyzedStatements
ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> statementIterator = analyzer.analyzeStatements(query, catalog);
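// Use the resulting AnalyzedStatements
statementIterator.forEachRemaining(analyzedStatement -> {
  // Inspect each AnalyzedStatement here, for example by printing its resolved tree
});
Output (excerpt):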
| +-TableScan(table=MyTable, column_list=MyTable.[column1#1, column2#2])
| +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL)
| +-ColumnRef(type=STRING, column=MyTable.column1#1)
| +-Literal(type=STRING, value=string_value: "")
| +-ColumnRef(type=INT64, column=MyTable.column2#2)
+-Literal(type=INT64, value=int64_value: 5)
| +-MyTable.column1#1 AS `column1` [STRING]
| +-MyTable.column2#2 AS `column2` [INT64]
+-column_list=MyTable.[column1#1, column2#2]
+-TableScan(table=MyTable, column_list=MyTable.[column1#1, column2#2])
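Both examples consume the `Iterator` returned by `analyzeStatements` with `forEachRemaining`. A `java.util.Iterator` can be traversed only once, so if the results are needed more than once, one option is to copy them into a list first. The helper below is a generic sketch that uses only the Java standard library; `IteratorDrainSketch` and `drain` are hypothetical names, and the strings stand in for `AnalyzedStatement` objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IteratorDrainSketch {
  /** Copies all remaining elements of a single-pass iterator into a list. */
  static <T> List<T> drain(Iterator<T> iterator) {
    List<T> items = new ArrayList<>();
    iterator.forEachRemaining(items::add);
    return items;
  }

  public static void main(String[] args) {
    // Stand-in for the iterator returned by analyzeStatements
    Iterator<String> statements = List.of("UPDATE ...", "SELECT ...").iterator();
    List<String> collected = drain(statements);

    // The list can be traversed as often as needed; the iterator is now exhausted.
    collected.forEach(System.out::println);
    System.out.println(statements.hasNext()); // prints false
  }
}
```

Draining forces every statement to be processed up front, so for long scripts it may be preferable to handle each result as the iterator yields it.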
See a list of comprehensive usage examples here.
This is not an officially supported Google product.