This crate includes end-to-end, highly commented examples of how to use various DataFusion APIs to help you get started.

Run `git submodule update --init` to initialize the test files.

To run an example, use the `cargo run` command, such as:

```shell
git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run the `dataframe` example:
# ... use the equivalent for other examples
cargo run --example dataframe
```

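
The same pattern applies to the other examples in this README: the name passed to `--example` is usually the source file name without the `.rs` extension. The sketch below illustrates that mapping with a hypothetical shell helper, `example_command`, which is not part of this repository. It only prints the command rather than running it, and it assumes the example is not registered under a different name in `Cargo.toml`.

```shell
# Hypothetical helper (not part of the DataFusion repository):
# print the cargo invocation for a given example source file.
example_command() {
  # basename drops any leading directory and the trailing .rs suffix
  echo "cargo run --example $(basename "$1" .rs)"
}

example_command simple_udf.rs                 # prints: cargo run --example simple_udf
example_command examples/dataframe-to-s3.rs   # prints: cargo run --example dataframe-to-s3
```

To run the printed command instead of just displaying it, it could be passed to the shell from inside the examples directory, for instance `$(example_command simple_udf.rs)`.
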
- `advanced_udaf.rs`: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- `advanced_udf.rs`: Define and invoke a more complicated User Defined Scalar Function (UDF)
- `advanced_udwf.rs`: Define and invoke a more complicated User Defined Window Function (UDWF)
- `advanced_parquet_index.rs`: Create a detailed secondary index that covers the contents of several Parquet files
- `analyzer_rule.rs`: Use a custom `AnalyzerRule` to change a query's semantics (row-level access control)
- `catalog.rs`: Register a table into a custom catalog
- `composed_extension_codec`: Example of using multiple extension codecs for serialization / deserialization
- `csv_sql_streaming.rs`: Build and run a streaming query plan from a SQL statement against a local CSV file
- `custom_datasource.rs`: Run queries against a custom datasource (`TableProvider`)
- `custom_file_format.rs`: Write data to a custom file format
- `dataframe-to-s3.rs`: Run a query using a DataFrame against a Parquet file from S3 and write the results back to S3
- `dataframe.rs`: Run a query using the DataFrame API against Parquet files, CSV files, and in-memory data
- `dataframe_output.rs`: Examples of methods that write data out from a DataFrame
- `deserialize_to_struct.rs`: Convert query results into Rust structs using serde
- `expr_api.rs`: Create, execute, simplify, analyze and coerce `Expr`s
- `file_stream_provider.rs`: Run a query on `FileStreamProvider`, which implements `StreamProvider` for reading and writing to arbitrary stream sources / sinks
- `flight_sql_server.rs`: Run DataFusion as a standalone process and execute SQL queries from JDBC clients
- `function_factory.rs`: Register a `CREATE FUNCTION` handler to implement SQL macros
- `make_date.rs`: Examples of using the `make_date` function
- `memtable.rs`: Create and query data in memory using SQL and `RecordBatch`es
- `optimizer_rule.rs`: Use a custom `OptimizerRule` to replace certain predicates
- `parquet_index.rs`: Create a secondary index over several Parquet files and use it to speed up queries
- `parquet_sql_multiple_files.rs`: Build and run a query plan from a SQL statement against multiple local Parquet files
- `parquet_exec_visitor.rs`: Extract statistics by visiting an `ExecutionPlan` after execution
- `parse_sql_expr.rs`: Parse SQL text into DataFusion `Expr`
- `plan_to_sql.rs`: Generate SQL from DataFusion `Expr` and `LogicalPlan`
- `planner_api.rs`: APIs to manipulate logical and physical plans
- `pruning.rs`: Use pruning to rule out files based on statistics
- `query-aws-s3.rs`: Configure `object_store` and run a query against files stored in AWS S3
- `query-http-csv.rs`: Configure `object_store` and run a query against files via HTTP
- `regexp.rs`: Examples of using regular expression functions
- `simple_udaf.rs`: Define and invoke a User Defined Aggregate Function (UDAF)
- `simple_udf.rs`: Define and invoke a User Defined Scalar Function (UDF)
- `simple_udwf.rs`: Define and invoke a User Defined Window Function (UDWF)
- `sql_analysis.rs`: Analyse SQL queries with DataFusion structures
- `sql_frontend.rs`: Create `LogicalPlan`s (only) from SQL strings
- `sql_dialect.rs`: Example of implementing a custom SQL dialect on top of `DFParser`
- `to_char.rs`: Examples of using the `to_char` function
- `to_timestamp.rs`: Examples of using `to_timestamp` functions
- `flight_client.rs` and `flight_server.rs`: Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol.