ExaSpark is an extension of Apache Spark that adds support for virtual tables. User-defined functions and user-defined aggregate functions, which Apache Spark already supports, remain available.
To create a virtual table:
- Create a Java class named after the desired virtual table.
- Give the class a constructor that takes the same arguments as the virtual table.
- Add a mapReduce method to the class. In it, implement the functionality of the virtual table: create a view or table (preferably a temporary one) and return its name.
Finally:
- Place the class in madgik/exaSpark/vtFunctions.
- Compile the Maven project with mvn clean compile assembly:single.
- This produces a .jar file.
- Run it with java -jar NameOfJar.jar.
- A console appears in which you can type SQL queries.
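The steps above can be sketched as a small class. HEAD is a hypothetical virtual table (not one of the shipped functions) that returns the first n rows of its input. The README does not give the exact mapReduce signature or say how the class reaches the Spark session, so this sketch passes a hypothetical SqlRunner interface into mapReduce as a stand-in for SparkSession.sql. It also assumes a subquery argument arrives as SQL text.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical virtual table HEAD(n, (subquery)) returning the first n rows
 * of its input. The class is named after the virtual table, and the
 * constructor takes the virtual table's arguments.
 */
public class Head {

    /** Assumed stand-in for a Spark call such as SparkSession.sql(...). */
    public interface SqlRunner {
        void run(String sql);
    }

    // Gives every invocation its own view name, so nested calls do not collide.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    private final int n;
    private final String subquery;

    // Same arguments as the virtual table: HEAD(n, (subquery)).
    public Head(int n, String subquery) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        this.n = n;
        this.subquery = subquery;
    }

    /** Creates a temporary view holding the result and returns its name. */
    public String mapReduce(SqlRunner sql) {
        String viewName = "head_view_" + COUNTER.incrementAndGet();
        sql.run("CREATE OR REPLACE TEMPORARY VIEW " + viewName
                + " AS SELECT * FROM (" + subquery + ") AS t LIMIT " + n);
        return viewName;
    }
}
```

With Spark on the classpath, a method reference such as spark::sql (for a SparkSession named spark) fits SqlRunner, because its Dataset result is simply discarded: new Head(10, "SELECT * FROM logs").mapReduce(spark::sql). The counter keeps view names unique, which matters when virtual tables are nested inside one another.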
Example
Some virtual table functions are provided in /madgik/exaSpark/vtFunctions. Use them to test the application, or as templates for your own.
$ SELECT * FROM FOO(',','/pathOfFile.txt')
$ SELECT * FROM BOO(',',(SELECT * FROM FOO(',','/pathOfFile.txt')))
Apachelogsplit
Breaks a single apache log row into multiple fields.
$ select * from apachelogsplit('/path/of/access_log')
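The README does not list the fields apachelogsplit produces. As an illustration only, the sketch below splits a line in the Apache Common Log Format (optionally followed by the Combined Log Format's referer and user-agent) with a regular expression. The class name ApacheLogLine and its field order are assumptions, not the virtual table's actual output schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits one Apache access-log line into fields. */
public class ApacheLogLine {

    // host ident user [timestamp] "request" status bytes, optionally
    // followed by "referer" "user-agent" (Combined Log Format).
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
            + "(?: \"([^\"]*)\" \"([^\"]*)\")?$");

    /** Returns the fields in order, or empty if the line does not match. */
    public static Optional<List<String>> split(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            // The optional referer/user-agent groups are null for plain CLF lines; skip them.
            if (m.group(i) != null) {
                fields.add(m.group(i));
            }
        }
        return Optional.of(fields);
    }
}
```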
Sample
Returns a random sample of rows; the first argument (HowMany) sets the sample size.
$ select * from sample(HowMany,(select * from apachelogsplit('/path/of/access_log')))
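The README does not say how sample picks its rows. One standard way to draw a uniform random sample of HowMany rows in a single pass is reservoir sampling (Algorithm R), sketched below in plain Java. This is a swapped-in technique for illustration, not necessarily what the sample virtual table does; inside Spark, ORDER BY rand() LIMIT n is another option.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Reservoir sampling (Algorithm R): a uniform random sample of k rows in one pass. */
public final class ReservoirSample {

    private ReservoirSample() {}

    public static <T> List<T> sample(Iterator<T> rows, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k rows.
                reservoir.add(row);
            } else {
                // Pick j uniformly in [0, seen); keep the row (probability k/seen) if j < k.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, row);
                }
            }
        }
        return reservoir;
    }
}
```

If the input has fewer than k rows, every row is returned; passing a seeded Random makes a run reproducible.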
- Improved console (auto-complete, command history, new design)
- The ReservedWords.txt file lists the reserved SQL words used by auto-complete
- A "show virtual tables" command has been added
- ExaremeSparkSession (an extension of SparkSession) has been added, so that SQL queries with virtual tables can be run without the console
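The README does not describe how the console uses ReservedWords.txt. As a hedged illustration, a sorted set gives simple case-insensitive prefix completion over such a word list. The class ReservedWordCompleter is hypothetical, and a real console would likely plug logic like this into its line-editing library.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical prefix completer over reserved SQL words. */
public class ReservedWordCompleter {

    private final NavigableSet<String> words = new TreeSet<>();

    public ReservedWordCompleter(Collection<String> reservedWords) {
        for (String w : reservedWords) {
            String trimmed = w.trim();
            if (!trimmed.isEmpty()) {
                // Store upper case so matching is case-insensitive.
                words.add(trimmed.toUpperCase(Locale.ROOT));
            }
        }
    }

    /** Returns the reserved words starting with the given prefix, in sorted order. */
    public List<String> complete(String prefix) {
        String p = prefix.toUpperCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // Words sharing a prefix are contiguous in sorted order, starting at the first word >= p.
        for (String w : words.tailSet(p, true)) {
            if (!w.startsWith(p)) {
                break;
            }
            matches.add(w);
        }
        return matches;
    }
}
```

Assuming the file holds one word per line, it could be loaded with new ReservedWordCompleter(Files.readAllLines(Path.of("ReservedWords.txt"))).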
REST API
Through our REST API, users can:
- submit queries
Settings
Queries are submitted with a POST request.
- The ExaSpark REST API listens on port 9090 (configurable in the application.properties file)
- Set the Accept HTTP request header to:
  - application/json (for JSON responses)
  - text/csv (for CSV responses)
- Every request must include a form with the following field:
  - query: the ExaSpark query
Swagger UI
Visualize and interact with the API's resources.
Endpoints
- http://<host>:9090/query/ : to perform a query
- http://<host>:9090/swagger-ui.html : to visualize the API's resources
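A minimal Java 11+ client sketch for the query endpoint, using the standard java.net.http API. Assumptions: the server runs on localhost, the form is sent URL-encoded (application/x-www-form-urlencoded; switch to multipart if the server expects it), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client for the ExaSpark REST API (Java 11+). */
public class ExaSparkClient {

    /** Builds a POST to {baseUrl}/query/ carrying the query as a URL-encoded form field. */
    public static HttpRequest buildQueryRequest(String baseUrl, String query, String accept) {
        String form = "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/query/"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", accept) // "application/json" or "text/csv"
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    /** Sends the query and returns the response body (JSON or CSV, depending on accept). */
    public static String submit(String baseUrl, String query, String accept)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildQueryRequest(baseUrl, query, accept),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("ExaSpark returned HTTP " + response.statusCode()
                    + ": " + response.body());
        }
        return response.body();
    }
}
```

For example, ExaSparkClient.submit("http://localhost:9090", "select * from apachelogsplit('/path/of/access_log')", "text/csv") would return the result as CSV text.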