Gradoop Data Importers
Data Importers can be used to create simple graphs from common data formats.
Unlike Data Sources, Data Importers do not require a Gradoop-specific format, but they are used like a data source (there is no matching data sink).
The minimal JSON importer can be used to turn text files where every line is a JSON object into a graph.
A vertex is created for every JSON object, and the properties of the object are added as properties of the new vertex. Every property value is parsed as a string. All vertices get the same label.
For example, the JSON object
{"Name": "Max", "Age": 28, "Address": {"Street": "Main Street", "City": "SomeCity", "ZIPCode": 12345}}
will be turned into a vertex with label JsonRowVertex and properties Name, Age and Address set to "Max", "28" and "{\"Street\":\"Main Street\",\"City\":\"SomeCity\",\"ZIPCode\":12345}" respectively.
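The mapping from one JSON line to string-valued properties can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Gradoop implementation: the class JsonLineToProperties and its methods are hypothetical, the input is assumed to be one well-formed JSON object, escape sequences inside strings are not decoded, and nested objects (and, in this sketch, arrays) are kept as compact JSON text. It only builds the property map; creating the JsonRowVertex itself is left to Gradoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Gradoop code: turns one line holding a JSON object into a
 * property map whose values are all strings. Assumes well-formed input; escape
 * sequences are not decoded, and nested objects/arrays are kept as compact JSON text.
 */
class JsonLineToProperties {

    static Map<String, String> toProperties(String line) {
        Map<String, String> props = new LinkedHashMap<>();
        String s = line.trim();
        int i = 1; // skip the opening '{'
        while (i < s.length()) {
            i = skipWhitespace(s, i);
            char c = s.charAt(i);
            if (c == '}') break;               // end of the top-level object
            if (c == ',') { i++; continue; }   // separator between members
            int keyEnd = endOfString(s, i);
            String key = s.substring(i + 1, keyEnd - 1);
            i = skipWhitespace(s, keyEnd) + 1; // skip the ':'
            i = skipWhitespace(s, i);
            int valueEnd = endOfValue(s, i);
            String raw = s.substring(i, valueEnd);
            // Strings lose their quotes; numbers, booleans and nested values keep their JSON text.
            props.put(key, raw.startsWith("\"") ? raw.substring(1, raw.length() - 1) : compact(raw));
            i = valueEnd;
        }
        return props;
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    /** Returns the index just after the closing quote of the string starting at i. */
    private static int endOfString(String s, int i) {
        int j = i + 1;
        while (s.charAt(j) != '"') {
            j += (s.charAt(j) == '\\') ? 2 : 1; // jump over escaped characters
        }
        return j + 1;
    }

    /** Returns the index just after the JSON value starting at i. */
    private static int endOfValue(String s, int i) {
        char c = s.charAt(i);
        if (c == '"') return endOfString(s, i);
        int j = i;
        if (c == '{' || c == '[') {
            int depth = 0;
            do {
                char d = s.charAt(j);
                if (d == '"') { j = endOfString(s, j); continue; } // brackets inside strings do not count
                if (d == '{' || d == '[') depth++;
                if (d == '}' || d == ']') depth--;
                j++;
            } while (depth > 0);
            return j;
        }
        // Scalar (number, true, false, null): read up to the next separator.
        while (j < s.length() && ",}".indexOf(s.charAt(j)) < 0 && !Character.isWhitespace(s.charAt(j))) j++;
        return j;
    }

    /** Removes whitespace outside of string literals. */
    private static String compact(String raw) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        for (int j = 0; j < raw.length(); j++) {
            char c = raw.charAt(j);
            if (inString) {
                out.append(c);
                if (c == '\\') out.append(raw.charAt(++j)); // copy the escaped character as-is
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "{\"Name\": \"Max\", \"Age\": 28, \"Address\": {\"Street\": \"Main Street\", "
                + "\"City\": \"SomeCity\", \"ZIPCode\": 12345}}";
        toProperties(line).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running main on the example object prints Name = Max, Age = 28 and Address = {"Street":"Main Street","City":"SomeCity","ZIPCode":12345}: the number becomes the string "28" and the nested object is re-serialized without whitespace.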
Array-type properties are supported, but every element of an array is assumed to be a string.
Paths to files can point to local (file://) or distributed (hdfs://) files.
DataSource importer = new MinimalJSONImporter("/path/to/jsonfile");
The MinimalCSVImporter can be used to create an EPGM instance from a CSV file of vertices that are not already in Gradoop format. Each line is imported as a vertex, and each column is set as a property of this vertex. The property keys are either read from the first line of the file (the header) or passed to the constructor as a list of column names. The checkReoccurringHeader parameter specifies whether each line of the file should be checked for a recurrence of the header; a line that repeats the header is skipped.
Paths to files can point to local (file://) or distributed (hdfs://) files.
The delimiter parameter specifies the token delimiter that separates the columns.
Set checkReoccurringHeaderFlag to true if each line of the file should be checked for a reoccurring header.
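The row-to-vertex mapping of the CSV importer can be illustrated with a small plain-Java sketch. It is hypothetical and not Gradoop's code: CsvLinesToProperties works on lines already held in memory, treats the delimiter as a literal string, and assumes that a "reoccurring header" is a line whose tokens equal the column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Gradoop code: maps CSV lines to one property map per vertex,
 * one property per column. Assumes a line is a repeated header if its tokens equal the keys.
 */
class CsvLinesToProperties {

    /**
     * @param lines                  the lines of the CSV file
     * @param delimiter              literal token delimiter, e.g. ";" or "|"
     * @param columnNames            property keys, or null to read them from the first line
     * @param checkReoccurringHeader whether lines repeating the header are skipped
     */
    static List<Map<String, String>> toProperties(List<String> lines, String delimiter,
            List<String> columnNames, boolean checkReoccurringHeader) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        List<String> keys = columnNames;
        int first = 0;
        if (keys == null) {
            keys = Arrays.asList(splitter.split(lines.get(0), -1));
            first = 1; // the header line itself does not become a vertex
        }
        List<Map<String, String>> vertices = new ArrayList<>();
        for (String line : lines.subList(first, lines.size())) {
            List<String> tokens = Arrays.asList(splitter.split(line, -1)); // -1 keeps empty trailing fields
            if (checkReoccurringHeader && tokens.equals(keys)) {
                continue; // skip a repeated header line
            }
            Map<String, String> properties = new LinkedHashMap<>();
            for (int c = 0; c < keys.size() && c < tokens.size(); c++) {
                properties.put(keys.get(c), tokens.get(c));
            }
            vertices.add(properties);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("name;age", "Alice;23", "name;age", "Bob;42");
        toProperties(lines, ";", null, true).forEach(System.out::println);
    }
}
```

With the header check enabled, main prints {name=Alice, age=23} and {name=Bob, age=42}; the repeated "name;age" line is dropped. With the flag set to false, that line would become a third vertex whose property values equal its keys.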
If the first line of the file is the header, the column names are read from it:
DataSource importer = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, checkReoccurringHeaderFlag);
If the file does not contain a header line, the constructor needs a list of the column names:
DataSource importer = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, listOfColumnNames, checkReoccurringHeaderFlag);
The charset of the file can be passed to the constructor as well:
DataSource importer = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, charset, checkReoccurringHeaderFlag);
The default charset is UTF-8.
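What the charset argument changes can be illustrated with plain java.nio, outside Gradoop. The helper CharsetLines.readLines below is hypothetical and only handles local files; it decodes a file with a named charset and falls back to UTF-8, the documented default, when no name is given.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not Gradoop code: reads a local file's lines with a chosen charset. */
class CharsetLines {

    /** Decodes the file with the named charset, falling back to UTF-8 when the name is null. */
    static List<String> readLines(Path file, String charsetName) throws IOException {
        Charset charset = (charsetName == null) ? StandardCharsets.UTF_8 : Charset.forName(charsetName);
        return Files.readAllLines(file, charset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vertices", ".csv");
        // In ISO-8859-1 the umlaut is written as the single byte 0xFC, which is not valid UTF-8.
        Files.write(file, "name\nM\u00fcller\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readLines(file, "ISO-8859-1").get(1).equals("M\u00fcller")); // true
        try {
            readLines(file, null); // the UTF-8 default cannot decode the 0xFC byte
        } catch (IOException e) {
            System.out.println("UTF-8 decoding failed: " + e);
        } finally {
            Files.delete(file);
        }
    }
}
```

Decoding the same Latin-1 file with the UTF-8 default fails, which is the kind of file for which an explicit charset would be passed to the importer.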