updated README, switched to 0.1, #102
s1ck committed Dec 10, 2015
1 parent 2739bb7 commit ec1bc69
Showing 7 changed files with 21 additions and 135 deletions.
144 changes: 15 additions & 129 deletions README.md
extends the widespread
by the concept of logical graphs and further provides operators that can be applied
on single logical graphs and collections of logical graphs. The combination of these
operators allows the flexible, declarative definition of graph analytical workflows.
Gradoop can be easily integrated into a workflow which already uses Flink operators
and Flink libraries (e.g. Gelly, ML and Table).

```java
// load social network from hdfs
LogicalGraph db = EPGMDatabase.fromJsonFile("hdfs://...").getDatabaseGraph();
// detect communities
GraphCollection communities = db.callForCollection(new LabelPropagation(...));
// filter large communities
GraphCollection largeCommunities = communities.select((LogicalGraph g) -> g.vertexCount() > 100);
// combine them to a single graph
LogicalGraph relevantSubgraph = largeCommunities.reduce((LogicalGraph g1, LogicalGraph g2) -> g1.combine(g2));
// summarize the network based on the city users live in
LogicalGraph summarizedGraph = relevantSubgraph.summarize("city");
// write back to HDFS
summarizedGraph.writeAsJson("hdfs://...");
```

Gradoop is **work in progress**, which means APIs may change. It is currently a
proof-of-concept implementation and far from production-ready.

## Data Model

In the extended property graph model (EPGM), a database consists of multiple
property graphs which are called logical graphs. These graphs describe
application-specific subsets of vertices and edges, i.e. a vertex or an edge can
be contained in multiple logical graphs. Additionally, not only vertices and edges
but also logical graphs have a type label and can have different properties.

Data Model elements (logical graphs, vertices and edges) have a unique identifier,
a single label (e.g. User) and a number of key-value properties (e.g. name = Alice).
… properties even if they have the same label.
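
The element structure described above can be sketched as a plain Java record (illustrative only; the record name and shape are hypothetical, Gradoop defines its own EPGM classes):

```java
import java.util.Map;

// Minimal sketch of an EPGM element: unique id, single label, key-value properties.
// Logical graphs, vertices and edges all share this structure in the EPGM.
record EpgmElement(long id, String label, Map<String, Object> properties) {}

public class EpgmSketch {
  public static void main(String[] args) {
    EpgmElement alice = new EpgmElement(0L, "Person", Map.of("name", "Alice", "city", "Leipzig"));
    EpgmElement community = new EpgmElement(2L, "Community", Map.of("interest", "Graphs"));
    System.out.println(alice.label() + ": " + alice.properties().get("name"));
  }
}
```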

### Graph operators

The EPGM provides operators for both single logical graphs as well as collections
of logical graphs; operators may also return single graphs or graph collections.

The following table contains an overview (GC = Graph Collection, G = Logical Graph).

| Operator | In | Out | Output description | Impl |
|:--------------|:--------|:--------------|:------------------------------------------------------------------------|:----:|
| Selection | GC | GC | Filter graphs based on their attached data (i.e. label, properties) | Yes |
| Distinct      | GC      | GC            | Collection with no duplicate graphs                                      | No   |
| SortBy        | GC      | GC            | Collection sorted by values of a given property key                      | No   |
| Top | GC | GC | The first n elements of the input collection | No |
| Union | GC x GC | GC | All graphs from both input collections | Yes |
| Intersection | GC x GC | GC | Only graphs that exist in both collections | Yes |
| Difference    | GC x GC | GC            | Graphs that exist only in the first collection                           | Yes  |
| Equality | GC x GC | {true, false} | Compare collections in terms of contained element data or identifiers | Yes |
| Combination | G x G | G | Graph with vertices and edges from both input graphs | Yes |
| Overlap | G x G | G | Graph with vertices and edges that exist in both input graphs | Yes |
| Projection | G | G | Graph with projected vertex and edge sets | Yes |
| Summarization | G       | G             | Structural condensation of the input graph                               | Yes  |
| Apply | GC | GC | Applies operator to each graph in collection | No |
| Reduce        | GC      | G             | Reduces collection to graph using a binary operator (e.g. combine)       | Yes  |
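
To make the binary graph operators concrete: Combination and Overlap behave like set union and intersection on the element sets of both input graphs. A toy sketch with plain Java sets (not the actual Gradoop implementation, which works on distributed Flink data sets):

```java
import java.util.HashSet;
import java.util.Set;

public class GraphOpsSketch {
  // Combination: vertices/edges from both input graphs (set union)
  static Set<Long> combine(Set<Long> g1, Set<Long> g2) {
    Set<Long> result = new HashSet<>(g1);
    result.addAll(g2);
    return result;
  }

  // Overlap: only vertices/edges contained in both input graphs (set intersection)
  static Set<Long> overlap(Set<Long> g1, Set<Long> g2) {
    Set<Long> result = new HashSet<>(g1);
    result.retainAll(g2);
    return result;
  }

  public static void main(String[] args) {
    Set<Long> g1 = Set.of(1L, 2L, 3L);
    Set<Long> g2 = Set.of(3L, 4L);
    System.out.println(combine(g1, g2)); // union of both vertex id sets
    System.out.println(overlap(g1, g2)); // shared vertex ids only, i.e. [3]
  }
}
```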

## Setup

> cd gradoop
> mvn clean install
### Load data into gradoop

#### JSON

Gradoop supports JSON as input format for vertices, edges and graphs. Besides the
unique id, each JSON document stores the properties of the specific entity in an
embedded document `data`. Meta information, like the obligatory label, is stored
in a second embedded document `meta`. The meta document of vertices and edges may
contain a mapping to the logical graphs they are contained in.

The following two persons (Alice and Bob) have three properties each and are
contained in two logical graphs (`"graphIds":[0,2]`).
```
// content of hdfs:///input/nodes.json
{"id":0,"data":{"gender":"f","city":"Leipzig","name":"Alice"},"meta":{"label":"Person","graphIds":[0,2]}}
{"id":1,"data":{"gender":"m","city":"Leipzig","name":"Bob"},"meta":{"label":"Person","graphIds":[0,2]}}
```

Edges are represented in a similar way. Alice and Bob are connected by an edge
(knows). Edges may have properties (e.g., `"since":2014`) and may also be contained
in logical graphs. Additionally, each edge JSON document stores the obligatory source
and target vertex identifiers.

```
// content of hdfs:///input/edgeIds.json
{"id":0,"source":0,"target":1,"data":{"since":2014},"meta":{"label":"knows","graphIds":[0,2]}}
```

Graphs may also have properties and must have a label (e.g., Community).

```
// content of hdfs:///input/graphIds.json
{"id":0,"data":{"interest":"Databases","vertexCount":3},"meta":{"label":"Community"}}
{"id":1,"data":{"interest":"Hadoop","vertexCount":3},"meta":{"label":"Community"}}
{"id":2,"data":{"interest":"Graphs","vertexCount":4},"meta":{"label":"Community"}}
```
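
The document layout can be mimicked with plain Java maps to show how the pieces relate (illustrative only; Gradoop ships its own JSON reader, and the helper name here is hypothetical):

```java
import java.util.List;
import java.util.Map;

public class JsonLayoutSketch {
  // Builds a map mirroring one line of nodes.json: top-level id, properties
  // under "data", label and graph membership under "meta".
  static Map<String, Object> vertexDocument(int id, Map<String, Object> data,
                                            String label, List<Integer> graphIds) {
    return Map.of("id", id, "data", data, "meta", Map.of("label", label, "graphIds", graphIds));
  }

  public static void main(String[] args) {
    Map<String, Object> alice = vertexDocument(
        0, Map.of("gender", "f", "city", "Leipzig", "name", "Alice"), "Person", List.of(0, 2));
    Map<?, ?> meta = (Map<?, ?>) alice.get("meta");
    System.out.println(meta.get("label"));    // Person
    System.out.println(meta.get("graphIds")); // [0, 2]
  }
}
```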

#### HBase

Gradoop can read and write an EPGM database from HBase using an EPGM store. The
current implementation is work in progress; at the moment, one can only read or
write the whole database. We are working on reading only the data that is needed
for the analysis (e.g., a collection of specific communities).

The following example shows how to create an EPGM Store and how to write an EPGM
database to it.

```java
EPGMDatabase epgmDB = EPGMDatabase.fromJsonFile(...);

// do some fancy analysis ...

EPGMStore epgmStore = HBaseEPGMStore.createOrOpenEPGMStore(vertexTable, edgeTable, graphHeadTable);
epgmDB.writeToHBase(epgmStore);
epgmStore.close();
```

You can now read the database from HBase.

```java
EPGMStore epgmStore = HBaseEPGMStore.createOrOpenEPGMStore(vertexTable, edgeTable, graphHeadTable);
EPGMDatabase epgmDB = EPGMDatabase.fromHBase(epgmStore);

// do some fancy analysis ...

epgmStore.close();
```

### Example: Extract a schema graph from a possibly large-scale graph

In this example, we use the `summarize` operator to create a condensed version
of our input graph. By summarizing on vertex and edge labels, we compute the schema
of our graph. Each vertex in the resulting graph represents all vertices with the
same label (e.g., Person or Group), and each edge represents all edges with the same
label that connect vertices from the same vertex groups.

```java
// create Flink execution environment
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
String vertexInputPath = "hdfs:///input/nodes.json";
String edgeInputPath = "hdfs:///input/edgeIds.json";
String graphInputPath = "hdfs:///input/graphIds.json";
// output paths (chosen here for illustration)
String vertexOutputPath = "hdfs:///output/nodes.json";
String edgeOutputPath = "hdfs:///output/edges.json";
String graphOutputPath = "hdfs:///output/graphs.json";
EPGMDatabase db = EPGMDatabase.fromJsonFile(vertexInputPath, edgeInputPath, graphInputPath, env);
LogicalGraph schemaGraph = db.getDatabaseGraph().summarizeOnVertexAndEdgeLabels();
schemaGraph.writeAsJson(vertexOutputPath, edgeOutputPath, graphOutputPath);
```
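
Conceptually, summarizing on labels groups equally labeled elements into one representative and keeps a member count. A toy sketch with plain Java streams (not the distributed Gradoop implementation; names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SummarizeSketch {
  record Vertex(long id, String label) {}

  // one summarized vertex per label, annotated with the number of grouped members
  static Map<String, Long> countByLabel(List<Vertex> vertices) {
    return vertices.stream()
        .collect(Collectors.groupingBy(Vertex::label, Collectors.counting()));
  }

  public static void main(String[] args) {
    List<Vertex> vertices = List.of(
        new Vertex(0, "Person"), new Vertex(1, "Person"), new Vertex(2, "Group"));
    // the schema graph has one vertex per label, e.g. Person (count 2) and Group (count 1)
    System.out.println(countByLabel(vertices));
  }
}
```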

### Cluster deployment

If you want to execute Gradoop on a cluster, you need *Hadoop 2.5.1* and
*Flink 0.9.0 for Hadoop 2.4.1* installed and running.

* start a Flink yarn-session (e.g., 5 task managers with 4 GB RAM and 4 processing slots each)

> ./bin/yarn-session.sh -n 5 -tm 4096 -s 4
* run your program (e.g. the included Summarization example)

> ./bin/flink run -c org.gradoop.examples.SummarizationExample ~/gradoop-flink-0.0.2-jar-with-dependencies.jar --vertex-input-path hdfs:///nodes.json --edge-input-path hdfs:///edgeIds.json --use-vertex-labels --use-edge-labels

## Gradoop modules

### gradoop-core

The main contents of that module are the EPGM data model and a corresponding POJO
implementation which is used in Flink. The persistent representation of the EPGM
is also contained in gradoop-core, together with its mapping to Apache HBase.

### gradoop-flink

2 changes: 1 addition & 1 deletion gradoop-algorithms/pom.xml
<parent>
<artifactId>gradoop</artifactId>
<groupId>de.uni-leipzig.de.dbs</groupId>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion gradoop-checkstyle/pom.xml
<parent>
<artifactId>gradoop</artifactId>
<groupId>de.uni-leipzig.de.dbs</groupId>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>
<relativePath>..</relativePath>
</parent>

2 changes: 1 addition & 1 deletion gradoop-core/pom.xml
<parent>
<artifactId>gradoop</artifactId>
<groupId>de.uni-leipzig.de.dbs</groupId>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>
</parent>

<artifactId>gradoop-core</artifactId>
2 changes: 1 addition & 1 deletion gradoop-examples/pom.xml
<parent>
<artifactId>gradoop</artifactId>
<groupId>de.uni-leipzig.de.dbs</groupId>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>
</parent>

<artifactId>gradoop-examples</artifactId>
2 changes: 1 addition & 1 deletion gradoop-flink/pom.xml
<parent>
<artifactId>gradoop</artifactId>
<groupId>de.uni-leipzig.de.dbs</groupId>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>
</parent>

<artifactId>gradoop-flink</artifactId>
2 changes: 1 addition & 1 deletion pom.xml
<groupId>de.uni-leipzig.de.dbs</groupId>
<artifactId>gradoop</artifactId>
<packaging>pom</packaging>
-    <version>0.0.3-SNAPSHOT</version>
+    <version>0.1</version>

<name>Gradoop Parent</name>
<url>http://www.gradoop.org</url>