This project covers all the steps needed to transform a simple CSV file into Linked Data, in order to leverage the wealth of Linked Data already available on the web. We started with a simple CSV file about the energy consumption of public buildings in Madrid and transformed it into an RDF graph, which is consumed by a simple application that showcases the power of Linked Data.
As mentioned before, a dataset containing information on the energy consumption of each public building in Madrid was selected. The source dataset can be found at this website under a public license.
The data were analyzed to identify their format and the type of each column.
OpenRefine was used for data cleaning: identifying and correcting inconsistencies and missing or erroneous data.
Before defining the mapping process, a reconciliation step against Wikidata entities was performed. This step increased the value of the resulting Linked Data by aligning our dataset with standardized Wikidata entities, giving our RDF graph direct access to the vast amount of Linked Data available in Wikidata.
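For illustration, a reconciled entity ends up linked to its Wikidata counterpart with a triple such as the one below (the local IRI and the `owl:sameAs` linking property are assumptions for this sketch, not necessarily what the project produces; `wd:Q2807` is the Wikidata item for Madrid):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wd:  <http://www.wikidata.org/entity/> .

# Illustrative reconciliation link from a local entity to its Wikidata item.
<http://example.org/madrid-energy/city/Madrid> owl:sameAs wd:Q2807 .
```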
A lightweight ontology specific to the data domain was created, defining relationships and characteristics of the dataset elements (such as buildings, districts, neighbourhoods, measures and sensors).
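As a rough sketch of what such a lightweight ontology can look like (all class and property names below are illustrative placeholders, not the project's actual terms):

```turtle
@prefix ex:   <http://example.org/madrid-energy#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Core classes of the domain.
ex:Building      a owl:Class .
ex:District      a owl:Class .
ex:Neighbourhood a owl:Class .
ex:Sensor        a owl:Class .
ex:Measure       a owl:Class .

# Relationships between them.
ex:locatedIn a owl:ObjectProperty ;
  rdfs:domain ex:Building ;
  rdfs:range  ex:Neighbourhood .

ex:partOf a owl:ObjectProperty ;
  rdfs:domain ex:Neighbourhood ;
  rdfs:range  ex:District .

ex:hasMeasure a owl:ObjectProperty ;
  rdfs:domain ex:Sensor ;
  rdfs:range  ex:Measure .
```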
YARRRML (Yet Another RDF Rules Mapping Language) was used to define the mapping rules for our dataset. We used Matey, a tool that converts YARRRML files into RML mapping files. The RML mapping was then used to generate the RDF graph with morph, a powerful tool that transforms regular data into Linked Data from a mapping definition.
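A minimal YARRRML sketch of this kind of mapping might look as follows (the file path, column names and ontology terms are assumptions for illustration, not the project's actual rules):

```yaml
# Illustrative YARRRML mapping: one CSV row becomes one ex:Building resource.
prefixes:
  ex:  "http://example.org/madrid-energy#"
  xsd: "http://www.w3.org/2001/XMLSchema#"

mappings:
  building:
    sources:
      - ['csv/consumption.csv~csv']          # hypothetical source file
    s: http://example.org/madrid-energy/building/$(building_id)
    po:
      - [a, ex:Building]
      - [ex:locatedIn, http://example.org/madrid-energy/neighbourhood/$(neighbourhood_id)~iri]
      - [ex:consumption, $(kwh), xsd:decimal]
```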
The knowledge graph was published on an endpoint using Helio Publisher. This allowed us to query the graph and use the retrieved data in our application.
This enabled SPARQL queries over the knowledge graph, allowing information to be extracted not only from our RDF graph but also from the Wikidata graph.
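For example, a federated query can combine consumption data from our graph with population figures from Wikidata in a single request (the `ex:` terms and the `owl:sameAs` link are illustrative assumptions; `wdt:P1082` is Wikidata's population property):

```sparql
PREFIX ex:  <http://example.org/madrid-energy#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?district ?consumption ?population WHERE {
  # Local data from our knowledge graph.
  ?district a ex:District ;
            ex:totalConsumption ?consumption ;
            owl:sameAs ?wikidataItem .
  # Population fetched live from the Wikidata endpoint.
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wdt:P1082 ?population .
  }
}
```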
To ensure the consistency of our Knowledge Graph (KG), we validated it against a set of SHACL shapes. To keep things simple, we used Astrea, a tool that automatically generates basic SHACL shapes from an ontology, and validated our KG before the final step.
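The generated shapes are roughly of the following form (a hand-written sketch with illustrative `ex:` terms, not Astrea's actual output):

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/madrid-energy#> .

# Every building must be located in at least one district/neighbourhood,
# and its consumption value must be a decimal.
ex:BuildingShape a sh:NodeShape ;
  sh:targetClass ex:Building ;
  sh:property [
    sh:path ex:locatedIn ;
    sh:minCount 1 ;
  ] ;
  sh:property [
    sh:path ex:consumption ;
    sh:datatype xsd:decimal ;
  ] .
```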
After all these steps, we developed a simple React.js application to test the quality and power of our KG.
The main page of the application lists all the neighbourhoods and districts of Madrid. The user can select any of these locations and access statistics about the energy consumption of all the buildings in that location, the population of the location, an interactive map, and a simple image showing where it sits within Madrid.
All of this data is obtained from our KG and from the Wikidata links we defined in the reconciliation step.
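As a sketch of how the application can retrieve this data, assuming the published endpoint follows the standard SPARQL protocol (GET with a `query` parameter and JSON results; the query and the `ex:` terms below are illustrative, not the app's actual code):

```js
// Hypothetical helper to query the published SPARQL endpoint.
const ENDPOINT = "http://localhost:9000/api/sparql";

async function runQuery(query) {
  const response = await fetch(`${ENDPOINT}?query=${encodeURIComponent(query)}`, {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!response.ok) throw new Error(`SPARQL request failed: ${response.status}`);
  const json = await response.json();
  // SPARQL JSON results: one binding object per result row.
  return json.results.bindings;
}

// Example: list all districts in the knowledge graph.
runQuery(`
  PREFIX ex: <http://example.org/madrid-energy#>
  SELECT ?district WHERE { ?district a ex:District }
`).then((rows) => rows.forEach((row) => console.log(row.district.value)));
```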
In order to make the app work, here are some guidelines:
- Install the app dependencies:

  ```bash
  cd app
  npm install
  ```

- Publish the RDF file: you will need to publish `rdf/result-with-links.ttl` on your host machine under the `http://localhost:9000/api/sparql` endpoint so the app can access the data. You can use Helio Publisher to do so.

- Start the app:

  ```bash
  cd app
  npm run dev
  ```
Note: Due to the browsers' CORS policy, access to the data will be blocked (the endpoint and the app share the same hostname but run on different ports, so the browser treats them as different origins). Since CORS cannot be configured in Helio, to test the app you will need to disable CORS enforcement in the browser tab where you access it. (ONLY FOR TESTING)
We'd also like to thank the OEG for teaching and guiding us throughout this project.
This work is licensed under a Creative Commons Attribution 4.0 International license.