Performant creation of aggregated data #4

SvenLieber · 2021-07-15T17:12:45Z

To materialize aggregated information about collections, item-level Knowledge Graph elements need to be queried.
To this end, one n-triples file is created for every WARC file.

However, some WARC files may result in large n-triples files. In particular, an initial crawl fetching the last 3,600 tweets from multiple accounts led to at least 1 GB of n-triples in our initial tests.

Just as the KG generation is triggered by a warc_created message, an HDT compression step could be triggered by a message emitted once the KG generation finishes. Alternatively, the KG generation component could perform the HDT compression itself, immediately after writing the n-triples file.
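A minimal sketch of the first option, to make the idea concrete. Everything named here is an assumption for illustration and not part of the existing components: the `kg_created` message and its `ntriples_path` field are hypothetical, and the command assumes an `rdf2hdt` executable (for example, from hdt-cpp) is on the PATH and takes the input file followed by the output file.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def hdt_path_for(nt_path: Path) -> Path:
    """Place the HDT file next to the n-triples file, swapping the suffix."""
    return nt_path.with_suffix(".hdt")


def build_hdt_command(nt_path: Path) -> list[str]:
    # Assumption: `rdf2hdt <input> <output>` is available on PATH.
    return ["rdf2hdt", str(nt_path), str(hdt_path_for(nt_path))]


def on_kg_created(message: dict, run=subprocess.run) -> Path:
    """Handle a hypothetical `kg_created` message by compressing its file.

    `run` is injectable so the handler can be exercised without the CLI.
    """
    nt_path = Path(message["ntriples_path"])
    # check=True raises if the compression command exits with an error.
    run(build_hdt_command(nt_path), check=True)
    return hdt_path_for(nt_path)
```

For the second option, the KG generation component could call `on_kg_created` directly after writing the n-triples file instead of publishing a message.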


SvenLieber added this to the Materialize aggregated information milestone Jul 15, 2021

SvenLieber self-assigned this Jul 15, 2021
