Performant creation of aggregated data #4

Open
SvenLieber opened this issue Jul 15, 2021 · 0 comments

To materialize aggregated information about collections, item-level Knowledge Graph elements need to be queried.
To this end, one N-Triples file is created for every WARC file.

However, some WARC files can result in large N-Triples files. In particular, an initial crawl fetching the last 3,600 tweets from multiple accounts led to at least 1 GB of N-Triples in our initial tests.
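
To illustrate why file size matters here, the following is a minimal sketch of an item-level aggregation that streams over the per-WARC N-Triples files line by line. The function name `count_predicates` and the choice of aggregate (triples per predicate) are assumptions made for illustration, not the project's actual query. Even when streamed, every aggregate requires a full pass over each file.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_predicates(nt_files: Iterable[Path]) -> Counter:
    """Count triples per predicate across several N-Triples files.

    Hypothetical helper: reads each file line by line so that a
    multi-gigabyte file never has to be loaded into memory at once.
    """
    counts: Counter = Counter()
    for nt_file in nt_files:
        with open(nt_file, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                # Skip blank lines and N-Triples comments.
                if not line or line.startswith("#"):
                    continue
                # Subject and predicate never contain whitespace in
                # N-Triples, so the second token is the predicate.
                parts = line.split(None, 2)
                if len(parts) == 3:
                    counts[parts[1]] += 1
    return counts
```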

Similar to how the KG generation is triggered by a warc_created message, an HDT compression could be triggered by the KG generation. Alternatively, the KG generation component could perform the HDT compression itself, immediately.
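
The first option could be sketched roughly as follows. This assumes a hypothetical follow-up message (here handled by `on_kg_created`) that carries the path of the generated file under an assumed `nt_path` key, and it assumes the `rdf2hdt` command-line tool from hdt-cpp is installed and accepts `-f ntriples`. None of these names come from the existing components.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def hdt_path_for(nt_path: Path) -> Path:
    """Derive the HDT output path by swapping the file extension."""
    return nt_path.with_suffix(".hdt")


def build_rdf2hdt_command(nt_path: Path) -> List[str]:
    """Build the command line for the (assumed) rdf2hdt tool."""
    return ["rdf2hdt", "-f", "ntriples", str(nt_path), str(hdt_path_for(nt_path))]


def on_kg_created(
    message: dict,
    run: Optional[Callable[[Sequence[str]], object]] = None,
) -> Path:
    """Handle a hypothetical 'KG generated' message by compressing to HDT.

    `message["nt_path"]` is an assumed field name. `run` can be replaced
    in tests; by default the command is executed as a subprocess.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    nt_path = Path(message["nt_path"])
    run(build_rdf2hdt_command(nt_path))
    return hdt_path_for(nt_path)
```

For the second option, the KG generation component could call `on_kg_created` (or just `build_rdf2hdt_command`) directly after writing the N-Triples file, without any additional message.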

@SvenLieber SvenLieber self-assigned this Jul 15, 2021