Support batching/chunking #3

Open
dachafra opened this issue Oct 21, 2020 · 1 comment

Comments

@dachafra
Member

issue: Have you tested your tool on 10M rows? Does it build a full triples model in memory before writing any output? If so, that's not scalable.

suggestion:

  1. The tool must support streaming, i.e. it should output each block of triples as soon as it is ready, instead of gathering all triples in memory and dumping them at the very end.
  2. Further, using intermediate storage (e.g. rdf4j In-memory or Native) is slow (e.g. 30x slower than direct output).
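
To make the suggestion concrete, here is a minimal sketch of chunked streaming output in plain Python. It is not taken from any of the tools discussed here. The `id` subject column, the `http://example.org/` base IRI, and the function names `row_to_triples` and `stream_triples` are all hypothetical, and it assumes column names and ids are safe to embed in IRIs. The point is only that each block of triples is written as soon as it is ready, so memory use is bounded by `chunk_size` rather than by the size of the input.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_triples(row: dict, base: str = "http://example.org/") -> Iterator[str]:
    """Yield N-Triples lines for one row.

    Hypothetical mapping: the 'id' column forms the subject IRI and every
    other non-empty column becomes a predicate with a plain literal object.
    """
    subject = f"<{base}row/{row['id']}>"
    for column, value in row.items():
        if column == "id" or value == "":
            continue
        yield f'{subject} <{base}{column}> "{escape_literal(value)}" .\n'


def stream_triples(source: TextIO, sink: TextIO, chunk_size: int = 1000) -> int:
    """Read CSV rows in chunks and write each chunk's triples immediately.

    Only one chunk of rows (and its triples) is held in memory at a time.
    Returns the number of triples written.
    """
    reader = csv.DictReader(source)
    written = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        lines = [triple for row in chunk for triple in row_to_triples(row)]
        sink.writelines(lines)  # flush this block before reading the next one
        written += len(lines)
    return written


if __name__ == "__main__":
    src = io.StringIO("id,name,city\n1,Ada,London\n2,Bob,\n3,Eve,Paris\n")
    out = io.StringIO()
    count = stream_triples(src, out, chunk_size=2)
    print(count)
    print(out.getvalue(), end="")
```

In real use `source` and `sink` would be open file handles rather than `io.StringIO` objects. Writing straight to the sink also avoids the intermediate store mentioned in point 2.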
@VladimirAlexiev changed the title from "Support Streaming" to "Support batching/chunking" on Oct 23, 2020
@dachafra
Member Author

dachafra commented Mar 2, 2022

This is already available in tools such as Morph-KGC, RMLStreamer, or SDM-RDFizer. I think it's out of the scope of the community group at the moment. If there are no concerns, I'll close it...

@dachafra transferred this issue from kg-construct/mapping-challenges on Mar 2, 2022