Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share code between RunBuilder/WriteBufferWriter and RunReader/WriteBufferReader #516

Open
wenkokke opened this issue Jan 3, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@wenkokke
Copy link
Collaborator

wenkokke commented Jan 3, 2025

There is a fair amount of duplication between RunBuilder and WriteBufferWriter as well as RunReader and WriteBufferReader. It would be good for maintainability to extract the shared functionality.

@jorisdral wrote in #478:

Regarding deduplication, there are two avenues I can think of.

Deduplicating WriteBufferReader and RunReader. And my thinking is that we could create a more general KOpsReader type that can be instantiated to be either a WriteBufferReader or RunReader. We should probably do this first, because it requires fewer changes than the next avenue

Deduplicating WriteBufferWriter and RunBuilder. My thinking is that we could construct this hierarchy:

                      /-------- WriteBufferBuilder
                     /
KOpsAcc <- KOpsBuilder <---------- RunBuilder
                            /
ProbeAcc <- ProbeBuilder <-/

RunAcc is split up into KOpsAcc and ProbeAcc. The former takes care of building pages for key/ops and blob files. The latter takes care of building the bloom filters and fence pointer indices. RunBuilder is split in two in the same way, doing the IO for the Accumulators. The new RunBuilder is the composition of a KOPsBuilder and a ProbeBuilder. The WriteBufferBuilder is just a small wrapper around a KOpsBuilder

@wenkokke wenkokke added the enhancement New feature or request label Jan 3, 2025
@jorisdral
Copy link
Collaborator

jorisdral commented Jan 17, 2025

I've been prototyping my suggestion a little bit today, and I'm not so sure it's going to work out nicely. It seems that modularising the components in the way I suggested above adds quite a bit of overhead in terms of lines of code, cognitive overhead, and performance. This overhead is arguably more sizeable than maintaining a separate write buffer reader/writers and run reader/writers. Moreover, having two separate code paths means that we can optimise better for the specific use cases

I do think we should still look into deduplicating the bits that we can, but the component refactoring I suggested above should probably be scrapped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants