-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DISCUSSION]: Unified approach for joins to output batches close to batch_size
#14238
Comments
The direction proposed by @berkaysynnada is worth to discuss. The join specifics doesn't guarantee output batch size in records. It can much much smaller or even empty because of filtering, and it can be much larger because of join explosions. The idea to discuss how we can make the output batches after joins to be more uniform and close to configured One of the options is to use |
I'd suggest to rename "splitting" part of the problem to "restricting" -- if join is able to produce a batch that needs to be splitted (event if this batch exists only internally), than it already may be issue, which may hurt on some specific cases. I also think that In this case (for splitting / restricting), I think, what @berkaysynnada suggests:
is a better fit than separate operators on top of join -- each join operator should by itself be able to limit / restrict its internally created record batches to prevent excessive accumulation of data in memory (or at least, if it's required, to track them via memory reservations). |
thanks @korowa totally agree for the memory perspective, having splitter won't help as the memory already allocated for the batch. However another path related to coalesce might help downstream nodes or direct consumer not to struggle because of swarm of small batches. More uniform method for all joins is to call |
I don't have a strong opinion here -- intuitively it seems like embedding coalescer into filtering operators (not only joins) could be beneficial for query execution time just because there will be less operators in the pipeline, but it still should be checked and somehow measured. I'll try to come up with some POC during this weekend for coalescer in e.g. FilterExec (this one just seems to be the easiest to implement) -- the idea is that if it'll work well for filters, than joins would also benefit from it, otherwise -- having separate operator would make more sense (at least for now). |
Simple embedding of coalescer into filter (branch commit) gave following tpch results: iterations = 50, target_partitions = 1
iterations = 50, target_partitions = 4
I tend to interpret these numbers as no difference between the two approaches so I don't see (yet) a rationale for embedding coalescer into the operators, and it seems to be fine to leave it as it is. |
As a further step, I'd like to also try it for HashJoin because of this point
Now hash join produces batches with less than or equal to batch_size rows, and in cases when the number of output rows is greater than the number of input rows (some non-unique keys on build side) it may end up with a sequence of batches sized as
where small batches are the result of key non-uinqueness. This sequence won't be compacted by CoalesceBatchesExec, due to full-sized batches, but having internal join buffer for the output may help with better output batch size alignment -> less number of batches. |
On the other hand,
BatchSplitter
is used in other joins, and SMJ could (should) have it too, as there is no other way of splitting the batches according to target batch size.I've thought about this, and I believe the most optimal solution is to make all join operators capable of performing both coalescing and splitting in a built-in manner. This is because the output of a join can either be smaller or larger than the target batch size. Ideally, there should be no need (or only minimal need) for CoalesceBatchesExec.
To achieve this built-in coalescing and splitting, we can leverage existing tools like BatchSplitter and BatchCoalescer (although there are no current examples of BatchCoalescer being used in joins). My suggestion is to generalize these tools so they can be utilized by any operator and applied wherever this mechanism is needed. As this pattern becomes more common, it will be easier to expand its usage and simplify its application.
Originally posted by @berkaysynnada in #14160 (comment)
The text was updated successfully, but these errors were encountered: