-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multithreaded operators #78
base: main
Are you sure you want to change the base?
Add multithreaded operators #78
Conversation
nit: Is formatting correct? Maybe we could run |
@@ -0,0 +1,18 @@ | |||
-- Create a temporary table for testing | |||
CREATE TEMPORARY TABLE test_table ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you are testing heap table, but not columnstore table here
- no need to be a temp table
- primary key and auto increment column impacts parallelism
|
||
select * from test_table; | ||
-- Drop the temporary table | ||
DROP TABLE test_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: new line after each file
} | ||
return SinkResultType::NEED_MORE_INPUT; | ||
} | ||
|
||
SinkCombineResultType Combine(ExecutionContext &context, OperatorSinkCombineInput &input) const override { | ||
auto &gstate = input.global_state.Cast<ColumnstoreDeleteGlobalState>(); | ||
auto &lstate_delete = input.local_state.Cast<ColumnstoreDeleteLocalState>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just lstate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
SinkCombineResultType Combine(ExecutionContext &context, OperatorSinkCombineInput &input) const override { | ||
auto &gstate = input.global_state.Cast<ColumnstoreDeleteGlobalState>(); | ||
auto &lstate_delete = input.local_state.Cast<ColumnstoreDeleteLocalState>(); | ||
gstate.row_ids.insert(lstate_delete.local_row_ids.begin(), lstate_delete.local_row_ids.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need lock on gstate to ensure thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, ofc, I misinterpreted Combine to be thread safe from the documentation, but multiple combines can run concurrently, added a lock
@@ -21,6 +21,11 @@ class ColumnstoreDeleteGlobalState : public GlobalSinkState { | |||
ColumnDataCollection return_collection; | |||
}; | |||
|
|||
class ColumnstoreDeleteLocalState : public LocalSinkState { | |||
public: | |||
unordered_set<row_t> local_row_ids; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i would just name it row_ids
since the meaning is clear from the context, e.g. lstate.row_ids
vs gstate.row_ids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -101,5 +121,4 @@ unique_ptr<PhysicalOperator> Columnstore::PlanDelete(ClientContext &context, Log | |||
del->children.push_back(std::move(plan)); | |||
return std::move(del); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add back the new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
bool IsSink() const override { | ||
return true; | ||
} | ||
|
||
bool ParallelSink() const override { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that DuckDB doesn't always parallelize its PhysicalInsert
(See DuckCatalog::PlanInsert
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked if the plan supports parallelism and if number of threads > 1, as done in PhysicalInsert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuckDB also doesn't parallelize PhysicalInsert
when there's RETURNING
: executor(context, bound_defaults), insert_count(0), return_collection(context, types) { | ||
: executor(context, bound_defaults), insert_count(0), return_collection(context, types) {} | ||
|
||
ExpressionExecutor executor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not thread-safe to put in global state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, replicated what PhysicalInsert was doing with having this in the local state
} | ||
} | ||
if (return_chunk) { | ||
gstate.return_collection.Append(gstate.chunk); | ||
lstate.return_collection.Append(lstate.chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuckDB directly writes to gstate.return_collecion
. It appears that Append is thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if Append is thread safe or not, atleast the documentation doesn't mention that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, ColumnDataCollection::Append()
is not thread-safe
I mis-read PhysicalInsert::Sink()
that gstate.return_collection.Append()
is only used under !parallel
branch
You also need to parallelize |
Fixes #72