GH-39096: [Python] Release GIL in .nbytes (#39097)
### Rationale for this change

The `.nbytes` property holds the GIL while computing the data size in C++, which has caused performance issues in Dask because threads block each other.
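
A minimal sketch of the access pattern that runs into this: several worker threads reading `.nbytes` at the same time. `total_nbytes_threaded` is a hypothetical helper for illustration, not part of pyarrow or Dask, and standard-library `memoryview` objects stand in for Arrow data here because they expose the same attribute name. With pyarrow objects, the attribute read is where the C++ size computation happens, so while the GIL is held these reads run one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import attrgetter


def total_nbytes_threaded(objs, max_workers=4):
    """Sum the .nbytes of each object, reading the attribute from worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker reads .nbytes on one object. For pyarrow objects
        # (an assumption; only memoryview is used below) this read triggers
        # the size computation in C++.
        return sum(pool.map(attrgetter("nbytes"), objs))


# Stand-in objects from the standard library.
buffers = [memoryview(bytes(1024)), memoryview(bytes(2048))]
print(total_nbytes_threaded(buffers))  # 3072
```

The helper only relies on duck typing, so the same call should work on any objects that expose `.nbytes`.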

See #39096

### Are these changes tested?

I am not sure if additional tests are necessary here. If so, I'm happy to add them but would welcome some pointers.

### Are there any user-facing changes?

No

* Closes: #39096

Authored-by: Hendrik Makait <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
hendrikmakait authored Dec 7, 2023
1 parent f7286a9 commit 6e61c5e
Showing 2 changed files with 12 additions and 8 deletions.
5 changes: 3 additions & 2 deletions python/pyarrow/array.pxi
@@ -1206,8 +1206,9 @@ cdef class Array(_PandasConvertible):
         cdef:
             CResult[int64_t] c_size_res

-        c_size_res = ReferencedBufferSize(deref(self.ap))
-        size = GetResultValue(c_size_res)
+        with nogil:
+            c_size_res = ReferencedBufferSize(deref(self.ap))
+            size = GetResultValue(c_size_res)
         return size

     def get_total_buffer_size(self):
15 changes: 9 additions & 6 deletions python/pyarrow/table.pxi
@@ -248,8 +248,9 @@ cdef class ChunkedArray(_PandasConvertible):
         cdef:
             CResult[int64_t] c_res_buffer

-        c_res_buffer = ReferencedBufferSize(deref(self.chunked_array))
-        size = GetResultValue(c_res_buffer)
+        with nogil:
+            c_res_buffer = ReferencedBufferSize(deref(self.chunked_array))
+            size = GetResultValue(c_res_buffer)
         return size

     def get_total_buffer_size(self):
@@ -2386,8 +2387,9 @@ cdef class RecordBatch(_Tabular):
         cdef:
             CResult[int64_t] c_res_buffer

-        c_res_buffer = ReferencedBufferSize(deref(self.batch))
-        size = GetResultValue(c_res_buffer)
+        with nogil:
+            c_res_buffer = ReferencedBufferSize(deref(self.batch))
+            size = GetResultValue(c_res_buffer)
         return size

     def get_total_buffer_size(self):
@@ -4337,8 +4339,9 @@ cdef class Table(_Tabular):
         cdef:
             CResult[int64_t] c_res_buffer

-        c_res_buffer = ReferencedBufferSize(deref(self.table))
-        size = GetResultValue(c_res_buffer)
+        with nogil:
+            c_res_buffer = ReferencedBufferSize(deref(self.table))
+            size = GetResultValue(c_res_buffer)
         return size

     def get_total_buffer_size(self):
