Possible deadlock in tensorstore #206
Comments
The stack trace indicates that your process is stuck waiting for the async tensorstore operation to complete. Can you get a backtrace of the other (non-Python) threads? The only thing that stands out so far is that you have concurrency set to 1 with 0 bytes for the cache pool. Edit: I tried to set up a long-running benchmark-style test of this and couldn't get it to trigger. Additional debugging is required to determine what's going on here.
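For reference, those settings correspond to context resources along these lines (a reconstruction of what this reply describes, not the reporter's exact configuration):

```python
# What "concurrency set to 1 with 0 bytes for the cache pool" maps to in a
# TensorStore context spec (reconstructed from this reply, not the reporter's file).
context_json = {
    "data_copy_concurrency": {"limit": 1},   # a single data-copy thread
    "cache_pool": {"total_bytes_limit": 0},  # chunk cache effectively disabled
}
```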
Hey, I've also been experiencing this issue when forking while accessing a tensorstore dataset, with the same infinite hang. Would love an update!
Forking is known not to work because TensorStore uses multiple threads internally. That is unfortunately not something that can be fixed. Instead, you need to ensure that forking happens before any threads are started, i.e. before doing anything with TensorStore.
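A minimal sketch of that pattern, assuming a multiprocessing-based setup (the zarr/memory spec and names below are illustrative, not from this thread): use the "spawn" start method, or fork before any TensorStore work, so worker processes never inherit TensorStore's internal threads, and open the store only inside each worker.

```python
# Hedged sketch: start workers with "spawn" and touch TensorStore only inside
# the child processes, so no process ever forks after TensorStore's threads exist.
# The zarr/memory spec is a stand-in for the real volume.
import multiprocessing as mp

def worker(i: int) -> int:
    import tensorstore as ts  # imported/opened only after the worker process starts
    store = ts.open(
        {"driver": "zarr", "kvstore": {"driver": "memory"}},
        create=True, dtype=ts.int32, shape=[100],
    ).result()
    store[i].write(i).result()
    return int(store[i].read().result())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # avoids fork() after threads have been started
    with ctx.Pool(2) as pool:
        print(pool.map(worker, range(4)))
```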
Ah ok! That makes sense, thanks for the quick reply!
I've been encountering deadlocks when using tensorstore. I'm posting this issue somewhat reluctantly because I'm not 100% sure that tensorstore is to blame. If you have any thoughts or comments, let me know.
(BTW, I am using Linux, Python 3.12.6, and tensorstore 0.1.67. I see that the current version is 0.1.69, so I'll try upgrading.)

In my particular use case, I'm exporting a large array from a bespoke database into a sharded precomputed volume. I'm using a cluster, but I'm careful to make sure that my workers' tasks are aligned to the shard shape. In addition to writing the shards, I do occasionally have to read from the volume.

After running for a few hours, my code deadlocked. After inspecting all thread stacks for all Python workers, only one appeared problematic: it was stuck in a tensorstore function. (All the other threads were just sitting in their base worker event loop, waiting for new tasks.)

The particular line of code it was stuck on is shown below; it happens to be reading:
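(A hedged sketch of that kind of blocking read, with illustrative slice bounds and names rather than the actual line from the workload:)

```python
# Hedged sketch of a typical synchronous read via the TensorStore Python API;
# the slice bounds and names are illustrative, not the actual code.
import tensorstore as ts

def read_block(store: ts.TensorStore, z0: int, z1: int, y0: int, y1: int, x0: int, x1: int):
    # .read() starts an asynchronous read and returns a Future; .result() blocks
    # the calling thread until the read completes, which is the kind of wait the
    # thread stack showed.
    return store[z0:z1, y0:y1, x0:x1].read().result()
```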
...where store had been previously initialized using the following spec and context configuration:

context and spec
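A hedged sketch of what such an open call, spec, and context might look like for a sharded precomputed volume; the path and scale index are placeholders, and the cache-pool and concurrency limits simply mirror the values mentioned in the earlier reply:

```python
# Hedged sketch only: the real spec/context were posted in the collapsed
# "context and spec" section. The path and scale_index are placeholders; the
# cache_pool and data_copy_concurrency limits mirror the earlier reply.
import tensorstore as ts

context = ts.Context({
    "cache_pool": {"total_bytes_limit": 0},   # no in-memory chunk cache
    "data_copy_concurrency": {"limit": 1},    # single data-copy thread
})

spec = {
    "driver": "neuroglancer_precomputed",     # sharded "precomputed" volume
    "kvstore": {"driver": "file", "path": "/path/to/volume/"},
    "scale_index": 0,
}

store = ts.open(spec, context=context, read=True, write=True).result()
```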
To see if I could drill down a bit more, I attached to the running process with gdb and obtained the backtrace for the relevant thread, shown below. This seems to indicate that it's stuck in GetResult(), but I can't say much more than that.

gdb backtrace