`test_threading` failing randomly on CI #129266

kumaraditya303 · 2025-01-24T17:05:55Z

See https://github.com/python/cpython/actions/runs/12953185538/job/36134744339?pr=128869

test_reinit_tls_after_fork (test.test_threading.ThreadJoinOnShutdown.test_reinit_tls_after_fork) ... Warning -- Uncaught thread exception: AssertionError
Exception in thread Thread-34 (do_fork_and_wait):
Traceback (most recent call last):
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/threading.py", line 1054, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/threading.py", line 996, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/test_threading.py", line 1438, in do_fork_and_wait
    support.wait_process(pid, exitcode=50)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/support/__init__.py", line 2205, in wait_process
    raise AssertionError(f"process {pid} is still running "
                         f"after {dt:.1f} seconds")
AssertionError: process 10953 is still running after 300.2 seconds
ok

colesbury · 2025-01-24T17:17:12Z

The other failure in that same job is test_main_thread_after_fork_from_dummy_thread:

test_main_thread_after_fork_from_dummy_thread (test.test_threading.ThreadTests.test_main_thread_after_fork_from_dummy_thread) ... Timeout (0:10:00)!
Thread 0x0000000204a60f40 (most recent call first):
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/selectors.py", line 398 in select
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/subprocess.py", line 2121 in _communicate
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/subprocess.py", line 1218 in communicate
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/support/script_helper.py", line 155 in run_python_until_end
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/support/script_helper.py", line 165 in _assert_python
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/support/script_helper.py", line 182 in assert_python_ok
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/test_threading.py", line 836 in test_main_thread_after_fork_from_foreign_thread
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/test_threading.py", line 849 in test_main_thread_after_fork_from_dummy_thread
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/case.py", line 606 in _callTestMethod
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/case.py", line 660 in run
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/case.py", line 716 in __call__
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/suite.py", line 122 in run
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/suite.py", line 122 in run
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/unittest/runner.py", line 259 in run
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 58 in _run_suite
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 38 in run_unittest
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 136 in test_func
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 92 in regrtest_runner
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 139 in _load_run_test
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 184 in _runtest_env_changed_exc
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 293 in _runtest
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/single.py", line 322 in run_single_test
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/worker.py", line 83 in worker_process
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/worker.py", line 118 in main
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/test/libregrtest/worker.py", line 122 in <module>
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/runpy.py", line 88 in _run_code
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/runpy.py", line 198 in _run_module_as_main

kumaraditya303 · 2025-01-24T17:35:48Z

Another failure but in multiprocessing https://github.com/python/cpython/actions/runs/12954294733/job/36135983482?pr=129267

test_timeout (test.test_multiprocessing_spawn.test_threads.WithThreadsTestSemaphore.test_timeout) ... skipped 'test not appropriate for threads'
======================================================================
FAIL: test_default_timeout (test.test_multiprocessing_spawn.test_threads.WithThreadsTestBarrier.test_default_timeout)
Test the barrier's default timeout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\_test_multiprocessing.py", line 2291, in test_default_timeout
    self.assertEqual(len(results), barrier.parties)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0 != 5
----------------------------------------------------------------------
Ran 105 tests in 68.027s
FAILED (failures=1, skipped=27)

colesbury · 2025-01-24T19:57:00Z

I've run test_threading locally >500 times and unfortunately can't reproduce this (or any other failures in test_threading). Sometimes these failures only happen when the machine is heavily loaded -- I'll try it again with more runs simultaneously later.

colesbury · 2025-01-25T16:14:19Z

Encountered the following locally (while running 5 tests simulataneously):

======================================================================
FAIL: test_set_and_clear (test.test_threading.EventTests.test_set_and_clear)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/sgross/Projects/cpython/Lib/test/lock_tests.py", line 519, in test_set_and_clear
    with Bunch(f, N):
         ~~~~~^^^^^^
  File "/Users/sgross/Projects/cpython/Lib/test/lock_tests.py", line 77, in __exit__
    for _ in support.sleeping_retry(support.SHORT_TIMEOUT):
             ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgross/Projects/cpython/Lib/test/support/__init__.py", line 2545, in sleeping_retry
    for _ in busy_retry(timeout, err_msg, error=error):
             ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgross/Projects/cpython/Lib/test/support/__init__.py", line 2515, in busy_retry
    raise AssertionError(msg)
AssertionError: timeout (30.4 seconds)

----------------------------------------------------------------------
Ran 206 tests in 43.732s

colesbury · 2025-01-25T16:30:00Z

I'm increasingly confident that these are bugs in the test code and not in the threading module or other parts of Python.

For example, the test in test_set_and_clear has an inherent race condition: if the main thread sets and clears the event before all of the five background threads start waiting on the event, they will never get woken up and the tests times out. wait_threads_blocked just sleeps for 0.05 seconds. There's no guarantee that all the threads will have started executing and reached the event.wait() by then, especially if the system is heavily loaded as is sometimes the case in the CI.

cpython/Lib/test/lock_tests.py

Lines 510 to 527 in e119526

    
           def test_set_and_clear(self): 
        
               # gh-57711: check that wait() returns true even when the event is 
        
               # cleared before the waiting thread is woken up. 
        
               event = self.eventtype() 
        
               results = [] 
        
               def f(): 
        
                   results.append(event.wait(support.LONG_TIMEOUT)) 
        
               N = 5 
        
               with Bunch(f, N): 
        
                   # Threads blocked on event.wait() 
        
                   wait_threads_blocked(N) 
        
                   # Threads unblocked 
        
                   event.set() 
        
                   event.clear() 
        
               self.assertEqual(results, [True] * N)

kumaraditya303 · 2025-01-25T16:36:44Z

I think we should fix these tests to not rely on arbitrary sleeps for synchronization. In the past we had similar issues with using sleeps in asyncio tests so I ended up removing a lot of those and switch to barrier or other sync primitives, the same should be done for threading tests.

colesbury · 2025-01-25T17:27:45Z

It's not clear how to avoid the sleeps in this test. You need to wait until the other threads are blocked on event.wait(). How do you do that?

kumaraditya303 · 2025-01-25T18:16:45Z

I think in this case we can check the waiter count of the underlying condition used _cond. The waiter count isn't exposed as a public attribute unlike barrier's n_waiting so would need to check len(_cond._waiters).

kumaraditya303 · 2025-01-26T06:02:11Z

I tested the following by adding a artificial delay before waiting on the events in threads and seems to work.

diff --git a/Lib/test/lock_tests.py b/Lib/test/lock_tests.py
index 8c8f8901f00..9e9bbed4f2f 100644
--- a/Lib/test/lock_tests.py
+++ b/Lib/test/lock_tests.py
@@ -513,12 +513,14 @@ def test_set_and_clear(self):
         event = self.eventtype()
         results = []
         def f():
+            time.sleep(0.1)
             results.append(event.wait(support.LONG_TIMEOUT))
 
         N = 5
         with Bunch(f, N):
             # Threads blocked on event.wait()
-            wait_threads_blocked(N)
+            while len(event._cond._waiters) != N:
+                time.sleep(0.01)
 
             # Threads unblocked
             event.set()

kumaraditya303 added the topic-free-threading label Jan 24, 2025

picnixz added the tests Tests in the Lib/test dir label Jan 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`test_threading` failing randomly on CI #129266

`test_threading` failing randomly on CI #129266

kumaraditya303 commented Jan 24, 2025

colesbury commented Jan 24, 2025

kumaraditya303 commented Jan 24, 2025

colesbury commented Jan 24, 2025

colesbury commented Jan 25, 2025

colesbury commented Jan 25, 2025

kumaraditya303 commented Jan 25, 2025

colesbury commented Jan 25, 2025

kumaraditya303 commented Jan 25, 2025 •

edited

Loading

kumaraditya303 commented Jan 26, 2025

test_threading failing randomly on CI #129266

test_threading failing randomly on CI #129266

Comments

kumaraditya303 commented Jan 24, 2025

colesbury commented Jan 24, 2025

kumaraditya303 commented Jan 24, 2025

colesbury commented Jan 24, 2025

colesbury commented Jan 25, 2025

colesbury commented Jan 25, 2025

kumaraditya303 commented Jan 25, 2025

colesbury commented Jan 25, 2025

kumaraditya303 commented Jan 25, 2025 • edited Loading

kumaraditya303 commented Jan 26, 2025

`test_threading` failing randomly on CI #129266

`test_threading` failing randomly on CI #129266

kumaraditya303 commented Jan 25, 2025 •

edited

Loading