
cylc message traceback #6501

Open
oliver-sanders opened this issue Nov 28, 2024 · 3 comments
Labels
bug Something is wrong :(
Milestone
8.3.x

Comments

@oliver-sanders
Member

We have started seeing mysterious tracebacks in job.err files:

Exception ignored in: <function BaseEventLoop.__del__ at 0x7f2e1f2c25e0>
Traceback (most recent call last):
  File "~/cylc/lib/python3.9/asyncio/base_events.py", line 688, in __del__
    self.close()
  File "~/cylc/lib/python3.9/asyncio/unix_events.py", line 58, in close
    super().close()
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 87, in close
    self._close_self_pipe()
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 94, in _close_self_pipe
    self._remove_reader(self._ssock.fileno())
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 272, in _remove_reader
    key = self._selector.get_key(fd)
  File "~/cylc/lib/python3.9/selectors.py", line 191, in get_key
    return mapping[fileobj]
  File "~/cylc/lib/python3.9/selectors.py", line 72, in __getitem__
    fd = self._selector._fileobj_lookup(fileobj)
  File "~/cylc/lib/python3.9/selectors.py", line 226, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
  File "~/cylc/lib/python3.9/selectors.py", line 42, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
Exception ignored in: <function BaseEventLoop.__del__ at 0x7fc2f4f8ed30>
Traceback (most recent call last):
  File "~/cylc/lib/python3.9/asyncio/base_events.py", line 688, in __del__
    self.close()
  File "~/cylc/lib/python3.9/asyncio/unix_events.py", line 58, in close
    super().close()
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 87, in close
    self._close_self_pipe()
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 94, in _close_self_pipe
    self._remove_reader(self._ssock.fileno())
  File "~/cylc/lib/python3.9/asyncio/selector_events.py", line 272, in _remove_reader
    key = self._selector.get_key(fd)
  File "~/cylc/lib/python3.9/selectors.py", line 191, in get_key
    return mapping[fileobj]
  File "~/cylc/lib/python3.9/selectors.py", line 72, in __getitem__
    fd = self._selector._fileobj_lookup(fileobj)
  File "~/cylc/lib/python3.9/selectors.py", line 226, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
  File "~/cylc/lib/python3.9/selectors.py", line 42, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1

The traceback appears to be harmless; there is nothing to suggest that task messaging has failed in any way.

The error originates in a __del__ method:

  File "~/cylc/lib/python3.9/asyncio/base_events.py", line 688, in __del__
    self.close()
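Reading the traceback from the top: `__del__` calls `close()`, which unregisters the loop's internal self-pipe socket by passing `self._ssock.fileno()` to the selector. A socket that has already been closed reports `fileno() == -1`, and the selector rejects -1 with exactly the `ValueError` above. In the 3.9 sources the traceback points at, `SelectorEventLoop.close()` tears down the self-pipe before marking the loop closed, so one possible route here (a hypothesis, not a diagnosis) is a `close()` interrupted part-way and then retried by `__del__`. The sketch reproduces only the final step, using a plain `socket.socketpair()` as a stand-in for asyncio's private self-pipe.

```python
import selectors
import socket

# Stand-in for the event loop's self-pipe (asyncio's own is private).
ssock, csock = socket.socketpair()
ssock.close()
csock.close()

# Once closed, a socket reports -1 instead of its old descriptor.
fd = ssock.fileno()

message = None
sel = selectors.DefaultSelector()
try:
    # Mirrors the get_key() call made by _remove_reader in the traceback.
    sel.get_key(fd)
except ValueError as exc:
    message = str(exc)
finally:
    sel.close()

print(fd, message)  # prints "-1 Invalid file descriptor: -1"
```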

This is probably being called when the cylc message process exits (i.e. after it has successfully sent the message), so it should be harmless to the purpose of the command. However, it could indicate that we haven't closed an async resource properly. Whether or not that is the case, the pollution of job.err files is bad enough to warrant the bug label.
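If an unclosed loop does turn out to be the cause, one mitigation is to close every event loop explicitly rather than leaving it to `BaseEventLoop.__del__`, which only calls `close()` on loops that are not already closed. `asyncio.run()` creates and closes a fresh loop for you; where a loop must be managed by hand, a `try`/`finally` does the same. The helper `run_on_fresh_loop` and the coroutine `send_message` below are hypothetical, not Cylc code, and this is a sketch rather than a confirmed fix: an interrupt that lands inside `close()` itself can still leave a half-closed loop behind.

```python
import asyncio


def run_on_fresh_loop(coro):
    """Run ``coro`` on a new event loop and always close that loop.

    Hypothetical helper; unlike asyncio.run() it skips the extra cleanup
    of leftover tasks and async generators.
    """
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        # Close deterministically so BaseEventLoop.__del__ later finds the
        # loop already closed and has nothing left to do.
        loop.close()


async def send_message():
    # Placeholder for real work, e.g. sending a task message.
    await asyncio.sleep(0)
    return "sent"


print(run_on_fresh_loop(send_message()))  # prints "sent"
print(asyncio.run(send_message()))  # prints "sent"
```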

The traceback occurs repeatably for both Dave and me but, strangely, on different Cylc stacks: I can only produce it on one, Dave only on the other. Oh, the plot thickens 🤦!

I have been able to develop a simple script that can reproduce this error reliably-ish by using SIGINT to force a teardown of running code (although SIGINT is not involved in the cylc message examples above):

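import asyncio
import os
from functools import partial
from multiprocessing import Process
from subprocess import Popen
from time import sleep


def task(name: str, payload: dict):
    # Create, use and close a fresh event loop on every call.
    loop = asyncio.new_event_loop()
    # NOTE: loop= was deprecated in Python 3.8 and removed in 3.10.
    coro = asyncio.sleep(0, loop=loop)
    result = loop.run_until_complete(coro)
    loop.close()
    return result


def killer(pid):
    # Give the parent a moment to get going, then interrupt it mid-loop.
    sleep(0.1)
    Popen(['kill', '-s', 'SIGINT', str(pid)])


if __name__ == '__main__':
    # Send SIGINT to this process from a child shortly after start-up.
    p = Process(target=partial(killer, os.getpid()))
    p.start()

    for x in range(10000):
        task('x', {})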

Save this script as mess.py and run it like so:

for i in $(seq 1 10) ; do echo "$i"; ( timeout 2 python mess.py 2>&1 | grep 'Invalid file descriptor'); done

Although the script reproduces the error in a completely different way, perhaps it will help us track down the problem or develop a solution 🤷?

@oliver-sanders oliver-sanders added the bug Something is wrong :( label Nov 28, 2024
@oliver-sanders oliver-sanders added this to the 8.3.x milestone Nov 28, 2024
@hjoliver
Member

Are you able to try to reproduce with different Python versions? (3.9 is quite old now).

@oliver-sanders
Member Author

oliver-sanders commented Nov 29, 2024

Are you able to try to reproduce with different Python versions

Using the script above, I can reproduce the "Invalid file descriptor" traceback with Python 3.12 (remove the loop=loop argument to make it work).
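The `loop` argument to `asyncio.sleep()` was deprecated in Python 3.8 and removed in 3.10, which is why it has to go on 3.12. Dropping it is also fine on 3.9, because the coroutine object is only bound to a loop once `run_until_complete()` schedules it. A version-agnostic `task()` for the reproduction script might look like this (a sketch; the name and signature are simply kept from the script):

```python
import asyncio


def task(name: str, payload: dict):
    loop = asyncio.new_event_loop()
    # No loop= argument: the coroutine is bound to `loop` when
    # run_until_complete() schedules it, on 3.9 and 3.12 alike.
    coro = asyncio.sleep(0)
    result = loop.run_until_complete(coro)
    loop.close()
    return result


print(task('x', {}))  # prints "None": asyncio.sleep(0) returns None
```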

3.9 is quite old now

3.9 is the newest version Cylc UIS supports.

@hjoliver
Copy link
Member

hjoliver commented Dec 2, 2024

3.9 is quite old now

3.9 is the newest version Cylc UIS supports.

Yes, but this is a scheduler issue, and it seems to be fairly deep in the Python library, so I was just curious whether changes in recent years had fixed it.
