Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include copy of https://github.com/uktrade/iterable-subprocess #538

Merged
merged 14 commits into from
Nov 23, 2023

Conversation

mih
Copy link
Member

@mih mih commented Nov 15, 2023

This brings the preconditions for #539, which in itself has the evidence for the utility of this approach.

Historic context from here.

ATM` it is still unclear whether it will be adopted. This changeset is primarily a preparation for running the tests in all contexts that we need.

The sources where included via

git subtree add --prefix datalad_next/iterable_subprocess https://github.com/uktrade/iterable-subprocess.git main --squash

Closes #536
Closes #537

TODO:

  • some tests use external commands, generally not to be expected on windows: ls, cat, funzip. The first two should be replacable with dir and type, but the latter (from unzip) might need to become a conditional for running the test. It is installed on appveyor, though.
    SOLVED: All tools, also funzip come with Git (also on Windows). It may need a PATH adjustment on a particular machine, but not additional install.
  • some tests fail on windows. These all seem to be related to improper exit handling on windows. STDIN is closed unconditionally, which is not appreciated by Windows (as the underlying process is no longer running). This leads to OSError Errno 22. The conditions under which this happens are not completely clear to me right now. Simply catching OSError seems not enough.
    • test_if_process_closes_standard_input_but_exits_with_non_zero_error_code_then_broken_pipe_error FAILED
    • test_if_process_closes_standard_input_but_exits_with_non_zero_error_code_then_iterable_subprocess_error FAILED
    • test_program_that_outputs_for_a_long_time_is_interrupted_on_context_exit FAILED

Test log

FAILED ..\iterable_subprocess\test_iterable_subprocess.py::test_if_process_closes_standard_input_but_exits_with_non_zero_error_code_then_broken_pipe_error - OSError: [Errno 22] Invalid argument
FAILED ..\iterable_subprocess\test_iterable_subprocess.py::test_if_process_closes_standard_input_but_exits_with_non_zero_error_code_then_iterable_subprocess_error - OSError: [Errno 22] Invalid argument
FAILED ..\iterable_subprocess\test_iterable_subprocess.py::test_program_that_outputs_for_a_long_time_is_interrupted_on_context_exit - assert b'BrokenPipeError' in b'Traceback (most recent call last):\r\n  File "<string>", line 3, in <module>\r\nOSError: [Errno 22] Invalid argument\r\nException ignored in: <_io.TextIOWrapper name=\'<stdout>\' mode=\'w\' encoding=\'cp1252\'>\r\nOSError: [Errno 22] Invalid argument\r\n'
 +  where b'Traceback (most recent call last):\r\n  File "<string>", line 3, in <module>\r\nOSError: [Errno 22] Invalid argument\r\nException ignored in: <_io.TextIOWrapper name=\'<stdout>\' mode=\'w\' encoding=\'cp1252\'>\r\nOSError: [Errno 22] Invalid argument\r\n' = IterableSubprocessError(120, b'Traceback (most recent call last):\r\n  File "<string>", line 3, in <module>\r\nOSError: [Errno 22] Invalid argument\r\nException ignored in: <_io.TextIOWrapper name=\'<stdout>\' mode=\'w\' encoding=\'cp1252\'>\r\nOSError: [Errno 22] Invalid argument\r\n').stderr
 +    where IterableSubprocessError(120, b'Traceback (most recent call last):\r\n  File "<string>", line 3, in <module>\r\nOSError: [Errno 22] Invalid argument\r\nException ignored in: <_io.TextIOWrapper name=\'<stdout>\' mode=\'w\' encoding=\'cp1252\'>\r\nOSError: [Errno 22] Invalid argument\r\n') = <ExceptionInfo IterableSubprocessError(120, b'Traceback (most recent call last):\r\n  File "<string>", line 3, in <module>\r\nOSError... in: <_io.TextIOWrapper name=\'<stdout>\' mode=\'w\' encoding=\'cp1252\'>\r\nOSError: [Errno 22] Invalid argument\r\n') tblen=3>.value

@mih

This comment was marked as resolved.

It is not 100% clear to me whether this is correct, but
`OSError(errno22)` can and does occur when interacting with
file descriptors of dead processes.
Copy link

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (d0e3438) 91.24% compared to head (f880cc4) 92.66%.

Files Patch % Lines
...ad_next/iterable_subprocess/iterable_subprocess.py 97.70% 2 Missing ⚠️
datalad_next/itertools/itemize.py 95.45% 2 Missing ⚠️
datalad_next/itertools/decode_bytes.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   91.24%   92.66%   +1.41%     
==========================================
  Files         128      137       +9     
  Lines        9641    10031     +390     
  Branches     1045     1085      +40     
==========================================
+ Hits         8797     9295     +498     
+ Misses        825      712     -113     
- Partials       19       24       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mih
Copy link
Member Author

mih commented Nov 16, 2023

This is ready. I will now port previous attempts to this new implementation -- in separate PRs.

@mih
Copy link
Member Author

mih commented Nov 22, 2023

#539 has now shown that this code is both fast and usable.

I will now add a thin wrapper around it, such that we can extend without changing its nature too much (possibly ease future merges).

mih and others added 5 commits November 22, 2023 14:07
That module imports too many components with too many dependency that
also needs (these) constants. This creates import loops.

The change established a toplevel `consts` module with minimal
dependencies. The legacy location is kept and a comment
warns against further use.
This is a convenience wrapper around `datalad_next.iterable_subprocess`.
The main purpose is to have a place for documentation and feature
additions -- without adding needless diff to the original implementation
that was merged in from an external source.

The wrapper uses `COPY_BUFSIZE` as default chunk size (which is
numerically identical to the hardcoded default on POSIX systems).
This should be better than a separate literal.

A second different is the option to run a subprocess without any
explicit input.
There is not really a point to compatibility with versions that
are nearing EOL
For now containers processors for byte-string iterables and
functionality related to subprocess output.
@mih
Copy link
Member Author

mih commented Nov 23, 2023

OK, let's do this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alternative paradigm for concurrent batch processes Performance issues in batch-commands
2 participants