
[patch] Bump development status to Beta (#482)
* [patch] Bump development status to Beta

The filesystem interaction is stable and robust (i.e. recovery files get written when things go wrong), which has the knock-on effect that you can scale workflows to remote processes. Support for individual nodes running on remote processes beyond the lifetime of the parent python process is pretty rough, but also there.

* Extend docs

* Extend readme
liamhuber authored Sep 30, 2024
1 parent c57d387 commit d8728f1
Showing 4 changed files with 25 additions and 4 deletions.
5 changes: 3 additions & 2 deletions docs/README.md
@@ -17,18 +17,19 @@
`pyiron_workflow` is a framework for constructing workflows as computational graphs from simple python functions. Its objective is to make it as easy as possible to create reliable, reusable, and sharable workflows, with a special focus on research workflows for HPC environments.

Nodes are formed from python functions with simple decorators, and the resulting nodes can have their data inputs and outputs connected.
Unlike regular python functions, nodes execute in a delayed way: building and connecting them only constructs the graph, and computation happens when a node or workflow is actually run.
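
A minimal sketch of this pattern (the `Workflow.wrap.as_function_node` decorator follows the project's documented usage; exact names and signatures should be treated as assumptions):

```python
from pyiron_workflow import Workflow

# Wrap a plain python function as a node
@Workflow.wrap.as_function_node
def AddOne(x: int = 0) -> int:
    y = x + 1
    return y

node = AddOne(x=41)  # builds the node; nothing is computed yet
print(node.run())    # computation happens only now -> 42
```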

Users may specify the execution flow explicitly (although for data DAGs this is optional), so both cyclic and acyclic graphs are supported.
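
A sketch of explicit flow control (the `>>` execution-signal syntax and the `starting_nodes` attribute are assumptions based on the project docs):

```python
wf = Workflow("flow_example")
wf.first = AddOne(x=0)
wf.second = AddOne(x=wf.first)  # data connection: first's output feeds second

wf.first >> wf.second           # explicit execution order (optional for data DAGs)
wf.starting_nodes = [wf.first]  # where execution begins
wf.run()
```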

Type hints scraped from the decorated functions are (optionally) enforced on both new data values and new graph connections, making workflows strongly typed.
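
For example, with the `x: int` hint from the sketch above, an incompatible value can be rejected up front (the exact exception type is an assumption):

```python
node = AddOne()
try:
    node.inputs.x = "not an int"  # conflicts with the x: int hint
except (TypeError, ValueError):  # assumption: the precise exception may differ
    print("value rejected by type checking")
```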

Individual node computations can be shipped off to parallel processes for scalability. (This is a beta-feature at time of writing; standard python executors like `concurrent.futures.ThreadPoolExecutor` and `ProcessPoolExecutor` work, and the `Executor` executor from [`executorlib`](https://github.com/pyiron/executorlib) is supported and tested; `executorlib`'s more powerful flux- and slurm- based executors have not been tested and may fail.)
Individual node computations can be shipped off to parallel processes for scalability. Standard python executors like `concurrent.futures.ThreadPoolExecutor` and `ProcessPoolExecutor` work, but so does, e.g., the `Executor` executor from [`executorlib`](https://github.com/pyiron/executorlib), which facilitates running on HPC. It is also straightforward to run an entire graph on a remote process, e.g. a SLURM allocation, by locally saving the graph and remotely loading, running, and re-saving. Cf. [this notebook](../notebooks/hpc_example.ipynb) for some simple examples.
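
A sketch of per-node parallelism with a standard library executor (the assignment-style `executor` attribute follows the project docs; treat the details as assumptions):

```python
from concurrent.futures import ProcessPoolExecutor

wf = Workflow("parallel_example")
wf.a = AddOne(x=0)
wf.b = AddOne(x=wf.a)
wf.a.executor = ProcessPoolExecutor()  # this node's computation runs in a subprocess
wf.run()
```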

Once you're happy with a workflow, it can easily be turned into a macro for use in other workflows. This allows the clean construction of increasingly complex computation graphs by composing simpler graphs.
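
A sketch of such composition, assuming an `as_macro_node` counterpart to the function-node decorator:

```python
@Workflow.wrap.as_macro_node
def AddTwo(self, x=0):
    # Compose existing nodes into a reusable macro
    self.first = AddOne(x=x)
    self.second = AddOne(x=self.first)
    y = self.second
    return y

wf = Workflow("composed")
wf.add_two = AddTwo(x=40)
wf.run()
```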

Nodes (including macros) can be stored in plain text as python code, and imported by future workflows for easy access. This encourages and supports an ecosystem of useful nodes, so you don't need to re-invent the wheel. When these python files are in a properly managed git repository and released in a stable channel (e.g. conda-forge), they fulfill most requirements of the [FAIR](https://en.wikipedia.org/wiki/FAIR_data) principles.

Executed or partially-executed graphs can be stored to file, either by explicit call or automatically after running. These can be reloaded (automatically on instantiation, in the case of workflows) and examined/rerun, etc.
Executed or partially-executed graphs can be stored to file, either by explicit call or automatically after running. These can be reloaded (automatically on instantiation, in the case of workflows) and examined/rerun, etc. If your workflow fails, it will (by default) save a recovery file for you to restore it at the time of failure.
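
A sketch of the save/load cycle described above (method names and the reload-on-instantiation behavior are assumptions based on that description):

```python
wf = Workflow("storage_example")
wf.step = AddOne(x=1)
wf.run()
wf.save()  # explicit save; a checkpoint flag can also save automatically after runs

# A fresh instance with the same label finds the save and reloads automatically
reloaded = Workflow("storage_example")
print(reloaded.step.outputs.y.value)  # 2
```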

## Installation

12 changes: 12 additions & 0 deletions pyiron_workflow/mixin/run.py
@@ -33,6 +33,18 @@ class Runnable(UsesState, HasLabel, HasRun, ABC):
Child classes can optionally override :meth:`process_run_result` to do something
with the returned value of :meth:`on_run`, but by default the returned value just
passes cleanly through the function.
The `run` cycle is broken down into sub-steps:
- `_before_run`: prior to the `running` status being set to `True`
- `_run`: after the `running` status has been set to `True`
- `_finish_run`: what is done to the results of running, and when `running` is
set to `False`
- `_run_exception`: what to do if an exception is encountered while running
- `_run_finally`: what to do after _every_ run, regardless of whether an exception
was encountered
Child classes can extend the behavior of these sub-steps, including introducing
new keyword arguments.
"""

def __init__(self, *args, **kwargs):
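To make the sub-step decomposition in the docstring above concrete, a self-contained toy version of such a run cycle might look like this (an illustration of the pattern only, not pyiron_workflow's actual implementation):

```python
class ToyRunnable:
    """A toy illustration of the run cycle, not the real Runnable class."""

    def __init__(self):
        self.running = False
        self.failed = False

    def run(self, **kwargs):
        self._before_run(**kwargs)  # before the running status is set
        self.running = True
        try:
            result = self._run(**kwargs)     # the actual computation
            return self._finish_run(result)  # process results, unset running
        except Exception as error:
            self._run_exception(error)       # e.g. record a failed status
            raise
        finally:
            self._run_finally()              # always executed, exception or not

    def _before_run(self, **kwargs):
        pass  # e.g. readiness checks; child classes may extend with new kwargs

    def _run(self, **kwargs):
        return None

    def _finish_run(self, result):
        self.running = False
        return result

    def _run_exception(self, error):
        self.failed = True

    def _run_finally(self):
        self.running = False
```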
10 changes: 9 additions & 1 deletion pyiron_workflow/node.py
@@ -76,6 +76,8 @@ class Node(
- In addition to operations, some methods exist for common routines, e.g.
casting the value as `int`.
- When running their computation, nodes may or may not:
- If already running, check for serialized results left by an executor process
that survived the death of its parent python process
- First update their input data values using kwargs
- (Note that since this happens first, if the "fetching" step later occurs,
any values provided here will get overwritten by data that is flowing
@@ -100,10 +102,12 @@
the execution flow
- Running the node (and all aliases of running) return a representation of data
held by the output channels (or a futures object)
- If an error is encountered _after_ reaching the state of actually computing the
- If an error is encountered _after_ reaching the state of actually running the
node's task, the status will get set to failure
- Nodes can be instructed to run at the end of their initialization, but will exit
cleanly if they get to checking their readiness and find they are not ready
- Nodes can suppress raising errors they encounter by setting a runtime keyword
argument.
- Nodes have a label by which they are identified within their scope, and a full
label which is unique among the entire semantic graph they exist within
- Nodes can run their computation using remote resources by setting an executor
@@ -140,6 +144,10 @@
IO data is not pickle-able.
- Saving is triggered manually, or by setting a flag to make a checkpoint save
of the entire graph after the node runs.
- Saving the entire graph can be set to happen at the end of a particular
node's run with a checkpoint flag.
- A specially named recovery file for the entire graph will (by default) be
automatically saved if the node raises an exception.
- The pickle storage interface comes with all the same caveats as pickle and
is not suitable for storage over indefinitely long time periods.
- E.g., if the source code (cells, `.py` files...) for a saved graph is
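A sketch of the failure-handling behavior listed above (the recovery-file default comes from the diff; the `raise_run_exceptions` keyword name is an assumption):

```python
@Workflow.wrap.as_function_node
def Fails(x: int = 0) -> int:
    y = x // 0  # deliberately raises ZeroDivisionError at run time
    return y

wf = Workflow("failure_example")
wf.bad = Fails()

try:
    wf.run()  # default: status set to failed, recovery file saved, error re-raised
except Exception:
    print("graph failed; a recovery file was written by default")

# The raise can be suppressed with a runtime keyword argument
# (the name raise_run_exceptions is an assumption):
wf.bad.run(raise_run_exceptions=False)
```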
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -19,7 +19,7 @@ readme = "docs/README.md"
keywords = [ "pyiron",]
requires-python = ">=3.10, <3.13"
classifiers = [
"Development Status :: 3 - Alpha",
"Development Status :: 4 - Beta",
"Topic :: Scientific/Engineering",
"License :: OSI Approved :: BSD License",
"Intended Audience :: Science/Research",
