
[patch] Introduce caching #395

Merged
merged 7 commits into main from caching on Jul 30, 2024
Conversation

liamhuber
Member

I wasn't going to make caching the default, but since I'm just using a simple `==` comparison, it doesn't seem to introduce any meaningful slowdown. The default is easy enough to change in the future, in any case.

Closes #169

Description copied from the update to the deepdive:

Caching

By default, all nodes exploit caching: when they run, they save a fresh dictionary of their input values; on all subsequent runs, if the dictionary of their current input values matches (`==`) that last-used dictionary, they skip executing altogether and leverage their existing outputs.
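The mechanism can be sketched in plain Python. This is a hypothetical minimal node for illustration only; `SketchNode`, `run_count`, and the method names are not pyiron_workflow's real API, though `use_cache` and `cached_inputs` mirror the attributes discussed below:

```python
# Hypothetical sketch of the described caching scheme (NOT the real implementation):
# skip executing when the current inputs dict equals (==) the last-used one.
class SketchNode:
    def __init__(self, fn, use_cache=True):
        self.fn = fn
        self.use_cache = use_cache
        self.cached_inputs = None  # inputs dict from the last run
        self.outputs = None        # outputs from the last run
        self.run_count = 0         # illustrative: counts actual executions

    def run(self, **inputs):
        if self.use_cache and inputs == self.cached_inputs:
            return self.outputs    # cache hit: reuse existing outputs
        self.run_count += 1
        self.outputs = self.fn(**inputs)
        self.cached_inputs = inputs
        return self.outputs

node = SketchNode(lambda low, high: low + high)
node.run(low=0, high=999)  # executes
node.run(low=0, high=999)  # identical inputs: cache hit, skipped
node.run(low=1, high=999)  # inputs changed: executes again
print(node.run_count)      # 2
```

With `use_cache=False` the equality check is bypassed entirely, so every call executes fresh.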

Any changes to the inputs will obviously stop the cache from being retrieved, but for Composite nodes it is also reset if any child nodes are added/removed/replaced.
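The composite reset can be sketched similarly (hypothetical names, not the real pyiron_workflow API): whenever a child is added, removed, or replaced, the stored inputs are discarded so the next run executes fresh regardless of whether the inputs themselves changed.

```python
# Hypothetical sketch: a composite discards its cached inputs whenever its
# set of children changes, forcing a fresh run next time.
class SketchComposite:
    def __init__(self):
        self.children = {}
        self.cached_inputs = None

    def _clear_cache(self):
        self.cached_inputs = None  # no stored inputs -> no possible cache hit

    def add_child(self, label, node):
        self.children[label] = node
        self._clear_cache()

    def remove_child(self, label):
        del self.children[label]
        self._clear_cache()

    def replace_child(self, label, node):
        self.children[label] = node
        self._clear_cache()
```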

Note that since we do a simple `==` on the dictionary of input values, if your workflow non-idempotently passes around mutable data, it's possible to wind up with a false cache hit.
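To illustrate the pitfall, here is a hypothetical standalone example using a naive `==`-keyed cache (not pyiron_workflow itself): if a mutable value is mutated in place, the stored copy of the inputs is mutated right along with it, so the `==` check still passes and stale output is returned.

```python
# Hypothetical demonstration of a false cache hit with in-place mutation.
# A shallow copy of the inputs dict still shares the mutable list object.
_cache = {"inputs": None, "output": None}

def cached_sum(inputs):
    if _cache["inputs"] is not None and _cache["inputs"] == inputs:
        return _cache["output"]      # cache hit: return stored output
    _cache["output"] = sum(inputs["x"])
    _cache["inputs"] = dict(inputs)  # shallow copy: the list is shared!
    return _cache["output"]

data = [1, 2, 3]
print(cached_sum({"x": data}))  # 6 (fresh run)
data.append(10)                 # mutate in place: the real answer is now 16
print(cached_sum({"x": data}))  # 6 again: false cache hit, stale output
```

A deep copy of the inputs (or hashing immutable snapshots of them) would avoid this, at extra cost.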

Caching behaviour can be defined at the class-level as a default, but can be overridden for individual nodes. Let's take a look:

from pyiron_workflow import Workflow
import random

@Workflow.wrap.as_function_node(use_cache=False)
def Randint(low=0, high=999):
    rand = random.randint(low, high)
    return rand

wf = Workflow("mixed_caching")
wf.use_cache = False  # Turn _off_ caching for the whole workflow!

wf.always_new = Randint()
wf.cached = Randint()
wf.cached.use_cache = True  # Turn _on_ caching for this node

wf()
>>> {'always_new__rand': 598, 'cached__rand': 307}

Running the same workflow again, we see that the cached node just keeps returning the same "random" number, while the un-cached node gives us something new:

wf()
>>> {'always_new__rand': 492, 'cached__rand': 307}

If we look into the caching data, we can see that the non-caching node has not stored any inputs and does not register a cache hit. Even if something had previously been cached, switching to `use_cache = False` means we won't even look for a cache hit; we'll just produce new data!

for node in wf:
    print(node.label, node.inputs.to_value_dict(), node.cached_inputs, node.cache_hit)
>>> always_new {'low': 0, 'high': 999} None False
>>> cached {'low': 0, 'high': 999} {'low': 0, 'high': 999} True

- Wrap them in `emit()` and `emitting_channels` instead of manually calling them. This lets us tighten up If-like nodes too.
- Shortcut actually running a node and just return existing output if its cached input matches its current input (by `==` test).
- So it can be set at class definition time, even by decorators.


Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/caching


codacy-production bot commented Jul 30, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| +0.05% (target: -1.00%) | 98.48% |

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor (ddc1b1e) | 3470 | 3206 | 92.39% |
| Head (3be52ef) | 3506 (+36) | 3241 (+35) | 92.44% (+0.05%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#395) | 66 | 65 | 98.48% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@liamhuber liamhuber added the format_black trigger the Black formatting bot label Jul 30, 2024
@coveralls

coveralls commented Jul 30, 2024

Pull Request Test Coverage Report for Build 10157140074

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 34 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.05%) to 92.442%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| nodes/function.py | 1 | 98.28% |
| nodes/transform.py | 2 | 98.14% |
| nodes/composite.py | 6 | 91.35% |
| node.py | 11 | 94.57% |
| nodes/standard.py | 14 | 91.48% |
Totals Coverage Status:
- Change from base Build 10116646729: 0.05%
- Covered Lines: 3241
- Relevant Lines: 3506

💛 - Coveralls

@liamhuber liamhuber merged commit 41d8d42 into main Jul 30, 2024
16 of 17 checks passed
@liamhuber liamhuber deleted the caching branch July 30, 2024 21:59
Development

Successfully merging this pull request may close these issues.

Hashing input to avoid running
3 participants