
[patch] Introduce caching #395

Merged
merged 7 commits into main from caching on Jul 30, 2024
Conversation

liamhuber
Member

I wasn't going to make caching the default, but since I'm just using a simple `==` comparison, it doesn't seem to introduce any meaningful slowdown. The default is easy enough to change in the future, in any case.

Closes #169

Description copied from the update to the deepdive:

Caching

By default, all nodes exploit caching: when they run, they save a fresh dictionary of their input values; on all subsequent runs, if the dictionary of their current input values matches (`==`) that last-used dictionary, they skip executing altogether and leverage their existing outputs.
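The mechanism can be sketched in plain Python. This is a hypothetical minimal node for illustration only; `SketchNode`, `run_count`, and the method names are not pyiron_workflow's real API, though `use_cache` and `cached_inputs` mirror the attributes discussed below:

```python
# Hypothetical sketch of the described caching scheme (NOT the real implementation):
# skip executing when the current inputs dict equals (==) the last-used one.
class SketchNode:
    def __init__(self, fn, use_cache=True):
        self.fn = fn
        self.use_cache = use_cache
        self.cached_inputs = None  # inputs dict from the last run
        self.outputs = None        # outputs from the last run
        self.run_count = 0         # illustrative: counts actual executions

    def run(self, **inputs):
        if self.use_cache and inputs == self.cached_inputs:
            return self.outputs    # cache hit: reuse existing outputs
        self.run_count += 1
        self.outputs = self.fn(**inputs)
        self.cached_inputs = inputs
        return self.outputs

node = SketchNode(lambda low, high: low + high)
node.run(low=0, high=999)  # executes
node.run(low=0, high=999)  # identical inputs: cache hit, skipped
node.run(low=1, high=999)  # inputs changed: executes again
print(node.run_count)      # 2
```

With `use_cache=False` the equality check is bypassed entirely, so every call executes fresh.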

Any changes to the inputs will obviously stop the cache from being retrieved, but for Composite nodes it is also reset if any child nodes are added/removed/replaced.
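The composite reset can be sketched similarly (hypothetical names, not the real pyiron_workflow API): whenever a child is added, removed, or replaced, the stored inputs are discarded so the next run executes fresh regardless of whether the inputs themselves changed.

```python
# Hypothetical sketch: a composite discards its cached inputs whenever its
# set of children changes, forcing a fresh run next time.
class SketchComposite:
    def __init__(self):
        self.children = {}
        self.cached_inputs = None

    def _clear_cache(self):
        self.cached_inputs = None  # no stored inputs -> no possible cache hit

    def add_child(self, label, node):
        self.children[label] = node
        self._clear_cache()

    def remove_child(self, label):
        del self.children[label]
        self._clear_cache()

    def replace_child(self, label, node):
        self.children[label] = node
        self._clear_cache()
```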

Note that since we do a simple `==` on the dictionary of input values, if your workflow non-idempotently passes around mutable data, it's possible to wind up with a false cache hit.
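To illustrate the pitfall, here is a hypothetical standalone example using a naive `==`-keyed cache (not pyiron_workflow itself): if a mutable value is mutated in place, the stored copy of the inputs is mutated right along with it, so the `==` check still passes and stale output is returned.

```python
# Hypothetical demonstration of a false cache hit with in-place mutation.
# A shallow copy of the inputs dict still shares the mutable list object.
_cache = {"inputs": None, "output": None}

def cached_sum(inputs):
    if _cache["inputs"] is not None and _cache["inputs"] == inputs:
        return _cache["output"]      # cache hit: return stored output
    _cache["output"] = sum(inputs["x"])
    _cache["inputs"] = dict(inputs)  # shallow copy: the list is shared!
    return _cache["output"]

data = [1, 2, 3]
print(cached_sum({"x": data}))  # 6 (fresh run)
data.append(10)                 # mutate in place: the real answer is now 16
print(cached_sum({"x": data}))  # 6 again: false cache hit, stale output
```

A deep copy of the inputs (or hashing immutable snapshots of them) would avoid this, at extra cost.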

Caching behaviour can be defined at the class-level as a default, but can be overridden for individual nodes. Let's take a look:

from pyiron_workflow import Workflow
import random

@Workflow.wrap.as_function_node(use_cache=False)
def Randint(low=0, high=999):
    rand = random.randint(low, high)
    return rand

wf = Workflow("mixed_caching")
wf.use_cache = False  # Turn _off_ caching for the whole workflow!

wf.always_new = Randint()
wf.cached = Randint()
wf.cached.use_cache = True  # Turn _on_ caching for this node

wf()
>>> {'always_new__rand': 598, 'cached__rand': 307}

Running the same workflow again, we see that the cached node just keeps returning the same "random" number, while the un-cached node gives us something new:

wf()
>>> {'always_new__rand': 492, 'cached__rand': 307}

If we look into the caching data, we can see that the non-caching node has not stored any inputs and does not register a cache hit. Even if something had previously been cached, switching to `use_cache = False` means we won't even look for a cache hit; we'll just produce new data!

for node in wf:
    print(node.label, node.inputs.to_value_dict(), node.cached_inputs, node.cache_hit)
>>> always_new {'low': 0, 'high': 999} None False
>>> cached {'low': 0, 'high': 999} {'low': 0, 'high': 999} True

- Wrap them in `emit()` and `emitting_channels` instead of manually calling them. This lets us tighten up If-like nodes too.
- Shortcut actually running a node and just return existing output if its cached input matches its current input (by `==` test).
- So it can be set at class definition time, even by decorators.


Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/caching


codacy-production bot commented Jul 30, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| +0.05% (target: -1.00%) | 98.48% |

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor (ddc1b1e) | 3470 | 3206 | 92.39% |
| Head (3be52ef) | 3506 (+36) | 3241 (+35) | 92.44% (+0.05%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#395) | 66 | 65 | 98.48% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@liamhuber liamhuber added the format_black trigger the Black formatting bot label Jul 30, 2024
@coveralls

coveralls commented Jul 30, 2024

Pull Request Test Coverage Report for Build 10157140074

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 34 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.05%) to 92.442%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| nodes/function.py | 1 | 98.28% |
| nodes/transform.py | 2 | 98.14% |
| nodes/composite.py | 6 | 91.35% |
| node.py | 11 | 94.57% |
| nodes/standard.py | 14 | 91.48% |
Totals Coverage Status:
- Change from base Build 10116646729: 0.05%
- Covered Lines: 3241
- Relevant Lines: 3506

💛 - Coveralls

@liamhuber liamhuber merged commit 41d8d42 into main Jul 30, 2024
16 of 17 checks passed
@liamhuber liamhuber deleted the caching branch July 30, 2024 21:59
Development

Successfully merging this pull request may close these issues.

Hashing input to avoid running
3 participants