Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[minor] Refactor storage #422

Merged
merged 10 commits into from
Aug 15, 2024
Merged

[minor] Refactor storage #422

merged 10 commits into from
Aug 15, 2024

Conversation

liamhuber
Copy link
Member

Moving from three storage back-ends down to one, we have the opportunity to refactor the storage interface and apply lesson's learned on the first "alpha" pass at storage.

* Remove unused import

* Never interact directly with the storage object

Always go through HasStorage methods. In practice, this was only .delete.
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/refactor_storage

Copy link

codacy-production bot commented Aug 12, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.03% (target: -1.00%) 94.01%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (21d8e95) 3169 2912 91.89%
Head commit (b07a21e) 3109 (-60) 2856 (-56) 91.86% (-0.03%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#422) 167 157 94.01%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

liamhuber and others added 3 commits August 12, 2024 14:04
* Refactor: merge HasStorage back into Node

* Indicate that StorageInterface is abstract

* Refactor: Move storage out of the mixin submodule

* 🐛 pass node through methods

We were never getting to the cloudpickle deletion branch

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>
* 🐛 compare like-objects

* Distinguish saving and checkpointing

* Rename flag to reflect checkpoint behaviour

* Reflect save/checkpoint difference in StorageInterface

It only needs save now -- checkpointing is the responsibility of the node itself
@liamhuber
Copy link
Member Author

liamhuber commented Aug 13, 2024

End-of-day reminders to self:

  • Format checkpoint: Literal["pickle"] | StorageInterface | None so that it can use something other than the default pickle back end
  • Don't load by default for anything but Workflow. This is a subtlety I only noticed going through this refactoring -- if we auto-load everything you could have a macro that crashes at instantiation time because one of its children happens to have the same label as a saved node. As parent-most objects, it's safe to always greedily try loading new workflows
  • Allow any_storage_has_contents to take a StorageInterface as an argument, and if it's not None try to delete there too. This is to accommodate loading at instantiation with a custom interface. Maybe the method can be private too.
  • And more, this isn't exhaustive, just stuff I don't want to forget.

liamhuber and others added 2 commits August 13, 2024 12:22
* Expose the backend as a node method kwarg

* Use the backend kwarg directly in tests

* Expose backend in storage deletion too

In case someone has their own custom backend

* Remove the storage property

And route the `storage_backend` kwarg to the post-setup routine

* Format black

* Allow the backend to be specified for checkpoints

* Update storage_backend docs and type hint

* Use storage_backend kwarg as intended in tests

Including a couple bug-fixes, where I was looping over backends but forgot to pass the loop value to the save method!

* 🐛 fix JSON edit typo

* Check for stored content to autoload

Even if the autoload back end is a custom interface

* Give Node.delete_storage consistent use of the backend arg

* Pass backend iterator variable to deletion method in tests

* Remove unused method

* Allow specifically mentioned backends to pass on missing files

* Refactor: remove code duplication

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>
* Refactor: rename storage_backend to autoload

* Update autoload default behaviour on the base class

And note the difference in Workflow

* Add a test to demonstrate the issue

And ensure that, for macros at least, the default behaviour stays safe

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>
@liamhuber liamhuber changed the title Refactor storage [minor] Refactor storage Aug 13, 2024
liamhuber and others added 4 commits August 13, 2024 15:16
* Clean up after macro test

* Refactor: rename overwrite_save to delete_existing_savefiles

* 🐛 Actually deactivate auto-loading for Node

* Remove void __init__ implementation

It just passes through

* Consistent and sensible kwarg ordering, and a doc

* Refactor: rename run_after_init to autoload

It's shorter and, IMO, better in line with the other kwarg names like "autoload"

* Delete storage in the order the nodes are instantiated

Otherwise if it fails before hard_input is defined, the finally clause never deletes the self.n1 storage because there's a second failure _inside_ finally
* Remove unused property

* Shorten variable name

* Refactor: rename method

* Call base method and remove specialist

* Condense has_contents

* Remove unused property

* Move methods from Node to StorageInterface

* Have StorageInterface.load just return the new instance

* Break dependence on Node.working_directory

* Refactor: rename obj to node in the storage module

Since it is the interface for Node instances

* 🐛 pass argument

* Remove tidy method

* Clean up using pathlib

* Rely on pathlib

This is almost exclusively because `pyiron_snippets.files.DirectoryObject` has no option to avoid the `self.create()` call at instantiation. That means if I want to peek at stuff, I need to create then clean it, which is killer slow. I'm very open to re-introducing `files` dependence again in the future, I just don't have the time to implement the necessary fixes, with test, and go through the release cycle. This is all under-the-hood, so should be safely changeable with a `pyiron_workflow` patch change later anyhow.

* Remove the post-setup cleanup

No longer necessary now that storage isn't creating directories willy nilly

* Simplify and remove unused

* Allow semantic objects to express as pathlib.Path

* Leverage new semantic path for storage

* Remove unneeded exception

* Remove unused imports

* Use a generator in the storage module for available backends

* Test loading failure

* Fix docstring

* Format black

* Remove unused import

* Don't overload builtin name

---------

Co-authored-by: pyiron-runner <[email protected]>
* Expose existing kwargs on the node methods

* Refactor: slide

* Allow storage to specify either a node xor a filename directly

* Add tests

* Expose backend-specific kwargs

And use them for the PickleStorage

* Add storage to the API

* Pass kwargs through the node interface to storage

* Add docstrings

* Refactor: rename methods

* Shorten kwargs docstrings

* Format black

* Deduplicate test class name

* Remove unnecessary passes

Codacy actually catching real things for once

---------

Co-authored-by: pyiron-runner <[email protected]>
@liamhuber liamhuber marked this pull request as ready for review August 15, 2024 18:51
@liamhuber liamhuber merged commit d1cf8bf into minor_bump Aug 15, 2024
15 of 16 checks passed
@liamhuber liamhuber deleted the refactor_storage branch August 15, 2024 19:21
"""


class StorageInterface(ABC):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrv this is the guy I was talking about

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant