Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenericJob.save just keeps copying things #1367

Open
liamhuber opened this issue Mar 4, 2024 · 6 comments
Open

GenericJob.save just keeps copying things #1367

liamhuber opened this issue Mar 4, 2024 · 6 comments
Assignees

Comments

@liamhuber
Copy link
Member

Repeated calls to save silently create new database items while duplicating the job name, i.e. they are a sort of silent "save-as" in database space. The job name however, stays the same, so you get an error if you try to load by job name:

from pyiron_base import Project

pr = Project("tmp")
pr.remove_jobs(silently=True, recursive=True)

job = pr.create.job.TableJob("my_table")
job.save()

print(len(pr.job_table()))
# 1

job.save()
print(len(pr.job_table()))
# 2

try:
    loaded = pr.load("my_table")
except ValueError as e:
    print(e.args[0])
# job name 'my_table' in this project '...' is not unique...

Is this intentional behaviour? I find it very counter-intuitive to other programs, where "save" updates my save file for the current object but doesn't create a new save file (or database entry in this case).

To further complicate things, PythonFunctionContainerJob does behave in the way I would expect by loading instead of saving if it finds the job. I like this better, but the difference in behaviour is definitely bad -- whatever behaviour we decide on should be consistent.

@jan-janssen
Copy link
Member

save() is intended as internal function which serialises the job object, consisting of two parts, the storage in HDF5 and the creation of a SQL database entry. The function was never intended to be called by the user directly. I would be open to renaming it to a private function as we also had other users misusing it before.

@liamhuber
Copy link
Member Author

Renaming it is reasonable to me, but is there a reason we want the duplicating behaviour even for internal use?

@pmrv
Copy link
Contributor

pmrv commented Mar 5, 2024

I have run into this before, but never deeply looked into it. Moving the linked section from PythonFunctionContainerJob to GenericJob should fix the issue and would be a-ok with me.

@liamhuber
Copy link
Member Author

I haven't gotten to checking but one of you might know offhand -- is the status also stored in hdf? If so then I guess the snippet needs to update that bit of serialized data? Otherwise I agree, I think the new code should be promoted up to the parent class.

@pmrv
Copy link
Contributor

pmrv commented Mar 6, 2024

It is stored, but not sure where, might be just job['status'] even.

@liamhuber
Copy link
Member Author

It is stored, but not sure where, might be just job['status'] even.

We probably want to update this in the stored version too then.

IMO it would also be good to give a

warnings.warn(f"{self.job_name} already exists and was not re-saved. Use `Project.remove_job` to remove the stored version and re-save it if you have really made changes you want to save")

(or similar)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants