job crashes early in hdfio #1422

freyso · 2024-05-21T17:22:40Z

Summary

A SPHInX (restart) job fails to run because of an error in hdfio. The error message is "ValueError: Objects can be only recovered from hdf5 if TYPE is given".

I cannot tell if this is related to restart.

pyiron Version and Platform

cmti

Expected Behavior

Job runs.

Actual Behavior

Job execution crashes with the following error.out:

> Traceback (most recent call last):
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/runpy.py", line 86, in _run_code
>     exec(code, run_globals)
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/cli/__main__.py", line 3, in <module>
>     main()
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/cli/control.py", line 59, in main
>     args.cli(args)
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/cli/wrapper.py", line 37, in main
>     job_wrapper_function(
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/jobs/job/wrapper.py", line 161, in job_wrapper_function
>     job = JobWrapper(
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/jobs/job/wrapper.py", line 64, in __init__
>     self.job = pr.load(int(job_id))
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/project/jobloader.py", line 104, in __call__
>     return super().__call__(job_specifier, convert_to_object=convert_to_object)
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/project/jobloader.py", line 75, in __call__
>     return self._project.load_from_jobpath(
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/project/generic.py", line 1001, in load_from_jobpath
>     job = job.to_object()
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/jobs/job/core.py", line 596, in to_object
>     return self.project_hdf5.to_object(object_type, **qwargs)
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/storage/hdfio.py", line 1142, in to_object
>     return _to_object(self, class_name, **kwargs)
>   File "/cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_latest/lib/python3.10/site-packages/pyiron_base/storage/hdfio.py", line 117, in _to_object
>     raise ValueError("Objects can be only recovered from hdf5 if TYPE is given")
> ValueError: Objects can be only recovered from hdf5 if TYPE is given

Steps to Reproduce

Unknown. Deleting the job and setting it up again reproduces the error.

samwaseda · 2024-05-21T19:23:38Z

Hm, there's not a single line coming from SPHInX in the error message. Do you have a small code snippet to reproduce the error?

pmrv · 2024-05-21T19:53:46Z

Could it be that there's a stray entry in the database from a time when you deleted the job files manually outside of pyiron?

samwaseda · 2024-05-21T19:55:36Z

Can you also maybe try to see whether a different version of pyiron helps? It might help us figure out which changes could have caused the problem.

freyso · 2024-05-22T12:04:50Z

Changing to pyiron/2024-05-20 seemed to help. I was on pyiron/latest before, which apparently is NOT latest. Is it possible that the pyiron version used on the cluster is incompatible with the pyiron/latest on the login node?

This is a VERY frustrating experience I am having here. Loads of incomprehensible warnings. Error messages with zero information value. 'Objects can be only recovered from hdf5 if TYPE is given' is essentially a 'Something error occurred'.

I close the ticket, nothing to win here any more.

jan-janssen · 2024-05-22T12:30:37Z

> Changing to pyiron/2024-05-20 seemed to help. I was on pyiron/latest before, which apparently is NOT latest. Is it possible that the pyiron version used on the cluster is incompatible with the pyiron/latest on the login node?

@niklassiemer Can you comment on this?

samwaseda · 2024-05-22T13:26:24Z

Hmm, to my taste the issue got closed a bit too early. If there are updates, I would appreciate it if you posted them here.

niklassiemer · 2024-05-22T17:40:47Z

> > Changing to pyiron/2024-05-20 seemed to help. I was on pyiron/latest before, which apparently is NOT latest. Is it possible that the pyiron version used on the cluster is incompatible with the pyiron/latest on the login node?
>
> @niklassiemer Can you comment on this?

pyiron/latest is indeed the hand-updated version with Python 3.10, which was somewhat older than the docker-stack build from yesterday. However, the version on the cluster and the one on the login node should not differ! The kernel chosen in the notebook should also be loaded on the compute node, because the environment is preserved. If this is not the case, I need to know so that I can find a solution!
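
One way to check whether the notebook kernel and the compute node really use the same environment is to print a small fingerprint in both places and compare the output. The sketch below uses only the Python standard library. The function name `environment_fingerprint` and the default package list are illustrative assumptions, not part of pyiron.

```python
import sys
from importlib import metadata


def environment_fingerprint(packages=("pyiron_base", "pyiron_atomistics", "h5py")):
    """Collect interpreter and package versions for comparison across nodes.

    Hypothetical helper for illustration; not a pyiron function.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    # Run once in the notebook and once inside a submitted job, then compare.
    for key, value in environment_fingerprint().items():
        print(key, value)
```

If the two outputs differ in `executable` or in any package version, the environment was not carried over to the compute node as intended.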

freyso · 2024-05-24T11:38:53Z

Got the problem again, with the new kernel. So it's not about the Python kernel.

I solved the problem again, this time by avoiding the minus sign in the job name. I may have done this last time, too.

Is it possible that a minus sign in the job name causes issues? It seems reproducible:
E20Vnm-test - fails in hdfio
E20Vnm_neutral - runs.
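
Until the cause is clear, a possible workaround is to sanitize job names before passing them to pyiron, so that only letters, digits, and underscores remain. This is a minimal sketch: `safe_job_name` is a hypothetical helper, and treating underscores as safe is an assumption based only on the observation that E20Vnm_neutral runs.

```python
import re


def safe_job_name(name: str) -> str:
    """Replace every character other than ASCII letters, digits, or '_' with '_'.

    Hypothetical helper for illustration; not part of pyiron.
    """
    if not name:
        raise ValueError("job name must not be empty")
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


# The failing name from above becomes a name without a minus sign.
print(safe_job_name("E20Vnm-test"))
```

This keeps the minus sign convenient to type while handing pyiron a name that avoids the character suspected of causing the problem.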

freyso · 2024-05-24T11:46:07Z

Another thought: there could be some inconsistency in the name normalization. For the hdf5 file, '-' seems to be replaced by 'm', while in the job table the '-' is still there. The working directory becomes E20Vnmmtest_hdf/E20Vnm-test/, i.e. a mixture of both.
I got confused by this at some point, which is why I had changed from minus to underscore. Yet, for me, minus is more convenient to type, so chances are high that I will do this again.
Also, when I remove the job via pr.remove_job, the _hdf5 directory stays in place.
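
The suspected inconsistency can be illustrated with a toy model. It assumes, based only on the observation above, that one code path replaces '-' with 'm' while another keeps the raw name. `normalize`, `storage_names`, and `is_consistent` are hypothetical names, not pyiron_base functions, and the directory suffix is a parameter because its exact form is also an assumption.

```python
from pathlib import PurePosixPath


def normalize(job_name: str) -> str:
    """Assumed normalization: '-' becomes 'm', as observed for the hdf5 file."""
    return job_name.replace("-", "m")


def storage_names(job_name: str, suffix: str = "_hdf5") -> dict:
    """Return the names each assumed code path would derive for one job."""
    return {
        # Raw name, minus sign preserved.
        "job_table": job_name,
        # Normalized name.
        "hdf5_group": normalize(job_name),
        # Mixture: normalized parent directory, raw subdirectory.
        "working_dir": PurePosixPath(normalize(job_name) + suffix) / job_name,
    }


def is_consistent(job_name: str) -> bool:
    """True if the job table and the hdf5 file would agree on the name."""
    names = storage_names(job_name)
    return names["job_table"] == names["hdf5_group"]


for name in ("E20Vnm-test", "E20Vnm_neutral"):
    print(name, storage_names(name)["working_dir"], is_consistent(name))
```

In this model, a lookup that takes the name from the job table and searches the hdf5 file for it would miss the entry for any name containing '-', which would be compatible with a missing TYPE entry on load. Whether pyiron_base actually behaves this way is not confirmed here.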

niklassiemer · 2024-05-24T11:52:55Z

Thanks for coming back to this! This could indeed be a reason! I opened an issue on pyiron_base.

freyso added the bug Something isn't working label May 21, 2024

jan-janssen assigned samwaseda May 21, 2024

freyso closed this as completed May 22, 2024

freyso reopened this May 24, 2024

niklassiemer mentioned this issue May 24, 2024

Job name normalization causing issues pyiron/pyiron_base#1445

Open
