
hdf5 files are opened too often in DataContainer when loading SPHInX jobs #1574

Open

freyso opened this issue Aug 29, 2024 · 7 comments

@freyso
Contributor

freyso commented Aug 29, 2024

Test example:

import pyiron
import pyiron_base.storage.hdfio
import h5io_browser.base

# --- replace the module-level _open_hdf in pyiron_base.storage.hdfio by a wrapper that counts calls
count_open_calls = 0

def _open_hdf(filename, mode="r", swmr=False):
    global count_open_calls
    count_open_calls += 1
    return h5io_browser.base._open_hdf(filename, mode, swmr)

pyiron_base.storage.hdfio._open_hdf = _open_hdf

pr = pyiron.Project("test")
bulk_str = pr.create.structure.bulk("TiC", crystalstructure="rocksalt", a=4.33)
dft = pr.create.job.Sphinx("bulk-dummy")
dft.structure = bulk_str
dft.input["Xcorr"] = "PBE"
dft.set_encut(450)
dft.set_kpoints(3 * [4])
print(count_open_calls)
dft.calc_static()
print(count_open_calls)

dft.run(delete_existing_job=True)
print(count_open_calls)

pr.load("bulk-dummy")
print(count_open_calls)

I get 269 _open_hdf calls for the run, and 802 calls for the load.

From discussions with @jan-janssen and @pmrv, I understand that the key challenges here are:

  • one usually does not want to unconditionally load the entire hdf5 content
  • one does not want to leave the hdf5 file open for too long, because of concurrent access, e.g. from running jobs
  • at the bottom of the call stack, one cannot tell how long an hdf5 file should be kept open

Possible solution

  • create an hdf5 "open files" cache (a dict mapping file name to open h5 object) that is checked before opening an h5 file inside pyiron_base.storage.hdfio
  • add internal functions to add entries to and remove them from the cache
  • wrap those in a new context manager (say, hdf_leave_open) to be used around hdf5-intense operations

Use scenario:

from pyiron_base.storage.hdfio import hdf_leave_open

with hdf_leave_open(h5filename):
    # the file is kept open in the cache within this block
    data_container.recursive_load(h5filename)
# the h5 file is closed here

In this way, high-level code can indicate when it enters and leaves hdf5-intense code, while low-level code is augmented by a single check for an existing open file handle before opening the file. Performance-wise, an open call is much more expensive than a dictionary lookup, so there is no measurable price to pay when the cache is not used.

Using a context manager should make this rather robust against unexpected errors. The new context manager should check at context enter whether the cache already holds the file, and if so, do nothing except set a "noop" flag for the context exit. This way, nested contexts for the same file work correctly even when programmers do not realize that other parts of the code also issue caching instructions. If pyiron needs to be thread-safe, a locking mechanism for the cache access is needed as well.
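
For concreteness, here is a minimal sketch of what this could look like. All names (_OPEN_HDF_CACHE, _CACHE_LOCK, hdf_leave_open) and the direct use of h5py.File are illustrative assumptions, not the actual pyiron_base or h5io_browser code:

import threading
from contextlib import contextmanager

import h5py

_OPEN_HDF_CACHE = {}            # file name -> open h5py.File
_CACHE_LOCK = threading.Lock()  # only needed if pyiron must be thread-safe

@contextmanager
def hdf_leave_open(filename, mode="r"):
    """Keep filename open in the cache for the duration of the block."""
    with _CACHE_LOCK:
        noop = filename in _OPEN_HDF_CACHE  # nested context for the same file
        if not noop:
            _OPEN_HDF_CACHE[filename] = h5py.File(filename, mode)
    try:
        yield _OPEN_HDF_CACHE[filename]
    finally:
        if not noop:  # only the outermost context closes the file
            with _CACHE_LOCK:
                _OPEN_HDF_CACHE.pop(filename).close()

def _open_hdf(filename, mode="r", swmr=False):
    """Low-level open: reuse a cached handle when one exists."""
    # the dict lookup is much cheaper than an actual open() call;
    # this sketch deliberately ignores mode mismatches between nested calls
    handle = _OPEN_HDF_CACHE.get(filename)
    if handle is not None:
        return handle
    return h5py.File(filename, mode, swmr=swmr)

A nested with hdf_leave_open(...) on the same file then becomes a no-op, and only the outermost context closes the file.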

@pmrv
Contributor

pmrv commented Aug 29, 2024

Yep, this would be the solution I'd like too. Last time I tried to prototype it, I got bogged down in the details, but in the end I think it would have to be something like this.

@samwaseda
Member

> Last time I tried to prototype it, I got bogged down in the details, but in the end I think it would have to be something like this.

Same for me…

@pmrv
Contributor

pmrv commented Aug 29, 2024

So I did a quick survey and I think it can be implemented in h5io_browser's _open_hdf, but there are two things about the context manager approach that make this slightly more tricky:

  1. What happens if the same file gets opened with different modes in nested calls to the context manager?
  2. The returned file handle is used outside of h5io_browser as a context manager, which closes the underlying file. So if there are multiple such usages (as there are bound to be), the first one would close the file, making it unusable for the second.

The first is probably the smaller problem, but I'd have to check where we could hook in to circumvent the second.
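
One hypothetical hook for the second problem: hand out cached handles wrapped in a proxy whose close()/__exit__ do nothing, so that only the cache (i.e. the outermost hdf_leave_open) ever closes the shared file. _UnclosableFile is an invented name for illustration, not existing h5io_browser code:

class _UnclosableFile:
    """Proxy around a cached h5py.File that ignores close requests."""

    def __init__(self, h5file):
        self._h5file = h5file

    def __getattr__(self, name):
        # delegate everything else to the real file object
        return getattr(self._h5file, name)

    def __getitem__(self, key):
        return self._h5file[key]

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # deliberately do not close the cached file

    def close(self):
        pass  # the cache owns the handle; only hdf_leave_open closes it

Dunder methods such as __getitem__ have to be forwarded explicitly because Python looks them up on the class, not the instance.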

@freyso
Contributor Author

freyso commented Aug 30, 2024

To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.

@pmrv
Contributor

pmrv commented Aug 30, 2024

> To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.

Sorry, I couldn't sleep yesterday and I have something almost there now. :')

@pmrv
Contributor

pmrv commented Aug 30, 2024

The main changes to h5io are here. I won't be unhappy if you want to take a look and take it from there, but it would then need strategic placement of the context manager around pyiron_base and maybe even the Sphinx class. When I use the context manager in Sphinx's to_hdf/from_hdf/collect_output/set_input_read_only, I can save most of the file opens that you reported.
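
For illustration, such strategic placement could look roughly like the following, assuming the hypothetical hdf_leave_open sketched above; the import path and the project_hdf5.file_name attribute are assumptions about where the proposed API would live:

from pyiron_base.storage.hdfio import hdf_leave_open  # proposed API, not merged

def collect_output_cached(job):
    """Run a job's collect_output with its hdf5 file kept open once."""
    with hdf_leave_open(job.project_hdf5.file_name):
        # every nested _open_hdf call inside collect_output now reuses
        # the cached handle instead of reopening the file
        job.collect_output()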

@samwaseda
Member

Ok, I’m gonna try to work on it by Monday, just in order to make it more complicated.
