chunk size benchmarks for intermediate files #423

Open
dkazanc opened this issue Aug 16, 2024 · 1 comment · May be fixed by #537

Comments

@dkazanc (Collaborator) commented Aug 16, 2024

Investigate how the chunk size affects the writing speed (reading is less important in our case, since the result is usually just saved). From a quick play with the --frames-per-chunk parameter, it can be seen that the writing speed differs significantly. For a 40 GB raw dataset, saving 54 GB of data:

1 frame per chunk: 28 s
4 fpc: 13.3 s
16 fpc: 12.2 s
32 fpc: 14.8 s
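
For anyone wanting to reproduce this kind of sweep outside httomo, here is a minimal standalone sketch with plain h5py (the file path, frame size, and frame count are assumptions, not httomo's actual code):

    # Hypothetical benchmark sketch (not httomo code): time writing a stack
    # of frames while varying the frames-per-chunk value.
    import time
    import h5py
    import numpy as np

    frame_shape = (2048, 2048)  # assumed detector frame size
    n_frames = 64               # assumed number of frames to write
    data = np.random.rand(n_frames, *frame_shape).astype(np.float32)

    for fpc in (1, 4, 16, 32):
        t0 = time.perf_counter()
        with h5py.File(f"/tmp/bench_fpc{fpc}.h5", "w") as f:
            f.create_dataset(
                "data",
                data=data,
                chunks=(fpc, *frame_shape),  # whole frame per chunk, fpc frames deep
            )
        print(f"{fpc} fpc: {time.perf_counter() - t0:.1f}s")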

@ptim0626 also mentioned that we should look into the possibility of changing the rdcc (raw data chunk cache) values according to this guide. In the code, I believe this is the place where it needs to be changed, to something like this for 16 fpc:

        self._h5file = h5py.File(
            file,
            "w",
            driver="mpio",
            comm=comm,
            rdcc_nslots=100_000,  # number of chunk-cache hash slots (must be an int, not 1e5)
            rdcc_nbytes=4 * (2048**2 * 16),  # cache size: one float32 chunk of 16 frames of 2048x2048
            rdcc_w0=1,  # prefer evicting fully read/written chunks first
        )
@dkazanc (Collaborator, Author) commented Oct 7, 2024

Just to add more info on what I heard from the HDF5 group at NOBUGS about our chunk sizes when writing the intermediate files. They suggested that we try chunks that do not take the whole frame, but rather a smaller rectangular region of it, and increase the depth.

Currently we have a parameter --frames-per-chunk which takes the whole frame: for instance, in the sinogram case one frame is something like [1800, 2560], the sizes being [number of projections, detectorX]. If we were then to increase the depth, this would result in very large chunks. They suggested we instead try much smaller chunks with some depth; following the example above, the chunk could be [180, 256, depth], varying the depth in the range [2:32].

If this accelerates the slow writing of sinogram-sliced data (e.g., the result of the reconstruction), we might want to rethink the --frames-per-chunk parameter and instead specify the chunk size relatively, e.g., 10% of a full frame plus a depth?
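
A minimal sketch of what such a depth sweep could look like with plain h5py (the stack layout, file paths, and sizes are assumptions, not httomo code; here the depth axis is placed first):

    # Hypothetical sweep over chunk depth for sub-frame chunks (not httomo code).
    import time
    import h5py
    import numpy as np

    n_frames = 32            # assumed stack depth
    frame = (1800, 2560)     # [number of projections, detectorX] from the example
    data = np.random.rand(n_frames, *frame).astype(np.float32)

    for depth in (2, 4, 8, 16, 32):
        t0 = time.perf_counter()
        with h5py.File(f"/tmp/depth{depth}.h5", "w") as f:
            f.create_dataset(
                "data",
                data=data,
                chunks=(depth, 180, 256),  # sub-frame chunk [180, 256] with varying depth
            )
        print(f"depth {depth}: {time.perf_counter() - t0:.1f}s")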

@ptim0626 linked a pull request (#537) on Jan 7, 2025 that will close this issue.