Write HDF5 Error for large matrix #702

Open
Goon83 opened this issue Mar 26, 2020 · 14 comments

Goon83 commented Mar 26, 2020

Hi all,
I hope you are doing well despite the current epidemic. I understand you may have more important things to do right now; I am posting the error here in case you get a chance to look into it.

I recently tested StoreHDF::write and found that it works for a small matrix but fails for a large one when run on multiple processes (units). Here is the code:

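#include <libdash.h>
#include <cstdlib>
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201
#define N2 15000
int main(int argc, char *argv[])
{
	dash::init(&argc, &argv);

	dash::Matrix<double, 2> *h5matrix = new dash::Matrix<double, 2>(dash::SizeSpec<2>(N1, N2));

	auto myid = dash::myid();

	// Unit 0 fills the whole distributed matrix with i + j.
	if (!myid)
	{
		for (int i = 0; i < N1; i++)
		{
			for (int j = 0; j < N2; j++)
				h5matrix->at(i, j) = i + j;
		}
	}
	// Make sure unit 0 has finished writing before all units start the I/O.
	dash::barrier();

	// Write the matrix to the file and read it back.
	StoreHDF::write(*h5matrix, "testf.h5", "testg/testd2D");
	StoreHDF::read(*h5matrix, "testf.h5", "testg/testd2D");

	// Unit 1 verifies every element after the round trip.
	if (myid == 1)
	{
		for (int i = 0; i < N1; i++)
		{
			for (int j = 0; j < N2; j++)
			{
				double t = h5matrix->at(i, j);
				if (t != (i + j))
				{
					std::cout << "Wrong result \n";
					exit(-1);
				}
			}
		}
	}
	// Release the matrix before DASH shuts down.
	delete h5matrix;
	dash::finalize();

	return 0;
}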

I compiled the code and ran it with 2 processes, and it reports the error below. Note that if you change N1 and N2 to small numbers, e.g. 10 by 10, it works.

>> mpirun  -n 2 ./h5-test
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 1:
  #000: H5Dio.c line 322 in H5Dwrite(): could not get a validated dataspace from file_space_id
    major: Invalid arguments to routine
    minor: Bad value
  #001: H5S.c line 254 in H5S_get_validated_dataspace(): selection + offset not within extent
    major: Dataspace
    minor: Out of range
^C[mpiexec@dbinMac] Sending Ctrl-C to processes as requested
[mpiexec@dbinMac] Press Ctrl-C again to force abort
Author

Goon83 commented Jun 12, 2020

Hi DASH community,
Just checking whether someone could take a look at this issue.

Best,
Bin

Member

dhinf commented Jun 17, 2020

I will look into it this week.

Member

dhinf commented Jun 18, 2020

The problem is not the total size of the NArray but its extents. For example, 21 x 20 also results in an error. With 21 rows, the first rank holds 11 elements in the first dimension but the second holds only 10, and this seems to be the problem. I tried it with the output stream, which should give the same result in the end.

#include <libdash.h>
#include <cstdlib>
#include <string>

#define N1 200
#define N2 15000

#define FILENAME "example.hdf5"

int main(int argc, char *argv[])
{
  dash::init(&argc, &argv);
  dash::Matrix<double, 2> h5matrix(dash::SizeSpec<2>(N1, N2));
  auto myid = dash::myid();

  // Unit 0 fills the whole distributed matrix with i + j.
  if (!myid) {
    for (int i = 0; i < N1; i++) {
      for (int j = 0; j < N2; j++)
        h5matrix.at(i, j) = i + j;
    }
  }
  // Write the matrix through the HDF5 output stream.
  dash::io::hdf5::OutputStream os(FILENAME);
  os << dash::io::hdf5::dataset("group/data") << h5matrix;

  dash::barrier();
  // Unit 0 dumps the file so the contents can be inspected.
  if (dash::myid() == 0) {
    std::string syscall = "h5dump ";
    auto status = system((syscall + FILENAME).c_str());
  }
  dash::finalize();

  return 0;
}

Member

dhinf commented Jun 19, 2020

It is a bug in the TilePattern. If you use the proxy dash::NArray instead of dash::Matrix, it should work. By default, dash::Matrix uses the TilePattern while dash::NArray uses a BlockPattern. That is the only difference.

Member

dhinf commented Jun 19, 2020

This is a small workaround until we have fixed the pattern.

Member

dhinf commented Jun 19, 2020

I fixed it, but if you compile DASH with assertions enabled you will get an error when using the TilePattern with underfilled blocks.
@devreal and @fuchsto: Why can't a TilePattern have underfilled blocks? What was the reason for forbidding it?

Member

devreal commented Jun 19, 2020

I believe that is a longstanding issue that has never been properly implemented. If someone has a patch I would love that...

Member

dhinf commented Jun 22, 2020

The solution would be the same as for the BlockPattern: only the last block is underfilled. If that is fine, I will open a pull request.

Member

devreal commented Jun 22, 2020

Absolutely, please give it a shot 👍

Member

dhinf commented Jun 26, 2020

Fixed with PR #713.

Author

Goon83 commented Jan 5, 2021

@dhinf @devreal

Thanks for working on this issue.
I tested the bug-dash-hdf5-pattern branch and it works.

Could you please review the merge and get the code into the development branch?

Thanks.
Bin

Author

Goon83 commented Apr 26, 2021

@dhinf @devreal

I recently tested the code on 1D data and found that StoreHDF::write and StoreHDF::read still do not work: the example fails to compile.
The test code and the error output are shown below.
Could you help look into this?

Best,
Bin

Test code:

#include "libdash.h"
#include <iostream>

using dash::io::hdf5::hdf5_options;
using dash::io::hdf5::StoreHDF;

#define N1 201

int main(int argc, char *argv[])
{
    dash::Matrix<double, 1> *h5matrix_1d = new dash::Matrix<double, 1>(dash::SizeSpec<1>(N1));
    auto myid = dash::myid();

    if (!myid)
    {
        for (int i = 0; i < N1; i++)
        {
            h5matrix_1d->at(i) = i;
        }
    }
    StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
    StoreHDF::read(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");

    if (myid == 1)
    {
        for (int i = 0; i < N1; i++)
        {
            double t = h5matrix_1d->at(i);
            if (t != i)
            {
                std::cout << "Wrong result \n";
                exit(-1);
            }
        }
    }

    dash::finalize();

    return 0;

}

==========
Error Info:

dbin@Bins-MBP dash % ~/work/soft/dash/build/install/bin/dash-mpiCC h5-1d.cpp -o h5-1d
In file included from h5-1d.cpp:1:
In file included from /Users/dbin/work/soft/dash/build/install//include/libdash.h:71:
In file included from /Users/dbin/work/soft/dash/build/install//include/dash/io/HDF5.h:4:
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:566:26: error: no member named
'underfilled_blocksize' in 'dash::TilePattern<1, dash::ROW_MAJOR, long>'
} else if (pattern.underfilled_blocksize(dimensions.back()) == 0) {
~~~~~~~ ^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:600:12: note: in instantiation of function
template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs_with_underfilled<dash::TilePattern<1,
dash::ROW_MAJOR, long> >' requested here
return _get_hdf_slabs_with_underfilled(pattern);
^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/internal/DriverImplZeroCopy.h:32:21: note: in instantiation
of function template specialization 'dash::io::hdf5::StoreHDF::_get_hdf_slabs<1, dash::ROW_MAJOR, long>' requested
here
auto hyperslabs = _get_hdf_slabs(container.pattern());
^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:762:5: note: in instantiation of function
template specialization 'dash::io::hdf5::StoreHDF::_process_dataset_impl_zero_copy<dash::Matrix<double, 1, long,
dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
_process_dataset_impl_zero_copy(StoreHDF::Mode::WRITE, container, h5dset,
^
/Users/dbin/work/soft/dash/build/install//include/dash/io/hdf5/StorageDriver.h:234:5: note: in instantiation of function
template specialization 'dash::io::hdf5::StoreHDF::_write_dataset_impl<dash::Matrix<double, 1, long,
dash::TilePattern<1, dash::ROW_MAJOR, long>, dash::HostSpace> >' requested here
_write_dataset_impl(array, h5dset, internal_type);
^
h5-1d.cpp:21:15: note: in instantiation of function template specialization
'dash::io::hdf5::StoreHDF::write<dash::Matrix<double, 1, long, dash::TilePattern<1, dash::ROW_MAJOR, long>,
dash::HostSpace> >' requested here
StoreHDF::write(*h5matrix_1d, "testf-1d.h5", "testg/testd1D");
^
1 error generated.

Member

dhinf commented Apr 27, 2021

@Goon83

I'll look into it. I need to add the missing method to the pattern. I'll try to do it this week.

best
Denis

Member

dhinf commented Apr 28, 2021

It is fixed and merged into development. By the way, dash::init(&argc, &argv) is missing from your example code.
Please check whether the fix works in your environment.
