Adds sparse communicator class #1546
Conversation
@upsj I think this PR might be a good opportunity to discuss your
I think this class structure mixes a few things together: the communicator class idea and the specific dense communication (involving an all_to_all_v). I think it would make sense to separate the two.
I think it should be possible to make the sparse_communicator a first-class communicator and overload only those functions where necessary, for example all_to_all_v in this case. The distributed object can then just use the sparse communicator as a normal communicator and inherit the better all_to_all_v. For all other functions where the sparse communicator does not make sense (MPI_Send etc.), you transparently use the underlying default communicator.
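A minimal sketch of that idea in plain MPI (all names here are hypothetical, not the PR's actual API): a wrapper that specializes only the all-to-all path and exposes the wrapped default communicator for everything else.

```cpp
#include <mpi.h>

// Hypothetical sketch: a communicator wrapper that overrides only the
// all-to-all path. Every other operation (MPI_Send etc.) goes through the
// plain base communicator returned by get().
class sparse_communicator {
public:
    sparse_communicator(MPI_Comm base, MPI_Comm neighborhood)
        : base_(base), neighborhood_(neighborhood)
    {}

    // Specialized path: a non-blocking neighborhood all-to-all that only
    // exchanges data with the ranks of the graph topology.
    MPI_Request i_all_to_all_v(const double* send_buf, const int* send_counts,
                               const int* send_offsets, double* recv_buf,
                               const int* recv_counts,
                               const int* recv_offsets) const
    {
        MPI_Request req;
        MPI_Ineighbor_alltoallv(send_buf, send_counts, send_offsets,
                                MPI_DOUBLE, recv_buf, recv_counts,
                                recv_offsets, MPI_DOUBLE, neighborhood_, &req);
        return req;
    }

    // Fallback: all remaining MPI functions use the default communicator.
    MPI_Comm get() const { return base_; }

private:
    MPI_Comm base_;
    MPI_Comm neighborhood_;
};
```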
 * Simplified MPI communicator that handles only neighborhood all-to-all
 * communication.
 */
class sparse_communicator
If it only provides a neighborhood all-to-all communication, then I think calling it a sparse communicator might be misleading.
    const detail::DenseCache<ValueType>& send_buffer,
    const detail::DenseCache<ValueType>& recv_buffer) const;
A public method of a public class should not take objects from the detail
namespace. It probably needs to be generalized to array or Dense?
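A possible generalization along those lines (a sketch only; communicate and the parameter layout mirror the PR, but this public-type signature is hypothetical):

```cpp
// Hypothetical alternative: take the public matrix::Dense type instead of
// detail::DenseCache, so callers own the buffers and no detail type leaks
// into the public interface.
mpi::request communicate(const matrix::Dense<ValueType>* local_vector,
                         matrix::Dense<ValueType>* send_buffer,
                         matrix::Dense<ValueType>* recv_buffer) const;
```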
The issue is that if the communicator stores the caches (which I would also prefer), then the communicator can handle only one asynchronous communication at a time. Any new communication would need to wait until the internal buffers are available again.
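To illustrate the constraint with plain MPI (a hypothetical, self-contained example, not the PR's code): a single pair of internal buffers can only back one in-flight exchange, so a second exchange must wait for the first.

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical illustration: send_buf/recv_buf play the role of the
// communicator's internal caches. They belong to the pending operation
// until MPI_Wait completes, which serializes back-to-back exchanges.
void two_exchanges(MPI_Comm neighbor_comm, const std::vector<int>& counts,
                   const std::vector<int>& offsets, int n_send, int n_recv)
{
    std::vector<double> send_buf(n_send), recv_buf(n_recv);
    MPI_Request req;
    MPI_Ineighbor_alltoallv(send_buf.data(), counts.data(), offsets.data(),
                            MPI_DOUBLE, recv_buf.data(), counts.data(),
                            offsets.data(), MPI_DOUBLE, neighbor_comm, &req);
    // A second exchange reusing the same buffers may not start here: the
    // buffers are owned by the pending request.
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    // Only now can the shared buffers back the next exchange.
    MPI_Ineighbor_alltoallv(send_buf.data(), counts.data(), offsets.data(),
                            MPI_DOUBLE, recv_buf.data(), counts.data(),
                            offsets.data(), MPI_DOUBLE, neighbor_comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```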
MPI_Info info;
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_create(&info));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Dist_graph_create_adjacent(
    comm.get(), send_ids.get_size(), send_ids.get_const_data(),
    MPI_UNWEIGHTED, recv_ids.get_size(), recv_ids.get_const_data(),
    MPI_UNWEIGHTED, info, false, &sparse_comm));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_free(&info));
Why not

GKO_ASSERT_NO_MPI_ERRORS(MPI_Dist_graph_create_adjacent(
    comm.get(), send_ids.get_size(), send_ids.get_const_data(),
    MPI_UNWEIGHTED, recv_ids.get_size(), recv_ids.get_const_data(),
    MPI_UNWEIGHTED, MPI_INFO_NULL, false, &sparse_comm));

since no key-value pairs are being set anyway?
One MPI implementation, I can't remember on which system, crashed without it, so I just kept it.
mpi::request sparse_communicator::communicate(
    const matrix::Dense<ValueType>* local_vector,
    const detail::DenseCache<ValueType>& send_buffer,
    const detail::DenseCache<ValueType>& recv_buffer) const
Looking at this now, I don't get why the sparse_communicator class handles this specific communicate function.
@pratikvn you are right, the class mixes a few responsibilities. I will split it up into one class that handles the all-to-all communication and another one which does the distributed row gather.
Based on #1544
This PR adds a class that handles the communication of a local vector. It uses the non-blocking neighborhood all-to-all for this. The sparse communicator is constructed from an index_map. In another PR this will be used in the distributed matrix.
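For readers unfamiliar with the MPI feature involved, here is a minimal, self-contained sketch of the underlying pattern (plain MPI with a hypothetical ring topology; the PR derives the neighbor lists from the index_map instead):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Hypothetical topology: each rank receives from its left neighbor and
    // sends to its right neighbor.
    int recv_id = (rank + size - 1) % size;
    int send_id = (rank + 1) % size;

    // Graph communicator encoding only the actual communication partners.
    MPI_Comm sparse_comm;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 1, &recv_id,
                                   MPI_UNWEIGHTED, 1, &send_id,
                                   MPI_UNWEIGHTED, MPI_INFO_NULL, false,
                                   &sparse_comm);

    // Non-blocking neighborhood all-to-all: one value per neighbor here.
    std::vector<double> send_buf{double(rank)}, recv_buf(1);
    std::vector<int> counts{1}, offsets{0};
    MPI_Request req;
    MPI_Ineighbor_alltoallv(send_buf.data(), counts.data(), offsets.data(),
                            MPI_DOUBLE, recv_buf.data(), counts.data(),
                            offsets.data(), MPI_DOUBLE, sparse_comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    // recv_buf[0] now holds the left neighbor's rank.

    MPI_Comm_free(&sparse_comm);
    MPI_Finalize();
}
```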
Note: This uses C++17 in order to implement type erasure for the sparse_communicator.
PR Stack: