Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds sparse communicator class #1546

Closed
wants to merge 2 commits into from
Closed

Conversation

MarcelKoch
Copy link
Member

@MarcelKoch MarcelKoch commented Feb 14, 2024

Based on #1544

This PR adds a class that handles the communication of a local vector. It uses the neigborhood non-blocking all-to-all for this. The sparse comm is constructed from an index_map.

In another PR this will be used in the distributed matrix.

Note: This uses C++17 to use type-erasure for the sparse_communicator.

PR Stack:

@MarcelKoch MarcelKoch self-assigned this Feb 14, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. labels Feb 14, 2024
@MarcelKoch
Copy link
Member Author

@upsj I think this PR might be a good opportunity to discuss your distributed::RowGather idea. In principle this class is already that, but not in the lin-op hierarchy.

Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class structure mixes a few things together: the communicator class idea, with the specific dense communicate (involving a all_to_all_v). I think it would make sense to separate the two.

I think it should be possible to make the sparse_communicator a first class communicator and overload only those functions as necessary, for example, all_to_all_v in this case. The distributed object can then just use sparse communicator as a normal communicator and inherits the better all_to_all_v as it uses the sparse communicator for communication. For all other functions where the sparse communicator does not make sense (MPI_Send etc), you transparently use the underlying default communicator.

* Simplified MPI communicator that handles only neighborhood all-to-all
* communication.
*/
class sparse_communicator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it only provides a neighborhood all to all communication, then I think calling it a sparse communicator might be misleading.

Comment on lines +55 to +56
const detail::DenseCache<ValueType>& send_buffer,
const detail::DenseCache<ValueType>& recv_buffer) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A public method of a public class should not be taking objects within the detail namespace. Probably need to generalize it to array or Dense ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that if the communicator stores the caches (which I also prefer) then the communicator can handle only one async communication at a time. Any new communication would need to wait until the internal buffers are ready to access.

Comment on lines +81 to +87
MPI_Info info;
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_create(&info));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Dist_graph_create_adjacent(
comm.get(), send_ids.get_size(), send_ids.get_const_data(),
MPI_UNWEIGHTED, recv_ids.get_size(), recv_ids.get_const_data(),
MPI_UNWEIGHTED, info, false, &sparse_comm));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_free(&info));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

Suggested change
MPI_Info info;
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_create(&info));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Dist_graph_create_adjacent(
comm.get(), send_ids.get_size(), send_ids.get_const_data(),
MPI_UNWEIGHTED, recv_ids.get_size(), recv_ids.get_const_data(),
MPI_UNWEIGHTED, info, false, &sparse_comm));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Info_free(&info));
GKO_ASSERT_NO_MPI_ERRORS(MPI_Dist_graph_create_adjacent(
comm.get(), send_ids.get_size(), send_ids.get_const_data(),
MPI_UNWEIGHTED, recv_ids.get_size(), recv_ids.get_const_data(),
MPI_UNWEIGHTED, MPI_INFO_NULL, false, &sparse_comm));

since no key-value pairs are being set anyway ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One MPI implementation, I can't remember on which system, crashed without it, so I just kept it.

Comment on lines +196 to +199
mpi::request sparse_communicator::communicate(
const matrix::Dense<ValueType>* local_vector,
const detail::DenseCache<ValueType>& send_buffer,
const detail::DenseCache<ValueType>& recv_buffer) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this now, I dont get why the sparse_communicator class handles this specific communicate function.

@MarcelKoch
Copy link
Member Author

@pratikvn you are right, the class mixes a few responsibilities. I will split it up into one class that handles the all-to-all communication and another one which does the distributed row gather.

@MarcelKoch MarcelKoch force-pushed the optional-cxx17 branch 2 times, most recently from c46d056 to 9546c60 Compare April 4, 2024 10:29
@MarcelKoch
Copy link
Member Author

This is superseded by #1588 and #1589

@MarcelKoch MarcelKoch closed this Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mod:core This is related to the core module. reg:build This is related to the build system. reg:testing This is related to testing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants