Add some explanation of MPISharedArray type to the docs #246

Merged
merged 2 commits into from
Sep 16, 2024

Conversation

johnomotani
Collaborator

No description provided.

@johnomotani johnomotani added the documentation Improvements or additions to documentation label Sep 6, 2024
@johnomotani
Collaborator Author

@LucasMontoya4 do these new docs make the MPISharedArray any clearer? https://mabarnes.github.io/moment_kinetics/previews/PR246/developing/#Array-types

I'm not sure what I've written so far adds much, but I'm blanking at the moment on what else to put. It'd help if someone would ask questions to help decide what to add!

@LucasMontoya4
Collaborator

I think it makes more sense now. So the MPISharedArray exists in shared memory, but so will the Array when you run using MPI, and so they behave the same way?

@johnomotani
Collaborator Author

Julia doesn't actually know whether our arrays are shared-memory or not (which is partly why we have to do all the synchronization and checking by hand). We use an MPI function to allocate the shared memory, then create a Julia Array that accesses that bit of memory, in this function:

"""
Get a shared-memory array of `mk_float` (shared by all processes in a 'block')

Create a shared-memory array using `MPI.Win_allocate_shared()`. Pointer to the memory
allocated is wrapped in a Julia array. Memory is not managed by the Julia array though.
A reference to the `MPI.Win` needs to be freed - this is done by saving the `MPI.Win`
into a `Vector` in the `Communication` module, which has all its entries freed by the
`finalize_comms!()` function, which should be called when `moment_kinetics` is done
running a simulation/test.

Arguments
---------
dims - mk_int or Tuple{mk_int}
    Dimensions of the array to be created. Dimensions passed define the size of the
    array which is being handled by the 'block' (rather than the global array, or a
    subset for a single process).
comm - `MPI.Comm`, default `comm_block[]`
    MPI communicator containing the processes that share the array.
maybe_debug - Bool
    Can be set to `false` to force not creating a DebugMPISharedArray when debugging is
    active. This avoids recursion when including a shared-memory array as a member of a
    DebugMPISharedArray for debugging purposes.

Returns
-------
Array{mk_float}
"""
function allocate_shared(T, dims; comm=nothing, maybe_debug=true)
    if comm === nothing
        comm = comm_block[]
    elseif comm == MPI.COMM_NULL
        # If `comm` is a null communicator (on this process), then this array is just a
        # dummy that will not be used.
        array = Array{T}(undef, (0 for _ ∈ dims)...)

        @debug_shared_array begin
            # If @debug_shared_array is active, create DebugMPISharedArray instead of Array
            if maybe_debug
                array = DebugMPISharedArray(array, comm)
            end
        end

        return array
    end
    br = MPI.Comm_rank(comm)
    bs = MPI.Comm_size(comm)
    n = prod(dims)

    if n == 0
        # Special handling as some MPI implementations cause errors when allocating a
        # size-zero array
        array = Array{T}(undef, dims...)

        @debug_shared_array begin
            # If @debug_shared_array is active, create DebugMPISharedArray instead of Array
            if maybe_debug
                array = DebugMPISharedArray(array, comm)
            end
        end

        return array
    end

    if br == 0
        # Allocate points on rank-0 for simplicity
        dims_local = dims
    else
        dims_local = Tuple(0 for _ ∈ dims)
    end

    @debug_shared_array_allocate begin
        # Check that allocate_shared was called from the same place on all ranks
        st = stacktrace()
        stackstring = string([string(s, "\n") for s ∈ st]...)

        # Only include file and line number in the string that we hash so that
        # function calls with different signatures are not seen as different
        # (e.g. time_advance!() with I/O arguments on rank-0 but not on other
        # ranks).
        signaturestring = string([string(s.file, s.line) for s ∈ st]...)

        hash = sha256(signaturestring)
        all_hashes = MPI.Allgather(hash, comm)
        l = length(hash)
        for i ∈ 1:length(all_hashes)÷l
            if all_hashes[(i - 1) * l + 1: i * l] != hash
                error("allocate_shared() called inconsistently\n",
                      "rank $(block_rank[]) called from:\n",
                      stackstring)
            end
        end
    end

    win, array_temp = MPI.Win_allocate_shared(Array{T}, dims_local, comm)

    # Array is allocated contiguously, but `array_temp` contains only the 'locally owned'
    # part. We want to use as a shared array, so want to wrap the entire shared array.
    # Get array from rank-0 process, which 'owns' the whole array.
    array = MPI.Win_shared_query(Array{T}, dims, win; rank=0)

    # Don't think `win::MPI.Win` knows about the type of the pointer (its concrete type
    # is something like `MPI.Win(Ptr{Nothing} @0x00000000033affd0)`), so it's fine to
    # put them all in the same global_Win_store - this won't introduce type instability
    push!(global_Win_store, win)

    @debug_shared_array begin
        # If @debug_shared_array is active, create DebugMPISharedArray instead of Array
        if maybe_debug
            debug_array = DebugMPISharedArray(array, comm)
            if comm == comm_anyv_subblock[]
                push!(global_anyv_debugmpisharedarray_store, debug_array)
            else
                push!(global_debugmpisharedarray_store, debug_array)
            end
            return debug_array
        end
    end

    return array
end
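
The "wrap a pointer in a Julia Array that does not manage the memory" idea can be tried without MPI. The following is only a minimal sketch, not what `allocate_shared` does internally: it assumes `Libc.malloc` as a stand-in for the memory that the MPI window would provide, and uses `unsafe_wrap` with `own=false` so that Julia never frees the buffer itself. The names `ptr`, `a`, `b` and `seen` are made up for illustration.

```julia
# Allocate raw memory outside Julia's garbage collector.
# (Assumption: this stands in for the memory owned by the MPI window.)
dims = (2, 3)
n = prod(dims)
ptr = Ptr{Float64}(Libc.malloc(n * sizeof(Float64)))

# Two independent Array wrappers around the same memory. `own=false` means
# Julia will not try to free the memory when the arrays are garbage collected.
a = unsafe_wrap(Array, ptr, dims; own=false)
b = unsafe_wrap(Array, ptr, dims; own=false)

a .= 0.0
b[2, 3] = 42.0

# A write through `b` is visible through `a`, because both are plain `Array`s
# that point at the same memory. Julia itself does not know they are aliased.
seen = a[2, 3]
println(seen)

# The memory has to be released by hand, in the same spirit as the `MPI.Win`
# being freed by `finalize_comms!()`. `a` and `b` must not be used after this.
Libc.free(ptr)
```

Both wrappers have type `Array{Float64, 2}`, which is why nothing in the type system distinguishes them from ordinary arrays and the synchronization has to be handled separately.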

Normally, when debugging is not activated, MPISharedArray is just an alias for Array, defined here:

"""
"""
const MPISharedArray = @debug_shared_array_ifelse(DebugMPISharedArray, Array)

(I think I might write an actual docstring for that now...). When debugging is not activated, that macro expands out to give just

const MPISharedArray = Array

but if we hard-coded Array into our struct definitions, etc., then when we do activate debugging and want to replace Array with DebugMPISharedArray, we wouldn't be able to, because DebugMPISharedArray isn't an Array. Julia only allows types to inherit from 'abstract types', not from 'concrete types'. Array and DebugMPISharedArray both have to be concrete types (you can actually create instances of them), and they both inherit from AbstractArray (an abstract type, which effectively declares that a thing acts like an array, so generic functions know they can work with it). We don't want to put AbstractArray in our struct definitions because it's not a concrete type: the compiler could not infer the type of such a field, so code using structs defined like that would not be type-stable.
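
To make the alias argument concrete, here is a small sketch under stated assumptions. `DebugArraySketch`, `debug_active`, `SharedArraySketch`, `FieldsConcrete` and `FieldsAbstract` are hypothetical names invented for this example; they are not the real `DebugMPISharedArray` or the `@debug_shared_array_ifelse` macro, and a plain `const` flag stands in for the macro's compile-time switch.

```julia
# Hypothetical stand-in for DebugMPISharedArray: a concrete wrapper type that
# is an AbstractArray, but is not (and cannot be) a subtype of Array.
struct DebugArraySketch{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::DebugArraySketch) = size(A.data)
Base.getindex(A::DebugArraySketch, i::Int...) = A.data[i...]

# Hypothetical switch playing the role of "debugging is activated".
const debug_active = false

# The alias resolves to one concrete array type or the other when the code
# is loaded, so struct fields declared with it stay concretely typed.
const SharedArraySketch = debug_active ? DebugArraySketch : Array

struct FieldsConcrete
    phi::SharedArraySketch{Float64,2}
end

struct FieldsAbstract
    phi::AbstractArray{Float64,2}
end

concrete_field = isconcretetype(fieldtype(FieldsConcrete, :phi))
abstract_field = isconcretetype(fieldtype(FieldsAbstract, :phi))
println(concrete_field)   # field type is Array{Float64,2} here
println(abstract_field)   # field type is AbstractArray{Float64,2}
println(DebugArraySketch{Float64,2} <: Array)
println(DebugArraySketch{Float64,2} <: AbstractArray)
```

Flipping `debug_active` to `true` would change the field type of `FieldsConcrete` to `DebugArraySketch{Float64,2}` without touching the struct definition, which is the point of routing everything through the alias.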

@LucasMontoya4
Collaborator

That makes so much sense now, thank you! I didn't mean for you to write such a detailed comment, but I'm sure that that will help others in future as well...

@johnomotani johnomotani merged commit cab2c90 into master Sep 16, 2024
17 checks passed
@johnomotani johnomotani deleted the array-type-docs branch September 16, 2024 18:02