dpm: set default rankby order if not specified by caller #13035

Open
wants to merge 1 commit into base: main

Conversation

hppritcha
Member

The MPI standard defines a specific mapping of the ranks in the MCW of a set of processes started by MPI_Comm_spawn_multiple to the supplied commands. See, for example, section 11.8.3 of the MPI 4.1 standard.

Something changed between ORTE in OMPI 4.1.x and PRRTE in OMPI 5 and main such that this is no longer the default ordering of ranks to commands for MPI_Comm_spawn_multiple. This caused the ompi-tests/ibm/dynamic/spawn_multiple test to fail with the wrong mapping of arguments to ranks.

So, add a check in ompi_dpm_spawn to set the PMIX_RANKBY attribute to "slot" if the application has not specified some other ranking option.

Signed-off-by: Howard Pritchard <[email protected]>
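For illustration only, a minimal sketch of the approach (not the actual diff; the helper name and surrounding variables are assumptions): while assembling the job-level info that ompi_dpm_spawn hands to PMIx_Spawn, check whether the caller already requested a ranking policy and, if not, add PMIX_RANKBY set to "slot".

```c
/* Illustrative sketch only -- not the actual patch. Assumes the caller has
 * already scanned the user-supplied info keys to decide whether a ranking
 * policy was requested (user_specified_rankby). */
#include <pmix.h>
#include <stdbool.h>

static void maybe_set_default_rankby(pmix_info_t *job_info, size_t *ninfo,
                                     bool user_specified_rankby)
{
    if (!user_specified_rankby) {
        /* Default to ranking by slot so MCW ranks follow the order of the
         * commands passed to MPI_Comm_spawn_multiple. */
        PMIX_INFO_LOAD(&job_info[*ninfo], PMIX_RANKBY, "slot", PMIX_STRING);
        (*ninfo)++;
    }
}
```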
@rhc54
Contributor

rhc54 commented Jan 13, 2025

Did the Standard change in more recent versions? I ask because it's the OMPI community that specifically insisted that the ranking follow the mapping (e.g., "if I map by package then the ranking should follow it"), and that's what your users also complained about when we didn't do it. I personally don't care - just a tad confused by the conflicting directives.

@hppritcha
Member Author

This PR doesn't change the behavior if the user specifies some particular mapping. What is weird is that, by default, on a single node I'm seeing MCW rank 0 map to app 0, rank 1 to app 1, rank 2 to app 0, rank 3 to app 1, etc. That is definitely not the default specified in the standard.
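For reference, a minimal sketch of the case in question (the command names and counts here are made up): with two app-contexts of two processes each, the standard lays the spawned MPI_COMM_WORLD out with ranks 0-1 on the first command and ranks 2-3 on the second, whereas the behavior described above interleaves them.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Two app-contexts, two processes each. Per MPI 4.1 section 11.8.3 the
     * spawned MPI_COMM_WORLD should be: ranks 0-1 -> ./app_a, ranks 2-3 -> ./app_b.
     * The behavior described above gives 0 -> a, 1 -> b, 2 -> a, 3 -> b instead. */
    char *cmds[2] = {"./app_a", "./app_b"};
    int maxprocs[2] = {2, 2};
    MPI_Info infos[2] = {MPI_INFO_NULL, MPI_INFO_NULL};
    MPI_Comm intercomm;
    int errcodes[4];

    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                            0, MPI_COMM_SELF, &intercomm, errcodes);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```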

@rhc54
Contributor

rhc54 commented Jan 14, 2025

Hmm...that's a bug, dude - let me take a look at it. We are supposed to rank all the app0 procs, and then rank all the app1 procs.

@rhc54
Contributor

rhc54 commented Jan 14, 2025

I'm unable to reproduce that behavior - can you please provide the reproducer? Everything I'm trying works correctly, ranking each app separately.

@rhc54
Contributor

rhc54 commented Jan 14, 2025

Little reminder: I don't have access to the ompi-tests repo, so I cannot see the referenced code. You could either provide it to me privately, or you can add me to the repo as an individual contributor for "read-only" access.

Just FWIW: this fix isn't actually correct. It may work for a particular default mapping pattern and/or combination of the number of apps and procs/app, but if the problem is as you describe, then it will fail if that combination changes. You are just getting lucky with that particular test and default mapping policy because the procs are being mapped such that app0's procs wind up on the first package, and app1's procs wind up on the second package. So when we rank by slot, you'll work through all the app0 procs first (since they are all on the 1st package) and then the app1 procs.

Change the default mapping pattern, or have more than two packages, or change any of a few other elements of the situation, and you'll get the wrong answer.

However, I'd first like to be able to reproduce it. I'm using the head of your "main" branch, in combination with the head of the PMIx and PRRTE master branches. Let's ensure this isn't some artifact of the PRRTE divergence problem.

@rhc54
Contributor

rhc54 commented Jan 14, 2025

Dug a little more into this, and found the bug. This has nothing to do with comm_spawn. You'd get the same behavior any time someone specifies more than one app-context on the mpirun cmd line. Problem was simply a missing loop over the app-contexts in the job so that we computed all the ranks for each app before moving on to the next. Minus all the indent changes, it's a couple of lines of code.

Little advice for the future: if you see a rank assignment problem, it's probably best to start by looking at the rank assignment code in PRRTE's "src/mca/rmaps/base/rmaps_base_ranking.c" file. It has changed since ORTE - we split apart the various ranking modes instead of trying to do it all in one giant loop with big if clauses (which has its own bugs, but in modes that are rarely used and so nobody cares). Still, you can compare the two and immediately see the missing loop. Five minute fix.
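Conceptually (an illustrative sketch only, with simplified stand-in types, not the actual PRRTE code in rmaps_base_ranking.c), the fix boils down to an outer loop over the app-contexts so that each app's procs are all ranked before moving on to the next app:

```c
#include <stddef.h>

/* Simplified stand-in for a job's app-contexts: each app has some number of
 * procs whose MCW ranks (vpids) need to be assigned. */
typedef struct {
    size_t num_procs;
    int *vpids;
} app_t;

static void rank_by_app(app_t *apps, size_t num_apps)
{
    int next_rank = 0;

    /* Outer loop over app-contexts -- the loop that was missing. All of
     * app 0's procs get ranks before any of app 1's, and so on. */
    for (size_t a = 0; a < num_apps; a++) {
        for (size_t p = 0; p < apps[a].num_procs; p++) {
            apps[a].vpids[p] = next_rank++;
        }
    }
}
```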

I'm going to check a few more cases today to ensure the changes don't break something else, and then I'll commit it to PRRTE upstream master and v4.0 branches. If it backports cleanly to the v3.0 branch (as I suspect it will), I'll add it there as well. Since this has been there for quite some time without user complaint, I suspect this isn't something people either use or are particularly impacted by, so probably nothing terribly urgent.

@rhc54
Contributor

rhc54 commented Jan 14, 2025

Oh, forgot to mention. The reason I couldn't reproduce this is that it requires executing on a multi-package machine, and executing more than 2 procs for each app-context. I met the latter requirement, but being on a single-package box, failed the former - and therefore could see no problem. Anytime you see a mapping-related issue, it might be worth asking about the number of packages on the reporter's machine, as that is a critical factor in debugging it. I usually ask for a copy of their topology just to ensure I have the complete picture.

@rhc54
Contributor

rhc54 commented Jan 14, 2025

Probably should go ahead and close this now.
