Correctly handle dynamic extensions of the DVM #854
The DVM can be extended in response to add_host and add_hostfile directives. In such cases, we need to provide the new daemons with a complete picture of the currently executing jobs so they can properly map the new one.

Signed-off-by: Ralph Castain <[email protected]>
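To show why the new daemons need the complete job picture, here is a minimal Python sketch under simplifying assumptions. Each node has a fixed slot count, running jobs are described as (node, procs) placements, and the new job is mapped greedily by slot. The names `Node`, `apply_existing_jobs`, and `map_new_job` are hypothetical and are not PRRTE APIs. PRRTE's real mapper is written in C and is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node record: total slots and slots already occupied.
    name: str
    slots: int
    slots_inuse: int = 0


def apply_existing_jobs(nodes, jobs):
    """Replay the placements of already-running jobs onto the node list.

    `jobs` maps a job id to a list of (node_name, procs) placements.
    This is the "complete picture" a newly added daemon would need.
    """
    by_name = {n.name: n for n in nodes}
    for placements in jobs.values():
        for node_name, procs in placements:
            by_name[node_name].slots_inuse += procs
    return nodes


def map_new_job(nodes, nprocs):
    """Greedy by-slot mapping of a new job onto the remaining free slots.

    Slot usage is only updated if every process can be placed.
    """
    placement = []
    remaining = nprocs
    for node in nodes:
        if remaining == 0:
            break
        free = node.slots - node.slots_inuse
        if free <= 0:
            continue
        take = min(free, remaining)
        placement.append((node.name, take))
        remaining -= take
    if remaining:
        raise RuntimeError(f"not enough free slots: {remaining} procs unplaced")
    # Commit the placement now that it is known to fit.
    by_name = {n.name: n for n in nodes}
    for name, take in placement:
        by_name[name].slots_inuse += take
    return placement
```

Suppose `n3` is the newly added node and `job1` already occupies all of `n1` and half of `n2`. A daemon that replays `job1` places a 4-process job on the two free slots of `n2` plus two slots on `n3`. A daemon that skips the replay sees `n1` as empty and oversubscribes it.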
@rhc54 we are looking at making use of this expansion feature. Are there any examples in the test suite that show how to use the
Currently, it is done via a "spawn" command, i.e., as part of starting another job. I suppose we could either add an option to the PMIx
Note that I'd need to check that this still works, as it has been a couple of years since anyone used it. I'm unaware of any examples that exercise it, nor of anything in the test suite that tests it.
Thanks. We actually found an MPI-equivalent test in the OMPI unit tests. We're currently thinking of adding an extension to prun for the experiments we're trying to do. We'll look into palloc and pctrl as other options too.
I don't see an "add-host" or "add-hostfile" cmd line option defined in Like I said, I'll have to check the backend to ensure PRRTE still handles those correctly.
It looks like the backend support is present, though I haven't checked it out. In order to do that, I had to add the cmd line options anyway (see #1769). Check the help text to see if it makes sense to you and meets your needs. I'll let you know once I've checked the backend to ensure it is working.
Just an update: the add-host and add-hostfile features are now working on the PRRTE master branch, as described in the help text from #1769. I don't plan to bring that to the v3.0 or v3.1 release branches; let me know if you need it and we can discuss backporting it.
@rhc54 I am using #1769. I am trying to use add-hostfile to add a new node to an existing DVM. I am getting
Could you please let me know if I am using this feature incorrectly, or what the issue might be?
Not entirely sure of the problem. It could be that the presence of Slurm is causing confusion. You might try adding
@rhc54 Thank you. This works now. Apparently, we have to specify the number of slots along with the node name in the hostfile. Otherwise, the added node is assigned -1 slots. I am not sure if this is a bug or expected behavior.
A bit of both. The -1 is intended as an indicator that PRRTE should discover the number of slots based on the CPUs on the new node. However, there is a check in there so that this isn't done in managed environments such as Slurm, because the scheduler assigns the number of slots and we cannot override it. The problem here is that we are faking a dynamic environment inside of what is actually a static one: Slurm assigned the nodes and defined the number of slots for each node, and we are then trying to use those nodes as if they were ours to define. I'll have to ponder this a bit. There might be a way around it, but we have to be careful not to break the normal mode of operation.
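The slot behavior described in the last two comments can be modeled with a small Python sketch. This is a hypothetical illustration of the logic as explained in this thread, not PRRTE's code, and it models the behavior before the fix referenced below. The `nodename slots=N` line format is the usual hostfile syntax. `UNSET`, `parse_hostfile_line`, and `resolve_slots` are invented names, and the `managed` flag stands in for "running under a scheduler such as Slurm".

```python
UNSET = -1  # hypothetical sentinel mirroring the "-1 slots" value reported above


def parse_hostfile_line(line):
    """Parse 'nodename [slots=N]' into (name, slots); slots defaults to UNSET.

    Returns None for blank or comment-only lines.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split()
    name, slots = fields[0], UNSET
    for field in fields[1:]:
        key, _, value = field.partition("=")
        if key == "slots":
            slots = int(value)
    return name, slots


def resolve_slots(requested, managed, detected_cpus):
    """Decide the slot count for a dynamically added node (hypothetical model)."""
    if requested != UNSET:
        # An explicit slots=N in the hostfile is used as-is.
        return requested
    if managed:
        # Discovery is skipped because the scheduler is expected to assign
        # slots, so a node it never assigned is left at UNSET.
        return UNSET
    # Unmanaged: fall back to the number of CPUs found on the new node.
    return detected_cpus
```

Under this model, `node05 slots=8` resolves to 8 slots even inside a Slurm allocation, while a bare `node05` stays at -1. That matches the workaround of listing the slot count explicitly.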
@BhattaraiRajat I believe this should now work correctly, even under a Slurm allocation. See #1851 for the change.