Execute on servers with dedicated job scheduler #27

Open
jewellsean opened this issue Jun 11, 2014 · 2 comments


@jewellsean

Under the current framework, is it possible to send all planned jobs to servers with a dedicated job scheduler? The workflow would likely be as follows:

  1. Plan runs locally (through cross product interface).
  2. Send and schedule jobs on the remote server.
    --- Completely disconnect from server, since large jobs will require at least a week to complete. It is not reasonable to require a consistent connection over that time period. ---
  3. Re-establish a connection and sync results.
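The disconnect in the middle of this workflow implies that something on the local side has to remember what was submitted. A minimal sketch of that bookkeeping, assuming a hypothetical JSON ledger file (the file layout and the function names `record_submission`, `pending_runs`, and `mark_synced` are illustrative and not part of 3x):

```python
import json
import tempfile
from pathlib import Path


def record_submission(ledger_path, run_id, job_id):
    """Remember which scheduler job belongs to which planned run."""
    path = Path(ledger_path)
    ledger = json.loads(path.read_text()) if path.exists() else {}
    ledger[run_id] = {"job_id": job_id, "synced": False}
    path.write_text(json.dumps(ledger, indent=2))


def pending_runs(ledger_path):
    """Return the run ids whose results have not been synced back yet."""
    ledger = json.loads(Path(ledger_path).read_text())
    return sorted(run for run, info in ledger.items() if not info["synced"])


def mark_synced(ledger_path, run_id):
    """Flag a run as synced once its results are copied back locally."""
    path = Path(ledger_path)
    ledger = json.loads(path.read_text())
    ledger[run_id]["synced"] = True
    path.write_text(json.dumps(ledger, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "submitted.json"
        # Step 2: submit and record (job ids here are made up).
        record_submission(ledger, "run-001", "12345.server")
        record_submission(ledger, "run-002", "12346.server")
        # ... disconnect; days later, step 3: sync what has finished.
        mark_synced(ledger, "run-001")
        print(pending_runs(ledger))
```

Because the ledger lives on disk, nothing has to stay running locally while the jobs are queued or executing.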
@netj
Owner

netj commented Jun 11, 2014

Hi @jewellsean, I'm glad you are looking for what I recently added to the codebase. In my view, you are asking for two features: asynchronous execution of runs and job scheduler support. The former is in the master branch already, but the latter is not there yet.

Using the LATEST version, you can define an ssh-cluster type target to schedule runs on one or more remote machines via plain ssh, then synchronize later whenever you want. If you have a set of machines that you have ssh access to, defining and using an ssh-cluster target will serve your needs well. This feature is not documented yet, but you can get a basic hint on how to define one by running 3x target NAME define ssh-cluster. One thing to remember is that the same version of 3x should be installed on the remote machines.
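For readers unfamiliar with the general idea of starting work over plain ssh and walking away, here is a rough sketch. It is not how 3x implements ssh-cluster targets; the helper name `detached_ssh_command`, the host, the paths, and the log file name are all assumptions made for illustration. It only builds the command line, so nothing is executed:

```python
import shlex


def detached_ssh_command(host, remote_dir, command, log_file="run.log"):
    """Build an argv list that starts `command` on `host` and returns at once.

    nohup, backgrounding with `&`, and redirecting all three standard
    streams let the remote process keep running after ssh disconnects.
    `command` is passed through as a shell string, so it must be trusted.
    """
    remote = "cd {dir} && nohup {cmd} > {log} 2>&1 < /dev/null &".format(
        dir=shlex.quote(remote_dir),
        cmd=command,
        log=shlex.quote(log_file),
    )
    return ["ssh", host, remote]


if __name__ == "__main__":
    # Hypothetical host and directory, printed rather than executed.
    print(detached_ssh_command("node1", "/tmp/my runs", "./run.sh"))
```

Passing the resulting list to `subprocess.run(..., check=True)` would launch the run; the results can be fetched in a later session.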

However, if your cluster has a more sophisticated resource scheduler and/or disallows direct ssh access to individual machines, an ssh-cluster target won't be that useful. If you have a specific job scheduler you want to use and can provide some info on how you submit and check your jobs, I'd be happy to add some code to 3X to support it directly.

@jewellsean
Author

@netj, thank you for your detailed and quick response! Unfortunately, I am unable to access individual machines directly via ssh.

If you're willing to add some functionality for systems reliant on a job scheduler, it would be greatly appreciated, and I can help where possible. The most comprehensive description of the scheduling environment can be found here, but I will also summarize. Essentially, I first scp or rsync both input files and executables to the server's filesystem and then create a simple PBS script that specifies server-specific parameters such as walltime, CPU requirements, etc. This script is then submitted to the job scheduler. The job scheduler has some built-in commands to check the status of jobs; for example, showq -u USERNAME lists all active, queued, and blocked jobs.
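The steps in that summary can be sketched roughly as follows. This is only an illustration under assumptions: the job name, resource values, host, and paths are made up, the function names are hypothetical, and the exact `#PBS` resource syntax accepted varies between sites. The code renders the script text and builds the command lines without running anything:

```python
def render_pbs_script(job_name, walltime, nodes, ppn, command):
    """Return the text of a simple PBS script for one run."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        # PBS starts jobs in the home directory; go back to where
        # the job was submitted from.
        'cd "$PBS_O_WORKDIR"',
        command,
    ]
    return "\n".join(lines) + "\n"


def upload_command(local_dir, host, remote_dir):
    """rsync inputs and executables to the server's filesystem."""
    return ["rsync", "-az", f"{local_dir}/", f"{host}:{remote_dir}/"]


def submit_command(script_name):
    """Submit the script to the scheduler (run on the server)."""
    return ["qsub", script_name]


def queue_command(username):
    """List the user's jobs known to the scheduler (run on the server)."""
    return ["showq", "-u", username]


if __name__ == "__main__":
    # 168 hours of walltime is one week; all values are placeholders.
    print(render_pbs_script("sim", "168:00:00", 1, 4, "./run.sh input.dat"))
    print(upload_command("runs", "user@server", "~/runs"))
    print(submit_command("job.pbs"))
    print(queue_command("user"))
```

Keeping the commands as argument lists makes it easy to log them or to run them later through `subprocess.run` or over ssh on the login node.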

If example scripts would be helpful, let me know. I will also help test these features. Thanks!
