Skip to content

Latest commit

 

History

History
209 lines (177 loc) · 8 KB

README.md

File metadata and controls

209 lines (177 loc) · 8 KB

🍋 LMN - A minimal launcher

A lightweight tool to run your local project on a remote machine, with a single command.

lmn can set up a container environment (Docker or Singularity) and can work with job schedulers (Slurm or PBS).

Test PyPI

🧑‍💻 lmn in action:

In your project directory, you can simply run

$ lmn run <remote-machine> -- python generate_fancy_images.py

It does:

  1. rsync your local project directory with the remote machine
  2. ssh into the remote machine, (optionally) set up the specified env (Docker or Singularity)
  3. Inside of the environment, lmn runs python generate_fancy_images.py
  4. If files are created (such as images), lmn rsync the generated files back to the local project directory

🚀 Quickstart

▶️  Installation

You can install lmn via pip:

$ pip install lmn

▶️  Configuration

All you need is to place a single configuration file at project_dir/.lmn.json5 (on your local machine).
The config file looks like:

Example Configuration
{
    "project": {
        "name": "my_project",
        // What not to rsync with the remote machine:
        "exclude": [".git", ".venv", "wandb", "__pycache__"],
        // Project-specific environment variables:
        "environment": {
            "MUJOCO_GL": "egl"
        }
    },
    "machines": {
        "elm": {
            // Host information
            "host": "elm.ttic.edu",
            "user": "takuma",
            // Rsync target directory (on the remote host)
            "root_dir": "/scratch/takuma/lmn",
            // Mode: ["ssh", "docker", "slurm", "pbs", "slurm-sing", "pbs-sing"]
            "mode": "docker",
            // Docker configurations
            "docker": {
                "image": "ripl/my_transformer:latest",
                "network": "host",
                // Mount configurations (host -> container)
                "mount_from_host": {
                    "/ripl/user/takuma/project/": "/project",
                    "/dev/shm": "/dev/shm",
                },
            },
            // Host-specific environment variables
            "environment": {
                "PROJECT_DIR": "/project",
            },
        },
        "tticslurm": {
            "host": "slurm.ttic.edu",
            "user": "takuma",
            "mode": "slurm-sing",  // Running a Singularity container on a cluster with Slurm job scheduler
            "root_dir": "/share/data/ripl-takuma/lmn",
            // Slurm job configurations
            "slurm": {
                "partition": "contrib-gpu",
                "cpus_per_task": 1,
                "time": "04:00:00",
                "output": "slurm-%j.out.log",
                "error": "slurm-%j.error.log",
                "exclude": "gpu0,gpu18",
            },
            // Singularity configurations
            "singularity": {
                "sif_file": "/share/data/ripl-takuma/singularity/my_transformer.sif",
                "writable_tmpfs": true,
                "startup": "ldconfig /.singularity.d/libs",  // Command to run after starting up the container
                "mount_from_host": {
                    "/share/data/ripl-takuma/project/": "/project",
                },
            },
            "environment": {
                "PROJECT_DIR": "/project",
            }
        }
    }
}

More example configurations can be found in the example directory.

▶️  Command examples

Make sure that you're in the project directory first.

# Launch an interactive shell in the docker container (on elm):
$ lmn run elm -- bash

# Run a job in the docker container (on elm):
$ lmn run elm -- python train.py

# Run a script on the host (on elm):
$ lmn run elm --mode ssh -- python hello.py

# Run a command quickly on the host without syncing any files ("bare"-run; on elm)
$ lmn brun elm -- hostname

# Check GPU usage on elm (This is equivalent to `lmn brun elm -- nvidia-smi`)
$ lmn nv elm

# Launch an interactive shell in the Singularity container via Slurm scheduler (on tticslurm)
$ lmn run tticslurm -- bash

# Run a job in the Singularity container via Slurm scheduler (on tticslurm)
$ lmn run tticslurm -- python train.py

# Submit a batch job that runs in the Singularity container via Slurm scheduler (on tticslurm)
$ lmn run tticslurm -d -- python train.py

# Launching a sweep (batch jobs) that runs in the Singularity container via Slurm scheduler (on tticslurm)
# This submits 10 batch jobs where `$LMN_RUN_SWEEP_IDX` is set from 0 to 9.
$ lmn run tticslurm --sweep 0-9 -d -- python train.py -l '$LMN_RUN_SWEEP_IDX'

# Run a script on the login node (on tticslurm)
$ lmn run tticslurm --mode ssh -- squeue -u takuma

# Get help
$ lmn --help

# Get help on `lmn run`
$ lmn run --help
More about `--sweep` format
  • --sweep 0-9: ten jobs with LMN_RUN_SWEEP_IDX=0, 1 through 9
    • Internally lmn simply runs range(0, 9 + 1)
  • --sweep 7: a single job with LMN_RUN_SWEEP_IDX=7
  • --sweep 3,5,8: three jobs with LMN_RUN_SWEEP_IDX=3 and 5 and 8

Comparison with other packages

lmn is inspired by the following great packages: geyang/jaynes, justinjfu/doodad and afdaniele/cpk.

  • jaynes and doodad focus on launching a lot of jobs in non-interactive mode
    • ✅ support ssh, docker, slurm and AWS / GCP (but not PBS scheduler)
    • 😢 do not support Singularity
    • 😢 cannot launch interactive jobs
    • 😢 only work with Python project, and require (although small) modifications to the project codebase
  • cpk focuses on (though not limited to) ROS application and running programs in docker containers
    • ✅ supports X forwarding and more stuff that are helpful to run ROS applications on the container
    • ✅ provides more functionalities such as creating and deploying ssh keys on remote machines
    • 😢 does not support clusters with schedulers (Slurm or PBS), nor does it support Singularity

Tasks

  • Use Pydantic for configurations
  • add ssh-release subcommand (to drop the persistent ssh connection)
  • Remove project.name from the global config, and get project name from the project root directory name