Some convenient scripts for managing Slurm jobs:

- `psjob`: Do a `ps` process status on a job's node-list, but exclude system processes: `psjob [-c columns | -h] <jobid>`. Requires ClusterShell. The scratch disk usage with Slurm job_container/tmpfs (if configured) is printed. GPU usage may also be printed (example invocations follow this list).
- `showjob`: Show the status of Slurm job(s). Both queue information and accounting information are printed.
- `showjobreasons`: Show a list of non-resource reasons for pending jobs, such as AssocGrpCpuLimit and AssocGrpNodeLimit.
- `jobqos`: Set the Quality of Service (QOS) of jobs, or list jobs with a given QOS.
- `jobnice`: Add a nice level to jobs, or list jobs with a non-zero nice level.
- `jobtimelimit`: Update the time limit of job(s).
- `sbadjob`: Print a warning about bad jobs hanging indefinitely in the queue.
- `notifybadjob`: Notify about or kill a badly behaving job and send an informational mail to the user.
- `joblist`: Handy utility for converting job IDs to a comma-separated list when the input may be separated by spaces or spread over multiple lines in a file.
- `warn_maxjobs`: Issue warnings when the number of Slurm jobs approaches MaxJobCount.
- `schedjobs`: Stop or start job scheduling in ALL Slurm partitions.
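The scripts typically take one or more job IDs as arguments. For example (the job ID here is hypothetical, and the output format depends on your site configuration):

```
psjob 3381322      # process status on the node-list of job 3381322
showjob 3381322    # queue and accounting information for the job
```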
The `psjob` script can also monitor a job's GPU usage using the `gpustat` tool from https://github.com/wookayin/gpustat. If the job uses `gres/gpu` on nodes with GPUs, the `gpustat` tool is used to display GPU usage. If `gpustat` isn't installed, set the variable `enable_gpustat=0` in the `psjob` script.
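To illustrate what this GPU monitoring amounts to, here is a minimal sketch (not the actual `psjob` code) that runs `gpustat` over a job's node-list with ClusterShell's `clush`; it assumes `clush` is available on the management host and `gpustat` on the GPU nodes:

```
# Minimal sketch, not the actual psjob implementation.
jobid=3381322                                     # hypothetical job ID
nodelist=$(squeue --noheader -j "$jobid" -o %N)   # node-list, e.g. "d064"
clush -w "$nodelist" gpustat --no-color           # GPU usage on each node
```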
All GPU nodes should have this tool installed. On EL8 systems:
```
dnf install gcc python3 python3-pip python3-devel
python3 -m pip install setuptools-scm
python3 -m pip install gpustat
```
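After the installation, a quick sanity check on a GPU node might be (assuming a working NVIDIA driver and `nvidia-smi`):

```
gpustat --version   # verify that gpustat is installed
gpustat             # one-shot report of GPU utilization and memory
```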
Copy these scripts to /usr/local/bin/. If necessary, configure the variables in the scripts.
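For example, from a clone of this repository the installation might look like this (the checkout directory is an assumption; adjust to your site):

```
cd jobs/   # hypothetical checkout directory
install -m 0755 psjob showjob showjobreasons jobqos jobnice jobtimelimit \
    sbadjob notifybadjob joblist warn_maxjobs schedjobs /usr/local/bin/
```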
The `warn_maxjobs` and `sbadjob` scripts may be run regularly from crontab, for example:

```
5 * * * * /usr/local/bin/warn_maxjobs; /usr/local/bin/sbadjob
```
Example output from `psjob`:

```
JOBID    TASKS USER   START_TIME          TIME     TIME_LIMIT
3381322  1     user1  2021-01-19T22:42:23 10:16:01 2-00:00:00
NODELIST: d064
---------------
d064
---------------
  PID NLWP S USER     STARTED     TIME %CPU    RSS COMMAND
27673    1 S user1   22:42:27 00:00:00  0.0   1372 /bin/bash -l /var/spool/slurmd/job3381322/slurm_s
27675    3 R user1   22:42:27 10:15:35 99.9 242464 python3 /home/user1/wlda/atomic_benchma
27676    5 S user1   22:42:27 00:00:00  0.0  15132 orted --hnp --set-sid --report-uri 8 --singleton-
Total: 3 processes and 9 threads
Uptime: 08:58:25 up 8 days, 12:01, 0 users, load average: 2.01, 2.02, 2.04
```