SLURM How-To#

Upstream documentation links#

Slurm installed version: 22.05.9

Suggested usage#

Every sbatch script should have the following structure:

Head of the script#

The head contains the SLURM directives; these are identified by the #SBATCH label at the beginning of the line.

#!/bin/bash
#SBATCH --job-name=MEANINGFUL_NAME       # Common name for the running jobs; can be customized with https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E
#SBATCH --mail-type=FAIL                   # Type of email notification: BEGIN,END,FAIL,ALL
#SBATCH --mail-user=EMAIL_ADDRESS          # N.B. if 1000 UNTESTED jobs are started and all of them fail, one will receive 1000 mails!

# other tags are possible, see https://slurm.schedmd.com/sbatch.html#SECTION_FILENAME-PATTERN
#SBATCH --output=%x_%j.out                 # STDOUT file with format: [JOB_NAME]_[JOB_ID].out
#SBATCH --error=%x_%j.err                  # STDERR file with format: [JOB_NAME]_[JOB_ID].err

#SBATCH --ntasks=1                         # We are using only 1 task per job
#SBATCH --cpus-per-task=1                  # change to 2 if needed; ensuing job steps will require this number of processors per task

## see https://slurm.schedmd.com/sbatch.html#OPT_hint
#SBATCH --hint=nomultithread               # [don't] use extra threads with in-core multi-threading;

## Possibility to specify nodes
## #SBATCH --nodelist=issaf-0-2.issaf,issaf-0-3.issaf,issaf-0-4.issaf,issaf-0-5.issaf    # select a given list of nodes for allocation
## #SBATCH --exclude=issaf-0-0.issaf,issaf-0-1.issaf                                     # exclude these nodes from allocation
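
Since directives are recognised only when #SBATCH starts the line, the doubly commented `## #SBATCH` lines above are inactive. A minimal sketch for checking which directives of a script are active follows; `list_sbatch_directives` and the throwaway demo script are hypothetical names used only for illustration, not part of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of a job script that start with #SBATCH.
list_sbatch_directives() {
    grep -E '^#SBATCH' "$1"
}

# Self-contained demo on a throwaway script (names and values are placeholders).
demo_script=$(mktemp)
cat > "${demo_script}" <<'EOF'
#!/bin/bash
#SBATCH --job-name=MEANINGFUL_NAME
#SBATCH --ntasks=1
## #SBATCH --exclude=issaf-0-0.issaf,issaf-0-1.issaf
echo "work"
EOF

# Only the two lines that begin with #SBATCH are listed; the "## #SBATCH" line is skipped.
active_directives=$(list_sbatch_directives "${demo_script}")
echo "${active_directives}"
rm -f "${demo_script}"
```

Note that sbatch additionally stops reading directives at the first non-comment, non-whitespace line of the script, which this simple grep does not model.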

Optional mechanics to be added before starting work#

# Define and create a unique scratch directory for this job
# in the tail of the script we will copy back all content to SLURM_SUBMIT_DIR
export SCRATCH_DIRECTORY="/scratch/workdir_${USER}/${SLURM_JOB_ID}"
mkdir -p "${SCRATCH_DIRECTORY}"
cd "${SCRATCH_DIRECTORY}" || exit 1       # stop here if the scratch directory is not usable

# You can copy everything you need to the scratch directory
# ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
rsync -azWHAXS4 "${SLURM_SUBMIT_DIR}/" "${SCRATCH_DIRECTORY}/"

# This is where the actual work is done.

# Save job info before starting the actual executable - useful for debugging
scontrol show jobid -ddd  ${SLURM_JOB_ID} | sed 's/^[ \t]*//g' > job_info_${SLURM_JOB_ID}.txt
echo -e "\n##########\n" >> job_info_${SLURM_JOB_ID}.txt
env                      >> job_info_${SLURM_JOB_ID}.txt
echo -e "\nStarting job @ $(date +%Y%m%d_%H%M%S)"
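
The scratch setup above relies on variables that exist only inside a running job. To dry-run that part of a script on a login node, one option is to give the variables fallback values, as sketched below. `SCRATCH_ROOT` and the `dryrun_` prefix are assumptions made for this example, and the fallback places the directory under the temporary directory instead of /scratch.

```shell
#!/bin/bash
# Outside Slurm, SLURM_JOB_ID is unset, so fall back to a stand-in value.
SLURM_JOB_ID="${SLURM_JOB_ID:-dryrun_$$}"

# SCRATCH_ROOT is a hypothetical override used only for the dry run;
# the job script itself uses /scratch/workdir_${USER}.
SCRATCH_ROOT="${SCRATCH_ROOT:-${TMPDIR:-/tmp}/workdir_${USER:-$(id -un)}}"

export SCRATCH_DIRECTORY="${SCRATCH_ROOT}/${SLURM_JOB_ID}"
mkdir -p "${SCRATCH_DIRECTORY}" || exit 1
echo "scratch directory: ${SCRATCH_DIRECTORY}"
```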

Body of the script#

This part contains everything related to the work to be done.
It is very useful to have a stand-alone analysis script that can be checked and tested
on its own (it should contain everything related to the job to be done).
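
One way to keep the analysis testable on its own is to pass it everything it needs as arguments and to give SLURM-provided variables a default. The sketch below assumes a hypothetical `run_analysis` function standing in for the real analysis script; the line counting is placeholder work.

```shell
#!/bin/bash
# run_analysis is a hypothetical stand-in for the real stand-alone analysis script.
# It receives its input file and thread count as arguments, so it runs without Slurm.
run_analysis() {
    local input_file="$1"
    local n_threads="$2"
    local n_lines
    # Placeholder work: count the lines of the input file.
    n_lines=$(grep -c '' "${input_file}")
    echo "processed ${n_lines} lines with ${n_threads} thread(s)"
}

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# outside a job we fall back to 1.
n_threads="${SLURM_CPUS_PER_TASK:-1}"

# Small throwaway input so the sketch can be run anywhere.
demo_input=$(mktemp)
printf 'a\nb\nc\n' > "${demo_input}"
analysis_output=$(run_analysis "${demo_input}" "${n_threads}")
echo "${analysis_output}"
rm -f "${demo_input}"
```

In the sbatch script the body then shrinks to a single call of the analysis script, and the same call can be tried interactively on a small input before submitting many jobs.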

Tail of the script#

Optional part if a work-directory was created#

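# After the job is done we copy our output back to $SLURM_SUBMIT_DIR
# (trailing slash instead of /* so that hidden files are copied as well)
rsync -azWHAXS4 "${SCRATCH_DIRECTORY}/" "${SLURM_SUBMIT_DIR}/"

# After everything is saved to the submit directory, delete the work directory to save space on /scratch
# (":?" aborts instead of expanding to an empty path if the variable is unset)
cd "${SLURM_SUBMIT_DIR}" && rm -rf "${SCRATCH_DIRECTORY:?}"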
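
The copy-back and cleanup above run only if the script reaches its last lines. A possible variation, not part of the original recipe, is to register them in a bash EXIT trap so that they also run when the work exits with an error. The sketch below uses throwaway directories in place of ${SLURM_SUBMIT_DIR} and ${SCRATCH_DIRECTORY}, a hypothetical `stage_out` helper, and `cp -a` as a stand-in for rsync so that it runs anywhere.

```shell
#!/bin/bash
# Throwaway stand-ins for ${SLURM_SUBMIT_DIR} and ${SCRATCH_DIRECTORY}.
demo_submit_dir=$(mktemp -d)
demo_scratch_dir=$(mktemp -d)

# Hypothetical helper: copy results back, then delete the scratch directory.
stage_out() {
    # cp -a stands in for the rsync call; the trailing "/." includes hidden files.
    cp -a "${demo_scratch_dir}/." "${demo_submit_dir}/" \
        && cd "${demo_submit_dir}" \
        && rm -rf "${demo_scratch_dir:?}"
}

# The subshell plays the role of the job script: the EXIT trap fires even
# though the "work" ends with a non-zero exit status.
(
    trap stage_out EXIT
    cd "${demo_scratch_dir}" || exit 1
    echo "partial result" > result.txt
    exit 3
)
job_status=$?
echo "job exit status: ${job_status}"
ls "${demo_submit_dir}"
```

The trap does not help if the job is killed outright, so it complements testing the script rather than replacing it.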

Add this at the end of the script; it can be used to infer the successful finish of the job.

echo "end of ${SLURM_JOB_NAME}"
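
With this marker in place, jobs that did not finish can be found by searching the STDOUT files for it. A minimal sketch, assuming the `%x_%j.out` naming from the head of the script; `list_unfinished` and the demo files are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the STDOUT files in a directory that lack the end marker.
list_unfinished() {
    # grep -L prints the names of files that contain no matching line.
    grep -L '^end of ' "$1"/*.out
}

# Demo with two fake STDOUT files: job 101 finished, job 102 did not.
demo_dir=$(mktemp -d)
printf 'Starting job\nend of MEANINGFUL_NAME\n' > "${demo_dir}/MEANINGFUL_NAME_101.out"
printf 'Starting job\n' > "${demo_dir}/MEANINGFUL_NAME_102.out"

unfinished=$(list_unfinished "${demo_dir}")
echo "${unfinished}"
rm -rf "${demo_dir:?}"
```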