2. You can run many independent programs at the same time. This might be the same program on lots of different data sets, multiple different programs, or even both.
3. You can sometimes get a single program to run faster. However, some programs don't scale well with additional CPU cores, so this is not guaranteed.
In a nutshell, Slurm runs jobs from a queue. A job is a single piece of work you ask Slurm to run on your behalf. Everyone's jobs go into a queue and wait to run, and Slurm decides where each job sits in the queue based on the available resources and a fair-share approach. When a job gets to the top of the queue it runs on any one of many different servers, called nodes. You get the resources on the node you asked for (no more, no less), and the job runs as if it were you, in the path you submitted it from. Since there are lots of nodes, lots of jobs can run at the same time, and when a job finishes, a new job is picked from the queue to run.
- test: up to 8 cores and 50GB memory, default time and max time 10 minutes
- short: up to 120 cores and 1850GB memory, default time 1 hour, max time 1 day
- long: up to 128 cores and 1900GB memory, default time 1 hour, max time 1 week
- gpu: one node with 4x 24GB GPUs (96GB total), default time 1 hour, max time 1 day
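The exact partition names and limits may change over time; a quick way to see what is currently available, including each partition's time limit and node states, is the sinfo command:

$ sinfo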
If you have a job which needs more than the maximum allowed time, please contact us directly to discuss your requirements so that we can balance your request against the needs of other users.
- If you have jobs running, your first pending job will have a priority equal to the highest possible priority minus the number of jobs you currently have running. Subsequent jobs in the queue will each have their priority reduced by one.
The overall outcome of this is that those who have fewer jobs running will have their priority boosted over those who have more. There are, however, two important caveats:
- We can only offer best-effort scheduling, and cannot guarantee how long jobs will take to start
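If you are curious how the fair-share calculation is affecting your own jobs, Slurm can report the priority it has currently assigned to each pending job. Assuming the sprio command is available on this cluster, you can run:

$ sprio --long

Pending jobs with higher priority values will generally be started first.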
You can see the current state of the whole queue with the squeue command:

$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
2339119     batch  SNV_pop adibabdu PD       0:00      1 (Resources)
2335042     batch bamcover terenova PD       0:00      1 (Priority)
2330696     batch star_ali herrmann  R   17:37:04      1 imm-wn5
2286818     batch average_  hyunlee  R 3-22:54:02      1 imm-wn4
2308243     batch E14_20_m oudelaar  R 2-19:00:14      1 imm-wn1
2308244     batch E14_21_m oudelaar  R 2-18:54:11      1 imm-wn5
2308246     batch E14_23_m oudelaar  R 2-17:59:43      1 imm-wn6
2332744     batch ENCLB555  shusain  R   10:56:05      1 imm-wn7
2332745     batch ENCLB555  shusain  R   10:54:33      1 imm-wn6
2332743     batch ENCLB555  shusain  R   11:00:26      1 imm-wn2
Interesting job states are R for running and PD for pending. The last column shows either the node the job is running on, or the reason it's not running (yet). In this example, Resources and Priority are OK - the job is just waiting for either free space to run, or waiting behind a higher priority job.
To see just your own jobs (and not the whole queue), you can use the command:
$ squeue --me
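If you want more detail about one particular job than squeue shows - for example, exactly what resources it requested and why it is still pending - scontrol will print the full job record. The job ID below is just a placeholder:

$ scontrol show job 2339119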
As an example, suppose we want to run a STAR alignment and deliver it to 4 CPUs and 10GB of memory for 2 hours.
For which the job script might look like this:
#!/bin/sh

#Format of --time is DAYS-HOURS:MINUTES:SECONDS
#SBATCH --time=0-02:00:00
#SBATCH --ntasks=4
#SBATCH --mem=10G
#SBATCH --partition=short

module load rna-star

STAR --genomeDir /databank/igenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/STAR/ \
     --outSAMtype BAM SortedByCoordinate \
     --readFilesIn C1_R1_1.fq.gz C1_R1_2.fq.gz \
     --readFilesCommand zcat \
     --outSAMattributes All \
     --outFileNamePrefix C1_R1_ \
     --limitBAMsortRAM 7000000000 \
     --runThreadN 4
Here, we have four #SBATCH lines which tell Slurm how much time we think the job will need, how many CPU cores we want, how much memory we want, and which queue (partition) it should run on. If you leave any of these out, you'll get the defaults: 1 hour, 1 CPU, 15GB of memory and the short queue, respectively.
After that, it's just the list of commands we want to run when the job starts. Note that the backslash (\) character is Bash's line continuation character and is used here for legibility, so that long lines do not run off the edge of the screen.
For more information about estimating the amount of time, CPU and memory your job will need, please refer to profiling(7).
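As a rough starting point, and assuming job accounting is enabled on this cluster, you can also ask Slurm what a finished job actually used and compare that against what you requested; the job ID below is a placeholder:

$ sacct -j 2339119 --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem,State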
The job script is submitted to the queue with the sbatch command:

$ sbatch ./jobscript.sh
but you can also specify a partition (queue), number of tasks (CPU cores) and amount of memory - if you have not already done so inside the script itself - like so:
$ sbatch --partition=short --ntasks=1 --mem=10G ./jobscript.sh
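If you want to run the same program on lots of different data sets, as mentioned at the start, one common approach is a job array: you submit a single script and Slurm runs it many times, each run with a different value in the SLURM_ARRAY_TASK_ID environment variable. The sketch below is only an illustration, and the file naming scheme is hypothetical:

#!/bin/sh
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --array=1-10

# Each of the 10 array tasks handles one (hypothetical) input file,
# e.g. sample_1.fq.gz ... sample_10.fq.gz
echo "Would process sample_${SLURM_ARRAY_TASK_ID}.fq.gz"

Submitting this once with sbatch queues ten independent jobs, each subject to the scheduling rules described above.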
The standard nodes have 128 cores and 2TB memory (though you can only request up to ~1950GB for a single job).
$ srun --partition=short --cpus-per-task=4 --mem=32G --pty bash -i
In doing so, please be aware that you are taking up space on the cluster which is wasted if you leave it unused. As such, we ask that you only request what you will actually use, and note that interactive CPU jobs are only permitted on the 'test' and 'short' queues. Sessions left idle for extended periods will additionally be terminated without notice.
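One way to avoid tying up resources by accident is to give the interactive session a time limit so that it ends automatically; the values below are only illustrative:

$ srun --partition=test --cpus-per-task=1 --mem=4G --time=00:30:00 --pty bash -i

Typing exit (or pressing Ctrl-D) ends the session and releases the resources immediately.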
Batch job example:
#SBATCH --partition=gpu
#SBATCH --gpus=2
...
Interactive job example:
$ srun --partition=gpu --gpus=2 ...
You can confirm that the GPUs have been allocated by running nvidia-smi from within the GPU node. It will list the GPUs you can access and their utilisation - in the example below, 2 GPUs were requested.
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA TITAN RTX               Off | 00000000:1E:00.0 Off |                  N/A |
| 35%   44C    P0              66W / 280W |      0MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN RTX               Off | 00000000:1F:00.0 Off |                  N/A |
| 21%   42C    P0              39W / 280W |      0MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Please note that there are only 4 GPUs, so please be considerate in how many you request; typically the number you need is determined by how much total GPU memory you require. Please also confirm, with nvidia-smi, that your code is actually running on the GPU rather than the CPU.
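One simple way to check this is to keep nvidia-smi refreshing while your job is working and watch the GPU-Util column and the process list; for example:

$ watch -n 5 nvidia-smi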
The CUDA 12.2 driver is preinstalled. To get a CUDA compiler, simply load one of our CUDA modules:
$ module load cuda
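Once a CUDA module is loaded, the nvcc compiler should be on your PATH; a quick way to confirm this, and to see which toolkit version the module provides, is:

$ nvcc --version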
To cancel a job (whether it is running or still queued), pass its job ID, as shown by squeue, to scancel:

$ scancel 342552
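You can also cancel all of your own jobs in one go by giving scancel your username rather than a job ID:

$ scancel --user=$USER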