2. You can run many independent programs at the same time. This might be the same program on lots of different data sets, or multiple different programs (or even both).
3. You can sometimes get a single program to run faster. However, some programs don't scale well with additional CPU cores, so this isn't guaranteed.
In a nutshell, Slurm runs jobs in a queue. A job is a single piece of work you ask Slurm to run on your behalf. Everyone's jobs go into a queue and wait to run, and Slurm decides each job's place in the queue based on the available resources and a fair-share approach. When a job reaches the top of the queue, it runs on one of many different servers, called nodes. You get exactly the resources on the node that you asked for (no more, no less), and Slurm runs your job as if it were you, in the directory you submitted the job from. Since there are lots of nodes, lots of jobs can run at the same time, and when a job finishes, the next job is picked from the queue to run.
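For example, here is a minimal sketch of a job (the script is purely illustrative): it is just a shell script with a few resource requests at the top, and submitting it is covered below.

#!/bin/sh
#SBATCH --time=0-00:05:00   # five minutes is plenty for this example
#SBATCH --ntasks=1
#SBATCH --mem=1G

# Shows that the job runs as you, in the directory you submitted from
echo "Running on $(hostname) as $(whoami) in $(pwd)"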
Priority within the queue is assigned on a fair-share basis:

- If you have jobs running, your first pending job will have a priority equal to the highest possible priority minus the number of jobs you currently have running. Subsequent jobs in the queue will each have their priority reduced by one.
The overall outcome of this is that those who have fewer jobs running will have their priority boosted over those who have more. There are, however, two important caveats:
- We can only offer a best-effort service when scheduling jobs, and cannot guarantee how long a job will take to start.
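If you are curious about the numbers themselves, Slurm's standard sprio command reports the priority assigned to pending jobs (exactly which factors it shows depends on how the site has configured its priority plugin):

$ sprio -u $USER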
You can see the state of the whole queue with the squeue command:

$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
2339119     batch  SNV_pop adibabdu PD       0:00      1 (Resources)
2335042     batch bamcover terenova PD       0:00      1 (Priority)
2330696     batch star_ali herrmann  R   17:37:04      1 cbrgwn015p
2286818     batch average_  hyunlee  R 3-22:54:02      1 cbrgwn004p
2308243     batch E14_20_m oudelaar  R 2-19:00:14      1 cbrgwn019p
2308244     batch E14_21_m oudelaar  R 2-18:54:11      1 cbrgwn015p
2308246     batch E14_23_m oudelaar  R 2-17:59:43      1 cbrgwn016p
2332744     batch ENCLB555  shusain  R   10:56:05      1 cbrgwn017p
2332745     batch ENCLB555  shusain  R   10:54:33      1 cbrgwn006p
2332743     batch ENCLB555  shusain  R   11:00:26      1 cbrgwn002p
Interesting job states are R for running and PD for pending. The last column shows either the node the job is running on, or the reason it isn't running yet. In this example, Resources and Priority are both fine - the job is simply waiting for free space to run, or waiting behind a higher-priority job.
To see just your own jobs (and not the whole queue), you can use the command:
$ squeue --me
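If you have pending jobs, squeue can also print Slurm's current estimate of when they will start. Bear in mind that this is only an estimate, and it changes as the queue changes:

$ squeue --me --start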
As a worked example, suppose we have a STAR alignment to run, and we want to deliver it to: 4 CPUs and 10GB of memory, for 2 hours.
For which the job script might look like this:
#!/bin/sh
# Format of --time is DAYS-HOURS:MINUTES:SECONDS
#SBATCH --time=0-02:00:00
#SBATCH --ntasks=4
#SBATCH --mem=10G

module load rna-star

STAR --genomeDir /databank/igenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/STAR/ \
     --outSAMtype BAM SortedByCoordinate \
     --readFilesIn C1_R1_1.fq.gz C1_R1_2.fq.gz \
     --readFilesCommand zcat \
     --outSAMattributes All \
     --outFileNamePrefix C1_R1_ \
     --limitBAMsortRAM 7000000000 \
     --runThreadN 4
Here, we have 3 lines which tell Slurm how much time we think the job will need, how many CPU cores we want, and how much memory we want. After that, it's just the list of commands we want to run when the job starts. Note that the backslash (\) character is Bash's line-continuation character; it is used for legibility, so that long lines do not run off the edge of the screen.
For more information about estimating the amount of time, CPU and memory your job will need, please refer to profiling(7).
$ sbatch ./jobscript.sh
but you can also specify a partition (queue), the number of tasks and the amount of memory - if you have not already done so inside the script itself - like so:
$ sbatch -p batch --ntasks=1 --mem=10G ./jobscript.sh
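Either way, if the submission is accepted, sbatch replies with the ID of the new job (the number below is illustrative), and this ID is what you use with the other Slurm commands:

$ sbatch ./jobscript.sh
Submitted batch job 2339120

By default, anything the job prints ends up in a file named slurm-<jobid>.out in the directory you submitted from; this can be changed with sbatch's --output option.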
The standard nodes have ~240GB of memory and 24 cores, so asking for 120GB of memory means that at most two such jobs can run on a node at a time.
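If your memory requirement scales with the number of cores rather than being fixed, you can instead request memory per CPU using sbatch's standard --mem-per-cpu option (the figures below are illustrative):

#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=5G

This requests 20GB in total (4 x 5GB) and makes it easier for the scheduler to pack jobs onto nodes.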
If you'd like to submit a job on the GPU nodes you need to specify two things - the gpu partition and the number of GPUs you need using the --gpus argument. For example:
#SBATCH -p gpu
#SBATCH --gpus=2
in your job script would launch a job with access to 2 GPUs. Please note that most of the GPU nodes have only 2 GPUs, the largest has 4, and the GPUs can be quite heavily used. As such, it's best to ask for only 1 or 2 GPUs if you need your job to run in the foreseeable future.
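Putting this together, a minimal GPU job script might look like the sketch below; the nvidia-smi call is a stand-in for whatever GPU software you actually run:

#!/bin/sh
#SBATCH -p gpu
#SBATCH --gpus=1
#SBATCH --time=0-04:00:00
#SBATCH --mem=20G

# nvidia-smi (present wherever the NVIDIA drivers are installed)
# lists the GPUs that Slurm has made visible to this job
nvidia-smi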
To cancel a job, whether pending or running, pass its job ID (the first column of squeue's output) to scancel:

$ scancel 342552
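scancel also accepts a username via its standard -u option, so you can cancel all of your own jobs at once; use this with care:

$ scancel -u $USER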