Partition Details

What are Partitions?

Partitions are work queues, each with its own set of rules/policies and its own set of compute nodes on which jobs run. The available partitions include normal, gpu, interactive, class, backfill, epscor, and others. Run the command below to list the partitions available in Discovery.

Syntax: sinfo --all

$ sinfo --all

Output:

PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*         up 7-01:00:00      1   resv discovery-c1
normal*         up 7-01:00:00     20    mix discovery-c[2-15,26,28-31],discovery-g13
normal*         up 7-01:00:00      4   idle discovery-c27,discovery-g[1,12,16]
interactive     up 1-01:00:00      4   idle discovery-c[34-35],discovery-g[14-15]
backfill        up 14-02:00:0      1   resv discovery-c1
backfill        up 14-02:00:0     20    mix discovery-c[2-15,26,28-31],discovery-g13
backfill        up 14-02:00:0     33   idle discovery-c[16-25,27,32-38],discovery-g[1-12,14-16]
epscor          up 7-01:00:00      6   idle discovery-c[37-38],discovery-g[8-11]
iiplab          up 7-01:00:00      1   idle discovery-g7
cfdlab          up 7-01:00:00     15   idle discovery-c[16-25],discovery-g[2-6]
cfdlab-debug    up    1:00:00     15   idle discovery-c[16-25],discovery-g[2-6]
class           up 7-01:00:00      2   idle discovery-c[32-33]
cblab           up 7-01:00:00      1   idle discovery-c36
osg             up 1-01:00:00      1   resv discovery-c1
osg             up 1-01:00:00     20    mix discovery-c[2-15,26,28-31],discovery-g13
osg             up 1-01:00:00     29   idle discovery-c[16-25,27,32-33,36-38],discovery-g[1-12,16]

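The output lists all the partitions available in Discovery as of September 2020. The asterisk after normal marks it as the default partition. The STATE column reports the status of the nodes: idle means the nodes are free, alloc means the nodes are fully allocated to jobs, mix means that some CPUs on the nodes are allocated while others remain idle, and resv means the nodes are part of a reservation.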
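For scripting, it can help to reduce this listing to just the partition names. The following is a minimal sketch, assuming the default column layout shown above (header on the first line, partition name in the first column). The function name list_partitions is made up for illustration and is not part of Slurm.

```shell
# list_partitions: read sinfo-style output on stdin and print each
# partition name once, in order of first appearance. The trailing "*"
# that marks the default partition is stripped.
# (Hypothetical helper; assumes the default sinfo column layout.)
list_partitions() {
  awk 'NR > 1 && NF > 0 { sub(/\*$/, "", $1); if (!seen[$1]++) print $1 }'
}

# On the cluster this could be used as:
#   sinfo --all | list_partitions
```

Because the helper only reads standard input, it can be tried on a saved copy of the output without access to the cluster.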

Some partitions in Discovery are condo partitions, which are restricted to certain researchers and lab groups.

normal

It’s the default queue. Some important information about the normal partition can be found below.

Parameter                 Value
Maximum Walltime          7-01:00:00 (7 days and 1 hour)
Nodes                     discovery-c[1-15,26-31], discovery-g[1,12-13,16]
Total Nodes               25
Maximum Jobs (Running)    10
Maximum Submitted Jobs    20

Maximum Jobs (Running) is the highest number of jobs you can have running at the same time in a partition. Maximum Submitted Jobs is the highest number of jobs you can have submitted to a partition at once. In the normal partition, for example, you can submit 20 jobs, but at most 10 of them will run at a time; the remaining 10 wait in the queue.
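To see how close you are to these limits, you can count your running and pending jobs. The following is a minimal sketch, assuming squeue is asked to print one job state per line (-h suppresses the header and -o "%T" prints the job state). The helper name count_states is illustrative only.

```shell
# count_states: read one job state per line (e.g. RUNNING, PENDING) on
# stdin and print how many jobs are in each of those two states.
# (Hypothetical helper; other states are ignored.)
count_states() {
  awk '$1 == "RUNNING" { r++ } $1 == "PENDING" { p++ }
       END { printf "running=%d pending=%d\n", r, p }'
}

# On the cluster this could be used as:
#   squeue -h -u "$USER" -p normal -o "%T" | count_states
```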

interactive

This partition is ideal for running interactive jobs.

Parameter                 Value
Maximum wall-time         1-01:00:00 (1 day and 1 hour)
Nodes                     discovery-c[34-35], discovery-g[14-15]
Total Nodes               4
Maximum Jobs (Running)    3
Maximum Submitted Jobs    3
Maximum CPU Per Job       16
Maximum Memory per Job    64G
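An interactive session is typically started with srun. The sketch below builds such a command line and refuses requests above the per-job limits listed above (16 CPUs, 64G). The helper name interactive_cmd and its default values (1 CPU, 4 GB) are assumptions made for illustration; the srun options --partition, --cpus-per-task, --mem, and --pty are standard Slurm options.

```shell
# interactive_cmd CPUS MEM_GB: print an srun command line for an
# interactive shell on the interactive partition. Requests above the
# documented limits (16 CPUs, 64G per job) are rejected.
# (Hypothetical helper; defaults of 1 CPU and 4 GB are assumptions.)
interactive_cmd() {
  cpus="${1:-1}"
  mem_gb="${2:-4}"
  if [ "$cpus" -gt 16 ] || [ "$mem_gb" -gt 64 ]; then
    echo "request exceeds interactive limits (16 CPUs, 64G)" >&2
    return 1
  fi
  echo "srun --partition=interactive --cpus-per-task=$cpus --mem=${mem_gb}G --pty bash"
}

# Example: print the command for 4 CPUs and 16 GB of memory.
#   interactive_cmd 4 16
```

The helper only prints the command, so it can be inspected before being run on the cluster.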

backfill

This partition scavenges nodes from all partitions to use. It has the lowest priority of all the partitions. The jobs submitted to the backfill partition may be stopped and requeued multiple times depending on the demand of high priority jobs.

Parameter                 Value
Maximum wall-time         14-02:00:00 (14 days and 2 hours)
Nodes                     discovery-c[1-38], discovery-g[1-16]
Total Nodes               54
Maximum Jobs (Running)    10
Maximum Submitted Jobs    20
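Because backfill jobs may be stopped and requeued, it helps if a job can pick up where it left off. The batch script below is a sketch of one way to do that, not a site-provided template: the checkpoint file name, the step count, and the --time value are placeholders chosen for illustration. The --requeue option is a standard sbatch option; whether it is required on Discovery is not stated here.

```shell
#!/bin/bash
#SBATCH --partition=backfill
#SBATCH --requeue            # mark the job as eligible for requeuing
#SBATCH --time=02:00:00      # placeholder value, adjust to your job

# Hypothetical checkpoint file and step count, for illustration only.
CHECKPOINT="${CHECKPOINT:-progress.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Print the last completed step, or 0 if no checkpoint exists yet.
last_done() {
  if [ -f "$CHECKPOINT" ]; then cat "$CHECKPOINT"; else echo 0; fi
}

# Run the remaining steps, recording progress after each one so that a
# requeued job resumes after the last completed step.
run_steps() {
  step=$(( $(last_done) + 1 ))
  while [ "$step" -le "$TOTAL_STEPS" ]; do
    echo "working on step $step"     # placeholder for the real work
    echo "$step" > "$CHECKPOINT"
    step=$(( step + 1 ))
  done
}

# Start automatically only when running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  run_steps
fi
```

The guard on SLURM_JOB_ID, which Slurm sets inside a job, lets the functions be tried outside the scheduler without starting the work loop.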

The HPC team is exploring ways to get the most out of the backfill queue, and your suggestions are always welcome.

Condo Partitions

Some partitions in Discovery are condo partitions, which are restricted to certain teams or research groups. New partitions are still being added; the table below lists the current condo partitions.

Partition       Owned By
class           NSF CC*2020 grant (class or academic use)
cfdlab          Dr. Gross’s Lab
cblab           Dr. Brungard’s Lab
cfdlab-debug    Dr. Gross’s Lab
epscor          EPSCoR group
iiplab          Dr. Boucheron’s Lab

Details about each condo partition are as follows:

Partition Details

class

Maximum wall-time     7-01:00:00 (7 days and 1 hour)
Nodes                 discovery-c[32-33]
Total Nodes           2
Max Jobs (Running)    10
Max Submitted Jobs    10

cfdlab

Maximum wall-time     7-01:00:00 (7 days and 1 hour)
Nodes                 discovery-c[16-25], discovery-g[2-6]
Total Nodes           15
Max Jobs (Running)    No Limit
Max Submitted Jobs    No Limit

cblab

Maximum wall-time     7-01:00:00 (7 days and 1 hour)
Nodes                 discovery-c36
Total Nodes           1
Max Jobs (Running)    No Limit
Max Submitted Jobs    No Limit

The cblab partition (discovery-c36) won’t be available in the Discovery cluster until May 2022.

cfdlab-debug

Maximum wall-time     0-01:00:00 (1 hour)
Nodes                 discovery-c[16-25], discovery-g[2-6]
Total Nodes           15
Max Jobs (Running)    No Limit
Max Submitted Jobs    No Limit

epscor

Maximum wall-time     7-01:00:00 (7 days and 1 hour)
Nodes                 discovery-g[8-11], discovery-c[37-38]
Total Nodes           6
Max Jobs (Running)    10
Max Submitted Jobs    20

iiplab

Maximum wall-time     7-01:00:00 (7 days and 1 hour)
Nodes                 discovery-g7
Total Nodes           1
Max Jobs (Running)    No Limit
Max Submitted Jobs    No Limit

New partitions were added in the 2019 update of the cluster. Most of these are restricted-use partitions, and users will only see the partitions that they have access to.

Wall time vs. CPU time

CPU time is not the same as wall time (time as measured by a clock on the wall). CPU time is the total time the CPU spent executing your process. The CPU services many processes every second, not just yours, so your process runs in short time slices between other work. Each of those slices counts toward your CPU time. The gaps between them, while the CPU is processing someone else’s request, are NOT counted toward your CPU time.

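For example, for a single-threaded process, wall time ("real" time) is roughly the sum of user CPU time ("user" time), system CPU time (time spent in kernel mode), and the time the process spent waiting while the CPU was time-sliced to other processes.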
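For a finished job, Slurm's accounting tool sacct reports wall time in the Elapsed field and consumed CPU time in the TotalCPU field. The sketch below compares the two as a rough CPU efficiency figure. The helper names to_seconds and cpu_efficiency are made up for illustration, and the code assumes durations in [D-]HH:MM:SS form without fractional seconds.

```shell
# to_seconds: convert a Slurm-style duration ([D-]HH:MM:SS) to seconds.
# (Hypothetical helper; fractional seconds are not handled.)
to_seconds() {
  echo "$1" | awk -F'[-:]' '
    NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }
    NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

# cpu_efficiency ELAPSED TOTALCPU NCPUS: CPU time as a whole-number
# percentage of the wall time available across all allocated CPUs.
cpu_efficiency() {
  wall=$(to_seconds "$1")
  cpu=$(to_seconds "$2")
  echo $(( 100 * cpu / (wall * $3) ))
}

# On the cluster, the inputs could come from (job ID is a placeholder):
#   sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS
```

As a usage example, a job that ran for one hour of wall time on 4 CPUs and accumulated two hours of CPU time would give cpu_efficiency 01:00:00 02:00:00 4 a result of 50, meaning the CPUs were busy about half the time.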

Partition      Wall Time    # of Nodes    Memory (GB)
Normal         7d 1h        15            64/128
GPU            7d 1h        1             64
backfill       14d 2h       38            64/128/192/256
interactive    24 hours     4             384/512