Discovery details

Discovery has 25 compute, 11 GPU and 2 high memory nodes:

discovery-h1 — The head node.

discovery-s1 — The storage node.

discovery-l1 — The login node, where you land when you log in. This is *not* a place to run any computation. You may install software from here, but do not run programs on it, because a poorly behaved program can harm the node.

discovery-g1 — Discovery’s GPU node that doubles as a computational node. This node has 2 Nvidia Tesla K40 GPUs, 64GB of RAM, and two Intel E5-2640 v3 CPUs with 8 cores each, for 16 cores (or 32 threads) on this node.

discovery-g[2-6] — Discovery’s Durip GPU nodes. Each node has 2 Nvidia Tesla P100 GPUs, 192GB of RAM, and two Intel Xeon Gold 5117 2.0GHz CPUs with 14 cores each, for 28 cores (or 56 threads) per node.

discovery-g7 — Discovery’s IIPLAB GPU node. This node has 2 Nvidia Tesla P100 GPUs, 192GB of RAM, and two Intel Xeon Gold 5117 2.0GHz CPUs with 14 cores each, for 28 cores (or 56 threads) on this node.

discovery-g[8-11] — Discovery’s EPSCoR GPU nodes. Each node has 2 Nvidia Tesla V100 GPUs, 192GB of RAM, and two Intel Xeon Gold 5218 CPUs with 16 cores each, for 32 cores (or 64 threads) per node.

discovery-hm1 — Discovery’s EPSCoR high memory node. It has 3TB of RAM and two Intel Xeon Gold 5218 CPUs with 16 cores each, for 32 cores (or 64 threads) on this node.

discovery-hhm1 — Discovery’s EPSCoR hybrid high memory node. It has 3TB of RAM and two Intel Xeon Gold 5218 CPUs with 16 cores each, for 32 cores (or 64 threads) on this node.

discovery-c[1-6] — Discovery’s “old” nodes with 64GB of RAM and two Intel E5-2640 v3 CPUs with 8 cores each. This means there are 16 cores (or 32 threads) on each node.

discovery-c[7-13] — Discovery’s “new” nodes with 128GB of RAM and two Intel E5-2650 v4 CPUs with 12 cores each. This means there are 24 cores (or 48 threads) on each node.

discovery-c[14-15] — Discovery’s “new” nodes with 256GB of RAM and two Intel E5-2650 v4 CPUs with 12 cores each. This means there are 24 cores (or 48 threads) on each node.

discovery-c[16-25] — Discovery’s Durip nodes with 192GB of RAM and two Intel Xeon Gold 5117 2.0GHz CPUs with 14 cores each. This means there are 28 cores (or 56 threads) on each node.
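The bracket notation in the list above (for example discovery-c[1-6]) stands for a range of host names. As a rough sketch, the following Python expands those ranges and tallies nodes and cores per node family. The function names are made up for illustration, and the parser handles only the single-range form used on this page, not the full Slurm hostlist syntax.

```python
import re

# Node groups taken from the node list above: (host pattern, cores per node).
NODE_GROUPS = [
    ("discovery-c[1-6]", 16),
    ("discovery-c[7-13]", 24),
    ("discovery-c[14-15]", 24),
    ("discovery-c[16-25]", 28),
    ("discovery-g1", 16),
    ("discovery-g[2-6]", 28),
    ("discovery-g7", 28),
    ("discovery-g[8-11]", 32),
    ("discovery-hm1", 32),
    ("discovery-hhm1", 32),
]


def expand_hostlist(pattern):
    """Expand 'name[a-b]' or 'name[a]' into a list of host names.

    Hypothetical helper: supports one numeric range only
    (no comma lists, no zero padding).
    """
    match = re.fullmatch(r"(.+)\[(\d+)(?:-(\d+))?\]", pattern)
    if match is None:
        return [pattern]  # plain host name such as 'discovery-g1'
    prefix = match.group(1)
    first = int(match.group(2))
    last = int(match.group(3)) if match.group(3) is not None else first
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def count_nodes_and_cores(groups, prefix):
    """Return (node count, total physical cores) for hosts starting with prefix."""
    nodes = 0
    cores = 0
    for pattern, cores_per_node in groups:
        hosts = [h for h in expand_hostlist(pattern) if h.startswith(prefix)]
        nodes += len(hosts)
        cores += len(hosts) * cores_per_node
    return nodes, cores


if __name__ == "__main__":
    for label, prefix in [("compute", "discovery-c"),
                          ("GPU", "discovery-g"),
                          ("high memory", "discovery-h")]:
        nodes, cores = count_nodes_and_cores(NODE_GROUPS, prefix)
        print(f"{label}: {nodes} nodes, {cores} cores")
```

The node counts it produces (25 compute, 11 GPU, 2 high memory) agree with the summary at the top of this section.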

 

Discovery has 9 Queues/Partitions:

Four partitions are usable by everyone, while five (listed last) are reserved.

normal — The default queue. It has a maximum wall-time of 7 days 1 hour (--time 7-01:00:00). This queue contains nodes discovery-c[1-15]. To make sure you land on a particular node type (“old” or “new”), please learn how to use Slurm.

gpu — The queue that ensures your job lands on a node with a GPU. It has a maximum wall-time of 7 days 1 hour (--time 7-01:00:00) and currently contains only node discovery-g1.

debug — The queue to be used to debug your code. This was created so that you don’t have to wait in line (in the normal queue) for hours or days to debug your code. It has a maximum wall-time of 1 hour (--time 0-01:00:00) and contains both the discovery-g1 and discovery-c[1-15] nodes.

backfill — This queue scavenges nodes from all partitions (discovery-c[1-25], discovery-g[1-11], discovery-hm1, and discovery-hhm1). It has the lowest priority, so its jobs may be paused multiple times (or indefinitely) depending on the demand from higher-priority jobs. It has a maximum wall-time of 14 days 2 hours (--time 14-02:00:00).

osg — As a part of Open Science Grid we contribute our CPU hours when they are not in use. This partition is usable only by OSG.

cfdlab — This partition is for the discovery-c[16-25] and discovery-g[2-6] nodes, and has a maximum wall-time of 7 days 1 hour (--time 7-01:00:00). It is a condo partition and is restricted to Dr. Gross’s lab.

cfdlab-debug — This partition is for the discovery-g[2-6] nodes and has a maximum wall-time of 1 hour (--time 0-01:00:00). It is a condo partition and is restricted to Dr. Gross’s lab.

iiplab — This partition is for the discovery-g7 node and has a maximum wall-time of 7 days 1 hour (--time 7-01:00:00). It is a condo partition and is restricted to Dr. Boucheron’s lab.

epscor — This partition is for the discovery-g[8-11], discovery-hm1, and discovery-hhm1 nodes, and has a maximum wall-time of 7 days 1 hour (--time 7-01:00:00). It is a condo partition and is restricted to the EPSCoR group.
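The wall-time limits above use Slurm's D-HH:MM:SS form. As an illustration only, the sketch below converts such a string to seconds, checks a requested time against a partition's limit, and renders the matching header lines of a batch script. The --job-name, --partition, and --time options are standard sbatch options; the function names, the job name, and the choice to cover only three public partitions are assumptions made for this example. Slurm accepts other time formats that this parser does not handle.

```python
import re

# Limits for three of the public partitions, as listed above.
# Other partitions can be added in the same way.
PARTITION_LIMITS = {
    "normal": "7-01:00:00",
    "gpu": "7-01:00:00",
    "debug": "0-01:00:00",
}


def parse_walltime(text):
    """Convert a 'D-HH:MM:SS' string into a number of seconds."""
    match = re.fullmatch(r"(\d+)-(\d{1,2}):(\d{2}):(\d{2})", text)
    if match is None:
        raise ValueError(f"expected D-HH:MM:SS, got {text!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def sbatch_header(partition, walltime, job_name="example"):
    """Return the first lines of a batch script, or raise if over the limit.

    Hypothetical helper: 'job_name' is a placeholder value.
    """
    limit = PARTITION_LIMITS[partition]
    if parse_walltime(walltime) > parse_walltime(limit):
        raise ValueError(
            f"{walltime} exceeds the {partition} limit of {limit}")
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={walltime}",
    ])


if __name__ == "__main__":
    # A 30-minute request fits within the 1-hour debug limit.
    print(sbatch_header("debug", "0-00:30:00"))
```

Checking the request before submitting avoids a job that the scheduler would reject or leave pending because it asks for more time than the partition allows.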

Note: You can submit as many jobs as you like, but only 10 will run at a time. To make better use of resources, please consult “Example 4: How to run programs in parallel” to group several analyses into 1 job.
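One way to stay within the 10-job limit is to decide up front which analyses share a job. The sketch below only does that grouping: it spreads a list of commands over at most 10 groups of near-equal size. The function name and the sample commands are hypothetical, and how the commands inside one group are actually run in parallel is left to the example referenced in the note.

```python
MAX_RUNNING_JOBS = 10  # running-job limit stated in the note above


def group_into_jobs(tasks, max_jobs=MAX_RUNNING_JOBS):
    """Spread tasks round-robin over at most max_jobs groups."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    n_jobs = min(max_jobs, len(tasks))
    groups = [[] for _ in range(n_jobs)]
    for index, task in enumerate(tasks):
        groups[index % n_jobs].append(task)
    return groups


if __name__ == "__main__":
    # Placeholder commands; replace with your own analyses.
    analyses = [f"./run_analysis.sh sample_{i:02d}" for i in range(25)]
    for job_number, commands in enumerate(group_into_jobs(analyses), start=1):
        print(f"job {job_number}: {len(commands)} commands")
```

With 25 analyses this yields 10 groups of 2 or 3 commands each, so every group can run as one job at the same time.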

 

Discovery runs CentOS 7:

Of the several flavors of Linux/Unix available, Discovery uses CentOS 7 as its operating system. Knowing the OS may be important when installing software. It also means that generic Linux/Unix commands and programs can be used on Discovery. Please use tab completion whenever possible; it helps with both paths and file names.
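When an installer asks which distribution it is running on, the answer can usually be read from /etc/os-release. The following is a minimal sketch of parsing that file's KEY=value format. The sample text is illustrative, written to resemble a CentOS 7 system rather than captured from Discovery, and the function name is made up for this example.

```python
import os

# Illustrative sample, not copied from Discovery itself.
SAMPLE_OS_RELEASE = '''NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
VERSION_ID="7"
'''


def parse_os_release(text):
    """Parse KEY=value lines into a dict, dropping surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path) as handle:
            info = parse_os_release(handle.read())
    else:
        info = parse_os_release(SAMPLE_OS_RELEASE)
    print(info.get("NAME"), info.get("VERSION_ID"))
```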