Why invest in the Discovery condominium model?
The Discovery cluster is managed by Information and Communication Technologies (ICT) in a manner similar to many other university high performance computing (HPC) cluster condos. Investors (individual researchers or groups of researchers) can “purchase” compute nodes and storage from the approved vendors through ICT. The nodes are installed and administered by the Discovery system administrators.
Investors have on-demand access to their purchase and compete only with members of their own group for the resource. This is made possible by individualized queues linked to the purchased resources; only individuals approved by the purchaser can access the queue. Participation in Discovery is at the investor's discretion, and those who wish to remove their equipment from the system are free to do so. NMSU and ICT will provide system administrators for the equipment, a secure, climate-controlled location, and nightly backups of home directories. Investors also benefit from being able to burst out into the rest of the Discovery system, gaining easy access to additional resources as needed.
When investor-purchased resources are not in use, they are made available to the community as a whole. This type of arrangement is often referred to as a backfill queue. However, this general access to investor-owned resources can be revoked immediately when the investors wish to use their resources. Any community job interrupted this way is automatically re-queued to ease the burden on users. ***Investors will also be given priority on this backfill queue.***
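The reclaim-and-re-queue behavior described above can be pictured with a toy model. The sketch below is illustrative only: it does not represent Discovery's actual scheduler or its configuration, and the names `Job`, `InvestorNode`, `submit`, and `finish` are hypothetical. It assumes a single investor-owned node that runs one job at a time.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    from_investor: bool  # True if submitted through the investor's own queue


@dataclass
class InvestorNode:
    """Hypothetical model of one investor-owned node (not the real scheduler)."""
    running: Optional[Job] = None
    waiting: deque = field(default_factory=deque)  # jobs waiting for this node

    def submit(self, job: Job) -> Optional[Job]:
        """Place a job on the node; return the backfill job it displaced, if any."""
        if self.running is None:
            # Idle node: any job, investor or community, may start.
            self.running = job
            return None
        if job.from_investor and not self.running.from_investor:
            # Investor reclaims the node: the community job is re-queued, not lost.
            displaced = self.running
            self.waiting.appendleft(displaced)
            self.running = job
            return displaced
        # Otherwise the new job simply waits its turn.
        self.waiting.append(job)
        return None

    def finish(self) -> None:
        """Complete the running job and start the next waiting one, if any."""
        self.running = self.waiting.popleft() if self.waiting else None


node = InvestorNode()
node.submit(Job("community-sim", from_investor=False))           # idle node is backfilled
bumped = node.submit(Job("owner-analysis", from_investor=True))  # investor reclaims the node
node.finish()                                                    # re-queued job runs again
```

In this model the community job loses its place on the node but stays in the queue, which mirrors the automatic re-queue described above. Whether a re-queued job restarts from the beginning or from a checkpoint depends on the job itself and is not modeled here.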
Investors can purchase 1 general compute node without prior approval. Purchasing more than 1 node requires approval because the cost (over $20k) means the purchase must be competitively quoted by 3 parties. Writing node purchases into a grant does not require any prior discussion with the HPC team, but once the nodes are ready to be purchased, please come speak with us so we can help you get the best price.
Users of the system may install and use software as needed. If assistance is needed, please expect an average lead time of 2-4 weeks. If assistance is needed for licensed software, please be prepared to review the license to assess the viability of installing it on an HPC system for general use. Note: some software requires a license per node, so to remain compliant, we must read the license.
Node costs are based on the equipment cost and include a 5-year warranty. After the 5 years have concluded, the purchased resource becomes part of the general-compute queue and will be maintained until it fails or is deemed unreliable by the HPC team. At that point, it will be at ICT’s discretion to remove and dispose of the resource.
If the investor wishes to remove their equipment from the system, they are able to do so. Removing equipment causes several changes:
- The equipment will be taken from the restricted access room, although the shared access space can still be used for storing and running the machine
- The InfiniBand interconnect will remain behind, which will slow the communication between nodes, if several nodes were purchased
- A rack may be needed for performance
- A trained and knowledgeable administrator needs to be found for the machine (this is a requirement of having a device on the network)
Anyone on campus can purchase a compute node or storage.
How to Order
Please review the Service Catalog below to decide what best suits your needs. Contact the Discovery admins at firstname.lastname@example.org when you are ready to purchase. Someone will contact you to verify the order, and then the nodes will be purchased. Please allow 2 months once the purchase has been approved by purchasing. The nodes will be installed and made available as quickly as possible; the majority of this time is for the equipment provider to build and ship the equipment.
If you need a special node, please contact Discovery admins at email@example.com.
Availability of the System
Discovery is meant to be available for campus use around the clock, every day of the year. Maintenance windows lasting 2 weeks happen 2-3 times a year for important and major updates. These will be scheduled during low-use times, and users will be given at least 2 weeks' notice. Occasionally an urgent issue arises that requires the system to be taken offline. The administrators will communicate this and work to restore the system with as little inconvenience as possible to our users. Every effort will be made to pause jobs and resume them when the system is restarted.
Standard Compute Nodes
The following nodes have been competitively priced. However, as they are quotes, they are estimates and may change. Pricing is heavily dependent on market conditions and the size of the purchase.
If you are planning to purchase or include nodes in a grant proposal, please contact Discovery admins at firstname.lastname@example.org for up-to-date pricing or if you need a different configuration.
| Processors | Cores per Node | Memory | Local Disk | Networking | Budgetary Estimate / Warranty |
|---|---|---|---|---|---|
| 2x Intel Xeon Gold 6226R, 2.9-3.9 GHz, 16C/32T | 32 (64T) | 192 GB RDIMM, 2666MT/s, Dual Rank | 480GB SSD SATA | InfiniBand HDR (100Gbps) | $10,190 / 5 yrs |
| 2x Intel Xeon Gold 6226R, 2.9-3.9 GHz, 16C/32T | 32 (64T) | 384 GB RDIMM, 2666MT/s, Dual Rank | 480GB SSD SATA | InfiniBand HDR (100Gbps) | $11,490 / 5 yrs |
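For grant budgeting, the arithmetic behind a standard-node order can be sketched as follows. This is a rough planning aid, not an official calculator: the prices are the budgetary estimates from the table above and may change, and the function name `estimate_purchase` is hypothetical. It also assumes the competitive-quote requirement applies whenever the order total exceeds $20,000, which is one reading of the purchasing rule stated earlier.

```python
from typing import Dict, Tuple

# Budgetary estimates (USD) for the standard nodes listed above, keyed by memory size.
# These are quotes, not fixed prices; confirm current pricing with the Discovery admins.
STANDARD_NODE_ESTIMATES: Dict[str, int] = {
    "192GB": 10_190,
    "384GB": 11_490,
}

# Assumption: orders totaling more than this need 3 competitive quotes.
QUOTE_THRESHOLD = 20_000


def estimate_purchase(order: Dict[str, int]) -> Tuple[int, bool]:
    """Return (estimated total in USD, whether competitive quotes are likely needed).

    `order` maps a memory configuration ("192GB" or "384GB") to a node count.
    """
    total = sum(STANDARD_NODE_ESTIMATES[config] * count
                for config, count in order.items())
    return total, total > QUOTE_THRESHOLD


single = estimate_purchase({"192GB": 1})             # one node, under the threshold
mixed = estimate_purchase({"192GB": 1, "384GB": 1})  # two nodes, over the threshold
print(single, mixed)
```

A single 192 GB node comes to $10,190 and stays under the threshold, while one node of each configuration totals $21,680 and would go through the competitive-quote process under the assumption above.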
Special Compute Nodes
If you are interested in purchasing a non-standard compute resource, please contact Discovery admins at email@example.com. For budgetary estimates, please see the table below. You are not limited to these configurations; if you have a special need, please contact us. The prices quoted are inclusive: they cover the cost of the equipment and 5 years of a manufacturer maintenance agreement, as well as 5 years of administration by ICT.
| Type | Processors | Accelerator(s) | Cores per Node | Memory | Local Disk | Networking | Budgetary Estimate / Warranty |
|---|---|---|---|---|---|---|---|
| GPU | 2x Intel Xeon Gold 6226R, 2.9-3.9 GHz, 16C/32T | 4x NVIDIA Tesla T4 16GB | 32 (64T) | 384 GB RDIMM, 2666MT/s, Dual Rank | 480GB SSD SATA | InfiniBand HDR (100Gbps) | $25,975 / 5 yrs |
| GPU | 2x Intel Xeon Gold 6226R, 2.9-3.9 GHz, 16C/32T | 2x NVIDIA Tesla V100s 32GB | 32 (64T) | 384 GB RDIMM, 2666MT/s, Dual Rank | 480GB SSD SATA | InfiniBand HDR (100Gbps) | $39,790 / 5 yrs |
| Large Memory | 4x Intel Xeon Gold 5120 2.2G, 14C/28T | N/A | 48 (96T) | 3TB LRDIMM, 2666MT/s, Quad Rank | 2TB 7.2K RPM NLSAS 12Gbps | InfiniBand HDR (100Gbps) | $82,830 / 5 yrs |
All users are provided with 100GB of backed-up home directory space and 1TB of non-backed-up scratch directory space. Users working together on a project can request 500GB of project space. This storage space is shared between users working on the project.
The Discovery administrators respond to problems Monday through Friday between 8am and 5pm. Issues are resolved as quickly as possible, with a goal of resolution within 5 business days. For critical problems, this timeline is accelerated.