Neon Hardware Retirement

As part of the HPC Model Change, hardware purchased for the Neon HPC system will be retired on January 6, 2021, during regularly scheduled HPC maintenance.

Frequently Asked Questions:

How do I know if my queue will be impacted by the retirement of this hardware?

  The Argon Queues and Policies page lists resources attached to each queue.

  • Neon-era nodes are marked with (neon).
  • Slot and memory volumes that will remain available *after* the hardware is retired are shown in parentheses.
  • Investors with *only* Neon hardware will have no invested resources remaining after the retirement of this hardware.

What will the impact be on UI queue resources?

  • Resources in the UI and UI-GPU queues will be affected: 1664 slots, 3328GB of memory, and 9 Tesla K-20 GPUs will be retired.
  • The October 2019 Phase 3 Expansion to Argon added more resources than are expected to be retired:
    • 54 nodes, ranging from 384GB to 1.5TB of RAM, were added to the UI* queues; each of these nodes has 80 slots.
    • 95 GPUs (V100 and 2080ti) were added to the UI-GPU and UI-HM-GPU queues.

How will this impact the INFORMATICS queue resources?

INFORMATICS queue resources will drop from 1152 slots to 672 slots, and memory from 3520 to 1280.

  • Per the UI3 Transition FAQ, further investment in UI3 resources has been suspended.  Remaining UI3 nodes will run as part of Argon until their normal end of life.
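As a rough illustration of the scale of the INFORMATICS change described above, the following sketch computes the relative reduction from the before and after figures. The helper name `percent_reduction` is hypothetical, and the memory figures are used exactly as given, without assuming a unit.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return round(100 * (before - after) / before, 1)


# Figures taken from the INFORMATICS answer above.
slots_before, slots_after = 1152, 672
memory_before, memory_after = 3520, 1280  # unit not stated in the source

slot_drop = percent_reduction(slots_before, slots_after)
memory_drop = percent_reduction(memory_before, memory_after)

print(f"Slots retired:  {slots_before - slots_after} ({slot_drop}% reduction)")
print(f"Memory retired: {memory_before - memory_after} ({memory_drop}% reduction)")
```

Under these figures the queue loses 480 slots, a little over two fifths of its slot count, and a larger share of its memory.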

How reliable has the Neon hardware been since it was deployed?

Annual node failure rates since the Neon hardware was deployed have been as high as 48% and have averaged around 20% per year over the past 7 years.

Total Neon Nodes    2019-20 Failures    % Failure Rate    Remaining Neon Nodes
263                 42                  15.97%            221

Historical Failures

Year    # Failures    % Failure Rate
2014    41            15.59%
2015    121           46.01%
2016    126           47.91%
2017    48            18.25%
2018    46            17.49%

At this point, the hardware is 7+ years old and has suffered a significant number of failures over its life. Failures have been increasing, and recent power and maintenance events have already caused several additional failures. Therefore, we do not expect the remaining hardware to continue operating reliably.
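The percentages in the failure tables above can be reproduced with a short calculation. This is a minimal sketch that assumes each year's rate is the failure count divided by the 263 total Neon nodes; the source does not state the denominator, but every published percentage is consistent with it. The names `failure_rate` and `failures_by_year` are illustrative only.

```python
TOTAL_NEON_NODES = 263  # "Total Neon Nodes" from the table above

# Failure counts as published in the tables above.
failures_by_year = {
    "2014": 41,
    "2015": 121,
    "2016": 126,
    "2017": 48,
    "2018": 46,
    "2019-20": 42,
}


def failure_rate(failures: int, total: int = TOTAL_NEON_NODES) -> float:
    """Return failures as a percentage of total nodes, rounded to 2 places.

    Assumption: the denominator is the full node count for every year.
    """
    return round(100 * failures / total, 2)


rates = {year: failure_rate(count) for year, count in failures_by_year.items()}
for year, rate in rates.items():
    print(f"{year}: {failures_by_year[year]:>3} failures, {rate:.2f}%")

# Remaining nodes, computed the way the first table appears to compute them.
remaining_nodes = TOTAL_NEON_NODES - failures_by_year["2019-20"]
print(f"Remaining Neon nodes: {remaining_nodes}")
```

Running the sketch yields the same percentages as the tables and the same remaining-node count of 221.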

Can I request my hardware be returned to me after it is shut down?

Yes, but please read this whole FAQ to learn the trade-offs involved before considering this option. For returned hardware, note the following:

  • If you wish to retain your hardware, please contact research-computing@uiowa.edu no later than December 15, 2020.
  • All Research Services staff are working remotely at the direction of the University Administration until June 2021, and will work to return claimed hardware to Faculty Partners as time allows.
  • Hard drives will be wiped when the hardware is retired from the cluster; no software will be present on returned hardware.
  • Due to the physical configuration of the Neon hardware, Faculty Partners retaining fewer than 4 nodes must make arrangements to ensure they have an available blade enclosure to house returned nodes.

Can Research Services help me with my hardware if I request its return?

No. Research Services is unable to support Neon hardware after it has been decommissioned from the cluster. However, we can put you in contact with your local IT Support organization to ask whether they offer support options.

What will happen to unreturned hardware after it is decommissioned?

Research Services will coordinate with the ITS DataCenter Hosting Team to physically remove and transport unreturned hardware to UI Surplus.

Will any value recovered from Surplus sales be returned to investors?

We are working with UI Surplus to determine whether the Neon hardware has any resale value.  In the past, HPC hardware of this age has returned minimal funds; in this case, we would propose reinvesting any returned funds in the HPC environment.

Can I still purchase new hardware in Argon? How long will such purchases remain in service?

Yes! Please see our HPC Buy page for more information.

Per the HPC Model Change, hardware will be supported through the full 5 years of its original warranty period, and on a best-effort basis for 2 additional years before retirement.  Any changes to this policy will be communicated in advance.
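For planning purposes, the support timeline above can be sketched as simple date arithmetic. This is an illustration only: the function names and the example warranty start date are hypothetical, and actual retirement dates are set by Research Services (for example, to coincide with scheduled maintenance), not by this calculation.

```python
from datetime import date

WARRANTY_YEARS = 5     # full support during the original warranty period
BEST_EFFORT_YEARS = 2  # additional best-effort support before retirement


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def support_window(warranty_start: date) -> dict:
    """Estimate when warranty support and best-effort support end."""
    warranty_end = add_years(warranty_start, WARRANTY_YEARS)
    best_effort_end = add_years(warranty_end, BEST_EFFORT_YEARS)
    return {"warranty_end": warranty_end, "best_effort_end": best_effort_end}


# Hypothetical example: a node whose warranty began on March 1, 2021.
window = support_window(date(2021, 3, 1))
print("Full support until:       ", window["warranty_end"])
print("Best-effort support until:", window["best_effort_end"])
```

In this hypothetical case, full support would run to March 2026 and best-effort support to March 2028, after which the hardware would be a candidate for retirement.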