Helium 2013 Maintenance - Extended

Update (7:00 PM, March 22, 2013): All but six compute nodes are now available online. We have discovered that data loss occurred on the Gluster scratch volume. We apologize for this data loss and are investigating the root cause, but we do not anticipate being able to recover data from this volume. As a reminder, no backups of data are made on the Helium system unless you have made explicit arrangements.

Update (4:30 PM, March 22, 2013): Helium is now available for use. At this time approximately 20 compute nodes remain offline. These systems will be back online as soon as possible. Memory limits are now in place as described here: http://hpc.uiowa.edu/system-news/memory-limits-jobs-sharing-nodes

Update (1:00 PM, March 22, 2013): Approximately 300 compute nodes have been vetted. The Helium team has decided to focus on bringing the system back into production with the nodes that are currently working. This means that some queues will initially come up with less than their full core count. Final tests of the working systems are in progress, and we estimate that the Helium system will be available early this evening. The remaining compute nodes will continue to be reviewed and will be brought online as soon as possible.

Update (10:25 AM, March 22, 2013): Approximately half of the compute nodes in Helium have been vetted and restored to a usable state. The Helium team continues to work through the remaining nodes. Unless we encounter additional issues, we still anticipate completion sometime this evening.

Update (7:45 AM, March 22, 2013): The process that keeps the nodes in sync failed. While the root cause is not yet known, a more manual method of getting the nodes in sync is currently being used. This process is time-consuming, but progress is being made.

Update (10:30 PM, March 21, 2013): Problems have been encountered during the maintenance, which is now expected to be completed on the evening of Friday, March 22, 2013. More details will be posted as they become available.

 

Originally planned maintenance: We will be performing maintenance on the Helium cluster on March 21, 2013. This will require a full shutdown of Helium from 7:00 AM on March 21, 2013 to 7:00 AM on March 22, 2013. If this will cause a problem for you and your research, please let us know as soon as possible via an email to hpc-sysadmins@iowa.uiowa.edu.