Unplanned /scratch Outage

Update May 29: /scratch has been available since the last announcement, but performance has at times been poor. We have identified several large-scale users of /scratch and are working with them on alternative solutions. We have also identified configuration problems with the current /scratch system that can only be resolved by reformatting the volumes. Work on alternatives to provide better I/O systems on Helium is ongoing, and an email announcement with current plans will be sent before the end of the week.

Update May 24 @ 11 AM:

The user whose jobs were causing the heavy load has cancelled them, and performance is back to normal. We are currently working on moving the jobs off /scratch and hope to have this work done later today. We will post further updates as necessary. Thank you for your patience.

Update May 24 @ 8:05 AM: The Lustre system is under very heavy load and has been since approximately 10 PM last night. We are investigating to identify which users are causing this load and will ask them to suspend usage until we can decrease the I/O load being generated.

This morning at approximately 8:00 AM, one of the Lustre data storage servers went offline, affecting access to /scratch. The system was restored to service at approximately 8:30 AM, but load on the system remains high, so you may experience degraded performance. Prior to this event, the /scratch filesystem was also under very heavy load. Root cause analysis and mitigation of this issue are under way. If you have questions or concerns, please contact us at hpc-sysadmins@iowa.uiowa.edu

We apologize for the inconvenience.