Argon Compute Unavailable

This outage affected only one of the two data centers, so not all jobs were affected. If your job was running on a node in ITF, the data center that lost power, it will need to be resubmitted. If your job was running in LC, the unaffected data center, it should have continued running. Array jobs are a special case, because their tasks were not all affected equally. If you had an array job, there is a high probability that its tasks were split between the two data centers in some proportion, so tasks running in ITF were lost while those running in LC continued. For those array jobs, you will need to determine which array tasks did not complete and submit a new array job, or jobs, with just those tasks.
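As a rough illustration of the array job case, the sketch below works out which task IDs still need to run and groups them into contiguous ranges. It assumes you already have a list of the task IDs that finished (for example, from checking which output files exist); the function names and the sample numbers are hypothetical and are not part of any Argon tooling.

```python
def missing_tasks(first, last, completed):
    """Return the sorted task IDs in [first, last] that are not in completed."""
    done = set(completed)
    return [t for t in range(first, last + 1) if t not in done]


def to_ranges(task_ids):
    """Collapse sorted task IDs into inclusive (start, end) runs."""
    ranges = []
    for t in task_ids:
        if ranges and t == ranges[-1][1] + 1:
            # Extend the current run when the ID is consecutive.
            ranges[-1] = (ranges[-1][0], t)
        else:
            # Otherwise start a new run.
            ranges.append((t, t))
    return ranges


if __name__ == "__main__":
    # Hypothetical array job with tasks 1-10 where 3, 4, 5 and 9 were lost.
    completed = [1, 2, 6, 7, 8, 10]
    lost = missing_tasks(1, 10, completed)
    for start, end in to_ranges(lost):
        print(f"{start}-{end}")
```

Each range printed could then be resubmitted as its own array job using the scheduler's task-range option. How you establish which tasks completed depends on your workflow, so treat the completed list here as a placeholder.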

Update, 04/07/2020 3:52 PM
The queues have been enabled and jobs that were queued up are now getting scheduled to nodes. All compute nodes, barring ones with problems, will be back online shortly.

At approximately 1:10 PM, one of the data centers hosting the Argon HPC environment experienced a power outage. Although the outage was short-lived, a large portion of the HPC environment was taken offline and running jobs were lost. The cause remains unknown, and we are working to restore HPC service.

We will update this notification as more information becomes available.

Thank you for your patience. We apologize for the interruption.

ITS Research Services Advanced Computing Support Team