The next scheduled maintenance window for the HPC systems is August 8, 2018 8AM - August 9, 2018 8AM.
During this time the Neon and Argon systems will be offline for maintenance. All jobs running at the beginning of the maintenance will be killed.
The following work is currently scheduled:
- System updates and security patches.
- Data center generator testing.
- System updates and security patches (includes evaluation of options for improving performance related to Spectre/Meltdown issues, as detailed here).
- The gpu.cuda.dev_free and gpu.cuda.procsum resources will be removed from the system. A new resource, ngpus, supersedes them. For more details on requesting GPU resources on Argon, please see the Argon Documentation.
- Change in qlogin behavior - The way job slots are allocated for qlogin sessions is changing. Prior to August 8, 2018, a qlogin request would by default ask for an entire node, and if no entire node was available the request would not be fulfilled. (It was always possible to explicitly request less than an entire node.) After August 8, 2018, a qlogin request will by default try to allocate as many job slots on a node as possible. That may be an entire node, but if an entire node is not available the request will automatically be scaled down to fewer job slots. This should result in fewer failed qlogin requests while still providing as many resources as possible for the qlogin session. If you need a specific number of job slots, you can still request that number explicitly.
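The difference between the old and new qlogin defaults can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the scheduler's actual implementation: the node names, slot counts, and the functions `old_default` and `new_default` are all hypothetical, and the real scheduler may choose among nodes differently.

```python
from typing import Dict, Optional, Tuple

# Hypothetical node table: node name -> (total_slots, free_slots).
Nodes = Dict[str, Tuple[int, int]]


def old_default(nodes: Nodes) -> Optional[Tuple[str, int]]:
    """Old default: succeed only if some node is completely free."""
    for name, (total, free) in nodes.items():
        if free == total:
            return name, total
    return None  # no entire node available, request not fulfilled


def new_default(nodes: Nodes) -> Optional[Tuple[str, int]]:
    """New default: take as many free slots as a single node can offer."""
    best = max(nodes.items(), key=lambda item: item[1][1], default=None)
    if best is None or best[1][1] == 0:
        return None  # nothing free anywhere
    name, (_total, free) = best
    return name, free


# Illustrative cluster state (made-up numbers): no node is entirely free.
nodes = {"node-a": (56, 40), "node-b": (56, 12)}
old_result = old_default(nodes)
new_result = new_default(nodes)
```

With this made-up cluster state, the old default finds no entirely free node and returns nothing, while the new default scales the request down to the 40 free slots on `node-a`.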
General HPC Infrastructure
- The license server for the Intel Compilers will be modified to allow cloud support.