The HPC team has completed an initial evaluation of the Meltdown and Spectre processor vulnerabilities. All HPC systems are vulnerable, as are the vast majority of computing devices. These are serious security vulnerabilities that can lead to the loss of sensitive data. Although they are currently difficult to exploit, they do require action. Read on for details on how they affect the HPC environment.
Do you plan to patch the HPC systems? – Yes. We plan to patch the Argon system at the next maintenance window, and the Neon system soon after. Although patches for these vulnerabilities are still evolving, we plan to apply the patches that are currently available. Not patching is not an option, because all future updates will depend on these patches being installed.
Will patching affect system performance? – Yes, we expect some decrease in system performance. Based on our testing, we expect the average decrease to be in the 5-10% range. However, the actual impact depends heavily on the type of workload. For heavily compute-intensive jobs, we have seen many cases with almost no performance impact. I/O-intensive jobs, including jobs that write large amounts of data to disk or send large numbers of MPI messages, are likely to be affected more, with some workloads seeing slowdowns closer to 30%.
How can I minimize the performance impact on my work? – The most effective thing you can do is to limit your I/O as much as possible. We also recommend optimizing your I/O to reduce the number of small file operations your jobs perform. In many respects, optimizing for this new environment is similar to traditional optimization and mirrors many of our recommendations for high-throughput jobs.
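As a rough illustration of reducing small file operations, here is a minimal Python sketch (not a tool provided by the HPC team) that writes the same records two ways: one write call per record, and records collected in memory and written in larger batches. It assumes, as is widely reported for the Meltdown mitigations, that much of the added cost is paid per system call, so issuing fewer, larger writes is where the savings come from. The function names and the 1 MiB default batch size are hypothetical choices for illustration.

```python
"""Sketch: compare one write() per record against batched writes.

Function names and the default batch size are illustrative assumptions.
"""
import os
import tempfile


def _write_all(fd, data):
    """Write all of `data` to file descriptor `fd`.

    Returns the number of os.write() calls (system calls) used.
    """
    view = memoryview(data)
    calls = 0
    while view:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]
        calls += 1
    return calls


def write_unbatched(path, records):
    """Issue at least one os.write() per record (many small operations)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    calls = 0
    try:
        for rec in records:
            calls += _write_all(fd, rec)
    finally:
        os.close(fd)
    return calls


def write_batched(path, records, batch_bytes=1 << 20):
    """Collect records in memory and flush once roughly `batch_bytes` accumulate."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    calls = 0
    chunks, pending = [], 0
    try:
        for rec in records:
            chunks.append(rec)
            pending += len(rec)
            if pending >= batch_bytes:
                calls += _write_all(fd, b"".join(chunks))
                chunks, pending = [], 0
        if chunks:  # flush whatever is left over
            calls += _write_all(fd, b"".join(chunks))
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    records = [f"step {i}\n".encode() for i in range(10_000)]
    with tempfile.TemporaryDirectory() as tmp:
        small = write_unbatched(os.path.join(tmp, "unbatched.log"), records)
        big = write_batched(os.path.join(tmp, "batched.log"), records)
        print(f"unbatched: {small} write calls, batched: {big} write calls")
```

Both functions produce identical files; only the number of write calls differs. In everyday code you rarely need to do this by hand: Python's built-in `open()` file objects (and C's stdio) already buffer small writes. The explicit version simply makes the call count visible, which is the quantity the patches are expected to make more expensive.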
Can you help me minimize my performance impact? – The HPC team is happy to discuss this with you but unfortunately has limited time available. As a result, all consultations in this area will be handled on a first-come, first-served basis as time allows. We also strongly encourage you to do an initial evaluation of your job's performance and optimize your I/O as well as you can before contacting us.
Do you expect stability issues? – There have been media reports of stability issues with the patches, but to date we have not experienced any. We will carefully evaluate each patch for stability before rolling it out across the clusters.