As many of you know, we have experienced significant problems with our scratch filesystem over the last couple of months. These issues stem from two sources:
- The current /scratch hardware configuration provided by HP is suboptimal for Lustre. The software configuration of Lustre is also currently incorrect for the hardware. We believe we now understand how to improve this situation, but doing so requires reformatting /scratch, which will result in the loss of all data on the filesystem.
- The amount of data-intensive work on Helium has increased, as has the core count on the system. Both have led to increasing use of /scratch, which has exacerbated the problems noted in the first item.
After extensive deliberation, the HPC technical team has developed the following plan:
May 30, 2012: A new filesystem has been provisioned at /nfsscratch and is available to all users. Any data that you would like to preserve from /scratch should be copied to this new filesystem. Please copy only important data to the new filesystem.
June 11, 2012: /scratch will be changed to read-only. No new data can be created on this volume after that date. Please redirect jobs to /nfsscratch. If you have large-scale I/O needs that do not work well on /nfsscratch, please contact us, as we have another filesystem in preliminary testing that may provide an alternative.
June 18, 2012: /scratch will be taken offline for reconfiguration. All data remaining on /scratch will be lost during this reconfiguration. Testing will be performed on the newly created /scratch volume, and once this testing is complete the volume will be made available again.
At this time, we do not anticipate that /nfsscratch will be permanent, but the final system configuration won't be known with certainty until we have had a chance to reconfigure and test the Lustre /scratch environment.
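For those who want to script the copy from /scratch to /nfsscratch before /scratch is taken offline, the following is a minimal sketch in Python, not an official migration tool. The function name `copy_selected` and the idea of listing the items to keep by name are assumptions made for illustration; it relies only on the standard library and needs Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_selected(src_root, dst_root, names):
    """Copy only the named files or directories from src_root into dst_root.

    Returns the list of destination paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for name in names:
        src = src_root / name
        dst = dst_root / name
        if not src.exists():
            # Report and move on rather than aborting the whole copy.
            print(f"skipping {src}: not found")
            continue
        if src.is_dir():
            # dirs_exist_ok lets an interrupted copy be re-run safely.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

A call might look like `copy_selected("/scratch/<username>", "/nfsscratch/<username>", ["results", "inputs"])`, where the per-user directory layout and the directory names are hypothetical and should be replaced with your own paths. Naming the items explicitly, rather than copying everything, keeps the transfer limited to important data.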