After consulting with MSI’s advisory committee, we are making a few changes to how Global Scratch Storage is managed in order to address performance issues. Most users will not be affected by these changes. Because of the intense demands placed on our global scratch system, users have seen highly variable performance, which makes it difficult to rely on the system when designing workflows. In the past few months these demands have, at times, caused the global scratch system to stop functioning completely. The changes detailed below aim to address these acute issues, pending a broader overhaul planned for the rollout of our new system in early 2021.
The changes are as follows:
- Effective immediately, MSI is introducing group quotas of 90 TB and 30,000,000 files on Global Scratch. No groups currently exceed these limits, so there should be no immediate effect.
- Starting April 1, the default quota on Global Scratch will be reduced to 40 TB and 10,000,000 files. A small number of groups consistently use more than these amounts, and we will work with those groups to ensure there is a solution for their research needs. Additionally, MSI will review requests for temporary quota increases for individual projects on an ongoing basis.
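As a rough illustration of how a group might compare its own usage against the two sets of limits above, here is a minimal Python sketch. The quota figures come from this announcement; the `Quota` class, the `over_quota` helper, and the use of decimal terabytes (10^12 bytes) are assumptions made for the example and do not describe how MSI measures or enforces quotas.

```python
from dataclasses import dataclass

TB = 10**12  # assumption: decimal terabytes; MSI's accounting unit may differ


@dataclass(frozen=True)
class Quota:
    """A pair of limits: total bytes and total file count (hypothetical type)."""
    max_bytes: int
    max_files: int


# Limits as stated in the announcement.
INITIAL_QUOTA = Quota(max_bytes=90 * TB, max_files=30_000_000)  # effective immediately
APRIL_QUOTA = Quota(max_bytes=40 * TB, max_files=10_000_000)    # starting April 1


def over_quota(used_bytes: int, file_count: int, quota: Quota) -> list[str]:
    """Return the names of the limits that the given usage exceeds."""
    exceeded = []
    if used_bytes > quota.max_bytes:
        exceeded.append("space")
    if file_count > quota.max_files:
        exceeded.append("files")
    return exceeded
```

For example, a group holding 50 TB in 5,000,000 files is within the initial limits (`over_quota(50 * TB, 5_000_000, INITIAL_QUOTA)` returns an empty list) but would exceed the space limit after April 1 (`over_quota(50 * TB, 5_000_000, APRIL_QUOTA)` returns `["space"]`).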
Scratch was designed to meet short-term needs for additional storage. Even with the new default quotas, Global Scratch will continue to be a space for retaining datasets on a short-term basis (30 days maximum). MSI will continue to monitor scratch performance and may need to adjust these quotas further. It is also important to note that Global Scratch is not backed up or snapshotted, on the assumption that data stored there is transitory.
How does MSI currently manage Global Scratch?
MSI has historically provided minimal oversight for scratch. Our current process is to regularly purge data that hasn’t been modified in the past 30 days.
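To see which files in a project directory would fall under a 30-day rule like this one, a read-only check along the following lines may help. This is a sketch under stated assumptions: the function name `purge_candidates` is hypothetical, and it assumes "modified" means the file's modification time (mtime), which may not match exactly how MSI's purge evaluates files. It only reads metadata and never touches files, and it is meant for a single modest directory tree rather than a scan of all of Global Scratch.

```python
import os
import time

PURGE_AGE_DAYS = 30  # from the announcement: not modified in the past 30 days
SECONDS_PER_DAY = 86400


def purge_candidates(root, now=None, max_age_days=PURGE_AGE_DAYS):
    """Return sorted paths under `root` whose mtime is older than the cutoff.

    Assumption: staleness is judged by modification time (st_mtime).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata without following symlinks
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Files reported by such a check are the ones to copy to longer-term storage if they are still needed.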
What isn’t working about the current management of Global Scratch?
- Large amounts of static data are causing storage capacities to be reached.
- Large quantities of files are exacerbating IO loads on the system.
- Large-scale “find and touch” jobs are causing massive IO loads on the system.
- Current management practices incentivize “find and touch” jobs, which users run to reset modification times and keep files from being purged.
Our hope is that by introducing flexible quotas we can make the loads on the Global Scratch system more predictable, providing a more consistent experience for all MSI users.