Techniques for Modeling Large-scale HPC I/O Workloads

Shane Snyder, Philip Carns, Robert Latham, Misbah Mubarak, Chris Carothers, Huong Vu Thanh Luu, Surendra Byna, Prabhat

Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, I/O analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single workload modeling technique is appropriate for all use cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary workload consumer components to obtain I/O workloads from a range of diverse input sources. Thus, researchers can choose specific I/O workload generators based on the resources they have available and the type of evaluation they wish to perform. As part of this research, we also outline the design of three distinct workload generation methods, based on I/O traces, synthetic I/O kernels, and I/O characterizations. We analyze and contrast each of these workload generation techniques in the context of storage system simulation models as well as production storage system measurements. We find that each generator mechanism offers varying levels of accuracy, flexibility, and breadth of use that should be considered before performing I/O analyses. We also recommend a set of best practices for HPC I/O workload modeling based on challenges that we encountered while performing our evaluation.