GIT-CERCS-03-29
Zachary Kurmas, Kimberly Keeton, Kenneth Mackenzie
Synthesizing Representative I/O Workloads Using Iterative Distillation
Storage system designers are still searching for better methods of obtaining
representative I/O workloads to drive studies of I/O systems. Traces of
production workloads are very accurate, but inflexible and difficult to
obtain. (Privacy and performance concerns discourage most system
administrators from collecting such traces and making them available to the
public.) The use of synthetic workloads addresses these limitations; however,
synthetic workloads are accurate only if they share certain key properties
with the production workload on which they are based (e.g., mean request
size, read percentage). Unfortunately, we do not know which properties are
"key" for a given workload and storage system. We have developed a tool, the
Distiller, that automatically identifies the key properties (more
formally called attribute-values) of the workload. These attribute-values
can then be used to generate a synthetic workload representative of the
production workload. This paper presents the design and evaluation of the
Distiller. We demonstrate how the Distiller finds representative synthetic
workloads for simple artificial workloads and three production workload
traces.
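
As a rough illustration of what an attribute-value is, the sketch below
computes the two example properties named above (mean request size and read
percentage) from a toy trace, then generates a synthetic workload that shares
them. This is not the Distiller's algorithm. The Request type, the function
names, and the trace contents are hypothetical, chosen only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    """One I/O request in a toy trace (hypothetical format)."""
    size: int       # request size in bytes
    is_read: bool   # True for a read, False for a write


def attribute_values(trace):
    """Summarize a trace by mean request size and read fraction."""
    n = len(trace)
    return {
        "mean_request_size": sum(r.size for r in trace) / n,
        "read_fraction": sum(r.is_read for r in trace) / n,
    }


def synthesize(attrs, n, seed=0):
    """Generate n requests that match the two attribute-values.

    Assumption for illustration: every request uses the mean size, and
    reads are drawn independently with the observed read fraction.
    """
    rng = random.Random(seed)
    size = round(attrs["mean_request_size"])
    return [Request(size, rng.random() < attrs["read_fraction"])
            for _ in range(n)]


# A four-request toy trace standing in for a production trace.
trace = [
    Request(4096, True),
    Request(8192, False),
    Request(4096, True),
    Request(16384, True),
]
attrs = attribute_values(trace)
synthetic = synthesize(attrs, 1000)
```

A workload that matches only these two properties may still behave very
differently from the original on a real storage system. That gap is why the
abstract says a method for identifying the "key" properties is needed.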