GIT-CERCS-08-04
Sangeetha Seshadri, Ling Liu, Lawrence Chiu, Cornel Constantinescu,
A Recovery Conscious Framework for Fault Resilient Storage Systems
This paper presents a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery-conscious techniques. At the top tier, we promote fine-grained recovery at the task level by introducing recovery scopes to model recovery dependencies between tasks. At the middle tier, we develop highly effective groupings of recovery scopes into recovery groups based on system and workload characteristics. We study how to distribute recovery scopes among recovery groups and how to schedule recovery groups effectively in a multi-core storage system through careful tuning of recovery-efficiency-sensitive parameters. At the bottom tier, we advocate the use of recovery-conscious scheduling instead of performance-oriented scheduling to provide high recovery efficiency without sacrificing system performance. An important question to address at this tier is under which combinations of resource pools and recovery groups recovery-conscious scheduling outperforms performance-oriented scheduling. Our techniques have been implemented on a real industry-standard storage system. Experimental results show that the right choice of recovery-sensitive parameters is critical, and that our techniques are effective, non-intrusive, and can significantly boost system resilience while delivering high performance under a variety of system configurations.