GIT-CERCS-05-03
Arnab Paul, Sandip Agarwala, Umakishore Ramachandran
An Extensible Secure and Fault Tolerant Storage System
With the rapidly falling price of hardware and increasingly
available bandwidth, storage technology is seeing a paradigm
shift from a centralized, managed mode to distributed, unmanaged
configurations. The key issues in designing such systems
include scalability, extensibility, and robustness, to name a
few.
This paper describes e-SAFE, a scalable distributed storage
system that deploys a pastiche of theoretical and practical techniques,
providing tolerance of malicious faults, reduced management
overhead such as periodic repairs, and very high availability
at an archival scale. e-SAFE is designed to provide a
storage utility for environments such as large-scale data centers
in enterprise networks where the servers experience high loads
and thus show temporary unavailability (as opposed to P2P systems,
where servers disappear over the long run). Consequently,
the design goal of e-SAFE is to provide high load resilience in a
seamlessly extensible way. e-SAFE is based on a simple principle:
efficiently sprinkle data all over a distributed storage system and
robustly reconstruct it even when many of the servers are unavailable
under high loads.
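
The sprinkle-and-reconstruct principle can be illustrated with a toy example. The sketch below is not e-SAFE's coding scheme, which this abstract does not specify. It swaps in the simplest possible erasure code, a single XOR parity fragment, which survives the loss of any one of its k+1 fragments. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def decode(frags: list, k: int, length: int) -> bytes:
    """Rebuild the original data; a lost fragment is passed as None."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost fragment")
    if missing and missing[0] < k:
        # XOR of all surviving fragments (data + parity) yields the lost one.
        present = [f for f in frags if f is not None]
        frags = list(frags)
        frags[missing[0]] = reduce(xor_bytes, present)
    return b"".join(frags[:k])[:length]
```

In a deployment, each fragment would be stored on a different server, so one temporarily unavailable server would not block a read. A production code would tolerate many simultaneous losses rather than just one.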
The performance mechanisms used in e-SAFE are: (i) task parallelization
over multiple file segments, which can take advantage of
an SMP architecture; (ii) erasure codes with very fast encoding
and decoding algorithms, as opposed to naive replication;
and (iii) a background replication mechanism that hides the cost
of replication and dissemination from the user while still guaranteeing
high durability.
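
As a rough illustration of item (i), the sketch below splits a file's contents into fixed-size segments and processes them concurrently. It is a minimal sketch under stated assumptions, not e-SAFE's implementation: the names `split_segments`, `process_segment`, and `process_parallel` are hypothetical, and a SHA-256 digest stands in for the per-segment encoding work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_segments(data: bytes, segment_size: int) -> list:
    """Cut data into consecutive segments of at most segment_size bytes."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def process_segment(segment: bytes) -> str:
    """Stand-in for per-segment work (assumed; a real system would encode here)."""
    return hashlib.sha256(segment).hexdigest()


def process_parallel(data: bytes, segment_size: int, workers: int = 4) -> list:
    """Process all segments concurrently, returning results in segment order."""
    segments = split_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_segment, segments))
```

Threads are used here for brevity. For CPU-bound work written in pure Python, a process pool would be the more likely choice for exploiting multiple processors.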