GIT-CERCS-14-01
Yuzhe Tang, Arun Iyengar, Wei Tan, Liana Fong, Ling Liu
Write-Optimized Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on
big data (e.g., financial trading, social user-generated data
streams) has driven the proliferation of log-structured
key-value stores, represented by Google's BigTable, HBase
and Cassandra; these systems optimize write performance by
adopting a log-structured merge design. While providing key-
based access methods based on a Put/Get interface, these
key-value stores do not support value-based access methods,
which significantly limits their applicability in many web and
Internet applications, such as real-time search for all tweets
or blogs containing "government shutdown". In this paper,
we present HINDEX, a write-optimized indexing scheme
for log-structured key-value stores. To index intensively
updated big data in real time, index maintenance is made
lightweight by a design tailored to the unique characteristics
of the underlying log-structured key-value stores. Concretely,
HINDEX performs append-only index updates, which avoid
reading historical data versions, an expensive operation
in a log-structured store. To fix potentially obsolete
index entries, HINDEX introduces an offline index repair
process that is tightly coupled with routine compactions.
HINDEX's system design is generic to the Put/Get interface;
we implemented a prototype of HINDEX based on HBase
without internal code modification. Our experiments show
that HINDEX offers a significant performance advantage
for write-intensive index maintenance.
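
The two mechanisms described above, append-only index updates and offline
repair, can be illustrated with a minimal in-memory sketch. It is not the
HINDEX implementation: ToyStore, AppendOnlyIndex, lookup and repair are
hypothetical names, a Python dictionary stands in for a versioned Put/Get
store such as HBase, and the repair pass is simply called by hand where a
real system would run it alongside compaction.

```python
import itertools
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory stand-in for a versioned Put/Get store."""

    def __init__(self):
        # key -> list of (timestamp, value); old versions are kept, not overwritten
        self.versions = defaultdict(list)

    def put(self, key, value, ts):
        self.versions[key].append((ts, value))

    def get(self, key):
        """Return the newest (timestamp, value) for key, or None if absent."""
        history = self.versions.get(key)
        return max(history) if history else None


class AppendOnlyIndex:
    """Value-to-key index maintained by appends only (illustrative sketch)."""

    def __init__(self):
        self.base = ToyStore()
        self.index = defaultdict(set)    # value -> {(key, timestamp)}
        self.clock = itertools.count(1)  # assumed source of unique timestamps

    def put(self, key, value):
        ts = next(self.clock)
        self.base.put(key, value, ts)
        # Append the new index entry without reading the key's previous
        # version, so the entry for the old value is left behind as stale.
        self.index[value].add((key, ts))

    def lookup(self, value):
        """Keys indexed under value; may include stale keys before repair."""
        return sorted({k for k, _ in self.index.get(value, ())})

    def repair(self):
        """Offline pass: drop index entries that point at obsolete versions."""
        for key, history in list(self.base.versions.items()):
            *obsolete, latest = sorted(history)
            for ts, old_value in obsolete:
                self.index[old_value].discard((key, ts))
            self.base.versions[key] = [latest]  # compaction keeps the newest version


idx = AppendOnlyIndex()
idx.put("tweet1", "shutdown")
idx.put("tweet2", "shutdown")
idx.put("tweet1", "weather")     # overwrites tweet1; its old index entry goes stale
print(idx.lookup("shutdown"))    # ['tweet1', 'tweet2'] (tweet1 is stale)
idx.repair()
print(idx.lookup("shutdown"))    # ['tweet2']
```

In this sketch the write path never reads the base table, and all of the
cleanup cost is deferred to the repair pass, which already has every
version of a key in hand.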