GIT-CERCS-07-12
Sangeetha Seshadri, Brian F. Cooper, Ling Liu
CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network
Peer-to-peer (P2P) data sharing systems are emerging as a promising infrastructure for collaborative data sharing among multiple geographically distributed data centers within a large enterprise. This paper presents CubeCache, a peer-to-peer system for efficiently serving OLAP queries and data cube aggregations in a distributed data warehouse system. CubeCache combines multiple client caches into a single query processing and caching system. Compared to existing peer-to-peer systems, the CubeCache solution has a number of unique features. First, we add a query processing layer to perform in-network data aggregation over peer caches. Second, we introduce the concept of Query-Trails: a cache listing recent data requestors. Query-Trails make it easier to find caches that are likely to hold the data needed for a query. Third, we design a benefit measure that incorporates the 'rarity' of a chunk into the notion of benefit, allowing controlled replication of chunks in a system plagued by frequent node departures or failures. We report the results of an analysis and an experimental study, using simulations and an implemented prototype, which show that the CubeCache solution reduces server load, improves query throughput, and reduces query latency for OLAP tasks.
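As a rough illustration of the Query-Trail idea described above (a cache listing recent data requestors), the following Python sketch keeps a bounded, most-recent-first list of peers per chunk. The abstract does not specify the actual data structure, so the class name `QueryTrail`, the `capacity` parameter, and the oldest-first eviction rule are all assumptions made for illustration only.

```python
from collections import OrderedDict


class QueryTrail:
    """Hypothetical per-node record of which peers recently requested each chunk."""

    def __init__(self, capacity=2):
        # Assumed bound on how many requestors are remembered per chunk.
        self.capacity = capacity
        # chunk_id -> OrderedDict of peer_id, ordered oldest to newest.
        self.trails = {}

    def record(self, chunk_id, peer_id):
        """Note that peer_id just requested chunk_id."""
        peers = self.trails.setdefault(chunk_id, OrderedDict())
        peers.pop(peer_id, None)  # a repeat request moves the peer to the newest slot
        peers[peer_id] = True
        while len(peers) > self.capacity:
            peers.popitem(last=False)  # drop the oldest requestor

    def candidates(self, chunk_id):
        """Peers likely to cache chunk_id, most recent requestor first."""
        return list(self.trails.get(chunk_id, OrderedDict()))[::-1]


trail = QueryTrail(capacity=2)
for peer in ["A", "B", "A", "C"]:
    trail.record("chunk-1", peer)
recent = trail.candidates("chunk-1")
unknown = trail.candidates("chunk-2")
```

Under these assumptions, a node looking for `chunk-1` would try the peers in `recent` before falling back to the server; how CubeCache actually orders or expires trail entries is not stated in the abstract.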