GIT-CERCS-05-19
Bugra Gedik, Kun-Lung Wu, Philip S. Yu, Ling Liu,
GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time
Correlation-Aware CPU Load Shedding
Dropping tuples has been commonly used for load shedding. However,
tuple dropping generally is inadequate to shed load for multiway
windowed stream joins. The output rate can be unnecessarily
and severely degraded because tuple dropping does not recognize
time correlations likely to exist among the streams. This paper introduces
GrubJoin: an adaptive multi-way windowed stream join
that efficiently performs time correlation-aware CPU load shedding.
GrubJoin maximizes the output rate by achieving nearoptimal
window harvesting within an operator throttling framework,
i.e., regulating the fractions of the join windows that are
processed by the multi-way join. Window harvesting performs the
join using only certain more useful segments of the join windows.
Due mainly to the combinatorial explosion of possible multi-way
join sequences involving various segments of individual join windows,
GrubJoin faces a set of unique challenges, such as determining
the optimal window harvesting configuration and learning the
time correlations among the streams. To tackle these challenges,
we formalize window harvesting as an optimization problem, develop
greedy heuristics to determine near-optimal window harvesting
configurations and use approximation techniques to capture the
time correlations among the streams. Experimental results show
that GrubJoin is vastly superior to tuple dropping when time correlations
exist among the streams and is equally effective as tuple
dropping in the absence of time correlations.