GIT-CERCS-06-06
Sangeetha Seshadri, Vibhore Kumar, Brian Cooper,
Using hierarchies for Optimizing Distributed Stream Queries
We consider the problem of query optimization in distributed data stream systems where multiple continuous queries may be executing simultaneously. In order to achieve the best performance, query planning (such as join ordering) must be considered in conjunction with deployment planning (e.g., assigning operators to physical nodes). In our scenario, the large number of network nodes, query operators, and opportunities for operator sharing between queries means that brute force and traditional techniques are too expensive. We propose two algorithms - the Bottom-Up algorithm and the Top-Down algorithm, which utilize hierarchical network partitions to provide scalable query optimization. We present analysis that establishes the bounds on the search-space and sub-optimality achieved by our algorithms. Finally, through simulations and experiments using a prototype deployed on Emulab we demonstrate the effectiveness of our algorithms. The Top-Down algorithm, for instance, was able to achieve, on an average, solutions that were sub-optimal by only 10% while considering less than 1% of the search space.