GIT-CERCS-07-02
Sandip Agarwala, Fernando Alegre, Karsten Schwan, Jegannathan Mehalingham,
E2EProf: Automated End-to-End Performance Management for Enterprise Systems
Distributed systems are becoming increasingly complex due to the prevalent
use of web services, multi-tier architectures, and grid computing, in which
dynamic sets of components interact with each other across distributed and
heterogeneous computing infrastructures. For these applications to deliver
services to end users predictably and efficiently, it is therefore critical
to understand and control their runtime behavior. In a datacenter
environment, for instance, understanding the end-to-end dynamic behavior of
certain IT subsystems, from the time requests are made to when responses are
generated and finally received, is a key prerequisite for improving
application response times, providing required levels of performance, and
meeting service level agreements (SLAs).
The 'E2EProf' toolkit enables the efficient and non-intrusive capture and
analysis of end-to-end program behavior for complex enterprise applications.
E2EProf permits an enterprise to recognize and analyze performance problems
when they occur -- online, to take corrective actions as soon as possible and
wherever necessary along the paths currently taken by user requests --
end-to-end, and to do so without the need to instrument applications --
non-intrusively. Online analysis exploits a novel signal analysis algorithm,
termed 'pathmap', which dynamically detects the causal paths taken by
client requests through application and backend servers and annotates these
paths with end-to-end latencies and with the contributions to these latencies
from different path components. Thus, with pathmap, it is possible to
dynamically identify the bottlenecks present in selected servers or services
and to detect the abnormal or unusual performance behaviors indicative of
potential problems or overloads. Pathmap and the E2EProf toolkit successfully detect causal request paths and associated performance bottlenecks in the RUBiS ebay-like multi-tier
web application and in one of the datacenter of our industry partner, Delta Air
Lines.
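The abstract does not say how pathmap infers causal paths. One common signal-analysis approach, assumed here for illustration only, treats the message traffic on each link as a time series and cross-correlates the traffic entering a server with the traffic it sends downstream. A correlation peak at lag k suggests the two links are causally related and that the server adds roughly k time bins of delay. Summing per-hop delays then annotates the path with an end-to-end latency, and the hop contributing the most is a candidate bottleneck. This is a minimal sketch under that assumption, not E2EProf's implementation: `estimate_delay`, `annotate_path`, the 0.5 causality threshold, the 10 ms bin size, and the synthetic three-tier trace are all hypothetical.

```python
import random
from math import sqrt


def xcorr_at_lag(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping bins."""
    n = len(x) - lag
    if n < 2:
        return 0.0
    xs, ys = x[:n], y[lag:lag + n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no timing signal
    return cov / sqrt(vx * vy)


def estimate_delay(inbound, outbound, max_lag, threshold=0.5):
    """Return (lag_in_bins, score) for the strongest correlation peak,
    or None if no lag reaches the (hypothetical) causality threshold."""
    assert len(inbound) == len(outbound), "series must cover the same bins"
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = xcorr_at_lag(inbound, outbound, lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return (best_lag, best_score) if best_score >= threshold else None


def annotate_path(path, series, bin_ms, max_lag):
    """Annotate a candidate request path with per-node delays.

    path:   node names in request order, e.g. ["client", "web", "app", "db"]
    series: maps each edge (a, b) to per-bin message counts observed on a -> b
    Returns (delays_ms_by_node, end_to_end_ms, bottleneck_node), or None if
    some hop shows no causal link. Only the request direction is modeled.
    """
    if len(path) < 3:
        raise ValueError("need at least one intermediate node")
    delays = {}
    for i in range(1, len(path) - 1):
        inbound = series[(path[i - 1], path[i])]
        outbound = series[(path[i], path[i + 1])]
        est = estimate_delay(inbound, outbound, max_lag)
        if est is None:
            return None  # no causal link detected through this node
        delays[path[i]] = est[0] * bin_ms
    total = sum(delays.values())
    bottleneck = max(delays, key=delays.get)  # largest contribution to latency
    return delays, total, bottleneck


def shift(xs, k):
    """Delay a series by k bins, padding the front with zeros."""
    return [0] * k + xs[:len(xs) - k]


# Synthetic three-tier trace (hypothetical): web adds 3 bins, app adds 5 more.
rng = random.Random(7)
base = [rng.randint(0, 20) for _ in range(300)]
series = {
    ("client", "web"): base,
    ("web", "app"): shift(base, 3),
    ("app", "db"): shift(base, 8),
}
print(annotate_path(["client", "web", "app", "db"], series, bin_ms=10, max_lag=10))
# → ({'web': 30, 'app': 50}, 80, 'app')
```

Cross-correlation needs only the timing of messages observed between components, which fits the non-intrusive goal described above. A realistic analysis would also have to separate concurrent paths sharing a server, account for response traffic, and cope with noisy, non-stationary load, none of which this sketch attempts.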