GIT-CERCS-03-19
James F. Bowring, James M. Rehg, Mary Jean Harrold
Software Behavior: Automatic Classification and its Applications
A program's behavior is ultimately the collection of all
its executions. This collection is diverse, unpredictable, and generally unbounded, which makes it especially well suited to statistical
analysis and machine learning techniques. We explore the thesis that first- and second-order Markov models of
event transitions are effective predictors of program behavior. We present a technique that
models program executions as Markov models, and a clustering method for Markov models that
aggregates multiple program executions, yielding a statistical description of program behaviors.
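The modeling and clustering steps above can be illustrated with a minimal sketch. It assumes that an execution is recorded as a sequence of hashable event labels (for example, hypothetical branch IDs), that a model stores raw transition counts so that aggregating executions is a simple sum, and that clusters are formed by greedy agglomerative merging under a hypothetical L1 distance between transition probabilities. The names `markov_model`, `merge`, `distance`, and `cluster` are illustrative, not the authors' implementation or similarity metric.

```python
from collections import Counter
from typing import Hashable, Sequence

# A model maps (state, next_event) -> count. Keeping counts rather than
# probabilities makes aggregating two executions a plain sum.


def markov_model(trace: Sequence[Hashable], order: int = 1) -> Counter:
    """Count transitions in one execution trace.

    For order 1 a state is the previous event; for order 2 it is the
    tuple of the two previous events.
    """
    counts: Counter = Counter()
    for i in range(order, len(trace)):
        state = tuple(trace[i - order:i])
        counts[(state, trace[i])] += 1
    return counts


def probabilities(model: Counter) -> dict:
    """Normalize counts so each state's outgoing probabilities sum to 1."""
    totals: Counter = Counter()
    for (state, _), n in model.items():
        totals[state] += n
    return {(s, e): n / totals[s] for (s, e), n in model.items()}


def merge(a: Counter, b: Counter) -> Counter:
    """Aggregate two models, e.g. two executions placed in one cluster."""
    return a + b


def distance(a: Counter, b: Counter) -> float:
    """Hypothetical dissimilarity: L1 distance between transition probabilities."""
    pa, pb = probabilities(a), probabilities(b)
    return sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in set(pa) | set(pb))


def cluster(models: list, k: int) -> list:
    """Greedy agglomerative clustering: merge the closest pair of models
    until only k aggregated models remain."""
    clusters = list(models)
    while len(clusters) > k:
        pairs = [(distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, `cluster([markov_model("ABAB"), markov_model("ABABAB"), markov_model("CDCD")], k=2)` merges the two executions that alternate between `A` and `B`, because their transition probabilities are identical, and leaves the `CDCD` execution in its own cluster.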
With this approach, we can train classifiers to recognize specific behaviors emitted by an
execution without knowledge of inputs or outcomes. We evaluate the use of active
learning to refine our classifiers efficiently through three empirical studies of a scenario that illustrates
automated test plan augmentation. Finally, we present potential research questions and applications that our work
suggests.
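The classification and active-learning steps described above can be sketched under stated assumptions. In this sketch each behavior label (for example, a hypothetical "pass" or "fail") owns one aggregated first-order Markov model, and a new execution receives the label of the nearest model under a hypothetical L1 distance. Refinement uses margin-based uncertainty sampling, a common active-learning strategy substituted here for illustration: the oracle is asked to label the unlabeled execution whose two nearest behaviors are closest together. `BehaviorClassifier`, `active_learning`, and `oracle` are hypothetical names, not the authors' method.

```python
from collections import Counter
from typing import Hashable, Sequence


def markov_model(trace: Sequence[Hashable]) -> Counter:
    """First-order transition counts: (event, next_event) -> count."""
    return Counter(zip(trace, trace[1:]))


def probabilities(model: Counter) -> dict:
    """Normalize counts so each source event's outgoing probabilities sum to 1."""
    totals: Counter = Counter()
    for (src, _), n in model.items():
        totals[src] += n
    return {(s, d): n / totals[s] for (s, d), n in model.items()}


def distance(a: Counter, b: Counter) -> float:
    """Hypothetical dissimilarity: L1 distance between transition probabilities."""
    pa, pb = probabilities(a), probabilities(b)
    return sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in set(pa) | set(pb))


class BehaviorClassifier:
    """Hypothetical classifier: one aggregated model per behavior label;
    an execution is assigned the label of the nearest model."""

    def __init__(self) -> None:
        self.models: dict = {}

    def train(self, trace: Sequence[Hashable], label: str) -> None:
        # Aggregate the new execution into the model for its label.
        self.models[label] = self.models.get(label, Counter()) + markov_model(trace)

    def ranked(self, trace: Sequence[Hashable]) -> list:
        """(distance, label) pairs, nearest behavior first."""
        m = markov_model(trace)
        return sorted((distance(m, model), label) for label, model in self.models.items())

    def classify(self, trace: Sequence[Hashable]) -> str:
        return self.ranked(trace)[0][1]

    def margin(self, trace: Sequence[Hashable]) -> float:
        """Gap between the two nearest behaviors; a small gap means low confidence."""
        r = self.ranked(trace)
        return r[1][0] - r[0][0] if len(r) > 1 else float("inf")


def active_learning(clf: BehaviorClassifier, unlabeled: list, oracle, budget: int) -> None:
    """Uncertainty sampling: repeatedly query the oracle for the label of the
    least certain execution, then retrain the classifier on it."""
    pool = list(unlabeled)
    for _ in range(min(budget, len(pool))):
        query = min(pool, key=clf.margin)
        pool.remove(query)
        clf.train(query, oracle(query))
```

In a test-plan augmentation setting, `oracle` would stand for whatever process (such as a tester inspecting an execution) supplies a behavior label; spending the labeling budget on low-margin executions is what makes this refinement cheaper than labeling executions at random.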