CERCS Technical Reports

GIT-CERCS-03-19

Software Behavior: Automatic Classification and its Applications

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded, which makes it especially suited to statistical analysis and machine learning techniques. We explore the thesis that first- and second-order Markov models of event transitions are effective predictors of program behavior. We present a technique that models program executions as Markov models, together with a clustering method for Markov models that aggregates multiple program executions into a statistical description of program behaviors. With this approach, we can train classifiers to recognize specific behaviors exhibited by an execution without knowledge of its inputs or outcomes. We evaluate an application of active learning to the efficient refinement of our classifiers by conducting three empirical studies that explore a scenario illustrating automated test plan augmentation. Finally, we present a set of potential research questions and applications that our work suggests.
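To make the idea of modeling an execution as a Markov model concrete, the following minimal sketch estimates first-order transition probabilities from a single event trace. It is an illustration under stated assumptions, not the report's implementation: the function name `first_order_model`, the event labels "A", "B", and "C", and the nested-dictionary representation are all hypothetical, and the abstract does not specify which program events are profiled.

```python
from collections import Counter, defaultdict


def first_order_model(trace):
    """Estimate first-order transition probabilities from one event trace.

    `trace` is a sequence of hashable event labels (hypothetical here).
    Returns a dict mapping each source event to a dict of
    {next_event: estimated probability}.
    """
    # Count how often each event is immediately followed by each other event.
    counts = defaultdict(Counter)
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1

    # Normalize each row of counts so that it sums to 1.
    model = {}
    for src, row in counts.items():
        total = sum(row.values())
        model[src] = {dst: n / total for dst, n in row.items()}
    return model


# Hypothetical trace of events observed during one execution.
trace = ["A", "B", "A", "C", "A", "B"]
model = first_order_model(trace)
```

In this toy trace, event "A" is followed by "B" twice and by "C" once, so the estimated probability of the transition from "A" to "B" is 2/3. A second-order variant could be sketched the same way by keying each row on the pair of the two preceding events rather than on a single event.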