The dominance of digital services and products requires continuous monitoring of the performance and availability of Information Technology (IT) enterprise applications. Today, a poor user experience, such as slow server response times on websites (i.e., long waiting times), leads to lower conversion rates and thus diminishes business success. By capturing essential operational measures around the clock, Application Performance Management (APM) enables practitioners to reactively detect and resolve performance regressions or abnormal system behavior. To increase the business value of enterprise applications, prior research emphasizes a shift from reactive towards proactive or predictive APM. Potential advantages are more efficient IT resource planning and a broader analytical understanding of performance bottlenecks and anomalies.
This thesis therefore pursues this paradigm shift in APM through two objectives: (i) forecasting performance regressions and (ii) detecting abnormal system behavior in enterprise applications. The contribution is a set of models that leverage Machine Learning (ML) techniques for time series forecasting and anomaly detection.
To forecast performance regressions (i.e., high response times), we employ linear and nonlinear supervised ML algorithms on univariate and multivariate feature sets. The more accurate models are based on the nonlinear techniques, namely random forest regressors and Recurrent Neural Networks (RNNs). Moreover, the feature selection and engineering process reveals that multivariate feature sets (i.e., multiple APM measures) outperform univariate modeling, which considers only historical response times.
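Supervised forecasting of this kind requires recasting the monitoring time series as (feature, target) pairs. The abstract does not state the window length, forecast horizon, or which APM measures were used. The following is therefore only a minimal sketch under assumed settings. The function name `make_supervised`, the lag count, and the example series (response time and CPU utilization) are hypothetical. It shows how a univariate feature set (past response times only) differs from a multivariate one (past values of several measures) in the rows it produces.

```python
from typing import Dict, List, Sequence, Tuple


def make_supervised(
    series: Dict[str, Sequence[float]],
    target: str,
    lags: int,
    horizon: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: turn aligned APM time series into (X, y) pairs.

    Each row of X concatenates the last `lags` values of every series
    (in sorted name order); y is the target measure `horizon` steps ahead.
    """
    names = sorted(series)
    n = len(series[target])
    if any(len(series[name]) != n for name in names):
        raise ValueError("all series must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    # t is the first time step after the lag window [t - lags, t).
    for t in range(lags, n - horizon + 1):
        row: List[float] = []
        for name in names:
            row.extend(series[name][t - lags:t])
        X.append(row)
        y.append(series[target][t + horizon - 1])
    return X, y


# Illustrative (made-up) data: response time in ms and CPU utilization per minute.
response_time = [100.0, 120.0, 130.0, 180.0, 240.0]
cpu_util = [0.20, 0.30, 0.35, 0.60, 0.80]

# Univariate: only historical response times as features.
X_uni, y_uni = make_supervised({"response_time": response_time}, "response_time", lags=2)
# Multivariate: historical response times plus CPU utilization.
X_multi, y_multi = make_supervised(
    {"response_time": response_time, "cpu_util": cpu_util}, "response_time", lags=2
)
```

The targets are identical in both cases. Only the feature rows grow with each additional measure. Either `(X, y)` pair could then be passed to a regressor's `fit(X, y)`, for example scikit-learn's `RandomForestRegressor`. This is an assumption about tooling, not necessarily the setup used in the thesis.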
In this thesis, we introduce a novel approach for detecting abnormal system behavior based on density-based clustering. Our model detects abnormal system behavior more reliably than commonly used outlier detection techniques, which serve as baseline models. Additionally, the approach indicates the likely cause of the detected abnormal behavior and thereby facilitates root cause analysis.
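The abstract does not name the clustering algorithm, its parameters, or how the cause indication is derived. This is therefore only an illustrative sketch under stated assumptions. It uses DBSCAN as a representative density-based clustering method and treats points that DBSCAN labels as noise as abnormal. For the cause indication, it applies a hypothetical heuristic that reports the measure with the largest z-score relative to the non-anomalous points. The names `dbscan` and `explain`, the values of `eps` and `min_pts`, the feature names, and the data are all assumptions. The features are assumed to be already on comparable scales.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for points outside any dense region (treated as anomalies)


def _neighbors(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    # Indices within Euclidean distance eps of point i (including i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = _neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = _neighbors(points, j, eps)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return [label for label in labels if label is not None]


def explain(points, labels, feature_names) -> Dict[int, str]:
    """Hypothetical cause hint: per anomaly, the feature with the largest z-score."""
    normal = [p for p, label in zip(points, labels) if label != NOISE]
    if not normal:
        return {}
    dims = len(feature_names)
    means = [sum(p[d] for p in normal) / len(normal) for d in range(dims)]
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in normal) / len(normal)) or 1.0
        for d in range(dims)
    ]
    reasons = {}
    for i, (p, label) in enumerate(zip(points, labels)):
        if label == NOISE:
            z = [abs(p[d] - means[d]) / stds[d] for d in range(dims)]
            reasons[i] = feature_names[max(range(dims), key=z.__getitem__)]
    return reasons


# Made-up observations: (response time in seconds, CPU utilization).
points = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.21), (0.13, 0.20), (0.10, 0.23), (0.90, 0.25)]
labels = dbscan(points, eps=0.05, min_pts=3)
reasons = explain(points, labels, ["response_time_s", "cpu_util"])
```

Here the last observation falls outside the dense region. The heuristic attributes it to the response time rather than the CPU utilization. In practice, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN`, which also labels noise as `-1`, would replace the hand-written loop.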
We evaluate the implemented models on real-world monitoring data from two enterprise applications running in production. The data set was obtained from a German car manufacturer, and the scope of the analysis was defined in coordination with APM domain experts.