The aim of this thesis is to develop a forecasting tool for resource utilization metrics, such as CPU utilization, that supports capacity planners of Infrastructure-as-a-Service (IaaS) providers in their decisions. The whole process of measuring, collecting and storing performance monitoring data is described, as well as the statistical forecasting and the detection of threshold violations. The main focus of the thesis lies on the statistical modeling, forecasting and visualization of the data. The forecasting functionality of the system gives capacity planners a look into the future, and in combination with previous performance data, long-term trends can be visualized. Autoregressive models are used for forecasting.
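To illustrate the idea of autoregressive forecasting combined with threshold detection, the following is a minimal sketch, not the implementation developed in this thesis. It assumes an AR(p) model with intercept fitted by ordinary least squares using NumPy; the function names `fit_ar`, `forecast_ar` and `threshold_violations`, as well as the example values, are hypothetical.

```python
import numpy as np


def fit_ar(series, order):
    """Fit an AR(order) model with intercept by ordinary least squares.

    Returns the coefficients [intercept, phi_1, ..., phi_order], where phi_1
    belongs to the most recent lag. (Illustrative helper, not from the thesis.)
    """
    y = np.asarray(series, dtype=float)
    if len(y) <= order:
        raise ValueError("series must be longer than the model order")
    # Each row holds the `order` values preceding the target, most recent first.
    lagged = np.array([y[t - order:t][::-1] for t in range(order, len(y))])
    design = np.column_stack([np.ones(len(lagged)), lagged])
    coef, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return coef


def forecast_ar(series, coef, steps):
    """Forecast `steps` values ahead by feeding predictions back as inputs."""
    order = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-order:])
    predictions = []
    for _ in range(steps):
        lags = history[::-1][:order]  # most recent value first
        next_value = coef[0] + float(np.dot(coef[1:], lags))
        predictions.append(next_value)
        history.append(next_value)
    return np.array(predictions)


def threshold_violations(values, threshold):
    """Return the indices of all values that exceed the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


if __name__ == "__main__":
    # Hypothetical CPU utilization samples in percent.
    cpu = [52.0, 55.0, 59.0, 62.0, 66.0, 71.0, 74.0, 79.0, 83.0, 86.0]
    coef = fit_ar(cpu, order=2)
    forecast = forecast_ar(cpu, coef, steps=5)
    print("forecast:", np.round(forecast, 1))
    print("violations of the 90% threshold at steps:",
          threshold_violations(forecast, 90.0))
```

In this sketch, forecasts beyond the first step are computed from earlier forecasts, so errors can accumulate with a growing forecast horizon.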
The data model of Finanz Informatik Technologie Service GmbH (FI-TS) for storing performance monitoring data is evaluated and its maturity is shown. An introduction is given to the NoSQL database Cassandra, which FI-TS uses for storing monitoring data. Furthermore, a navigation concept based on mockups is described that simplifies navigation through the performance data. To allow users to drill down to the cause of threshold violations, an overview chart of the threshold violations is introduced.
The implementation comprises the development of a web dashboard that visualizes the monitoring data as well as the forecasts and threshold violations. Measuring, collecting and persisting the monitoring data are not part of the implementation because these parts already exist at FI-TS. Nevertheless, the system was implemented to a large extent independently of any specific IaaS provider. For evaluation purposes, the system was deployed to the infrastructure of FI-TS to gain insights into the system’s prediction accuracy. With a small dataset it is shown that a forecast accuracy, in terms of the mean percentage error (MPE), of 24.5% is reached; i.e., the predictions differed on average by 24.5% from the actually observed data.
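The error measure named above can be sketched as follows. This is a minimal illustration that assumes the common textbook definition of the MPE, i.e., the mean of the signed relative errors (actual minus predicted, divided by actual) in percent; the exact formula used in the evaluation is not restated here, and the function names and values are hypothetical. The absolute variant (MAPE) is included for comparison, because signed errors can cancel each other out.

```python
def mean_percentage_error(actual, predicted):
    """Mean of the signed relative errors in percent (assumed MPE definition)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed values must be non-zero, otherwise the ratio is undefined.
    errors = [(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mean_absolute_percentage_error(actual, predicted):
    """Mean of the absolute relative errors in percent (MAPE)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical observed and predicted CPU utilization in percent.
    observed = [40.0, 50.0, 80.0]
    forecast = [30.0, 55.0, 60.0]
    print("MPE :", mean_percentage_error(observed, forecast))
    print("MAPE:", mean_absolute_percentage_error(observed, forecast))
```

Under this definition, a positive MPE indicates that the predictions were on average below the observed values.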