Predicting Capacity Usage in Backup Storage Systems

Graduate Student Seminars
Sep 22, 2015
12:30 pm
Fine Hall 214

Protecting data from loss is crucial to businesses. Failure to protect data can lead to heavy financial and strategic losses that are often difficult to recover from. Businesses therefore employ backup techniques that store copies of data to enable recovery after a failure. Surprisingly, backups themselves often fail. An analysis of about 48,000 installations over a period of three years shows that one in six errors results from inadequate storage capacity, yet little research has been done on storage capacity forecasting, which could mitigate these errors. We propose a simulation model that predicts capacity usage by employing three techniques: autoregressive and moving-average modeling, clustering and stochastic modeling, and linear regression. Furthermore, our models provide a range of times when the capacity is likely to be reached rather than a single point estimate, which is more beneficial for capacity planning. We evaluate the accuracy of our model using synthetic data. Our results show that our models outperform the previous piecewise regression method proposed by Chamness when applied to nonlinear datasets, while performing comparably when applied to linear datasets.
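
To make the idea of a range of times concrete, here is a minimal Python sketch of the simplest of the three ingredients, linear regression. It is not the model presented in the talk. It assumes one usage sample per day, and it builds the range by shifting the fitted line up and down by k residual standard deviations, which is a crude illustrative band rather than a formal prediction interval. The names fit_line and days_until_full and the parameter k are hypothetical.

```python
import math


def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t, t = 0..n-1."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def days_until_full(ys, capacity, k=2.0):
    """Return (earliest, point, latest) day indices at which the fitted
    trend reaches `capacity`, or None if usage is not growing.

    Assumption: the band is the fitted line shifted by +/- k residual
    standard deviations; this is an illustration, not the talk's method.
    """
    intercept, slope = fit_line(ys)
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity

    n = len(ys)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / max(n - 2, 1))

    def crossing(offset):
        # Day at which (intercept + offset + slope * t) equals capacity.
        return (capacity - intercept - offset) / slope

    # Shifting the line up reaches capacity sooner; shifting down, later.
    return crossing(k * sigma), crossing(0.0), crossing(-k * sigma)


# Hypothetical daily usage samples (arbitrary units) and capacity.
usage = [100, 112, 119, 131, 140, 149, 162, 170, 181, 190]
forecast = days_until_full(usage, capacity=500)
```

Here forecast holds the earliest, central, and latest estimated day on which the trend reaches capacity. A planner would act on the earliest value rather than on the central one.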