Been thinking about the most accurate method for forecasting. More closely to my line of work, I’d like to forecast database growth over a specified period of time into the future?
The common method I have seen used is to simple use excel and plot a trend line from previously collected data, while this can be quite use-able in most scenarios, I don’t believe this captures everything such as seasonality. Seasonality here refers to the period data growth / purging that occurs. Most ignore this or are simply not aware but most databases will have some seasonal growth which needs to be added to the forecasting model.
The model used will be a 2 factor model of the form:
y = D + X + K
Where, y = the forecast database size, D = the seasonality variable, X = the stochastic variable with trend, and finally K is a constant of some sort.
Before we decide to identify seasonality and subsequently forecast, we need to gather some data (time series) In my case I’m going to be collecting hourly database size. Hourly simply because I want to retain a level of precision.
Lets use a simple Perl script (for portability, honestly anything that can remote exec a SQL query) which will connect to the database and execute some SQL.
See the code for data gathering in part 2.