Friday, March 20, 2015

QBETS: A Time Series Analysis and Forecasting Method

Today I’m going to share some details on an analytics technology I’ve been using for my research.
QBETS (Queue Bounds Estimation from Time Series) is a non-parametric time series analysis method. The basic idea behind QBETS is to analyze a time series, and predict the p-th percentile of it, where p is a user-specified parameter. QBETS learns from the existing data points in the input time series, and estimates a p-th percentile value such that the next data point in the time series has a 0.01p probability of being less than or equal to the estimated value.
For example, suppose we have the following input time series, and we wish to predict the 95th percentile of it:

A0, A1, A2, …., An

If QBETS predicts the value Q as the 95th percentile, we can say that An+1 (the next data point that will be added to the time series by the generating process), has a 95% chance of being less than or equal to Q.

P(An+1 <= Q) = 0.01p

Since QBETS cannot determine the percentile values exactly, but must estimate them, it uses another parameter c (0 < c < 1) as an upper confidence bound on the estimated values. That is, if QBETS was used to estimate the p-th percentile value of a time series with c upper confidence, it would have overestimated the p-th percentile with a probability of 1 – c. For instance, if c = 0.05, then QBETS will generate predictions that overestimate the true p-th percentile 95% of the time. We primarily use parameter c as a means of controlling how conservative we want QBETS to be, when predicting percentiles.
QBETS also supports a technique known as change point detection. To understand what this means, let’s look at the following input time series.

7, 8, 7, 7, 9, 8, 7, 7, 15, 15, 16, 14, 16, 17,15

Here we see a sudden shift in the values after the first 8 data points. The individual data point values have increased from the 7-9 range to 14-17 range. QBETS detects such change points in the time series, and takes action to discard the data points before the change point. This is necessary to make sure that the predictions are not influenced by old historical values that are no longer relevant in the time series generating process.
The paper that originally introduced QBETS, used it as a mechanism to predict the scheduling delays in batch queuing systems for supercomputers and other HPC systems. Over the years researchers have used QBETS with a wide range of datasets, and it has produced positive results in almost all the cases. Lately, I have been using QBETS as a means of predicting API response times, by analyzing historical API performance data. Again, the results have been quite promising.

To learn more about QBETS, go through the paper or contact the authors.