Today I’m going to share some details on an analytics
technology I’ve been using for my research.
QBETS (Queue Bounds Estimation from Time Series) is a
non-parametric time series analysis method. The basic idea behind QBETS is to
analyze a time series, and predict the p-th percentile of it, where p is a
user-specified parameter. QBETS learns from the existing data points in the
input time series, and estimates a p-th percentile value such that the next
data point in the time series has a 0.01p probability of being less than or
equal to the estimated value.
For example, suppose we have the following input time
series, and we wish to predict the 95th percentile of it:
A0, A1, A2, …., An
If QBETS predicts the value Q as the 95th
percentile, we can say that An+1 (the next data point that will be
added to the time series by the generating process), has a 95% chance of being
less than or equal to Q.
P(An+1 <= Q) = 0.01p
Since QBETS cannot determine the percentile values exactly,
but must estimate them, it uses another parameter c (0 < c < 1) as an
upper confidence bound on the estimated values. That is, if QBETS was used to
estimate the p-th percentile value of a time series with c upper confidence, it
would have overestimated the p-th percentile with a probability of 1 – c. For
instance, if c = 0.05, then QBETS will generate predictions that overestimate
the true p-th percentile 95% of the time. We primarily use parameter c as a
means of controlling how conservative we want QBETS to be, when predicting
percentiles.
QBETS also supports a technique known as change point
detection. To understand what this means, let’s look at the following input
time series.
7, 8, 7, 7, 9, 8, 7, 7, 15,
15, 16, 14, 16, 17,15
Here we see a sudden shift in the values after the first 8
data points. The individual data point values have increased from the 7-9 range
to 14-17 range. QBETS detects such change points in the time series, and takes
action to discard the data points before the change point. This is necessary to
make sure that the predictions are not influenced by old historical values that
are no longer relevant in the time series generating process.
The paper that originally introduced QBETS, used it as a
mechanism to predict the scheduling delays in batch queuing systems for
supercomputers and other HPC systems. Over the years researchers have used QBETS
with a wide range of datasets, and it has produced positive results in almost
all the cases. Lately, I have been using QBETS as a means of predicting API
response times, by analyzing historical API performance data. Again, the
results have been quite promising.
To learn more about QBETS, go through the paper or contact
the authors.