Search This Site

Search Google

Showing posts with label data. Show all posts
Showing posts with label data. Show all posts

Saturday, March 17, 2007

Applying the CUSUM to your spreadsheet data


The CUSUM is, simply put the cumulative sum of the differences of your data from its mean value.

i.e. CUSUM(t) = CUSUM(t-1) + data(t)-(mean value)


in spreadsheet for if your data is in range from A2..A32, the cusum could be implemented as follows in cells B2..B32: (Make sure cell B1 is empty equal to zero (0).)


B2=B1+A2-average($a$2..$a$32)

Then copy down to cells B3..B32.

Alternatively you could work out the average in a seperate cell and then absolute reference it.

eg. A1=average($a$2..$a$32)

then B2 would simplify to:

B2=B1+A2-$A$1

In the next post we'll cover the interpretation of the CUSUM.

In the graph shown an example of some noisy data (bottom series) and its corresponding CUSUM is presented. Click here to see the data spreadsheet.

Friday, March 16, 2007

Introducing the CUSUM chart

The Cumulative Sum chart (CUSUM) is a very useful (and powerful) tool to pick up small changes in trends of noisy data- and it works just as well if there are large trend changes. Despite what some might say, it is actually relatively easy to apply this technique in a spreadsheet to analyse your data, although one must understand how to interpret the graph.

The CUSUM is very useful to identify important dates or times (turning points) where a trend change occurred, especially where one finds that this is not obvious when looking at the time series data.
You can use it to guage the likelihood that other noisy variables could be causes if one can pick up similar trend changes in these variables. It finds applications in data analysis especially of process data and is a tool that can be used by Six Sigma practitioners.

In my next post I will elaborate on how to implement this useful technique in a spreadsheet such as Excel or OpenOffice, and also show an example, and more importantly how to interpret the graph, so stay tuned. For now, here are some related links.

Cusum Link 1 (NIST)
Cusum Link 2 (iSixSigma)

Monday, March 5, 2007

Exponential Filter/Smoothing


#1
The problem...
The problem is you've got noisy time series data. Ideally you would like to plot it on graph to show the average trend or compare it with another variable. BUT its too noisy and clutters the graph if you plot it with markers and lines. With just markers it appears as a scatter cloud?

Aha, then you see that Excel has the option to add a "trendline". But you can't seem to find one that does the data justice -or- its just plain wrong in places. The "moving average" seems the better one of the options, but to get good smoothing you have to average many periods. That has some bad effects: it delays when changes occurs, especially when comparing to another variable (plotted in another series).
Resist the temptation of moving average!!!

A solution ...
A nice easy way to sort out this data is to apply a simple exponential filter or smoothing. There are better, but this is simple and for 80% of the time will work great.

How does it work?

I will illustrate this in the spreadsheet, but the maths is simply like this:

New Smoothed value = (1-factor) * actual data + (factor)*(previous smoothed value)
or:
y_n=(1-k)*x_n + k*y_n-1
y = smoothed value
x = actual (noisy) data
k = smoothing factor, a real number between 0 and 1. 0= no smoothing

How to set this up in a spreadsheet:

1. Estimate a value for the starting smoothed average. I usually use the first raw actual data point or take an average of the first few points.
2. Setup the filter to the right of the actual data points.
3. Reference the smoothing constant "k" as cell using absolute $ reference eg. $b$1. This will allow you to adjust it easily.
4. Copy the filter down so that you have two columns- to the left your original data and to your right the smoothed values.
5. Play and adjust "k" to suit your needs. As a first guess use 0.5. I commonly use anywhere from 0.5 to 0.85. Be wary of values above this as you could be over smoothing, but it does depend on your application.
6. For graphing purposes I suggest keeping your original raw data series (but use markers only, i.e. scatter plot) and then plotting the smoothed trend as a second series (lines only).

http://www.4shared.com/file/11752632/8565b041/exponential_filter.html

 

Subscribe to feed

Subscribe in a reader