Science/Engineering/Technical forums

by **Doug Coulter** » Tue Jun 02, 2015 2:36 pm

Sometimes a simple, or even non-simple, nonlinear filter just won't get it. Your signal may have crazy-big outliers that you are sure are bogus (but do be sure!). If simply filtered, a huge outlier value can affect the output of the filter, and make it wrong, for many samples after its occurrence. There are plenty of cases where this is simply not acceptable - we need a reasonable estimate right now. Enter the median smoothing algorithm.

In a median smooth, we simply keep N previous samples of data, along with the current one - sort them (total number is always odd so there will be no doubt that one is the "middle value") and pick the middle value. It might still want some filtering - but you first got rid of the crazy wild error, which is good.

This situation can happen in lots of ways. Suppose you are looking at say, 16 bit samples, but in the range of interest, only the bottom few bits actually matter - yes, you could add gain to the a/d converter in many cases, but sometimes you cannot, because later on, you'll be using those higher bits (as the signal may have quite a large dynamic range - eventually, like say a thermocouple that reads room temperature, or some "safe" temperature when a seriously hot oven isn't fired up).
Variable-gain amplifiers accurate to 16 bits are neither common nor cheap (even a 16 bit d/a and opamp can drift, will add gain errors, etc).

You are stuck with what you've got, what to do? In one case that comes to mind we had a thermometer chip monitoring battery temperature as part of a charger for then-exotic chemistry batteries, and needed a really accurate number for the temperature rise to know how to taper the charge - but almost our entire signal was noise in the chip - no way to change that. As luck would have it, it was largely 1/f noise and primarily gaussian distributed, and this trick saved our bacon.

Usually the easiest way to keep the last N samples is to create a rolling buffer - an array where you wrap the array index around to 0 whenever it exceeds your desired size (if you do this, the newest sample automatically overwrites the oldest one). I can almost hear gasps about sorting such a thing when the sample rate gets high, and yes, most sorting algorithms are not well suited to this kind of thing, since N is almost always 5,7,9 - some small number (and try 5 first!).

I've seen this sort done in very short highly optimized assembler for a (seriously CISC) DSP cpu that sadly, almost no one uses anymore, but the tricks were slick - just a few if() statements did the job. In fact, if your N is 5, you can simply go through the array, using each sample as the "reference" and any sample that has two array members greater or equal (or less than or equal) to it is your target. That can be coded pretty simply - and it does not matter if more than one sample has this (think about it - if they do, you've still got one of the right answers!).

Often as not, since most noise IS gaussian, the middle sample will actually be more accurate, faster, than a similarly low-passed average of the last bunch of samples. This is because gaussian noise has it's peak at zero, and is symmetric around that, so it's actually pretty likely that the middle sample in such a series is the correct one.

Again, in any data filtering technique there is always the danger you'll find what you were looking for - even if it wasn't there. Know your problem, and make sure those "outliers" aren't actually part of some other valid distribution you might want to analyze just as badly!

Science/Engineering/Technical forums

DSP trick - median smoothing

DSP trick - median smoothing

Who is online