I am trying to detect text in a scanned document by examining variations in the lightness of the scan collapsed vertically. Here's a sample of the input I would receive, with the lightness plot of each vertical pixel strip superimposed:

Note: I've applied a Gaussian smoothing function to the data about 10 times, but the data is quite wiggly to begin with. It is easy to see that the left margin is *really* wiggly (i.e., has many extrema).

**Problem**: I want to generate a set of critical points of the image.

I've resorted to counting the extrema of the function within an interval (by looking for points where the derivative is close to zero) and dividing that count by the length of the interval, but that is awkward to compute. (I use Python, and I couldn't find many ready-made low-pass filters for this kind of data.)
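For reference, a low-pass filter along these lines is what I have in mind — a sketch using SciPy's Butterworth filter, where the array name `profile` and the cutoff frequency are placeholders:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Stand-in for a 1-D lightness profile: a slow oscillation plus noise
rng = np.random.default_rng(0)
profile = np.cos(np.linspace(0, 4 * np.pi, 500)) + 0.3 * rng.standard_normal(500)

# 4th-order Butterworth low-pass; the cutoff (here 0.05) is a fraction
# of the Nyquist frequency and would need tuning for real scans
b, a = butter(4, 0.05)

# filtfilt applies the filter forward and backward, so the smoothed
# curve has no phase lag relative to the input
smoothed = filtfilt(b, a, profile)

print(smoothed.shape)  # same length as the input
```

`filtfilt` keeps extrema aligned with the original signal, which matters if the goal is to locate critical points rather than just smooth for display.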

Thanks!


#### Best Answer

A moving standard deviation sounds like a reasonable thing to use. Here is a toy example in C — it is untested, so treat it as a sketch rather than production code, but you should get the general idea:

```c
#include <stdio.h>
#include <math.h>

#define NPixelColumns 1000 /* the number of pixel columns */
#define WindowSize 10      /* the size of the moving window for the standard deviation */

double BrightnessVals[NPixelColumns]; /* someplace to store your data initially */

int main(void)
{
    int startIndex; /* where the moving window starts */
    int lcv;        /* generic loop control variable */

    for (startIndex = 0; startIndex <= NPixelColumns - WindowSize; startIndex++) {
        int endIndex = startIndex + (WindowSize - 1);
        double sum = 0.0;  /* the sum of values in the window */
        double xbar;       /* the mean in the window */
        double SS = 0.0;   /* the sum of squared deviations from the mean */

        for (lcv = startIndex; lcv <= endIndex; lcv++)
            sum += BrightnessVals[lcv];
        xbar = sum / WindowSize;

        for (lcv = 0; lcv < WindowSize; lcv++)
            SS += pow(BrightnessVals[startIndex + lcv] - xbar, 2);

        /* sample standard deviation: sqrt(SS / (n - 1)) */
        printf("At step %d the moving SD is: %f\n",
               startIndex, sqrt(SS / (WindowSize - 1)));
    }
    return 0;
}
```

In R this kind of thing is a snap:

```r
sdwindow <- function(start, end, data) {
  sd(data[start:end])
}

nsamp <- 1000      # the number of samples to look over
windowsize <- 10   # the size of the window to get the SD of
x <- rnorm(nsamp)  # sample data

start <- 1:(nsamp - windowsize + 1)  # starting points for the window
end   <- windowsize:nsamp            # ending points for the window

# Save me the trouble of figuring out mapply for the nth time
doit <- Vectorize(sdwindow, vectorize.args = c("start", "end"))
doit(start, end, x)  # generate the result
```
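Since the question mentions Python, here is a sketch of the same moving standard deviation with NumPy; the window size and sample data are placeholders:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

windowsize = 10
x = np.random.default_rng(1).standard_normal(1000)  # stand-in for the lightness profile

# Each row is one window of `windowsize` consecutive samples
windows = sliding_window_view(x, windowsize)  # shape (991, 10)

# Sample standard deviation (ddof=1) of each window
moving_sd = windows.std(axis=1, ddof=1)

print(moving_sd.shape)  # (991,)
```

Flat, dark margins should give small moving-SD values, while text regions with alternating ink and paper should give large ones, so thresholding `moving_sd` is one way to pick out candidate text intervals.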
