Solved – Implementing Pettitt test in R

I'm trying to implement Pettitt test in R following papers like this pdf (pp. 5 & 6), or this pdf. But, I'm misunderstanding something, because having tested it with some data, I think that output is not correct.

Here is the code:

pettitt <- function(x, alpha=0.99) { # Pettitt AN. 1979 A non-parametric approach to the change point detection. # x is a vector # alpha, integer, level of significance x <- na.omit(x) o <- rank(x) s <- c() L <- length(x) for (i in 1:(L-1)) {       s <- c(s, 2*(colSums(as.matrix(o[1:i]))) - (i*(L+1)) ) } vc <- sqrt((-1) * log(alpha) * (L^3 + L^2)/6) output <- list(abs(s), vc) return(output) } 

Testing with larain and tempdub dataset from TSA package:

library(TSA) data(larain) data(tempdub) pettitt(larain) [[1]]   [1]  78 118 180  76  30  30 144  90 124 148 224 334 314 298 362 444 356 334  [19] 300 302 194 121  83  55  45  57  25  95 175 195 193 287 181 231 175 213  [37] 301 331 421 345 392 322 282 354 372 274 194 130 188 248 175  97  85 153  [55] 105 171 181 189 245 297 401 375 449 557 467 551 594 576 602 490 406 354  [73] 262 266 362 248 244 214 208 200 247 147  89  13   9  15  97   5   9  83  [91]   3  95 123  63  31  12  44   6  48  34  72 108 208 164 170 282 214 148 [109] 202 140 104   6 102  86  [[2]] [1] 50.69224  > max(pettitt(larain)[[1]]) [1] 602  pettitt(tempdub) [[1]]   [1]  83 161 226 235 164  60  80 169 220 219 188  74  57 177 266 281 228 147  [19]  19  82 125 140 102  41 100 197 235 254 233 141   1  97 144 153 112  26  [37]  73 206 255 258 235 137  28  49  98 101  46  29 149 252 281 274 247 160  [55]  43  70 115 126  79  22 157 248 317 328 287 224  96  27  86  79  27  82  [73] 225 348 407 406 351 256 125  10  58  77  32  61 200 314 381 386 353 216  [91] 124  40  35  70  35  36 173 302 365 386 321 242 131  10  51  38  19 146 [109] 241 319 342 359 330 223  89  45 113 144 111   2 123 228 280 275 250 177 [127]  34  50  89 102  59  22 131 248 334 359 302 198  73  46  83 100  73  [[2]] [1] 70.96777  > max(pettitt(tempdub)[[1]]) [1] 407 

I don't know if I lost something in pettitt test or there are error in my code.

After read carefully Pettitt paper, I can understand the method and check my code.

Pettitt starts with a general method expresed:

$$U_{t,T} = sum_{i=1}^{t} sum_{j=t+1}^{T} D_{ij}$$

where $D_{ij} = sign(X_i – X_j)$

Then Pettitt follows expressing a test for discrete Bernoulli and Binomial data and two variants for continuos data.

For discrete test, he use $U_{t,T} = T cdot (S_t – frac{tS_t}{T})$, where $S_t = sum_{1}^{T} X_j$, $S_T = sum_{1}^{T} X_j$ and $X_j$ is eval function expresed like Bernoulli serie where values are 0 or 1.

For continuos test, he use:

  1. $U_{t,T} = 2 cdot W_t – t(T+1)$, when $W_t=sum R_j$ and $R_j$ are the rank or the data. (There is a variation when ties exists.)
  2. $U_{t,T} = U_{t-1,T} + V_{t,T}$, with $t=2,dots,T$ and $V_{t,T} = sum_{j=1}^{T} sign(X_t – X_j)$.

Then, my original code are variant 1 for continuos test and results are correct, except for guessing critical values in Pettitt series results.

I calculate these critical values solving $K_T$ from $P_{OA} = 2e^{(frac{-6{K_{T}}^2}{T^3+T^2})}$, but although $P_{OA}$ is solved correctly, the critical value is not. Why? Actually, I don't know and I can't reach others papers where shows how to calculate table expresed in, for example, Sahin et al paper (link in previous comments).

So that, I change my function code in order to include all variants and $P_{OA}$ calcul. Here are:

<!-- language: lang-R --> pettitt<-function(x,alpha=0.01,method=c("discrete","continuos"),                   alternative=c("rank","variation")) {    # Pettitt AN. 1979 A non-parametric approach to the change point detection.    # (Sección 2.3)    #    # x is a numeric vector    # alpha is a integer    x<-na.omit(x)    orden<-rank(x)    T<-length(x)    method<-match.arg(method)    alternative<-match.arg(alternative)    #    U.t.T<-c()    V.i.T<-c()    P.o.a<-NULL    P.o.a.p<-NULL    P.o.a.n<-NULL    k.T<-NULL    k.T.p<-NULL    k.T.n<-NULL    #    if (!is.numeric(x))       stop("'x' must be a numeric vector)    #    # Discrete values    if (method == "discrete") {       x.j<-sign(x)       x.j[which(x.j==-1)]<-0       S.T<-sum(x.j)       for (i in 1:T) {         S.t<-sum(x.j[1:i])         U.t.T<-c(U.t.T,            T*(S.t-(i*S.T/T)))       }       k.T<-max(abs(U.t.T))       k.T.p<-max(U.t.T)       k.T.n<-min(U.t.T)       P.o.a<-exp((-2*k.T.p^2)/(S.T*(T^2-T*S.T)))       critical<-sqrt((log(alpha)*(S.T*(T^2-T*S.T)))/-2)    }    #    # Continuos value.    if (method == "continuos" & alternative == "rank") {       TIES<-length(unique(x)) < T       if (!TIES) {          for (i in 1:T) {            U.t.T<-c(U.t.T,               2*(colSums(as.matrix(orden[1:i])))                  -(i*(T+1))            )          }       } else {         frequency<-as.vector(table(x))         total.frequency<-sum(frequency)         for (i in 1:length(frequency)) {            U.t.T<-c(U.t.T,               1-(total.frequency*(frequency[i]^2-1))/(T*(T^2-1))            )         }       }       k.T<-max(abs(U.t.T))       P.o.a<-2*exp((-6*k.T^2)/(T^3+T^2))       critical<-sqrt((log(alpha/2)*(T^3+T^2))/-6)    }    if (method == "continuos" & alternativa == "variation") {       V.i.T<-matrix(rep(NA,T^2),ncol=T)       for (i in 1:T) {         for (j in 1:T) {            V.i.T[j,i]<-sign(x[i]-x[j])         }         if (i==1) {            U.t.T<-sum(V.i.T[,i],na.rm=T)         } else {            U.t.T<-c(U.t.T,               U.t.T[(i-1)]+sum(V.i.T[,i],na.rm=T)            )         }       }       V.i.T<-colSums(V.i.T,na.rm=T)       k.T.p<-max(U.t.T)       k.T.n<-min(U.t.T)       k.T<-max(abs(U.t.T))       P.o.a.p<-exp((-6*k.T.p^2)/(T^3+T^2))       P.o.a.n<-exp((-6*k.T.n^2)/(T^3+T^2))       P.o.a<-2*exp((-6*k.T^2)/(T^3+T^2))       critical<-sqrt((log(alpha/2)*(T^3+T^2))/-6)    }    output<-list(U.t.T,V.i.T,P.o.a,P.o.a.p,P.o.a.n,k.T,k.T.p,k.T.n,critical)    return(output) } 

If anyone knows how calcul critical values are welcome.

So many thanks.

Similar Posts:

Rate this post

Leave a Comment