Let's assume that we have a pattern $\alpha$ of bits of length $n$. I wish to know the probability of $\alpha$ appearing in a string of bits of length $m$ **at least twice** (where $m > n$), assuming that bit 1 and bit 0 are equiprobable (i.e., each bit is an independent fair coin flip).

For example, let's assume $n=4$, the sequence $\alpha$ is `0111`, and $m=10$. A valid occurrence would then be `0111010111`, where we have one $\alpha$ at the beginning and one at the end. Of course there are many more such strings, but how can I calculate this probability depending on $n$ and $m$?
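For small $m$, the probability can be checked by brute force, counting (possibly overlapping) occurrences over all $2^m$ strings. A sketch in Python, with a hypothetical function name; this is only feasible for small $m$:

```python
from itertools import product

def exact_prob(alpha, m):
    """Exact probability of at least two (possibly overlapping)
    occurrences of alpha in a uniform random m-bit string,
    by enumerating all 2^m strings."""
    n = len(alpha)
    hits = 0
    for bits in product("01", repeat=m):
        s = "".join(bits)
        # count occurrences at every start position, overlaps allowed
        count = sum(s[i:i + n] == alpha for i in range(m - n + 1))
        if count >= 2:
            hits += 1
    return hits / 2 ** m
```

For $\alpha=$ `0111` and $m=10$ this enumerates all 1024 strings, so it can serve as a ground truth for any approximation.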

I reckon this also depends on what $\alpha$ looks like: if, for example, $\alpha=$ `00010000`, then nothing prevents another occurrence of $\alpha$ from starting within the last 3 bits of the "old" $\alpha$, since the sequence has coinciding bits at its beginning and end.

**EDIT** (adding a comment as part of the question)

An upper-bound analysis for the error is also a valid answer. Intuition supported by simulations in R, Python, or Matlab to arrive at that bound is also valid. I am not looking at degenerate cases where $n$ is very small, say $\alpha=$ `01`; instead assume that $n$ is in the ballpark of 10-20 and $m$ is a few hundred (300-500).
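In that regime a Monte Carlo estimate is cheap to sketch. The helper names below are my own, and the result is a noisy estimate, not an exact answer:

```python
import random

def count_occurrences(s, alpha):
    # count (possibly overlapping) occurrences of alpha in s
    n = len(alpha)
    return sum(s[i:i + n] == alpha for i in range(len(s) - n + 1))

def estimate_prob(alpha, m, trials=100_000, seed=0):
    """Monte Carlo estimate of P(alpha appears at least twice
    in a uniform random m-bit string)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = "".join(rng.choice("01") for _ in range(m))
        if count_occurrences(s, alpha) >= 2:
            hits += 1
    return hits / trials
```

The standard error is roughly $\sqrt{p(1-p)/\text{trials}}$, so 100k trials pin the probability down to about three decimal places.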

**EDIT2** Changed "twice" to "at least twice" above, which was desired from the beginning.


#### Best Answer

As @NeilG stated in the comments, the desired probability can be computed exactly by defining a $(2n+1)$-state Markov chain and computing the probability of having seen 2 copies of $\alpha$ after $m$ iterations. The states in the Markov chain are of the form $(a, b)$, where $a$ is the number of times we have already seen $\alpha$ (0, 1, or 2), and $b$ is our progress toward seeing $\alpha$ at the current position. For the small example with pattern `0111`, the states are:

$$
\begin{align*}
(0,&~-) \\
(0,&~0) \\
(0,&~01) \\
(0,&~011) \\
(1,&~-) \\
(1,&~0) \\
(1,&~01) \\
(1,&~011) \\
(2,&~-)
\end{align*}
$$

The transition probabilities are defined quite naturally. For pattern `0111`, if we see a 0 from state $(0,~-)$ then we transition to $(0,~0)$, and otherwise we stay in $(0,~-)$. If we see a 1 from state $(0,~0)$ then we transition to $(0,~01)$, and otherwise we stay in $(0,~0)$, since the 0 we just saw can still serve as the first bit of the pattern. If we see a 1 from state $(0,~01)$ then we transition to $(0,~011)$, and otherwise we go back to state $(0,~0)$. Finally, if we see a 1 from state $(0,~011)$ then we transition to $(1,~-)$, and otherwise we go back to $(0,~0)$. State $(2,~-)$ is an absorbing state.

This cleanly handles overlapping patterns. If we were searching for pattern `00010000`, then upon seeing a 0 from $(0,~0001000)$ we would transition to $(1,~000)$, since the trailing `000` can already begin the next occurrence.
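These "go back" targets are always the longest suffix of the bits seen so far that is still a proper prefix of $\alpha$, exactly as in the KMP failure function. A small sketch (the function name is mine):

```python
def fallback(seen, alpha):
    """Longest proper prefix of alpha that is a suffix of `seen` --
    the progress state to fall back to after reading `seen`."""
    for k in range(min(len(seen), len(alpha) - 1), -1, -1):
        if seen.endswith(alpha[:k]):  # alpha[:0] == "" always matches
            return alpha[:k]
```

For example, after completing `00010000`, the next occurrence's progress is the fallback of the pattern with its first bit dropped, which is `000`.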

Computing the transition probabilities and iterating the Markov chain from the initial state $(0,~-)$ can be implemented without too much trouble in your favorite programming language (I'll use R here):

```r
library(expm)

best.match <- function(pattern, partial) {
  to.match <- sapply(seq_len(nchar(pattern) + 1),
                     function(s) substr(pattern, s, nchar(pattern)))
  matches <- match(to.match, partial)
  matches <- matches[!is.na(matches)]
  partial[matches[1]]
}

get.prob <- function(alpha, m) {
  n <- nchar(alpha)
  partial <- sapply(0:(n - 1), function(k) substr(alpha, 1, k))
  state.match <- rep(0:2, c(n, n, 1))
  state.pattern <- c(partial, partial, "")
  mat <- sapply(seq_along(state.match), function(i) {
    this.match <- state.match[i]
    this.pattern <- state.pattern[i]
    if (this.match == 2) {
      as.numeric(state.match == 2)  # Absorbing state
    } else {
      rowMeans(sapply(paste0(this.pattern, c("0", "1")), function(new.pattern) {
        if (new.pattern == alpha && this.match == 0) {
          as.numeric(state.match == 1 &
                     state.pattern == best.match(substr(new.pattern, 2, nchar(new.pattern)), partial))
        } else if (new.pattern == alpha && this.match == 1) {
          as.numeric(state.match == 2)
        } else {
          as.numeric(state.match == this.match &
                     state.pattern == best.match(new.pattern, partial))
        }
      }))
    }
  })
  tail((mat %^% m)[, 1], 1)
}
```

You can invoke the function by passing the pattern (as a string) and the number of iterations $m$:

```r
get.prob("0111", 10)
# [1] 0.0234375
get.prob("00010000", 200)
# [1] 0.177094
```

Though this is not a closed form solution, it does give the exact desired probabilities, so it can be used to evaluate the quality of any other bounds that are derived.
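For readers who prefer Python, the same $(2n+1)$-state chain can be sketched with NumPy. This is my own translation of the idea, not the author's code; the helper and state indexing are assumptions:

```python
import numpy as np

def lps(seen, alpha):
    # longest k < len(alpha) with alpha[:k] a suffix of `seen` (KMP-style fallback)
    for k in range(min(len(seen), len(alpha) - 1), -1, -1):
        if seen.endswith(alpha[:k]):
            return k
    return 0

def prob_at_least_twice(alpha, m):
    """P(alpha occurs >= 2 times in a uniform random m-bit string),
    via an absorbing Markov chain over (copies seen, match progress)."""
    n = len(alpha)
    N = 2 * n + 1                      # states (a, j), a in {0,1}, j in 0..n-1, plus absorbing
    idx = lambda a, j: a * n + j
    absorbing = 2 * n
    T = np.zeros((N, N))
    T[absorbing, absorbing] = 1.0
    for a in (0, 1):
        for j in range(n):
            for b in "01":             # each bit with probability 1/2
                s = alpha[:j] + b
                if s == alpha:
                    if a == 0:         # first copy done; keep overlap progress
                        T[idx(0, j), idx(1, lps(s[1:], alpha))] += 0.5
                    else:              # second copy done: absorb
                        T[idx(1, j), absorbing] += 0.5
                else:                  # fall back to longest matching prefix
                    T[idx(a, j), idx(a, lps(s, alpha))] += 0.5
    v = np.zeros(N)
    v[idx(0, 0)] = 1.0                 # start with zero copies, zero progress
    return (v @ np.linalg.matrix_power(T, m))[absorbing]
```

It reproduces the R outputs above, e.g. `prob_at_least_twice("0111", 10)` gives 0.0234375.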
