Coupling from the past

Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Unlike many MCMC algorithms, coupling from the past gives, in principle, a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.

The basic idea

Consider a finite-state, irreducible, aperiodic Markov chain $M$ with state space $S$ and (unique) stationary distribution $\pi$ ($\pi$ is a probability vector). Suppose that we come up with a probability distribution $\mu$ on the set of maps $f:S\to S$ with the property that for every fixed $s\in S$, its image $f(s)$ is distributed according to the transition probability of $M$ from state $s$. An example of such a probability distribution is the one where $f(s)$ is independent of $f(s')$ whenever $s\neq s'$, but it is often worthwhile to consider other distributions. Now let $f_j$ for $j\in\mathbb{Z}$ be independent samples from $\mu$.
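
For concreteness, here is a minimal sketch of one such distribution $\mu$, assuming the chain is given as a row-stochastic NumPy matrix P (the name sample_random_map and the encoding of a map as an integer array are illustrative choices, not part of the original text). Unlike the independent coupling just mentioned, this $\mu$ shares a single uniform across all states, which tends to make trajectories coalesce:

    import numpy as np

    def sample_random_map(P, rng):
        """Draw one random map f: S -> S from a distribution mu.

        A single uniform u is shared by every state: f(s) is the first
        state t at which the cumulative transition probability from s
        reaches u, so for each fixed s the image f(s) is distributed
        exactly according to row s of P, as required.
        """
        u = rng.random()                    # one uniform, shared by all states
        cdf = np.cumsum(P, axis=1)          # row-wise cumulative probabilities
        cdf[:, -1] = 1.0                    # guard against round-off in the last column
        return np.argmax(cdf >= u, axis=1)  # f[s] = first index where cdf[s] >= u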

Suppose that $x$ is chosen randomly according to $\pi$ and is independent from the sequence $f_j$. (We do not worry for now about where this $x$ comes from.) Then $f_{-1}(x)$ is also distributed according to $\pi$, because $\pi$ is $M$-stationary and, by our assumption on the law of $f_{-1}$, applying $f_{-1}$ to $x$ amounts to taking one step of the chain from a $\pi$-distributed state. Define

$$F_j := f_{-1}\circ f_{-2}\circ \cdots \circ f_{-j}.$$
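
To make the induction used below explicit: the definition composes each newly sampled map on the inside, so that

$$F_{j+1}(x) = F_j\bigl(f_{-(j+1)}(x)\bigr),$$

where $f_{-(j+1)}(x)$ is again distributed according to $\pi$ (it is one step of the chain from a $\pi$-distributed state) and is independent of $f_{-1},\dots,f_{-j}$; the inductive hypothesis applied to this point gives that $F_{j+1}(x)$ is distributed according to $\pi$.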

Then it follows by induction that $F_j(x)$ is also distributed according to $\pi$ for every $j\in\mathbb{N}$. However, it may happen that for some $n\in\mathbb{N}$ the image of the map $F_n$ is a single element of $S$; in other words, $F_n(x) = F_n(y)$ for every $y\in S$. Therefore, we do not need access to $x$ in order to compute $F_n(x)$. The algorithm then consists of finding some $n\in\mathbb{N}$ such that $F_n(S)$ is a singleton, and outputting the element of that singleton. Designing a good distribution $\mu$ for which the tasks of finding such an $n$ and computing $F_n$ are not too costly is not always obvious, but this has been accomplished successfully in several important instances.[1] A generic version of the procedure is sketched below.
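
The following hedged sketch implements this generic procedure for a finite chain, reusing sample_random_map from the sketch above (the function name cftp is an illustrative choice). It extends the list of sampled maps without ever resampling them, applies $f_{-n},\dots,f_{-1}$ in that order to every state at once, and doubles $n$ until the image collapses:

    import numpy as np

    def cftp(P, rng):
        """Return one exact sample from the stationary distribution of P.

        maps[j-1] holds the map f_{-j}; it is sampled once and reused
        on every subsequent pass, exactly as the algorithm requires.
        """
        n_states = P.shape[0]
        maps = []
        n = 1
        while True:
            while len(maps) < n:             # extend into the past; never resample
                maps.append(sample_random_map(P, rng))
            states = np.arange(n_states)     # start every state of S at time -n
            for j in range(n, 0, -1):        # apply f_{-n} first, f_{-1} last
                states = maps[j - 1][states]
            if np.all(states == states[0]):  # F_n(S) is a singleton
                return int(states[0])
            n *= 2                           # look further back and try again

For example, with P = np.array([[0.5, 0.5], [0.2, 0.8]]) and rng = np.random.default_rng(0), cftp(P, rng) returns 0 or 1 with the exact stationary probabilities 2/7 and 5/7.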

The monotone case

There is a special class of Markov chains in which there are particularly good choices for $\mu$ and a tool for determining whether $|F_n(S)| = 1$. (Here $|\cdot|$ denotes cardinality.) Suppose that $S$ is a partially ordered set with order $\leq$ which has a unique minimal element $s_0$ and a unique maximal element $s_1$; that is, every $s\in S$ satisfies $s_0\leq s\leq s_1$. Also, suppose that $\mu$ may be chosen to be supported on the set of monotone maps $f:S\to S$. Then it is easy to see that $|F_n(S)| = 1$ if and only if $F_n(s_0) = F_n(s_1)$, since $F_n$ is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing $n := n_0$ for some constant $n_0$, sampling the maps $f_{-1},\dots,f_{-n}$, and outputting $F_n(s_0)$ if $F_n(s_0) = F_n(s_1)$. If $F_n(s_0)\neq F_n(s_1)$, the algorithm proceeds by doubling $n$ and repeating as necessary until an output is obtained. (However, the algorithm does not resample the maps $f_{-j}$ that were already sampled; it reuses the previously sampled maps when needed.)
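
As a concrete illustration, the following hedged sketch runs the monotone version for a biased walk on the totally ordered state space $\{0,1,\dots,k-1\}$, in which each update moves up with probability p and down otherwise, holding at the endpoints; this particular chain and the name monotone_cftp are assumptions of the example, not part of the original text. For a fixed driving uniform the update is a monotone map, so only the trajectories started from $s_0 = 0$ and $s_1 = k-1$ need to be tracked:

    import numpy as np

    def monotone_cftp(k, p, rng):
        """Monotone CFTP for a biased walk on the chain 0 < 1 < ... < k-1."""
        def step(s, u):
            # Move up if u < p, down otherwise, holding at the endpoints;
            # for a fixed u this map is monotone in s.
            return min(s + 1, k - 1) if u < p else max(s - 1, 0)

        us = []                        # us[j-1] drives f_{-j}; sampled once, then reused
        n = 1
        while True:
            while len(us) < n:
                us.append(rng.random())
            lo, hi = 0, k - 1          # the minimal and maximal elements s0, s1
            for j in range(n, 0, -1):  # apply f_{-n} first, f_{-1} last
                lo, hi = step(lo, us[j - 1]), step(hi, us[j - 1])
            if lo == hi:               # F_n(s0) = F_n(s1), hence |F_n(S)| = 1
                return lo
            n *= 2                     # double n, reusing the old uniforms

For instance, monotone_cftp(10, 0.6, np.random.default_rng(1)) returns an exact draw from the stationary distribution of this walk, which by detailed balance is proportional to (p/(1-p))^s.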

References

  1. "Web Site for Perfectly Random Sampling with Markov Chains".
  • Propp, James Gary; Wilson, David Bruce (1996), "Exact sampling with coupled Markov chains and applications to statistical mechanics", Proceedings of the Seventh International Conference on Random Structures and Algorithms (Atlanta, GA, 1995), Random Structures & Algorithms 9 (1–2), pp. 223–252, MR 1611693
  • Propp, James; Wilson, David (1998), "Coupling from the past: a user's guide", Microsurveys in discrete probability (Princeton, NJ, 1997), DIMACS Ser. Discrete Math. Theoret. Comput. Sci., vol. 41, Providence, R.I.: American Mathematical Society, pp. 181–192, doi:10.1090/dimacs/041/09, ISBN 9780821808276, MR 1630414, S2CID 2781385