Pseudolikelihood

From HandWiki

In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. In practice it provides an approximation to the likelihood function of a set of observed data, which may either yield a computationally simpler estimation problem or make it possible to obtain explicit estimates of model parameters.

The pseudolikelihood approach was introduced by Julian Besag[1] in the context of analysing data having spatial dependence.

Definition

Given a set of random variables $X = X_1, X_2, \ldots, X_n$, the pseudolikelihood of $X = x = (x_1, x_2, \ldots, x_n)$ is

$$L(\theta) := \prod_i \Pr_\theta(X_i = x_i \mid X_j = x_j \text{ for } j \neq i) = \prod_i \Pr_\theta(X_i = x_i \mid X_{-i} = x_{-i})$$

in the discrete case and

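$$L(\theta) := \prod_i p_\theta(x_i \mid x_j \text{ for } j \neq i) = \prod_i p_\theta(x_i \mid x_{-i}) = \prod_i p_\theta(x_i \mid x_1, \ldots, \hat{x}_i, \ldots, x_n)$$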

in the continuous case. Here $X$ is a vector of variables, $x$ is a vector of values, $p_\theta(\cdot \mid \cdot)$ is the conditional density and $\theta = (\theta_1, \ldots, \theta_p)$ is the vector of parameters to be estimated. The expression $X = x$ above means that each variable $X_i$ in the vector $X$ takes the corresponding value $x_i$ in the vector $x$, and $x_{-i} = (x_1, \ldots, \hat{x}_i, \ldots, x_n)$ means that the coordinate $x_i$ has been omitted. The expression $\Pr_\theta(X = x)$ is the probability that the vector of variables $X$ takes the values in the vector $x$. This probability depends on the unknown parameter $\theta$. Because situations can often be described using state variables ranging over a set of possible values, $\Pr_\theta(X = x)$ can represent the probability of a certain state among all possible states allowed by the state variables.

The pseudo-log-likelihood is a similar measure derived from the above expression, namely (in the discrete case)

$$l(\theta) := \log L(\theta) = \sum_i \log \Pr_\theta(X_i = x_i \mid X_j = x_j \text{ for } j \neq i).$$
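As a concrete illustration, consider a pairwise binary model (an Ising-type model, chosen here as an assumption; the definition above does not fix any particular model) with $x_i \in \{-1, +1\}$ and joint probability proportional to $\exp(\theta \sum_{i<j} J_{ij} x_i x_j)$. Each full conditional then has the closed form $\Pr_\theta(X_i = x_i \mid X_{-i} = x_{-i}) = \exp(\theta x_i h_i) / (2 \cosh(\theta h_i))$ with $h_i = \sum_{j \neq i} J_{ij} x_j$, so the pseudo-log-likelihood is a sum of $n$ inexpensive terms. The function names and the coupling matrix `J` in this sketch are hypothetical.

```python
import math

def local_field(J, x, i):
    """h_i = sum over j != i of J[i][j] * x[j]."""
    return sum(J[i][j] * x[j] for j in range(len(x)) if j != i)

def log_pseudolikelihood(theta, J, x):
    """Sum over i of log Pr_theta(X_i = x_i | X_j = x_j for j != i)."""
    total = 0.0
    for i in range(len(x)):
        h = local_field(J, x, i)
        # log of exp(theta * x_i * h) / (2 * cosh(theta * h))
        total += theta * x[i] * h - math.log(2.0 * math.cosh(theta * h))
    return total

# Hypothetical example: three variables on a chain 0 - 1 - 2.
J = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
x = [1, 1, -1]
print(log_pseudolikelihood(0.5, J, x))
```

At $\theta = 0$ every conditional equals $1/2$, so the function returns $-n \log 2$, which is a quick sanity check on any implementation.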

One use of the pseudolikelihood is as an approximation for inference about a Markov or Bayesian network: the pseudolikelihood of an assignment to $X_i$ can often be computed more efficiently than the likelihood, particularly when the latter requires marginalization over a large number of variables.
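The saving can be seen in a small sketch, assuming a hypothetical pairwise binary Markov network with $\Pr_\theta(X = x) \propto \exp(\theta \sum_{i<j} J_{ij} x_i x_j)$ and $x_i \in \{-1, +1\}$. The exact likelihood needs a normalising constant summed over all $2^n$ states, whereas a single conditional only compares the observed state with the state in which $x_i$ is flipped, so the constant cancels. All names below are illustrative.

```python
import math
from itertools import product

def log_weight(theta, J, x):
    """Unnormalised log-probability: theta * sum_{i<j} J[i][j] * x_i * x_j."""
    n = len(x)
    return theta * sum(J[i][j] * x[i] * x[j]
                       for i in range(n) for j in range(i + 1, n))

def log_likelihood(theta, J, x):
    """Exact log-likelihood; the normaliser sums over all 2**n states."""
    states = product((-1, 1), repeat=len(x))
    log_z = math.log(sum(math.exp(log_weight(theta, J, s)) for s in states))
    return log_weight(theta, J, x) - log_z

def log_conditional(theta, J, x, i):
    """log Pr(X_i = x_i | all other variables); no normaliser is needed."""
    flipped = list(x)
    flipped[i] = -x[i]
    # For brevity the whole weight is recomputed; only terms involving i differ.
    a = log_weight(theta, J, x)
    b = log_weight(theta, J, flipped)
    return a - math.log(math.exp(a) + math.exp(b))

def log_pseudolikelihood(theta, J, x):
    """Sum of the n full-conditional log-probabilities."""
    return sum(log_conditional(theta, J, x, i) for i in range(len(x)))

# Hypothetical example: four variables on a cycle 0 - 1 - 2 - 3 - 0.
J = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
x = (1, 1, -1, 1)
print(log_likelihood(0.5, J, x), log_pseudolikelihood(0.5, J, x))
```

The two printed values generally differ: the pseudolikelihood is a surrogate objective, not a numerical approximation that matches the likelihood value state by state.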

Properties

Use of the pseudolikelihood in place of the true likelihood function in a maximum likelihood analysis can lead to good estimates, but a straightforward application of the usual likelihood techniques to derive information about estimation uncertainty, or for significance testing, would in general be incorrect.[2]
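A minimal sketch of maximum pseudolikelihood estimation follows, under assumptions that are not part of the source: a hypothetical pairwise binary model with $\Pr_\theta(X = x) \propto \exp(\theta \sum_{i<j} J_{ij} x_i x_j)$ on four variables, independent replicates drawn exactly by enumerating all states, and a plain grid search over $\theta$. It produces only a point estimate; in line with the caveat above, the curvature of the pseudo-log-likelihood at the maximum should not be read as a standard error.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical coupling matrix: four variables on a cycle.
J = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

def log_weight(theta, x):
    """Unnormalised log-probability of state x."""
    n = len(x)
    return theta * sum(J[i][j] * x[i] * x[j]
                       for i in range(n) for j in range(i + 1, n))

def sample(theta, size, rng):
    """Draw exact samples by enumerating all 2**n states (small n only)."""
    states = list(product((-1, 1), repeat=len(J)))
    weights = [math.exp(log_weight(theta, s)) for s in states]
    return rng.choices(states, weights=weights, k=size)

def log_pseudolikelihood(theta, x):
    """Pseudo-log-likelihood of one observation x."""
    total = 0.0
    for i in range(len(x)):
        h = sum(J[i][j] * x[j] for j in range(len(x)) if j != i)
        total += theta * x[i] * h - math.log(2.0 * math.cosh(theta * h))
    return total

def total_log_pseudolikelihood(theta, counts):
    """Sum over replicates, grouped by distinct state for speed."""
    return sum(c * log_pseudolikelihood(theta, x) for x, c in counts.items())

rng = random.Random(0)                 # fixed seed for reproducibility
true_theta = 0.5                       # assumed value used to simulate data
counts = Counter(sample(true_theta, 2000, rng))

grid = [k / 100 for k in range(0, 151)]  # candidate values 0.00 .. 1.50
theta_hat = max(grid, key=lambda t: total_log_pseudolikelihood(t, counts))
print(theta_hat)
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional optimiser could replace it.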

References

  1. Besag, J. (1975), "Statistical Analysis of Non-Lattice Data", The Statistician 24 (3): 179–195, doi:10.2307/2987782 
  2. Dodge, Y. (2003), The Oxford Dictionary of Statistical Terms, Oxford University Press, ISBN 0-19-920613-9