Stochastic Complexity, Histograms and Hypothesis Testing of Homogeneity

Guoqi Qian


Information contained in a sample of quantitative data may be summarized or described by a nonparametric histogram density function. An interesting question is how to construct such a histogram density so that it expresses the information in the data with minimum stochastic complexity. Stochastic complexity is another name for Rissanen's minimum description length (MDL): the length of the decipherable binary code sequence that results from optimally encoding the data with a code book based on a probability distribution. Here we derive an optimal generalized histogram density estimator that provides both predictive and non-predictive coding descriptions of a data sample. We also obtain uniform and almost sure asymptotic approximations for the lengths of both descriptions. As an application of this result to statistical inference, a new procedure for testing the hypothesis of distribution homogeneity is proposed and is proved to have asymptotic power 1.
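The idea of choosing a histogram by its description length can be illustrated with a minimal sketch. It is not the generalized estimator or the asymptotic approximations derived in the paper. It assumes an equal-width histogram on the sample range and a standard two-part approximation to the code length: the negative maximized log-likelihood plus (m - 1)/2 · log n for the m - 1 free bin probabilities. The function names `histogram_code_length` and `best_bin_count` are hypothetical. Because the data are treated as continuous, the code length is relative to an unspecified quantization precision and may be negative; only differences between candidate histograms are meaningful.

```python
import math


def histogram_code_length(data, m):
    """Approximate two-part code length (in bits) of `data` under an
    m-bin equal-width histogram density on [min(data), max(data)].

    Assumption: penalty (m - 1)/2 * log(n) for the free bin probabilities,
    a common MDL/BIC-style approximation, not the paper's exact result.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / m  # assumes hi > lo
    counts = [0] * m
    for x in data:
        # Clamp so that the maximum value falls in the last bin.
        j = min(int((x - lo) / width), m - 1)
        counts[j] += 1
    # Negative log-likelihood at the fitted bin heights c / (n * width).
    nll = -sum(c * math.log(c / (n * width)) for c in counts if c > 0)
    penalty = 0.5 * (m - 1) * math.log(n)
    return (nll + penalty) / math.log(2)  # convert nats to bits


def best_bin_count(data, max_bins=50):
    """Number of bins in 1..max_bins with the shortest approximate code length."""
    return min(range(1, max_bins + 1),
               key=lambda m: histogram_code_length(data, m))


if __name__ == "__main__":
    # Evenly spread sample: one bin already describes it well.
    flat = [i / 999 for i in range(1000)]
    # Two tight clusters near 0 and 1: more bins shorten the description.
    clustered = ([i / 4990 for i in range(500)]
                 + [0.9 + i / 4990 for i in range(500)])
    print(best_bin_count(flat), best_bin_count(clustered))
```

In this sketch a finer histogram is selected only when the gain in likelihood outweighs the cost of encoding the extra bin probabilities, which is the trade-off that a minimum-complexity histogram formalizes.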


Keywords: Histogram density estimation; Minimum description length; Model selection; Quantization; Stochastic complexity; Test of homogeneity.
