Estimation and Selection in Regression Clustering

Guoqi Qian, Yuehua Wu


Regression clustering is an important model-based clustering tool having applications in a variety of disciplines. It discovers and reconstructs the hidden structure for a data set which is a random sample from a population comprising a fixed, but unknown, number of sub-populations, each of which is characterized by a class-specific regression hyperplane. An essential objective, as well as a preliminary step, in most clustering techniques including regression clustering, is to determine the underlying number of clusters in the

data. In this paper, we briefly review regression clustering methods and discuss how to determine the underlying number of clusters by using model selection techniques, in particular, the information-based technique. A computing algorithm is developed for estimating the number of clusters and other parameters in regression clustering. Simulation studies are also provided to show the performance of the algorithm.


Regression clustering, Least squares, Model Selection

Full Text: