Misspecified Multivariate Regression Models Using the Genetic Algorithm and Information Complexity as the Fitness Function

Hamparsum Bozdogan; J. Andrew Howe

Authors

Hamparsum Bozdogan (UTK)
J. Andrew Howe

Keywords:

Misspecified multivariate regression models, Information complexity, Robust estimation, Genetic algorithm, Subset selection, Dimension reduction

Abstract

Model misspecification is a major challenge faced by all statistical modeling techniques. Real-world multivariate data in high dimensions frequently exhibit higher kurtosis and heavier tails, asymmetry, or both. In this paper, we extend Akaike's AIC-type model selection criteria in two ways. We use a more encompassing notion of information complexity (ICOMP) of Bozdogan for multivariate regression to allow certain types of model misspecification to be detected using the newly proposed criterion, so as to protect researchers against model misspecification. We do this by employing the "sandwich" or "robust" covariance matrix $\hat{F}^{-1}\hat{R}\hat{F}^{-1}$, which is computed with the sample kurtosis and skewness. Thus, even if the data modeled do not meet the standard Gaussian assumptions, an appropriate model can still be found. Theoretical results are then applied to multivariate regression models in subset selection of the best predictors in the presence of model misspecification by using the novel genetic algorithm (GA), with our extended ICOMP as the fitness function. We demonstrate the power of the confluence of these techniques on both simulated and real-world datasets. Our simulations are very challenging, combining multicollinearity, unnecessary variables, and redundant variables with asymmetrical or leptokurtic behavior. We also demonstrate our model selection prowess on the well-known body fat data. Our findings suggest that when data are overly peaked or skewed (both characteristics often seen in real data), ICOMP based on the sandwich covariance matrix should be used to drive model selection.
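
The subset-selection idea in the abstract (a GA searching over predictor subsets, scored by an information criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `ga_subset_search`, its parameters, and the stand-in `toy_criterion` are hypothetical names introduced here, and the toy score merely takes the place of the paper's extended ICOMP, which is not reproduced. The sketch assumes a criterion that is minimized and uses binary tournament selection, uniform crossover, bit-flip mutation, and elitism.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = predictor included, 0 = excluded


def ga_subset_search(
    n_predictors: int,
    criterion: Callable[[Mask], float],  # lower is better (assumption)
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Search over inclusion masks; return best mask, its score, and score history."""
    rng = random.Random(seed)
    # Random initial population of inclusion masks.
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_predictors))
        for _ in range(pop_size)
    ]
    history: List[float] = []
    for _ in range(generations):
        elite = min(population, key=criterion)
        history.append(criterion(elite))
        next_pop = [elite]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=criterion)
            p2 = min(rng.sample(population, 2), key=criterion)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(tuple(child))
        population = next_pop
    best = min(population, key=criterion)
    history.append(criterion(best))
    return best, criterion(best), history


# Hypothetical stand-in for an information criterion: the number of
# disagreements with a known "true" subset. A real fitness function would
# fit the regression on the selected predictors and return the criterion value.
TRUE_MASK: Mask = (1, 0, 1, 1, 0, 0, 0, 0)


def toy_criterion(mask: Mask) -> float:
    return float(sum(m != t for m, t in zip(mask, TRUE_MASK)))


best_mask, best_score, history = ga_subset_search(len(TRUE_MASK), toy_criterion)
print(best_mask, best_score)
```

Because the elite mask is carried forward each generation, the recorded best score can never get worse from one generation to the next. Replacing `toy_criterion` with a function that fits the model on the selected columns and returns a criterion value is the only change a real application would need under these assumptions.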

Issue

Vol. 5 No. 2: (April 2012)

Section

Mathematical Statistics

License

Upon acceptance of an article by the European Journal of Pure and Applied Mathematics, the author(s) retain the copyright to the article. However, by submitting your work, you agree that the article will be published under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license allows others to copy, distribute, and adapt your work, provided proper attribution is given to the original author(s) and source. However, the work cannot be used for commercial purposes.

By agreeing to this statement, you acknowledge that:

You retain full copyright over your work.
The European Journal of Pure and Applied Mathematics will publish your work under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
This license allows others to use and share your work for non-commercial purposes, provided they give appropriate credit to the original author(s) and source.

How to Cite

Bozdogan, H., & Howe, J. A. (2012). Misspecified Multivariate Regression Models Using the Genetic Algorithm and Information Complexity as the Fitness Function. European Journal of Pure and Applied Mathematics, 5(2), 211-249. https://www.ejpam.com/index.php/ejpam/article/view/1597
