Correlated Component PLS-type Regression with Variable Selection Features for Large Datasets
The analysis of landscape matrices, i.e. matrices having more columns (variables) than rows (observations), is a challenging task in several domains. Two different kinds of problems arise when dealing with landscape matrices. The first is computational and numerical. The second is the difficulty of assessing and understanding the results. Partial Least Squares (PLS) methods are classical feature extraction tools that work with high-dimensional data sets. Since PLS methods do not require matrix inversion or diagonalization, they avoid the computational problems. However, interpreting the results remains hard for very high-dimensional data sets. Interest is therefore growing in new PLS methods that act, at the same time, as a feature extraction tool and as a feature selection method (i.e. a variable selection method). Here a new PLS-type algorithm including variable selection and correlated components is presented. The use of correlated components instead of orthogonal components allows us to take into account so-called suppressor variables, i.e. variables that have no direct effect on the response variables but improve prediction by suppressing irrelevant variation in the lower-order components. This is essential for obtaining a predictive variable selection.
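The notion of a suppressor variable can be illustrated with a small simulation. The sketch below is not the algorithm of this abstract: it uses plain least squares on synthetic data, and the names `signal`, `nuisance`, `x1`, `x2` and `r_squared` are assumptions introduced only for illustration. It shows a variable that is essentially uncorrelated with the response and yet sharply improves prediction once it is combined with a contaminated predictor.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X, with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

# Synthetic data (illustrative assumption, not data from the paper).
rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)     # the part of x1 that is relevant to y
nuisance = rng.normal(size=n)   # irrelevant variation contaminating x1

y = signal
x1 = signal + nuisance          # predictor carrying irrelevant variation
x2 = nuisance                   # suppressor: unrelated to y on its own

corr_x2_y = np.corrcoef(x2, y)[0, 1]
r2_x1 = r_squared(x1[:, None], y)
r2_both = r_squared(np.column_stack([x1, x2]), y)

print(f"corr(x2, y)      = {corr_x2_y:.3f}")
print(f"R^2 with x1 only = {r2_x1:.3f}")
print(f"R^2 with x1, x2  = {r2_both:.3f}")
```

In this construction `y` equals `x1 - x2` exactly, so a selection rule that screens variables only by their marginal correlation with the response would discard `x2` and lose roughly half of the explainable variance.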
TRINCHERA, L., ESPOSITO VINZI, V., TENENHAUS, A. and TENENHAUS, M. (2010). Correlated Component PLS-type Regression with Variable Selection Features for Large Datasets. In: Computing & Statistics (ERCIM'10).