Solved – Small sample linear regression: Where to start

FULL DISCLOSURE: This is homework.

I have been provided with a small data set (n = 21). The data are messy, and looking at them in a scatterplot matrix gives me little to no insight. I've been provided with 8 variables that are metrics created from a longitudinal study (BI, CONS, CL, CR, …, VOBI). The other measurements are of mutual fund sales, returns, asset levels, market share, share of sales, and proportion of sales to assets.


Correlations are everywhere.

               BI       CONS           CL          CR         QT        COM        CONV        VOBI          s            r           a          ms         ss       share      share2
BI      1.0000000  0.7620445  0.639830594  0.70384322  0.7741463  0.8451500  0.84704440  0.85003686  0.2106773 -0.238431047  0.36184548  0.40007830  0.4076563  0.31643802 -0.28283564
CONS    0.7620445  1.0000000  0.933595967  0.96979599  0.9892533  0.9069803  0.96781703  0.93416972  0.2316209 -0.074351798  0.31952292  0.40259511  0.4442877  0.24783884 -0.14788906
CL      0.6398306  0.9335960  1.000000000  0.88297431  0.8993748  0.8133169  0.89922684  0.81132166  0.1200420 -0.001107093  0.22132116  0.26729067  0.3033221  0.07650924 -0.25595278
CR      0.7038432  0.9697960  0.882974312  1.00000000  0.9788150  0.8965754  0.92335363  0.90848199  0.2934774 -0.119340914  0.35973640  0.46409570  0.5012178  0.32832247 -0.09005985
QT      0.7741463  0.9892533  0.899374782  0.97881497  1.0000000  0.9216887  0.95458369  0.94848419  0.2826278 -0.108430256  0.35520090  0.43290221  0.4823314  0.31761015 -0.12903075
COM     0.8451500  0.9069803  0.813316918  0.89657544  0.9216887  1.0000000  0.90302002  0.89682825  0.4305866 -0.255581594  0.50724121  0.55718441  0.5773171  0.40378679 -0.12085524
CONV    0.8470444  0.9678170  0.899226843  0.92335363  0.9545837  0.9030200  1.00000000  0.96097892  0.1993837 -0.065237725  0.32010735  0.41843335  0.4531298  0.28873934 -0.19668858
VOBI    0.8500369  0.9341697  0.811321664  0.90848199  0.9484842  0.8968283  0.96097892  1.00000000  0.2424889 -0.087126942  0.30390489  0.40390750  0.4845432  0.36588655 -0.07137107
s       0.2106773  0.2316209  0.120041993  0.29347742  0.2826278  0.4305866  0.19938371  0.24248894  1.0000000 -0.173034217  0.91766914  0.84673519  0.8596887  0.61299987  0.32072790
r      -0.2384310 -0.0743518 -0.001107093 -0.11934091 -0.1084303 -0.2555816 -0.06523773 -0.08712694 -0.1730342  1.000000000 -0.22512978 -0.18337773 -0.1030943 -0.17650579  0.51768144
a       0.3618455  0.3195229  0.221321163  0.35973640  0.3552009  0.5072412  0.32010735  0.30390489  0.9176691 -0.225129778  1.00000000  0.92445370  0.8656139  0.63049461  0.03876774
ms      0.4000783  0.4025951  0.267290668  0.46409570  0.4329022  0.5571844  0.41843335  0.40390750  0.8467352 -0.183377734  0.92445370  1.00000000  0.9572730  0.77582501  0.08435813
ss      0.4076563  0.4442877  0.303322147  0.50121775  0.4823314  0.5773171  0.45312978  0.48454322  0.8596887 -0.103094325  0.86561394  0.95727301  1.0000000  0.83931302  0.24371447
share   0.3164380  0.2478388  0.076509240  0.32832247  0.3176102  0.4037868  0.28873934  0.36588655  0.6129999 -0.176505786  0.63049461  0.77582501  0.8393130  1.00000000  0.20313930
share2 -0.2828356 -0.1478891 -0.255952782 -0.09005985 -0.1290307 -0.1208552 -0.19668858 -0.07137107  0.3207279  0.517681444  0.03876774  0.08435813  0.2437145  0.20313930  1.00000000

Now, I've tried running a number of "tests", for example:

summary(lm(share2 ~ BI + ...))

However, none of them gives a reasonable result (mostly negative adjusted R^2).
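For context on the negative values, here is a minimal sketch of how adjusted R^2 is penalised at this sample size. It uses the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The helper name adj_r2, the R^2 values, and the predictor counts are hypothetical illustrations, not results from the data above.

```r
# Adjusted R^2 from plain R^2, sample size n, and number of predictors p
# (intercept not counted in p). adj_r2 is a hypothetical helper name.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 21  # sample size stated in the question

# Illustrative (made-up) R^2 values and predictor counts:
few_predictors  <- adj_r2(0.30, n, 2)   # 2 predictors
all_metrics     <- adj_r2(0.30, n, 8)   # assuming all 8 metrics are used
everything_else <- adj_r2(0.60, n, 14)  # assuming all 14 other columns are used

print(c(few_predictors, all_metrics, everything_else))
```

With only 21 observations, the same modest R^2 that stays positive with two predictors turns negative once eight or more are included, so a negative adjusted R^2 mainly signals that the model is spending more degrees of freedom than the fit justifies.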

If you had data that appeared to show no relationships (at least no linear ones), what would your next steps be?

P.S.: I did try a number of model formulas that contained interaction effects and got much better results (R^2 and adjusted R^2 > 80%, and significant F-tests), but not all of the interaction effects were significant.

I'd probably look at ridge regression or, better, the lasso. Both are often used when the predictors are multicollinear, as the correlation matrix suggests yours are. There are several options for doing this in R; see the Regularized and Shrinkage Methods section of the Machine Learning & Statistical Learning Task View on CRAN.
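As a rough illustration of the ridge suggestion, here is a minimal sketch using lm.ridge() from the MASS package. The data are simulated, and the variable names (x1, x2, x3, y, dat) and the penalty grid are assumptions made for the example, not the poster's variables. The lasso would follow the same pattern but needs an add-on package such as glmnet, which is not shown here.

```r
# Ridge regression sketch on SIMULATED data (not the poster's data).
library(MASS)  # provides lm.ridge()

set.seed(1)
n <- 21
# Hypothetical stand-ins for a block of highly correlated metrics:
# three noisy copies of one latent variable z.
z  <- rnorm(n)
x1 <- z + rnorm(n, sd = 0.2)
x2 <- z + rnorm(n, sd = 0.2)
x3 <- z + rnorm(n, sd = 0.2)
y  <- 0.5 * z + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# Fit over a grid of penalties; lambda = 0 corresponds to ordinary least squares.
lambdas <- seq(0, 20, by = 0.5)
fit <- lm.ridge(y ~ x1 + x2 + x3, data = dat, lambda = lambdas)

# Pick the penalty that minimises generalised cross-validation (GCV).
best <- which.min(fit$GCV)
best_lambda <- lambdas[best]

# coef() returns one row per lambda, on the original scale of the predictors.
ridge_coefs <- coef(fit)[best, ]
ols_coefs   <- coef(lm(y ~ x1 + x2 + x3, data = dat))

print(best_lambda)
print(rbind(ols = ols_coefs, ridge = ridge_coefs))
```

Comparing the two rows shows how the penalty changes the coefficients of the correlated predictors relative to ordinary least squares. With n = 21, the GCV-chosen penalty is itself noisy, so treat the selected lambda as a rough guide.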

You don't have enough data to start thinking about some of the techniques listed in other sections of that Task View.
