In all contexts in which I am familiar with cross-validation, it is used solely with the goal of increasing predictive accuracy. Can the logic of cross-validation be extended to estimating unbiased relationships between variables?

While this paper by Richard Berk demonstrates the use of a hold-out sample for parameter selection in the "final" regression model (and demonstrates why step-wise parameter selection is not a good idea), I still don't see how that ensures unbiased estimates of the effect X has on Y any more than choosing a model based on logic and prior knowledge of the subject does.

I ask that people cite examples in which a hold-out sample was used to aid in causal inference, or general essays that may help my understanding. I also don't doubt that my conception of cross-validation may be naive; if it is, say so. Offhand, it seems the use of a hold-out sample would be amenable to causal inference, but I don't know of any work that does this, or how it would be done.

Citation for the Berk Paper:

Statistical Inference After Model Selection

by: Richard Berk, Lawrence Brown, Linda Zhao

Journal of Quantitative Criminology, Vol. 26, No. 2 (June 2010), pp. 217–236.


This question on exploratory data analysis in small-sample studies by chl prompted this one.


#### Best Answer

I think it's useful to review what we know about cross-validation. Statistical results around CV fall into two classes: efficiency and consistency.

Efficiency is what we're usually concerned with when building predictive models. The idea is that we use CV to select a model with asymptotic guarantees concerning the loss function. The most famous result here is due to Stone (1977), which shows that LOO CV is asymptotically equivalent to AIC. However, Brett provides a good example of a predictive model that tells you nothing about the causal mechanism.
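As a rough illustration of the efficiency view (a minimal numpy sketch, not from the post; the data, helper names `loo_mse` and `aic`, and candidate models are all mine), one can compute closed-form LOO prediction error and Gaussian AIC for a few OLS candidate models and see that both criteria penalize an obviously misspecified model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
y = 1.0 + 2.0 * X1 + rng.normal(size=n)  # the data-generating model uses X1 only

def loo_mse(X, y):
    # Closed-form leave-one-out residuals for OLS: e_i / (1 - h_ii),
    # where h_ii are the diagonal entries of the hat matrix.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    return np.mean((resid / (1 - np.diag(H))) ** 2)

def aic(X, y):
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

ones = np.ones((n, 1))
candidates = {
    "X1 only": np.column_stack([ones, X1]),   # true model
    "X1 + X2": np.column_stack([ones, X1, X2]),
    "X2 only": np.column_stack([ones, X2]),   # misspecified: misses the real signal
}
for name, X in candidates.items():
    print(f"{name}: LOO MSE = {loo_mse(X, y):.3f}, AIC = {aic(X, y):.1f}")
```

Both criteria agree in ranking the misspecified "X2 only" model worst, which is the asymptotic-equivalence point in miniature; note that this says nothing about which variable *causes* $Y$.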

Consistency is what we're concerned with if our goal is to find the "true" model. The idea is that we use CV to select a model with asymptotic guarantees that, provided our model space includes the true model, we'll discover it given a large enough sample. The most famous result here is due to Shao (1993) concerning linear models, but as he states in his abstract, his "shocking discovery" is the opposite of the result for LOO. For linear models, you can achieve consistency using leave-$k$-out CV as long as $k/n \rightarrow 1$ as $n \rightarrow \infty$. Beyond linear models, it's harder to derive statistical results.
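Shao's prescription can be sketched with Monte Carlo CV: fit on a training subset that is a *vanishing* fraction of the data and validate on the large remainder, so the validation fraction tends to 1. The sketch below (my own illustration, with assumed names like `mccv_mse` and an arbitrary $n^{3/4}$ training size) shows this variant preferring the true model over one padded with an irrelevant predictor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)  # irrelevant predictor
y = 1.0 + 2.0 * X1 + rng.normal(size=n)

def mccv_mse(X, y, n_train, n_splits=200, seed=0):
    # Monte Carlo CV: repeatedly fit OLS on a small random training subset
    # and score on the large held-out remainder, then average.
    split_rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_splits):
        idx = split_rng.permutation(len(y))
        tr, va = idx[:n_train], idx[n_train:]
        beta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
        errs.append(np.mean((y[va] - X[va] @ beta) ** 2))
    return np.mean(errs)

ones = np.ones((n, 1))
true_model = np.column_stack([ones, X1])
big_model = np.column_stack([ones, X1, X2])

# Training fraction shrinks as n grows, so the left-out fraction k/n -> 1.
n_train = int(n ** 0.75)
print("true model:", mccv_mse(true_model, y, n_train))
print("padded model:", mccv_mse(big_model, y, n_train))
```

Because both models are scored on identical splits (same seed), the comparison is paired: the padded model pays for fitting a pure-noise coefficient, which is exactly the overfitting that LOO fails to punish hard enough for consistency.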

But suppose you can meet the consistency criteria and your CV procedure leads to the true model: $Y = \beta X + e$. What have we learned about the causal mechanism? We simply know that there's a well-defined correlation between $Y$ and $X$, which doesn't say much about causal claims. From a traditional perspective, you need to bring in experimental design, with its mechanism of control/manipulation, to make causal claims. From the perspective of Judea Pearl's framework, you can bake causal assumptions into a structural model and use the probability-based calculus of counterfactuals to derive some claims, but you'll need to satisfy certain properties.

Perhaps you could say that CV can help with causal inference by identifying the true model (provided you can satisfy consistency criteria!). But it only gets you so far; CV by itself isn't doing any of the work in either framework of causal inference.

If you're interested further in what we can say with cross-validation, I would recommend Shao 1997 over the widely cited 1993 paper:

- An Asymptotic Theory for Linear Model Selection (Shao, 1997)

You can skim the major results, but it's the discussion that follows which is especially interesting. I found the comments by Rao & Tibshirani, and by Stone, particularly insightful. Note that while they discuss consistency, no claims are ever made regarding causality.
