How to justify using "with replacement" in propensity score matching

I am trying to understand how it is justified to use "with replacement" methods in propensity score matching. In the literature there are many statements like this:

However, situations arise when there are not enough controls in the
overlapping region to fully provide one match per treated unit. In
this case it can help to use some control observations as matches for
more than one treated unit. This approach is often called matching
with replacement, a term which commonly refers to with one-to-one
matching but could generalize to multiple control matches for each
control. **Such strategies can create better balance, which should yield
estimates that are closer to the truth on average**. Once such data are
incorporated into a regression, however, the multiple matches reduce
to single data points, which suggests that matching with replacement
has limitations as a general strategy.

  1. How is the bold sentence above proven? I thought we were trying to find the truth, not cook the books to get the answer we want.
  2. How, even philosophically, can this be justified? If this is real data, can you go around making clones and then claim the new data set means anything? Sampling with replacement makes sense in the Monte Carlo world, where we are taking limits and want to preserve uniformity over the sample space, but I can't see how that works here.

Any insight would be appreciated.

You're not creating clones in matching with replacement. You're matching one control to several treated units. You don't artificially double the size of your control group if you match each control to two treated units. The method of estimating the standard error for your treatment effect takes the reuse of control units into account.
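To make the "no clones" point concrete, here is a minimal sketch of one-to-one nearest-neighbor matching with replacement. The propensity scores and the helper name `match_with_replacement` are made up for illustration, and real matching software may handle ties and calipers differently.

```python
import numpy as np

def match_with_replacement(ps_treated, ps_control):
    """For each treated unit, return the index of the control whose
    propensity score is closest. A control may be chosen many times."""
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    # Absolute score distance for every (treated, control) pair.
    dist = np.abs(ps_treated[:, None] - ps_control[None, :])
    # Nearest control for each treated unit; nothing is removed from the pool.
    return dist.argmin(axis=1)

# Hypothetical propensity scores: 4 treated units, 3 controls.
ps_treated = [0.62, 0.58, 0.35, 0.80]
ps_control = [0.60, 0.30, 0.10]

matches = match_with_replacement(ps_treated, ps_control)
# How many times each control was used as a match (its matching weight).
weights = np.bincount(matches, minlength=len(ps_control))

print(matches)  # index of the matched control for each treated unit
print(weights)  # reuse count per control
```

In this toy data the first control is matched to three treated units, the second to one, and the third to none. The control group still consists of the same three people; the reuse shows up only as a count attached to each of them, not as extra rows.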

Imagine the parallel scenario of propensity score weighting. When you up-weight individuals, you don't artificially increase the size of your sample; the standard error for your estimate accommodates the weights, so you don't create something out of nothing. In the matching scenario, you can think of a control unit that is matched to two treated units as having a weight of 2. Again, the standard error accounts for this weighting and does not allow your sample size to be artificially inflated.
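One rough way to see that weights do not inflate the sample is Kish's effective sample size, (sum of weights)^2 / (sum of squared weights). This is a common heuristic that I am using for illustration; it is not the specific variance estimator any particular matching package uses. The outcomes and weights below are hypothetical.

```python
import numpy as np

def kish_ess(weights):
    """Kish's approximate effective sample size for a set of weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def weighted_mean(y, weights):
    """Weighted average of outcomes y."""
    return np.average(np.asarray(y, dtype=float), weights=weights)

# Hypothetical control outcomes; the first control is matched to two treated units.
y_control = [10.0, 20.0, 30.0]
match_weights = [2, 1, 1]

ess_reused = kish_ess(match_weights)   # three controls, one of them reused
ess_equal = kish_ess([1, 1, 1])        # same three controls, each used once
control_mean = weighted_mean(y_control, match_weights)
```

With one control counted twice, the weights sum to 4 but the effective sample size falls below 3, the number of distinct controls. Reusing a control costs precision rather than creating it, which is the sense in which the standard error "accounts for" the reuse.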
