Solved – Propensity Score Matching – Unbalanced Sample

I have a question regarding PSM. I'm just starting to dive into this topic but I reached a point where I think external help is necessary.

In my regression analysis (OLS), I have an independent variable that is a dummy indicating whether a firm discloses an annual report (1) or not (0). I want to study the effect of belonging to the treated group (1) on my dependent variable. I know that this dummy is not exogenous, since there are published papers analysing its determinants, so PSM seemed a helpful solution to this problem. Here is my concern:

I have a rather small panel data set with 170 observations. Furthermore, it is unbalanced: 100 companies that disclose (=1) and 70 that do not (=0). Matching (Stata command: psmmatch) leaves me with 140 observations. That means it matches one treated firm to every non-treated firm in my sample. As I understand PSM, that is the wrong way around. In addition, with this procedure I lose 30 observations from my treated group, which could influence the final regression. In my opinion this is a serious concern, and PSM cannot be used in this special case where I have more treated observations than non-treated ones.

I hope I have expressed my thoughts clearly and that somebody can give me a hint on how to proceed, what literature to look at, or whether this is even a major problem.

A friend recommended the Heckman procedure, which I think is not appropriate since it only controls for unobservable characteristics. In my case I know the determinants of my dummy from previous literature.

I am looking forward to your replies. Please do not hesitate to ask if you need further information.

Kind regards


Yes, if you choose a 1:1 matching ratio, you can have at most 70 matched pairs (the 140 observations you report), or even fewer if you exclude some matches based on the digit/caliper criterion because they were matched poorly.
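
To make the counting argument concrete, here is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement on already-estimated propensity scores. It is not the algorithm of any particular Stata command; the function name `greedy_match`, the simulated scores, and the caliper value of 0.01 are all assumptions made for illustration.

```python
import random


def greedy_match(treated_ps, control_ps, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated_ps, control_ps: lists of estimated propensity scores.
    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control_ps)))  # controls not yet used
    pairs = []
    for t, ps_t in enumerate(treated_ps):
        if not available:
            break  # every control has already been matched
        # Closest remaining control on the propensity score.
        c = min(available, key=lambda j: abs(control_ps[j] - ps_t))
        if caliper is not None and abs(control_ps[c] - ps_t) > caliper:
            continue  # nearest control is too far away: leave unmatched
        pairs.append((t, c))
        available.remove(c)
    return pairs


# Simulated propensity scores (purely illustrative, not real data):
# 100 treated firms and 70 non-treated firms, as in the question.
random.seed(0)
treated = [random.betavariate(3, 2) for _ in range(100)]
control = [random.betavariate(2, 3) for _ in range(70)]

pairs_no_caliper = greedy_match(treated, control)
pairs_caliper = greedy_match(treated, control, caliper=0.01)

print(len(pairs_no_caliper))  # 70 pairs: limited by the smaller group
print(len(pairs_caliper))     # never more than 70, usually fewer
```

Because each control can be used only once, the number of pairs is bounded by the size of the smaller group, and a caliper can only reduce it further.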

Regarding your concern about the sample loss, this paper discusses the potential impact.

You may want to examine the characteristics of the 30 excluded treated observations and compare them with those of the 70 included ones.
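
One common way to make such a comparison is the standardized mean difference (SMD) of each covariate between the two groups. The sketch below assumes a single hypothetical covariate (say, log firm size) with made-up values; the function name and the data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical covariate values (made up for illustration):
excluded_treated = [5.1, 6.3, 5.8, 6.9, 6.1]  # treated firms dropped by matching
included_treated = [4.9, 5.2, 5.0, 5.6, 5.3]  # treated firms kept after matching

smd = standardized_mean_difference(excluded_treated, included_treated)
print(f"SMD = {smd:.2f}")
```

A commonly cited rule of thumb treats an absolute SMD above roughly 0.1 as a sign that the groups differ noticeably on that covariate. If the excluded firms differ systematically from the included ones, the matched estimate describes only the subset of treated firms that resemble the controls.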
