Is it valid (reliable etc.) to compare a small sample with a large sample (which is made up of several small samples) when both samples are from the same population?

There are 3 firms. Each firm specialises in the manufacture of a particular product and is classified accordingly.

Firm A manufactures common cold medicines, Firm B manufactures arthritis medicines and Firm C manufactures heartburn medicines.

The common element linking the three firms is that they are all medicine manufacturers.

Now, Firm A and C only produce generic medicines and Firm B only produces branded medicines.

I have a sample of 80 workers from each firm and I want to test how satisfied they are in regard to their work-life balance. I am using correlation analysis etc.

I am comparing Firm A with B and C but I also want to compare Firm A and C with B. (Firm A and C are grouped together because they are generic medicine providers).

Is comparing a sample of 80 people (Firm B) with a sample of 160 people (Firm A and C) any reason for statistical concern?

#### Best Answer

Assuming your outcome (the work-life balance score?) is normally distributed, the standard error, confidence intervals and correlation significance levels all depend on the size of each sample (firm A, B, C).
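To make the dependence on sample size concrete, here is a minimal sketch of how the standard error of a mean and an approximate 95% confidence interval shrink when going from 80 to 160 respondents. The standard deviation of 1.0 is a made-up value for illustration, not something taken from the question, and the function names are my own.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * standard_error(sd, n)

# Hypothetical standard deviation of the score; NOT from the question.
assumed_sd = 1.0

for label, n in [("Firm B", 80), ("Firms A + C", 160)]:
    se = standard_error(assumed_sd, n)
    hw = ci_half_width(assumed_sd, n)
    print(f"{label}: n={n}, SE={se:.3f}, 95% CI half-width={hw:.3f}")
```

With the same spread in both groups, doubling the sample size narrows the interval by a factor of sqrt(2), so the pooled group of 160 is estimated more precisely than the group of 80.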

For example, if you are using a Pearson correlation to test the relationship between the work-life balance score and some other variable, one rule of thumb is that if the absolute value of the correlation coefficient is less than 1.96 / sqrt(popSize), where popSize is the number of observations in the sample, you cannot reject the null hypothesis of zero correlation at roughly the 5% level, meaning the correlation is not significant. There are more accurate techniques to check for significance, but the short answer is that sample size does have an impact on your analysis, especially for the sample sizes you indicated.
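As a rough sketch of that rule of thumb, the snippet below computes the 1.96 / sqrt(popSize) cut-off for the two group sizes in the question. The correlation value 0.18 is hypothetical and only serves to show that the same coefficient can fall on different sides of the cut-off; the function names are my own.

```python
import math

def rule_of_thumb_threshold(pop_size, z=1.96):
    """Approximate 5% cut-off for |r| under the null of zero correlation."""
    return z / math.sqrt(pop_size)

def passes_rule_of_thumb(r, pop_size):
    """True if |r| reaches the approximate cut-off for this sample size."""
    return abs(r) >= rule_of_thumb_threshold(pop_size)

# Hypothetical correlation coefficient, chosen only for illustration.
example_r = 0.18

for label, n in [("Firm B", 80), ("Firms A + C", 160)]:
    cutoff = rule_of_thumb_threshold(n)
    verdict = "significant" if passes_rule_of_thumb(example_r, n) else "not significant"
    print(f"{label}: n={n}, cut-off={cutoff:.3f}, r={example_r} is {verdict}")
```

The cut-off is about 0.22 for 80 observations and about 0.15 for 160, so a correlation of the same size can look non-significant in Firm B and significant in the pooled generic group purely because of the difference in sample size.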
