I have a data set in which I am trying to find outliers. I am using Python libraries to compute the Z-score with the code below:

`from scipy import stats`

`df['z_score'] = stats.zscore(df[column_Name])`

`new_df = df.loc[df['z_score'].abs() > 3]`

The problem is that a sizeable share of my sample has a Z-score > 3 or < -3, so I can't simply drop those rows.

So I checked the Z-scores across all of these columns and rows. They range from -17 to +20. Is it normal to get such high Z-scores? And what does this say about my data?

In this case, how should I proceed? Clearly I can't use 3 as the Z-score cutoff. How is this handled in the real world?

I am new to data science. I googled but did not find much help on this, so any leads will be appreciated.

Also, I don't understand the range of -5 to 10 that is displayed at the bottom of the box plot. Looking at it, it seems that any data outside -5 to 10 are my outliers.

#### Best Answer

This is totally fine. It might be inconvenient, but it doesn’t mean that there’s something wrong with the data.

What it means is that your data set is more prone to extreme observations than a normal distribution with the same variance. For a normal distribution, you have about a $0.27\%$ chance of getting an observation with a z-score of magnitude greater than $3$, and it’s extraordinarily unusual to observe z-scores with magnitudes like $17$ and $20$.
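To get a feel for how rare such z-scores would be if the data really were normal, the two-sided tail probabilities can be computed directly. This is a minimal sketch using `scipy.stats.norm.sf` (the survival function); the helper name `two_sided_tail_prob` is made up for illustration.

```python
from scipy import stats


def two_sided_tail_prob(z):
    """Return P(|Z| > z) for a standard normal variable Z."""
    # sf(z) is the upper-tail probability P(Z > z); double it for both tails.
    return 2 * stats.norm.sf(z)


p3 = two_sided_tail_prob(3)
p17 = two_sided_tail_prob(17)

print(f"P(|Z| > 3)  = {p3:.4%}")   # roughly a quarter of a percent
print(f"P(|Z| > 17) = {p17:.3e}")  # vanishingly small
```

If more than a fraction of a percent of your rows exceed 3 in magnitude, that alone suggests the normal model is a poor fit.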

So you don’t have a normal distribution.

This is related to a quantity called *kurtosis*, which quantifies the propensity of a distribution to have extreme values. Every normal distribution has a kurtosis of $3$. If you stick your data into R and call kurtosis in the moments package, I would expect you to get a value quite a bit higher than 3. The Python implementation, since you’re into Python, is scipy.stats.kurtosis, which by default (`fisher=True`) subtracts 3 to give you the so-called *excess kurtosis*, so a normal distribution scores about 0 there.
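As a rough illustration of the kurtosis check, here is a sketch that compares a normal sample with a heavier-tailed one. The data are synthetic (a Laplace sample stands in for "heavy-tailed" data; it is an assumption, not your data set), and `describe_tails` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Synthetic stand-ins: one normal sample, one with heavier tails.
normal_sample = rng.normal(size=100_000)
heavy_sample = rng.laplace(size=100_000)


def describe_tails(x):
    """Return (excess kurtosis, Pearson kurtosis, largest |z-score|)."""
    excess = stats.kurtosis(x)                 # default fisher=True: normal -> about 0
    pearson = stats.kurtosis(x, fisher=False)  # no subtraction: normal -> about 3
    max_abs_z = np.abs(stats.zscore(x)).max()
    return excess, pearson, max_abs_z


normal_stats = describe_tails(normal_sample)
heavy_stats = describe_tails(heavy_sample)

print("normal: excess=%.2f pearson=%.2f max|z|=%.1f" % normal_stats)
print("heavy:  excess=%.2f pearson=%.2f max|z|=%.1f" % heavy_stats)
```

On your own data you would pass `df[column_Name]` instead of the synthetic arrays. A Pearson kurtosis well above 3 (excess well above 0) would be consistent with the large z-scores you are seeing.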