How to detect and exclude outliers in a Python Pandas DataFrame?

To detect and exclude outliers in a Python Pandas DataFrame, we can use the zscore function from the SciPy stats module.

For instance, we write

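import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(np.random.randn(100, 3))

# keep only rows where every column's absolute z-score is below 3
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]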

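to create the df DataFrame with random values generated by NumPy.

Then we call stats.zscore to compute the z-scores of each column, and np.abs to take their absolute values. Comparing the result with < 3 and calling all(axis=1) gives a boolean mask that is True only for rows where every column lies within 3 standard deviations of its column mean.

And we put that mask in df[] to return the rows that match the condition.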
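
As a minimal sketch, the same filter can be wrapped in a small helper and tried on data with one obvious outlier. The function name drop_zscore_outliers, the threshold parameter, the column names and the planted value 50.0 are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats


def drop_zscore_outliers(df, threshold=3.0):
    """Keep rows where every column's absolute z-score is below threshold.

    Hypothetical helper: the name and the threshold parameter are
    illustrative assumptions.
    """
    z = np.abs(stats.zscore(df))
    return df[(z < threshold).all(axis=1)]


# Seeded random data so the example is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["a", "b", "c"])

# Plant one obvious outlier in row 0 (made-up value for illustration)
df.loc[0, "a"] = 50.0

cleaned = drop_zscore_outliers(df)
print(len(df), len(cleaned))
```

The row with the planted value is dropped because its z-score in column a is far above 3. Note that z-scores are computed from the mean and standard deviation of the full column, so extreme values also influence the cutoff they are measured against.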
