To detect and exclude outliers in a Pandas DataFrame, we can use the zscore function from the SciPy stats
module.
For instance, we write
import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.randn(100, 3))
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]
to create the df
DataFrame with 100 rows and 3 columns of random values drawn from NumPy's standard normal distribution.
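Then we call np.abs
on the result of stats.zscore
to get the absolute z-score of each value, computed column by column, and compare it with 3.
Calling all(axis=1) keeps only the rows where every column has an absolute z-score below 3.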
And we put that boolean mask in df[]
to return the rows that match the condition, which excludes any row containing an outlier.
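
To make the result easier to check, here is a minimal sketch of the same approach on a small fixed dataset instead of random values. The helper name drop_outlier_rows, its threshold parameter, and the sample data are assumptions made for illustration only; they are not part of Pandas or SciPy.

```python
import numpy as np
import pandas as pd
from scipy import stats


def drop_outlier_rows(df, threshold=3.0):
    """Hypothetical helper: keep rows whose values all have |z-score| < threshold."""
    # Column-wise z-scores as a plain NumPy array (zscore uses ddof=0 by default).
    z = stats.zscore(df.to_numpy(dtype=float), axis=0)
    # True for each row where every column is within the threshold.
    keep = (np.abs(z) < threshold).all(axis=1)
    return df[keep]


# Illustrative data: the last value in column "a" is an obvious outlier.
sample = pd.DataFrame({
    "a": list(range(19)) + [1000],
    "b": list(range(20)),
})
cleaned = drop_outlier_rows(sample)
print(len(sample), len(cleaned))  # prints: 20 19
```

Note that a column with no variation has a standard deviation of 0, so its z-scores come out as NaN and every row fails the comparison. It may be worth dropping or skipping constant columns before filtering.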