
How to remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas?

Sometimes, we want to remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas.

In this article, we’ll look at how to remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas.

How to remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas?

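To remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas, we can sort the data frame by B with sort_values and then call the drop_duplicates method.

For instance, we write

df.sort_values('B').drop_duplicates(subset='A', keep='last')

to sort the df data frame by column B in ascending order and then call drop_duplicates with the subset argument set to 'A' to remove rows with duplicate values in A. Setting keep to 'last' keeps the last row of each group, which after sorting is the one with the highest value in B. Calling drop_duplicates without sorting first would keep whichever duplicate happens to come last in the original row order.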
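As a minimal sketch, here is the idea applied to a small data frame, sorting by B before dropping duplicates so that the last row kept for each value of A is the one with the highest B. The sample data and the variable names result and alt are made up for illustration. The groupby and idxmax variant is shown only as one possible alternative.

```python
import pandas as pd

# Hypothetical sample data: A holds the keys with duplicates, B the values to compare.
df = pd.DataFrame({
    "A": ["x", "y", "x", "y", "z"],
    "B": [1, 5, 3, 2, 4],
})

# Sort ascending by B so the largest B for each value of A comes last,
# then keep only that last row per value of A.
result = df.sort_values("B").drop_duplicates(subset="A", keep="last")

# Alternative sketch: find the index label of the maximum B in each group of A
# and select those rows directly.
alt = df.loc[df.groupby("A")["B"].idxmax()]

print(result.sort_index())
print(alt.sort_index())
```

Both result and alt keep one row per value of A: x with 3, y with 5 and z with 4. If B may contain missing values, note that sort_values places NaN last by default, so a NaN row could be the one that is kept. Passing na_position='first' to sort_values is one way to avoid that.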

Conclusion

To remove duplicates by column A, keeping the row with the highest value in column B with Python Pandas, we can sort the data frame by B with sort_values and then use the drop_duplicates method with keep set to 'last'.
