From the course: Python Data Science Mistakes to Avoid

Not addressing outliers

- [Instructor] A mistake that data scientists should avoid is not addressing outliers in data. For example, let's say that I have a variable named Airbnb containing a pandas DataFrame that consists of information about Airbnb listings in New York City from 2019, and I want to eventually build a model that predicts the prices of future Airbnb listings in New York City based on this data. I've displayed the first few rows of the dataset here, and I've created a visualization that displays the distribution of listing prices here. From this visualization, it appears that the majority of listing prices are under 1,000, and listing prices over 1,000 seem to be outliers. If I do not address the outliers, the data may not be an accurate representation of listing prices, and when I go on to build a predictive model based on this data, my model may not make good predictions. So I need to address these outliers.…
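To make the idea concrete, here is a minimal sketch of how listings priced over 1,000 might be flagged and set aside with pandas. The transcript does not show the actual dataset or how the instructor handles the outliers, so the `airbnb` variable name, the `price` column, and the sample values below are all hypothetical stand-ins. Dropping the rows is only one possible option, not necessarily the course's method.

```python
import pandas as pd

# Hypothetical stand-in for the instructor's DataFrame; the real data and
# its column names are not shown in the transcript.
airbnb = pd.DataFrame(
    {"price": [80, 120, 150, 95, 200, 60, 175, 2500, 110, 5000]}
)

# Cutoff taken from the transcript's reading of the price distribution.
PRICE_CUTOFF = 1000

# Boolean mask that is True for listings priced above the cutoff.
is_outlier = airbnb["price"] > PRICE_CUTOFF

# Count the flagged rows and compute their share of the dataset.
n_outliers = int(is_outlier.sum())
share_outliers = float(is_outlier.mean())

# One possible way to address them (an assumption, not the course's
# confirmed approach): keep only rows at or below the cutoff.
airbnb_trimmed = airbnb.loc[~is_outlier].copy()

print(f"{n_outliers} of {len(airbnb)} listings exceed {PRICE_CUTOFF}")
print(airbnb_trimmed["price"].describe())
```

Counting the flagged rows before removing anything shows how much data a given cutoff would discard, which helps in judging whether a fixed threshold like 1,000 is reasonable for the dataset at hand.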
