From the course: Cleaning Bad Data in R

Screening for outliers

- [Instructor] Data scientists often run into datasets that contain values well outside the norm. These outliers can present special challenges for data analysis, and it's important to understand what they mean when they appear in your datasets. Outliers are data points that lie far outside the norm, and they may occur in two cases. First, outliers may indicate some type of error in the dataset. Someone may have measured the data incorrectly in the first place, entered it into a system incorrectly, or performed a calculation improperly. For example, imagine that you're looking at a dataset of temperatures from weather stations in New York and find a single data point recording a temperature of 212 degrees Fahrenheit, the boiling point of water. This is clearly some type of error. Perhaps the thermometer failed, somebody wrote down the temperature incorrectly, or the thermometer was placed inside an oven. Whatever the cause, this is clearly an invalid…
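Here is a minimal sketch of screening for this kind of error. The course itself works in R, so this Python version is only illustrative. It flags readings that fall outside an assumed plausibility range. It also applies Tukey's interquartile-range (IQR) fences, a common statistical outlier rule that the transcript does not itself name. The readings, the bounds of -40°F and 130°F, and the function names `out_of_range` and `iqr_outliers` are all hypothetical choices for illustration.

```python
import statistics

# Hypothetical readings (°F) from New York weather stations; 212.0 is the suspect value.
readings = [41.0, 38.5, 45.2, 212.0, 39.8, 43.1, 40.6]

# Assumed plausibility bounds for New York air temperatures (illustrative, not authoritative).
PLAUSIBLE_MIN_F = -40.0
PLAUSIBLE_MAX_F = 130.0


def out_of_range(values, lo, hi):
    """Return (index, value) pairs for values outside the closed interval [lo, hi]."""
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs beyond Tukey's fences Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points (Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


print(out_of_range(readings, PLAUSIBLE_MIN_F, PLAUSIBLE_MAX_F))  # → [(3, 212.0)]
print(iqr_outliers(readings))  # → [(3, 212.0)]
```

The range check encodes domain knowledge, such as the fact that air temperatures in New York never approach boiling. The IQR rule needs no such knowledge but only says a value is unusual relative to the rest of the data, not that it is wrong. Either way, a flagged point still needs investigation before you decide whether it is an error.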