From the course: Cleaning Bad Data in R

Outliers in subgroups

- [Instructor] In addition to straightforward outlier detection, you should also examine your data set for outliers that might appear in subsets of your data. This is another case where applying domain knowledge is quite helpful. Consider, as an example, a data set containing test scores for students in an elementary school who were administered a grade-level standardized test. I've provided the code here to load that data file. Let's go ahead and load the tidyverse, set our working directory, and then read in the tests data set. I'm going to start by looking at some summary statistics. I see that I have a student identifier that's an integer value, an age that's a numeric value, a grade level, and a test score. And one thing that jumps out at me right away is that the ages in this data set range from five to 39. Now, that sounds suspicious for an elementary school. Let's dig into that variable more by looking at a box plot. Now there certainly shouldn't be students in elementary…
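
The steps described so far (summary statistics, then a box plot of age) can be sketched roughly as follows. This is only an illustration under stated assumptions: the course's data file is not reproduced here, so the `tests` tibble below is a small made-up stand-in, and the column names `student`, `age`, `grade`, and `score` are guesses based on the description above. The per-grade 1.5 × IQR check at the end is one common way to look for outliers within subgroups; the transcript excerpt does not confirm that it is the approach the instructor goes on to use.

```r
library(tidyverse)

# Hypothetical stand-in for the course's tests data set (values are made up).
tests <- tibble(
  student = 1:15,
  age     = c(6, 6, 7, 7, 6,  8, 9, 8, 9, 39,  10, 11, 10, 11, 6),
  grade   = rep(c(1L, 3L, 5L), each = 5),
  score   = c(71, 80, 65, 90, 77, 82, 68, 74, 88, 79, 91, 73, 85, 69, 76)
)

# Summary statistics: the maximum age stands out for an elementary school.
summary(tests)

# Box plot of age for the whole data set.
p_all <- ggplot(tests, aes(x = "", y = age)) +
  geom_boxplot()

# Box plot of age within each grade, to expose subgroup outliers.
p_by_grade <- ggplot(tests, aes(x = factor(grade), y = age)) +
  geom_boxplot()

# Assumed rule: flag ages outside 1.5 * IQR of the quartiles, per grade.
flagged <- tests %>%
  group_by(grade) %>%
  mutate(
    lower       = quantile(age, 0.25) - 1.5 * IQR(age),
    upper       = quantile(age, 0.75) + 1.5 * IQR(age),
    age_outlier = age < lower | age > upper
  ) %>%
  ungroup()

# Rows whose age is unusual for their own grade.
filter(flagged, age_outlier)
```

In this made-up data, a 6-year-old recorded in grade 5 is unremarkable when all ages are pooled, but is flagged once ages are compared within each grade. Calling `print(p_all)` or `print(p_by_grade)` draws the plots.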
