From the course: Cleaning Bad Data in R

Aggregations in the data set

- [Instructor] Another issue that can undermine the integrity of your dataset is source data that includes pre-computed aggregations. This happens very often with census data. Let's take a look at an example. I'm going to run the code I have loaded here to load the tidyverse, change my working directory, and then load a data file with information about the population of the city of Carpinteria, California. Now let's take a look at the dataset we loaded. I'm going to use the glimpse function to do this. It looks like each row describes a type of person who might be in the city, along with the number of people in that category. I could compute the total population by simply adding up all the rows with the sum function. I'll just call sum on the population variable of the Carpinteria data frame. When I do that, I get a result of 40,659. Now, I've been to Carpinteria many…
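The problem described above, where pre-computed aggregation rows get summed alongside the rows they summarize, can be reproduced with a small, made-up data frame. This is a minimal sketch, assuming the aggregate rows can be recognized by their labels. The data frame `people`, its `category` column, and the `aggregate_labels` vector are hypothetical and are not taken from the course's Carpinteria file.

```r
# Hypothetical toy data (NOT the course's Carpinteria file).
# "Adults" (= Men + Women) and "Total" are pre-computed aggregation rows.
people <- data.frame(
  category   = c("Children", "Men", "Women", "Adults", "Total"),
  population = c(30, 40, 50, 90, 120),
  stringsAsFactors = FALSE
)

# Naive total: adds the detail rows AND the aggregate rows,
# so the same people are counted more than once.
naive_total <- sum(people$population)

# Hypothetical list of row labels known to be aggregations.
aggregate_labels <- c("Adults", "Total")

# Keep only the detail rows before summing.
detail <- people[!people$category %in% aggregate_labels, ]
clean_total <- sum(detail$population)

# Sanity check: the cleaned sum should match the published "Total" row.
published_total <- people$population[people$category == "Total"]
stopifnot(clean_total == published_total)

cat("Naive total:", naive_total, "\n")  # Naive total: 330
cat("Clean total:", clean_total, "\n")  # Clean total: 120
```

The sketch uses base R only so it runs without the tidyverse; with dplyr loaded, `filter()` followed by `summarise()` would do the same job. Comparing the cleaned sum against any published total row is a cheap way to confirm that every aggregation row was excluded.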
