From the course: Cleaning Bad Data in R

Common data problems

- [Instructor] Now that you have some exposure to the basic concepts of tidy data, let's spend some time talking about the common problems that appear in untidy datasets. The five categories are datasets where the column headers contain values instead of variable names, datasets that store multiple variables in a single column, datasets that store variables in both rows and columns, datasets that store different types of observational units in the same table, and datasets where a single observational unit is spread across multiple tables. Let's take a look at each one of these problems in more detail. First, you know from the concepts of tidy data that columns should contain variables, so it makes sense that each column header would contain a variable name. One common issue with datasets is that column headers might contain actual values instead of variable names. This has the end result of spreading what should be a single variable across multiple columns and makes it very hard…
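To make the first problem concrete, here is a minimal base R sketch of a table whose column headers hold values rather than variable names, along with one way to reshape it. The data frame `untidy`, its `country` column, the year columns, and the output names `year` and `value` are all made up for illustration; they are not taken from the course.

```r
# Hypothetical untidy table: the headers "2019", "2020", "2021" are really
# values of a single "year" variable, spread across three columns.
untidy <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 21),
  `2021`  = c(12, 22),
  check.names = FALSE  # keep the numeric-looking headers exactly as written
)

# Every column other than the identifier holds a value of "year" in its header.
year_cols <- setdiff(names(untidy), "country")

# Build the tidy (long) version: one row per country-year combination.
tidy <- data.frame(
  country = rep(untidy$country, times = length(year_cols)),
  year    = rep(as.integer(year_cols), each = nrow(untidy)),
  value   = unlist(untidy[year_cols], use.names = FALSE)
)

print(tidy)
```

In the reshaped table, `year` is an ordinary column, so filtering or grouping by year no longer requires touching the column names. If the tidyr package is available, `pivot_longer()` with its `cols`, `names_to`, and `values_to` arguments is a common way to do the same reshaping; the base R version is used here only to keep the sketch free of dependencies.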
