From the course: Cleaning Bad Data in R

Suspicious values

- [Instructor] As you perform data cleaning, there are some special suspicious values that you should watch out for. The presence of these values doesn't necessarily mean that your data is incorrect, but if you see them in many places, you should view them with suspicion. I'm going to review a set of common suspicious values developed by the data science community. Many of these come from the Quartz guide to bad data, which has an excellent exploration of data cleaning issues. The first type of suspicious value stems from the way that computers store data. You probably know that computers store data in binary form, using a sequence of ones and zeros. Each digit in the binary number is called a bit. When you create a numeric variable, you allocate a defined number of bits to store that value. The number of bits that you allocate limits the largest number that you can store in that variable. For example, imagine that we have a two-bit variable. That allows us to have two digits, either…
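
The link between bit count and the largest storable number can be sketched in a few lines of R. This is only an illustration, assuming unsigned values (no bit reserved for a sign); `max_unsigned()` is a hypothetical helper written for this example, not a base R function.

```r
# Largest unsigned value that fits in a given number of bits: 2^bits - 1.
# max_unsigned() is a hypothetical helper for illustration, not part of base R.
max_unsigned <- function(bits) {
  2^bits - 1
}

max_unsigned(2)   # 3: a two-bit variable can hold 0 through 3
max_unsigned(8)   # 255
max_unsigned(16)  # 65535

# R's integer type is a 32-bit signed integer: one bit carries the sign,
# so the largest value matches what 31 unsigned bits can hold.
.Machine$integer.max                      # 2147483647
.Machine$integer.max == max_unsigned(31)  # TRUE
```

A value that sits exactly at one of these limits, such as 255, 65535, or 2147483647, may be a sign that the original number was too large for its variable, so it is worth a second look when it appears repeatedly in a dataset.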
