From the course: Data Science Foundations: Data Assessment for Predictive Modeling
What is dummy coding?
- [Instructor] What is dummy coding? Dummy coding is a special kind of data preparation that most software runs automatically on your categorical variables, both nominal and ordinal. We won't focus on the modeling aspect of it right now, just on why you need to be aware of it while assessing your data. Take a nominal variable like marital status. We have four categories. These four categories get converted into four new true-or-false variables. For instance, the new variable, married, is true when the old variable, marital status, is equal to married. Why worry about this? Well, again, most software packages and analytics programming languages do this automatically during the modeling phase, inside the modeling algorithms. If you don't understand that, it's going to complicate interpreting the results. Also, when you go to deploy those same models, you need to make sure that you are creating these new variables…
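The conversion described in the transcript can be sketched in a few lines of plain Python. This is only an illustration, not the instructor's code or any particular package's implementation: the transcript names just one category (married), so the other three labels, the `marital_status_` column prefix, and the `dummy_code` helper are all assumptions made up for the example.

```python
# Hypothetical category labels: the transcript names only "married";
# the other three are assumptions for illustration.
CATEGORIES = ["married", "single", "divorced", "widowed"]


def dummy_code(value, categories=CATEGORIES):
    """Return one True/False indicator per category for a single value."""
    if value not in categories:
        # A value never seen when the category list was fixed cannot be encoded.
        raise ValueError(f"unseen category: {value!r}")
    return {f"marital_status_{c}": (value == c) for c in categories}


# A small made-up column of marital status values.
records = ["married", "single", "married", "widowed"]
encoded = [dummy_code(v) for v in records]

for original, row in zip(records, encoded):
    print(original, row)
```

Each original value becomes four true-or-false variables, exactly one of which is true. Fixing the category list up front is one way to make sure the same new variables get created at deployment time as during modeling; raising an error on an unseen value is a design choice for this sketch, not the only option. In practice, libraries offer this directly, for example `pandas.get_dummies`.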
Contents
- Reviewing basic concepts in the level of measurement (3m 15s)
- What is dummy coding? (2m 31s)
- Expanding our definition of level of measurement (5m 44s)
- Taking an initial look at possible key variables (2m 51s)
- Dealing with duplicate IDs and transactional data (3m 49s)
- How many potential variables (columns) will I have? (4m 53s)
- How to deal with high-order multiple nominals (2m 30s)
- Challenge: Identifying the level of measurement (1m 39s)
- Solution: Identifying the level of measurement (3m 59s)