From the course: Data Science Foundations: Data Assessment for Predictive Modeling
Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
- [Instructor] For the purpose of this course, we will use the terms data assessment and data understanding interchangeably. But if data assessment is used in the title of the course, why risk confusing it with the CRISP-DM term data understanding? Well, out of context, it's not always clear what data understanding means. More importantly, others have tried to improvise on the CRISP-DM theme, especially the diagram, and when they have, they've also introduced many competing terms. However, they always include something like the data understanding phase. Let's look at two of the better-known attempts to do this. I always thought IBM's spin on the diagram was intriguing. IBM has come up with ASUM-DM, which is an acronym for Analytics Solutions Unified Method for Data Mining/Predictive Analytics. IBM makes it a bit tricky to read up on ASUM-DM: you have to register on the website. And it's evolved quite a bit over the few years that it's been around. This blog post was written by an IBMer at around the time that ASUM-DM was first making its appearance, and I think the diagram in its original form is the most intriguing. Here it is. Notice this interesting figure-eight diagram. Clearly influenced by CRISP-DM, it starts with business understanding, and then you see data discovery and data wrangling. Very similar indeed. But here's the interesting part: the other half of the figure eight is a completely different cycle. What IBM is identifying here, and I've observed this to be true, is that the modeling team and the deployment team are often kept separate. So again, quite intriguing. Of course, IBM isn't the only one to come up with modifications; there are several. The variation that is perhaps getting the most traction at the moment is Microsoft's Team Data Science Process. Let's take a look at their diagram. Here's the data science lifecycle.
Again, it's clearly influenced by CRISP-DM, starting with business understanding and then moving to a phase called data acquisition and understanding, followed by modeling and deployment. Notice that the nature of the feedback loops is rather different. An important difference between this one and some others is that it is heavily documented, and the documentation is not difficult to access. If you download the PDF from Microsoft's website, you'll find a 400-page document. So why so long? Well, unlike CRISP-DM, it refers to specific technologies, and that takes up additional pages. So why stick with CRISP-DM? Polls of working data scientists on this subject are infrequent, but they consistently show that if there is a process a data scientist is familiar with, it's probably CRISP-DM. The authors of CRISP-DM also really did their homework: it was a three-year effort. Because CRISP-DM is tool- and technology-neutral, the document doesn't go out of date and isn't unwieldy. So I read the alternatives when they come out, and I find them interesting, but when working on a project myself or with clients, I still recommend CRISP-DM.