From the course: Data Science Foundations: Data Assessment for Predictive Modeling

Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP

- [Instructor] For the purpose of this course, we will use the terms data assessment and data understanding interchangeably. But if data assessment is used in the title of the course, why risk confusing it with the CRISP-DM term data understanding? Well, out of context, it's not always clear what data understanding means. More importantly, others have tried to improvise on the CRISP-DM theme, especially the diagram, but when they have, they've also introduced many competing terms. However, they always include something like the data understanding phase. Let's look at two of the more well-known attempts to do this.
I always thought IBM's spin on the diagram was intriguing. IBM has come up with ASUM-DM, which is an acronym for Analytics Solutions Unified Method for Data Mining/Predictive Analytics. IBM makes it a bit tricky to read up on ASUM-DM: you have to register on the website. And it's evolved quite a bit over just the few years that it's been around. This blog post was written by an IBMer at around the time that ASUM-DM was first making its appearance. And I think that the diagram, as it first showed up, is the most intriguing. Here it is. Notice this interesting figure-eight diagram. Clearly influenced by CRISP-DM, it starts with business understanding, then you see data discovery and data wrangling. Very similar indeed. But here's the interesting part. The other half of the figure-eight diagram is a completely different cycle. What IBM is identifying here, and I've observed this to be true, is that the modeling team and the deployment team are often kept separate. So again, quite intriguing.
Of course, IBM isn't the only one to come up with modifications; there are several. The variation that is perhaps getting the most traction at the moment is Microsoft's Team Data Science Process. And here it is. Let's take a look at their diagram. Here's the data science lifecycle.
Again, it's clearly influenced by CRISP-DM, starting with business understanding, and then going to a phase called data acquisition and understanding, then deployment and modeling. So notice the nature of the feedback loops is rather different. Now, an important difference between this one and some others: it is heavily documented, and it's not difficult to access. If you download the PDF from Microsoft's website, you'll find a 400-page document. So why so long? Well, because unlike CRISP-DM, it refers specifically to technology, and that takes up additional pages.
So why stick with CRISP-DM? Well, polls of working data scientists on this subject are infrequent, but they consistently show that if there is a process that the data scientist is familiar with, it's probably CRISP-DM. The authors really did their homework when they wrote CRISP-DM as well. It was a three-year effort. The fact that it is tool and technology neutral is why the document does not get out of date, and why it's not unwieldy. So I read the alternatives when they come out, and I find them interesting, but when working on a project myself, or with clients, I still recommend CRISP-DM.
