From the course: Machine Learning with ML.NET

Build an ML model for GitHub issue classification - .NET Tutorial

- In this video, we will look at how to build a custom machine learning model for GitHub issue classification. Let's look at an example of what GitHub issue classification is. Imagine that you are the owner of a GitHub repo where you have your source code, and users are entering issues. A common problem is deciding what tags to apply to the issues that users are opening. So in this case we'll build a custom machine learning model for classifying issues into different tags based on the title of the issue and its description. These two inputs are known as features, and they are going to be used to train the machine learning model. The output, or the column to be predicted, is going to be the label, or the tag that you want to apply to a particular issue. This is a common problem that lots of developers have, where they have lots of repos and it's hard to tag them. So we'll build a custom machine learning model for this scenario.
GitHub issue classification is an example of a multi-class classification machine learning task, where you classify an issue into one of many categories. In this case, you want to classify an issue into A, B, or C, which means deciding whether a bug belongs to System.Web, ASP.NET, security, infrastructure, or performance. So you choose one of the many labels to apply to a particular issue.
So let's look at how we can build a custom machine learning model for GitHub issue classification. I'm going to switch over to my application, which is a console app, and I'm going to build a custom machine learning model. I have already downloaded Model Builder, which is a UI tool for building machine learning models, and I'm going to launch it by right-clicking the project and choosing Add, then Machine Learning. I can choose from any of the scenarios. Since I'm doing issue classification, where I classify data into three or more categories, I'm going to choose this template.
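To make the idea of features and a label concrete, here is a minimal sketch written in plain Python rather than ML.NET. It is not what Model Builder does internally. The sample issues, the `featurize` helper, and the simple word-count representation are all assumptions made for illustration. It shows the title and description being combined into one set of features, with the area as the label to predict.

```python
import re
from collections import Counter

def featurize(title: str, description: str) -> Counter:
    """Combine both text inputs into one bag-of-words feature vector."""
    words = re.findall(r"[a-z0-9]+", f"{title} {description}".lower())
    return Counter(words)

# Hypothetical training rows: (area label, title, description).
issues = [
    ("area-System.Net", "HTTP request hangs", "The socket never times out"),
    ("area-System.Data", "SQL query fails", "Connection pool is exhausted"),
    ("area-Infrastructure", "Build is broken", "The CI script fails on restore"),
]

# Each example pairs the features (inputs) with the label (output to predict).
examples = [(featurize(title, desc), area) for area, title, desc in issues]

# Multi-class classification: every issue gets exactly one of these labels.
labels = sorted({area for _, area in examples})

for feats, area in examples:
    print(area, dict(feats))
```

The point of the sketch is only the shape of the data: two text columns go in, one category comes out.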
The second step in the machine learning process is loading your data and choosing what column you want to predict. In this case, I can choose File or SQL Server, and I'll upload a file, which is my training dataset. What I've done is export all my issues from GitHub, taking their area, title, and description, and I'm going to use that to train my machine learning model. You can see some of the data here: given a title and a description, an issue belongs to area-System.Net or area-Infrastructure. So I'll choose Area as the column to predict, also called the output column or the label column, which is going to be the result of calling this machine learning model.
Now that I've loaded the data and chosen my column to predict, the next step in this process is to train the model. Generally, the longer I train the model, the better the model I will get. So in this case, I'm going to choose 100 seconds and start training. What happens here is that Model Builder uses automated machine learning under the covers. Automated machine learning explores different models with various settings, and it evaluates the performance of each of these models to give me the best model for my scenario. So as a developer, I don't have to worry about learning the intricacies of machine learning and how to fine-tune performance.
So let's see what kind of performance we get for this scenario. So far, automated machine learning has found a best model, AveragedPerceptronOva, with an accuracy of 71%. When we see the status "finalizing model," that means model training has finished and the model is now being generated. Once it finishes, the training process has completed, and I can go ahead and evaluate the model. In this case, I get a summary of the machine learning task, my column to predict, the best model, the best accuracy, and the other models explored.
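The training step can be pictured with a small sketch, again in plain Python and not ML.NET. It assumes a toy dataset, a hand-written averaged perceptron trained one-versus-all (the kind of model named in the result above), and a hypothetical `search` function that tries a few settings until a time budget runs out. The real automated machine learning search is far more sophisticated. This only illustrates the idea of exploring settings and keeping the most accurate model.

```python
import time
from collections import Counter

BIAS = "<bias>"  # constant feature so each binary model can learn an offset

def featurize(text):
    feats = Counter(text.lower().split())
    feats[BIAS] = 1
    return feats

def dot(weights, feats):
    return sum(weights.get(tok, 0.0) * n for tok, n in feats.items())

def train_ova(rows, labels, epochs):
    """Train one averaged perceptron per label (one-versus-all)."""
    models = {}
    for target in labels:
        w, total, steps = {}, {}, 0
        for _ in range(epochs):
            for feats, label in rows:
                y = 1 if label == target else -1
                if y * dot(w, feats) <= 0:  # mistake: nudge weights toward y
                    for tok, n in feats.items():
                        w[tok] = w.get(tok, 0.0) + y * n
                for tok, value in w.items():  # accumulate for averaging
                    total[tok] = total.get(tok, 0.0) + value
                steps += 1
        models[target] = {tok: value / steps for tok, value in total.items()}
    return models

def predict(models, feats):
    """Pick the label whose binary model gives the highest score."""
    return max(models, key=lambda label: dot(models[label], feats))

def accuracy(models, rows):
    return sum(predict(models, feats) == label for feats, label in rows) / len(rows)

def search(train, valid, labels, budget_seconds, candidates=(1, 5, 20)):
    """Try each epoch setting until the time budget is spent; keep the best.
    The first candidate is always trained so there is a model to return."""
    deadline = time.monotonic() + budget_seconds
    best = None
    for epochs in candidates:
        if best is not None and time.monotonic() >= deadline:
            break
        models = train_ova(train, labels, epochs)
        acc = accuracy(models, valid)
        if best is None or acc > best[0]:
            best = (acc, epochs, models)
    return best

# Made-up issues for illustration only.
train = [(featurize(text), area) for text, area in [
    ("http request hangs on socket", "area-System.Net"),
    ("socket timeout in http client", "area-System.Net"),
    ("sql connection pool exhausted", "area-System.Data"),
    ("sql query returns wrong rows", "area-System.Data"),
    ("ci build script fails", "area-Infrastructure"),
    ("nightly build is broken", "area-Infrastructure"),
]]
valid = [(featurize("http socket error"), "area-System.Net"),
         (featurize("sql query is slow"), "area-System.Data")]
labels = sorted({area for _, area in train})

best_accuracy, best_epochs, best_models = search(train, valid, labels, budget_seconds=1.0)
print(f"best: epochs={best_epochs}, accuracy={best_accuracy:.0%}")
```

A larger budget lets the loop reach more candidates, which is the same reason a longer training time can surface more models in the summary.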
In this case, AutoML was only able to find one model within the specified time frame, so only one model is listed. But if I were to increase the time, it would find other models as well.
The last step in the machine learning workflow is consuming the model. Here I can click the Add Projects button to add projects for consuming the model, and I can also learn how the model was trained: ModelBuilder.cs has the code for how the model was actually trained. I'm going to add a reference to the class library to consume this model in my console app. So let's go ahead and add a reference to the class library. This allows me to use the classes from that project, and I'm going to use this code to consume the model in my application. I'm going to add it to Program.cs. Since I added the reference to the class library, I'm going to add these using statements to consume the classes from the library. Then I'm going to add my input data, which is my input.Title, to figure out what category this title belongs to. So the title is "This is ASP.NET," and I can set input.Description too. So this is my description. Then I'm going to output the result of calling the model on my sample data, so I'm going to print result.Prediction, and then I'm going to run this application.
What happens when you run this application is that the MLContext gets created, your model gets loaded, the prediction engine gets created, and then you can call the Predict function on your sample data. In this case, for my sample input of "This is ASP.NET," the machine learning model predicted the output label to be area-System.Data. So I can quickly use this to play around with other inputs, like title and description, to figure out what category they belong to.
In this video we looked at how to build a custom machine learning model for GitHub issue classification. And in the next video, we will look at how to predict taxi fares.
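The consume step described in this video follows a load-then-predict sequence: load the saved model, create a prediction engine, then call predict on an input with a title and a description. The sketch below mirrors that sequence in plain Python. It is not the generated ML.NET code. The `ModelInput`, `ModelOutput`, and `PredictionEngine` classes, the JSON model file, and the hand-picked weights are all hypothetical stand-ins.

```python
import json
import os
import tempfile
from collections import Counter
from dataclasses import dataclass

@dataclass
class ModelInput:
    title: str
    description: str

@dataclass
class ModelOutput:
    prediction: str

class PredictionEngine:
    """Wraps loaded per-label word weights and scores one input at a time."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, item: ModelInput) -> ModelOutput:
        feats = Counter(f"{item.title} {item.description}".lower().split())
        def score(label):
            return sum(self.weights[label].get(tok, 0.0) * n for tok, n in feats.items())
        return ModelOutput(prediction=max(self.weights, key=score))

def load_model(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

# Stand-in for a trained model file (weights chosen by hand, not learned).
toy_weights = {
    "area-System.Net": {"http": 1.0, "socket": 1.0},
    "area-System.Data": {"sql": 1.0, "database": 1.0},
}
with tempfile.TemporaryDirectory() as folder:
    model_path = os.path.join(folder, "model.json")
    with open(model_path, "w", encoding="utf-8") as handle:
        json.dump(toy_weights, handle)
    engine = PredictionEngine(load_model(model_path))  # load model, create engine

sample = ModelInput(title="SQL timeout", description="database query hangs")
result = engine.predict(sample)
print(result.prediction)  # prints "area-System.Data"
```

Changing the title and description on `sample` and rerunning is the same kind of quick experiment described above for trying other inputs.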
