From the course: Supervised Learning Essential Training

Defining logistic and linear regression - Python Tutorial


- [Instructor] Let's imagine we just launched a hot new e-commerce store and we're comparing sales to our digital marketing costs. As part of our marketing campaign, we've pushed Facebook, Instagram, and Twitter ads and are starting to see sales come through. If we plot our monthly sales on one axis and our monthly ad spend on another, we should be able to fit a straight line, or function, through the data that represents the relationship between our ad spend and sales.

This common business problem illustrates linear regression well: the dependent variable, the number of sales, changes when the independent variable, marketing spend, is manipulated. Linear regression is used to model continuous values like, in this case, number of sales. Other examples are someone's height, the weight of your pet, or the time it takes a race car to make a loop around a track.

Really, linear regression is just a math function that identifies the relationship between one dependent variable, like the likelihood of survival for a patient, and one or more independent variables, like the patient's vital information. Independent variables, like vital sign information, are unchanged by other variables, whereas the dependent variable, the survival outcome, depends on the other variables, at least within the model.

Linear regression is a simple statistical model that requires data to meet specific assumptions. It assumes that there's a linear relationship between the dependent variable and at least one independent variable. Both independent and dependent variables should be continuous, so numeric. If the formula for linear regression looks familiar, that's because it's just the slope formula, y = mx + b, which you may have seen in a math course. If it doesn't look familiar, that's okay, too. For problems where two or more independent variables influence a dependent variable, we can use multiple linear regression. Both linear and logistic regression revolve around the topic of decision boundaries.
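To make the ad-spend example concrete, here is a minimal sketch of fitting that straight line in Python. The monthly figures below are made up purely for illustration, and `np.polyfit` with degree 1 is just one of several ways to do an ordinary least-squares line fit:

```python
import numpy as np

# Hypothetical monthly figures (invented for illustration):
# ad spend in thousands of dollars, and number of sales.
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([120, 190, 310, 390, 510])

# Fit a straight line, sales = slope * ad_spend + intercept,
# using ordinary least squares (polynomial fit of degree 1).
slope, intercept = np.polyfit(ad_spend, sales, deg=1)

# Predict sales for a new ad budget by reading off the fitted line.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, predicted={predicted:.1f}")
```

The fitted slope tells us roughly how many extra sales each additional thousand dollars of ad spend is associated with, which is the relationship the plot was meant to reveal.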
For linear regression, if we plot our data on a 2D graph, what our model is trying to do is draw a line that fits our linear data as closely as possible, so new values can be easily predicted on this line. For example, if we plot the number of umbrellas sold against the frequency of rainy days, we should observe a linear relationship.

With logistic regression, however, we're not using a straight line to draw our boundaries. We're using a sigmoid function. The sigmoid function looks just like an S-curve between zero and one. Since our data has two types of observations, it's easier to compute a sigmoid function that separates the two classes by predicting values between zero and one, which we can read as probabilities of belonging to one class.

So when you're thinking about how to solve an ML problem that can use regression, be sure to differentiate whether you need linear or logistic regression. Both of these techniques have assumptions around them that we need to check for. Let's explore those next.
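The S-curve described above is easy to see in code. This small sketch defines the sigmoid function and evaluates it at a few points to show that every input, no matter how large or small, is squeezed into the range between zero and one:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate at a spread of inputs to trace out the S-curve.
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z)
print(probs)
# Large negative inputs approach 0, large positive inputs approach 1,
# and sigmoid(0) is exactly 0.5 -- the decision boundary between the
# two classes in logistic regression.
```

Thresholding these outputs at 0.5 is what turns the continuous S-curve into a binary class prediction.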
