From the course: Data, Economic Modeling, and Forecasting with Stata

Standard deviation and SE - Stata Tutorial

- [Jason] You'll notice here that there's a standard deviation in these summarize commands and a standard error in the mean commands. Well, what does that mean? First of all, the standard deviation is not the standard error. They are not the same thing.

The standard deviation shows the distribution of data around the mean. On the left, you can see the mean with a distribution of readings that are not too far away. On the right, you see a mean, but the readings are spread much further from it. The one on the left has a lower standard deviation. The one on the right has a higher standard deviation because the deviations of the readings are much bigger than they are on the left. You should also know that the standard deviation is represented by sigma, and there's an equation for it. We're not going to go any further than that, but that's the equation for the standard deviation.

The standard error, which is sometimes called the standard error of the mean, shows how far your sample mean is likely to be from the population mean. As the sample size increases, the standard error narrows around the mean. And I'll explain all that in just a second. But most importantly, equation-wise, the standard error, which is sometimes written out as the standard error of the mean, or SEM, and is also sometimes shown as sigma of x-bar, is this equation here. Essentially, you're taking the standard deviation and putting it over the square root of the number of readings in your sample. This is important because it indicates how the precision of a model can improve with more data.

On the left, we see a mean, but there aren't too many readings. This is going to have a much higher standard error than on the right, where there's the mean and a lot more readings. There are a few different examples of this. Like if you wanted to find out the average education level in the United States, you probably wouldn't survey four people; you'd want to survey millions of people.
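The two equations pointed to on the slides are not reproduced in the transcript. As a sketch of what they most likely show, here are the standard textbook forms. The symbols are assumptions: x_i is one reading, x-bar is the sample mean, n is the number of readings, and s is the sample standard deviation with an n - 1 denominator (the slide may instead show the population form, with sigma and N).

```latex
% Sample standard deviation (assumed n - 1 form)
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}

% Standard error of the mean: the standard deviation over the square root of n
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}
```

Because n sits under a square root in the denominator, cutting the standard error in half takes about four times as many readings.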
And if you wanted to know people's income levels, you'd do the same. You want as many readings as possible in order to reduce the error, to make sure that the mean you derive is actually close to what the true mean is. This is why you take the standard deviation and put it over the square root of n, which is the number of readings. The more responses and readings you have in your data, the smaller the standard error will be. The bottom number gets bigger here. And as this gets bigger, that standard error of the mean goes down. And whether we're talking about things like income level or education or anything else where a bigger sample of answers will get you closer to the real mean, that's where the standard error is the smallest.

This is also why, as we're looking at aluminum prices, we wanted to have a pretty robust data series. And 160 months of aluminum price and manufacturing data is pretty good. We're probably going to have a much lower standard error than if we had, I don't know, four months of aluminum prices, or even 24 months of aluminum prices and manufacturing data. In this case, we see how more data, more good data, is helpful for reducing the standard error.

The big, important takeaways here are that standard deviation and standard error are different. Standard error is smaller than standard deviation, for any sample with more than one reading, because you're putting standard deviation over that square root of n. The standard deviation reflects a range of values and how wide apart they are, whereas the standard error reflects variation in the estimate of the mean and how accurate that mean is likely to be. As we look at the data, for example, for that sum PMI number, we see that the standard error is, no surprise, much smaller than the sum PMI standard deviation. It's a lot smaller. The standard deviation is 11.37. The standard error is less than one. This is a good sign that we have a solid sample size, and that standard error is just much, much smaller.
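To make the arithmetic concrete, here is a minimal Python sketch (Python rather than Stata, so it can be run anywhere). It computes a standard deviation and a standard error for a small sample, then divides the quoted standard deviation of 11.37 by the square root of a few sample sizes. The `readings` list is made up for illustration. Treating 160 months as the number of PMI readings is an assumption based on the series length mentioned above, not a figure confirmed by the Stata output.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical readings, invented for illustration (not the course data).
readings = [48.0, 52.0, 50.0, 54.0, 46.0]
sd = statistics.stdev(readings)   # spread of the readings around their mean
se = standard_error(readings)     # precision of the mean itself

# SD quoted in the transcript. The sample sizes are the 4, 24, and 160 months
# mentioned in the text; applying 160 to the PMI series is an assumption.
PMI_SD = 11.37
se_by_months = {n: PMI_SD / math.sqrt(n) for n in (4, 24, 160)}

print(f"sample SD = {sd:.3f}, sample SE = {se:.3f}")
for n, value in se_by_months.items():
    print(f"n = {n:>3}: SE = {value:.2f}")
```

With the same standard deviation, the standard error falls from about 5.7 at four months to about 2.3 at 24 months and about 0.9 at 160 months, which is consistent with the "less than one" figure quoted for PMI.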
This is why, whenever you're doing analysis, one of the best things you can do is make sure you have a large enough sample size and enough good data to support it.
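One rough way to put a number on "large enough" is to rearrange the standard error formula and solve for n. The sketch below does that in Python. The function name `readings_needed` and the target values are hypothetical, and it assumes the standard deviation stays at the quoted 11.37 as more data comes in, which is a simplification.

```python
import math


def readings_needed(sd, target_se):
    """Smallest n for which sd / sqrt(n) is at or below target_se."""
    if sd <= 0 or target_se <= 0:
        raise ValueError("sd and target_se must be positive")
    return math.ceil((sd / target_se) ** 2)


PMI_SD = 11.37  # standard deviation quoted in the transcript

# Hypothetical precision targets, chosen only for illustration.
n_for_one = readings_needed(PMI_SD, 1.0)
n_for_half = readings_needed(PMI_SD, 0.5)

print(f"SE <= 1.0 needs at least {n_for_one} readings")
print(f"SE <= 0.5 needs at least {n_for_half} readings")
```

Under these assumptions, a standard error of one needs 130 readings and a standard error of one half needs 518, so halving the error takes roughly four times the data.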
