From the course: Google Analytics: Spam Proofing

What spam looks like

- [Instructor] Before we get into spam-proofing your analytics, I want to show you what it is and what it looks like. The reason I'm so passionate about this topic is that I've had a lot of spam hit my personal site, and I wasted a lot of time tracking down false leads. So if we look at my site from the last year, we don't see anything too out of the ordinary, but I was actually being hit by thousands of spam bots. The first spammy thing I noticed was the language, and this is called language spam. What's important here is that almost six percent of my data is invalid. These visits right here were not actual traffic. They were sent via a spam bot.
Now to show you how this happens, let's talk about how Google Analytics is supposed to work. A user will access your site. Your web server will send a webpage to that user, and included in that page is the Google Analytics JavaScript tracking code. The user's browser executes that code and sends that data directly to a Google Analytics server.
So spam comes from two sources. The first is crawler spam, and that's when a bot actually visits your site and either accidentally or purposefully executes the tracking code and sends data to Google Analytics. This is the easiest type of spam to understand because it's just like how a user would use your site. But this isn't actually the biggest source of spam. The vast majority of spam, and what we're going to focus on in this course, is called ghost spam, and it's called ghost spam because it's not even there. The spammers don't actually have to visit your site. They send fake data directly to Google Analytics. They might randomly generate your Google Analytics tracking ID and send fake traffic without even knowing who you are or what your website is about.
People try all sorts of things to combat spam. The problem is that most of the spam never touches your site, so if you try to block bots with an .htaccess file, that won't work.
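To make the difference between crawler spam and ghost spam concrete, here is a minimal Python sketch. It assumes you have exported two lists yourself: the hits that Google Analytics recorded and the requests your web server actually logged. The function name, the dictionary keys `date` and `path`, and the sample data are all hypothetical, and matching on date and path alone is a deliberate simplification. The idea is only that a hit with no matching request on your server never touched your site, which is the signature of ghost spam.

```python
def find_ghost_hits(analytics_hits, server_requests):
    """Return analytics hits that have no matching request in the server log.

    Both arguments are lists of dicts with hypothetical keys "date" and "path".
    A hit that the web server never served is a ghost spam candidate.
    """
    # Every (date, path) pair the web server really served.
    served = {(r["date"], r["path"]) for r in server_requests}
    # Keep only the analytics hits that the server never saw.
    return [h for h in analytics_hits if (h["date"], h["path"]) not in served]


# Made-up sample data for illustration only.
server_requests = [
    {"date": "2016-11-01", "path": "/"},
    {"date": "2016-11-01", "path": "/blog/"},
]
analytics_hits = [
    {"date": "2016-11-01", "path": "/"},          # real visit (or crawler spam)
    {"date": "2016-11-01", "path": "/fake-page"},  # never requested from the server
]

ghosts = find_ghost_hits(analytics_hits, server_requests)
print(ghosts)
```

Note that crawler spam would pass this check, because a crawler really does request the page. That is also why server-side blocking can stop crawlers but does nothing against ghost spam.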
We're going to cover filters within Google Analytics in this course. Those filters will block bad data no matter where it comes from.
Language spam isn't the only thing that can happen. Spam can also send you fake referrals. I'll go to Acquisition, Traffic, and Channels. On this page I was checking out how people browsed my site, and I was comparing this to the previous month. I noticed that social and referral had huge growth, 200%, from the month before. If I click on Referral, I'll see Lifehacker at the top. Zero visits from the month before, and 369 in that month. Now I'm a huge fan of Lifehacker, so when I initially saw this, I was ecstatic. I thought one of my articles was picked up by their site. Amazing. Except, this isn't actually Lifehacker. Notice this K right here. It's technically different from the K that we know. So this is the K that we're all familiar with. This is a special, different type of K that looks very similar but is technically a different character, which means you can create a domain with that character. If we hit that domain, it's very likely a spam site. The other one that you'll often see up here is google, with a small-capital G, similar to this K.
Now if we scroll back up, we can see that there is a huge spike in traffic. This is quite unusual. If you see huge spikes like this, either you got mentioned on a massive site, or it's likely spam. Now in this case, the spam was basically over two days. If you see a huge increase over two days, and then it completely goes away, that's likely spam. If you see huge growth and a gradual decline, that's more likely to be legitimate traffic.
I want to show you something else that tripped me up. If I go back to Channels, and then Social, it looks like Reddit linked to me. I went from one visit the previous month to over 600 this month. My first guess would be that someone mentioned my site on a popular subreddit, and we're internet famous.
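The lookalike K in the fake Lifehacker referral can be caught with a simple check for characters outside the plain ASCII range. The following Python sketch is one way to do that, not the method used in the course. The function names are hypothetical, and the specific characters in the examples (U+0138 for the K and U+0262 for the G) are assumptions chosen for illustration, since the transcript does not say which characters the spam domains used.

```python
import unicodedata


def suspicious_chars(hostname: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in hostname
        if ord(ch) > 127  # anything beyond plain ASCII deserves a second look
    ]


def looks_spoofed(hostname: str) -> bool:
    """Flag a referrer hostname that contains any non-ASCII character."""
    return bool(suspicious_chars(hostname))


# Assumed examples: U+0138 resembles "k", U+0262 resembles a small "G".
referrers = ["lifehacker.com", "lifehac\u0138er.com", "\u0262oogle.com"]

for host in referrers:
    print(host, looks_spoofed(host), suspicious_chars(host))
```

Treat a match as a prompt to look closer rather than as proof of spam, because legitimate internationalized domain names also contain non-ASCII characters.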
But of course, since you're watching this course, you know that it's spam. Let me increase the date range and I'll show you just how much traffic I got from this. So if I extend this into the middle of December, we can see that it's 900 visits. If we go to Referrals over here on the side, we can see exactly where it came from. I'll click into Reddit, and we can go to this exact article. Now Reddit has already locked this down. This was several months ago, but there was a link, and that led to a spammy website. So this is slightly different than the example before because this is a legitimate website with a spam link embedded into it.
So there are a lot of bad things about spam. You could accidentally go to a fake website, you could go to a legitimate website with a spammy link, and most importantly, you could have hundreds or thousands of incorrect data points. Just right here, I have 900 incorrect data points on landing pages, exit pages, languages, and incorrect activity on the site. Throughout this course, we're gonna focus on preventing spam from hitting your site in the first place.
