
How to stop referer spam in Google Analytics

If you are actively monitoring your Google Analytics and notice visits from a number of different domains, but the Average Session Duration for these hits is 00:00:00, then in all likelihood your site, or rather your Google Analytics tracking code, has unfortunately attracted the attention of what are commonly known as referer spammers, or fake traffic.
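If you export your referral report, you can shortlist suspect sources programmatically. The sketch below is a minimal illustration. It assumes you have exported the report into rows of source, sessions and average session duration in seconds. The `ReferralRow` type and the `min_sessions` threshold are hypothetical and are not part of any Google Analytics API.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    """One row of a hypothetical exported referral report."""
    source: str
    sessions: int
    avg_session_seconds: float


def suspicious_referrers(rows, min_sessions=5):
    """Return sources with enough sessions and a 00:00:00 average session duration.

    Heuristic only: a genuine referral can also average zero seconds
    (e.g. every visit was a single-page bounce), so review the shortlist by hand.
    """
    return sorted(
        row.source
        for row in rows
        if row.sessions >= min_sessions and row.avg_session_seconds == 0
    )


report = [
    ReferralRow("news.example.org", 120, 95.0),
    ReferralRow("somefakewebsite.com", 40, 0.0),
    ReferralRow("another-spam.example", 12, 0.0),
    ReferralRow("rare-visitor.example", 1, 0.0),
]
print(suspicious_referrers(report))  # ['another-spam.example', 'somefakewebsite.com']
```

The threshold skips sources with only a handful of sessions, where a zero average is more likely to be chance than spam.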

Introduction to Fake Traffic

Fake traffic is primarily defined as fake hits that are sent to your Google Analytics (GA) property without the sender ever actually visiting your site.

A ‘hit’ is a user interaction with your website that would normally result in data being sent to your GA property, e.g. a pageview, screenview, event or transaction.

A fake hit is usually generated by a malicious program, or bot, that has not visited and browsed your site. Instead, it has previously crawled your site and extracted your Google Analytics ID.
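Your tracking ID is not a secret. It sits in plain text in the source of every page that carries the tracking code, which is why it is so easy to harvest. The Python below is a minimal sketch that searches an HTML string for Universal Analytics style IDs (UA-account-property). The regular expression is our own approximation of the ID format, and `find_tracking_ids` is a hypothetical helper. Run it against your own page source to see what a crawler would see.

```python
import re

# Our approximation of a Universal Analytics ID: UA-<account digits>-<property digits>.
# Assumption: GA4 "G-..." measurement IDs are not covered by this pattern.
UA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_tracking_ids(html: str) -> list[str]:
    """Return the unique UA-style tracking IDs found in a page's HTML, in order of appearance."""
    seen = []
    for match in UA_ID_PATTERN.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


page = """<script>
  ga('create', 'UA-12345678-1', 'auto');
  ga('send', 'pageview');
</script>"""
print(find_tracking_ids(page))  # ['UA-12345678-1']
```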

This often results in spammers sending fake referral traffic, fake organic traffic or even fake social media traffic to your GA property. They can fake events, pageviews, hostnames, keywords and so on.

One of the primary purposes of this type of spam is to inflate or deflate the figures in your GA property. In some cases, the data is distorted to the point where it becomes unusable. In effect, spammers can write data into your analytics without ever gaining access to your GA account.

This is a data security risk of which many people are unaware.

Although Google has done a lot of work to minimize the disruption caused by fake hits, some additional configuration within Google Analytics is still required to help prevent them from polluting your reports.

The example below highlights what fake hits typically look like within your GA property.

Fake Hits in Google Analytics

Who benefits from sending Fake Traffic?

One of the most common reasons spammers use referral spam or fake traffic is to generate traffic to their own sites or affiliated sites. Even black hat SEO professionals use referer spam to bloat the traffic stats of web properties.

A typical scenario, as highlighted in the image above, goes something like this. The spammer generates a number of hits to a website using a URL they would like to drive traffic to, e.g. somefakewebsite.com. Anyone monitoring their GA will then notice a sudden rise in traffic originating from that domain. This usually piques the user's interest, and they may browse to the website to explore further.

This often results in a spike of ‘legitimate’ traffic to somefakewebsite.com. However, this is not always the end goal. In some cases, the unsuspecting visitor may be redirected to an affiliate site and duped into accepting a cookie. As a result, they start receiving PPC adverts for a particular product or service, or generate PPC impression counts for the spammer.

In short, it is all about the financial gain for the spammer.

During both the US presidential campaign and the UK Brexit vote, a lot of referer spam was used as a propaganda tactic, along with one of its derivatives, language spam. It was used to entice or influence people, for example to vote for Trump.

Malicious Bots

We have previously posted that as much as 48% of all traffic on the internet is bot activity. Bots are software programs developed to perform repetitive tasks with a high degree of accuracy and speed.

Bots are used legitimately by search engines like Google, Yahoo and Bing to crawl websites and index their contents. However, they are also used for malicious purposes, including the following:

  • Commit click fraud
  • Harvest email addresses
  • Create fake user accounts
  • Submit spam comments
  • Scrape web content
  • Distribute malware
  • Scrape Google Analytics IDs
  • Send referer spam

Common Misconceptions About Referer Spam

If you have made any of the common mistakes below, then you should undo these changes.

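  • Do not use Referral Exclusion Lists to remove spam. They are not designed for this, and sessions from excluded referrers are typically reported as direct traffic instead
  • Don’t try to handle spam entries individually; it’s extremely inefficient
  • Server-side solutions like WordPress plugins or .htaccess rules will not help with ghost referrals, because those hits never reach your server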
  • Referer Spam does not affect or harm your SEO. Google Analytics is not used for rankings.



Evolution of Referer Spam

First Generation Referer spam

In the first generation of referer spam, bots would actually visit your website and generate hits with a fake Referer header. These bots would crawl hundreds or even thousands of websites every day, generating millions of HTTP requests.

The fake Referer header would contain the URL of the website the spammer wanted to promote, or wanted to build backlinks to.

Many black hat SEOs used this tactic for exactly this purpose. They promoted domains within their private blog networks (PBNs), which were themselves a popular tactic.

This is what initially drove us to develop our popular free Stop Web Crawlers WordPress plugin, which blocks referrer spam from your WordPress website.

Although a number of these bots are still out in the wild, they are not as ‘popular’ as they once were. Their use is gradually diminishing because many hosting providers and security products now remove the threat completely.

It was also possible to protect your site from this type of referer spam by manually blocking the spammers' domains in an .htaccess file on the web server.
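For sites served by a Python application rather than Apache, the same idea can be expressed as WSGI middleware. This is a minimal sketch that assumes a hand-maintained blocklist. The three domains are taken from the exclusion list later in this guide. `RefererBlockMiddleware` and `is_blocked` are hypothetical names and are not part of any framework. Like .htaccess rules, this only stops bots that actually request your pages. It does nothing against ghost referrals.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist: maintain your own. Subdomains of these are blocked too.
BLOCKED_REFERRER_DOMAINS = {"semalt.com", "darodar.com", "buttons-for-website.com"}


def is_blocked(referer: str) -> bool:
    """True if the Referer header's host is a blocked domain or one of its subdomains."""
    try:
        host = (urlsplit(referer).hostname or "").lower()
    except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
        return False
    return any(host == d or host.endswith("." + d) for d in BLOCKED_REFERRER_DOMAINS)


class RefererBlockMiddleware:
    """WSGI middleware that answers 403 to requests carrying a blocked Referer header."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the Referer request header as HTTP_REFERER.
        referer = environ.get("HTTP_REFERER", "")
        if referer and is_blocked(referer):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)
```

To use it, wrap your existing WSGI application, e.g. `application = RefererBlockMiddleware(application)`. Matching on the parsed hostname rather than a raw substring avoids blocking an unrelated domain such as notsemalt.com.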

Current Generation

In this instance, the spam bots are not actually visiting your site, or they have only done so in the past to get your Google Analytics tracking ID. Once they have it, they spoof visits to your site to pollute your analytics reports with false visits.

Google Analytics provides a few utilities to help you eliminate these spam visits from your results. The only drawback is that they cannot be applied retrospectively. They only start working from the day you implement them.

Automatic Bot Filtering

To enable it, log into your Google Analytics account and navigate to Admin > View Settings.

Google Analytics Automatic Bot Filter

Under Bot Filtering, tick "Exclude all hits from known bots and spiders" and save your view settings.

Google Analytics Bot Filters

Valid Hostname filters

This is the most effective solution against referer spam, because it excludes ghost traffic from your view.

This filter is based on something you control: your hostnames. As long as you include all of them, you won’t exclude any real traffic.

Referer spammers abuse the Measurement Protocol, a feature that allows legitimate applications to send data directly to GA. The spammer's bot doesn't know which website it is hitting. It therefore leaves either a fake hostname or no hostname at all, and a missing hostname appears as (not set) in your reports.

Not Set in GA reports

If we use this logic to create a filter that only lets traffic with valid hostnames through, all ‘ghost traffic’ will be automatically excluded.
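To make the logic concrete, the sketch below models an include filter on the Hostname field in Python. It is a simplified, hypothetical model of the filter's behaviour, not Google's implementation. Each hit is a dictionary, and a missing hostname is treated as (not set). A hit is kept only if the pattern matches somewhere in its hostname.

```python
import re


def apply_include_filter(hits, pattern):
    """Keep only hits whose hostname matches the include pattern.

    Hits with a fake hostname, or none at all (treated here as "(not set)"),
    fail the match and are dropped, which is how ghost traffic falls out.
    """
    valid = re.compile(pattern)
    return [hit for hit in hits if valid.search(hit.get("hostname", "(not set)"))]


hits = [
    {"hostname": "www.example.com", "referrer": "google.com"},
    {"hostname": "(not set)", "referrer": "somefakewebsite.com"},
    {"hostname": "fake-host.xyz", "referrer": "somefakewebsite.com"},
    {"hostname": "shop.example.com", "referrer": "(direct)"},
]
kept = apply_include_filter(hits, r"example\.com")
print([hit["hostname"] for hit in kept])  # ['www.example.com', 'shop.example.com']
```

The filter never looks at the referrer. This is why the same filter removes spam regardless of which referral, keyword or page the spammer fakes.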

This solution is much more efficient than the commonly used exclusion filter. In addition, it works for any type of ghost spam: referral, keyword, page, language and so on.

To create a Valid Hostname filter:

List your hostnames:

To see a list of all the active hostnames, go to the Network report in Google Analytics:
Audience > Technology > Network

Change the primary dimension to Hostname (blue text at the top of the report).

Make a list of all the valid ones you find.

You should see at least one valid hostname, which is your main domain. The rest will depend on the configuration of your site.

Build your hostname expression:

Once you have the list of all your hostnames, put them together, separating them with a pipe “|” character, like this:

yourdomain|hostname2|hostname3 etc.

Ensure you add all of them.
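Building the expression by hand is error-prone once you have more than a few hostnames. The Python sketch below joins a list with pipes and escapes each dot, so that "." matches only a literal dot rather than any character. `build_hostname_expression` is a hypothetical helper. The 255-character ceiling is an assumption about the filter pattern field, so check it against your own GA interface.

```python
# Assumption: the filter pattern field accepts at most 255 characters.
MAX_PATTERN_LENGTH = 255


def build_hostname_expression(hostnames):
    """Join hostnames with "|", escaping each "." so it matches only a literal dot."""
    # Normalise case and whitespace, drop blanks and duplicates, sort for readability.
    unique = sorted({h.strip().lower() for h in hostnames if h.strip()})
    expression = "|".join(h.replace(".", r"\.") for h in unique)
    if len(expression) > MAX_PATTERN_LENGTH:
        raise ValueError(
            f"expression is {len(expression)} characters, over the assumed "
            f"{MAX_PATTERN_LENGTH}-character limit"
        )
    return expression


print(build_hostname_expression(["www.example.com", "shop.example.com", "example.com"]))
# example\.com|shop\.example\.com|www\.example\.com
```

If the expression gets too long, shorten it rather than splitting your hostnames across several include filters. Filters in a view are applied one after another, so a hit must pass every include filter to be kept.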

Create Filter

Go to the Admin tab, and select the view where you want to apply the filter.

Select Filters under the View column, then select + Add Filter.

Enter a name for the filter, e.g. Include Valid Hostnames.

Configure the filter:

Filter Type: Custom > Include

Filter Field: Hostname

In the Filter Pattern box, enter your hostname expression.

Google Analytics Hostname Filter

Exclusion Filters

Although this is one of the most inefficient approaches, and you should use a Valid Hostname filter instead, we have included it as an alternative.

Google Analytics Create Filters

Create a new filter, select Custom > Exclude, and choose Referral as the Filter Field.

Then either enter each URL you want to filter out one by one, or do what I do and combine them all into a single regular expression.
We have created the exclusion list below, which you can use:

(ilovevitaly\.co|priceg\.com|blackhatworth\.com|econom\.co|hulfingtonpost\.com|semalt\.com|shopping\.ilovevitaly\.com|.*\.darodar\.com|ilovevitaly\.com|iedit\.ilovevitaly\.com|cenoval\.ru|buttons-for-website\.com|bestwebsitesawards\.com|o-o-6-o-o\.com)
Google Analytics Bot Filter
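Before saving the exclusion filter, it is worth checking what the expression actually matches. The sketch below compiles the list above in Python and tests a few sample sources. It assumes the pattern may match anywhere in the field (an unanchored search). `would_be_excluded` is a hypothetical helper.

```python
import re

# The exclusion expression from this guide, split across adjacent raw strings
# that Python joins back into one pattern.
SPAM_REFERRERS = re.compile(
    r"(ilovevitaly\.co|priceg\.com|blackhatworth\.com|econom\.co|hulfingtonpost\.com"
    r"|semalt\.com|shopping\.ilovevitaly\.com|.*\.darodar\.com|ilovevitaly\.com"
    r"|iedit\.ilovevitaly\.com|cenoval\.ru|buttons-for-website\.com"
    r"|bestwebsitesawards\.com|o-o-6-o-o\.com)"
)


def would_be_excluded(referral: str) -> bool:
    """True if an Exclude filter using this expression would drop the referral."""
    return SPAM_REFERRERS.search(referral) is not None


for source in ["semalt.com", "forum.darodar.com", "darodar.com", "www.google.com"]:
    print(source, would_be_excluded(source))
```

Running this reveals a gap. The `.*\.darodar\.com` alternative requires a dot before "darodar", so a bare darodar.com referral is not matched. Writing `darodar\.com` without the leading `.*\.` would cover both cases.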

Conclusion

Whether you are a blogger, a small local website, or a multinational company, filtering your data is crucial for the accuracy of your reports.

I’ve tried to cover the most important details in this guide. However, if there is any part of the guide where you got stuck, or you would like further information, let me know in the comments section below.

Follow Me

Gary Woodfine

Helps businesses by improving their technical proficiencies and eliminating waste from the software development pipelines.

His unique background as a business owner and in marketing, software development and business development ensures that he can offer optimum business consultancy services across a wide spectrum of business challenges.