Universal Analytics for Search Bots

Update: The library is now open-sourced as Google Universal Analytics for Bots on GitHub.

A few years ago I created a PHP-based Google Analytics tracking script focused specifically on tracking bots, especially search bots. The JavaScript-based GA tracking code requires a user agent that executes JavaScript (duh) and accepts cookies, which most bots won’t do.

I wanted to look at search bot activity to understand the crawling trends of any given bot over a long period of time. I also wanted to know whether other bots (like somebody running Xenu Link Sleuth) were hitting my sites, so I could get insight into how they are being pinged/scraped.

My old GA for Search Bots script was fine, but it used an unsupported method of collecting and sending data to GA, so I knew it was not a viable long-term solution. Fortunately, Google Analytics unveiled Universal Analytics with a fully supported Measurement Protocol that we can use in server-side scenarios. I wanted to redo my code on top of the Measurement Protocol so I can have more assurance that it will keep working in the future, and so I have a better way to enhance the script with supported code.

Today I unveil the upgrade to my script… Universal Analytics for Search Bots!

How to Set Up UA for Search Bots

  1. Create a new ‘bots only’ Web Property in your Google Analytics account using Universal Analytics. Remember to grab your new Web Property ID (e.g. UA-XXXXXX-YY).
  2. Download the ‘UA for Search Bots’ code library
  3. Unzip and place the ‘/ua-searchbots/’ folder on your website (example: www.domain.com/ua-searchbots/)
  4. Copy the UA for Search Bots Tracking Code found in sample.php and place it in your PHP source code (example: in your common ‘header’ include file)
  5. Edit the UA for Search Bots Tracking code for the following:
    • Set the $UA_SB_ACCOUNT_ID variable to the new GA Web Property ID.
    • Set the $UA_SB_PATH variable to the location of the ‘/ua-searchbots/’ folder. (Depending on your PHP setup, in particular your include_path setting, you may run into issues with this location value.)
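
If the path gives you trouble, a quick check like the one below can show where PHP would actually load the files from. This is only a sketch and not part of the library: ua_sb_locate() is a hypothetical helper, and it assumes botconfig.php sits directly inside the ‘/ua-searchbots/’ folder. stream_resolve_include_path() is a standard PHP function that resolves a file name the same way include does.

```php
<?php
// Hypothetical helper (not shipped with the library): returns the absolute
// path PHP would load for $file under $path, or null if it cannot be found.
function ua_sb_locate(string $path, string $file): ?string
{
    // Normalise the trailing slash so 'ua-searchbots' and 'ua-searchbots/' both work.
    $candidate = rtrim($path, '/') . '/' . $file;

    // Resolves the candidate against include_path, like include/require would.
    $resolved = stream_resolve_include_path($candidate);

    return $resolved === false ? null : $resolved;
}

// Example value only; use whatever you set in the tracking code.
$UA_SB_PATH = 'ua-searchbots';

// Assumption: botconfig.php lives at the top level of the library folder.
$found = ua_sb_locate($UA_SB_PATH, 'botconfig.php');

if ($found === null) {
    // Log the active include_path so the mismatch is easy to spot.
    error_log('UA for Search Bots: botconfig.php not found; include_path is ' . get_include_path());
}
```

If the helper returns null, either switch $UA_SB_PATH to an absolute path or add the parent folder to include_path.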

One thing to point out in this custom code library is that ‘source’ is set as the user agent, not the traditional campaign source. I found it easier to drill down to the different bots with this method. I would also pay more attention to Pageviews than to Visits to better analyze how the bots crawl your site.
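
To make the ‘source’ idea concrete, here is a rough sketch of how a server-side pageview hit could be assembled for the Measurement Protocol. It is not the library's actual code: ua_sb_build_hit() is a hypothetical function, and the md5-based client ID is only a stand-in (the protocol recommends a random UUID). The parameter names (v, tid, cid, t, dh, dp, cs, ua) are the protocol's own.

```php
<?php
// Hypothetical sketch: build the payload for one Measurement Protocol (v1)
// pageview hit, with the bot label used as the campaign source.
function ua_sb_build_hit(
    string $propertyId,
    string $botName,
    string $host,
    string $page,
    string $userAgent
): string {
    $params = [
        'v'   => 1,              // protocol version
        'tid' => $propertyId,    // Web Property ID, e.g. UA-XXXXXX-YY
        'cid' => md5($botName),  // client ID; stand-in value, stable per bot
        't'   => 'pageview',     // hit type
        'dh'  => $host,          // document host name
        'dp'  => $page,          // document path
        'cs'  => $botName,       // campaign source: the bot, not a campaign
        'ua'  => $userAgent,     // user agent override (the bot's own string)
    ];

    // URL-encode everything into a single query string.
    return http_build_query($params);
}

// Example values only.
$payload = ua_sb_build_hit(
    'UA-XXXXXX-YY',
    'Googlebot',
    'www.domain.com',
    '/about/',
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
);

// The payload would then be POSTed to https://www.google-analytics.com/collect
echo $payload, PHP_EOL;
```

Because each request becomes one pageview hit, Pageviews map directly onto crawl activity, which is why they are the more useful metric here.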

Another thing is that this will only track bots that request a page that actually runs the UA for SB tracking code. If a bot hits a URL that only generates a generic server response (like maybe a 500 status code), then it will NOT be tracked in your new UA profile. Just a little disclaimer.

“The Humans are Dead”

The Humans are Dead - Flight of the Conchords

Well, we’re not really dead. But now the bots are alive in your new GA profile! Let’s take a look at the data.

One of the more common use cases I ran into was that I wanted to see a listing of the different bots hitting my site, then drill down to a specific bot and see its pageview data. I created this GA custom report to help view Search Bots by Source a little more easily. Once you open the report you may see something like this:

Search bots by source

You can see some traditional search bots, plus some other unconventional bots. The top entry of ‘Unknown-Robot’ is a catch-all entry for undefined bots. To help identify some of these unknowns you can always edit the botconfig.php file and add some more entries. I hope to update this file myself over time as well.
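
For a feel of what adding entries means, the sketch below matches a user agent against a small pattern table and falls back to ‘Unknown-Robot’. Treat it as an illustration only: the real botconfig.php may be organized quite differently, and both $botPatterns and ua_sb_classify() are hypothetical names.

```php
<?php
// Hypothetical pattern table: lowercase user-agent substring => label
// reported as 'source'. Adding a bot means adding one line here.
$botPatterns = [
    'googlebot' => 'Googlebot',
    'bingbot'   => 'Bingbot',
    'xenu'      => 'Xenu-Link-Sleuth',
];

// Returns the bot label, 'Unknown-Robot' for unlisted bots, or null when
// the user agent does not look like a bot at all.
function ua_sb_classify(string $userAgent, array $patterns): ?string
{
    $ua = strtolower($userAgent);

    // Known bots first: the first matching substring wins.
    foreach ($patterns as $needle => $label) {
        if (strpos($ua, $needle) !== false) {
            return $label;
        }
    }

    // Catch-all: generic hints that this is some kind of robot.
    if (preg_match('/bot|crawl|spider/', $ua) === 1) {
        return 'Unknown-Robot';
    }

    return null; // probably a human browser; nothing to track
}
```

With a table like this, a user agent containing ‘bingbot’ would be reported as Bingbot, while an unlisted crawler would land in the ‘Unknown-Robot’ bucket until you add a pattern for it.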

I’d be curious to hear what other things you’d love to see from this script. My good friend Ani Lopez used my older script to help track bot activity for International SEO insights. What other bot or server-side scenarios would you like to see tracked?

So please download it. Tell me what’s wrong, I’ll fix it. Tell me what you’ll do better… then feel free to revise it! Also, if anybody wants to be a pal and wrap this code up into a WordPress plugin, that would be super awesome. 😉
