Analytics Talk

Digital Analytics for Business

Make GA Data Quality Suck Less!

Posted: November 6, 2007

Frustrated with GA data? Get a grip! We all know that data quality sucks! But there are a few vital steps that you can take to ensure that your Google Analytics data is as accurate as possible. Remember, accurate data makes for happy, and accurate, analysts.

Here are three simple tips that can help make your data more accurate.

1. Eliminate Duplicate Data

Many sites that I work on have duplicate data. The usual cause is mixed case URLs. Google Analytics is case sensitive: it captures the data exactly as it appears in the location bar of the browser. So if a URL is of mixed case in the browser, it will be captured and displayed in mixed case within GA.

It’s very easy to have two URLs that have the same functional meaning appear as two line items in GA because they differ in case. Here’s an example:


/worldseries/index.php?year=2007&keyword=lowell
/Worldseries/index.php?year=2007&keyword=Lowell

Both URLs probably point to the same content. They only appear different because of the case. We want to force both URLs to have the same case and thus make them appear as a single line item in GA. This can be done with a Lowercase or Uppercase filter. I like the lowercase filter, but you could just as easily use the uppercase filter. It’s a personal preference.

The filter below will force the Request URI to lowercase:

Google Analytics Lowercase Filter

I recommend adding a case changing filter to any data element (i.e. filter field) that could be mixed case. This includes:

  • Request URI
  • Campaign Name
  • Campaign Term
  • Campaign Medium
  • Campaign Source
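To see why a case-changing filter merges line items, here is a rough Python sketch. It is not how GA implements filters; the `hits` list, the field names and the `lowercase_filter` helper are all made up for illustration.

```python
from collections import Counter


def lowercase_filter(hit, fields):
    """Return a copy of the hit with the chosen fields forced to lowercase."""
    return {
        key: value.lower() if key in fields and isinstance(value, str) else value
        for key, value in hit.items()
    }


# Hypothetical hits: same page and same source, different case.
hits = [
    {"request_uri": "/worldseries/index.php?year=2007&keyword=lowell",
     "campaign_source": "Google"},
    {"request_uri": "/Worldseries/index.php?year=2007&keyword=Lowell",
     "campaign_source": "google"},
]

# Fields we choose to normalize (illustrative names, not GA's internal ones).
fields = {"request_uri", "campaign_source"}

before = Counter(hit["request_uri"] for hit in hits)
after = Counter(lowercase_filter(hit, fields)["request_uri"] for hit in hits)

print(len(before), "line items before,", len(after), "after")
```

Without the filter the two hits land on two separate line items. With it they collapse into one line item with two pageviews.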

Another cause of duplicate data is multiple URLs that display the same content but have a different file extension. Here’s an example:


/champions/redsox.php
/champions/redsox.htm

These URLs may appear different (because of the file extension), but the web server might interpret them as the same file. Please note that not every web server behaves this way. It all depends on your web server. Check with your IT guru if your site has URLs with multiple file extensions.

You should merge duplicate URLs that differ only in file extension into a single line item. I find the best way to do this is with an advanced filter.

Google Analytics Advanced Filter for URI Rewrite

Some may think that a search and replace filter is the best way to remove these duplicates. But you would need to create a search and replace filter for each set of URLs that needs to be merged. An advanced filter, because it uses a regular expression, will change every URL that ends in ‘.htm’ to a ‘.php’ extension.
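The exact filter settings live in the screenshot, so treat the following as a sketch of the idea only. It assumes a pattern along the lines of `^(.*)\.htm$` that captures everything before a trailing `.htm` and writes it back out with `.php`. The pattern and the `rewrite_extension` helper are my own illustration in Python, not the filter's actual configuration.

```python
import re

# Assumed pattern: capture everything before a trailing ".htm".
HTM_SUFFIX = re.compile(r"^(.*)\.htm$")


def rewrite_extension(uri):
    """Rewrite a URI ending in .htm so that it ends in .php instead."""
    return HTM_SUFFIX.sub(r"\1.php", uri)


for uri in ["/champions/redsox.htm", "/champions/redsox.php", "/champions/rockies.htm"]:
    print(uri, "->", rewrite_extension(uri))
```

Because the pattern is anchored at the end, a single rule covers every `.htm` URL on the site, while URIs that already end in `.php` pass through untouched.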

2. Remove Irrelevant Information

Extra information in the URL can cause big problems in Google Analytics. The reason is that GA will capture all of the data in a URL, which includes the query string parameters. Query string parameters that don’t have a functional meaning should be removed from the URL.

An easy way to find these parameters is to collect data for a week and then analyze the top content report. Any query string parameter that does not provide insight into what the visitor sees or does should be eliminated.

To remove a query string parameter from GA simply add it to the ‘Exclude URL Query Parameters’ field in the profile settings:

Google Analytics Profile Settings: Exclude URL Query Parameters

Enter multiple parameters as a comma-separated list.

Be aware that once you remove a query string parameter from GA it is completely eliminated from the system. So any goals, funnels or other filters that use that parameter will no longer work.

Also remember that you should remove any query string parameters that contain personally identifiable information. It is against the GA terms of service to collect PII.
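If you want to preview what your URLs will look like once parameters are excluded, a small script can help. This is only a sketch of the effect, written in Python. The parameter names `sessionid` and `email` are hypothetical examples, and GA's own matching rules may differ from this simple exact-name comparison.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def exclude_query_parameters(uri, excluded):
    """Drop the named query parameters from a URI.

    `excluded` is a comma-separated list, like the profile setting.
    """
    names = {name.strip() for name in excluded.split(",") if name.strip()}
    parts = urlsplit(uri)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


uri = "/worldseries/index.php?year=2007&sessionid=abc123&keyword=lowell"
print(exclude_query_parameters(uri, "sessionid, email"))
```

Run it against a week of URLs from the top content report and you can check that the remaining parameters are the ones that actually describe what the visitor saw.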

3. Identify Your Segment

I could have easily named this tip ‘exclude internal data’ but I wanted to change the way we all think about profiles and the data that’s in them. I believe we should think of profile data in terms of the segment we want to analyze, not who we want to exclude. I know these statements are very close in meaning, but there is a slight difference. Segmentation is so important to analysis. I believe that every time we create a profile we should consider what segment of data it will contain.

I can think of a few segments of data that I would like to analyze:

  • CPC traffic
  • New visitors
  • Return visitors
  • European visitors
  • Traffic from a specific marketing campaign
  • Non-employee traffic
  • Traffic generated by my call center

All of the above segments can be created as different profiles using include filters. Each will provide some insight into that segment. Don’t get me wrong, you’ll probably want to exclude internal traffic from 99% of your profiles. But try to think in broader terms and focus on the segment that you want to analyze.

Creating a profile based on a particular segment of traffic is pretty easy. The first thing you want to do is identify what segment of traffic you want to include in your profile. Then create a filter based on the filter field that represents that segment.

Let’s say I want to see all traffic generated from visitors performing some type of external search on my name. I could apply the following include filter to a profile:

Google Analytics Include Filter

This filter can easily be modified to include a specific marketing campaign (using the Campaign Name field), a specific country (using the Visitor Country field) or any other segment of data, so long as it is represented by one of the filter fields. Please note that this will work even if you have AdWords auto-tagging turned on, even though you haven’t done any heavy lifting to define the Campaign Term.
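Conceptually, an include filter keeps only the hits whose chosen field matches a pattern and throws everything else away. Here is a minimal Python sketch of that idea. The `hits` data, the field names and the pattern `justin|cutroni` are assumptions for illustration, and I've assumed case-insensitive matching to keep the example short.

```python
import re


def include_filter(hits, field, pattern):
    """Keep only the hits whose field matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [hit for hit in hits if regex.search(hit.get(field) or "")]


# Hypothetical hits with a campaign term and a visitor country.
hits = [
    {"campaign_term": "justin cutroni", "visitor_country": "United States"},
    {"campaign_term": "google analytics tips", "visitor_country": "Germany"},
    {"campaign_term": "Cutroni blog", "visitor_country": "France"},
    {"campaign_term": None, "visitor_country": "Canada"},
]

name_searches = include_filter(hits, "campaign_term", r"justin|cutroni")
german_visitors = include_filter(hits, "visitor_country", r"^Germany$")

print(len(name_searches), "hits from searches on my name")
print(len(german_visitors), "hits from Germany")
```

Swapping the field and the pattern is all it takes to define a different segment, which is the point of thinking in terms of what to include.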

By the way, you will want to exclude internal traffic from many profiles. My favorite way to remove internal traffic from a profile is with an ‘Exclude all traffic from an IP address’ filter. Make sure you use anchors at the beginning and end of the regular expression.

Google Analytics Filter: Exclude IP Address
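The anchors matter more than you might expect. The quick Python check below compares an unanchored and an anchored version of the same expression. The IP address 12.123.123.123 is just an example, and Python's `re` module stands in for GA's regular expression engine, which I'm assuming behaves the same way for these simple patterns.

```python
import re

# The same expression without and with anchors. Dots are escaped so that
# they match a literal "." rather than any character.
unanchored = re.compile(r"12\.123\.123\.123")
anchored = re.compile(r"^12\.123\.123\.123$")

ips = ["12.123.123.123", "112.123.123.123", "212.123.123.123"]

# IPs that each filter would exclude from the profile.
excluded_unanchored = [ip for ip in ips if unanchored.search(ip)]
excluded_anchored = [ip for ip in ips if anchored.search(ip)]

print("Without anchors:", excluded_unanchored)
print("With anchors:   ", excluded_anchored)
```

Without the caret, the expression also matches addresses that merely contain your IP, so real visitors coming from 112.123.123.123 or 212.123.123.123 would be silently dropped from the profile.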

Another good way to exclude internal traffic, especially if you don’t have a static IP address, is to use a little hack called Count Me Out. This hack uses the GA custom segment cookie to identify users.

So remember, yes, you need to exclude internal traffic, but try to take a broad view and think about segmentation when you filter your profiles.

Comments

  1. Avinash Kaushik says

    November 6, 2007 at 12:17 pm

    Excellent post Justin. Hopefully this is already included in your most excellent e-book that everyone should buy ten copies of!

    Thanks, really enjoyed the actionability of this post.

    -Avinash.

  2. Justin says

    November 6, 2007 at 9:46 pm

    Thanks Avinash!

  3. Ward says

    November 9, 2007 at 5:22 pm

    Hi Justin,

    Thanks for that post – good stuff! I had a question about the IP filter. I have been doing this for a while for a bunch of IPs corresponding to different offices around the country. Is it really necessary to have the caret ^ at the front of the match? I wonder if I have been screwing this up all this time.

    Thanks,
    W

  4. Rhoda Schueller says

    November 11, 2007 at 9:39 am

    Hi Justin,
    I’ve recently set up the site search on the art site I work on for a client. As the keywords used in the site search began to be listed I noticed many of them either had a capital letter as a first letter or all the letters were capitals. So I was interested in using the lowercase filter to remove the capitals from the site search phrases. The regular keywords used to reach the site are all lowercase already. So I was excited to follow your lowercase filter example. However, it didn’t change the case of the site search keyword phrases. Is there another filter to make the change with?

    Thanks for the help.
    Rhoda Schueller

  5. Justin says

    November 17, 2007 at 4:00 pm

    Hi Rhoda,

    Unfortunately the on-site search processing happens before filters are applied to the data. That means that we cannot use filters to manipulate those reports. The only way to manipulate the data in the on-site search reports is programmatically. I’ve written a bit about it here.

    Thanks for the comment and thanks for reading.

    Justin

  6. Justin says

    November 17, 2007 at 4:05 pm

    Ward,

    Using the caret depends on your IP address. If the first part of the IP address has only two digits, like this:

    12.123.123.123

    then you ABSOLUTELY MUST use the caret. The reason is that if you do not use the caret, your IP regular expression will match ANY IP address that has a digit BEFORE the 12. That means the following IPs will match your reg ex:

    112.123.123.123
    212.123.123.123

    Thanks for the question and thanks for reading the blog!

    Justin

  7. Martin Leblanc says

    May 5, 2009 at 11:37 am

    Hi Justin

    Great post.

    Another thing that could be added to the list is a filter that replaces %20 with whitespace, or the other way around. Of course whitespace shouldn’t be in URLs, but when it is, the URLs show up differently depending on whether the visitor uses Firefox or Internet Explorer.

    /Martin

    • Justin Cutroni says

      May 8, 2009 at 6:13 pm

      Great idea Martin, thanks!


Trackbacks

  1.   Webanalist… it’s a dirty job! | Webanalisten.nl says:
    November 26, 2008 at 6:30 pm

    […] ‘Data quality sucks, let’s just get over it!’, try to salvage what can be salvaged (‘Make data quality suck less’) and have a bag full of ammunition at the ready to […] the important question of blame […]

