Analytics Talk

Digital Analytics for Business

My Regular Expression Tool Box

Posted: December 10, 2007 12 Comments

Love ’em or hate ’em, regular expressions are a part of Google Analytics. They provide a lot of flexibility, but at a price. Small mistakes can become magnified and result in poor data quality.

I know there’s a lot of information out there about regular expressions, but I wanted to simplify the topic. In my opinion, here are the most important things to know.

Key Concept: How GA Regular Expressions Work

Let’s start by talking about how regular expressions work in Google Analytics. In general, we apply a regular expression to a piece of data. If the expression matches ANY part of the data then the expression will return TRUE. If the expression returns TRUE then some action will occur.

It doesn’t matter where you use the reg ex. If it’s part of an exclude filter, and the expression matches the data, then the data will be excluded. If it’s part of an include filter then the data will be included. If it’s part of a report filter then the report will only contain info that matches the reg ex. You get the idea.
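Here’s a rough sketch of that include/exclude behavior. It uses Python’s `re` module as a stand-in for the GA regex engine, which is an assumption: GA’s engine may differ in details such as case sensitivity. The function `apply_filter`, its `mode` argument, and the page paths are all made up for illustration.

```python
import re

def apply_filter(rows, pattern, mode="include"):
    """Keep or drop rows based on whether the pattern matches ANY part of each row."""
    regex = re.compile(pattern)
    if mode == "include":
        # Include filter: keep only the rows the expression matches.
        return [row for row in rows if regex.search(row)]
    if mode == "exclude":
        # Exclude filter: drop the rows the expression matches.
        return [row for row in rows if not regex.search(row)]
    raise ValueError("mode must be 'include' or 'exclude'")

# Hypothetical page paths, not taken from a real GA profile.
pages = ["/products/shoes", "/blog/regex-tips", "/products/hats"]

included = apply_filter(pages, "products", mode="include")
excluded = apply_filter(pages, "products", mode="exclude")

print(included)  # ['/products/shoes', '/products/hats']
print(excluded)  # ['/blog/regex-tips']
```

The same expression drives both outcomes; only the action taken on a match changes.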

How Google Analytics Filters Work
[In this image, think of the data as the cube and the red workbench as the regular expression. If the cube is the same shape as the hole in the bench, then an action happens: the cube falls through. Get it?]

It’s really important to understand this because it simplifies the expressions we need to create. Let’s say I want to identify all the keywords in a set of data that contain the term excel. Here’s the full list:


word
excel
ms excel
excel 2003
linux
microsoft excel
excel 2007
excel makes pretty graphs
google

Rather than create some fancy regular expression, I can simply use: excel. After the expression is applied to the data we’ll have the following sub-set:


excel
ms excel
excel 2003
microsoft excel
excel 2007
excel makes pretty graphs

This simplifies the creation of your expression because you only need to match part of the data that you’re looking for. With that in mind, let’s move on to some tips that cover the most common uses of regular expressions.
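If you want to try the excel example yourself, here’s a minimal sketch. It assumes Python’s `re.search` behaves like GA’s “match any part” rule, which is close enough for this illustration but not a guarantee about GA’s engine.

```python
import re

keywords = [
    "word", "excel", "ms excel", "excel 2003", "linux",
    "microsoft excel", "excel 2007", "excel makes pretty graphs", "google",
]

# re.search succeeds if the pattern occurs ANYWHERE in the string,
# so the plain expression "excel" is all we need.
matches = [kw for kw in keywords if re.search("excel", kw)]

print(matches)
# ['excel', 'ms excel', 'excel 2003', 'microsoft excel',
#  'excel 2007', 'excel makes pretty graphs']
```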

Tip #1: Use Anchors

Anchors are a way to specify whether a regular expression should match the beginning of the data or the end of the data. Remember, reg ex works by matching ANY PART of a piece of data. Sometimes we’re looking for data that starts or ends a particular way and that’s why we need anchors. Let’s go back to the excel example.


word
excel
ms excel
excel 2003
linux
microsoft excel
excel 2007
excel makes pretty graphs
google

Suppose I only want to see the items that END with the word excel. Well, if I use the regular expression excel, I’m going to get all the items that contain the word excel no matter where it appears.

I need to create a reg ex that means, “ends with.” That’s done by placing a dollar sign, $, at the end of my reg ex. So the expression to find all of the keywords that END with excel would be: excel$.

It would match the following items from our list:


excel
ms excel
microsoft excel

To find all of the keywords that START with excel, use a caret, ^, at the beginning of the regular expression, like this: ^excel. It would match the following items from the list:


excel
excel 2003
excel 2007
excel makes pretty graphs

Now, let’s say I want just the keyword excel. Here’s how that expression would look: ^excel$.

Anchors, pretty handy.
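The three anchored expressions can be checked side by side with a short sketch. Again, Python’s `re` module is standing in for GA here (an assumption), and the helper name `matching` is just something I made up for the example.

```python
import re

keywords = [
    "word", "excel", "ms excel", "excel 2003", "linux",
    "microsoft excel", "excel 2007", "excel makes pretty graphs", "google",
]

def matching(pattern):
    """Return the keywords that the pattern matches anywhere (subject to anchors)."""
    return [kw for kw in keywords if re.search(pattern, kw)]

ends_with = matching(r"excel$")    # $ anchors the match to the END
starts_with = matching(r"^excel")  # ^ anchors the match to the START
exact = matching(r"^excel$")       # both anchors: the whole keyword must be "excel"

print(ends_with)    # ['excel', 'ms excel', 'microsoft excel']
print(starts_with)  # ['excel', 'excel 2003', 'excel 2007', 'excel makes pretty graphs']
print(exact)        # ['excel']
```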

Tip #2: Find This OR That

Many times in an analysis we’ll want to find multiple items from a set of data. For example, let’s say I want to find all the keywords that contain the name of an MS Office product. The complete list of keywords is:


word 2007
microsoft excel
outlook express
powerpoint
windows 95
mac OSX
linux
google rocks

Again, I’m only interested in the MS Office products, so I need to create an expression that includes the names of all the products. I want to find word OR excel OR outlook OR powerpoint. The pipe character, |, is used to represent OR logic. The following expression will return true if any of the items occur in the data:


word|excel|outlook|powerpoint

And here are the results:


word 2007
microsoft excel
outlook express
powerpoint
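Here’s the OR expression run against the list, as a small sketch that assumes Python’s `re` module is a fair stand-in for GA’s matching.

```python
import re

keywords = [
    "word 2007", "microsoft excel", "outlook express", "powerpoint",
    "windows 95", "mac OSX", "linux", "google rocks",
]

# The pipe means OR: the expression is true if ANY alternative appears.
office_pattern = re.compile(r"word|excel|outlook|powerpoint")

office_keywords = [kw for kw in keywords if office_pattern.search(kw)]

print(office_keywords)
# ['word 2007', 'microsoft excel', 'outlook express', 'powerpoint']
```

One thing to keep in mind: because each alternative can match any part of the data, `word` would also match a keyword such as “password” (a made-up example, not in the list above). Anchors from Tip #1 can tighten that up if it matters.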

Tip #3: If in Doubt, Escape it Out!

The dangerous thing about regular expressions is that we often don’t know what we don’t know. There are a lot of characters that have special meaning in reg ex. The plus sign, the question mark and the period are just a few. Inadvertently using a special character in an expression can lead to big trouble. There is an easy way to protect yourself: escaping.

Escaping a character means that GA will interpret the character as a LITERAL character and not as a regular expression character. To escape a character, place a backslash in front of it. Here’s the great part: escaping a punctuation character that isn’t special generally does no harm. To me, escaping a character is like using a safety net. If you’re unsure whether a particular punctuation character is special, escape it. Two cautions, though. Don’t escape letters or digits, because a backslash can give them a special meaning (for example, \n is a new-line character, not the letter n). And escaping can behave differently inside square brackets, [].

Time for an example. Let’s say we want to create a goal based on the following URL:


index.php?id=34

I need to turn the above into a regular expression. The question mark and period are special characters, so they need to be escaped. But I’m not sure about the equal sign. I’d better escape it just to be safe. So here’s how the resulting reg ex would look: index\.php\?id\=34. By the way, the equal sign is not a special character.
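To see why the escaping matters, here’s a quick sketch comparing the unescaped and escaped versions of that URL. It assumes Python’s `re` module treats these characters the way GA does, which holds for the period, question mark, and equal sign in this example but is not a promise about every character. `re.escape` is Python’s built-in helper for escaping a literal string; GA has no equivalent that I’m showing here.

```python
import re

url = "index.php?id=34"

unescaped = "index.php?id=34"      # "." means any character, "?" makes the "p" optional
escaped = r"index\.php\?id\=34"    # every special character taken literally

# The unescaped expression does NOT match the very URL it was copied from...
print(bool(re.search(unescaped, url)))               # False
# ...but it does match an unrelated string.
print(bool(re.search(unescaped, "indexXphpid=34")))  # True

# The escaped expression matches the real URL.
print(bool(re.search(escaped, url)))                 # True

# Python can do the escaping for you.
auto_escaped = re.escape(url)
print(bool(re.search(auto_escaped, url)))            # True

# Caution: escaping a letter changes its meaning. \n is a new-line, not "n".
print(bool(re.search(r"\n", "n")))                   # False
```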

So there you have it. My two cents on regular expressions. These tips just scratch the surface of what you can do with reg ex. If you really want to learn about reg ex, check out my friend Robbin’s series on the subject.

Filed Under: Resources Tagged With: regular-expressions, Tips

Comments

  1. Steve says

    December 11, 2007 at 4:54 am

    At *least* $0.03 surely! ;-)

    Great intro Justin to the weird and wonderful world of RE’s!

    FWIW, there are cases where escaping characters can get you into trouble, eg inside []. But for an intro article? Perfectly fine!

    Cheers!
    – Steve

  2. Justin Cutroni says

    December 11, 2007 at 6:41 am

    Thanks Steve! Always great to hear from you!

    You’re absolutely right about escaping characters inside of brackets and I went back and forth trying to decide if I should make a note of that. In the end I wanted to keep this simple.

    Have a great day,

    Justin

  3. Tyson says

    December 11, 2007 at 10:32 am

    Thanks for posting this, Justin. If there’s one thing I wish I knew more about, it’s regular expressions. I think it was my new year’s resolution last year, and here we are at 2008 already! Maybe 2008 will be the year. Good intro, it’s helpful for me.

    Tyson

  4. Justin Cutroni says

    December 11, 2007 at 10:55 am

    Thanks Tyson, glad you found the post useful. Good luck with your 2008 resolution!

    Justin

  5. Jeremy says

    December 11, 2007 at 3:56 pm

    Hey Justin – Great one. My coworkers are always trying to get me up to speed on reg ex and sometimes you just gotta take it one step at a time! Thanks for the post.

  6. Robbin Steif says

    December 11, 2007 at 11:12 pm

    You just wanted to show off your son’s toys, didn’t you? And his little hands too.

  7. Paul Annesley says

    December 19, 2007 at 8:59 pm

    “It doesn’t matter if you escape a non-special character.”

    It’s worth noting that this isn’t a safe assumption for alphanumeric characters. For example ‘n’ is not equivalent to ‘\n’, as the latter is interpreted as a new-line character.

    (correct me if this isn’t the case in GA)

  8. Justin Cutroni says

    January 13, 2008 at 11:53 am

    Paul,

    Thanks for pointing that out. You are absolutely correct.

    Justin

  9. Nathanael says

    February 20, 2008 at 2:26 pm

    Would welcome a regex 102 post. Short of taking a full scale course, regexes seem like the kind of thing best picked up incrementally. Feel free to take it another level deeper.

  10. Gavin Doolan says

    September 1, 2008 at 4:39 am

    Fantastic Justin, I thought regular expressions were far more complex than this, turns out they are quite simple and thanks to your article I now feel I understand them.

    I’m going to give these a shot :).

  11. Sen Hu says

    December 28, 2008 at 11:57 am

    I am a scripting trainer. Part of my sessions include regular expressions. My students find that biterscripting is the easiest way to learn regular expressions. Biterscripting is free, it does not require anything else (compiler, etc.), it works on any Windows version – so my students can download it on their home computers and start experimenting with regular expressions right away. Biterscripting provides many REs and stream editors (inserter, appender, enumerator, alterer, extractor by entire file, string, line, word, character) so that students can do more sophisticated things than just parsing. I believe a free download is available at http://www.biterscripting.com .

    Sen

  12. cnu says

    March 6, 2009 at 1:10 pm

    why did u stop?? I just started enjoying your class. Great work!

