You might have heard of Cohort Analysis – it’s a time-tested segmentation technique used to understand user behavior.
Cohort analysis has never been a core part of Google Analytics – it’s always been a bit of a hack. I wanted to discuss how to do cohort analysis with Google Analytics today and what it might look like in the future.
What is a Cohort?
Let’s use a definition from Wikipedia:
A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, are exposed to a drug or vaccine or pollutant, or undergo a certain medical procedure). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort.
Side note: you’ll notice the medical tone of the definition. Medical studies use cohorts a lot.
I like this definition as it clearly points out the key parts of a cohort: users who share a common characteristic that occurs within a defined period of time.
A cohort is really just another type of segmentation. The key is that a cohort is based on date.
Here’s an ecommerce example. If I were an ecommerce business owner I would want to create a cohort of customers who make their first purchase on Black Friday. This cohort is important because they made their first purchase during a very important time, the holiday buying season.
The common characteristic/experience that they share is that they all made their first purchase, and the defined period of time is Black Friday.
From an analysis perspective we want to segment this group to observe their behavior over a longer period of time.
- Do these customers behave differently?
- How do they differ from customers that buy at other times of the year?
- Do they buy multiple times? Do they spend the same amount?
Here’s another example: pretend I’m Evernote. How do customers who signed up for my premium service in October, 2012 differ from customers who signed up for my service in February, 2012? Again, do these users behave differently over time?
I think a lot of people use the term cohort to mean a segment of users and ignore the date part of the definition. I know I’ve done this in the past. But segmenting by users is really user segmentation, which is slightly different. I want to stay focused on cohorts in this post.
So now that we know what a cohort is, why can’t we create a cohort in Google Analytics?
Why creating Cohorts with Google Analytics is hard.
Cohort analysis is hard to do with Google Analytics because activity dates are not accessible in the reports or segmentation. To create a cohort we need to access the date that a conversion, or some other action, happens. In our examples above we talked about transactions on Black Friday and the date a user became a premium subscriber.
Another reason that cohort analysis is hard in Google Analytics is that the segmentation is based on visits, not users. So even if we had some type of date dimension and used it to create a segment, that segment would only represent visits that matched the criteria. It would not return data from all the visits for the users that matched the criteria.
In order to do cohort analysis in Google Analytics we need to inject a date, usually the date when a conversion happens, into Google Analytics and attach it to a user. Some of the most popular dates to collect include:
- first purchase date
- most recent purchase date
- first time the user signed up for membership
- the date when user upgraded membership
Once we have a date in Google Analytics we can do some cool analysis.
Let’s look at two different ways to collect the data we need for cohort analysis.
Using Custom Variables to Create Cohorts
You can use a custom variable to create a cohort in Google Analytics. The general idea is that you set a visitor level custom variable with the date of the action when the user completes the action.
Because the data is stored in a visitor level custom variable (i.e. a cookie) whenever the user visits the site the data will be sent to Google Analytics. The cookie helps us attach the date to the user.
Here’s an example. We want to create a cohort based on first purchase date. We would need to set a visitor scoped custom variable that contains the date when a visitor made their first purchase.
We would need to use the following code:
_gaq.push(['_setCustomVar', 1, 'FPD', 'YYYYMMDD', 1]);
The custom variable is named FPD (for First Purchase Date). The date is formatted using the four-digit year, two-digit month and two-digit day. This format will let us create a lot of different cohorts using advanced segments. More on this below.
Remember, the code above needs to be placed on the site’s thank you page. But, it can only be added when a person makes their first purchase. We don’t want to set this cookie when people make their second or third purchase.
Obviously you’ll need some code to identify if the purchase is a first purchase or a repeat purchase.
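Here’s a rough sketch of what that code might look like on the thank you page. It assumes your backend can tell the page whether this is the customer’s first purchase. The isFirstPurchase flag and both function names are made up for illustration; only the _setCustomVar call is part of Google Analytics.

```javascript
// Standard ga.js command queue; on a real page the tracking snippet defines it.
var _gaq = _gaq || [];

// Format a Date as YYYYMMDD using local time, e.g. 20121123.
function formatCohortDate(date) {
  var month = String(date.getMonth() + 1); // getMonth() is zero-based
  var day = String(date.getDate());
  if (month.length < 2) { month = '0' + month; }
  if (day.length < 2) { day = '0' + day; }
  return String(date.getFullYear()) + month + day;
}

// Queue the visitor-level custom variable only for a first purchase.
// `isFirstPurchase` is a hypothetical flag supplied by your own backend.
function recordFirstPurchaseDate(isFirstPurchase, purchaseDate) {
  if (!isFirstPurchase) {
    return false; // repeat purchase: leave the existing value alone
  }
  _gaq.push(['_setCustomVar', 1, 'FPD', formatCohortDate(purchaseDate), 1]);
  return true;
}
```

A custom variable is only sent along with a tracking request, so this call should be queued before the page’s _trackPageview call.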
It is possible that this data could be erased if the user deletes their cookies. You could always create some more code to keep resetting this cookie when the user logs into the site.
Using Events to create Cohorts
Another way to add user data to Google Analytics is via an event. You can send the same data, but you just need to do it more often.
Here’s the event code:
_gaq.push(['_trackEvent', 'Cohorts', 'FirstPurchase', 'YYYYMMDD']);
Because an event is not stored as a cookie you need to fire an event every single time the user visits the site. This means that you need to know that the user is a customer and insert the event code on a page the user will see. You might put it on the login page or you might just include it in the standard page tag. Depending on your site, this could be a lot of coding.
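To give a feel for the extra work, here’s a minimal sketch of firing the event on every visit. It assumes the page already knows who the customer is. The customer object and its firstPurchaseDate property are hypothetical and would come from your own login system.

```javascript
// Standard ga.js command queue; on a real page the tracking snippet defines it.
var _gaq = _gaq || [];

// Fire the cohort event for a known customer.
// `customer.firstPurchaseDate` is a hypothetical YYYYMMDD string,
// or null/undefined when the visitor has never purchased.
function trackCohortEvent(customer) {
  var fpd = customer ? customer.firstPurchaseDate : null;
  if (typeof fpd !== 'string' || !/^\d{8}$/.test(fpd)) {
    return false; // unknown visitor or no purchase yet: send nothing
  }
  _gaq.push(['_trackEvent', 'Cohorts', 'FirstPurchase', fpd]);
  return true;
}
```

Something like this would have to run on at least one page of every visit, which is why the approach needs so much more plumbing than a cookie.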
I’m not a fan of this method. It takes too much work.
I prefer to use a custom variable because it will persist and provide more reliable data. However, I would hedge my bets and try to add a little code that identifies when a user logs in and checks and resets the cookie.
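A sketch of that safety net might look like the following. It assumes you store each customer’s first purchase date in your own database and can expose it to the page after login. The account object is hypothetical. Rather than reading the cookie, this version simply sets the value again, which should be harmless when the value has not changed.

```javascript
// Standard ga.js command queue; on a real page the tracking snippet defines it.
var _gaq = _gaq || [];

// Re-send the stored first purchase date after a successful login.
// `account.firstPurchaseDate` is a hypothetical YYYYMMDD string kept in
// your own customer database; it is null for users who never purchased.
function restoreFirstPurchaseDate(account) {
  var fpd = account ? account.firstPurchaseDate : null;
  if (typeof fpd !== 'string' || !/^\d{8}$/.test(fpd)) {
    return false; // nothing to restore
  }
  _gaq.push(['_setCustomVar', 1, 'FPD', fpd, 1]);
  return true;
}
```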
So now that we have a cohort defined, how do we analyze the performance in Google Analytics? Using advanced segmentation!
Let’s assume I used the custom variable approach to identify cohorts.
I want to create a cohort of all the people that first purchased from my company in October, 2012. The purchase date is stored in a custom variable named FPD. Here’s how that Custom Advanced Segment would look:
The first condition specifies the name of the custom variable, the second condition matches the date stored in the custom var.
Notice that using the YYYYMMDD format lets me create many different date-based cohorts. I could create a quarterly cohort using the following advanced segment:
Remember, these segments will return all of the data, from all of the visits, that include the appropriate custom variable. This means that all of the data will be from users that made their first purchase in my desired date range.
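To make the date matching concrete, here are the kinds of patterns a monthly and a quarterly cohort could use, assuming the segment condition on the custom variable value is set to match a regular expression. The variable and function names are only for illustration. The small helper lets you check a pattern against sample values before you build the segment.

```javascript
// Pattern for first purchases in October 2012 (values start with 201210).
var october2012 = /^201210/;

// Pattern for first purchases in Q4 2012 (October, November or December).
var q4of2012 = /^2012(10|11|12)/;

// Return true when a YYYYMMDD value falls inside the cohort pattern.
function inCohort(pattern, fpdValue) {
  return pattern.test(fpdValue);
}
```

For example, inCohort(october2012, '20121015') is true, while inCohort(october2012, '20121101') is false because that date only belongs to the quarterly cohort.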
So what am I looking for once I apply this segment? What kind of insights am I looking to find? A few things:
- How does this group convert at macro and micro conversions? It’s really, really important to look at any revenue metrics. Are they spending money? Are they generating a lot of value for my business? If they are not generating value, when did they stop generating value? Was it after one week, one month or one year?
- If you are an ecommerce website, what products do these people purchase? Are they buying products in the same general category or are they buying a broad range of things?
- What traffic sources generate repeat visits? Are they responding to retention based marketing activities? Are they coming directly to the site because they love you?
- If you are a publisher analyzing members, what content do these people read? Do they read the same type of content? Are they looking in different categories? What do their loyalty metrics look like?
The easiest way to visually identify the behavior of cohorts is by applying multiple advanced segments at the same time and observing the metrics. How do they change? Do they all show the same trend?
Whenever I’m doing a cohort analysis I’m always trying to figure out when this group of people no longer provides value to my company. If I can figure this out then I can be more proactive in retaining them as customers.
The Future of Cohorts in Google Analytics
Cohort Analysis is going to get a lot better in Google Analytics.
You may have used the new Remarketing feature. This tool does true user segmentation. This includes the option to create a cohort using various date dimensions. This is going to be the future of segmentation in analytics.
User segmentation, and the ability to create cohorts, will be the gateway to many different types of new analysis.
Bjoern Sjut says
One of the best ways to employ cohorts in analysis is to differentiate in freemium models between a registration date and a purchase date. This allows to better understand by what channels users were enticed to sign up for an account – compared to what were the last onsite/offsite events that actually triggered the purchase.
Justin Cutroni says
@Bjoern: Great example, that’s exactly the kind of thing I had in mind when writing this post. Thanks for sharing.
Tim Wilson says
Thanks for this post, Justin. Great stuff, as always.
One question: you describe analyzing the cohorts using an advanced segment. For a quicker view to compare across cohorts, could I simply use the custom variable reports (not to automatically do the quarterly roll-up, but to look at the data at the specific level the cohort is identified in the custom variable) and/or custom reports to see this data?
Obviously, that’s not going to work for flow visualization and reports that I might want to “filter” by cohort (traffic sources, for instance), where segments would be needed. But, for “simple” analysis — visits, conversion rate, AOV, etc. — wouldn’t simply using the Custom Variable reports and custom reports do the trick?
Justin Cutroni says
@Tim: You’re absolutely correct. You could simply use the standard Custom Variable report or use a custom report containing the custom variables. The only challenge is granularity. If you store a YYYYMMDD date in the custom var you might have a lot of rows of data. But if that works for you, go for it!
Thanks so much for pointing that out.
Eivind Savio says
Great post Justin.
I have been using Next Analytics to report cohorts over time.
A while ago I posted a screen shot over at Google+ of this:
The cohort here is registration date in the format DD.MM.YYYY stored as a Custom Variable. Next Analytics then do the work of creating a monthly cohort in the format YYYY-MM.
Looking forward to cohorts in GA.
Justin Cutroni says
@Eivind: Thanks for sharing, very cool. I really like your heatmap.
This post has been sitting in my drafts queue since March, 2012. It was actually a presentation I did at a GA user conference earlier this year. Better late than never.
Interesting take on cohort analysis. The way that I have seen it done by BIs is retrospectively look at the data and run it through cohort formulas, to come up with the biggest organic cohorts. But for that you need to have access to user based information and raw data which GA does not do.
Is cohort analysis going to be possible in the new Universal Analytics code?
Justin Cutroni says
@Menucha: User segmentation is coming to Google Analytics in the future. It’s not part of Universal Analytics. It will be available to all versions of GA.