I’ve been writing about Google Analytics a lot lately and think it’s time for an Urchin post. I know Urchin isn’t as new and exciting as Google Analytics, but it is a fantastic web analytics package. It can process huge amounts of data quickly and provides great reporting at an amazing price.
One of my roles at EpikOne is to manage the professional services team. Our team handles support for mid-sized and enterprise-level companies that use Google Analytics or Urchin software. I previously blogged about some of the tools we use to debug problems with Google Analytics. Now it’s time to talk about Urchin.
While most of the tools we use to debug Urchin are the same ones we use for GA, there is one big difference: with Urchin you have the log files to play with. This means you can process, and reprocess, your data while troubleshooting an issue. It also means you can dig into the logs and see how Urchin transforms raw log data into report data.
Without delay, here is my list of tools for debugging Urchin:
grep, sed and awk
Ah, good old command line tools! If you’re working with Urchin then you WILL have to dig through a log file… MANUALLY! These tools make it painless. If you don’t know how to use grep, sed or awk, I would recommend a general Linux book from O’Reilly or, if you want to be a bit more ambitious, a shell scripting book from O’Reilly.
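If Python is handier than a shell on the machine you’re troubleshooting, the classic grep-then-awk pipeline is easy to approximate. This is a minimal sketch, not an Urchin feature. It assumes NCSA combined-format log lines (the sample lines are made up) and mimics grep ' 404 ' access.log | awk '{print $7}' by returning a 1-based, whitespace-separated field from each matching line.

```python
import re
from typing import Iterable, List


def grep_awk(lines: Iterable[str], pattern: str, field: int) -> List[str]:
    """Rough equivalent of: grep -E 'pattern' log | awk '{print $field}'.

    `field` is 1-based like awk's $1, $2, ...; matching lines that are
    too short to have that field are skipped.
    """
    regex = re.compile(pattern)
    out = []
    for line in lines:
        if regex.search(line):
            parts = line.split()
            if len(parts) >= field:
                out.append(parts[field - 1])
    return out


# Hypothetical sample lines in NCSA combined log format.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2008:13:55:40 -0700] "GET /__utm.gif?utmn=123 HTTP/1.1" 200 35 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2008:13:56:02 -0700] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # Field 7 of a combined-format line is the requested path.
    print(grep_awk(SAMPLE, r'" 404 ', 7))
```

For multi-gigabyte logs the real grep and awk will be much faster. The sketch is mainly useful on a box where those tools aren’t installed.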
Microsoft Windows Resource Kit Tools
The Microsoft Windows Server 2003 Resource Kit Tools are a set of command line tools designed to streamline management tasks such as troubleshooting operating system issues. Honestly, this kit gives you Windows versions of many of your favorite Unix commands. Our team likes it because we’re not always working with Linux. Lots of people run Urchin on Windows, and we need to work through their issues quickly and efficiently.
Urchin Debug Mode
Did you know that Urchin can be run in ‘debug’ mode? Yup, it can. In debug mode Urchin generates data that shows how each log file line is parsed and stored in the database. This is probably THE most valuable tool when working with Urchin. Here’s how to run Urchin in debug mode:
/path/to/urchin/bin/urchin -D -p"profile_name" > process.log
First, a warning: debug mode generates a tremendous amount of data. Never run it against a log source with more than about 100 lines. The output shows how Urchin parses each log file line and how your filters are applied to the profile data.
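Because of that 100-line limit, it helps to copy a small slice of the real log into a separate file before a debug run, much like head -n 100 access.log > debug_sample.log. The sketch below assumes an uncompressed text log, and the file names are hypothetical. It does not configure Urchin. You would still point the profile’s log source at the sample file yourself.

```python
import os
import tempfile
from itertools import islice


def make_debug_sample(src: str, dst: str, max_lines: int = 100) -> int:
    """Copy at most `max_lines` lines of an uncompressed log from src to dst.

    Roughly equivalent to: head -n 100 access.log > debug_sample.log
    Returns the number of lines actually written.
    """
    written = 0
    with open(src, "r", encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in islice(fin, max_lines):
            fout.write(line)
            written += 1
    return written


if __name__ == "__main__":
    # Demo on a throwaway 500-line log in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "access.log")
        dst = os.path.join(tmp, "debug_sample.log")
        with open(src, "w", encoding="utf-8") as f:
            for i in range(500):
                f.write(f'10.0.0.1 - - [10/Oct/2008:13:55:36 -0700] "GET /page{i}.html HTTP/1.1" 200 100\n')
        print(make_debug_sample(src, dst))
```

Taking the first lines is the simplest choice. If the problem only shows up in certain requests, grep for those lines into the sample instead.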
LiveHTTPHeaders
LiveHTTPHeaders is a Firefox plug-in that displays all of the HTTP headers exchanged between your browser and the various servers that contribute content to a web page. This is a great tool for debugging issues with utm.js (if you are using UTM tracking). Using this plug-in you can verify that requests are made to your web server for both the utm.js file and the utm.gif file.
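LiveHTTPHeaders confirms that the browser sends the requests. A complementary check on the server side is to confirm that those requests actually land in the log Urchin processes. This is a minimal sketch that assumes common or combined log format. It matches any path ending in utm.js or utm.gif, so a file named __utm.gif also counts. The sample lines are hypothetical.

```python
import re
from typing import Dict, Iterable

# Pulls the request path out of the quoted request line, e.g. "GET /utm.js HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)')


def utm_request_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count logged requests whose path ends in utm.js or utm.gif (query string ignored)."""
    counts = {"utm.js": 0, "utm.gif": 0}
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None:
            continue  # not a recognizable request line
        path = m.group(1).split("?", 1)[0]
        for name in counts:
            if path.endswith(name):
                counts[name] += 1
    return counts


# Hypothetical log lines for one page view with UTM tracking.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2008:13:55:36 -0700] "GET /utm.js HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2008:13:55:37 -0700] "GET /__utm.gif?utmn=1 HTTP/1.1" 200 35 "http://example.com/" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(utm_request_counts(SAMPLE))
```

A page view that shows up in the log with no matching utm.gif request usually means the tracking code never ran. That is exactly the case worth confirming with LiveHTTPHeaders.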
Developer’s Toolbar
If you’re working with web pages then you probably already use this. I like the Developer’s Toolbar because it gives you quick access to the UTM tracking cookies. Validating that the tracking cookies are set, and set correctly, is one of the first things you should do when debugging a UTM problem.
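The same cookie check can be scripted against a raw Cookie header, for example one copied out of LiveHTTPHeaders. This is a sketch under stated assumptions. The cookie names (__utma, __utmb, __utmc, __utmz) and the six-field layout of __utma are assumptions based on the common __utm* family, not something this post specifies. Verify them against what your version of utm.js actually sets.

```python
from typing import Dict, List

# ASSUMED UTM cookie names; check them against your own utm.js.
EXPECTED_UTM_COOKIES = ("__utma", "__utmb", "__utmc", "__utmz")


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split a raw Cookie header ("a=1; b=2") into a name -> value dict."""
    cookies: Dict[str, str] = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def missing_utm_cookies(header: str) -> List[str]:
    """Return the expected UTM cookie names that are absent from the header."""
    cookies = parse_cookie_header(header)
    return [name for name in EXPECTED_UTM_COOKIES if name not in cookies]


def utma_looks_valid(value: str) -> bool:
    """Sanity check, ASSUMING __utma holds six dot-separated numeric fields."""
    fields = value.split(".")
    return len(fields) == 6 and all(f.isdigit() for f in fields)


if __name__ == "__main__":
    header = "__utma=1.2.3.4.5.6; __utmz=1.2.3.4.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
    print(missing_utm_cookies(header))
    print(utma_looks_valid(parse_cookie_header(header)["__utma"]))
```

The header is parsed by hand rather than with a cookie library so that unusual characters in __utmz values cannot make the parse silently stop early.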
The Regex Coach
This is the tool for testing your regular expressions. The Regex Coach is a graphical application for Windows and Linux/x86 (also usable on FreeBSD) which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:
- It shows whether a regular expression matches a particular target string.
- It can also show which parts of the target string correspond to captured register groups or to arbitrary parts of the regular expression.
If you have any doubts about the validity of your regular expressions, test them with the Regex Coach.
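When a GUI isn’t available, you can get the same two answers from a script: does the pattern match, and what did each group capture? This minimal sketch uses Python’s re module. Its syntax is Perl-like but not identical to Perl or to Urchin’s own regex engine, so treat a match here as a first check rather than proof. The product-ID pattern and paths are hypothetical.

```python
import re
from typing import Optional, Tuple


def try_regex(pattern: str, target: str) -> Optional[Tuple[str, Tuple[Optional[str], ...]]]:
    """Return (matched text, captured groups) for the first match, or None if no match."""
    m = re.search(pattern, target)
    if m is None:
        return None
    return m.group(0), m.groups()


if __name__ == "__main__":
    # Hypothetical filter pattern: capture the product ID from a request path.
    pattern = r"^/products/(\d+)\.html$"
    for path in ["/products/1234.html", "/products/abc.html"]:
        print(path, "->", try_regex(pattern, path))
```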
If you work with Urchin and have a tool or technique that is useful please let me know!
Krishna Baddam says
We are setting up Urchin for a client with 1 GB of compressed logs per day. They would like to keep one year of history. We need to get a Windows 2003 machine. My question is: is it better to get two quad-core processors with 4 GB of memory, or one quad-core processor with 8 GB of memory? Can Urchin use the extra processor? Also, will Urchin be able to handle all that data? One additional question: without using UTM, can we get unique user counts?
Justin Cutroni says
I assume you’re using Urchin 5… Urchin is not a multithreaded application, so get the single quad-core machine with 8 GB of memory. Urchin will not use all 8 GB of memory directly, but the machine will run faster with 8 GB.
Urchin should have no problem with 1 GB of logs a day.
You will need to use the UTM to identify unique visitors.
Hope that helps.
Brian Squires says
I own this site. It went live in June ’08 and was set up with Urchin analytics. I wanted to see the cities that visitors are coming from, so I had my developer add Google Analytics. My question is: why is the visitor count in the Google reports at least 60-70% lower than the Urchin visitor count? It’s driving me crazy.
Justin Cutroni says
It’s probably due to a difference in the tracking method. My guess is that you’re tracking people in Urchin using an IP-based tracking method while GA is using a cookie-based tracking method.
Hope that helps,
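To see why the two methods can disagree so much, here is a small illustration run on the same set of hits. The data is entirely hypothetical and uses documentation-only IP ranges. It is not how Urchin or GA compute visitors internally. It only shows that counting distinct IP addresses and counting distinct cookie IDs give different answers.

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    ip: str
    cookie_id: Optional[str]  # None when no tracking cookie was set (e.g. a robot)


def unique_by_ip(hits: Iterable[Hit]) -> int:
    """IP-based counting: every distinct IP address is treated as a visitor."""
    return len({h.ip for h in hits})


def unique_by_cookie(hits: Iterable[Hit]) -> int:
    """Cookie-based counting: every distinct cookie ID is a visitor; cookieless hits are invisible."""
    return len({h.cookie_id for h in hits if h.cookie_id is not None})


# Hypothetical hits.
HITS = [
    Hit("203.0.113.1", "visitor-A"),
    Hit("203.0.113.2", "visitor-A"),   # same person, new dynamic IP
    Hit("203.0.113.3", "visitor-A"),
    Hit("198.51.100.7", "visitor-B"),  # B and C share one proxy IP
    Hit("198.51.100.7", "visitor-C"),
    Hit("192.0.2.50", None),           # robot: no JavaScript, no cookie
]

if __name__ == "__main__":
    print("by IP:", unique_by_ip(HITS))
    print("by cookie:", unique_by_cookie(HITS))
```

Shared proxies push IP-based counts down, while dynamic IPs and clients that never run the tracking JavaScript push them up. Which effect dominates depends on the site’s audience.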