Wednesday, March 10, 2010

Raw data

Nowadays, in 2010, there are several ways to obtain the raw data needed to feed a web analytics solution: server log files, page tagging, packet sniffing, or integrating web analytics into the web server software itself.

The widespread use of Google Analytics has made page tagging very popular, but the other methods are still valid, and it is worth knowing the pros and cons of each. That way we might find a hybrid solution that better suits our needs.

Let's look at each of these methods in detail.

SERVER LOG FILES

The web server reliably records every transaction it makes.

If a person revisits a page, the second request will often be retrieved from the browser's or proxy cache, and so no request will be received by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.
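As a sketch of what "configuring the web server" means here (assuming Apache with mod_headers enabled), caching of HTML pages could be defeated with response headers like these:

```apache
# Tell browsers and proxies not to cache HTML responses,
# so every page view reaches the server and its log file.
<FilesMatch "\.html?$">
    Header set Cache-Control "no-cache, no-store, must-revalidate"
    Header set Pragma "no-cache"
    Header set Expires "0"
</FilesMatch>
```

The trade-off mentioned above applies: every revisit now triggers a full round trip to the server.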

The data is on the company's own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later and analyze historical data with a new program.
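Because the format is standard, parsing it is straightforward. A minimal sketch for one line of the Common Log Format (CLF), the baseline format most web servers can produce:

```javascript
// Parse one line in the Common Log Format (CLF), e.g.:
// 127.0.0.1 - frank [10/Mar/2010:13:55:36 -0800] "GET /index.html HTTP/1.0" 200 2326
function parseClfLine(line) {
  const re = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$/;
  const m = line.match(re);
  if (!m) return null; // malformed or non-CLF line
  const [, host, ident, user, timestamp, request, status, bytes] = m;
  return {
    host,                                      // client IP or hostname
    ident,                                     // identd info, usually "-"
    user,                                      // authenticated user, usually "-"
    timestamp,                                 // e.g. "10/Mar/2010:13:55:36 -0800"
    request,                                   // e.g. "GET /index.html HTTP/1.0"
    status: Number(status),                    // HTTP status code
    bytes: bytes === '-' ? 0 : Number(bytes),  // response size in bytes
  };
}
```

Any tool that can read this format can process logs from any compliant server, which is exactly why switching analytics programs is painless.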

Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, it is useful information for search engine optimization.
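Separating spider visits from human ones usually comes down to matching the User-Agent header against known crawler signatures. A crude sketch (the pattern list is illustrative and necessarily incomplete):

```javascript
// Substrings commonly found in crawler User-Agent strings.
// Real-world lists are much longer and need regular updating.
var SPIDER_PATTERNS = ['googlebot', 'slurp', 'msnbot', 'bingbot',
                       'baiduspider', 'crawler', 'spider'];

function isSpider(userAgent) {
  var ua = userAgent.toLowerCase();
  for (var i = 0; i < SPIDER_PATTERNS.length; i++) {
    if (ua.indexOf(SPIDER_PATTERNS[i]) !== -1) return true;
  }
  return false;
}
```

Hits flagged this way would be excluded from visitor reports but kept for SEO analysis.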

Logfiles require no additional DNS lookups. Thus there are no external server calls that can slow page load speeds or result in uncounted page views.

PAGE TAG

Page tagging involves including a small invisible image in each page and, via a piece of code (the "tag"), passing certain information about the page and the visitor along with the image request. The tag is usually written in JavaScript, though Java can be used and, increasingly, Flash is used. This information can be processed remotely by a web analytics solution.

With the increasing popularity of Ajax-based solutions, an alternative to the use of an invisible image, is to implement a call back to the server from the rendered page. In this case, when the page is rendered on the web browser, a piece of Ajax code would call back to the server and pass information about the client that can then be aggregated by a web analytics company.
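Either way, the core of the tag is assembling a request URL that carries the data as query parameters. A minimal sketch — the collection endpoint (`collect.example.com`) and the parameter names are hypothetical, since each vendor defines its own:

```javascript
// Build the URL for an invisible 1x1 tracking image.
// The endpoint and parameter names below are hypothetical examples.
function buildBeaconUrl(data) {
  var params = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      params.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
    }
  }
  return 'http://collect.example.com/beacon.gif?' + params.join('&');
}

// In the browser, the tag would then fire the request, e.g.:
//   new Image().src = buildBeaconUrl({
//     page: document.location.pathname,
//     res: screen.width + 'x' + screen.height
//   });
```

The server ignores the returned 1x1 GIF; the point is that the request itself, with its query string, lands in the vendor's collection logs.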

It requires changes to the web site being analyzed, which can be difficult to maintain, but many applications (blogs, CRM tools, e-commerce solutions, site builders) already provide complete tagging solutions.

The tag is specific to the web analytics vendor, so it won't be easy to change later.

Page tagging may not be able to record all transactions:

- Page tagging relies on the visitors' browsers co-operating, which a certain proportion may not do (for example, if JavaScript is disabled or unsupported, as in many mobile devices, or if a hosts file blocks requests to certain servers).
- Tags may be omitted from some pages, either by oversight or when new pages are added without tagging.
- It may not be possible to include tags in all pages. Examples include static content such as PDFs or application-generated dynamic pages where re-engineering the application to include tags is not an option.

It is easier to add additional information to the tag, like the visitors' screen resolution, or the price of the goods they purchased.

Page tagging can report on events which do not involve a request to the web server, such as interactions within Flash movies, partial form completion, or browser events such as onClick, onMouseOver, onFocus, onBlur, etc.

Page tagging is available to companies who do not have access to their own web servers.

Tagging will slightly slow down your pages.

PACKET SNIFFING

Packet sniffing collects data by sniffing the network traffic passing between the web server and the outside world. Once the packet sniffer has recreated the HTTP and HTTPS traffic it can then create a log file, similar to one created by a web server. From this you can use your favourite web analytics solution to process the log files.
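The "create a log file" step amounts to turning the fields recovered from each sniffed request/response pair back into a standard log line. A sketch of that last mile (the shape of the `hit` object is a hypothetical example of what a sniffer might produce):

```javascript
// Turn fields recovered from sniffed HTTP traffic back into a
// Common Log Format line, so any ordinary log analyzer can consume it.
// The shape of the `hit` object is a hypothetical example.
function toClfLine(hit) {
  return [
    hit.clientIp,
    '-',                 // identd info, almost never available
    hit.user || '-',     // authenticated user, if any
    '[' + hit.timestamp + ']',
    '"' + hit.method + ' ' + hit.path + ' ' + hit.protocol + '"',
    hit.status,
    hit.bytes,
  ].join(' ');
}
```

Lines produced this way are indistinguishable from the web server's own logs, which is what lets you keep using your existing log-based analytics tools.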

Packet sniffing involves no changes to the web pages or web servers.

It can be implemented:

a) Installing a separate server at your data center, connected either to a SPAN port on your switch (a software change) or to a network tap (a physical device that duplicates all the packets crossing your cables). Unlike proxy servers or server plugins (or even page tags), both of these methods are completely passive and are physically incapable of impacting your traffic in any way.
b) Installing the software on your web servers. This approach is still much less intrusive than server plugins, etc., but it does add CPU and memory usage to your web servers. Unless your servers are idle (and you have only one), this isn't recommended, but it is an easy way to get up and running quickly.

There are certain things you cannot capture using packet sniffing alone: in particular, interactions that occur entirely client-side (within a visitor's browser) and generate no corresponding traffic to the server. One example is a visitor who pauses, rewinds, or fast-forwards a movie; those interactions happen inside the player, so the server never sees them, even if you want to know they occurred.

Atomic Labs' Pion takes a completely passive approach, capturing all your web traffic using packet sniffing, and provides a visual, web-based interface that integrates with some major vendors such as Omniture's Genesis, Google Analytics, WebTrends and Unica.
