Clickstream Consulting
       The data warehouse experts



Search this site or  the web
powered by FreeFind




    Site search
    Web search
Integrating Clickstream Data Into Your Existing Data Warehouse

This article explains how to integrate clickstream data from your enterprise’s web site into your existing data warehouse. It includes two examples from retail e-commerce and the utility industry.

Most enterprises now maintain one or more web sites, and the user clickstream data that is automatically generated by these sites is a very important source of business intelligence. Surprisingly, most businesses do nothing to integrate clickstream data into their existing data warehouses, throwing away one of their richest sources of business information. In this article, we show how two very different kinds of businesses integrate clickstream data with their existing data warehouses.

Retail e-Commerce Clickstream Data

While many pundits continue to pronounce retail e-commerce dead, the total value of retail e-commerce activity has continued to grow, although at a slower pace than in the past. Online shoppers will spend $11.9 billion during the current holiday season, versus $10.8 billion last year, according to Jupiter Media Metrix. One of the big factors fueling this increase is the catalog retailers, who have added an online catalog channel to their traditional catalogs by mail.

One of my clients is a large catalog retailer, that added a companion catalog website about 2 years ago. Originally created as an experiment, the site quickly grew to about 10% of total sales. The retailer had been using a web log file analysis tool (WebTrends) to track user activity on its site, but soon discovered that this mechanism was inadequate. Log file analysis tools do not track application specific information like what a user puts into or takes out of a shopping cart, the amount of a sale, the shipping address, the method of payment, etc. Log file analysis tools also do a poor job of analyzing the effectiveness of online promotions, and printed catalog promotions that pointed the users to the website.

The retailer already had an existing data warehouse for its mail order catalog business. They wanted to integrate the data in that data warehouse with the clickstream data from their online catalog. The integration posed some interesting challenges.

Conformed Dimensions

Because of the rather large differences the kinds of data that can be collected in the mail-order environment versus the online catalog environment, the easiest way to integrate clickstream data with the existing mail-order data warehouse was to join separate mail-order and online fact tables through common conformed dimension tables. Obvious candidates for conformed dimensions were the Time, Geography and Customer dimensions. But the clickstream environment introduces new attributes that are not present in the mail-order data warehouse.

Most brick-and-mortar businesses are not 24 hour a day affairs. The mail-order catalog business fulfilled transactions 8am-6pm, six days a week. Orders were filled in daily batches, and the Time dimension only went down to the day level of granularity. In contrast, the web site was up and running 24 hours a day. From the perspective of the user, orders were filled instantaneously upon completing the checkout process. These transaction records, like all clickstream records, are timestamped down to the seconds level of granularity. This caused the conformed Time dimension to go from a day level of granularity to a seconds level of granularity. It also caused the mail-order business to start timestamping its transaction down to the second level, like the online environment.

A conformed Customer dimension has even more drastic changes to its attributes. In an online environment, customer activity can be seen, whether they actually buy something or not. This means that there are now three different levels of customers:
  1. Actual customers who have ordered goods online or through the mail. Customer attributes include mailing address, phone number, credit profile, billing address, shipping address, etc.
  2. Registered users online, who have userIDs that identify them, but not necessarily to the level of customers who have ordered goods.
  3. Unregistered surfers who are just browsing the catalog, and are tracked using persistent cookies and session cookies.

A conformed Customer dimension has to be expanded to hold all of these potential customer attributes. By following the activity of users who surf online, the retailer gains valuable information on how potential customers use their catalog, and which product presentations and offers are most enticing.

Conforming the Geography dimension causes it to split into two in this environment. Clickstream data may have a physical geographic location if the user orders something for delivery to a specific address, but if the user is just surfing, his exact physical location can be difficult to determine. More interesting is his host geography, that is, what site is hosting his access to the online retailer. This is the cyberspace equivalent to the neighborhood surrounding the online store, and the catalog retailer needs to know which host sites, like AOL.com, SportingLife.com, etc., drive traffic to the online store. By discovering these pathways, the retailer may choose to advertise on some of the sites that drive a lot of traffic to online catalog. Using newer services like Geobytes.com it may also be possible to at least identify the city and country of online user access. This retailer discovered that there was a great deal of foreign interest in its online catalog, and it is considering expanding the list of countries that it serves as a result. The net effect of all this is a conformed Geography dimension containing physical location information like addresses, and a separate Host dimension containing information about the hosts that drive traffic to the web site, as well as TCP/IP address information that may have geographic identity down to the country and city level.

Utility Company Clickstream Data

A regulated utility company decided to use its web site to reduce customer service costs. The site could be used to pay bills online, get records of past payments, start service, stop service, get appliance rebates, etc. They also had a traditional call center that could be used to do customer service requests as well. Their existing data warehouse covered the traditional call center activity but did not cover what was happening on the web site.

There were two looming business issues. Was the web site really saving customer service costs that would otherwise be born by the call center and billing departments? And was the web site providing an adequate level of customer service—one that could be demonstrated to the State Public Utility Commission, which regulates these issues. Without a clickstream data warehouse it was impossible to answer these questions, and the utility decided to embark on a project to integrate clickstream data into its existing data warehouse using a conformed dimension process similar to what was described in the previous section.

While I am unable to go into the details of how the dimensions were conformed, I can discuss one interesting result. Analyzing the clickstream data with the call center data showed that many potential users were abandoning the site before finishing their billing or service request and instead going to the call center to complete the billing or service request. By redesigning the web interface to be more user friendly, a higher percentage of requests were fulfilled on the site itself, resulting in significant cost savings. This insight occurred because the utility integrated its clickstream data with its customer service data.

Conclusion

For more information on how to integrate clickstream data with your existing data warehouse, see my new book, “Clickstream Data Warehousing”, published by John Wiley and Sons. It is available at Amazon.com and Fatbrain.com.
Mark Sweiger is President and Principal of Clickstream Consulting, LLC
and author of the new book, "Clickstream Data Warehousing,"
published by John Wiley & Sons. He has taught
courses on clickstream data warehousing for TDWI and maintains
an informative Web site, http://www.ClickstreamConsulting.com.
He can be reached at mailto:msweiger@ClickstreamConsulting.com.


Home    Our Services    Company Overview     Consultant Profiles     Articles and Papers     Contact Us!     Book Site     Links

© Copyright 2001, 2002 Clickstream Consulting, All Rights Reserved