Raw vs. Semantic Information

4.1 Raw vs. Semantic Information

The DFP approach to web personalization is based on providing relevant information to a sophisticated on-line decision engine. This subsection distinguishes between the raw data that can be obtained easily and higher-level semantic information, such as that used by the Decision Flow in the example of Section 3.

At a minimum, the on-line decision engine should have access to the following kinds of information. We are not suggesting that all of this data will be used in each decision, but that it should be available if deemed relevant.

(a): History of customer clicks: This includes not only the web requests that the customer is making, but also the navigation path being followed around the site, the time spent at each page, and the entries made into any forms.
(b): Web server responses: The response of a web storefront to a customer may be very important in understanding the customer experience. For example, to detect frustration stemming from difficult searches, it is important to know both the number of searches performed and the sizes of the returned answers.
(c): Enterprise data: A broad variety of stored information may be useful for personalization. At a minimum this will include the results of bulk statistical analyses and information on inventory and availability times. If the customer has been identified, then customer profiles and recent customer histories can also be incorporated into the personalization process.
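As an illustration only, the three kinds of raw information listed above could be gathered into a per-session record along the following lines. This is a minimal sketch: the class and field names (Click, ServerResponse, SessionContext, result_count) are hypothetical and are not part of DFP or Vortex.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Click:
    """One customer request, as in item (a). Hypothetical structure."""
    url: str
    timestamp: float  # seconds since the start of the session
    form_entries: Dict[str, str] = field(default_factory=dict)


@dataclass
class ServerResponse:
    """What the storefront returned, as in item (b). Hypothetical structure."""
    url: str
    result_count: Optional[int] = None  # size of a search answer, if the page is one


@dataclass
class SessionContext:
    """Raw information made available to the decision engine for one session."""
    clicks: List[Click] = field(default_factory=list)
    responses: List[ServerResponse] = field(default_factory=list)
    enterprise: Dict[str, object] = field(default_factory=dict)  # item (c)

    def navigation_path(self) -> List[str]:
        """URLs in the order the customer requested them."""
        return [c.url for c in self.clicks]

    def time_on_pages(self) -> List[float]:
        """Time spent on every page except the last, from consecutive click times."""
        return [b.timestamp - a.timestamp
                for a, b in zip(self.clicks, self.clicks[1:])]

    def search_stats(self) -> Tuple[int, List[int]]:
        """Number of searches performed and the sizes of the returned answers."""
        sizes = [r.result_count for r in self.responses
                 if r.result_count is not None]
        return len(sizes), sizes
```

A rule looking for search frustration, as in the example of item (b), could then inspect search_stats() for several searches in a row with empty or very large answers.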

There is also a higher, semantically rich kind of information that can be helpful as input for the decision engine. This will include information, for example, about the category of page being accessed by the customer (e.g., search request, search answer, catalog entry, shopping cart), or the intent of a page (e.g., that it includes a promotion or indicates that a certain catalog item is out of stock).
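One possible way to represent this semantic layer is sketched below. The page categories are the ones named in the paragraph above; the type names, field names, and the helper saw_out_of_stock are assumptions made for illustration, not an interface defined by the decision engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class PageCategory(Enum):
    """Categories of page mentioned in the text."""
    SEARCH_REQUEST = "search request"
    SEARCH_ANSWER = "search answer"
    CATALOG_ENTRY = "catalog entry"
    SHOPPING_CART = "shopping cart"


@dataclass(frozen=True)
class PageSemantics:
    """Semantic description of one page view (hypothetical structure)."""
    category: PageCategory
    has_promotion: bool = False                # the page includes a promotion
    out_of_stock_items: Tuple[str, ...] = ()   # catalog items shown as unavailable


def saw_out_of_stock(pages: Iterable[PageSemantics]) -> bool:
    """True if any page in the session showed an out-of-stock item."""
    return any(p.out_of_stock_items for p in pages)
```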

Indeed, an important part of installing a DFP system, or any web personalization system using on-line decision support, will involve creating or determining a model of the web site that incorporates relevant models of customers, the intent of their activities on the site, the business value of those activities, indicators that a customer may abandon a transaction, etc. The example of Section 3 provides a starting point for such a model, but a variety of other factors may be brought into play. This model will be clearly visible in the program driving the decision engine, and will help to determine the kinds of information that need to be passed from web server to decision engine.

There is a trade-off between automatically inferring semantically rich information from the HTML passed between customer and web server and manually incorporating that information into the web page generation so that it can be obtained easily by the decision engine. Inferring this information automatically typically involves parsing the HTML; it requires the development of special-purpose code and is computationally expensive. Further, its success depends on how direct and uniform the relationship is between the actual HTML content of the web pages and their intent. On the other hand, incorporating code into the web page generation that captures the semantically rich information puts an additional burden on the web site developer, both at creation time and during maintenance. A site such as Amazon or Yahoo could have thousands of pages, some static, others generated dynamically by server-side scripting technologies such as ASP or JSP, or by CGI scripts or servlets. It would be a huge effort to modify all of the executable scripts to add the MIHU functionality.
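The two sides of this trade-off can be contrasted in a small sketch. It assumes, purely for illustration, that an annotated page carries its category in a meta tag named page-category, and that an unannotated page must be classified by a site-specific heuristic over its form actions. Neither convention comes from DFP or from any particular authoring tool; the parser is Python's standard html.parser module.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _PageScanner(HTMLParser):
    """Collects an embedded category annotation and the actions of any forms."""

    def __init__(self) -> None:
        super().__init__()
        self.embedded_category: Optional[str] = None
        self.form_actions: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Hypothetical annotation written by the page generator.
        if tag == "meta" and attributes.get("name") == "page-category":
            self.embedded_category = attributes.get("content")
        elif tag == "form":
            self.form_actions.append(attributes.get("action") or "")


def page_category(html: str) -> Tuple[Optional[str], str]:
    """Return (category, source), where source tells how the category was found."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()

    # Manual annotation: a direct lookup, independent of the page layout.
    if scanner.embedded_category:
        return scanner.embedded_category, "embedded"

    # Automatic inference: a heuristic that works only as long as the
    # site's HTML relates uniformly to the intent of its pages.
    for action in scanner.form_actions:
        if "search" in action:
            return "search request", "inferred"
        if "cart" in action:
            return "shopping cart", "inferred"
    return None, "unknown"
```

The embedded branch stays correct when the page layout changes but requires every page generator to emit the annotation; the inferred branch needs no changes to the site but has to be rewritten whenever the relationship between markup and intent shifts.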

Sophisticated web authoring environments such as Microsoft's FrontPage or Allaire's ColdFusion Studio provide hooks so that web site authors can easily incorporate semantically rich information into the HTML generated by their code. Thus it would be straightforward for site developers to extract high-level semantic information to be passed to the Vortex engine. However, if the site has not been built using such tools, we expect that early adopters of our personalization technology will opt for parsing the HTML, and will use only some of the information actually available.

Rick Hull
2/19/2001