At a minimum, the on-line decision engine should have access to the following kinds of information. We are not suggesting that all of this data will be used in each decision, but that it should be available if deemed relevant.
There is also a higher-level, semantically rich kind of information that can be useful as input to the decision engine. Examples include the category of the page being accessed by the customer (e.g., search request, search answer, catalog entry, shopping cart) and the intent of a page (e.g., that it includes a promotion or indicates that a certain catalog item is out of stock).
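To make the two kinds of semantic information concrete, the following is a minimal sketch of a record that a web server might pass to the decision engine. All names here (PageCategory, PageSemantics, summarize) are hypothetical and chosen for illustration only; they are not part of any system described in this paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class PageCategory(Enum):
    """Page categories named in the text (hypothetical encoding)."""
    SEARCH_REQUEST = "search_request"
    SEARCH_ANSWER = "search_answer"
    CATALOG_ENTRY = "catalog_entry"
    SHOPPING_CART = "shopping_cart"


@dataclass(frozen=True)
class PageSemantics:
    """Semantically rich facts about one page view (illustrative only)."""
    category: PageCategory
    includes_promotion: bool = False
    # Identifiers of catalog items the page reports as out of stock.
    out_of_stock_items: Tuple[str, ...] = ()


def summarize(page: PageSemantics) -> str:
    """Render the record as a short string, e.g. for logging."""
    parts = [page.category.value]
    if page.includes_promotion:
        parts.append("promotion")
    if page.out_of_stock_items:
        parts.append("out_of_stock=" + ",".join(page.out_of_stock_items))
    return ";".join(parts)
```

A catalog page that carries a promotion and reports one unavailable item could then be described as `PageSemantics(PageCategory.CATALOG_ENTRY, True, ("sku-1",))`, where the item identifier is made up for the example.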
Indeed, an important part of installing a DFP system, or any web personalization system that uses on-line decision support, is creating or determining a model of the web site. This model incorporates relevant models of customers, the intent of their activities on the site, the business value of those activities, indicators that a customer may abandon a transaction, and so on. The example of Section 3 provides a starting point for such a model, but a variety of other factors may be brought into play. The model will be clearly visible in the program driving the decision engine, and will help to determine the kinds of information that need to be passed from web server to decision engine.
There is a trade-off between automatically inferring semantically rich information from the HTML passed between customer and web server and manually incorporating that information into web page generation so that the decision engine can obtain it easily. Automatic inference typically involves parsing the HTML, which requires special-purpose code and is computationally expensive. Further, its success depends on how direct and uniform the relationship is between the actual HTML content of the web pages and their intent. On the other hand, incorporating code into web page generation to capture the semantically rich information puts an additional burden on the web site developer, both at creation time and during maintenance. A site such as Amazon or Yahoo could have thousands of pages, some static and others generated dynamically by server-side technologies such as ASP or JSP, or by CGI scripts or servlets. Modifying every executable script to add the MIHU functionality would be a huge effort.
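The two sides of this trade-off can be sketched as follows. The sketch assumes, purely for illustration, that a cooperating page generator emits a `<meta name="page-category">` tag, and that pages without it are classified by a crude keyword heuristic. The tag name, the keyword table, and all function names are hypothetical; a real site would need heuristics tuned to its own HTML.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCategoryParser(HTMLParser):
    """Reads a hypothetical <meta name="page-category" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.category: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "page-category":
            self.category = attr_map.get("content")


def embedded_category(html: str) -> Optional[str]:
    """Manual approach: the page generator has already stated the category."""
    parser = MetaCategoryParser()
    parser.feed(html)
    return parser.category


# Assumed phrase-to-category hints; illustrative, not taken from any real site.
KEYWORD_HINTS = {
    "shopping cart": "shopping_cart",
    "search results": "search_answer",
}


def inferred_category(html: str) -> Optional[str]:
    """Automatic approach: guess the category from the page text."""
    text = html.lower()
    for phrase, category in KEYWORD_HINTS.items():
        if phrase in text:
            return category
    return None


def page_category(html: str) -> Optional[str]:
    """Prefer the embedded annotation; fall back to inference."""
    return embedded_category(html) or inferred_category(html)
```

The embedded path is a single attribute lookup and is exact whenever the annotation is present, whereas the inferred path succeeds only to the extent that page text and page intent are uniformly related, which is the dependence noted above.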
Sophisticated web authoring environments such as Microsoft's FrontPage or Allaire's ColdFusion Studio provide hooks so that web site authors can easily incorporate semantically rich information into the HTML generated by their code. With such tools, it would be straightforward for site developers to extract high-level semantic information and pass it to the Vortex engine. If the site has not been built using such tools, however, we expect that early adopters of our personalization technology will opt for parsing the HTML, and will use only some of the information actually available.