Access to on-line data sources is a critical component of our information assistants. The Travel Assistant stores no data locally; instead, all information is accessed directly from web sources. To do this, we build wrappers that turn web sources into structured data sources, which allows the system to reason with the data and integrate it with other data sources. As XML becomes more widely used, access to data will become easier, but it will be a long time before most of the required data is available as structured sources.
A wrapper is a program that turns a semi-structured information source into a structured source. This idea is shown in Figure 6, where the Yahoo! weather source is dynamically turned into an XML data source. Since the weather data changes frequently, it would not be useful to download this data in advance. Instead, the wrapper provides access to the live data in a structured form. Once we have built such a wrapper, the Travel Assistant can send HTTP requests to the wrapper and get back XML tuples.
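To make the idea of a wrapper concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a made-up HTML fragment (`SAMPLE_HTML`) in place of the live weather page and a hand-written table of regular expressions (`RULES`) in place of learned extraction rules; the field names and the `<tuple>` element are likewise illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical HTML fragment standing in for a weather page; the layout
# of the real page is not given in the text.
SAMPLE_HTML = """
<html><body>
<h1>Weather for Los Angeles, CA</h1>
<table>
<tr><td>Currently</td><td><b>72</b>&deg;F</td></tr>
<tr><td>Conditions</td><td>Partly Cloudy</td></tr>
</table>
</body></html>
"""

# Hypothetical extraction rules: one regular expression per field.
RULES = {
    "city": re.compile(r"<h1>Weather for (.*?)</h1>"),
    "temperature": re.compile(r"Currently</td><td><b>(\d+)</b>"),
    "conditions": re.compile(r"Conditions</td><td>(.*?)</td>"),
}


def wrap(html: str) -> str:
    """Apply each rule to the page and return one XML tuple as a string."""
    tup = ET.Element("tuple")
    for field, pattern in RULES.items():
        match = pattern.search(html)
        child = ET.SubElement(tup, field)
        # A field whose rule does not match is left empty.
        child.text = match.group(1) if match else ""
    return ET.tostring(tup, encoding="unicode")


xml_result = wrap(SAMPLE_HTML)
print(xml_result)
```

In a deployed wrapper the page would be fetched at request time rather than held in a constant, so that the structured result always reflects the live source.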
We have developed a set of tools for semi-automatically creating wrappers for web sources [8]. The tools allow a user to specify by example what the wrapper should extract from a source. The examples are then fed to an inductive learning system that generates a set of rules for extracting the required data from a site. The user interface for the wrapper learning system is shown in Figure 7. The window in the upper right shows the original web page, the window in the upper left shows the labeled data for this page, and the bottom window shows the learned extraction rules. Beyond just creating the rules, we have also developed techniques for verifying that the system is extracting the right data [7], monitoring the source to ensure that it continues to function properly [5], and automatically repairing wrappers in response to format changes in a site [3].
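The flavor of learning extraction rules from labeled examples can be illustrated with a deliberately simplified sketch. This is not the learning algorithm of [8]; it assumes a rule is just a pair of literal landmark strings, induced as the longest text shared immediately before and immediately after the labeled value across all example pages. The page strings and the function names `learn_rule` and `apply_rule` are hypothetical.

```python
def common_prefix(strings):
    """Longest string that every element of `strings` starts with."""
    prefix = strings[0]
    for s in strings[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def learn_rule(examples):
    """Induce (start, end) landmarks from (page, labeled_value) pairs."""
    befores, afters = [], []
    for page, value in examples:
        i = page.index(value)  # position of the user-labeled value
        befores.append(page[:i])
        afters.append(page[i + len(value):])
    return common_suffix(befores), common_prefix(afters)


def apply_rule(rule, page):
    """Extract the text between the start and end landmarks."""
    start, end = rule
    i = page.index(start) + len(start)
    j = page.index(end, i)
    return page[i:j]


# Two hypothetical labeled pages: the user marked the temperature on each.
examples = [
    ("<p>City: <b>Boston</b></p><p>Temp: <i>41</i> F, Snow</p>", "41"),
    ("<p>City: <b>Miami</b></p><p>Temp: <i>85</i> F, Clear</p>", "85"),
]
rule = learn_rule(examples)

# The learned rule is then applied to a page that was not labeled.
new_page = "<p>City: <b>Denver</b></p><p>Temp: <i>63</i> F, Rain</p>"
extracted = apply_rule(rule, new_page)
print(rule, extracted)
```

Literal landmarks of this kind overfit easily when the examples happen to share incidental text, which is one reason verification, monitoring, and repair of learned wrappers matter in practice.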
Once a wrapper for a site has been created, one can use that site programmatically. For example, with the wrapper for Yahoo! Weather, we can now send a request to get the weather for a particular city and it will return the corresponding XML data with the weather for that city. As we mentioned earlier, there is no data stored in the application. This minimizes the work involved in maintaining an assistant and ensures that the assistant has access to the latest information.
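Programmatic use of a wrapped source might look like the sketch below. The endpoint `WRAPPER_URL`, the `city` query parameter, and the `<results>`/`<tuple>` response layout are assumptions made for illustration; the text does not specify the wrapper's request or response format. The transport is passed in as a function so that the example can be exercised without a running wrapper.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical address of a locally running weather wrapper.
WRAPPER_URL = "http://localhost:8080/weather"


def build_request_url(city):
    """Encode the city as a query parameter of the wrapper URL."""
    return WRAPPER_URL + "?" + urllib.parse.urlencode({"city": city})


def http_fetch(url):
    """Default transport: issue an HTTP GET and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_tuples(xml_text):
    """Turn each <tuple> element into a dict mapping field name to value."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in tup}
            for tup in root.iter("tuple")]


def get_weather(city, fetch=http_fetch):
    """Query the wrapper for one city; nothing is cached or stored."""
    return parse_tuples(fetch(build_request_url(city)))


# Stand-in for the wrapper's reply, used here instead of a live request.
def fake_fetch(url):
    return ("<results><tuple><city>Los Angeles, CA</city>"
            "<temperature>72</temperature>"
            "<conditions>Partly Cloudy</conditions></tuple></results>")


weather = get_weather("Los Angeles, CA", fetch=fake_fetch)
print(weather)
```

Because every call goes back to the wrapper, the caller always sees the current state of the source, at the cost of one request per query.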