1IBM Research Lab in Haifa
2Dept of Computer Science
Copyright is held by the author/owner(s).
WWW10, May 1-5, 2001, Hong Kong.
ACM 1-58113-348-0/01/0005.
Mobile knowledge seekers often need access to information on the Web during a meeting or on the road, while away from their desktop. A common practice today is to use pervasive devices such as Personal Digital Assistants or mobile phones. However, these devices have inherent constraints (e.g., slow communication, form factor) which often make information discovery tasks impractical.
In this paper, we present a new focused-search approach specifically oriented for the mode of work and the constraints dictated by pervasive devices. It combines focused search within specific topics with encapsulation of topic-specific information in a persistent repository. One key characteristic of these persistent repositories is that their footprint is small enough to fit on local devices, and yet they are rich enough to support many information discovery tasks even in disconnected mode. More specifically, we suggest a representation for topic-specific information based on "knowledge-agent bases" that comprise all the information necessary to access information about a topic (in the form of key concepts and key Web pages) and assist in the full search process, from query formulation assistance to result scanning, on the device itself. The key contribution of our work is the coupling of focused search with encapsulated knowledge representation, making information discovery from pervasive devices practical as well as efficient. We describe our model in detail and demonstrate its aspects through sample scenarios.
Keywords: Focused search; Pervasive devices; Disconnected search; Knowledge agents.
Pervasive devices, such as Personal Digital Assistants (PDAs) and mobile phones, currently provide much more functionality than they were originally designed for. PDAs are no longer simple organizer/calendar tools. Hundreds of applications are available on the PalmOS platform alone (from ebooks to Web browsing), both in connected and disconnected modes (see palm.net webclipping applications [19]). This trend can be observed in the mobile phone business as well, with WAP phones offering a wide variety of services ranging from flight information to remote banking and e-trading.
As these devices developed additional functionality, they have evolved into general purpose information appliances in the form of very thin clients. Information access requires adequate information discovery abilities, that is to say proper mechanisms for searching and browsing information. The nature of pervasive devices, however, imposes new constraints on the classic search and browse paradigms that reign in the connected desktop world. These constraints are mostly due to:
The form-factor-related constraints have a direct impact on the success of any PDA application. Indeed if the mode of interaction is awkward or too time consuming, the application will most likely not be adopted by users. In the context of WWW information discovery, two types of interaction are critical: navigation (for browsing and exploring) and text input (for searching). Many user studies have been conducted in order to evaluate various methods for information navigation and for text input given various form factor constraints. As discussed in [5], many browsing paradigms that we take for granted in the desktop world have to be reconsidered in the context of small devices. For instance, scrollbars that are widely used on desktops are much harder to master on small devices and thus the simple act of scrolling itself often becomes disorienting. Consequently, content is often pre-packaged in device-dependent formats which simplify the navigation task. Examples include WML decks for WAP phones [27], and HDML pages for Palm VII. Ideally, content should be automatically or semi-automatically translated from a common format (e.g., XML) into device-specific formats, and in fact many transcoding products [22] tackle exactly this problem. However, this ideal solution is not realizable at this point since content is currently not provided in one common format, and since transcoding has turned out to be an elusive task.
Since an all-in-one solution does not seem near, it is necessary to devise interim application-specific and device-specific solutions. It is suggested in [11], for example, that navigation can be greatly enhanced by the definition of short-cut keys that facilitate faster browsing through WML decks in the context of a business card application on WAP phones. A promising direction has also been explored in the Power Browser system [5] that reorganizes the content of individual Web sites so as to decouple the navigation and viewing phases. This method was designed for local site search.
In terms of text input, several approaches have been proposed to enhance the user's experience for various input devices. Tegic T9 text input technology [24] provides a disambiguation mechanism which enables text input on phone keypads with far fewer keystrokes than is usually necessary. Masui [16] provides both word-level completion (via menus and dynamic approximate string matching) and phrase-level completion. The Power Browser [5] supports site-specific word completion.
The second class of constraints we consider pertains to the communication mode of pervasive devices, especially in terms of bandwidth limitations and response time requirements. Even if the interaction process is greatly enhanced by the techniques described above, a typical search process still requires numerous transactions between the wireless device and the content-storing servers (which include both search engines and servers that hold the actual target data). In [5] for instance, where users can search a Web site via a wireless PDA, at least six interactions are necessary before PDA formatted search results are displayed. Even with reasonable bandwidth, each such transaction is noticeable by the user who will consequently wait a few seconds for meaningful results.
A common solution geared to improve response time in many computer applications is to cache some information where it is required and thus provide quasi immediate feedback without having to access the original storage location. In the context of pervasive devices, we would like to store/cache some information on the local device, thus eliminating the need to access remote servers during some stages of the information discovery task. Avantgo [1] and IntelliSync [10] enable caching a pre-selected set of Web pages on pervasive devices. However, caching data for general purpose Web information discovery tasks would require an insurmountable amount of local storage space.
The analysis of the constraints inherent to pervasive devices such as PDAs presented above leads us to believe that a promising direction for making search practical on pervasive devices is a focused, topic-specific search model that works mostly in disconnected mode, with periodic downloading of data to the device. Focused search addresses the above issues as follows:
Focused search and crawling have been proposed in the past, mostly with the motivation of assisting novice Web surfers and/or improving the quality (precision) of results. Some systems and services predefine the topics of interest (a.k.a. domains) themselves (see for instance various directory services such as Yahoo [29] and Google [8]), while others allow the users to define their topics of interest (e.g., Fetuccino [9,3], Focused Crawler [7], WTMS [18], and knowledge agents [2]).
Since the amount of memory available on pervasive devices is very limited, it is crucial that the information stored locally be highly representative of the topic, yet concise. While the focused search techniques mentioned above can be used to collect a representative collection of Web pages for a particular topic, they are insufficient in terms of encapsulating the knowledge representation required to support all of the information discovery tasks on the pervasive device.
In this paper, we describe a focused-search approach specifically designed for the mode of work and the constraints dictated by pervasive devices. Topics are represented and stored in a persistent repository with a very small footprint which can be downloaded to small devices. More specifically, we suggest representing topic-specific information via "knowledge-agent bases" (KABs) that encapsulate information required to assist users in their discovery process from pervasive devices. The idea is to allow users to predefine a topic of interest, and then capture a very small but representative piece of the Web for this topic, storing characteristic information regarding it on the PDA. The information we store includes a topic-specific lexicon, a small number (~100) of the most authoritative Web pages and hubs of the domain (core pages) whose text is stored in its entirety, and a relatively larger number (on the order of thousands) of second-tier Web pages (outlinks of the core pages) for which we store only the URL and anchor text of the outlink. An index which supports search on the text of the core pages as well as on the anchor text of the referenced pages is also stored on the device.
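The contents of a KAB listed above can be pictured as a small record type. The following is a minimal sketch only: the class and field names (`CorePage`, `SatellitePage`, `KnowledgeAgentBase`, `fitness`, and so on) are hypothetical, the search index is omitted, and nothing here reflects the actual on-device storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CorePage:
    """A core page: its text is stored in its entirety."""
    url: str
    title: str
    text: str
    fitness: float = 0.0  # hypothetical field: relevance to the topic


@dataclass
class SatellitePage:
    """A second-tier page: only the URL and anchor text are stored."""
    url: str
    anchor_text: str


@dataclass
class KnowledgeAgentBase:
    """Illustrative container for the topic-specific data kept on the device."""
    name: str
    training_queries: List[str]
    lexicon: Dict[str, int] = field(default_factory=dict)  # term -> frequency
    core_pages: List[CorePage] = field(default_factory=list)
    satellites: List[SatellitePage] = field(default_factory=list)

    def top_pages(self, k: int = 10) -> List[CorePage]:
        """Core pages in decreasing order of fitness."""
        return sorted(self.core_pages, key=lambda p: p.fitness, reverse=True)[:k]

    def top_terms(self, k: int = 10) -> List[str]:
        """Most frequent lexicon terms (ties broken alphabetically)."""
        return sorted(self.lexicon, key=lambda t: (-self.lexicon[t], t))[:k]
```

Keeping full text only for the roughly 100 core pages, and a URL plus anchor text for the thousands of satellites, is what keeps the footprint small.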
The KABs are built automatically using the Knowledge Agent technology that we have described in [2]. Users need only supply several (4-6) queries which define the topic, along with an optional set of seed URLs which they consider to represent the topic at hand. We elaborate on the knowledge acquisition process in Section 2.1.
Combining focused search with encapsulated knowledge representation holds the promise of making information discovery from pervasive devices practical as well as efficient. The advantages of this approach are numerous:
The rest of this paper is organized as follows. Section 2 presents the topic specific knowledge representation and acquisition methods we use. Section 3 describes the process of information discovery in disconnected mode including topic browsing, query formulation, and disconnected search. Section 4 consists of some sample scenarios and examples using knowledge encapsulation for focused search on a Palm device. Section 5 concludes by reiterating the key contributions of this work and pointing to future directions. Related work is discussed as relevant throughout the paper.
The knowledge agent approach is a methodology for focused information discovery that the authors of this paper introduced in [2]. Its goal is to gain persistent knowledge on a given topic in order to assist users in their information discovery tasks. Knowledge agents are created by users, and undergo a training phase during which they act on behalf of their creator in automatically acquiring expertise on a given topic. This knowledge is subsequently leveraged to facilitate both the search task (query formulation assistance, disconnected search, result ranking) and the browsing task (by providing a set of pertinent pages and key concepts) on matters pertaining to the agent's domain of expertise.
The motivation for using a knowledge agent approach in our context is its compact representation, or encapsulation, of topic-specific knowledge. The compactness of the representation allows its deployment in the limited resource environment of pervasive devices, while the topic specificity of the represented knowledge answers the form-factor and communication model constraints of these devices, as explained in the Introduction.
The key adaptation of the knowledge agent approach to pervasive information discovery is the physical detachment between the knowledge acquisition phase and the knowledge application phase. Knowledge acquisition is done on a desktop server, with continuous (and preferably broadband) Internet access. After the topic-specific knowledge is acquired (and represented in a KAB), it is downloaded to the pervasive (and often disconnected) device, and serves to conduct as many information discovery tasks as possible either in fully disconnected mode or with minimal access to remote servers.
Figure 1 captures the server/device interaction. The knowledge acquisition phase can be initiated either from a connected desktop or from a PDA connected through a cradle or a wireless modem. Once the KAB is ready, it can be downloaded to the device again through a cradle or wireless modem. The user will only need to reconnect to the server to obtain updates or create new agents. Note that small KABs can also be obtained directly from other users, in a peer-to-peer communication exchange through direct beaming (a small KAB of about 400K takes about two minutes to exchange). Larger KABs can still be exchanged between users (an average KAB of about 1MB can take a few minutes). However, since such a long beaming operation might be inconvenient (not to mention battery consumption problems), it is preferable in the case of larger KABs to beam only a reference to the KAB on the server, and let the receiving party download it directly from the server.
This section focuses on the knowledge representation and acquisition phases. Section 3 will detail how this knowledge is applied to information discovery on pervasive devices, and specifically how pervasive search on PDAs is made both effective and efficient.
Having answered the first two questions, we move on to tackle the type of knowledge which should be acquired by the agent. We claim that the agent can limit itself to acquiring the following information:
Section 3 elaborates on how this knowledge base assists topic-specific disconnected information retrieval. Before moving on to detail the knowledge acquisition process, however, the key observation to be made is that the above knowledge is all implied by the core set of pages (and their associated text). Since the core pages are among the best available resources on the topic, their text should be representative of the domain language. It should contain topic-specific terminology, lexical affinities, and phrases. As for the required knowledge of second-tier pages, exactly such knowledge is contained in the hyperlinks which emanate from core pages. These links are associated with anchor text, which is often highly descriptive of the destination URL's contents. This is especially true for high quality pages, where the anchor text, like most other text, is highly informative. Moreover, since we aim to have top-notch hubs in the core set, their outgoing links may be quite numerous and point to pages of high quality.
We call the URLs which are referenced by the core set satellite pages, and our experience shows that for core sets which contain about 100 pages, some 3000 satellite pages are referenced, with descriptive anchor text existing for many of these pages. Thus, the agent attains some knowledge on the contents of a much wider set of pages than it is actually able to store.
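As an illustration of how satellite entries could be harvested from a core page, the sketch below collects (URL, anchor text) pairs with Python's standard `html.parser`. The names `AnchorCollector` and `extract_satellites` are hypothetical, and the actual agent may extract and filter links differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (absolute URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_satellites(base_url, html):
    """Return the outlinks of a core page together with their anchor text."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Only these pairs, not the destination pages themselves, would need to be stored for the satellite set.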
Knowledge is acquired by collecting the core pages in an iterative process, governed by queries. Recall that the agent's creator defines the topic by a series of queries, which we will now denote by $q_1, \ldots, q_n$. Denoting the core set following the $i$-th query by $C_i$, we have
$$C_i = f_{ka}(C_{i-1}, q_i), \quad i = 1, \ldots, n,$$
where $f_{ka}$ is the knowledge agent's core-set evolution-by-queries function. This function performs the process described below. Note that the initial core set, $C_0$, can be empty or contain some creator-specified URLs.
The core set evolves by processing the series of training queries, two types of which are supported by the system: text-based queries and sample URL queries.
Thus, for each training query, the agent performs a Web search. For text-based queries, some general purpose Web search engines (of the agent's creator's choice) are presented with the query, returning a root set of candidate pages. For sample URL queries, the user-supplied URLs are the root candidates. The collection of root candidates is expanded into a larger set of candidate pages, S, by following the hyperlinks surrounding the root pages. The pages which currently reside in the core set are also added to S. The exact expansion model depends on the type of query which is being processed, and is out of the scope of this paper. However, it is based on ideas presented in [12] and is fully explained in [2]. The resulting set of candidate pages, denoted by S, represents a directed subgraph of the WWW, whose nodes are the candidate pages and whose edges are the hyperlinks which connect the candidates.
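The exact expansion model is deferred to [2], so the following is only a simplified sketch of the general idea: it assumes a one-hop neighborhood (pages linked from, or linking to, a root page) and a link graph already available as a dictionary. The function name `expand_candidates` and its parameters are hypothetical.

```python
def expand_candidates(root, core, outlinks):
    """Build the candidate set S and its induced link graph.

    root     -- iterable of root candidate URLs
    core     -- iterable of URLs currently in the core set
    outlinks -- dict mapping a URL to the list of URLs it links to
    Assumption: expansion follows links one hop in either direction.
    """
    # Invert the link graph so pages pointing at a root page can be found.
    inlinks = {}
    for src, dsts in outlinks.items():
        for dst in dsts:
            inlinks.setdefault(dst, set()).add(src)

    candidates = set(root) | set(core)
    for page in root:
        candidates.update(outlinks.get(page, ()))
        candidates.update(inlinks.get(page, ()))

    # Keep only the hyperlinks whose two endpoints are both candidates.
    edges = {
        (src, dst)
        for src in candidates
        for dst in outlinks.get(src, ())
        if dst in candidates
    }
    return candidates, edges
```

The returned pair corresponds to the nodes and edges of the directed subgraph on which link-structure analysis can then be run.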
The large set of candidate pages S is ranked by a combination of link-structure analysis and textual analysis. These analyses complement each other towards finding the pages which are both central to the agent's domain and satisfy the query well. The various components of the score are briefly described below, while the specific technical details appear in [2].
The textual and link analysis scores are then combined to yield the overall query relevance score for each candidate page. At this stage, the core set can be updated. One of the innovations of the KA model is that the query relevance score does not exclusively govern whether a page needs to be stored in the KAB. Rather, with each page which is already stored in the KAB (that is, with each current member of the core set) there is an associated fitness score, which reflects that page's relevance to the domain through the course of the previous iterations of the agent's training phase. The fitness scores of the current core set members are compared against the query relevance scores of the rest of the candidate pages. Pages compete for the right to be included in the KAB using an evolutionary adaptation mechanism, also described in [2]. Briefly, the first few iterations see the core set grow until it reaches its maximal size, in terms of number of pages, as defined by the agent's creator. Once the core set is full, subsequent iterations cause pages with low domain fitness scores to become stale. These are then removed from the KAB, thus vacating room for the new, fresh and fit pages. All pages which are entered into the KAB explicitly by the user receive a high initial fitness score. This conveys our high regard for the user's judgment of the quality of these pages.
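The competition between current core pages and new candidates can be sketched as follows. This is an illustration under stated assumptions, not the adaptation mechanism of [2]: fitness and relevance scores are assumed to be directly comparable, a core page that is re-scored has its fitness blended with the new relevance score (the `blend` parameter is invented), and the names `update_core_set` and `train` are hypothetical.

```python
def update_core_set(core, relevance, max_size, blend=0.5):
    """One training iteration: return the new core set as {url: fitness}.

    core      -- {url: fitness} for the current core set
    relevance -- {url: query relevance score} for the candidate pages
    max_size  -- maximal number of core pages, set by the agent's creator
    blend     -- assumed weight of the new score when a core page is re-scored
    """
    pool = {}
    for url, fitness in core.items():
        if url in relevance:
            pool[url] = (1 - blend) * fitness + blend * relevance[url]
        else:
            pool[url] = fitness
    for url, score in relevance.items():
        pool.setdefault(url, score)  # newcomers enter with their relevance score

    # The fittest pages survive; low-fitness pages are dropped once the set is full.
    ranked = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:max_size])


def train(queries, score_candidates, max_size, seed=None):
    """Apply the update once per training query, starting from an optional seed set.

    score_candidates(query, core) is a caller-supplied function (hypothetical)
    that returns {url: relevance} for the candidates of that query.
    """
    core = dict(seed or {})
    for query in queries:
        core = update_core_set(core, score_candidates(query, core), max_size)
    return core
```

Giving user-supplied pages a high fitness in `seed` mirrors the high initial fitness score described above: such pages are only displaced by candidates that score higher still.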
At the end of the training process, the KAB comprises a set of Web pages (and their associated fitness scores) that can be thought of as a set of category pages in some directory service. Unlike the latter though, the KA topic can be of any granularity and reflects the personal interests of its creator. These interests are not necessarily covered, or at least not in sufficient depth or specificity, by directory services. In addition, the KAB pages induce, as mentioned previously, a domain specific vocabulary, and a set of satellite pages.
While the training itself may take a few hours (depending mainly on the available network connection), since several hundred Web pages are crawled as part of this stage, it can be set up in a few minutes using a simple interface.
As mentioned above, once the KAB is resident on the device, there is no need to access the server for many further information discovery tasks as they are conducted locally on the device.
The first discovery task that is conducted on the device is topic browsing and exploration. The KAB pages, which are ranked by their fitness to the topic, provide valuable information in the form of specialized bookmarks. Simply knowing the key Web pages on a given topic is a great source of information, as demonstrated by directory services. Having them locally available on the device is convenient in a variety of applications. They are available for simple browsing as classical bookmarks but can also be conveniently searched as explained below. In addition to reading the top resources, the top terms in the KAB can also be explored. These terms are most likely good starting points for searches in the domain of the KAB.
The second discovery task is topic-specific information retrieval. The retrieval process typically involves the following three stages:
In the remainder of this section we provide more detail regarding topic-specific query formulation and the disconnected search process.
In addition to simplifying text input, word completion can also be very helpful in spelling terms correctly. In the spirit of source code editors that provide word completion based on the limited set of terms in the programming language vocabulary, we provide word completion based on the domain-specific vocabulary encapsulated in the KAB. As keys are pressed, the agent suggests the most frequent words in the KAB vocabulary consistent with the input prefix. This list is sorted in decreasing order of frequency, increasing the chances that the first term suggested is what the user had in mind. The user can thus complete query terms with one click rather than many keystrokes. See Section 4 for examples of word completions using some sample KABs.
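The completion step described above amounts to a prefix filter over the KAB lexicon followed by a sort on frequency. A minimal sketch, assuming the lexicon is available as a term-to-frequency dictionary (the function name `complete_word` and the frequencies in the usage note are invented for illustration):

```python
def complete_word(prefix, lexicon, limit=5):
    """Suggest lexicon terms starting with `prefix`, most frequent first.

    lexicon -- dict mapping each lowercase term to its frequency in the KAB
    limit   -- maximal number of suggestions that fit on the small display
    """
    prefix = prefix.lower()
    matches = [term for term in lexicon if term.startswith(prefix)]
    # Decreasing frequency; ties are broken alphabetically for stable output.
    matches.sort(key=lambda term: (-lexicon[term], term))
    return matches[:limit]
```

For example, with a toy lexicon `{"pikachu": 120, "picture": 80, "pinball": 30, "pikablu": 10, "pokemon": 500}`, calling `complete_word("pi", lexicon)` yields the four terms starting with "pi" in decreasing order of frequency. On a device, a sorted vocabulary with binary search would avoid scanning the whole lexicon on every keystroke.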
Automatic query completion is usually performed by suggesting terms related to the query terms using some global semantic word network such as WordNet as suggested in [26]. In our case, on the other hand, query completion is based on the topic-specific knowledge as encapsulated in the KAB. The advantage of this approach is that the KAB's local vocabulary characterizes the domain's ontology and thus relations between terms are domain dependent. As a result, added terms disambiguate the query terms in the context of the specific domain.
The query completion process works by first clustering the terms in the query so that each cluster contains terms that most likely constitute a phrase and then identifying the best candidate expansion terms for each such phrase separately. The list of terms presented to the user is the union of each cluster's expansion terms. Since there is an upper bound on the total number of terms that can be suggested to the user (due to the small display size), each cluster is allotted the number of terms that it can contribute to this list based on the relative weight (in terms of frequency in the KAB) of the terms in the cluster. Term clustering as well as term suggestion is based on the lexical affinity relation. Recall that term t2 is considered a lexical affinity of t1 if t2 is found in proximity (e.g., within 5 words) to t1 in the text. The details of the query completion algorithm are formally expressed as follows:
For example, consider the query "Circus dancer in Paris museums" and a KAB specializing in the French painter Toulouse-Lautrec. The terms "circus" and "dancer" are clustered together and the suggested terms in decreasing order of weights are "actress, performer, prostitute, moulin, rouge, valentine, goulue". The terms suggested for "Paris" are "dorsay, musee, france, montmartre", and the term suggested for "museums" is "art". Assuming the number of suggested terms is limited to 9, and based on the computed cluster weights, the final list is "actress, performer, prostitute, moulin, rouge, dorsay, musee, art".
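The two ingredients of query completion, lexical affinities and the proportional allotment of suggestion slots to clusters, can be sketched as below. This is a simplified illustration rather than the algorithm itself: clusters are assumed to be given, affinity strength is taken to be a raw co-occurrence count, and slots are split by the largest-remainder rule. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def lexical_affinities(tokens, window=5):
    """For every term, count the terms found within `window` words of it."""
    la = defaultdict(Counter)
    for i, t1 in enumerate(tokens):
        for t2 in tokens[i + 1 : i + 1 + window]:
            if t1 != t2:
                la[t1][t2] += 1
                la[t2][t1] += 1
    return la


def allot_slots(cluster_weights, total):
    """Split `total` slots among clusters in proportion to their weights."""
    weight_sum = sum(cluster_weights)
    if weight_sum == 0:
        return [0] * len(cluster_weights)
    exact = [total * w / weight_sum for w in cluster_weights]
    slots = [int(x) for x in exact]
    # Hand out the slots lost to rounding, largest fractional part first.
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - slots[i], reverse=True)
    for i in by_remainder[: total - sum(slots)]:
        slots[i] += 1
    return slots


def suggest(clusters, la, freq, total):
    """Union of each cluster's best expansion terms, within the slot budget.

    clusters -- list of lists of query terms (each list is one likely phrase)
    la       -- affinities as returned by lexical_affinities
    freq     -- term frequencies in the KAB, used as cluster weights
    """
    query_terms = {t for cluster in clusters for t in cluster}
    weights = [sum(freq.get(t, 0) for t in cluster) for cluster in clusters]
    suggestions = []
    for cluster, n in zip(clusters, allot_slots(weights, total)):
        scores = Counter()
        for term in cluster:
            scores.update(la.get(term, {}))
        ranked = sorted(
            (t for t in scores if t not in query_terms),
            key=lambda t: (-scores[t], t),
        )
        suggestions.extend(t for t in ranked[:n] if t not in suggestions)
    return suggestions
```

Because the affinities are computed from the KAB text only, the same query term leads to different suggestions in different KABs.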
See Section 4 for a sample usage scenario using query completion as well as for several additional query completion examples.
As mentioned above, the KAB pages are indexed during the training phase using a text search engine. We use Palm Pirate (Palm Information Retrieval Application for Text Search) [20], a search engine for the Palm developed at the IBM research lab in Haifa. The system allows users to store, search and view text collections on their Palm device. Palm Pirate is composed of an indexing component which runs on the desktop and a search component which runs on the Palm. The KAB pages are indexed using this indexing component during the training phase. The index is then downloaded to the device as part of the KAB.
Palm Pirate uses a static inverted index, an index that does not support update operations such as insertion or deletion of pages. The reason that it is static is that the inverted index vocabulary is converted to a minimal perfect hash function which is a very efficient representation (in terms of storage) for such a vocabulary [28]. In order to build the perfect hash all keys need to be known in advance, making it a static data structure. We find that for small collections (such as KABs), static inverted indices are reasonable, as it takes little time to construct the entire index from scratch. The ratio between index size and the raw data is about 15%. The average time to construct the index is on the order of a few minutes, and the average time to process a query of 3-4 words is about 0.5 seconds regardless of the index size.
The search component of the Palm Pirate uses a standard tf-idf [23] based scoring mechanism. The word statistics stored as part of the KAB's lexicon serve as the source for term frequencies used by this algorithm. The score for a KAB page is a combination of its textual similarity score to the query and its fitness score. Recall that the fitness score of a KAB page reflects its relevance to the training queries and therefore to the KAB's domain. By using the fitness score, authoritative domain pages within the collection of search results are given priority in terms of ranking.
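To make the ranking concrete, the sketch below builds a small read-only inverted index and ranks pages by a blend of a tf-idf score and the page's fitness. The formula for combining the two scores is not given here, so the linear blend with weight `alpha` is an assumption, as are the particular tf-idf variant and the function names. The minimal perfect hash used by Palm Pirate is not reproduced; a plain dictionary stands in for it.

```python
import math
from collections import Counter


def build_index(pages):
    """Build an inverted index {term: {url: term frequency}} from {url: text}.

    The index is built once from the full collection and never updated,
    mirroring the static index described above.
    """
    index = {}
    for url, text in pages.items():
        for term, tf in Counter(text.lower().split()).items():
            index.setdefault(term, {})[url] = tf
    return index


def search(query, index, n_pages, fitness, alpha=0.7):
    """Rank pages by alpha * normalized tf-idf + (1 - alpha) * fitness.

    fitness -- {url: fitness score in [0, 1]}; alpha is an assumed weight
    """
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_pages / len(postings))
        for url, tf in postings.items():
            scores[url] += tf * idf
    if not scores:
        return []
    top = max(scores.values())
    combined = {
        url: alpha * score / top + (1 - alpha) * fitness.get(url, 0.0)
        for url, score in scores.items()
    }
    return sorted(combined, key=lambda url: (-combined[url], url))
```

With this blend, two pages of similar textual similarity are ordered by their fitness, so authoritative domain pages rise in the result list.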
In this section, we present some scenarios that demonstrate several pervasive information discovery tasks being performed using our system. As mentioned in the introduction, the premise is that all the user has during the search and explore stage is a PDA and a modem or some other form of wireless communication. For example, he could be in the airport or on an airplane, in a museum, waiting for his child at soccer practice, or shopping.
We have created KABs on various topics ranging from technical fields such as Java, XML, Error Correcting Codes (ECC), to more recreational fields such as Toulouse-Lautrec, Broadway, Main Course Recipes (MCR), and Pokemon. We present one running scenario using the Toulouse-Lautrec KAB, as well as additional examples and statistics using some of the other KABs.
Agents can be created from the desktop or the PDA. We offer both a "one-click" and an advanced interface. In addition, ready-made agents can be downloaded from an agent repository, or beamed from another PDA.
In the one-click interface, the user defines the agent by providing an optional list of seed URLs and 4-6 queries, which define the domain. In addition, the user must provide a name and description for the agent as well as their email address, which is used to identify them as the owners of the agent. In the advanced interface, users can create the agent in a more iterative fashion. They can inspect the results after each query, add new queries to agents that have already been trained, and add/remove pages from the KAB.
In Table 1 we present, for four of our topics, the queries that we used to train an agent on the topic, the size of the corresponding KAB, the number of KAB pages, the precision of these pages (percent of pages relevant to the topic), and the number of pages in the most closely related directory in Yahoo! [29] and in the Open Directory as provided by Google [8].
Agent | Training Queries | KAB size | # of KAB pages (precision) | Open Directory | Yahoo |
Java | | 1.86MB | 100 (precision: 95%) | 1478 | 343 |
Error Correcting Code (ECC) | | 1.25MB | 100 (precision: 84%) | closest category: coding theory (28) | no close category |
Toulouse - Lautrec | | 430KB | 60 (precision: 87%) | 1 | 7 |
Main Course Recipes (MCR) | | 500KB | 99 (precision: 99%) | closest category: recipe collections (486) | closest category: (among 8) recipes (1057) |
All of the agent training was done based solely on sample queries. We did not provide the agent with any seed pages since we wanted to assume the least involvement on the part of the user in the knowledge acquisition stage. Our experience has shown that the hub and authority based techniques used in training enable the agent to acquire the most prominent pages pertaining to the topic. In terms of precision, note that both the Java and Recipes agents have extremely high precision. Only five of the 100 pages in the Java KAB are not pure Java pages (they are more general programming language pages), and only one out of 100 pages in the MCR KAB is not related to main course recipes. The precision of the ECC and Toulouse - Lautrec agents, albeit slightly lower, is still relatively high (~85%). Note that the non-relevant pages could be removed using our KAB editing functionality. However, in the spirit of assuming as little user involvement as possible, we left them in the KAB to demonstrate that even if some of the pages in the KAB are not relevant to the topic, this does not hurt the functionality of the system. In terms of the precision of the satellite pages in the KAB, an empirical study based on sampling showed that on average 75% of the satellite pages are relevant to the domain of the KAB.
In comparison to commercial Web directories, only Java and Toulouse - Lautrec have corresponding directory entries in Yahoo! and in the Open Directory project. The Java category comprises many pages in both directories (343 and 1478 pages, respectively) and consequently would be too large to cache on a small device. The Toulouse - Lautrec directory, on the other hand, has very few pages and thus does not contain enough information to be useful as a Toulouse - Lautrec representative repository. There are no direct corresponding directories for either the ECC or MCR KABs and the most closely related categories (if any) are again not concise or highly representative of the topic.
In Figure 2 we see the Toulouse - Lautrec KAB being selected from a pull-down list that contains the names of all the KABs available on the device.
Once an agent is selected, several operations are available which allow the user to explore the content of the KAB. Figure 3 depicts this functionality. The left-most screenshot shows the user simply viewing some basic information regarding the agent such as the name and description that the owner gave it as well as the queries used for training it. This information provides some feeling for the scope of the agent, and may be of interest when acquiring a new agent via downloading or beaming. The second screenshot shows the KAB pages sorted by their fitness score. That is, the first pages in the list are the top hubs or authorities in the domain. The third screenshot shows the top terms of the KAB. These terms are most likely good starting points for searching the KAB, and indeed clicking on any of them will initiate a search for pages that contain these terms. See Tables 3, 4, 5, 6 in the appendix for the top terms for some other KABs.
Figure 4 demonstrates the word completion functionality. After typing just two letters "am" and hitting the refine button only two words are suggested: ambassadeur, and american. For "amb" there is only one completion - ambassadeur. In contrast, the Broadway KAB would suggest the following words for the letters "am": american, amadeus, amateur, amazing, while there are no words suggested for "amb".
As another example, consider the two letters "pi": The Pokemon KAB suggests the words: pikachu, picture, pinball, pikablu; The ECC KAB suggests: pietrobon, pipeline, pinch, pixel; The MCR KAB suggests: pie, piece, pineapple, pizza, pink. Clearly, no general purpose word completion could achieve such results.
Figure 5 shows the query completion functionality. After selecting the word ambassadeur from the word completion list and hitting refine again, the system suggests additional terms for the query. In this case the terms are: bruant, aristide, poster, lautrec, toulouse. The user selects bruant and aristide, to complete the query formulation stage.
Some additional examples of query completion are provided in the following table:
Agent | Query term | Query completion options |
Java | bean | java, pure, awesome |
Java | java bean | program, faq, tutorial |
Recipes | bean | white, lamb, free, middot, rice, grain |
Java | block | method, call, function, code, class |
ECC | block | turbo, code |
Pokemon | card | trade, game, pokemon |
Java | card | game, java, applet |
Java | free | java, software, tool, system, operating, download |
ECC | free | code, distance, hamming |
Toulouse-Lautrec | free | poster, download |
Pokemon | free | pokemon, card, email, greeting, game |
Recipes | red | wine, dark, slightly, vinegar |
Pokemon | red | blue, pokemon, yellow, green, stadium |
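One plausible way to obtain such agent-dependent completions is sketched below, assuming the KAB lexicon stores multi-word phrases like those listed in the appendix. This is an illustrative assumption, not the system's documented method: the function `complete_query` and the ranking by phrase co-occurrence count are hypothetical, and the sketch does not reproduce the exact suggestions in the table.

```python
from collections import Counter

def complete_query(lexicon_phrases, query_terms, limit=5):
    """Suggest words that share a lexicon phrase with any of the query terms."""
    query = set(query_terms)
    counts = Counter()
    for phrase in lexicon_phrases:
        words = list(dict.fromkeys(phrase.split()))  # dedupe, keep order
        if query.intersection(words):
            counts.update(w for w in words if w not in query)
    # Rank by co-occurrence count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]

# A few phrases taken from the ECC lexicon in the appendix.
ecc_phrases = ["turbo code", "block code", "linear code", "parity check"]

print(complete_query(ecc_phrases, ["block"]))
print(complete_query(ecc_phrases, ["code"]))
```

Since the phrases come from a single topic's lexicon, the same query term leads to different suggestions for different agents, which is the behavior the table illustrates.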
Figure 6 presents the results for the query "ambassadeur aristide bruant". The left-most image shows the first result screen. In this screen, only the titles of the top results are displayed (with no URLs or snippets) in order to fit as many results as possible on the small screen. Once the user clicks on one of these titles, more information about that result is presented. More specifically, if the result is one of the core pages (for which we cache the text of the page locally), the title is displayed together with a query-focused snippet of the result in which the query terms are highlighted (middle image). The amount of information displayed at this level of granularity was designed to more or less fill one full screen on the Palm device. On the other hand, if the result corresponds to a satellite/referenced page, we show the anchor text of the reference as the title, along with the URL of the page itself (right image).
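A query-focused snippet for a cached core page could be produced along the following lines. This is only a sketch under stated assumptions: the function `focused_snippet`, the fixed word window around the first matching term, and the use of square brackets to mark highlighted terms are all hypothetical choices, and the sample text is made up.

```python
PUNCTUATION = ".,;:!?\"'()"

def focused_snippet(text, query_terms, window=4):
    """Return a short excerpt around the first query term, with terms bracketed."""
    terms = {t.lower() for t in query_terms}
    words = text.split()

    def is_hit(word):
        return word.lower().strip(PUNCTUATION) in terms

    hits = [i for i, w in enumerate(words) if is_hit(w)]
    if not hits:
        # No query term in the cached text: fall back to the opening words.
        return " ".join(words[: 2 * window + 1])
    start = max(0, hits[0] - window)
    end = min(len(words), hits[0] + window + 1)
    return " ".join(f"[{w}]" if is_hit(w) else w for w in words[start:end])

# Made-up cached text for illustration only.
page_text = "A famous poster shows aristide bruant wearing a red scarf and a black hat"
print(focused_snippet(page_text, ["bruant", "aristide"], window=2))
```

Keeping the window small is what lets the title and snippet fit on a single device screen; the window size here is an arbitrary illustrative value.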
Moving along to Figure 7, the user has now chosen to access the result page by clicking on the title of the desired result. In the case of core pages, the text of the page is retrieved from the cache. The text can be browsed with query terms highlighted (left image). Satellite pages need to be retrieved over a wireless connection from the Web server where they reside. Note that this is the first time that network access is required. All of the steps described in the sample scenario so far were performed in disconnected mode.
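The distinction between core and satellite pages at this step can be summarized in a small sketch. The names `open_result`, `core_cache`, and `fake_fetch`, as well as the example URLs, are hypothetical; the sketch only assumes that cached core-page text is keyed by URL and that anything else requires a network fetch.

```python
def open_result(url, core_cache, fetch_from_web):
    """Return (page_text, used_network) for the result the user selected."""
    if url in core_cache:
        # Core page: text is stored in the KAB, no connection needed.
        return core_cache[url], False
    # Satellite page: only its URL and anchor text are stored locally.
    return fetch_from_web(url), True

# Hypothetical local cache and a stand-in for the wireless fetch.
core_cache = {"http://example.org/core.html": "cached text of a core page"}

def fake_fetch(url):
    return "text fetched for " + url

print(open_result("http://example.org/core.html", core_cache, fake_fetch))
print(open_result("http://example.org/satellite.html", core_cache, fake_fetch))
```

The second element of the returned pair makes explicit that the network is touched only when a satellite page is opened.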
In this paper, we have presented a focused-search approach specifically designed for the mode of work and the constraints dictated by pervasive devices. We first identified the constraints inherent to pervasive devices as well as their implications for the search experience on such devices. We then argued that topic-specific focused search is key to overcoming these limitations. More specifically, we claimed that by coupling focused search with a knowledge representation that encapsulates domain knowledge in a concise yet representative fashion, it is possible to make information discovery from pervasive devices practical as well as efficient.
We suggested a representation for topic-specific information based on "knowledge-agent bases" (KABs), where topics are represented and stored in a persistent repository whose footprint is small enough to be downloaded to small devices. Knowledge acquisition is done on a desktop server and the resulting KABs are downloaded to the device, after which many information discovery tasks can be performed in disconnected mode. The KAB encapsulates all the information necessary to access information about a topic, in the form of key concepts and key Web pages, to assist in query formulation, and to perform topical searches on the device itself. Using this model, response time for many tasks is much better than in previous connected models. Several aspects of the method were demonstrated through a sample scenario and examples.
With more mobile knowledge seekers turning to pervasive devices on a daily basis, improving the usability of information discovery tasks on such devices will become even more crucial in the future. While we have focused in this paper mainly on PDAs, and in particular on PalmOS-based devices, the future is clearly in mobile phones, which have a much larger potential market. We believe that several aspects of our model are applicable and may be even more critical in this arena. Form factor, for example, imposes even heavier constraints, and the query formulation assistance stage is even more crucial. Topic-specific knowledge encapsulation could thus result in even greater benefits. While current mobile phones do not have enough memory or processing power to use our KAB model as is, we hope that these devices will become more powerful and, more importantly, will open their OS so users can customize and exchange information.
We are indebted to Miki Herscovici and Doron Cohen for providing us with the Palm Pirate application and for their great expertise on PalmOS. We also thank Uri Weiss and Roni Raab for their contribution to the codebase and Rob Farrell for his feedback on the user experience.
KAB cached sites | KAB lexicon |
java program, java faq, java language, java tutorial, java resource, java developer, java applet, java platform, java technology, java code, java microsystem, java application, program language, java class, pure java, java api, java sdk, java security, java software, microsoft java, java bean, java interpreter, virtual machine, java compiler, java book, java servlet |
KAB cached sites | KAB lexicon |
turbo code, error code, linear code, error correcting code theory, binary code, decode code, data compression, information theory, code length, convolutional code, parity check, block code, orthogonal array, matrix generation, iterative decoding, haming code, reed code, concatenated code |
KAB cached sites | KAB lexicon |
chicken recipes, beef recipes, middot lamb, mimi chicken, hiller chicken, lamb recipe, leg of lamb, lamb chop, seafood recipe, pork recipe, meat recipe, crockpot recipe, chicken crockpot, fish recipe, jamie chicken, roast lamb, low fat, middot chop chicken salad, pork tenderloin |
Yariv Aridor is a Research Staff Member at the IBM Research Laboratory in Haifa, Israel. He received his M.Sc. and Ph.D. in Computer Science from Tel-Aviv University, Israel, in 1989 and 1995, respectively. His research interests include distributed systems, mobile object-oriented systems and agent technology.
David Carmel is a Research Staff Member at the IBM Research Laboratory in Haifa, Israel, and belongs to the 'Information Retrieval and Organization' Group. His research interests include information retrieval, multi-agent systems and artificial intelligence. He received his M.Sc. and Ph.D. in Computer Science from the Technion, Israel Institute of Technology, in 1993 and 1997, respectively. David joined IBM in 1997 and has been involved with projects dealing with text mining, search applications, and information retrieval on the Web. He is currently involved in the Knowledge Agent project.
Ronny Lempel is a Ph.D. student in the Department of Computer Science, Technion, Haifa, Israel, focusing on WWW link-structure analysis. He received his B.Sc. and M.Sc. from the same department in 1997 and 1999, respectively.
Yoelle S. Maarek is a Research Staff Member at the IBM Research Lab in Haifa, Israel and manages the "Information Retrieval and Organization" group that comprises about 20 members. Her research interests include information retrieval, Internet applications, and software reuse. She graduated from the "Ecole Nationale des Ponts et Chaussees", Paris, France, and received her D.E.A. (graduate degree) in Computer Science from Paris VI University in 1985. She received a Doctor of Science degree from the Technion, Haifa, Israel, in January 1989. Before joining IBM Research in Haifa, Dr. Maarek was a research staff member at the IBM T.J. Watson Research Center for about 5 years. She serves on the program committees of several international conferences and is a member of the Review Board of the WebNet Journal. She has published over 25 papers in refereed journals and conferences.
Aya Soffer is a Research Staff Member in the 'Information Retrieval and Organization' group at the IBM Research Laboratory in Haifa, Israel. Her research interests include pictorial information systems, information retrieval, and non-traditional database systems. She received her M.Sc. and Ph.D. degrees in Computer Science from the University of Maryland at College Park in 1992 and 1995, respectively.