MRJ Text Search Software

INTRODUCTION

MRJ has developed a high-performance distributed text search system. The software provides several unique capabilities for visualizing data and extracting hard-to-find information. Relevance feedback provides powerful query expansion for finding related documents.

The software can be used on a single system or in a fully distributed manner, with databases scattered across the internet. Distributed databases can be searched simultaneously. Client software and text search servers can be distributed across the internet.

SERVER SOFTWARE

The text search server software has been ported to a variety of host computer architectures, ranging from PCs to supercomputers. The server software runs under SunOS, Solaris, HP-UX, Apollo, and Linux, and on Connection Machines.

Indexing is highly configurable and is designed to handle free text input. Intelligent tag recognition permits automated extraction of titles, dates, and other important information. Document separator tags -- such as a line of dashes or a "Subject:" tag -- can be defined so that large documents can be automatically broken down into logical components. Input files can be in any popular compressed file format or in uncompressed form.
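The separator-tag idea can be illustrated with a minimal sketch. This is not MRJ's indexer; the patterns, the function names split_documents and extract_title, and the rule that a "Subject:" line opens the next document are all assumptions made for illustration.

```python
import re

# Hypothetical separator patterns (assumptions, not MRJ's actual configuration).
SEPARATORS = [
    re.compile(r"^-{5,}\s*$"),  # a line of five or more dashes
    re.compile(r"^Subject:"),   # a "Subject:" tag at the start of a line
]


def split_documents(text):
    """Split raw text into logical documents at separator lines."""
    docs, current = [], []
    for line in text.splitlines():
        if any(p.match(line) for p in SEPARATORS):
            # Close the document collected so far, if it has any content.
            if any(s.strip() for s in current):
                docs.append("\n".join(current).strip())
            # A dashed line is dropped; a "Subject:" line opens the next document.
            current = [] if line.startswith("-") else [line]
        else:
            current.append(line)
    if any(s.strip() for s in current):
        docs.append("\n".join(current).strip())
    return docs


def extract_title(doc):
    """Return the text after the first "Subject:" tag, or None if absent."""
    m = re.search(r"^Subject:\s*(.+)$", doc, re.MULTILINE)
    return m.group(1).strip() if m else None
```

Under these assumptions, a mail archive with two messages separated by a line of dashes would yield two documents, each with its "Subject:" text available as a title.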

CLIENT SOFTWARE

Two user interfaces are available for the text search client software.

The text-based client software provides support for diverse workstations and terminals. Based on curses, this software is particularly useful for supporting internet (telnet) and dial-up access, where users may not have sophisticated terminals or specialized client software. The client software supports screen resizing for full support of X terminals. Pull-down menus and scrollable document lists are provided for easy use.

The graphical client software provides an innovative way to visualize and search text data. Based on X Windows, this software offers the same basic features as the text-based client software, with a fancier user interface.

The graphical client software, however, is not just a fancier version of the text-based client. An innovative Network Display provides a totally different way to search for documents. The Network Display shows each QUERY as a node. The LINKS between the nodes represent the documents. To see a document, the user clicks on a connecting line. The Network Display provides a rich set of graphical editing tools for interactively manipulating the display. The user can group items, change colors, and change shapes to help differentiate successive searches and documents of interest. Tags can be defined in the indexer to automatically display documents with certain tags with specific shapes and colors.
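One possible reading of the Network Display's data model is sketched below: queries are nodes, and a document returned by two queries becomes a link between them. How the real display decides which nodes a document connects is not stated here, so this rule and the name build_network are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_network(results):
    """Build links between query nodes from shared documents.

    results maps a query string to the set of document ids it returned.
    The return value maps a pair of queries to the sorted list of
    document ids that both queries returned (assumed linking rule).
    """
    # Invert the results: for each document, which queries found it?
    doc_to_queries = defaultdict(set)
    for query, docs in results.items():
        for doc in docs:
            doc_to_queries[doc].add(query)

    # Every pair of queries sharing a document gets a link for that document.
    links = defaultdict(list)
    for doc, queries in doc_to_queries.items():
        for a, b in combinations(sorted(queries), 2):
            links[(a, b)].append(doc)
    return {pair: sorted(docs) for pair, docs in links.items()}
```

In this sketch, clicking a connecting line would correspond to looking up the document ids stored under that pair of queries.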

The Network Display is particularly useful for finding related documents in a large volume of documents. The integration of data visualization and relevance feedback in an interactive environment is particularly powerful. In an era of exponentially increasing volumes of on-line data, this innovative system provides a quick and effective mechanism for finding related information.

JAPANESE CAPABILITIES

MRJ has developed a unique system which facilitates crossing language barriers. Our prototype Cross Language Text Search (CLTS) system permits English speaking users to directly search Japanese text databases.

CLTS combines several technologies to provide a system which enables users to cost-effectively find documents of interest in languages other than their native language. CLTS integrates MRJ's Japanese OCR, Japanese parsing, and distributed Text Search software with multilingual dictionaries and machine translation software.

CLTS operation starts with English language queries. CLTS generates a list of similar search terms in Japanese, and presents the list to the user. The user can fine-tune the Japanese query by reviewing English descriptions of the Japanese search terms. This step allows users without knowledge of Japanese to focus their query on the specific meanings they wish to find.
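The term-selection step can be pictured with a toy example. The dictionary below holds a single hand-written entry and is not MRJ's multilingual dictionary; the names TOY_DICTIONARY, candidate_terms, and select_terms are hypothetical.

```python
# Toy bilingual dictionary (an assumption for illustration):
# English word -> list of (Japanese term, English description).
TOY_DICTIONARY = {
    "bank": [
        ("銀行", "financial institution"),
        ("土手", "embankment, raised ground along a river"),
        ("岸", "shore or edge of a river or sea"),
    ],
}


def candidate_terms(english_query, dictionary=TOY_DICTIONARY):
    """List Japanese candidates, with English descriptions, for each query word."""
    candidates = []
    for word in english_query.lower().split():
        candidates.extend(dictionary.get(word, []))
    return candidates


def select_terms(candidates, chosen_indexes):
    """Keep only the Japanese terms the user picked after reading the descriptions."""
    return [candidates[i][0] for i in chosen_indexes]
```

A user searching for "bank" in the financial sense would read the three descriptions and keep only the first candidate, so the Japanese search never touches riverbanks or shorelines.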

After the user selects Japanese search terms, the database is searched in Japanese. Documents found are listed for the user, who can view them in Japanese or in machine-translated English (if machine translation is used). Relevance feedback can be used to find related Japanese documents. Selected words, sections, or whole documents can be fed back for additional searches. With the addition of machine-translated versions of the documents, the user can further filter the list of documents found to only those of particular interest. CLTS thus enables expert (human) translation services to be focused on the documents (and parts thereof) that are really needed, and not wasted on documents that are not desired.
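Relevance feedback of this kind can be sketched as simple query expansion. The weighting method CLTS actually uses is not described here; the sketch below assumes plain term frequency, assumes the fed-back text has already been split into terms (for Japanese, by a parser), and uses a hypothetical function name, expand_query.

```python
from collections import Counter


def expand_query(query_terms, feedback_terms, k=3):
    """Add the k most frequent new terms from fed-back text to the query.

    query_terms: list of terms in the current query.
    feedback_terms: list of terms from the words, sections, or documents
    the user selected.
    """
    existing = set(query_terms)
    counts = Counter(t for t in feedback_terms if t not in existing)
    # Rank by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:k]]
```

Running the expanded query then returns documents that share vocabulary with the material the user marked as relevant, which is the sense in which feedback finds related documents.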

The prototype CLTS system provides English speaking users with access to Japanese language documents. The system is extendable to support other language conversions, including allowing Japanese speaking users access to English language documents.

For more information about MRJ software products and technical services, please contact ksheers@mrj.com. You can also contact us by telephone, FAX, or postal mail:

MRJ, Inc., 10560 Arrowhead Drive, Fairfax, VA 22030 USA
TEL: 703-385-0700
FAX: 703-385-4637

Copyright 1995 by MRJ, Inc., Fairfax, VA