Given many document sources and a query, a metasearcher:
Finds the good sources for the query
Evaluates the query at these sources
Merges the results from these sources
[Figure: a metasearcher in front of unindexed documents, a legacy database (WAIS, etc.), and an existing web application]
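The three steps above can be sketched in a few lines. This is a minimal illustration, not a reference design: the `Source` class, its `summary` vocabulary, the overlap-based selection rule, and the divide-by-maximum score normalization are all assumptions made for the example (scores are assumed non-negative).

```python
from dataclasses import dataclass
from typing import Callable

# A hit is (document id, score on the source's own scale).
Hit = tuple[str, float]

@dataclass
class Source:
    name: str
    summary: set[str]                    # hypothetical content summary (vocabulary)
    search: Callable[[str], list[Hit]]   # hypothetical per-source search function

def select_sources(sources, query, k=2):
    """Step 1: keep the k sources whose summary overlaps the query most."""
    terms = set(query.lower().split())
    scored = [(len(terms & s.summary), s) for s in sources]
    scored = [(n, s) for n, s in scored if n > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]

def normalize(hits):
    """Rescale scores to [0, 1] so that different sources become comparable."""
    if not hits:
        return []
    top = max(score for _, score in hits)
    return [(doc, score / top if top else 0.0) for doc, score in hits]

def metasearch(sources, query, k=2):
    merged = []
    for source in select_sources(sources, query, k):   # step 1: find good sources
        hits = source.search(query)                      # step 2: evaluate the query
        merged.extend((doc, score, source.name)
                      for doc, score in normalize(hits))
    merged.sort(key=lambda hit: hit[1], reverse=True)    # step 3: merge the results
    return merged
```

Dividing by the top score is only one of many ways to make rankings comparable; it ignores how good each source is overall.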
Main Issues
How to query different types of sources?
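How to combine results and rankings from multiple data sources?
[Figure: the metasearcher sends the same query in three forms: grep ‘biomedical’ *.txt to a file collection; SELECT title FROM articles . . . to a database; http://…/getTitle?title=‘biomedical’&… to a web application]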
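One way to picture the first issue is a set of small wrappers that translate a single keyword into each source's native form. The sketch below is an illustration only: the function names are hypothetical, the `WHERE ... LIKE` clause is an assumed completion of the SQL shown on the slide, and `http://example.org/getTitle` is a placeholder base URL, not a real endpoint.

```python
import shlex
from urllib.parse import urlencode

def to_grep(term, pattern="*.txt"):
    """Shell command for a directory of plain-text files."""
    # The term is shell-quoted; the glob is left unquoted so the shell expands it.
    return f"grep {shlex.quote(term)} {pattern}"

def to_sql(term):
    """Parameterized SQL for a relational source (WHERE clause is an assumption)."""
    return "SELECT title FROM articles WHERE title LIKE ?", (f"%{term}%",)

def to_url(term, base="http://example.org/getTitle"):
    """GET URL for a web application; `base` is a placeholder endpoint."""
    return f"{base}?{urlencode({'title': term})}"

for translate in (to_grep, to_sql, to_url):
    print(translate("biomedical"))
```

Each wrapper hides a different quoting rule (shell quoting, SQL parameters, URL encoding), which is part of why querying heterogeneous sources is harder than it looks.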
Other Issues
How to choose among multiple data sources?
How to get metadata about multiple data sources?
[Figure: ways the metasearcher can ask a source for metadata: cat *.txt for files; SELECT SCHEMA ……. for a database; best case: http://….?getMetaData; worst case: “Hi. What do you have?”]
Cost/Functionality
[Figure: cost of acceptance plotted against function for Metadata Harvesting, SDLIP/STARTS, Z39.50, and Google]
Z39.50 http://www.loc.gov/z3950/agency/
Goals
Permits one computer, the client, to search and retrieve information on another, the database server
Important both technically and for its wide use in library systems
Most development has concentrated on bibliographic data
Most implementations emphasize searches that use a bibliographic set of attributes to search databases of MARC records
Principles
Abstract view of database searching.
Server stores a set of databases with searchable indexes
Interactions are based on a session
The client opens a connection with the server, carries out a sequence of interactions and then closes the connection.
During the course of the session, both the server and the client remember the state of their interaction.
The results
The server carries out the search and builds a results set
Server saves the results set.
Subsequent messages from the client can reference the results set.
Thus the client can narrow a large set with increasingly precise requests, or request a presentation of any record in the set, without searching the entire database again.
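The stateful result-set idea can be illustrated with a toy in-memory session. This is a sketch of the concept only, not the Z39.50 wire protocol: the class name, method signatures, substring matching, and the sample records are all assumptions made for the example.

```python
class ToySession:
    """In-memory stand-in for one client/server session (hypothetical API)."""

    def __init__(self, records):
        self.records = records    # the "database": a list of dicts
        self.result_sets = {}     # server-side state: set name -> record indexes

    def search(self, name, field, value, within=None):
        """Build and save a named results set; return only its size."""
        if within is None:
            candidates = range(len(self.records))    # scan the whole database
        else:
            candidates = self.result_sets[within]    # refine an earlier set
        hits = [i for i in candidates
                if value.lower() in self.records[i].get(field, "").lower()]
        self.result_sets[name] = hits
        return len(hits)

    def present(self, name, start, count):
        """Return `count` records from a saved set, from 1-based position `start`."""
        ids = self.result_sets[name][start - 1:start - 1 + count]
        return [self.records[i] for i in ids]

# Toy records, for illustration only.
session = ToySession([
    {"title": "Evangeline", "author": "Longfellow"},
    {"title": "The Song of Hiawatha", "author": "Longfellow"},
    {"title": "Evangeline Revisited", "author": "Anonymous"},
])
session.search("s1", "author", "longfellow")                # saved on the server
session.search("s2", "title", "evangeline", within="s1")    # refines s1 only
print(session.present("s2", 1, 1))
```

The second search touches only the records already in `s1`, which is the saving the protocol's design was aiming for.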
Services
init -- client connects to the server and exchanges initial information, e.g., preferred message size
explain -- client inquires of the server what databases are available for searching, the fields that are available, the syntax and formats supported, and other options
search -- client presents a query to a database
• choice of syntaxes for specifying searches
• only Boolean queries widely implemented
• one or more records may be returned to the client
manipulation of results sets -- e.g., sort or delete
present -- requests the server to send specified records from the results set to the client in a specified format
• options for controlling content and formats
• options for managing large records or large results sets
Example
In the database named "Books", find all records for which the access point title contains the value "evangeline" and the access point author contains the value "longfellow".
Z39.50 defines a rich variety of search access points that can be extended by implementers
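As a rough illustration, the example query can be written as a Boolean AND of two attribute-qualified terms. The sketch below renders it in Prefix Query Format (PQF), a textual notation used by the YAZ toolkit; PQF is not mentioned on the slides and is brought in here as an assumption. The numbers 4 and 1003 are, to the best of my knowledge, the Bib-1 use attributes for title and author. The database name "Books" would be supplied separately in the search request, so it does not appear in the query string.

```python
# Bib-1 "use" attribute numbers for the two access points in the example.
USE = {"title": 4, "author": 1003}

def term(access_point, value):
    """One search term restricted to an access point.

    Embedded double quotes in `value` are not escaped in this sketch.
    """
    return f'@attr 1={USE[access_point]} "{value}"'

def both(left, right):
    """Boolean AND of two sub-queries, in prefix notation."""
    return f"@and {left} {right}"

query = both(term("title", "evangeline"), term("author", "longfellow"))
print(query)   # @and @attr 1=4 "evangeline" @attr 1=1003 "longfellow"
```

Adding another access point is a matter of extending the `USE` table, which mirrors how implementers extend the set of access points.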
Problems
Very difficult to implement
There are freely available implementations, but they are complex
Outdated assumptions
Searching is expensive computationally
Bandwidth is limited (hence the compact binary ASN.1 encoding)
Originally designed for bibliographic record retrieval, and not full documents or other objects
“Overspecified”
(Almost) Nobody Implements Explain!
Assumes questionable user model (stateful)
Simple Digital Library Interoperability Protocol http://www-diglib.stanford.edu/~testbed/doc2/SDLIP/
SDLIP
Compromise between a full-scale, all-encompassing search middleware design such as Z39.50 and the “anything goes” approach typical of ad hoc search interface design on the web
Support for stateful and stateless operation by the server
Support for thin clients, such as handheld devices
Developed jointly by Stanford, Berkeley, and UC Santa Barbara
Comments