MIA
An Information System for Mobile Users
Bernd Thomas
University Koblenz
AI-Research Group
Outline
Motivation
Architecture & Functionality
Agents, Communication & Distribution
Information Extraction
Using Ontological Knowledge
The MobileInformationAgent Project
Clients: WebBrowser, WAP, PDA+GPS
Ambition:
Online Web Search and
Information Extraction
Location Awareness
Anytime Algorithm
Uses Logic (LP): Agents + Ontology
Distributed Multi Agent System
Project Time: 1.1.2000 - 31.10.2002
[Figure: client screenshots: profile creation, search constraint, search, re-login, start and logout]
What is MIA?
A Multi Agent Information System that provides a mobile user with location-based information according to his individual interests.
MIA and its Agents
[Figure: MIA architecture. Labels: clients (PDA with GPS, mobile WAP, web browser) connected via HTTP; Server Agent; Matchmaker Agent; Blackboard Agent; Spider Agent; Ontology Agent; Platform Agents on host 1 ... host n; foreign agent; databases (GPS, User Interests); KQML communication; actions: query, request, start, register, search, extract, classify]
Agent Communication
MIA's agent system and communication architecture are oriented towards FIPA.
MIA's agents use KQML performatives.
Agent platforms: abstract from the machine and provide an environment for agents
Other agents can query platform agents for running agents or request that agents be started.
Example Communication Session (System Startup and Search with 3 hosts):
phase I
host A -> platform A : start
platform A -> matchmaker : start
platform A -> blackboard : start
platform A -> server : start
phase II
host B -> platform B : start
platform B -> matchmaker : register
host C -> platform C : start
platform C -> matchmaker : register
phase III
server -> matchmaker : ask for blackboard
matchmaker -> server : blackboard address
matchmaker -> server : recommend blackboard
server -> blackboard : ask for old results
blackboard -> server : send old results
blackboard -> matchmaker : subscribe to agent status change
start search
server -> matchmaker : ask for spider recommendation
matchmaker -> platform B : create spider agent
spider -> platform B : created
platform B -> matchmaker : send spider address
platform B -> matchmaker : send spider recommendation
server -> spider : start spidering topic/city
server -> spider : send all results for topic/city
spider -> server : starting to search
matchmaker -> blackboard : there is a new spider
blackboard -> spider : send all results for topic/city
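The register and recommend steps of the session above can be illustrated with a small sketch. It is not MIA's implementation: the `Message` and `Matchmaker` classes, the `type@address` content format and the address `hostA:4001` are assumptions made for this example. Only the performative names (`register`, `recommend-one`, `tell`, `sorry`, `error`) are taken from KQML.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A KQML-style message reduced to four fields (simplified for this sketch)."""
    performative: str
    sender: str
    receiver: str
    content: str


@dataclass
class Matchmaker:
    """Keeps a directory of registered agent addresses, keyed by agent type."""
    directory: dict = field(default_factory=dict)

    def handle(self, msg: Message) -> Message:
        if msg.performative == "register":
            # Assumed content format: "<agent type>@<address>"
            agent_type, address = msg.content.split("@", 1)
            self.directory.setdefault(agent_type, []).append(address)
            return Message("tell", "matchmaker", msg.sender, "registered")
        if msg.performative == "recommend-one":
            addresses = self.directory.get(msg.content, [])
            if addresses:
                # Recommend the first registered agent of the requested type
                return Message("tell", "matchmaker", msg.sender, addresses[0])
            return Message("sorry", "matchmaker", msg.sender, msg.content)
        return Message("error", "matchmaker", msg.sender, "unsupported performative")


matchmaker = Matchmaker()
# platform A announces the blackboard it started (hypothetical address)
matchmaker.handle(Message("register", "platform A", "matchmaker", "blackboard@hostA:4001"))
# server -> matchmaker : ask for blackboard
reply = matchmaker.handle(Message("recommend-one", "server", "matchmaker", "blackboard"))
print(reply.performative, reply.content)
```

Asking for an agent type that nobody has registered yields a `sorry` reply, which is the point at which the matchmaker in the session would instead ask a platform agent to create the missing agent.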
Agent Distribution Policy
Distributed MAS has two goals:
distribute computation among machines
minimize communication between machines
MIA uses a simple distribution policy:
platform-agent 1: matchmaker, server and blackboard
platform-agent 2-n: ontology-agent and spider-agents are equally distributed
Load-Balancing:
MIA does not use automatic load-balancing, but new platform-agents can be added while the system is online.
[Figure: trade-off between "much communication, less computation" and "less communication, much computation"]
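A minimal sketch of such a static policy, assuming that "equally distributed" means round-robin placement and that there is a single ontology agent; the function name `distribute_agents` and the platform names are hypothetical.

```python
def distribute_agents(platforms, spider_count):
    """Assign agents to platforms: core agents on the first platform,
    the ontology agent and the spiders round-robin on the others."""
    if len(platforms) < 2:
        raise ValueError("the policy needs one core platform and at least one worker")
    core, workers = platforms[0], platforms[1:]
    assignment = {name: [] for name in platforms}
    # platform-agent 1 hosts the communication-heavy agents
    assignment[core] = ["matchmaker", "server", "blackboard"]
    # platform-agents 2-n share the computation-heavy agents
    agents = ["ontology"] + [f"spider-{i}" for i in range(1, spider_count + 1)]
    for index, agent in enumerate(agents):
        assignment[workers[index % len(workers)]].append(agent)
    return assignment


plan = distribute_agents(["platform-1", "platform-2", "platform-3"], spider_count=3)
for platform, agents in plan.items():
    print(platform, agents)
```

Adding a platform while the system is online corresponds to calling the function again with a longer platform list for newly created agents; agents that are already running are not moved, which matches the absence of automatic load-balancing.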
Information Extraction
MIA uses two modes to extract information from web pages:
apply offline learned wrappers (synthesized extraction procedures)
a set of predefined pages is examined by offline learned wrappers
online learning of wrappers
for each page that is found by the spider and positively classified as containing addresses, a wrapper is learned.
major problem: absence of examples!
Both the online and the offline method learn only from positive examples.
Both methods use LGG (least general generalization) techniques on feature terms to learn.
IE: Offline Wrapper Learning
Wrapper Learning System: for offline learning and integration into the MIA system
Learning Technique:
Document Representation:
logical representation of a DOM-Tree (set of facts)
each node is represented by a feature term
Idea:
learn relevant features of ancestor and descendant nodes surrounding the relevant nodes for extraction
Method:
learning from positive examples (subtrees) only
LGG on feature terms,
user-based inductive learning
Result: generalized node paths
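The LGG step can be sketched for flat feature terms. This is a simplification, not the representation used in MIA: a feature term is modelled here as a Python dict from feature names to atomic values, the marker `ANY` plays the role of a variable, and the example nodes are invented.

```python
from functools import reduce

ANY = "_"  # stands for an unconstrained value (a variable)


def lgg(term_a, term_b):
    """Least general generalization of two flat feature terms:
    keep shared features, replace disagreeing values by ANY."""
    shared = term_a.keys() & term_b.keys()
    return {f: term_a[f] if term_a[f] == term_b[f] else ANY for f in sorted(shared)}


def lgg_all(terms):
    """Generalize a non-empty list of positive examples."""
    return reduce(lgg, terms)


def covers(general, term):
    """True if the generalized term matches the given feature term."""
    return all(f in term and (v == ANY or term[f] == v) for f, v in general.items())


# Two positive examples: DOM nodes marked by the user as relevant (invented data)
examples = [
    {"tag": "td", "class": "name", "depth": 4},
    {"tag": "td", "class": "name", "depth": 5, "align": "left"},
]
pattern = lgg_all(examples)
print(pattern)
```

Because only positive examples are available, the generalization is as specific as the examples allow: the `align` feature is dropped since one example lacks it, while `depth` is kept as a feature with an unconstrained value.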
IE: Online Wrapper Learning
Major Problem:
how to obtain learning examples (example extractions) for unknown pages?
Idea:
use (very strict) address patterns to identify only a few addresses on a page
these few matches serve as learning examples
Document Representation:
list of tokens (feature terms)
Method:
one shot learning (generalize in one step on all examples)
for each page one wrapper is learned
Result:
generalized feature-term lists used as left and right delimiters for extraction
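The idea can be sketched as follows, under several assumptions that are not from the project: tokens are plain strings instead of feature terms, each address sits in a single text token, the delimiters have a fixed width of two tokens, and `STRICT_ADDRESS` is an invented pattern.

```python
import re

WILDCARD = None  # matches any token

# Hypothetical strict pattern: "<street> <number>, <5-digit zip> <city>"
STRICT_ADDRESS = re.compile(r"\w+ \d+, \d{5} \w+")


def generalize(contexts):
    """One-shot, position-wise generalization of equally long token lists."""
    return [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*contexts)]


def matches(pattern, tokens):
    return len(pattern) == len(tokens) and all(
        p is WILDCARD or p == t for p, t in zip(pattern, tokens))


def learn_wrapper(tokens, width=2):
    """Use strict matches as examples; learn left and right delimiters."""
    seeds = [i for i, tok in enumerate(tokens)
             if STRICT_ADDRESS.fullmatch(tok) and width <= i < len(tokens) - width]
    if not seeds:
        return None  # no examples on this page, no wrapper
    left = generalize([tokens[i - width:i] for i in seeds])
    right = generalize([tokens[i + 1:i + 1 + width] for i in seeds])
    return left, right


def extract(tokens, wrapper):
    """Return every token enclosed by the learned delimiters."""
    left, right = wrapper
    width = len(left)
    return [tokens[i] for i in range(width, len(tokens) - width)
            if matches(left, tokens[i - width:i])
            and matches(right, tokens[i + 1:i + 1 + width])]


page = ["<tr>", "<td>", "Hauptstr 5, 56068 Koblenz", "</td>", "</tr>",
        "<tr>", "<td>", "Am Markt, Koblenz Altstadt", "</td>", "</tr>",
        "<tr>", "<td>", "Rheinweg 12, 56070 Koblenz", "</td>", "</tr>"]
wrapper = learn_wrapper(page)
print(extract(page, wrapper))
```

In this invented page the strict pattern finds only the first and third entry, yet the learned delimiters also extract the second one, which is the purpose of learning a wrapper per page instead of relying on the pattern alone.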
IE: Extraction Evaluation
How can the agent verify the quality of its extractions?
Evaluation for online learned wrappers:
"self-supervision": check whether extractions match generalized patterns derivable from the knowledge base
semantic cross check: use the semantics associated with the slots for evaluation
Evaluation for offline learned wrappers:
semantic cross check
Checks per extracted slot:
search topic and condition: match with similar concept names or instances derivable from the ontology
zip code: check with zip DB and city slot
city: match with estimated city name from GPS database or user input
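The zip code and city checks can be sketched as below. The small `ZIP_DB` dictionary stands in for a real zip code database, and the record layout with `zip` and `city` slots as well as the function name `check_extraction` are assumptions for this example.

```python
# Illustrative stand-in for a zip code database: zip code -> city name
ZIP_DB = {"56068": "Koblenz", "56070": "Koblenz", "53111": "Bonn"}


def check_extraction(record, expected_city, zip_db=ZIP_DB):
    """Return the list of failed semantic checks for one extracted record.

    expected_city is the city estimated from the GPS position or given
    by the user."""
    failures = []
    city = record.get("city", "").strip().lower()
    known_city = zip_db.get(record.get("zip", ""))
    if known_city is None:
        failures.append("zip code not in zip DB")
    elif known_city.lower() != city:
        failures.append("zip code and city slot disagree")
    if city != expected_city.strip().lower():
        failures.append("city differs from GPS/user city")
    return failures


print(check_extraction({"zip": "56068", "city": "Koblenz"}, "Koblenz"))
print(check_extraction({"zip": "53111", "city": "Koblenz"}, "Koblenz"))
```

An empty list means the record passed every check; a non-empty list names the slots that look suspicious, so a caller could discard the record or rank it lower.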
MIA's Ontology
Ontological Knowledge useful for:
Web Spidering: keywords from the user profile may not be sufficient
Information Extraction: check correctness of extractions
Description Logic used to model the ontology for the gastronomy & recreation domains
RACER: Renamed ABox and Concept Expression Reasoner
(Volker Haarslev, Ralf Möller)
KrHyper (Peter Baumgartner) [WLP2001]:
bottom up model generation
DL-like language (plus non-monotonic negation, rule-based language)
Ontology
[Figure: partial TBox of MIA's gastronomy ontology]
currently covered:
gastronomy
recreation
ABox (3800 facts)
TBox (~ 90 concepts)
Ontology Agent
TBOX:
(implies c_mahlzeit c_essen).
(equivalent c_speisestaette (and c_ort (some offers c_mahlzeit)
(some of_nationality c_nationalitaet))).
(implies c_fastfood (and c_speisestaette (not (some has_service c_service)))).
(equivalent c_restaurant (and c_speisestaette (some has_service c_service))).
ABOX:
(instance antipasti c_mahlzeit).
(instance ristorante c_restaurant).
RACER system:
[eclipse 6]: about(antipasti,X).
X = instantiators = ['C_MAHLZEIT'] More? (;)
X = instantiators = ['C_ESSEN'] More? (;)
X = instantiators = ['C_VERDERBLICH'] More? (;)
X = instantiators = ['C_PRODUKT'] More? (;)
X = instantiators = ['C_DING'] More? (;)
X = instantiators = ['C_FESTSTOFF'] More? (;)
[eclipse 7]: related_term(antipasti,X).
X = 'OF_NATIONALITY' = 'ITALIENISCH' More? (;)
X = 'OFFERED_BY' = 'PIZZERIA' More? (;)
[eclipse 11]: related(antipasti,X).
X = pizzeria More? (;)
X = osteria More? (;)
X = pasticceria More? (;)
X = ristorante More? (;)
X = rosticceria More? (;)
X = trattoria More? (;)
X = pizza_zum_mitnehmen More? (;)
X = antipasti More? (;)
X = carpaccio More? (;)
X = cozze More? (;)
X = maccaroni More? (;)
X = nudeln More? (;)
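% about/2 collects explanations for X by querying RACER through racer/2.
% First clause: the concepts that X is an instance of.
about(X,Explanation) :-
    racer('instantiators'(X),Concept),
    Explanation = ('instantiators'=Concept).
% Second clause: the ancestor concepts (subsumers) of X.
about(X,Explanation) :-
    racer('concept-ancestors'(X),Subsumers),
    Explanation = ('concept-ancestors'=Subsumers).
about(X,E...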
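What an `instantiators` query returns can be approximated without a reasoner by following told subsumers. The sketch below is not RACER and performs no description logic inference: it only reads the direct superconcepts off the TBox axioms shown on this slide (for an `equivalent` axiom, the named conjuncts on the right-hand side) and closes them transitively. The dictionaries and function names are assumptions for this example.

```python
# Direct superconcepts read off the TBox axioms on this slide
TOLD_SUBSUMERS = {
    "c_mahlzeit": {"c_essen"},
    "c_speisestaette": {"c_ort"},
    "c_fastfood": {"c_speisestaette"},
    "c_restaurant": {"c_speisestaette"},
}

# ABox assertions on this slide
INSTANCES = {"antipasti": "c_mahlzeit", "ristorante": "c_restaurant"}


def ancestors(concept):
    """All concepts reachable by following told subsumers upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in TOLD_SUBSUMERS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instantiators(individual):
    """The asserted concept of an individual plus all its ancestors."""
    concept = INSTANCES[individual]
    return {concept} | ancestors(concept)


print(sorted(instantiators("antipasti")))
print(sorted(instantiators("ristorante")))
```

The session above lists more instantiators for `antipasti` than this sketch can find, because those concepts come from parts of the TBox that are not shown on the slide.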
Outlook
Need for cooperation with telecom provider for automatic user position estimation via cell information of mobile phones
Ongoing research in Information Extraction with good results for HTML/XML documents
Major problem: online learning of wrappers. MIA uses a very heuristic method ... good ideas needed.
Ontology based web spidering ... let us see what the semantic web project offers?
Left out in this project: sharing search and extraction work among agents
References
Peter Baumgartner, Ulrich Furbach and Bernd Thomas:
Model Based Deduction for Knowledge Representation.
17. WLP - Workshop Logische Programmierung, Technische Universität Dresden, 4-6 September 2002.
Nicholas Kushmerick and Bernd Thomas:
Adaptive Information Extraction: A Core Technology for Information Agents.
In Intelligent Information Agents R&D in Europe: An AgentLink Perspective. Springer, 2002.
Gerd Beuster, Bernd Thomas and Christian Wolff:
Ubiquitous Web Information Agents.
Workshop on Artificial Intelligence in Mobile Systems, ECAI 2000, European Conference on Artificial Intelligence, August 22nd 2000, Berlin, Germany.
Bernd Thomas:
Token-Templates and Logic Programs for Intelligent Web Search.
Journal of Intelligent Information Systems, Kluwer Academic Publishers, Special Issue: Methodologies for Intelligent Information Systems, Volume 14, Number 2/3, March-June 2000, pp. 241-261.
Bernd Thomas:
Anti-Unification Based Learning of T-Wrappers for Information Extraction.
Workshop on Machine Learning for Information Extraction, preceding the Sixteenth National Conference on Artificial Intelligence (AAAI-99), July 18-19, 1999, Orlando, Florida.