Folksonomies structureRepresented as a tripartite hypergraph 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 4 Metallica Iron Maiden metal classic John
Tagging with DHARMAA DHT-based Approach for Resource Mapping through Approximation
Luca Maria Aiello, Marco Milanesio Giancarlo Ruffo, Rossano Schifanella Università degli Studi di Torino Computer Science Department Keywords : navigational search, folksonomy, DHT, approximated graph mapping, last.fm Seventh International Workshop on Hot Topics in Peer-to-Peer Systems Speaker: Luca Maria Aiello, PhD student aiello@di.unito.it
‹#›
Overview
Goal: enrich the p2p layer with a tag-based navigational search engine Task: mapping a folksonomy on a DHT Problem: mapping dense graphs on distributed layer is very inefficient Solution: approximated mapping using a complexity-bounded algorithm Evaluation: the approximated solution does not upset the structure/semantic of the folksonomy 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 2
Motivations
Direct search Navigational search Taxonomies (e.g. Yahoo! directory): not successful Folksonomies: successfully emerging thanks to the social web Tag-based search engine on DHTs is profitable Applications layered on DHTs use exact match search (e.g. eMule) folksonomic search adds flexibility Recent research activity on P2P for privacy-aware online social networks Folksonomic search engine is an important OSN feature 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 3
We know that in web services as well as in distributed systems we can have two different research paradigms. The first is of course direct search: given a query or a keyword I get a set of resources back. The second paradigm is the navigational search, where there is a set of categories I can browse to reach a resource that somehow matches the whole set of keywords I browsed (for example yahoo directory). Categories in the navigational search were traditionally organized in predefined taxonomies. This approach wasn’t really very successful, because often predefined classifications don’t really match the mental categories of the users. However, in last few years, thanks to the social web, a navigational paradigm based on folksonomies is becoming more and more popular. Here, the users build a object classification in a bottom-up fashion by assigning arbitrary tags to objects. Given this setting, we think that building such folksonomy based search engine on a DHT can be a good opportunity for two reasons. First, this can be a successful way to limit the well known exact-match lookup problem which affects the DHTs, adding flexibility to the p2p layer. The second is that recent works are trying to implement online social network services on DHTs, in order to better preserve user information privacy. Since online social networks often use the folksonomic paradigm for object location, the present work well complement this new research trend. ‹#›
Folksonomies structure
Represented as a tripartite hypergraph 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 4 Metallica Iron Maiden metal classic John
Tag-Resource Graph (TRG)
Projection on user dimension 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 5 Metallica Iron Maiden metal classic John Bipartite graph Edges are weighted depending on number of tag assignments 3 5 1
Folksonomy graph (FG)
Models tag-to-tag similarity with co-occurrence 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 6 Metallica Iron Maiden metal classic 3 5 1 2 5+3=8 1+2=3 8 3 Co-occurrence similarity is widely known and used Local calculation Directional
Folksonomic search
User selects a tag Related resources and tags are displayed Ranking based on arc weights User can shift to another tag or select a resource 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 7 metal thrash power grind classic Iron Maiden Metallica Manowar 8 3 12 1 5 7 9
Folksonomy evolution
The folskonomy grows quickly due to massive user activity Resource insertion Tag insertion 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 8 Both FG and TRG should be updated properly!
Folksonomy maintenance
23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 9
Mapping on a DHT
Map FG and TRG on the DHT and implement the update operations on the distributed layer Idea: Splitting the FG and TRG into small structural parts or modules Each module contains a node (tag or resource) and its outgoing edges/arcs Each module is mapped on a different p2p index node 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 10
Mapping FG on a DHT
23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 11
Mapping TRG on a DHT
23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 12
Folksonomy maintenance on the DHT
When inserting a new tag or resource, proper modules have to be modified Complexity of resource insertion and search are linear with the input size OK! Tagging operation is linear with |Tags(r)| Tags(r) is the set of tags labeling the resource r How many tags for a resource? Let’s see a real-world example… 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 13 Insert(r, t1,…, tm) Tag(t,r) Search step #lookups 2 + 2m 4 +|Tags(r)| 2
Last.fm folksonomy sample
99,405 users, 1,413,657 items and 285,182 different tags 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 14 |Tags(r)| Can be huge!
An approximate solution
DHARMA DHT-based Approach for Resource Mapping through Approximation Idea: Put a constant upper bound to the number of lookups for tagging operation The resulting FG will be approximated… 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 15
‹#›
Approximate tagging: k=1 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 16 metal thrash classic cool Lookup count : 0 Metallica 1 3 4 Now, select k tags at random in Tags(Metallica) and link them with the new tag 5
Approximate graph vs real graph
With DHARMA approximation, the complexity is affordable for very small k The smaller k is, the more the structure of the Folksonomy Graph is upset! We need to compare the approximate and the real Folksonomy Graph 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 17 Insert(r, t1,…, tm) Tag(t,r) Search step #lookups 2 + 2m 4 + k 2
Approximation in Last.fm
We simulate the evolution of FG with the approximated protocol Simulated resource insertion and tagging activity Goal: draw a comparison between the real Last.fm FG and the approximated FG, for different values of k Nodal degree Arcs weight 23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 18
Nodal degree comparison
23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 19
Arc weights comparison
23/04/2010 HOTP2P 2010 - Luca Maria Aiello, Università degli Studi di Torino 20
Comments