Personalised, Interactive Access to ...

Christin Seifert, University of Passau. Hamburg, 2015-03-26. Personalised, Interactive Access to Digital Library Content: Lessons Learned in the EEXCESS Project


Page 1: Personalised, Interactive Access to Digital Library Content (eexcess.eu/wp-content/uploads/2015/04/Science2-0...)

Christin Seifert, University of Passau. Hamburg, 2015-03-26

Personalised, Interactive Access to Digital Library Content: Lessons Learned in the EEXCESS Project

Page 2

Motivation

A.-L. Barabasi, R. Albert, and H. Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications, 281(1–4):69–77, 2000.

Why and how to reduce this distance?

[Figure: Unique Visitors (%) plotted against Rank of the Site. Annotations: "Digital libraries, museums, archives", "specialised resources", "User-content distance"]

Page 3

Page 4

Long-tail resources in context:
• Discover new information
• Verify facts
• Enrich existing information

Why reduce the user-content distance?

Page 5

Motivation

• Content provider strategies:
  A. Dedicated portals
  B. Search engine optimisation
  C. Social network marketing

User finds content. Limited success.

• User strategies:
  A. Use major search engines
  B. Use dedicated portals
  C. Don’t know of existence of resources and/or portals

Page 6

Idea

Page 7

EEXCESS Approach

Reduce user-content distance: bring content to users (in a helpful, polite, non-obtrusive manner)

• Locate users: Channel Identification and Injection
• Find out what users need: Context Detection and Personalised Search
• Present resources: Interactive Visualisations

Page 8

More details, please...

Page 9

Channel Identification and Injection

• Frequently used channels
  – Social media channels
  – CMS: multiplier effect
  – Online word processors
  – No access → browser technology
• Challenge:
  – Variety of clients

Lesson 1 [Clients]: Favour clients with multiplier effect

Locate users

Page 10

D7.1 Test Bed design, deployment plan and mockups

© EEXCESS consortium: all rights reserved page 27

Figure 7: SITOS EasyWiki with EEXCESS recommendations

Clicking the “use” icon of a recommendation copies the recommendation snippet into the text.

Figure 8: SITOS EasyWiki with used recommendation

D5.2 First Prototype: User Profile and Context Detection, Usage Analysis

8 Prototype: Twitter Bot

Twitter is used as a distribution channel for cultural content. The bot was implemented to enable Twitter users to access the EEXCESS recommendations. Users can query the bot for specific content, and the bot offers resources to random users to broaden its publicity and form a network within the Twitter environment. The content is distributed via status updates using the Twitter account @RecoRobot.

8.1 Guided Tour

There are three ways to get involved with the EEXCESS Twitter bot:

• Actively question the bot (mention @RecoRobot in your own tweet) to get a one-time answer

• Follow the bot to get continuous recommendations

• The bot can offer recommendations to random users, triggered by keywords

In general, the bot can extract information from tweets and query the EEXCESS service for a recommendation. If a good recommendation is found, the TwitterBot responds with a status update that mentions the user and supplies the recommendation link together with a short description.

Figure 4: Mention the bot for a recommendation.

There are two ways to trigger this content delivery process: pushing or pulling the recommendation. First, the user can mention the Twitter bot in a tweet, and the bot will try to recommend a suitable resource. Figure 4 shows the query and result of a successful recommendation. This abstract process is presented in Figure 5.

TwitterBot queried by user: Crawl @Mentions from Twitter → Get Recommendations → Persist → Update Status and mention user

Figure 5: Abstract process: Mention.
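The mention-driven process in Figure 5 can be read as a four-step loop. The following is a minimal sketch under stated assumptions, not the project's implementation: `Mention`, `Recommendation`, `process_mentions` and the four callables passed in are hypothetical names, and no real Twitter or EEXCESS API is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Mention:
    """A tweet that mentions the bot (hypothetical structure)."""
    user: str
    text: str


@dataclass(frozen=True)
class Recommendation:
    """A recommended resource (hypothetical structure)."""
    url: str
    description: str


def process_mentions(
    fetch_mentions: Callable[[], Iterable[Mention]],
    recommend: Callable[[str], Optional[Recommendation]],
    persist: Callable[[Mention, Recommendation], None],
    post_status: Callable[[str], None],
) -> int:
    """Run one pass over the mentions and return the number of replies sent."""
    replies = 0
    for mention in fetch_mentions():        # 1. crawl @mentions
        rec = recommend(mention.text)       # 2. get recommendations
        if rec is None:                     # no good recommendation: no reply
            continue
        persist(mention, rec)               # 3. persist
        # 4. update status and mention the user
        post_status(f"@{mention.user} {rec.description} {rec.url}")
        replies += 1
    return replies
```

Passing the four steps in as callables keeps the loop independent of any particular client library, so each step can be replaced by a stub when testing.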

© EEXCESS consortium: all rights reserved page 36

Lesson 2 [Architecture]: Modularise; use APIs to separate clients from back-end
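One way to picture Lesson 2 is a narrow interface that every client depends on, with the back-end hidden behind it. The sketch below is illustrative only: `RecommenderAPI`, `InMemoryBackend`, `WikiClient` and `BotClient` are hypothetical names, and the keyword matching merely stands in for a real recommender.

```python
from typing import List, Protocol


class RecommenderAPI(Protocol):
    """The only contract clients depend on (hypothetical interface)."""

    def recommend(self, query: str, limit: int = 3) -> List[str]:
        ...


class InMemoryBackend:
    """Stand-in back-end; a real one would sit behind a web API."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = titles

    def recommend(self, query: str, limit: int = 3) -> List[str]:
        terms = query.lower().split()
        hits = [t for t in self._titles if any(term in t.lower() for term in terms)]
        return hits[:limit]


class WikiClient:
    """Client that formats results as a list, e.g. for an editor sidebar."""

    def __init__(self, api: RecommenderAPI) -> None:
        self._api = api

    def sidebar(self, text: str) -> str:
        return "\n".join(f"- {title}" for title in self._api.recommend(text))


class BotClient:
    """Client that replies with a single short line."""

    def __init__(self, api: RecommenderAPI) -> None:
        self._api = api

    def reply(self, user: str, text: str) -> str:
        results = self._api.recommend(text, limit=1)
        return f"@{user} {results[0]}" if results else ""
```

Both clients share one back-end object and neither knows how recommendations are computed, so a new client can be added without touching the back-end.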

Page 11

User Context Detection

• Translate user context to information need
• Example: browser extension

Find out what users need

[Figure: browser extension example with three numbered cases (results for a page, results for a paragraph, results for a selection). Labels: User context, Search Backend, Personalised Results]

Page 12

User Profile Mining

Can we predict manual queries from a text selection?

[1] http://www.britannica.com/EBchecked/topic/219315/French-Revolution

Find out what users need

“... The gathering of troops around Paris and the dismissal of Necker provoked insurrection in the capital. On July 14, 1789, the Parisian crowd seized the Bastille, a symbol of royal tyranny. Again the king had to yield; visiting Paris, he showed his recognition of the sovereignty of the people by wearing the tricolour cockade...” [1]

storming Bastille 1789

Page 13

User Profile Mining

Christin Seifert, Jörg Schlötterer, Michael Granitzer: “Towards a Feature-Rich Data Set for Personalized Access to Long-Tail Content”, Proc. IAR at ACM SAC, to appear

Figure 4: Term analysis for queries and selected text. Panels: (a) Ratio of selection terms in query; (b) Ratio of query terms in selection.
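The two quantities named in the panels of Figure 4 can be read as set-overlap ratios between a text selection and the query typed for it. The sketch below assumes whitespace tokenisation, punctuation stripping and case-insensitive exact matching; the excerpt does not state the preprocessing actually used, and the function names are hypothetical.

```python
def tokens(text: str) -> set:
    """Lowercased terms with surrounding punctuation stripped (an assumption)."""
    stripped = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return {word for word in stripped if word}


def ratio_selection_terms_in_query(selection: str, query: str) -> float:
    """Share of distinct selection terms that also occur in the query."""
    sel, qry = tokens(selection), tokens(query)
    return len(sel & qry) / len(sel) if sel else 0.0


def ratio_query_terms_in_selection(selection: str, query: str) -> float:
    """Share of distinct query terms that also occur in the selection."""
    sel, qry = tokens(selection), tokens(query)
    return len(sel & qry) / len(qry) if qry else 0.0
```

For the selection "On July 14, 1789, the Parisian crowd seized the Bastille" and the query "storming Bastille 1789" from the French Revolution example in this talk, two of the three query terms occur in the selection, while two of the nine distinct selection terms occur in the query.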

of a text selection enriched with the aforementioned features and a label for each term of the selection, which indicates if the term is also contained in the corresponding query (and hence considered relevant).

The list of stop words is the one provided by the “tm” package for R (footnote 7), the POS tags were obtained with NLTK [3] and the CRF models were computed with Mallet [16]. We evaluated the performance of 29 feature combinations using 10-fold cross-validation. In order to evaluate the stability across users and tasks we also performed cross-validation on splits defined by users (all but one user as training and one user for test), and tasks respectively.
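The user-wise and task-wise splits described above follow a leave-one-group-out scheme: each user (or task) is held out once for testing while all others are used for training. A minimal sketch, assuming each record carries a group label; `leave_one_group_out` is a hypothetical helper, not code from the paper.

```python
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    by_group: Dict[Hashable, List[int]] = {}
    for index, group in enumerate(groups):
        by_group.setdefault(group, []).append(index)
    for held_out, test_idx in by_group.items():
        # Everything that does not belong to the held-out group is training data.
        train_idx = [i for i in range(len(groups)) if groups[i] != held_out]
        yield held_out, train_idx, test_idx
```

Calling it with the list of user identifiers gives the split over users; calling it with the list of task identifiers gives the split over tasks.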

Table 4: Accuracies [%] for query prediction from selected text. Cross-validated using splits over users, tasks, and 10-fold random.

split     statistic   i, c, t   i, t   c, t   trivial rejector   trivial acceptor
users     mean        76        77     75     51                 49
users     SD          15        15     18     35                 35
tasks     mean        82        83     82     71                 29
tasks     SD          6         6      7      8                  8
10-fold   mean        89        88     84     71                 29
10-fold   SD          1         2      1      2                  2

i - the identity of a term, i.e. the term itself
c - whether the term begins with upper- or lowercase
t - POS tag

The best performing feature combinations are shown in Table 4. As the CRF model assigns a label to each term in the selection (identifying it as relevant or not relevant), accuracy refers to the ratio of correctly labeled terms to the total number of terms. Incorporating a term itself as a feature (i, c, t and i, t) leads to the best results, but this may not generalize well due to the limited vocabulary in the dataset. Nevertheless, feature combinations without the words provide similar results as well (e.g., the combination of case identifier and POS tag, c, t) and thus are the better option.
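The per-term features, the accuracy measure and the two trivial baselines can be sketched as follows. This is not the CRF model from the paper: the function names are hypothetical, and the POS tags are supplied by hand rather than computed by a tagger.

```python
from typing import Dict, List, Sequence


def term_features(term: str, pos_tag: str) -> Dict[str, object]:
    """Features named as in the table legend: i (identity), c (case), t (POS tag)."""
    return {
        "i": term,                  # the term itself
        "c": term[:1].isupper(),    # True if the term begins with an uppercase letter
        "t": pos_tag,               # POS tag, passed in by the caller
    }


def accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Ratio of correctly labelled terms to the total number of terms."""
    if len(predicted) != len(gold):
        raise ValueError("label sequences must have equal length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def trivial_rejector(terms: Sequence[str]) -> List[bool]:
    """Baseline that labels every term as not relevant."""
    return [False] * len(terms)


def trivial_acceptor(terms: Sequence[str]) -> List[bool]:
    """Baseline that labels every term as relevant."""
    return [True] * len(terms)
```

Because the two baselines make opposite predictions for every term, their accuracies on the same data always add up to 100%, which matches the 71/29 and 51/49 pairs in the table.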

The standard deviations reveal that the query behavior is stable over tasks, but not over users. In fact, half of the users incorporated the major part of the selection into their queries, and the queries of the other half contained only a minority of the selection terms. Thus, prediction performance drops for the evaluation over users.

Footnote 7: http://cran.r-project.org/web/packages/tm/

7. RELATED WORK

Providing long-tail recommendations is a highly challenging task, first of all because of the data sparsity issue: only a few or even no ratings are available for items in the long tail. To overcome this problem, the authors in [18] partition the whole item set into head and tail parts and cluster the items in the tail. In [11] recommendations are obtained by combining the items in a user’s personal long tail with users who have those items in their head portion. While these approaches still require the existence of at least a few ratings in the tail or even the existence of dense data in the head, Stickroth et al. [25] aim to provide high-quality recommendations in a network with a small number of users and items (and hence without the presence of a dense head). To this end, the authors propose a multilevel approach, with a decreasing degree of personalization and different recommendation strategies at each level. Their dataset encompasses 60 ratings on 151 items by 175 users and is not published. Closest to our work, Wang et al. [27] conducted a user study in the cultural heritage domain in which they elicited user models with ratings of museum objects of the Rijksmuseum Amsterdam from 39 participants.

Most of the approaches for user data collection for long-tail domains use server-side data logging. A representative example is the smartmuseum approach, where user interests are either given manually or derived from tagging and rating of resources [23]. A game-based approach to server-side collection was pursued by Wang et al. [27], who used an interactive quiz to collect ratings for museum objects. Goecks and Shavlik [6] use client-side data collection in a Web browser for user interest detection based on the text of the webpage, clicked hyperlinks, scrolling and mouse activity.

All of those data sets capture only part of the features we identified as necessary to collect. To the best of our knowledge, there is no publicly available dataset which accounts for the specific challenges of long-tail recommendations and contains the required data.

Lesson 3 [Data]: Collect ground truth data as early as possible

Lesson 4 [Data]: Collect ground truth data as early as possible

Find out what users need

Page 14

User Profile Mining

Find out what users need

Lesson 5: Information need is entity-based

Page 15

Visualisations

Present resources

Visualisations allow more engaging access to data and help to deal with information overload by using the power of the human visual system [1]

[1] Ben Shneiderman. 1996. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In IEEE Visual Languages. College Park, Maryland 20742, U.S.A., 336–343

Page 16

Visualisations

Lesson 6 [UI]: Use mock-ups (fake data)

Present resources

Page 17

Summary

Page 18

EEXCESS framework
‣ Inject content in channels
‣ Detect information need
‣ Visualise results

enabling
‣ Discovery, Verification and Enrichment of Information

Reduce user-content distance: bring content to users

Page 19

Questions?

http://eexcess.eu

https://github.com/EEXCESS/eexcess

christin.seifert@uni-passau.de