touring dataland? automated recommendations for the ig...

1
Results Introducon Heuriscs Big data sciensts, like travelers to a new land, are faced with the daunng task of discovering which (storage) locaons contain in- teresng aracons (i.e., research data) Many services, such as travel websites, pro- vide user-specific recommendaons derived from analysis of huge amounts of usage data We explore how recommendaon approach- es can be adapted and applied to big data sci- ence. In parcular, we create heuriscs for recommending Globus data locaons Globus Acknowledgements Transfer accuracy: the av- erage number of end- points correctly predicted The neural network, which combines heuriscs, outperforms all indi- vidual heuriscs The most unique users, instuon, and owned endpoints heuriscs perform poorly except in cases where users have lile or no transfer history William Agnew Georgia Instute of Technology Kyle Chard (Advisor) University of Chicago Ian Foster (Advisor) University of Chicago Globus [1] network. Each endpoint is a vertex, larger if endpoint is more popular. Edges between endpoints that have transferred, more visible is transfer between pair is more frequent. Long-tailed Usage Distribuons Bytes Transferred per User Transfers per User Unique Endpoints per User History: The most likely source (S) / desnaon (D) endpoint is the most recent S/D endpoint used by a user Markov Chain: A transion matrix of the observed probabilies of using each endpoint as a S/D condioned on a parcular end- point being previously used as a S/D Most Unique Users: The most likely S/D endpoint is the S/D endpoint with the most unique users Instuon: The most likely S/D endpoint is the most popular endpoint at that users instuon Endpoint Ownership: The most likely S/D endpoint is the end- point most recently created by the user Heuriscs perform well for different classes of users We use a deep recurrent neural network [2] to combine heuriscs by ranking the predicons of each heurisc for the series of user endpoint choices Neural Network Block. Takes as input heurisc endpoint recommendaons and memory of past recommendaons to user and outputs reweighted heu- risc endpoint recommendaons and updated recommendaon memory User accuracy: the aver- age accuracy per user, where a user's accuracy is the fracon of that user's endpoints correctly pre- dicted References 1. Foster, Ian. "Globus Online: Accelerang and democrazing science through cloud-based services." IEEE Internet Compung 15.3 (2011): 70. 2. Graves, Alex. "Generang sequences with recurrent neural networks." arXiv pre- print arXiv:1308.0850 (2013). This work is supported in part by the Naonal Science Foundaon grant NSF- 1461260 (BigDataX REU) Deep Recurrent Neural Networks Recommendaon Mockup Touring Dataland? Automated Recommendaons for the Big Data Traveler Michael Fischer University of Wisconsin-Milwaukee Website wagnew3.github.io

Upload: others

Post on 09-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Touring Dataland? Automated Recommendations for the ig ...sc16.supercomputing.org/sc-archive/src_poster/poster_files/spost103s2-file1.pdfFoster, Ian. "Globus Online: Accelerating and

Results

Introduction Heuristics Big data scientists, like travelers to a new land, are faced with the

daunting task of discovering which (storage) locations contain in-

teresting attractions (i.e., research data)

Many services, such as travel websites, pro-vide user-specific recommendations derived from analysis of huge amounts of usage data

We explore how recommendation approach-es can be adapted and applied to big data sci-ence. In particular, we create heuristics for recommending Globus data locations

Globus

Acknowledgements

Transfer accuracy: the av-

erage number of end-

points correctly predicted

The neural network, which combines heuristics, outperforms all indi-vidual heuristics

The most unique users, institution, and owned endpoints heuristics perform poorly except in cases where users have little or no transfer history

William Agnew

Georgia Institute of Technology

Kyle Chard (Advisor)

University of Chicago

Ian Foster (Advisor)

University of Chicago

Globus [1] network. Each endpoint is a vertex, larger if endpoint is

more popular. Edges between endpoints that have transferred,

more visible is transfer between pair is more frequent.

Long-tailed Usage Distributions

Bytes Transferred

per User

Transfers per User Unique Endpoints

per User

History: The most likely source (S) / destination (D) endpoint is the most recent S/D endpoint used by a user Markov Chain: A transition matrix of the observed probabilities of using each endpoint as a S/D conditioned on a particular end-point being previously used as a S/D Most Unique Users: The most likely S/D endpoint is the S/D endpoint with the most unique users Institution: The most likely S/D endpoint is the most popular endpoint at that user’s institution Endpoint Ownership: The most likely S/D endpoint is the end-point most recently created by the user

Heuristics perform well for different classes of users

We use a deep recurrent neural network [2] to combine heuristics by ranking the predictions of each heuristic for the series of user endpoint choices

Neural Network Block. Takes as input heuristic endpoint recommendations

and memory of past recommendations to user and outputs reweighted heu-

ristic endpoint recommendations and updated recommendation memory

User accuracy: the aver-

age accuracy per user,

where a user's accuracy is

the fraction of that user's

endpoints correctly pre-

dicted

References 1. Foster, Ian. "Globus Online: Accelerating and democratizing science through

cloud-based services." IEEE Internet Computing 15.3 (2011): 70.

2. Graves, Alex. "Generating sequences with recurrent neural networks." arXiv pre-

print arXiv:1308.0850 (2013).

This work is supported in part by the National Science Foundation grant NSF-

1461260 (BigDataX REU)

Deep Recurrent Neural Networks

Recommendation

Mockup

Touring Dataland?

Automated Recommendations for the Big Data Traveler Michael Fischer

University of Wisconsin-Milwaukee

Website

wagnew3.github.io