Building Babel: Large Scale Data Collection in the Cloud
Ian Wesley-Smith

Scholarly Article Recommendation

• Information Overload
  – 50m – 150m articles in existence

Google Scholar

• Recommendation vs Search
  – Serendipity
• Homonymity
• Synonymity

Netflix/Spotify/Amazon

• User ratings (explicit, implicit)
• Density
  – # user-item interactions >> # items
• Netflix Competition (2006) [1]
  – 100m ratings
  – 480k users
  – 17k movies

[1] http://www.netflixprize.com/community/viewtopic.php?id=68
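
To make the density point concrete, the short calculation below works through the rounded Netflix figures quoted on this slide. It is only an illustrative sketch: the variable names are ours, and the inputs are the slide's rounded numbers, not the exact dataset counts.

```python
# Rounded figures from the slide (not the exact dataset counts).
ratings = 100_000_000
users = 480_000
movies = 17_000

# Average number of interactions per item and per user.
ratings_per_movie = ratings / movies
ratings_per_user = ratings / users

# Fraction of the user x movie matrix that actually holds a rating.
density = ratings / (users * movies)

print(f"ratings per movie: {ratings_per_movie:,.0f}")  # about 5,882
print(f"ratings per user:  {ratings_per_user:,.0f}")   # about 208
print(f"matrix density:    {density:.2%}")             # about 1.23%
```

Each movie has thousands of ratings on average, which is the sense in which the number of user-item interactions is far larger than the number of items.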

Barriers to Research

• Hard to get datasets
• Difficult to measure effectiveness
  – Judges
  – Citation prediction

Enter Babel

• Provide access to private data sets
• Provide scholarly article recommendations, freely to anyone
  – Feedback data in return
• Evaluate recommenders using usage data
  – With enough traffic could be very fast

Audience

• Publishers
  – Offload expensive research into recommender systems to academia
  – Better recommendations drive more traffic/purchases
• Tool Developers
• Researchers

Requirements

• Fast
• Reliable
• Scalable (lots of data!)
• Easy to use
• Cheap

REST API

curl http://babel-us-east-1.eigenfactor.org/recommendation/aminer/12345
{
  "transaction_id": "46bb84190e9ddfd17700bfafb500ab3c",
  "results": [
    {
      "paper_id": "672",
      "publisher": "aminer"
    },
    {
      "paper_id": "11274",
      "publisher": "aminer"
    }
  ]
}
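
A client might consume this response roughly as sketched below. The helper names (`recommendation_url`, `parse_recommendations`) are hypothetical, and the URL pattern `/recommendation/<publisher>/<paper_id>` is an assumption inferred from the single example on this slide. The sketch parses the sample response offline and makes no network call.

```python
import json
from typing import List, Tuple

# Host taken from the slide's curl example.
BASE_URL = "http://babel-us-east-1.eigenfactor.org"


def recommendation_url(publisher: str, paper_id: str) -> str:
    """Build a request URL, assuming the path layout seen in the example."""
    return f"{BASE_URL}/recommendation/{publisher}/{paper_id}"


def parse_recommendations(body: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the transaction id and the (publisher, paper_id) pairs."""
    payload = json.loads(body)
    pairs = [(item["publisher"], item["paper_id"]) for item in payload["results"]]
    return payload["transaction_id"], pairs


# The sample response shown on the slide.
SAMPLE_RESPONSE = """
{
  "transaction_id": "46bb84190e9ddfd17700bfafb500ab3c",
  "results": [
    {"paper_id": "672", "publisher": "aminer"},
    {"paper_id": "11274", "publisher": "aminer"}
  ]
}
"""

transaction_id, recommendations = parse_recommendations(SAMPLE_RESPONSE)
```

Keeping the `transaction_id` alongside the results lets a client report which recommendation a user acted on, which is the feedback data Babel asks for in return.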

http://babel.eigenfactor.org

Browser Plugins

http://labs.jstor.org/sustainability/

Babel Architecture

[Architecture diagram. Labels: Recommenders; EigenFactor Recommends; Co-Citation; Bibliographic Coupling; Metadata Database; update.eigenfactor.org; Object Store; Archive; Metadata Extraction; Normalization; Recommender Frontend; Recommendation Cache; Analytics; Publisher; Researcher; Demo Website; Chrome Plugin; Desktop App]

Frontend

[Architecture diagram, repeated from the Babel Architecture slide]

Frontend

AWS Elastic Beanstalk

Application

Package

Deploy
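
The Package step could look something like the sketch below. Elastic Beanstalk accepts an application version as a ZIP source bundle whose files sit at the archive root. The function name `make_source_bundle` and the directory layout are hypothetical, and the slides do not say how Babel's bundle was actually built. The Deploy step (uploading the bundle and updating the environment) is left out.

```python
import zipfile
from pathlib import Path
from typing import List


def make_source_bundle(app_dir: Path, bundle_path: Path) -> List[str]:
    """Zip every file under app_dir, with paths relative to app_dir.

    bundle_path should live outside app_dir so the archive does not
    try to include itself. Returns the archive member names.
    """
    names: List[str] = []
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(app_dir.rglob("*")):
            if path.is_file():
                # Store files at the archive root, not under a parent folder.
                arcname = path.relative_to(app_dir).as_posix()
                bundle.write(path, arcname)
                names.append(arcname)
    return names
```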

Swagger UI

AWS Elastic Beanstalk

Image: Part 1: Develop, Deploy, and Manage for Scale with Elastic Beanstalk and CloudFormation Series, by Evan Brown, AWS

DynamoDB

• AWS NoSQL
  – Key-value store
• Very fast (<10ms)
• Very scalable
  – Specify throughput
• Not too expensive

Recommendation Cache
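
The key-value access pattern behind the recommendation cache can be sketched as follows. This is not DynamoDB code: an in-memory dict stands in for the table so the example runs anywhere. The class name and the composite `publisher:paper_id` key are assumptions for illustration; the slides do not describe Babel's actual key schema.

```python
from typing import Dict, List, Optional

Recommendation = Dict[str, str]


class RecommendationCache:
    """In-memory stand-in for a key-value table (assumed schema)."""

    def __init__(self) -> None:
        self._items: Dict[str, List[Recommendation]] = {}

    @staticmethod
    def key(publisher: str, paper_id: str) -> str:
        # Hypothetical composite key: one entry per publisher and paper.
        return f"{publisher}:{paper_id}"

    def put(self, publisher: str, paper_id: str,
            results: List[Recommendation]) -> None:
        """Store precomputed recommendations for one paper."""
        self._items[self.key(publisher, paper_id)] = list(results)

    def get(self, publisher: str, paper_id: str) -> Optional[List[Recommendation]]:
        """Single-key lookup; returns None when nothing is cached."""
        return self._items.get(self.key(publisher, paper_id))
```

Because every request is a single-key lookup, a key-value store fits the workload, and the frontend never needs to run a recommender at request time.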

Issues

• Not all AWS services are created equal
  – Data Pipeline
  – CloudSearch
• Documentation
• SDK/Tooling
• Python & GIL
• Access Keys

Future Directions

• Finish backend
• Expand clients (publishers, tool developers)
• Actually get more recommenders
• Babel 3.0 – simple middleware
  – Automatically logs & adds transaction info to outgoing requests
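
One way such middleware might work is sketched below: a wrapper around whatever function sends a request, which logs the call and attaches a transaction id first. Everything here is hypothetical, including the `with_transaction_info` name, the `X-Transaction-Id` header, and the dict-shaped request. The slides do not specify the Babel 3.0 design.

```python
import logging
import uuid
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Sender = Callable[[Request], Any]


def with_transaction_info(send: Sender, log: logging.Logger) -> Sender:
    """Wrap a sender so each outgoing request is logged and tagged."""

    def wrapped(request: Request) -> Any:
        # Copy so the caller's request object is left untouched.
        tagged = dict(request)
        headers = dict(tagged.get("headers") or {})
        # Hypothetical header name; keep an id the caller already set.
        headers.setdefault("X-Transaction-Id", uuid.uuid4().hex)
        tagged["headers"] = headers
        log.info("outgoing request url=%s transaction=%s",
                 tagged.get("url"), headers["X-Transaction-Id"])
        return send(tagged)

    return wrapped
```

A client would wrap its existing send function once and call the wrapped version everywhere, so logging and tagging need no further changes to calling code.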

http://[email protected]