Mendeley, putting data into the hands of researchers

Kris Jack, PhD

Data Mining Team Coordinator

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].

But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

➔ idea behind mendeley

➔ our features

➔ our technical challenges and solutions

➔ what does this mean for you?

Summary

Last.fm works like this:

1) Install “Audioscrobbler”

2) Listen to music

3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database

Last.fm              Mendeley

music libraries      research libraries

artists              researchers

songs                papers

genres               disciplines


➔ our features

Mendeley helps researchers work smarter

Mendeley extracts research data...

Install Mendeley Desktop

...and aggregates research data in the cloud


By doing this, Mendeley makes science more collaborative and transparent


➔ our technical challenges and solutions

500,000+ users; the 20 largest userbases:

University of Cambridge

Stanford University

MIT

University of Michigan

Harvard University

University of Oxford

Sao Paulo University

Imperial College London

University of Edinburgh

Cornell University

University of California at Berkeley

RWTH Aachen

Columbia University

Georgia Tech

University of Wisconsin

UC San Diego

University of California at LA

University of Florida

University of North Carolina

39,000,000+ articles

we can only use algorithms that scale up

related research

search

readership statistics

most frequent tags

+ dozens of other services

most frequent tags

for each document
    for each tag in document
        increment count for tag
sort tags by frequency
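The pseudocode above can be sketched on a single machine as follows. This is only an illustration: the `documents` structure and the function name `most_frequent_tags` are assumptions made for the example, not Mendeley's actual data model or code.

```python
from collections import Counter


def most_frequent_tags(documents, top_n=None):
    """Count every tag across all documents, most frequent first."""
    counts = Counter()
    for document in documents:          # for each document
        for tag in document["tags"]:    # for each tag in document
            counts[tag] += 1            # increment count for tag
    # sort tags by frequency (most_common sorts by descending count)
    return counts.most_common(top_n)


# Hypothetical toy library standing in for the real catalogue.
documents = [
    {"title": "paper A", "tags": ["machine learning", "crf"]},
    {"title": "paper B", "tags": ["machine learning", "hadoop"]},
    {"title": "paper C", "tags": ["machine learning", "crf", "nlp"]},
]

print(most_frequent_tags(documents, top_n=2))
# → [('machine learning', 3), ('crf', 2)]
```

This works well while everything fits in one process; the inner increment is the line that runs once per tag occurrence, which is what becomes expensive at tens of millions of documents.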







most frequent tags on our scale

for each document                    (called 39,000,000 times)
    for each tag in document         (called ~3 times per document)
        increment count for tag      (called ~39,000,000 x 3 = ~117,000,000 times)
sort tags by frequency
for each tag counted
    emit the tag and frequency

solution: distributed computing

map reduce

hadoop

Jeffrey Dean and Sanjay Ghemawat. (2004) MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA.
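To make the map reduce idea concrete, here is a minimal single-process sketch of the tag count expressed as a map step, a shuffle step and a reduce step. It only simulates the programming model in plain Python; it does not use Hadoop, and the function names (`map_tags`, `shuffle`, `reduce_counts`, `run_job`) and the document structure are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_tags(document):
    """Map step: emit a (tag, 1) pair for every tag in one document."""
    for tag in document["tags"]:
        yield tag, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the tag)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(tag, counts):
    """Reduce step: sum the partial counts for one tag."""
    return tag, sum(counts)


def run_job(documents):
    """Run map, shuffle and reduce in sequence, then sort by frequency."""
    mapped = chain.from_iterable(map_tags(doc) for doc in documents)
    grouped = shuffle(mapped)
    reduced = [reduce_counts(tag, counts) for tag, counts in grouped.items()]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(reduced, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy input.
docs = [
    {"tags": ["machine learning", "crf"]},
    {"tags": ["machine learning", "hadoop"]},
    {"tags": ["machine learning", "crf", "nlp"]},
]

for tag, frequency in run_job(docs):
    print(tag, frequency)
```

In a real cluster the map calls would run in parallel on different machines and the framework would perform the shuffle; the point of the sketch is that neither `map_tags` nor `reduce_counts` needs to see the whole data set.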

support vector machines

hidden markov models

conditional random fields

Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of the LREC 08, Marrakesh, Morocco.

deduplication

file hash check

crowd sourcing new articles from users

39,000,000 canonical documents

document fingerprinting

collapse metadata and update canonical docs

metadata comparison
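Two of the deduplication steps named above, the file hash check and the metadata comparison, can be sketched as follows. This is a hedged illustration only: the choice of SHA-1, the title-based similarity, the 0.9 threshold and all function names are assumptions for the example, and the document fingerprinting step is deliberately left out.

```python
import hashlib
from difflib import SequenceMatcher


def file_hash(data: bytes) -> str:
    """Hash the raw file contents; identical files give identical hashes."""
    return hashlib.sha1(data).hexdigest()


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences vanish."""
    return " ".join(text.lower().split())


def metadata_similarity(a: dict, b: dict) -> float:
    """Compare titles; halve the score when the years disagree (assumed rule)."""
    score = SequenceMatcher(None, normalise(a["title"]), normalise(b["title"])).ratio()
    return score if a.get("year") == b.get("year") else score * 0.5


def find_canonical(new_doc: dict, canonical_docs: list, threshold: float = 0.9):
    """Return the matching canonical document, or None if the article is new."""
    # Step 1: file hash check (exact duplicate of a known file).
    for doc in canonical_docs:
        if doc["hash"] == new_doc["hash"]:
            return doc
    # Step 2: metadata comparison (different file, same article).
    best = max(canonical_docs, key=lambda d: metadata_similarity(new_doc, d), default=None)
    if best is not None and metadata_similarity(new_doc, best) >= threshold:
        return best
    return None


# Hypothetical canonical catalogue with a single entry.
canonical = [{
    "hash": file_hash(b"pdf bytes v1"),
    "title": "MapReduce: Simplified Data Processing on Large Clusters",
    "year": 2004,
}]
same_file = {"hash": file_hash(b"pdf bytes v1"), "title": "untitled", "year": None}
same_paper = {"hash": file_hash(b"pdf bytes v2"),
              "title": "mapreduce:  simplified data processing on large clusters",
              "year": 2004}
new_paper = {"hash": file_hash(b"other pdf"),
             "title": "ParsCit: An open-source CRF reference string parsing package",
             "year": 2008}

print(find_canonical(same_file, canonical) is canonical[0])
print(find_canonical(same_paper, canonical) is canonical[0])
print(find_canonical(new_paper, canonical))
```

Checking the cheap exact hash first means the more expensive metadata comparison only runs for files that have not been seen before.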

pig

statistics

readerrank

currently tf-idf similarity between documents

developing collaborative filtering
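As a rough idea of what tf-idf similarity between documents involves, the sketch below weights each term by term frequency times inverse document frequency and compares two documents with cosine similarity. The exact weighting used in production is not given in the slides, so the formula, the whitespace tokenisation and the function names here are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Build one sparse {term: weight} vector per tokenised document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy abstracts, tokenised on whitespace.
token_lists = [
    "conditional random fields for reference parsing".split(),
    "reference string parsing with conditional random fields".split(),
    "scalable data processing on large clusters".split(),
]
vectors = tf_idf_vectors(token_lists)

print(cosine(vectors[0], vectors[1]))  # shares several terms, so above zero
print(cosine(vectors[0], vectors[2]))  # shares no terms, so exactly zero
```

Related research for a document is then the set of other documents with the highest cosine score against it.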

contact recommendations

currently recommendations based on contact network

developing version based on interests
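One common way to recommend contacts from a contact network is to suggest "contacts of contacts", ranked by how many contacts you already share. The slides do not say which method is used, so the sketch below is an assumption for illustration, including the `network` structure and the function name `recommend_contacts`.

```python
from collections import Counter


def recommend_contacts(network, user, top_n=3):
    """Suggest people the user is not yet connected to, ranked by mutual contacts."""
    direct = network.get(user, set())
    mutual = Counter()
    for contact in direct:
        for candidate in network.get(contact, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual contacts first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical contact network: each user maps to their set of contacts.
network = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev", "eli"},
    "cho": {"ana", "dev"},
    "dev": {"ben", "cho"},
    "eli": {"ben"},
}

print(recommend_contacts(network, "ana"))
# → [('dev', 2), ('eli', 1)]
```

An interest-based version would replace the mutual-contact count with a similarity score between the two researchers' libraries.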


➔ what does this mean for you?

access to data

datatel data set

online catalog

online article view logs

article tags

library readership

library stars

Mendeley's API

*new* you can get all of the articles in a group - data for you to test related research algos?
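One way group data could be used to test a related research algorithm is sketched below: treat the other articles in the same group as the "relevant" set for each article and measure precision at k. This is an assumption about how such a test might be set up, not a documented evaluation protocol; the code makes no API calls, and `recommend` stands for whatever algorithm is being tested.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended ids that are in the relevant set."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


def evaluate_on_group(group_articles, recommend, k=5):
    """Mean precision@k, using group co-membership as the relevance signal."""
    scores = []
    for query in group_articles:
        relevant = set(group_articles) - {query}
        scores.append(precision_at_k(recommend(query), relevant, k))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical group and a toy recommender backed by a lookup table.
group = ["a", "b", "c"]
lookup = {"a": ["b", "x"], "b": ["a", "c"], "c": ["y", "z"]}

print(evaluate_on_group(group, lambda query: lookup[query], k=2))
# → 0.5
```

Group membership is only a proxy for relatedness, so scores from a setup like this are best used to compare algorithms against each other rather than as absolute quality measures.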

Mashups with data on:

Chemical compounds

Locations

Alzheimer’s research

Grant funding

Twitter streams

want more?

let us know...

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].

But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

www.mendeley.com

we're hiring!
