mendeley, putting data into the hands of researchers
DESCRIPTION
I was invited to give a keynote presentation at the RecSysTEL Workshop (http://bit.ly/b2Bg2J) on 2010/09/30. The talk presents Mendeley's tools for researchers and the data sets that we made available for the dataTEL challenge, which is designed to provide new large-scale data for researchers in recommendation systems. The event was really enjoyable and the participants were excited about Mendeley.
TRANSCRIPT
Mendeley, putting data into the hands of researchers
Kris Jack, PhD
Data Mining Team Coordinator
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].
But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?
Summary
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database
Last.fm            Mendeley
music libraries    research libraries
artists            researchers
songs              papers
genres             disciplines
➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?
Summary
Mendeley helps researchers work smarter
Install Mendeley Desktop
Mendeley extracts research data...
...and aggregates research data in the cloud
By doing this, Mendeley makes science more collaborative and transparent
➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?
Summary
500,000+ users; the 20 largest userbases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
39,000,000+ articles
we can only use algorithms that scale up
related research
search
readership statistics
most frequent tags
+ dozens of other services
most frequent tags on our scale
for each document
    for each tag in document
        increment count for tag
sort tags by frequency

for each document: called 39,000,000 times
for each tag in document: called ~3 times per document
increment count for tag: called ~39,000,000 x 3 = ~117,000,000 times
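The pseudocode above can be sketched in Python on a single machine. This is only an illustration: the document structure (a dict with a "tags" list), the function name and the toy data are assumptions, not Mendeley's code.

```python
from collections import Counter

def most_frequent_tags(documents, top_n=10):
    """Count every tag across all documents and return the top_n most frequent."""
    counts = Counter()
    for document in documents:            # outer loop: runs once per document
        for tag in document["tags"]:      # inner loop: runs once per tag in the document
            counts[tag] += 1              # increment count for tag
    return counts.most_common(top_n)      # sort tags by frequency, highest first

# Hypothetical toy library; the real catalogue holds ~39,000,000 documents.
documents = [
    {"title": "A", "tags": ["recsys", "tel", "data"]},
    {"title": "B", "tags": ["recsys", "hadoop"]},
    {"title": "C", "tags": ["recsys", "data"]},
]
print(most_frequent_tags(documents, top_n=2))

# Rough cost estimate from the slide: ~3 tags per document on average.
n_documents = 39_000_000
avg_tags_per_document = 3
print(n_documents * avg_tags_per_document)  # number of increment operations
```

On one machine the increment line runs roughly 117,000,000 times, which is why the loop is a candidate for distribution.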
most frequent tags on our scale
for each document
    for each tag in document
        increment count for tag
sort tags by frequency
for each tag counted
    emit the tag and frequency
solution: distributed computing
map reduce
for each document
    for each tag in document
        increment count for tag
sort tags by frequency
Jeffrey Dean, Sanjay Ghemawat. (2004) MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA.
hadoop
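The map reduce version of the tag count can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code. On a cluster the framework would run the map and reduce functions on many machines and do the grouping itself. All function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_tags(document):
    """Map step: emit a (tag, 1) pair for every tag in one document."""
    return [(tag, 1) for tag in document["tags"]]

def shuffle(pairs):
    """Shuffle step: group the emitted values by tag, as the framework would."""
    grouped = defaultdict(list)
    for tag, count in pairs:
        grouped[tag].append(count)
    return grouped

def reduce_tag(tag, counts):
    """Reduce step: sum the partial counts for one tag."""
    return tag, sum(counts)

def tag_frequencies(documents):
    """Run map, shuffle and reduce, then sort tags by frequency (ties by name)."""
    mapped = chain.from_iterable(map_tags(d) for d in documents)
    reduced = [reduce_tag(tag, counts) for tag, counts in shuffle(mapped).items()]
    return sorted(reduced, key=lambda pair: (-pair[1], pair[0]))

print(tag_frequencies([{"tags": ["a", "b"]}, {"tags": ["a"]}]))
```

Each map call only sees one document and each reduce call only sees one tag, so neither step needs the whole catalogue in memory.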
support vector machines
hidden markov models
conditional random fields
Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of the LREC 08, Marrakesh, Morocco.
deduplication
file hash check
crowd sourcing new articles from users
39,000,000 canonical documents
document fingerprinting
collapse metadata and update canonical docs
metadata comparison
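The file hash check step can be sketched as follows. This is a minimal illustration under stated assumptions: the slides do not say which hash function is used (SHA-1 is assumed here), and the rule for collapsing metadata (keep existing fields, fill in missing ones) is hypothetical. Document fingerprinting and metadata comparison are not covered.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Hash the raw file contents; identical files always share a hash."""
    return hashlib.sha1(data).hexdigest()  # SHA-1 is an assumption

def add_upload(canonical_by_hash, data, metadata):
    """Add an uploaded file to the catalogue. Returns (canonical record, is_new)."""
    key = file_hash(data)
    if key in canonical_by_hash:
        record = canonical_by_hash[key]
        # Collapse metadata (hypothetical rule): fill in fields the record lacks.
        for field, value in metadata.items():
            record.setdefault(field, value)
        return record, False
    canonical_by_hash[key] = dict(metadata)
    return canonical_by_hash[key], True

catalogue = {}
first, first_is_new = add_upload(catalogue, b"pdf bytes", {"title": "X"})
second, second_is_new = add_upload(catalogue, b"pdf bytes", {"title": "Y", "year": 2010})
print(second, second_is_new)
```

A hash check only catches byte-identical files, which is presumably why the pipeline also lists fingerprinting and metadata comparison.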
pig
statistics
readerrank
currently tf-idf similarity between documents
developing collaborative filtering
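A tf-idf similarity between documents can be sketched like this. The weighting (term frequency times log inverse document frequency) and cosine similarity are standard textbook choices assumed for illustration. The slides do not specify the exact variant used, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    ["tag", "recommendation", "research"],
    ["tag", "recommendation", "music"],
    ["protein", "folding", "biology"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Documents that share rare terms score higher, while documents with no terms in common score zero.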
contact recommendations
currently recommendations based on contact network
developing version based on interests
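One simple way to recommend contacts from a contact network is to rank contacts-of-contacts by the number of mutual contacts. The slides do not say how the recommendations are computed, so this ranking rule, the data layout and the function name are all assumptions made for illustration.

```python
from collections import Counter

def recommend_contacts(network, user, top_n=3):
    """network: {user: set of contacts}. Rank non-contacts by mutual contacts."""
    direct = network.get(user, set())
    mutual = Counter()
    for contact in direct:
        for candidate in network.get(contact, set()):
            # Skip the user and people they are already connected to.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual contacts first; ties broken alphabetically for stable output.
    ranked = sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:top_n]]

network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(recommend_contacts(network, "ann"))
```

An interest-based version would replace the mutual-contact count with a similarity between the users' libraries.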
➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?
Summary
access to data
datatel data set
online catalog
online article view logs
article tags
library readership
library stars
Mendeley's API
*new* you can get all of the articles in a group - data for you to test related research algos?
Mashups with data on:
Chemical compounds
Locations
Alzheimer’s research
Grant funding
Twitter streams
want more?
let us know...
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].
But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
www.mendeley.com
we're hiring!