How to build your own Delve: combining machine learning, big data and SharePoint

#SPSBE11

Joris Poelmans

April 18th, 2015

Thanks to our sponsors!

Platinum, Gold, Silver

http://jopx.blogspot.com

Agenda

Introduction to Delve

Office Graph

Big Data and Machine Learning

Building your own Delve - architectural concept

Introduction to Delve

Delve – Search and Discovery Across O365

Powered by Office Graph

• Stay in the know: discover new information tailored to you from your network
• Find what you need: find just the right results from any source and take action
• Discover new connections: connect with the right experts and learn more about their content

Office Graph

What is The Office Graph?

User Documents People Conversations

What is The Office Graph?

Manager

Direct report

Works with

Shared with me

Viewed by me

Trending around me

Presented to me

Liked by me
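
One way to picture these relationships is as labelled edges between people and documents. The sketch below is only an illustration in Python: the `TinyGraph` class, the user names and the file names are hypothetical and are not part of the Office Graph API.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory graph: nodes are people or documents,
    edges carry one of the relationship labels listed on the slide."""

    def __init__(self):
        self._edges = defaultdict(list)  # source -> [(label, target), ...]

    def add_edge(self, source, label, target):
        self._edges[source].append((label, target))

    def neighbours(self, source, label):
        """Return all targets linked from `source` by `label`, in insertion order."""
        return [target for (edge_label, target) in self._edges[source]
                if edge_label == label]


graph = TinyGraph()
graph.add_edge("alice", "Manager", "carol")
graph.add_edge("alice", "Works with", "bob")
graph.add_edge("alice", "Viewed by me", "budget.xlsx")
graph.add_edge("alice", "Shared with me", "roadmap.pptx")

colleagues = graph.neighbours("alice", "Works with")
```

Querying by label (for example "Works with") is what lets a Delve-style view ask for "documents viewed by the people I work with" by following two edges in a row.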

Connected Enterprise

Signals sent from Delve, Exchange, O365, …

Click person

Modify/Save

Elevate

Share

Follow

Like

Comments

Email

Ignore

Presented to

Shown document

Open document

Shown board

++
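
Signals like these can be thought of as (actor, signal, target) events that are aggregated into a strength per relationship. The following Python sketch shows one possible aggregation; the weights are invented for illustration, since the slides do not say how the Office Graph actually weighs signals.

```python
from collections import Counter

# Assumed weights per signal type. These numbers are purely illustrative;
# the real weighting used by the Office Graph is not given in the slides.
SIGNAL_WEIGHTS = {
    "Shown document": 0.1,
    "Open document": 1.0,
    "Modify/Save": 2.0,
    "Like": 1.5,
    "Share": 3.0,
}


def score_signals(signals):
    """Sum weighted signals per (actor, target) pair; unknown signals add 0."""
    scores = Counter()
    for actor, signal, target in signals:
        scores[(actor, target)] += SIGNAL_WEIGHTS.get(signal, 0.0)
    return scores


signals = [
    ("alice", "Open document", "budget.xlsx"),
    ("alice", "Modify/Save", "budget.xlsx"),
    ("alice", "Shown document", "roadmap.pptx"),
    ("bob", "Share", "roadmap.pptx"),
]
scores = score_signals(signals)
```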

Content and signals across O365 auto-populating the Office Graph insights

Insights derived with machine learning for proactive and intelligent experiences

Big Data and Machine Learning

Big data is what happened when the cost of storing user data became cheaper than making the decision to throw it away

Transactions + Interactions + Observations = Big Data

[Figure: data volume (megabytes, gigabytes, terabytes, petabytes) plotted against increasing data variety and complexity]

• ERP (megabytes): purchase detail, purchase record, payment record
• CRM (gigabytes): segmentation, offer details, customer touches, support contacts
• Web (terabytes): web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
• Big Data (petabytes): user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video/audio/images, speech to text, product/service logs, social interactions & feeds, business data feeds, user click stream, sensors / RFID / devices, spatial & GPS coordinates

Big Data Core Technology landscape

• Hadoop: Hadoop/HDFS + MapReduce; key Big Data technology
• MPP: appliances (Teradata, Microsoft PDW/APS, Oracle BDA X4-2)
• NoSQL: new paradigm for storing data; 100+ non-SQL DBs and growing
• NewSQL: supports SQL querying; internal architecture different from classic DBs

Modern Data Architecture

• Apache Hadoop is an open source framework that supports data-intensive distributed applications
  – Uses HDFS storage to enable applications to work with 1000s of nodes and petabytes of data using a scale-out model
  – Uses MapReduce to process data
  – Inspired by Google MapReduce and Google File System
• Related projects: HBase, Hive, Mahout, Pig, Sqoop, Ambari, Storm, Zookeeper, ... and many more

HDFS and MapReduce in a nutshell
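
The MapReduce idea can be shown on a single machine with the classic word count. This is a plain-Python sketch of the map, shuffle and reduce phases, not Hadoop code; the function names and sample sentences are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data and SharePoint", "big data and machine learning"])
```

In Hadoop the map and reduce calls run on many nodes in parallel over blocks stored in HDFS, and the framework performs the shuffle between them; the logic per key stays this simple.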

Hadoop components

[Diagram: Hadoop components grouped into four layers: Hadoop core, data access, data integration and operations]

• Hadoop core: Distributed Storage (HDFS), Distributed Processing (MapReduce)
• Data integration: ODBC / Sqoop / REST / Flume
• Other components shown: Hive, Pig, HBase, HCatalog, Mahout, Pegasus, RHadoop, Oozie, Ambari, Zookeeper, Storm, Kafka

http://jopx.blogspot.be/2015/03/overview-of-apache-hadoop-components-in.html

Microsoft Azure HDInsight

Hadoop in Azure

• Supports HBase as NoSQL columnar database on Azure Blobs
• Supports Storm as stream processing
• Able to leverage Azure Blob Storage
• Pay per use model
• Based on Hortonworks Data Platform

[Diagram: cluster with Name Node, Job Tracker, HMaster and Coordination, plus four each of Data Node, Task Tracker and Region Server]

Hive

• Hadoop feature to perform data warehouse operations
• HiveQL
  – High-level, SQL-like language, abstraction over MapReduce
  – Supports equi-joins
  – Schema on read NOT schema on write
  – Automatically invokes MapReduce jobs
  – Much simpler than using MapReduce directly
• Metadata store
  – Contains descriptions of tables
• Acts as a bridge to many BI products which expect tabular data
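
"Schema on read" and "equi-join" can be illustrated without a cluster. The sketch below is a Python analogy, not HiveQL: raw text is stored without any schema, column names are applied only when it is read, and two such tables are joined on equal key values. All data and function names are hypothetical.

```python
import csv
import io

# Raw files as they might sit in storage: plain delimited text, no schema attached.
RAW_VIEWS = "alice,doc1\nbob,doc2\nalice,doc2\n"
RAW_DOCS = "doc1,Budget 2015\ndoc2,Roadmap\n"


def read_with_schema(raw_text, columns):
    """Schema on read: column names are applied while reading, not while storing."""
    reader = csv.reader(io.StringIO(raw_text))
    return [dict(zip(columns, row)) for row in reader]


def equi_join(left, right, key):
    """Equi-join: pair up rows whose `key` values are equal (simple hash join)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


views = read_with_schema(RAW_VIEWS, ["user", "doc_id"])
docs = read_with_schema(RAW_DOCS, ["doc_id", "title"])
joined = equi_join(views, docs, "doc_id")
```

Hive does the same kind of work declaratively: the table definition in the metadata store plays the role of `columns`, and the join is compiled into MapReduce jobs.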

Machine learning: finding the needle in the haystack

• Formal definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” - Tom M. Mitchell
• Another definition: “The goal of machine learning is to program computers to use example data or past experience to solve a given problem.” – Introduction to Machine Learning, 2nd Edition, MIT Press
• ML often involves two primary techniques:
  – Supervised Learning: Finding the mapping between inputs and outputs using correct values to “train” a model
  – Unsupervised Learning: Finding patterns in the input data (similar to Density Estimates in Statistics)
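
The difference between the two techniques shows up in what the algorithm is given. In this small Python sketch (sample numbers invented for illustration), the supervised function receives inputs together with the correct outputs and fits a line by least squares, while the unsupervised function receives only values and has to find two groups on its own. The clustering helper assumes at least two distinct values.

```python
def fit_line(xs, ys):
    """Supervised: learn y = slope * x + intercept from labelled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters without any labels.

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        left = [v for v in values if abs(v - low) <= abs(v - high)]
        right = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = sum(left) / len(left), sum(right) / len(right)
    return low, high


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centres = two_means([1, 2, 3, 10, 11, 12])
```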

• Vision analytics
• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance

Some retailers profit… by predicting major changes in your life.

Steps to build a machine learning solution

Typical machine learning algorithms

• Clustering (k-means, orthogonal partitioning, …)
• Association rule learning (Apriori)
• Regression (linear/logistic)
• Recommendation engines
• Classification (C4.5, decision trees, SVM, Naïve Bayes, AdaBoost, Random Forest, …)
• Similarity matching
• Neural networks
• Bayesian networks
• Genetic algorithms
• Ensembles

See:
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf
http://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms

Doing recommendations – some approaches

• Collaborative filtering
• Feature based recommendations
• K-nearest neighbours

Collaborative filtering

• A set of items (books, beers, blogposts, …)
• Ratings from users
• Recommended items based on your ratings and other people’s ratings
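
A minimal user-based variant of collaborative filtering can be sketched as follows. This is one common formulation, chosen here for illustration: a missing rating is predicted as the similarity-weighted average of other users' ratings, with cosine similarity computed over the items both users rated. The users, items and ratings are invented.

```python
import math

# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "ann": {"book": 5, "beer": 1, "blog": 4},
    "ben": {"book": 4, "beer": 2, "blog": 5, "film": 5},
    "cara": {"book": 1, "beer": 5, "film": 2},
}


def cosine(a, b):
    """Cosine similarity restricted to the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when nobody comparable has rated the item.
    """
    weighted, total = 0.0, 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        similarity = cosine(ratings[user], theirs)
        weighted += similarity * theirs[item]
        total += similarity
    return weighted / total if total else None


prediction = predict("ann", "film", RATINGS)
```

Because ann's ratings resemble ben's far more than cara's, the predicted rating for "film" lands closer to ben's 5 than to cara's 2.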

Feature based recommendations

• Use user’s ratings of items
  – Create an algorithm to define which features (metadata) of items the user likes
• Requires detailed information about items - content based
  – An item can be a person as well – see “People you may know”
• Most approaches combine “feature based” and “collaborative filtering”
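
The feature based idea can be sketched by turning a user's ratings into a preference score per metadata tag and then scoring unseen items by their tags. The Python below is an illustrative simplification: the tags, the documents and the choice to centre ratings on 3 are all assumptions made for the example.

```python
# Hypothetical item metadata: each document carries a set of tags.
ITEM_FEATURES = {
    "doc_a": {"finance", "excel"},
    "doc_b": {"finance", "report"},
    "doc_c": {"marketing", "video"},
    "doc_d": {"finance", "excel", "report"},
}


def build_profile(user_ratings, item_features):
    """Add each rating (centred on 3, an assumed neutral point) to the item's tags."""
    profile = {}
    for item, rating in user_ratings.items():
        for feature in item_features[item]:
            profile[feature] = profile.get(feature, 0) + (rating - 3)
    return profile


def rank_unseen(user_ratings, item_features):
    """Rank items the user has not rated by how well their tags match the profile."""
    profile = build_profile(user_ratings, item_features)

    def score(item):
        return sum(profile.get(feature, 0) for feature in item_features[item])

    unseen = [item for item in item_features if item not in user_ratings]
    return sorted(unseen, key=score, reverse=True)


ranking = rank_unseen({"doc_a": 5, "doc_c": 1}, ITEM_FEATURES)
```

Unlike collaborative filtering, nothing here depends on other users, which is why this approach needs detailed information about the items themselves.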

K-Nearest Neighbours (Classification approach)

• Find ratings from people similar to you and see what they liked
  – Use similarity functions (Minkowski distance, RMSE, Pearson Correlation Coefficient, …)
• Take the average ratings of the k people most similar to you
  – Display the items with the highest averages
• Conclusion – requires solid background in Math and Statistics
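
The steps above can be sketched in a few lines of Python, here using the Pearson correlation coefficient as the similarity function. The sample ratings and the function names are hypothetical, and the sketch skips refinements such as weighting neighbours by similarity.

```python
import math

# Hypothetical ratings; "you" has not yet rated items "d" and "e".
SAMPLE = {
    "you": {"a": 5, "b": 1, "c": 4},
    "u1": {"a": 5, "b": 2, "c": 5, "d": 5, "e": 2},
    "u2": {"a": 4, "b": 1, "c": 4, "d": 4, "e": 3},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1, "e": 5},
}


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                       * sum((y - mean_y) ** 2 for y in ys))
    return cov / spread if spread else 0.0


def knn_recommend(user, ratings, k=2, top_n=3):
    """Average the k most similar users' ratings for items `user` has not rated."""
    others = [(pearson(ratings[user], r), name)
              for name, r in ratings.items() if name != user]
    nearest = [name for _, name in sorted(others, reverse=True)[:k]]
    collected = {}
    for name in nearest:
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                collected.setdefault(item, []).append(rating)
    averages = {item: sum(v) / len(v) for item, v in collected.items()}
    # Highest average first; ties broken alphabetically for a stable order.
    return sorted(averages, key=lambda item: (-averages[item], item))[:top_n]


recommended = knn_recommend("you", SAMPLE)
```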

Machine Learning and Data Scientists

Developing predictive analytics and machine learning must be simpler; today it requires specialized skills:

• Data management
• Data exploration
• Math & statistics
• Domain expertise
• Machine learning
• Software development
• Data visualization

65% of enterprises feel they have a strategic shortage of data scientists, a role many did not know existed 12 months ago …

Microsoft Azure Machine Learning

Microsoft Azure Machine Learning (Ctd.)

• Personalized Workspace
  – Combine R modules with Microsoft’s best in class algorithms running Xbox and Bing
  – Work with anyone, anywhere by simply sharing the workspace
• Easy Access to All Data
  – Drop in desktop data sets into the built-in storage space
  – Bring in cloud data with the ease of a drop down
• Deploy Models as Web Services
  – Operationalize in minutes and refine models at the speed of the market
• Partner Tools
  – ML partners enjoy SDK access for robust solutions

Microsoft Azure Machine Learning Studio

Microsoft Azure Machine Learning API service

Microsoft Azure Machine Learning SDK

Building your own Delve - architectural concept

Building your own Delve - high level architecture

[Diagram: data pipeline, with an "On premise" label on part of it]

• Event producers: web logs, documents & metadata
• Transform
• Long-term storage: Azure SQL Database & Azure Storage
• Predictive analytics: Azure Machine Learning
• Presentation and action
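
The stages of this architecture can be mimicked end to end in a few functions. The Python below is a toy walk-through under heavy assumptions: the log format is invented, a list stands in for Azure SQL Database and Azure Storage, and a simple view count stands in for the Azure Machine Learning step.

```python
from collections import Counter


def collect_events():
    """Event producers: hypothetical web log lines (date, user, verb, path)."""
    return [
        "2015-04-18 alice GET /docs/budget.xlsx",
        "2015-04-18 bob GET /docs/budget.xlsx",
        "2015-04-18 alice GET /docs/roadmap.pptx",
        "malformed line",
    ]


def transform(raw_lines):
    """Transform: parse lines into (user, document) pairs, skipping bad rows."""
    rows = []
    for line in raw_lines:
        parts = line.split()
        if len(parts) == 4 and parts[2] == "GET":
            rows.append((parts[1], parts[3].rsplit("/", 1)[-1]))
    return rows


def store(rows, storage):
    """Long-term storage: a list stands in for the database and blob storage."""
    storage.extend(rows)
    return storage


def predict_trending(storage, top_n=1):
    """Predictive analytics placeholder: rank documents by number of views."""
    views = Counter(document for _, document in storage)
    return [document for document, _ in views.most_common(top_n)]


def present(recommendations):
    """Presentation and action: format the result for display."""
    return "Trending around you: " + ", ".join(recommendations)


storage = store(transform(collect_events()), [])
message = present(predict_trending(storage))
```

Keeping each stage behind its own function makes it possible to swap the placeholder for a real service later without touching the rest of the pipeline.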

Building your own Delve – remarks

• Graph technology left out for simplicity
  – Take a look at Neo4J or Pegasus on Hadoop if you are interested
• Not very realistic to rebuild Delve but possible to define point solutions
• If you still go ahead
  – Think about the end-to-end data pipeline
  – Fast track with Recommendation API in datamarket http://datamarket.azure.com/dataset/amla/recommendations
  – Cache recommendations for performance and cost optimization
  – Learn R or Python to extend AzureML capabilities
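
Caching recommendations can be as simple as remembering the last answer per user for a fixed time. The sketch below is one possible approach in Python; `RecommendationCache` and `fake_scoring_service` are hypothetical names, and the fake service merely stands in for a paid scoring call such as a deployed web service.

```python
import time


class RecommendationCache:
    """Keep each user's recommendations for `ttl_seconds` before refetching."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch        # function that calls the (costly) scoring service
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # user_id -> (stored_at, recommendations)

    def get(self, user_id):
        now = self._clock()
        cached = self._entries.get(user_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._fetch(user_id)
        self._entries[user_id] = (now, value)
        return value


calls = []


def fake_scoring_service(user_id):
    """Stand-in for a remote call; records each invocation."""
    calls.append(user_id)
    return [f"doc-for-{user_id}"]


cache = RecommendationCache(fake_scoring_service, ttl_seconds=60)
first = cache.get("alice")
second = cache.get("alice")  # served from the cache, no second service call
```

The time-to-live is the trade-off knob: a longer value saves more calls but shows staler recommendations.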

Online Resources

• www.coursera.org (MOOC)
• Microsoft Virtual Academy
  – http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-azure-machine-learning
  – http://www.microsoftvirtualacademy.com/training-courses/implementing-big-data-analysis
• Cloud Data Science process - http://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-how-to-create-machine-learning-service/
• Blogs
  – http://blogs.msdn.com/b/benjguin/
  – http://hortonworks.com/blog/
  – http://blogs.msdn.com/b/bigdatasupport/
  – http://blogs.msdn.com/b/big_data_france/
  – http://blogs.msdn.com/b/brian_swan/
  – http://blogs.msdn.com/b/mwinkle/
  – http://blogs.msdn.com/b/avkashchauhan/
  – http://blogs.msdn.com/b/carlnol/
  – http://blogs.technet.com/b/machinelearning/

Recommended books

Thank you!
