Develop Your Machine Learning Model
FUTURE PREDICTIONS
SAMSON LEE | DELON YAU
Microsoft Hong Kong Limited
TECH_Forum
1. Choose the Right Model for Machine Learning
2. The Data and Infrastructure Preparation
3. Developing Your Machine Learning Solutions
1. Choose the Right Machine Learning Model
SAMSON LEE
Technology Solutions Professional
Developer Experience Group
Machine Learning
"Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
A set of algorithms that learn from data to discover patterns that can be used to understand or solve the relevant problems
Requires huge computing power, hence cloud computing
Machine Learning Categories
Supervised
• Algorithms trained with data comprised of examples of the answers wanted (i.e. with expected results)
• Example: a model that identifies fraudulent credit card use would be trained on a data set of labeled data points of known fraudulent and valid charges
Unsupervised
• Algorithms try to autonomously identify patterns and rules in a given dataset
• Example: find groups of customer demographics with similar buying habits
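To make the supervised/unsupervised distinction concrete, here is a minimal sketch using scikit-learn (listed later among the core tools). The transactions, their labels, and the choice of logistic regression and k-means are illustrative assumptions, not data or models from the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical card transactions: [amount in HKD, hour of day]
X = np.array([
    [20.0, 10], [35.0, 14], [15.0, 9], [40.0, 16],     # everyday charges
    [900.0, 3], [1200.0, 2], [950.0, 4], [1100.0, 3],  # unusual charges
])

# Supervised: the expected answers are provided (1 = fraudulent, 0 = valid)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classifier = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted labels:", classifier.predict(np.array([[25.0, 11], [1000.0, 3]])))

# Unsupervised: no labels; k-means groups similar transactions on its own
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster ids:", clusterer.labels_)
```

The classifier can only learn "fraudulent vs valid" because labels were supplied; the clusterer finds the same two groups but cannot name them.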
Machine Learning Algorithms
The 5 questions data science answers:
• Is this A or B? → Classification algorithms
• Is this weird? → Anomaly detection algorithms
• How much? How many? → Regression algorithms
• How is this organized? → Clustering algorithms
• What should I do now? → Reinforcement learning algorithms
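Two of the five questions, sketched in a few lines under illustrative assumptions: a linear regression answers "How much?", and a simple z-score rule (a stand-in chosen here for brevity, not a dedicated anomaly detection algorithm) answers "Is this weird?". All numbers are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# "How much?": regression on hypothetical data (advertising spend in k HKD vs units sold)
spend = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
units = np.array([12.0, 14.0, 16.0, 18.0, 20.0])  # exactly 10 + 2 * spend
reg = LinearRegression().fit(spend, units)
forecast = reg.predict(np.array([[6.0]]))[0]
print(f"forecast for spend=6: {forecast:.1f}")

# "Is this weird?": flag values more than 2 standard deviations from the mean
amounts = np.array([20, 22, 19, 21, 20, 23, 18, 21, 20, 95], dtype=float)
z_scores = (amounts - amounts.mean()) / amounts.std()
outliers = np.where(np.abs(z_scores) > 2)[0]
print("outlier positions:", outliers)
```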
The Microsoft Cognitive Toolkit (CNTK)
• Open source
• Customize models through Python, C++ or BrainScript
• Runs on both Windows & Linux OS (Docker enabled)
Team Data Science Process
Lifecycle: Business Understanding → Data Acquisition & Understanding → Modeling → Deployment
• Data Acquisition & Understanding: Data Source, Pipeline, Data Wrangling, Analytics Environment
• Modeling: Feature Engineering, Model Fitting, Model Evaluation
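The Modeling stage (feature engineering, model fitting, model evaluation) maps naturally onto a scikit-learn pipeline. A minimal sketch, assuming the bundled Iris dataset as a stand-in for a real data source, and standard scaling plus logistic regression as arbitrary feature and model choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data acquisition: a built-in sample dataset stands in for a real data source
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Feature engineering (scaling) + model fitting, chained in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on held-out data the model never saw during fitting
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the scaler inside the pipeline means it only ever sees training data, which keeps the evaluation step honest.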
2. The Data and Infrastructure Preparation
SAMSON LEE
Technology Solutions Professional
Developer Experience Group
AZURE at a Glance
• Platform Services
• Security & Management
• Hybrid Operations
• Infrastructure Services
• Datacenter Infrastructure (34 Regions including Hong Kong)
Services shown: Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs, Content Delivery Network (CDN), Media Services, HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, BizTalk Services, Hybrid Connections, Service Bus, Storage Queues, Store/Marketplace, Backup, StorSimple, Site Recovery, Import/Export, SQL Database, DocumentDB, Redis Cache, Search, Tables, SQL Data Warehouse, Azure AD Connect Health, AD Privileged Identity Management, Operational Insights, Cloud Services, Batch, RemoteApp, Service Fabric, Visual Studio, Application Insights, Azure SDK, Team Project, VM Image Gallery & VM Depot
Modern Data Lifecycle
Ingestion → Processing → Staging → Serving
• Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
• Processing: HDInsight, Azure Data Lake Analytics, Storm, Spark, Stream Analytics
• Staging: Azure Data Lake Storage, Azure Storage, Azure SQL DB
• Serving: Azure Data Lake Storage, Azure Data Warehouse, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
• Enrichment and Curation: Azure Data Factory, Azure Machine Learning
3. Developing Your Machine Learning Solutions
DELON YAU
Technical Evangelist
Developer Experience Group
Introduction
Microsoft’s Spot Market is specifically designed to help small businesses gain exposure in the online market and to help consumers find personalised recommendations. Machine learning plays a vital role in the solution.
Nowadays, product recommendation systems often use techniques such as collaborative filtering or content-based filtering.
Why do we need a Recommendation Engine?
E-commerce has reshaped consumer-business interactions. Consumers are exposed to a wide range of choices, and a number of businesses have developed customised recommendation systems.
Spot Markets: High Street (SMHS) is a cloud-based platform designed to connect consumers with local retailers.
What is novel about it? How can we stand out?
Most recommendation systems today are designed to find closely personalised recommendations for their consumers based on consumer feedback such as ratings and comments. Examples include the systems used by Netflix, Amazon, Google News and YouTube.
However, the recommendation engine in Spot Market relies not only on various direct and indirect consumer feedback but also on a consumer profile built by retrieving and analysing third-party application data. This consumer profile serves as one of the inputs to the recommendation engine, which then generates a list of recommended items.
Goals
Understand how customers’ interests change and decay over time. In other words, products and services that a customer interacted with recently are weighted more heavily than those they interacted with long ago.
Allow customers to be notified about the most relevant products based on their social network activity on platforms such as Facebook, Twitter and Pinterest.
Learn from the user’s activity within the app. For example, if they bookmark a certain product or retailer, they might want to see more related products in the future.
Location tracking: recommend relevant shops or restaurants based on the user’s location.
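The time-decay goal and the in-app activity goal can be combined into one interest score per product. A minimal sketch, assuming exponential decay with a 30-day half-life and hypothetical event weights (a bookmark counting three times a view); neither value comes from the Spot Market design:

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 30.0  # assumption: an interaction loses half its weight every 30 days
EVENT_WEIGHTS = {"view": 1.0, "bookmark": 3.0}  # hypothetical: a bookmark signals more intent


def decayed_interest(events, now, half_life_days=HALF_LIFE_DAYS):
    """Sum exponentially time-decayed, event-weighted interest per product.

    events: iterable of (product_id, event_type, timestamp) tuples.
    """
    scores = {}
    for product_id, event_type, timestamp in events:
        age_days = (now - timestamp).total_seconds() / 86400.0
        decay = 0.5 ** (age_days / half_life_days)  # 1.0 now, 0.5 after one half-life
        scores[product_id] = scores.get(product_id, 0.0) + EVENT_WEIGHTS[event_type] * decay
    return scores


now = datetime(2017, 6, 1)
events = [
    ("running-shoes", "view", now - timedelta(days=1)),
    ("running-shoes", "bookmark", now - timedelta(days=2)),
    ("winter-coat", "bookmark", now - timedelta(days=120)),
]
scores = decayed_interest(events, now)
for product, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(product, round(score, 3))
```

A shorter half-life makes recommendations react faster to new interests, at the cost of forgetting long-standing ones sooner.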
The three families
Three families of recommender systems are considered here:
Content-based.
Collaborative filtering.
Hybrid.
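One way to see how the families differ: a content-based score compares item features with the user's profile, and a hybrid blends that with a collaborative score. A minimal sketch with a made-up catalogue, hypothetical collaborative scores and an assumed blending weight alpha=0.5; a weighted blend is only one of several hybrid strategies.

```python
import numpy as np

# Hypothetical catalogue: rows are items, columns are content features
# (sportswear, footwear, food, outdoor)
items = ["trainers", "running-cap", "sushi-set", "hiking-boots"]
features = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=float)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def content_scores(liked_idx):
    """Content-based: score every item against the mean feature vector of liked items."""
    profile = features[liked_idx].mean(axis=0)
    return np.array([cosine(profile, f) for f in features])


def hybrid_scores(content, collaborative, alpha=0.5):
    """Hybrid: weighted blend of two score vectors (alpha is an assumed tuning knob)."""
    return alpha * content + (1 - alpha) * collaborative


liked = [0]  # the user liked "trainers"
content = content_scores(liked)
collab = np.array([0.0, 0.2, 0.1, 0.9])  # hypothetical collaborative-filtering scores
ranking = np.argsort(-hybrid_scores(content, collab))
recommendations = [items[i] for i in ranking if i not in liked]
print(recommendations)
```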
Collaborative Filtering
Memory-Based Collaborative Filtering
Item-Item Collaborative Filtering: “Users who liked this item also liked ...”
User-Item Collaborative Filtering: “Users who are similar to you also liked ...”
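Both memory-based variants reduce to a similarity computation on the ratings matrix: item-item compares columns, user-item (user-based) compares rows. A minimal sketch with a hypothetical 4×4 ratings matrix, using cosine similarity as an assumed similarity measure (Pearson correlation is a common alternative):

```python
import numpy as np

# Hypothetical user x item ratings (0 = not rated)
# columns: trainers, cap, sushi, boots
R = np.array([
    [5, 4, 0, 1],  # user 0
    [4, 5, 0, 0],  # user 1
    [0, 1, 5, 4],  # user 2
    [1, 0, 4, 5],  # user 3
], dtype=float)


def cosine_similarity_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T


# Item-item: "users who liked this item also liked ..." (compare item columns)
item_sim = cosine_similarity_matrix(R.T)
# User-user: "users who are similar to you also liked ..." (compare user rows)
user_sim = cosine_similarity_matrix(R)


def most_similar(sim, idx):
    """Indices ordered from most to least similar, excluding idx itself."""
    order = np.argsort(-sim[idx])
    return [int(j) for j in order if j != idx]


print("items most like trainers:", most_similar(item_sim, 0))
print("users most like user 0:", most_similar(user_sim, 0))
```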
Model-based Collaborative Filtering
Matrix Factorisation (MF), an unsupervised learning approach for latent variable decomposition and dimensionality reduction.
K-Means: K-means clustering is a popular unsupervised algorithm in data mining that partitions an unlabelled (i.e. uncategorised) dataset into K clusters, assigning each point to the cluster with the nearest mean.
Benefits of K-means
K-means clustering has several advantages compared with other clustering methods:
Suitable for product recommendation systems: particularly useful when there is only a limited understanding of how the data is structured, for example when the consumer-product matrix contains many different trends.
Easy operation: K-means takes only the data and the value of K, i.e. the number of clusters to create.
Fast operation: the cost of each iteration grows linearly with the number of data points.
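A minimal sketch of how K-means could segment consumers from a consumer-product matrix, using scikit-learn's KMeans. The purchase counts, K=2 and the "recommend the segment's top category" rule are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical consumer x product-category purchase counts
# columns: sportswear, footwear, groceries, restaurants
purchases = np.array([
    [9, 7, 1, 0],
    [8, 9, 0, 1],
    [10, 8, 1, 1],
    [0, 1, 9, 8],
    [1, 0, 8, 9],
    [0, 0, 10, 7],
], dtype=float)

K = 2  # the only tuning input besides the data: the number of clusters
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(purchases)

# Recommend within a segment: the category the consumer's cluster buys most
for consumer, label in enumerate(kmeans.labels_):
    centroid = kmeans.cluster_centers_[label]
    top_category = int(np.argmax(centroid))
    print(f"consumer {consumer}: segment {label}, top category index {top_category}")
```

A new consumer can be assigned to an existing segment with `kmeans.predict`, e.g. `kmeans.predict(np.array([[7, 6, 0, 0]]))`, without refitting the model.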
Other Cool Features
Location Prediction: A typical situation that uses the Location Analyser is a jogger who runs the same path every Sunday and has liked a specific brand of athletic shoes on their Facebook account. If a shop along the jogger's path puts out an offer on trainers, the Recommendation Engine will suggest that offer to the jogger before they go out for their weekly exercise.
Who the user is with: Another interesting suggestion from the client is that the future system should be able to recognize not only where the user is and which retailers are nearby, but also who the user is with. For example, if the user is with his wife, a nearby restaurant serving a cuisine they both like would be recommended.
Calendar Events
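The jogger scenario boils down to a geometric check: is there a matching offer within some distance of the usual route? A minimal sketch, assuming the route is a list of GPS points, a 300 m radius and a haversine great-circle distance; the shops, coordinates and offer tags are hypothetical, and this is not the actual Location Analyser.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def offers_near_route(route, shops, liked_tags, radius_km=0.3):
    """Names of shops within radius_km of any route point whose offer matches a liked tag."""
    matches = []
    for shop in shops:
        if shop["offer_tag"] not in liked_tags:
            continue
        if any(haversine_km(lat, lon, shop["lat"], shop["lon"]) <= radius_km for lat, lon in route):
            matches.append(shop["name"])
    return matches


# Hypothetical Sunday jogging route (a few GPS points) and shop offers
route = [(22.2800, 114.1600), (22.2820, 114.1650), (22.2840, 114.1700)]
shops = [
    {"name": "Shop A", "lat": 22.2822, "lon": 114.1652, "offer_tag": "trainers"},
    {"name": "Shop B", "lat": 22.3000, "lon": 114.1700, "offer_tag": "trainers"},
    {"name": "Shop C", "lat": 22.2841, "lon": 114.1701, "offer_tag": "handbags"},
]
print(offers_near_route(route, shops, liked_tags={"trainers"}))
```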
Core Software and Tools
Azure Machine Learning Studio
Python
NumPy
The SciPy Library
Matplotlib
Pandas
SymPy
IPython: a kernel for Jupyter
Scikit-learn Library
One of the most important libraries in this project. It provides open-source machine learning and data mining algorithms, including regression, classification, k-means and spectral clustering, and support vector machines. They are designed to work with the scientific libraries listed above, such as NumPy and SciPy.
A heavily used module throughout this project, sklearn.decomposition, includes popular matrix decomposition algorithms such as Sparse Principal Component Analysis (SPCA), Non-negative Matrix Factorisation (NMF) and Independent Component Analysis (ICA).
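A short sketch of the sklearn.decomposition classes named above. The interaction matrix is random Poisson data (an assumption for illustration) and n_components=3 is arbitrary. For recommendations, `W @ H` gives a dense score for every consumer-product pair; this simple version treats unobserved cells as zeros, which is a simplification.

```python
import numpy as np
from sklearn.decomposition import NMF, FastICA, SparsePCA

rng = np.random.RandomState(0)
# Hypothetical non-negative consumer x product interaction matrix (20 consumers, 8 products)
X = rng.poisson(lam=2.0, size=(20, 8)).astype(float)

# NMF: X ~= W @ H with W, H >= 0; rows of W are consumer taste vectors,
# rows of H are latent product "themes"
nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_
X_hat = W @ H
print("NMF relative error:", round(float(np.linalg.norm(X - X_hat) / np.linalg.norm(X)), 3))

# SparsePCA and FastICA expose the same fit_transform interface
spca_codes = SparsePCA(n_components=3, random_state=0).fit_transform(X)
ica_sources = FastICA(n_components=3, random_state=0).fit_transform(X)
print(W.shape, H.shape, spca_codes.shape, ica_sources.shape)
```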