Can AI tell the difference between fried chicken and a labradoodle? Can Machine Learning help with GDPR? Yes: quickly, accurately and easily!
David Spurway, IBM Power Systems Product Manager, UKI
3 © IBM Corporation, 2016
Agenda
• How PowerAI can help with GDPR
• AI anyone? We have the best tools for that job!
• Introduction to Machine Learning
• How PowerAI makes Deep Learning easier, with demo
• How PowerAI Vision makes working with datasets much easier, with demo
Example GDPR Case Study
Elinar Oy Ltd
“We specialize in ECM [Enterprise Content
Management] solutions, and recognized that AI
could be key to tackling the huge amounts of
unstructured data that companies must contend
with every day. For business processes such as
sales order processing, invoice automation and
GDPR [European Union General Data
Protection Regulation] discovery, we could
teach AI solutions to replace the human element,
saving time, eliminating effort and uncovering
insights that are far beyond current biological
and technological capabilities. By becoming one
of the first in the AI space, we could seize first-
mover advantage ahead of our competitors.”
Ari Juntunen, CTO at Elinar Oy Ltd
http://ecc.ibm.com/case-study/us-en/ECCF-POC03326USEN
Built on IBM Power Systems
“We are also working on a new solution
that we call the Elinar GDPR AI Miner.
Built on the IBM Power Systems and
PowerAI platform combined with IBM
BigInsights Text Analytics, it will use our
unique AI capabilities to enable customers
to mine huge amounts of GDPR data.
Specifically, we will offer AI models for
GDPR consent identification and data
identification and extraction, which will
help users to achieve compliance at lower
cost and higher quality. These are just two
of countless potential applications.”
[Diagram: Elinar GDPR AI Miner, Rev. 2]
PHASE 1: Training mode
• Training module (taught by the customer) and pre-made AI for FI, SE, NO (NY), NO (BN), DK, UK, ES, with incremental learning
• IBM Watson Document Conversion and Natural Language Understanding
• AI (Caffe2) decides "Is GDPR?": YES or NO, with a feedback loop
PHASE 2: Configuration
• Annotation: NLU (Watson) / AQL (BigInsights)
• Extractor AI, AI grouper, rules, exporter
• Example groupings: name, date of birth, address; name, address; social security number, customer number
PHASE 3
• Rules, defined by the client: when a document has GDPR data, is AI needed?
• Validation API: customer teaching, time delay
• Level 1, Level 2, Level 3
• Natural Language Classifier: the customer defines whether data is GDPR or not
My Career History
Date range Role
July 1995 – December 1996 RS/6000 Support Specialist
January 1997 – October 1998 UK HACMP Technical Advisor
October 1998 – December 1999 RS/6000 Presales Technical Specialist
January 2000 – September 2004 RS/6000 Technical Specialist
September 2004 – March 2006 STG Product Services Team Leader – pSeries
March 2006 – December 2006 System p Technical Services Manager
January 2007 – December 2007 System z Technical Sales Specialist
January 2008 – October 2008 System p & System z Presales Technical Specialist
October 2008 – December 2009 Logicalis Senior Enterprise Technical Specialist
January 2010 – May 2011 Logicalis Technical Architect
May 2011 – November 2012 IBM UK STG Systems Architect, FSS
November 2012 – Present IBM Power Systems Product Manager
Now and going forward
November 2012 – Present: IBM Power Systems Product Manager
https://www.linkedin.com/pulse/how-ibm-power-systems-like-batman-david-spurway?articleId=8112406449121429124
Who is the boss?
Robert Picciano
SVP IBM Cognitive Solutions
IBM Systems
Stefanie Chiras
Vice President, Power Systems Hardware
Offerings
Augmented intelligence, Artificial Intelligence, Cognitive
driving innovation Faster
My friends at Uni…
Hello, Machine Learning - MNIST
GPUs are like minions
The individual cores in a GPU are not very powerful
But gather loads together, and remarkable things can happen!
https://devblogs.nvidia.com/parallelforall/inside-pascal/
[Figure: from raw data to decisions. Stages: OBSERVATION, INTERPRETATION, EVALUATION, DECISION]
• Descriptive: What has happened?
• Predictive: What could happen?
• Prescriptive: Best outcomes?
• Cognitive: Learn dynamically
[Figure: from DATA to ACTION. Stages: OBSERVATION, INTERPRETATION, EVALUATION, DECISION]
• Descriptive (What has happened?): How many frauds during the last month? Per country?
• Predictive (What could happen?): Which transactions will be fraudulent?
• Prescriptive (Best outcomes?): What is the best action in light of potential fraud?
• Cognitive (Learn dynamically): In natural language: "Explain to me why this transaction is fraudulent."
Prepare the data
ALL DATA
Input VAR
Training Data
Test Data
Machine Learning Algorithms
Predictive Model PREDICTION ? Test Data
ACCURACY ?
Machine Learning Algorithms use training data to create a
predictive model: its accuracy is tested on holdback data
Machine Learning
Frameworks: TensorFlow, Caffe, Torch, Theano, Chainer, Spark ML, …
Iterative development: Prepare Data → Build/Train Model → Deploy/Score → Monitor/Refine
Use Cases / Industries
• Recognising patterns: face detection, spoken words. Transportation, security
• Extracting insight: from text/video, avoiding spam. Healthcare improved diag
• Discovering anomalies: financial frauds, sensor readings. Manufacturing optimisation
• Making predictions: stock trade, client behavior. Retail shopping / promotion
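The "training data, test data, accuracy" flow on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, and `holdout_split`, `train_threshold` and `accuracy` are hypothetical helper names, not part of PowerAI or of any framework listed above.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction of them as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)


def train_threshold(train):
    """'Train' a one-parameter model: the midpoint between the two class means."""
    class0 = [x for x, y in train if y == 0]
    class1 = [x for x, y in train if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def accuracy(threshold, test):
    """Fraction of held-back rows whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if (1 if x > threshold else 0) == y)
    return correct / len(test)


# Synthetic, well-separated data: label 0 for small values, label 1 for large ones.
all_data = [(x, 0) for x in range(20)] + [(x, 1) for x in range(40, 60)]

train, test = holdout_split(all_data)
model = train_threshold(train)
print(len(train), len(test), accuracy(model, test))
```

The point is only that the accuracy figure is computed on rows the model never saw during training.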
• Enterprise-ready software distribution built on open source
• Tools for ease of development
• Performance: faster training times for data scientists
Had a drive, had an idea…
@PositivelyPOWER
Other People’s work involving dogs
https://openpowerfoundation.org/blogs/deep-learning-goes-to-the-dogs/
A server and a friend
The real work begins
• Started here: https://openpowerfoundation.org/blogs/deep-learning-goes-to-the-dogs/
• “Doggy Docker for Deep Learning”
–“We put our Caffe model and our classification code…”
• “Requirements”
–“Install the caffe framework following the instructions here,…”
• Prerequisites
–Caffe has several dependencies:
–CUDA is required for GPU mode. Library version 7+ and the latest driver version are recommended, but 6.* is fine too; 5.5 and 5.0 are compatible but considered legacy
–BLAS via ATLAS, MKL, or OpenBLAS
–Boost >= 1.55
–protobuf, glog, gflags, hdf5
Installing CUDA
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=ppc64le&target_distro=Ubuntu&target_version=1604
Installation Instructions:
• `sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_ppc64el.deb`
• `sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub`
• `sudo apt-get update`
• `sudo apt-get install cuda`
Installing BLAS
BLAS: install ATLAS with `sudo apt-get install libatlas-base-dev`, or
install OpenBLAS with `sudo apt-get install libopenblas-dev`
Installing Boost
5.1 Easy Build and Install
Issue the following commands in the shell (don't type $; that represents the shell's
prompt):
$ cd path/to/boost_1_61_0
$ ./bootstrap.sh --help
Select your configuration options and invoke ./bootstrap.sh again without the --help
option. Unless you have write permission in your system's /usr/local/ directory, you'll
probably want to at least use
$ ./bootstrap.sh --prefix=path/to/installation/prefix
to install somewhere else. Also, consider using the --show-libraries and
--with-libraries=library-name-list options to limit the long wait you'll experience if you build
everything. Finally,
$ ./b2 install
will leave Boost binaries in the lib/ subdirectory of your installation prefix. You will
also find a copy of the Boost headers in the include/ subdirectory of the installation
prefix, so you can henceforth use that directory as an #include path in place of the
Boost root directory.
Installing Caffe
• Compilation with Make
–Configure the build by copying and modifying the example Makefile.config for your setup. The defaults should work, but uncomment the relevant lines if using Anaconda Python.
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all
make test
make runtest
–For CPU & GPU accelerated Caffe, no changes are needed.
Wait a minute – is there not an easier way?!
Use PowerAI instead!
https://www.ibm.com/uk-en/marketplace/deep-learning-platform
Step by step install guide
Or, follow this demo from Chris Parsons (takes 4 mins!)
What is PowerAI?
                 PowerAI R3          PowerAI R4
OS               Ubuntu 16.04        Ubuntu 16.04
CUDA             8.0                 8
cuDNN            5.1                 6
Built w/ MASS    Yes                 Yes
OpenBLAS         0.2.19              0.2.19
Caffe            1.0 rc5             1.0 rc5
NVIDIA Caffe     0.14.5 + 0.15.14    0.15.14
IBM Caffe        1.0 rc3             1.0 rc3
Chainer          1.20.1              1.23
NVIDIA DIGITS    5                   5
Torch            7                   7
Theano           0.9                 0.9
TensorFlow       1.0.0 + 0.12        1.1.0
GPU              4 x P100            4 x P100
Base System      S822LC/HPC          S822LC/HPC
Steps involved
1. Download & install NVIDIA CUDA
2. Download & install NVIDIA cuDNN 6.0 for CUDA 8 POWER8 Deb packages
3. Install Deep Learning frameworks
–All, if you like
–Or separately
4. Go!
On to the training…
Example of how to train the dog breed model based on the Stanford Dogs dataset
Please make sure your Caffe framework is installed, and that the path of caffe/.build_release/tools has been added to the system PATH.
1. Download the Stanford Dogs dataset
Download the images, annotations and lists from the Stanford Dogs dataset and extract them to data/stanford-dogs
2. Parse the dataset to generate the train and test datasets
cd data/stanford-dogs && python dog_parse.py
3. Create the LMDB data for training
cd data/stanford-dogs && ./create_image.sh
4. Train the model based on GoogleNet
Download the GoogleNet model to models/bvlc_googlenet/
Run the following command to start the training.
cd models/bvlc_googlenet/ && train.sh
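Before starting, it can help to confirm that the Caffe tools really are on the PATH, as the instructions above require. The following is a minimal sketch using only the Python standard library; `tool_on_path` is a hypothetical helper name, and the assumption is that the build placed the `caffe` and `convert_imageset` binaries in the tools directory that was added to PATH.

```python
import shutil


def tool_on_path(name):
    """Return True if an executable called `name` can be found on the PATH."""
    return shutil.which(name) is not None


# Assumed tool names: the binaries a Caffe build normally places in its tools directory.
for tool in ("caffe", "convert_imageset"):
    status = "found" if tool_on_path(tool) else "missing: check your PATH"
    print(tool, status)
```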
A problem with my idea…
Stanford Dogs Dataset
Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, Li Fei-Fei
Stanford University
The Stanford Dogs dataset contains images of 120 breeds of dogs from
around the world. This dataset has been built using images and annotation
from ImageNet for the task of fine-grained image categorization. Contents
of this dataset:
•Number of categories: 120
•Number of images: 20,580
•Annotations: Class labels, Bounding boxes
Examples of the images (Bernese Mountain Dog)
The GPUs we began with…
The results with the S824L
Deep Learning Goes to the Dogs
• https://openpowerfoundation.org/blogs/deep-learning-goes-to-the-dogs/
• For IBMers with permission:
–https://9.196.133.13/netaccess/loginuser.html
–Dogs Classification Demos
• http://vision.stanford.edu/aditya86/ImageNetDogs/
• The Stanford Dogs dataset contains images of 120 breeds of dogs from
around the world. This dataset has been built using images and annotation
from ImageNet for the task of fine-grained image categorization.
But then, along came a Minsky!
Using some real GPUs, quite hard!
From 24 hours to 2 hours 38 minutes…
Faster and more accurate
IBM Power S824L
24 hours
Accuracy =
IBM Power S822LC for HPC (Minsky)
2 hours 38 minutes
Accuracy =
0.7% more accurate
Accelerating training: days become hours
[Figure: a training run covering recognition, shape, attenuation and boundary drops from 9 days to 4 hours, so many 4-hour runs fit into the same time]
54x learning runs with POWER8
What will you do?
Iterate more and create more accurate models?
Create more models?
Both?
100x learning runs with POWER9
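The arithmetic behind these figures is easy to check. The sketch below uses only the times quoted on the slides (24 hours to 2 hours 38 minutes, and 9 days to 4 hours); `speedup` is a hypothetical helper name.

```python
def speedup(before_hours, after_hours):
    """How many times faster the second run is than the first."""
    return before_hours / after_hours


minsky = speedup(24, 2 + 38 / 60)  # S824L: 24 hours; Minsky: 2 hours 38 minutes
power8 = speedup(9 * 24, 4)        # 9 days down to 4 hours

print(round(minsky, 1))  # roughly a ninefold improvement
print(power8)            # matches the 54x quoted for POWER8
```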
Power Systems and NVIDIA GPU Roadmap
2015: Power chip and NVIDIA GPU, connected by PCIe x16 at 32 GB/s
2016: Power chip with NVLink and NVIDIA GPU with NVLink, at 80 GB/s peak*
Just to mention, in Azure…
          NC6             NC12            NC24            NC24r
Cores     6               12              24              24
GPU       1 x K80 GPU     2 x K80 GPUs    4 x K80 GPUs    4 x K80 GPUs
Memory    56 GB           112 GB          224 GB          224 GB
Disk      380 GB SSD      680 GB SSD      1.44 TB SSD     1.44 TB SSD
Network   Azure Network   Azure Network   Azure Network   InfiniBand & Azure Network
https://azure.microsoft.com/en-us/blog/azure-n-series-general-availability-on-december-1/
A bit of a fail…
Answering questions like this…
https://twitter.com/drjuliashaw/status/874293864814845952
How do I get data into PowerAI?
Preparing the data…
redbooks@pts153:~/pet-breed/data/stanford-dogs$ pg dog_parse.py
# Crop each Stanford Dogs image to its bounding box and write the
# train.txt, val.txt and synset_words.txt lists used for training.
import os
import xml.etree.ElementTree as ET

import cv2
import scipy.io

dog_images = './Images'
dog_annotation = './Annotation'
dog_train_annos_mat = './train_list.mat'
dog_test_annos_mat = './test_list.mat'
output_dir = './dog_detected'
output_txt_dir = '.'


def dog_annotation_processing(dog_annos_mat):
    """Crop the images listed in a .mat file; return (file list, label names)."""
    mat = scipy.io.loadmat(dog_annos_mat)
    file_list = mat['file_list']
    annotation_list = mat['annotation_list']
    labels = mat['labels']
    file_str = ''
    labels_str = ''
    for i, filename_item in enumerate(file_list):
        filename = filename_item[0][0]
        annotation_filename = annotation_list[i][0][0]
        label = labels[i][0]
        # Read the bounding box of the first object in the annotation XML.
        root = ET.parse(dog_annotation + '/' + annotation_filename)
        bndbox = root.find('object').find('bndbox')
        bb = [int(bndbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        print(filename, bb, label)
        img = cv2.imread(dog_images + '/' + filename)
        outBgr = img[bb[1]:bb[3], bb[0]:bb[2]]  # rows are y, columns are x
        category = filename.split('/')[0]
        if not os.path.exists(output_dir + '/' + category):
            os.makedirs(output_dir + '/' + category)
            # First image of a new category: record "<synset id> <breed name>".
            # Split only once, because some breed names contain a hyphen.
            category_split = category.split('-', 1)
            labels_str += category_split[0] + ' ' + category_split[1] + '\n'
        output_path = os.path.join(output_dir, filename)
        cv2.imwrite(output_path, outBgr)
        # Labels in the .mat file start at 1; write them starting at 0.
        file_str += filename + ' ' + str(int(label) - 1) + '\n'
    return file_str, labels_str


os.makedirs(output_dir, exist_ok=True)

train_str, labels_words = dog_annotation_processing(dog_train_annos_mat)
with open(output_txt_dir + '/train.txt', 'w') as fp:
    fp.write(train_str)
with open(output_txt_dir + '/synset_words.txt', 'w') as fp:
    fp.write(labels_words)

val_str, labels_words = dog_annotation_processing(dog_test_annos_mat)
with open(output_txt_dir + '/val.txt', 'w') as fp:
    fp.write(val_str)
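The cropping step in dog_parse.py slices rows by y and columns by x, which is easy to get the wrong way round. Here is a small sketch of the same indexing in plain Python, with a nested list standing in for the image array (an assumption made so the example runs without OpenCV); the bounding box values are the ones from the Bernese mountain dog annotation, and `crop` is a hypothetical helper name.

```python
def crop(image, bb):
    """Crop a row-major image to bb = [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bb
    # Rows are selected by y first, then each row is cut by x.
    return [row[xmin:xmax] for row in image[ymin:ymax]]


# A stand-in 500 x 375 "image" whose pixels remember their own (x, y) position.
image = [[(x, y) for x in range(500)] for y in range(375)]

patch = crop(image, [115, 119, 248, 246])
print(len(patch), len(patch[0]))  # height and width of the crop
print(patch[0][0])                # top-left pixel of the crop
```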
Annotation files…
redbooks@pts153:~/pet-breed/data/stanford-dogs/Annotation/n02107683-Bernese_mountain_dog$ cat n02107683_1003
<annotation>
  <folder>02107683</folder>
  <filename>n02107683_1003</filename>
  <source>
    <database>ImageNet database</database>
  </source>
  <size>
    <width>500</width>
    <height>375</height>
    <depth>3</depth>
  </size>
  <segment>0</segment>
  <object>
    <name>Bernese_mountain_dog</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>115</xmin>
      <ymin>119</ymin>
      <xmax>248</xmax>
      <ymax>246</ymax>
    </bndbox>
  </object>
</annotation>
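An annotation file like this one can be read with Python's standard `xml.etree.ElementTree` module. The sketch below parses a shortened copy of the annotation held in a string (an assumption, so the example needs no files on disk); `read_bounding_box` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the annotation above, kept in a string for illustration.
ANNOTATION = """
<annotation>
  <filename>n02107683_1003</filename>
  <object>
    <name>Bernese_mountain_dog</name>
    <bndbox>
      <xmin>115</xmin>
      <ymin>119</ymin>
      <xmax>248</xmax>
      <ymax>246</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_bounding_box(xml_text):
    """Return (breed name, (xmin, ymin, xmax, ymax)) for the first object."""
    root = ET.fromstring(xml_text)
    obj = root.find("object")
    box = obj.find("bndbox")
    corners = tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
    return obj.find("name").text, corners


name, bbox = read_bounding_box(ANNOTATION)
print(name, bbox)
```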
What format does my data need to be in for Machine
Learning?
• Labeled data
–Metadata (this is a dog, cat, etc.)
–Sub folders named appropriately (/dog /cat)
–CSV (prevalent)
• TensorFlow
–TFRecords
• Caffe
–Blobs
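Two of the labelled-data conventions above, sub folders named after the class and a CSV listing, can be converted into one another with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the folder layout and file names are made up for the example, and `label_rows` is a hypothetical helper name.

```python
import csv
import io
import os
import tempfile


def label_rows(root):
    """List (relative path, label) pairs, taking the label from the sub folder name."""
    rows = []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if os.path.isdir(folder):
            for name in sorted(os.listdir(folder)):
                rows.append((os.path.join(label, name), label))
    return rows


# Build a tiny throwaway dataset: /dog and /cat sub folders with empty files.
with tempfile.TemporaryDirectory() as root:
    for label, name in [("dog", "a.jpg"), ("dog", "b.jpg"), ("cat", "c.jpg")]:
        os.makedirs(os.path.join(root, label), exist_ok=True)
        open(os.path.join(root, label, name), "w").close()
    rows = label_rows(root)

# Write the same information as CSV text.
buffer = io.StringIO()
csv.writer(buffer).writerows([("path", "label")] + rows)
print(buffer.getvalue())
```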
Transforming Data
Transforming Data
Transforming Data
AI Vision toolset
• Building image-based neural nets goes from an experts-only task to one that requires only beginner knowledge
• Tooling lets non-techies label data, which brings expertise from the LOB to the algorithm and mitigates errors
• Choose the best model and framework to apply based on the data set
Tools for Ease of Development
First try, using SuperVessel
• http://ny1.ptopenlab.com:443/AIVision/index.html
Begins well…
But this does not look quite so good…
Back to using a Minsky in Southbank…
“Artificial intelligence struggles to tell difference between fried
chicken and Labradoodles.”
AI Vision demo: Enable deep learning capability from data center
to car
Questions?
David Spurway – IBM Power Systems Product Manager
Email: [email protected]
Phone: 07717 892 896
Twitter, LinkedIn, YouTube
What we have gone through
• How PowerAI can help with GDPR
• AI anyone? We have the best tools for that job!
• Introduction to Machine Learning
• How PowerAI makes Deep Learning easier, with demo
• How PowerAI Vision makes working with datasets much easier, with demo
Thank you!
David Spurway – IBM Power Systems Product Manager
Email: [email protected]
Phone: 07717 892 896
Twitter, LinkedIn, YouTube