RM World 2014: Rapidminerresources.com

Post on 29-Nov-2014

Page 1: RM World 2014: Rapidminerresources.com

RapidMinerResources & Book presentation

Andrew Chisholm

Page 2: RM World 2014: Rapidminerresources.com

TOPICS

• About us – Andrew Chisholm

• Book - Exploring Data with RapidMiner

• About us – Dr Markus Hofmann

• RapidMiner Resources background

• RapidMiner Resources videos now

• Future plans

• A mini survey to help me focus

Page 3: RM World 2014: Rapidminerresources.com

About us – Andrew Chisholm

• By day
  • Product manager for an active test and measurement product used extensively in the telecoms world

• By night
  • Crime fighting super hero
  • Data mining hobbyist

• Recent Master's degree in Data Mining and Business Intelligence

• Certified RapidMiner Master (#007)

• RapidMiner blog at http://rapidminernotes.blogspot.com

• Author of “Exploring Data with RapidMiner”

Page 4: RM World 2014: Rapidminerresources.com

Exploring data with RapidMiner

• 90% [1] of data mining is
  • Cleaning
  • Reformatting
  • Summarizing
  • Understanding

• …Exploratory Data Analysis…

• RapidMiner is good at helping with this

• …so I decided to write a book

• Practical examples within a process context

[1] Ingo Mierswa – verbal communication RapidMiner World Conference 20/8/14 09:43

Page 5: RM World 2014: Rapidminerresources.com

About us – Dr Markus Hofmann

• PhD from Trinity College Dublin

• Lecturer in Informatics at the Institute of Technology, Blanchardstown

• Editor, with Ralf Klinkenberg, of “RapidMiner: Data Mining Use Cases and Business Analytics Applications”

• Editor for an upcoming text mining book

• Extensive knowledge in the data mining domain

Page 6: RM World 2014: Rapidminerresources.com

Background

• RapidMiner is a truly powerful product

• The visual method of creating processes means it is more accessible to visual learners

• There is a learning curve, and videos are the right way to help with it because they match the visual method of creating processes

• RapidMiner videos initially, with other collaborators in the future

• We do charge to cover costs and hopefully make some beer money

Page 7: RM World 2014: Rapidminerresources.com

RapidMiner Resources - now

• http://rapidminerresources.com

• Approximately 60 videos, ~15 minutes each

• ~15 hours total length

• Organised as “basic”, “advanced” and “RapidMiner Server”

Page 8: RM World 2014: Rapidminerresources.com

Basic idea

• Most videos focus on one operator and show it being used with a mini case study

• Additional context and operators are required to help explanations

• Processes and data accompany the videos so users can “sing along”

• Tips and tricks as well as gotchas pop up from time to time

• More advanced videos tend to focus on broader concepts
  • “taming messy data”
  • “regular expressions”
  • “macros”
  • “dates”

• The idea is that they can be used to help learn initially and act as a refresher later

Page 9: RM World 2014: Rapidminerresources.com

[Figure: chart; the surviving labels are “Operators” and “Videos”]

Page 10: RM World 2014: Rapidminerresources.com

Juicy images

Page 11: RM World 2014: Rapidminerresources.com

Page 12: RM World 2014: Rapidminerresources.com

The future

• There is so much we could do

• 5 candidate areas
  • Groovy Dark Arts
  • Text Mining
  • Web Mining
  • RapidMiner Server
  • Time Series in more detail

• The challenge: what is the priority?

Page 13: RM World 2014: Rapidminerresources.com

A mini survey

• Pretend you have $100k to spend

• I’m going to give you a link to a survey which will ask you to spend your money across different choices

• You can put all the money on one choice or spread it out across all of them

• Hopefully we will get an interesting result

• It will help us to decide what to focus on

http://goo.gl/vLgy96

Page 14: RM World 2014: Rapidminerresources.com

What the survey will look like

• Simply enter money in the 6 boxes (it has to add up to 100)

• Optionally give more detail and your name (don’t worry, no salesman will call)
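As a rough illustration of how responses to this kind of "spend your budget" survey could be checked and totalled, here is a minimal Python sketch. It is not the actual survey tooling: the function names and the sample responses are made up, and because the slides name only five areas, the sixth box is left out.

```python
from collections import defaultdict

# Area names from the "5 candidate areas" slide; the sixth survey box is
# not identified in the slides, so it is not modelled here.
AREAS = [
    "Groovy Dark Arts",
    "Text Mining",
    "Web Mining",
    "RapidMiner Server",
    "Time Series in more detail",
]
BUDGET = 100  # the slide says the boxes have to add up to 100


def is_valid(response):
    """Return True if every area is known, no amount is negative,
    and the amounts add up to BUDGET."""
    return (
        set(response) <= set(AREAS)
        and all(amount >= 0 for amount in response.values())
        and sum(response.values()) == BUDGET
    )


def total_by_area(responses):
    """Add up the money placed on each area across all valid responses."""
    totals = defaultdict(int)
    for response in responses:
        if not is_valid(response):
            continue  # skip forms that do not add up
        for area, amount in response.items():
            totals[area] += amount
    return dict(totals)


# Made-up responses, for illustration only.
responses = [
    {"Text Mining": 100},
    {"Groovy Dark Arts": 50, "Web Mining": 50},
    {"Time Series in more detail": 70, "Text Mining": 20},  # adds up to 90, rejected
]
totals = total_by_area(responses)
ranking = sorted(totals, key=totals.get, reverse=True)
```

Putting all the money on one choice or spreading it out are both valid, as long as the total is 100.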

Page 15: RM World 2014: Rapidminerresources.com

Groovy Dark Arts

• Reading from databases

• Getting details from models – for example SVD eigenvectors as example sets

• Multiple inputs and multiple outputs

• Regular expressions for parsing data

• Reading data files efficiently

• Checking example sets to assert correctness
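To give a flavour of two of these topics (regular expressions for parsing data, and checking an example set for correctness), here is a small sketch. It is written in Python rather than Groovy and does not use any RapidMiner API. The timestamp format follows the "20/8/14 09:43" footnote earlier in the deck; the function names and the list-of-dicts stand-in for an example set are assumptions made for illustration.

```python
import re

# Day/month/two-digit-year followed by hours:minutes, e.g. "20/8/14 09:43".
TIMESTAMP = re.compile(
    r"(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2})"
)


def parse_rows(lines):
    """Turn matching lines into rows (dicts of ints); skip the rest."""
    rows = []
    for line in lines:
        match = TIMESTAMP.fullmatch(line.strip())
        if match:
            rows.append({name: int(value) for name, value in match.groupdict().items()})
    return rows


def check_rows(rows, expected_columns):
    """Raise AssertionError if there are no rows or a row has the wrong columns."""
    assert rows, "example set is empty"
    for index, row in enumerate(rows):
        assert set(row) == set(expected_columns), f"row {index} has columns {sorted(row)}"
    return True


rows = parse_rows(["20/8/14 09:43", "not a timestamp", "1/12/14 18:05"])
check_rows(rows, ["day", "month", "year", "hour", "minute"])
```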

Page 16: RM World 2014: Rapidminerresources.com

Text mining

• Word vectors

• Pruning

• Filtering

• Meta data

• Word lists

• Windowing documents
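For readers new to these terms, the sketch below shows one common reading of word vectors, filtering, pruning and word lists, using only the Python standard library. It is not how RapidMiner's text operators are implemented; the function names, the stopword set and the sample documents are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_vectors(documents, min_docs=2, stopwords=frozenset()):
    """Return (word_list, vectors) holding term counts per document.

    Filtering drops stopwords; pruning drops words that occur in
    fewer than min_docs documents.
    """
    tokenized = [[t for t in tokenize(d) if t not in stopwords] for d in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    word_list = sorted(w for w, n in doc_freq.items() if n >= min_docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in word_list])
    return word_list, vectors


docs = [
    "RapidMiner is good at exploring data",
    "Exploring messy data takes time",
    "Videos help with the learning curve",
]
word_list, vectors = build_word_vectors(
    docs, min_docs=2, stopwords={"is", "at", "the", "with"}
)
```

With these three documents only "data" and "exploring" survive pruning, so each document becomes a two-element vector.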

Page 17: RM World 2014: Rapidminerresources.com

Web mining

• Browsing and crawling

• XPath

• Enrichment from external sources

• JSON

• XML
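As a small taste of the XPath, XML and JSON topics, the sketch below pulls values out of an XML snippet with an XPath-style query and out of a JSON snippet with a dictionary lookup. It uses the Python standard library, not RapidMiner's web mining operators, and the sample data (titles, levels, minutes) is invented for illustration. Note that `xml.etree.ElementTree` supports only a subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample data, for illustration only.
xml_text = """
<videos>
  <video level="basic"><title>Filter Examples</title></video>
  <video level="advanced"><title>Macros</title></video>
  <video level="basic"><title>Join</title></video>
</videos>
""".strip()
json_text = '{"videos": [{"title": "Macros", "minutes": 15}]}'

# XPath-style query: titles of all <video> elements whose level is "basic".
root = ET.fromstring(xml_text)
basic_titles = [t.text for t in root.findall("./video[@level='basic']/title")]

# JSON: map each title to its duration.
minutes_by_title = {v["title"]: v["minutes"] for v in json.loads(json_text)["videos"]}
```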

Page 18: RM World 2014: Rapidminerresources.com

RapidMiner Server

• Installing

• Passing parameters

• Creating services

• Reports

• Schedules

Page 19: RM World 2014: Rapidminerresources.com

Time Series in more detail

• Extracting features from series

• Windowing

• Fourier analysis

• Wavelet transforms
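To make windowing and Fourier analysis a little more concrete, here is a minimal sketch that cuts a series into sliding windows and extracts one feature per window: the frequency bin with the largest magnitude. It is a plain Python illustration under simple assumptions (a naive discrete Fourier transform and a synthetic sine wave), not the RapidMiner operators; all names are made up.

```python
import cmath
import math


def sliding_windows(series, width, step=1):
    """Cut the series into windows of the given width, moving by step."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def dft_magnitudes(window):
    """Magnitudes of the discrete Fourier transform for bins 0..n/2 (naive O(n^2))."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency_bin(window):
    """Index of the strongest non-constant frequency bin in the window."""
    mags = dft_magnitudes(window)
    return max(range(1, len(mags)), key=mags.__getitem__)


# Synthetic series: a sine wave with 2 cycles per 16 samples, 32 samples long.
series = [math.sin(2 * math.pi * 2 * t / 16) for t in range(32)]
windows = sliding_windows(series, width=16, step=8)
features = [dominant_frequency_bin(w) for w in windows]
```

Each window here holds exactly two cycles of the sine wave, so the extracted feature is bin 2 for every window.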

Page 20: RM World 2014: Rapidminerresources.com

The survey…

Groovy Dark Arts
• Reading from databases
• Getting details from models – for example SVD eigenvectors as example sets
• Multiple inputs and multiple outputs
• Regular expressions for parsing data
• Reading data files efficiently
• Checking example sets to assert correctness

Text Mining
• Word vectors
• Pruning
• Filtering
• Meta data
• Word lists
• Windowing documents

Web Mining
• Browsing and crawling
• XPath
• Enrichment from external sources
• JSON
• XML

RapidMiner Server
• Installing
• Passing parameters
• Creating services
• Reports
• Schedules

Time Series in more detail
• Extracting features from series
• Windowing
• Fourier analysis
• Wavelet transforms

http://goo.gl/vLgy96

Page 21: RM World 2014: Rapidminerresources.com

Questions…

Thank you…
