Albert Gavino on FOSS in Data Science and Analysis


Uploaded by cp-union on 11 April 2017


TRANSCRIPT

Trends in Data Science
Albert Gavino, Talas Data Scientist, Climate Reality Leader
August 2016

Working with the MVP Group of Companies

The 3 BIG V’s

VOLUME

VARIETY

VELOCITY

From Regular Data to Big Data

[Figure: a spectrum from regular data to big data, with methods running from traditional to modern: statistical modeling, machine learning, deep learning / A.I.]

Trends in Data Science Domains

Data Science Domain | Current Status in the Local Setting
Statistics | Traditional
Natural Language Processing (NLP) | Entered the market
Predictive Analytics / Machine Learning | Entered the market
Visualization / Dashboards | Entered the market
Image Processing (OpenCV) | Exploration
Internet of Things (IoT) | Exploration
Artificial Intelligence / Deep Learning | Exploration

DS/Big Data Applications to Fields of Study

Sample Field-of-Study-Specific Applications
Field of Study | Application
Agriculture | Climate forecast modeling to help farmers manage plantations (e.g. corn yields)
Medical field | Image processing of chest X-rays and of retina images from diabetic patients
Linguistics | Natural language processing (NLP) for dialects and sentiment analysis applications
Economics/Finance | Predicting good stock options from economic indicators (e.g. the effect of elections on the Philippine Stock Exchange, PSE)
Engineering | Internet of Things (IoT) applications to big data
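Sentiment analysis, listed under linguistics, can be illustrated in its simplest lexicon-based form: count positive and negative words and compare the totals. The sketch below is a minimal Python illustration under that assumption; the word lists and the names `sentiment_score` and `sentiment_label` are hypothetical, and a real application (especially one handling local dialects) would need a curated lexicon or a trained model.

```python
import re

# Hypothetical toy lexicons; a real system would use a curated,
# dialect-aware lexicon or a trained classifier instead.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "hate", "poor", "terrible"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def sentiment_label(text: str) -> str:
    """Map the lexicon score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment_label("The new app is great, I love it"))    # positive
    print(sentiment_label("Terrible service and poor support"))  # negative
```

Word counting ignores negation ("not good") and sarcasm, which is why production sentiment work usually moves on to machine learning models.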

Building a Data Science Team

Data Science Team Composition
DS Role | Programming Languages | Sample Output
Data Scientist | R, Python, Spark ML | Neural nets, random forest
Data Engineer/Dev Ops | Hadoop, Spark Core, Spark Streaming | RDDs, DataFrames, SQLContext
Statistician | SAS, SPSS, R, MATLAB | Linear regression, k-means clustering
Viz Expert | Tableau, Cognos, D3, JavaScript | Visualization, GIS maps
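K-means clustering, listed as sample statistician output, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch of that loop (Lloyd's algorithm) in plain Python follows, assuming numeric points of equal dimension; the function name `kmeans` and the first-k-points initialization are illustrative choices, not from the slides. In practice a statistician would more likely call R's `kmeans()` or scikit-learn's `KMeans`.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid-update steps.

    Hypothetical helper for illustration. Centroids are initialized from the
    first k points to keep the sketch deterministic (k-means++ is more usual).
    """
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Made-up 2-D sample points forming two loose groups.
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]
    centers, labels = kmeans(data, k=2)
    print(labels)  # [0, 0, 1, 0, 1, 1]
```

Here both starting centroids fall in the same group, yet the loop still separates the two groups after a few iterations; with unlucky data, k-means can settle in a poor local optimum, which is why libraries rerun it from several starts.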

Python, R and SQL: Which One to Choose?

Sources: 2014 Dice Tech Salary Survey; Analytics Vidhya
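Combining Python with SQL often means letting SQL do the filtering and aggregation while Python handles everything around it. A small sketch using Python's built-in `sqlite3` module follows; the `yields` table and its numbers are made-up sample data, not figures from the talk.

```python
import sqlite3

# Made-up sample data: corn yields (tonnes per hectare) per farm and season.
rows = [
    ("farm_a", 2015, 4.2),
    ("farm_a", 2016, 4.8),
    ("farm_b", 2015, 3.1),
    ("farm_b", 2016, 3.5),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE yields (farm TEXT, season INTEGER, tonnes_per_ha REAL)")
conn.executemany("INSERT INTO yields VALUES (?, ?, ?)", rows)

# SQL does the grouping and averaging; Python collects the result.
query = """
    SELECT farm, AVG(tonnes_per_ha) AS avg_yield
    FROM yields
    GROUP BY farm
    ORDER BY avg_yield DESC
"""
averages = {farm: round(avg, 2) for farm, avg in conn.execute(query)}
conn.close()
print(averages)  # {'farm_a': 4.5, 'farm_b': 3.3}
```

The same query would run almost unchanged against a production database through the matching Python driver, which is part of why SQL pairs well with either R or Python.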

Trends in Programming Languages

[Figure: chart of language and tool trends; labels include Scala, R, Python, Spark, RapidMiner, EMC, and Java.]

The Rise of DS Notebooks

Notebook options:
● Running locally on a machine (e.g. Jupyter)
● Running in the cloud (e.g. Databricks)

Data Science Crowdsourcing Platforms

Ensembling and Kaggling
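Ensembling combines the predictions of several models; hard majority voting is one of the simplest forms (Kaggle entrants also commonly use bagging, boosting, and stacking). The sketch below is a minimal Python illustration of majority voting; the three threshold rules stand in for trained models and are hypothetical, as is the name `majority_vote`. scikit-learn's `VotingClassifier` offers the same idea for real estimators.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "model" here is anything that maps a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Hard-voting ensemble: return the label predicted by the most models.

    On a tie, Counter.most_common returns the label that was voted first.
    """
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical base "models": simple threshold rules standing in for trained classifiers.
def rule_a(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def rule_b(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def rule_c(x: Sequence[float]) -> int:
    return int(x[0] + x[1] > 1.2)


if __name__ == "__main__":
    models = [rule_a, rule_b, rule_c]
    print(majority_vote(models, [0.9, 0.2]))  # votes 1, 0, 0 -> 0
    print(majority_vote(models, [0.9, 0.7]))  # votes 1, 1, 1 -> 1
```

Using an odd number of voters avoids ties for binary labels; the ensemble pays off when the base models make different kinds of mistakes.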

Summary of DS Trends

Choose domains worth focusing on (e.g. data science, DevOps, visualization, statistics, ML).

Learn a programming language such as R or Python, and combine it with SQL skills.

Machine learning ensemble methods are still popular on DS crowdsourcing platforms.

DS notebooks are convenient to use.

Q&A

www.talas.ph
www.facebook.com/groups/talasph