Platform for Data Scientists
www.subex.com
Platform for Data Scientists
Binu K, Architect Analytics Platform
Why Platform?
Data and Analytics
Capture
• Acquire, extract, parse, aggregate
Analyze
• Feature Engineering, Exploratory analysis
Modelling
• Machine learning, Statistics, Optimisation
Analytics Output
• Application to live data - Trends, Prediction
Communication of Results
• Dashboards and Reports
The process & pain areas
Time taken to turn data into insights – a few months
60 – 75%
Credits : Forbes
Advantages
Automate repeated routine jobs
• Data load
• Preprocessing
Maximum resource utilization
• Scheduling jobs overnight
Focus more on business
• Look at different use cases
• Solution areas
Integrated tool box
• Combine tools into one environment
Expectations
Workbench
• Exploratory Data Analysis
• Advanced Modelling
• Distributed Architecture
Bespoke Algorithms
• Customized ML algorithms
• Custom Approaches
Industrialization
• Packaged Analytics
Platform
Workbench
Workbench: EDA
Querying capabilities
• Pointed queries
• Aggregations
• Partitioning
• Windowing
• Analytical functions
Descriptive Stats
• Univariate analysis
• Bivariate analysis
Predictive Modeling
• Building and testing
• Ensemble
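As an illustration of the descriptive statistics listed on this slide, the following is a minimal sketch using only the Python standard library. The function names, the column names, and the sample values are hypothetical and are not taken from the platform itself.

```python
import math
import statistics


def univariate_summary(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(xs, ys):
    """Return the Pearson correlation between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 rows)")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: monthly call minutes and monthly bill per subscriber.
call_minutes = [120, 300, 450, 80, 610]
monthly_bill = [15.0, 32.0, 47.5, 11.0, 63.0]

summary = univariate_summary(call_minutes)               # univariate analysis
corr = pearson_correlation(call_minutes, monthly_bill)   # bivariate analysis
print(summary)
print(round(corr, 4))
```

On a distributed workbench the same summaries would normally be computed by the engine rather than in local Python; the sketch only shows what the two kinds of analysis produce.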
Bespoke Algorithms
Customization
• Decision Trees/Random Forests
  • Handling categorical values
  • Identify top reason
  • Custom node labelling
• K-Means
  • Weighted distance
  • Geospatial distance - Haversine distance
• Social Network Analysis
  • Build call network
  • Community detection
  • Influencer identification
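The geospatial distance mentioned under K-Means can be sketched as follows. This is a minimal, standard-library illustration of the haversine formula and of the centroid-assignment step that a customized K-Means would use it in; the Earth radius constant, the function names, and the sample coordinates are assumptions of this sketch, not details from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point, using haversine distance."""
    return min(range(len(centroids)),
               key=lambda i: haversine_km(*point, *centroids[i]))


# Hypothetical example: one location and two candidate cluster centres.
location = (12.97, 77.59)
centres = [(13.08, 80.27), (12.30, 76.64)]
print(nearest_centroid(location, centres))
```

Replacing Euclidean distance with a function like `haversine_km` affects only the assignment step; how centroids are recomputed on a sphere is a separate design choice that this sketch does not cover.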
Domain & scale
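For the social network analysis items on this slide, a minimal sketch of the three steps might look like the following. Connected components stand in here for community detection, and node degree stands in for influence; both are simplifications chosen for this illustration, and the slides do not say which methods the platform uses. All names and sample records are hypothetical.

```python
from collections import defaultdict


def build_call_network(call_records):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def connected_components(graph):
    """Group nodes into connected components (a simple stand-in for communities)."""
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.add(node)
            stack.extend(graph[node] - seen)
        components.append(members)
    return components


def top_influencer(graph, members):
    """Return the member with the most distinct contacts (ties broken alphabetically)."""
    return max(sorted(members), key=lambda node: len(graph[node]))


# Hypothetical call records: (caller, callee).
calls = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("X", "Y"), ("Y", "Z")]
network = build_call_network(calls)
communities = connected_components(network)
influencers = sorted(top_influencer(network, c) for c in communities)
print(influencers)
```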
Packaged Analytics
Objective
Pareto Analysis: Example
Selection of a limited subset of causes that produces a significant share of the overall effect. Two comparable metrics with unbalanced magnitudes of cause and effect are identified.
Samples
• Smartphones constitute 27% of all handsets but contribute to 95% of all mobile traffic
• 75% of the revenue is generated from 15% of distinct rate plans
• 10% of distinct problem areas are responsible for 83% of total complaints
Use cases
Can be used to identify the impact of a causal metric on an outcome metric.
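The idea can be sketched in a few lines of Python. The function below ranks causes by their contribution and returns the smallest leading subset whose cumulative share of the outcome reaches a target. The function name, the 80% default threshold, and the complaint counts are hypothetical and chosen only for illustration.

```python
def pareto_subset(contributions, target_share=0.8):
    """Return (selected causes, share of effect covered, share of causes used).

    contributions maps each cause (e.g. a problem area) to its outcome value
    (e.g. number of complaints).
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    # Rank causes from largest to smallest contribution.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= target_share:
            break
    return selected, running / total, len(selected) / len(contributions)


# Hypothetical complaint counts per problem area.
complaints = {
    "billing": 500, "network": 330, "roaming": 60, "handset": 40,
    "sim": 30, "portal": 20, "store": 10, "other": 10,
}
areas, effect_share, cause_share = pareto_subset(complaints)
print(areas, effect_share, cause_share)
```

With this made-up data, 25% of the problem areas account for 83% of the complaints, which is the kind of unbalanced cause and effect relationship the samples above describe.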
ROC® Analytics & Insights: Data Flow
Streaming & Batch Sources
Structured: ROC FMS, ROC RA, ROC PS etc.
Unstructured: Logs, Tweets, DPI, Mobile App, ERP etc.
Profiler: Domain Guided Analytics
Analytical Engine: Distributed ML and Statistical Techniques
Self Learning: Continuous Feedback for Periodic Improvement
Signal Hub
Domain and Analytical Inputs
Daily Profiles: Profile for a day
Profile Manager
Master Profile: Profile from many days
Pareto Analysis
Machine Learning & Statistics Libraries (MLlib, scikit-learn etc.)
AP4
AP2
AP5
AP3
Many more….
Recipe for Success
Regardless of what some software vendor advertisements may claim, you can’t just purchase some Analytics software, install it, sit back, and watch it solve all your problems.
The right combination of domain knowledge (business acumen) and analytics is required to solve any business problem.
“There is a tendency of solving one’s problems by means of much equipment rather than thought.”
Alan Turing
ROC® Insights: Technologies
Data Ingestion | Data Storage | Modelling/Profiler | Reporting
Techomics: Architecture