Webinar on Big Data Challenges: Presented by Raj Kasturi
TRANSCRIPT
1
Is Scrum a good fit for solving big data challenges?
Speaker: Raj Kasturi
September 19th, 2017
10:00 to 11:00 AM EST,
7:30 PM to 8:30 PM IST
Special Thanks to:
2
• 25+ years of IT experience, including 8+ years of enterprise-level Agile experience
• Agile experience as an Agile Coach, Scrum Trainer, and Scrum Master
• Leading and helping large-scale Agile project transitions
• Adjunct faculty at Pennsylvania State University, Pennsylvania, USA
• 18+ years of teaching experience in Technology and Project Management, and 8 years of teaching Scrum and Agile courses
• Started my career as a programmer; later worked as an Application Development Manager
• Speaker and volunteer at Agile conferences and user groups
• Servant Leader – Agile World, User Group, Scrum Alliance
My Website/Blog: http://agilekingdom.com/
@AgileRaj
https://www.linkedin.com/in/rajkasturi/
Raj Kasturi, MBA
3
Agenda
What is big data?
The three V’s of big data
Big Data Trends of 2017
Agile Spectrum
Big data complexity and empirical process control theory
Scrum and Big Data
Summary
4
What is big data?
▪ The term big data was coined in the late 1990s
▪ Big data is different from regular data
▪ Billions of data sets and their interactions
▪ A traditional RDBMS is designed for regular data
▪ A traditional RDBMS cannot handle big data
▪ Big data requires a new technological approach to handling and processing
▪ It needs new data platforms to meet storage and performance requirements
5
The 3 V’s of big data
Volume
Variety
Velocity
Are these three factors required to drive the need?
6
Add Value
▪ Do we have a fourth V?
▪ Aggregate to provide value
Value
7
Google’s flu tracker
▪ Knowing the what, rather than the why, was good enough
▪ 2009 H1N1 flu epidemic
▪ Real-time flu tracker “Google Flu Trends”
▪ Flu sufferers search Google before visiting a clinic
▪ Optimized search queries gave accurate, real-time data
▪ Data was far more effective than the CDC's, owing to its size
▪ 3 billion searches a day
▪ Large servers and clever algorithms to sort data
8
Who uses it?
▪ Financial Services
▪ Telecommunications
▪ Energy
▪ Government
▪ Retail
▪ And many more…
9
Complexity
Big Data
10
Agile Spectrum
11
Process
[Diagram: Input → Process → Output. The process may have internal processes.]
12
Defined Process
[Diagram: Input → Process → Output. The process may have internal processes; its composition is known and its characteristics are well defined.]
• Sequential/Series of steps
• Underlying process well understood
• Results repeatable/predictable
• Command & Control approach
• Pre-defined variations are acceptable
13
Empiricism
14
[Diagram: Inputs → Black Box → Outputs, framed by Transparency, Inspection, and Adaptation]
Transparency: Significant aspects of the process must be visible to those responsible for the outcome
Inspection: Frequently inspect and remove any unacceptable variations
Adaptation: Adjust and control the process; improve
Needs frequent measurement
Problem cannot be fully understood or defined
Solution evolves as information becomes known
Protect the black box by not adding anything new!
17
Hadoop’s distributed file system (HDFS)
Source: Managing Big Data Workflows For Dummies
MapReduce: think of it as a framework that processes and reduces raw big data into regular-size, tagged datasets that are much easier to work with.
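As a rough illustration of the "process and reduce into tagged datasets" idea, here is a single-machine Python sketch of the map, group, and reduce steps. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample search strings are hypothetical, chosen only to show the shape of the technique.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one raw record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values under their key (the tag)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for raw search queries.
    sample = ["flu symptoms", "flu clinic near me", "flu symptoms fever"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel across many machines over data stored in HDFS; this sketch runs them sequentially in one process.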
18
Popular platforms and tools
➢ Pig
➢ Apache Hive
➢ Apache Sqoop
➢ In-memory databases
➢ NoSQL databases
➢ Massively Parallel Processing (MPP)
➢ Cassandra
➢ Hadoop
➢ Plotly
➢ Bokeh
➢ Neo4j
➢ Cloudera
➢ OpenRefine
➢ Storm
19
Scrum and Big Data
➢ Scrum's ability to measure work output: Velocity
➢ Knowledge is based on the ability to measure a given phenomenon
➢ Once we measure it, we can start to manipulate the input and determine whether the resulting output has improved. This is the Inspect & Adapt concept
➢ As discussed, Scrum is based on empirical process control (empiricism)
➢ Continuous improvement
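The measure-then-adjust idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming velocity is tracked as story points completed per sprint; the function names and the sprint figures are hypothetical and do not come from the talk.

```python
from statistics import mean


def average_velocity(points_per_sprint):
    """Average story points completed per sprint."""
    return mean(points_per_sprint)


def inspect_and_adapt(before, after):
    """Compare average velocity before and after a process change.

    Returns the relative change; a positive value means output improved.
    """
    baseline = average_velocity(before)
    return (average_velocity(after) - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical sprint data: three sprints before and after a change.
    before_change = [20, 22, 18]
    after_change = [24, 26, 22]
    change = inspect_and_adapt(before_change, after_change)
    print(f"Velocity change: {change:+.0%}")
```

Comparing averages over several sprints, rather than single sprints, smooths out sprint-to-sprint noise before the team decides whether a change helped.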
20
Top 10 Big Data Trends 2017
1. Big data becomes fast and approachable: Options expand to speed up Hadoop
2. Big data no longer just Hadoop: Purpose-built tools for Hadoop become obsolete
3. Organizations leverage data lakes from the get-go to drive value
4. Architectures mature to reject one-size-fits-all frameworks
5. Variety, not volume or velocity, drives big-data investments
21
Top 10 Big Data Trends 2017
6. Spark and machine learning light up big data
7. The convergence of IoT, cloud, and big data creates new opportunities for self-service analytics
8. Self-service data prep becomes mainstream as end users begin to shape big data
9. Big data grows up: Hadoop adds to enterprise standards
10. Rise of metadata catalogs helps people find analysis-worthy big data
22
Summary
Scrum is good for work that has a fair degree of complexity and requires:
Innovation
Invention
Product differentiation
Productivity
Faster launch to market
I say that big data needs all of the above.
23
Attributions
1. Scrum Guide 2016, http://www.scrumguides.org/scrum-guide.html
2. JJ Sutherland, https://www.scruminc.com/scrum-big-data-2/
3. Joe Goldberg and Lillian Pierson, PE, Managing Big Data Workflows For Dummies, BMC Software Special Edition
4. Tableau, Top 10 Big Data Trends for 2017