Big Data and Security Challenges
DESCRIPTION
Big Data keeps getting bigger, but before adopting and exploiting it seriously we should address the security shortcomings that come with it. From a forensics and security point of view, we need to understand the vulnerabilities it introduces before adopting it blindly.
TRANSCRIPT
Geo Intelligence India
13-14 Jun 2013
New Delhi
The story of DATA comes down to just two words...
One is ZERO, the other is ONE.
Big Spatial Data
Security
WELCOME
BIG SPATIAL DATA has been with us for ages in various forms, but it has been largely invisible.
Ancient Egypt
River Nile
Engineers used data analysis to try to predict crop yields
6,695 km long
Basic Intro
Concepts
Perceptions
Challenges
...the 15-minute route to the THANK YOU slide
An English professor wrote the words:
“A woman without her man is nothing”
on the chalkboard and asked his students to punctuate it correctly.
“A woman, without her man, is nothing.”
“A woman: without her, man is nothing.”
A greater scope of Geo Int info
New kinds of Geo data and analysis
Real time Geo information
Data influx from new technologies
Non traditional forms of Geo data
Large volumes of Geo data
The latest buzzword
Social media data
DEFINING BIG SPATIAL DATA
How do we understand it?
Spatial data sets that exceed the capacity of current computing systems to manage, process or analyze with reasonable effort, due to their Volume, Velocity, Variety and Veracity.
DEFINING BIG SPATIAL DATA
Data is exploding in Volume, Velocity and Variety,
while decreasing in Veracity
DEFINING BIG SPATIAL DATA
BIG SPATIAL DATA
Finding actionable information in massive volumes of structured and unstructured geo data that are so large and complex that they are difficult to process with traditional database and software techniques.
Volume: data at rest
Velocity: data in motion
Variety: data in many forms
Veracity: data in doubt
90% of the data in the world was created in the last 2 years
2.5 EB of data is created every day
U.S. drone aircraft sent back 24 years' worth of video footage in 2009
Gigabyte (GB) - 1,024 MB
Terabyte (TB) - 1,024 GB
Petabyte (PB) - 1,024 TB
Exabyte (EB) - 1,024 PB
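To put the daily figure in perspective, the unit ladder above can be applied mechanically. The following is a minimal sketch, assuming the 1,024-per-step convention from the slide; the `UNITS` list and the `convert` helper are illustrative names, not part of the original talk.

```python
# Binary unit ladder as listed on the slide: each step is a factor of 1,024.
UNITS = ["MB", "GB", "TB", "PB", "EB"]

def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between the units above, using 1,024 per step."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * (1024 ** steps)

daily_eb = 2.5  # daily volume quoted on the slide, in exabytes
print(convert(daily_eb, "EB", "PB"))  # 2560.0
print(convert(daily_eb, "EB", "TB"))  # 2621440.0
```

Under this convention, 2.5 EB per day is roughly 2.6 million terabytes per day.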
* Estimated revenue FY 2013
Growth of geospatial data is outpacing both software and services, and is set to become a major contributor to the overall growth of the industry
100% security is a myth
No one has said this!
But it remains a fact
Increasing attack surface
The technology is ready...
But are we ready?
DISASTER RELIEF
FINANCIAL
FRAUD DETECTION
CALL CENTER REQUESTS
DISEASE SURVEILLANCE
INSURANCE
RETAIL
TELECOMMUNICATIONS
UTILITIES
ECO-ROUTING
The other side of the story
Security challenges before we adopt Big spatial data
Distributed programming frameworks
ONE
Distributed programming frameworks
[Diagram: MapReduce data flow. Stage labels: Input file, Map, Intermediate, Combining, Shuffle, Output file, Local Reduce, Reduce]
The Mapper performs computation and outputs key/value pairs
The Reducer combines the values belonging to each distinct key and outputs the result
Utilise parallelism in computation and storage to process massive amounts of data
MAP: Splits the input data set into independent chunks, which are processed in a completely parallel manner
REDUCE: Aggregates the results from the map phase and performs a summary operation
FRAMEWORK:
- Schedules and re-runs tasks
- Splits the input
- Moves map outputs to reduce inputs
- Receives the results
Distributed programming frameworks
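The map, shuffle and reduce roles described above can be sketched in a few lines. This is a single-process illustration only, not how a real framework such as Hadoop schedules work; the grid-cell key, the function names and the sample points are all assumptions made for the example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)
Cell = Tuple[int, int]       # hypothetical key: a 1-degree grid cell

def mapper(chunk: Iterable[Point]) -> Iterator[Tuple[Cell, int]]:
    """Map: emit one (cell, 1) pair per input point."""
    for lat, lon in chunk:
        yield (math.floor(lat), math.floor(lon)), 1

def shuffle(pairs: Iterable[Tuple[Cell, int]]) -> Dict[Cell, List[int]]:
    """Shuffle: move map outputs to reduce inputs by grouping on the key."""
    groups: Dict[Cell, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key: Cell, values: List[int]) -> Tuple[Cell, int]:
    """Reduce: summarise all values that share a key."""
    return key, sum(values)

def run_job(points: List[Point], n_chunks: int = 2) -> Dict[Cell, int]:
    """Framework role: split the input, run the phases, collect the result."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    # A real framework would run each chunk on a different node in parallel.
    mapped = [pair for chunk in chunks for pair in mapper(chunk)]
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())

points = [(28.61, 77.21), (28.70, 77.10), (19.07, 72.88)]
print(run_job(points))  # {(28, 77): 2, (19, 72): 1}
```

The point of the split is that `mapper` never needs to see more than its own chunk, which is what lets the work spread across machines.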
So the challenge is not storage but I/O speed
Read 1 TB
One machine (4 I/O channels, each channel 100 MB/s): 45 min
10 machines (4 I/O channels each, each channel 100 MB/s): 4.5 min
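The read-time figures above follow from simple throughput arithmetic. A minimal sketch, assuming 1 TB = 1,024 x 1,024 MB and that all channels read in parallel at full speed (the function and variable names are illustrative):

```python
def read_time_minutes(size_mb: float, machines: int,
                      channels_per_machine: int = 4,
                      mb_per_s_per_channel: float = 100.0) -> float:
    """Minutes needed to read size_mb with all channels working in parallel."""
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return size_mb / throughput_mb_per_s / 60

ONE_TB_MB = 1024 * 1024  # 1 TB expressed in MB

one_machine = read_time_minutes(ONE_TB_MB, machines=1)
ten_machines = read_time_minutes(ONE_TB_MB, machines=10)
print(f"{one_machine:.1f} min, {ten_machines:.1f} min")  # 43.7 min, 4.4 min
```

The exact values come out slightly under the rounded 45 min and 4.5 min on the slide; the tenfold speed-up is the same either way.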
Untrusted Mappers
Securing the data in the presence of an untrusted mapper
Distributed programming frameworks
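The slide does not name a specific defence. One possible mitigation, shown here purely as an illustration, is to run the same chunk on several mappers and accept only the majority result. Everything in this sketch (the function names, the word-count style mappers, the majority rule) is an assumption for the example, not the speaker's method.

```python
from collections import Counter

def run_with_replicas(mappers, chunk):
    """Run one chunk through several mappers and keep the majority output."""
    outputs = [tuple(sorted(m(chunk))) for m in mappers]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(mappers) // 2:
        raise RuntimeError("no majority among mapper replicas")
    # Indices of replicas whose output disagrees with the majority.
    suspects = [i for i, out in enumerate(outputs) if out != winner]
    return list(winner), suspects

def honest_mapper(chunk):
    return [(word, 1) for word in chunk]

def tampering_mapper(chunk):
    # Hypothetical faulty or malicious node: silently drops the last record.
    return [(word, 1) for word in chunk[:-1]]

result, suspects = run_with_replicas(
    [honest_mapper, tampering_mapper, honest_mapper],
    ["road", "river", "road"],
)
print(result)    # [('river', 1), ('road', 1), ('road', 1)]
print(suspects)  # [1]
```

Replication multiplies the compute cost and does nothing against a mapper that leaks data rather than altering it, so it addresses only the integrity half of the problem.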
NoSQL ISSUES
TWO
First off: the name
NoSQL is not “NEVER SQL”
NoSQL is not “No To SQL”
NoSQL is simply “Not Only SQL”!
MongoDB
Redis
NoSQL databases are still evolving with respect to security infrastructure
Data storage & transaction logs
STORAGE TIERS
- Multi-tiered storage media
- Necessitated by scalable size
- Different categories of data
- Different types of storage
Data storage & transaction logs
A lower tier means reduced security and looser access controls
Keeping track of data location
Data storage & transaction logs
INPUT VALIDATION/FILTERING
How can we trust the data?
How do we validate data when its source is not reliable?
Filtering malicious data @ BYOD (bring your own device)
Input validation/filtering
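As a small illustration of validating input at the point of ingestion, the sketch below rejects position records whose coordinates are missing, non-numeric or out of range. The record layout (`lat` and `lon` keys) and the function name are assumptions for the example.

```python
import math

def is_valid_fix(record: dict) -> bool:
    """Accept a record only if it carries a finite, in-range lat/lon pair."""
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
    except (KeyError, TypeError, ValueError):
        return False  # missing field or value that is not a number
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False  # rejects NaN and infinities
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

incoming = [
    {"lat": 28.61, "lon": 77.21},   # plausible fix
    {"lat": 128.61, "lon": 77.21},  # latitude out of range
    {"lat": "nan", "lon": 0},       # parses as NaN
    {"lon": 77.21},                 # latitude missing
]
accepted = [r for r in incoming if is_valid_fix(r)]
print(len(accepted))  # 1
```

Checks like this catch malformed input only. A well-formed but deliberately false position passes, which is why the trust question on the slide remains open.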
REAL TIME MONITORING
A humongous number of alerts!
False positives
Filtering malicious data @ BYOD
REAL TIME MONITORING
Secure communication
End-to-end security?
Data encryption: attribute-based encryption needs to be made richer
Secure communication
Granular audits
New attacks will keep happening, and to detect them we need detailed audit logs
Missed true positives
Granular audits
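Audit logs are only useful for investigating an attack if the attacker cannot quietly rewrite them. One common way to make a log tamper-evident is to chain entries by hash. The following is a minimal sketch under that assumption; the entry layout and the function names are hypothetical and not taken from the slides.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

def append_entry(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})

def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"user": "analyst1", "action": "read", "dataset": "tiles/2013-06"})
append_entry(log, {"user": "analyst2", "action": "delete", "dataset": "tiles/2013-05"})
print(verify(log))  # True

log[0]["event"]["action"] = "none"  # simulate an after-the-fact edit
print(verify(log))  # False
```

A chain like this only detects tampering. An attacker who can rewrite the whole log can rebuild the chain, so the latest hash would need to be stored somewhere the attacker cannot reach.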
PRIVACY ISSUES
E.g.: how a retailer was able to identify that a teenager was pregnant before her father knew
PRIVACY ISSUES
In the world of big data, privacy invasion is a business model
And...
We also have the cloud with us?
At 1.4% in 2011-12, cloud was a very small percentage of total IT spend
The pace of Big Spatial Data adoption has been sluggish
There is unlikely to be a day in the near future when we have a “FIND TERRORIST” button
We have mostly been reactive to date...
USE KERBEROS FOR NODE AUTHENTICATION (BUT WE KNOW IT’S A PAIN TO SET UP)
STRINGENT POLICIES
STANDARD TO INTRA COUNTRY LAWS
EXHAUSTIVE LOGS
SECURE COMMUNICATION