Flutura's presentation @ Big Data Conclave

Post on 01-Nov-2014


Category:

Technology


DESCRIPTION

Flutura presented at the Big Data Conclave. The presentation follows.

TRANSCRIPT

Agenda

• 3 industries, 5 real-life Flutura user stories

• 7 key “gotchas” & Big Data best practices

Case Study-1 : Reducing network threats by detecting patterns in perimeter device logs

What is the Biz problem being solved ?

What is the problem being solved?

Network threats are growing ...


• 2 types of threats – Internal ( Social Unrest & Watch List ) & External ( Hackers )

External hackers Internal Activists

Who is experiencing the pain ? Telecom Security Operations centre

Lots of Telecom Machine data left untapped !

This is typically flushed but has gold in it

Why is it important to solve this problem?

• Reduces network disruption from hackers

• Minimize social disruption and unrest

Traditional RDBMS architectures can't handle high velocity machine data !

SOCs can't see threat patterns … running BLIND

• Being blind = Risk
• Cannot be blind to patterns anymore
• The capability to “see” patterns previously not seen
• Network activity and behaviour – Firewalls, routers
• Saves lives, provides social stability – WL Chatter !

Capability to remove “data blind folds” to “SEE” behavioural patterns is key to security

MACHINE DATA KEY TO UNCOVERING SECURITY PATTERNS !

What are some “behavioural signatures” ?

1. Sudden increase in YouTube uploads @ night

2. Viral rate of propagation of MMS videos
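A "sudden increase" can be tested in several ways; one simple option is the 2 sigma rule that the deck names later among the watched events. The sketch below is only an illustration under that assumption: the function name, the nightly counts, and the choice to compare a new value against past history are all hypothetical.

```python
from statistics import mean, stdev

def is_two_sigma_event(history, value, k=2.0):
    """Return True if value lies more than k standard deviations from the mean of history."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma

# Hypothetical nightly upload counts for one client group (not from the deck).
history = [12, 15, 11, 14, 13, 12, 16, 14, 13]
spike = is_two_sigma_event(history, 95)
normal = is_two_sigma_event(history, 16)
```

Keeping the new value out of the baseline avoids a large spike inflating its own threshold.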

So what does the data look like ? National content filtering log – 1 billion events/day !


1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37

Decoding 7 components of the Netsweeper log entry

1. EPOCH time stamp
2. URL requested
3. Source IP
4. Client subnet
5. Client group name
6. Denied flag (0 allowed, 1 denied)
7. URL category (description TBD)

50 categories in the system: Education, Pornography, Phishing, Criminal Skills etc.

"23" - related to "Pornography"; "45" - related to "GENERAL"
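The seven fields above can be split out with a few lines of code. This is a minimal sketch, assuming the entry is whitespace-delimited and the URL contains no spaces; the class and function names are hypothetical, and only the field order and the sample entry come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FilterLogEntry:
    timestamp: datetime
    url: str
    source_ip: str
    client_subnet: str
    client_group: str
    denied: bool          # field 6: 0 allowed, 1 denied
    category_code: int    # field 7: numeric URL category

def parse_entry(line: str) -> FilterLogEntry:
    """Split one log line into its 7 components (assumes whitespace-delimited fields)."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    epoch, url, source_ip, subnet, group, denied, category = parts
    return FilterLogEntry(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        url=url,
        source_ip=source_ip,
        client_subnet=subnet,
        client_group=group,
        denied=(denied == "1"),
        category_code=int(category),
    )

SAMPLE = ("1329031890 http://photogallery.indiatimes.com/photo/4686985.cms "
          "94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37")
entry = parse_entry(SAMPLE)
```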

Decoding National content filtering logs

Expand to ingest variety of watched events

OS logs
• File Delete Events
• User Login Failure Events
• Root access Failures
• 2 Sigma events

Database logs
• Table Drop Events
• Table Delete Events
• Column Drop Events
• Critical Proc recompilation

Application logs
• Critical tsn value changes
• Master data changes
• App login failures
• Login at unusual time windows

Web server logs
• Search for specific keywords
• 2 Sigma event for URLs
• Decomp tree - failed requests
• Login Failure

CDR logs
• Dropped call frequency
• Watch List inbound/outbound
• Cut calls - poor connection
• Call Failure event frequency

IPR logs
• Timeout event frequency
• Swarm event detected
• Dropped IP calls frequency
• Failed IP call frequency

Router logs
• SMS Capacity events
• Unusual SMS traffic events
• User defined router events
• Compliance related router event

Firewall logs
• Odd hour Unsuccessful logins
• X happens Y times in Z time
• User defined firewall events
• Compliance oriented firewall e
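The "X happens Y times in Z time" event can be read as a sliding-window count. The sketch below assumes that reading; the function name and the sample timestamps are hypothetical, and a production rules engine would work on a stream rather than a sorted list.

```python
from collections import deque

def rule_fires(event_times, y, z_seconds):
    """Return True if at least y events fall inside some window of z_seconds."""
    window = deque()
    for t in sorted(event_times):
        window.append(t)
        # Drop events that fall outside the window ending at t.
        while t - window[0] > z_seconds:
            window.popleft()
        if len(window) >= y:
            return True
    return False

# Hypothetical login-failure timestamps (in seconds) for one account.
failures = [0, 20, 45, 50, 58, 400]
burst_in_a_minute = rule_fires(failures, y=5, z_seconds=60)
burst_in_half_minute = rule_fires(failures, y=5, z_seconds=30)
```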

• Frequency of login failures high in certain pockets
• Recency of late night events noticed in certain pockets
• Certain corridors experiencing high dropped calls

Converting raw data to Actionable Intelligence

Holistic Value Chain

• TELECOM SWITCHES, OTHER DEVICES: CDR LOG FILES, IP LOG FILES, MISC LOG FILES
• LOG FILE INGESTION
• BIG DATA REPOSITORY
• INTEGRATED EVENT 360 REPOSITORY
• MACHINE LEARNING ALGORITHMS ON GRANULAR LOG EVENT DATA
• INFER INTENT FROM PATTERNS AND CREATE EVENT PROFILES
• LOAD RISK / BEHAVIOR PROFILE TO RULES ENGINE DB
• SENSE & RESPOND LAYER
• INTERCEPT OR OFFLINE REVIEW OF EVENTS
• CASE MANAGEMENT WORKFLOW
• CONSOLIDATE & REVIEW EVENT INTERCEPTS TO ASSESS EVENT RULE EFFECTIVENESS
• MEASURE PATTERN RULE EFFECTIVENESS - TRUE POSITIVE / FALSE POSITIVES
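Measuring rule effectiveness from true and false positives can be as simple as a precision figure per rule. The sketch below assumes that metric; the rule names and review counts are hypothetical.

```python
def rule_precision(true_positives: int, false_positives: int) -> float:
    """Share of a rule's intercepts that reviewers confirmed as real events."""
    flagged = true_positives + false_positives
    if flagged == 0:
        return 0.0  # the rule never fired, so there is nothing to measure
    return true_positives / flagged

# Hypothetical case-review outcomes per rule: (true positives, false positives).
reviewed = {"odd_hour_logins": (18, 2), "sms_burst": (5, 45)}

# Rank rules from most to least precise to decide which ones need tuning.
ranking = sorted(reviewed, key=lambda name: rule_precision(*reviewed[name]), reverse=True)
```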

Case Study-2 : Decoding travellers' intent

What's the problem we are trying to solve ?

• Travellers are “signalling” to us through the behaviour they exhibit

• OTA is unable to sense and respond to these varied behaviours

Why is it important to solve this problem ?

• Impacts the look-to-book ratio

• Increase revenue from cross sell

Srikanth intends to travel from San Fran to NYC

Srikanth searches !

Srikanth's First Moment of Truth !

Srikanth sees the options rendered !

Is Srikanth Price Sensitive or Time conscious traveller?

87 % 13%

Does Srikanth have a bias towards any airline ?

Those small clicks reveal a lot !

So who is Srikanth? Do we 'know' him ?

What's his behavioural DNA ? Key vectors ?

• Early bird ( days = 21 )
• Price insensitive ( click % = 89 % )
• Prefers American Airlines
• Most valuable customer ( Decile-1 )
• Intra visit interval = 17 days
• Visit dispersion = 12 % International
• Churn propensity = 0
• Bargain hunter = No ( 3 % coupon )
• Roadie = Yes ( 28000 miles per qtr )
• Sentiment index = 73 %
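One of these vectors, airline bias, could be derived from click data as the share of clicks that go to the most-clicked airline. This is a rough sketch under that assumption; the function name and the click list are hypothetical, and the deck does not say how its vectors are actually computed.

```python
from collections import Counter

def airline_bias(clicked_airlines):
    """Return (most clicked airline, its share of all clicks), or (None, 0.0) if no clicks."""
    if not clicked_airlines:
        return None, 0.0
    counts = Counter(clicked_airlines)
    airline, n = counts.most_common(1)[0]
    return airline, n / len(clicked_airlines)

# Hypothetical clicks on rendered flight options during one search session.
clicks = ["American Airlines", "American Airlines", "Airline B", "American Airlines"]
preferred, share = airline_bias(clicks)
```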

How do we respond in real time to Srikanth's experience and the behavioural patterns we’ve seen ?

• If Srikanth is a high value customer
• If he does not book within 8 min window
   • In real time route to high performing agent
   • Short circuit the queue
   • Extra 10 % discount since he is vulnerable
• If search response time velocity is trending downward
   • Signal to beef up infrastructure
   • Optimise code base
• Property recommendations
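The first response rule can be sketched as a small decision function. It assumes the two conditions are combined with AND and that "high value" means Decile-1, as on the profile slide; the class, field, and action names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Session:
    value_decile: int            # 1 = most valuable customers (assumed scale)
    minutes_since_search: float
    has_booked: bool

def next_actions(session: Session, window_minutes: float = 8.0) -> list:
    """Return the responses to trigger for a high value traveller who has not booked in time."""
    high_value = session.value_decile == 1
    stalled = (not session.has_booked) and session.minutes_since_search > window_minutes
    if high_value and stalled:
        return [
            "route_to_high_performing_agent",
            "short_circuit_queue",
            "offer_extra_10_percent_discount",
        ]
    return []

actions = next_actions(Session(value_decile=1, minutes_since_search=9.5, has_booked=False))
```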

Case Study-3 : Watched List

What is the problem being solved?

• Internal watch lists

• Can we get e signals in their behavior ?
   • Call patterns ?
   • SMS patterns ?
   • YouTube upload patterns ?
   • Watched countries ?
   • Intra watch list chatter ?
   • Late night communication behavior ?

• Watch list activity intelligence takes 6 weeks

• Bring it down to < day

• Enhance it to make it real time

Why is it important to solve this problem ?

• Threat signals are there in telecom and communication logs

• Saves lives !

• Ensures national security !

Under the hood

• Remote Authentication Dial-In User Service (RADIUS) provides authentication, authorization and accounting for network access.

• When a user wants to get access to the Internet, he first has to give his user credentials (in most cases username and password) to a local RADIUS client.

Deconstructing Radius Logs

The IP address of the NAS ( Network Access server ) that is sending the request

The framed address to be configured for the user

3 time stamps

User Identity

Triangulate from 3 event data pools

Radius logs + Netsweeper logs + Subscriber database --> Rich Security intelligence !

• Access/Device: Smart Phone, Desktop, Ipad, Others (business rule to derive access device to be elicited from SME)
• Framed IP address
• Customer ethnicity: Asian, European
• URL accessed
• Date/time: Day, Week
• Client IP address
• Customer type: Enterprise, Residential
• Customer browse location: Dubai (location mapping business logic to be elicited from SME)
• Status: ?
• ?: Yes, No
• URL Type: Gaming sites, News sites, Social Networking, Blogs, P2P sites, VPN/VOIP, Others
• NAS Port Id

Sources:
• Post paid Subscriber Database
• Netsweeper log: 1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37
• RADIUS Logs: Username, NAS port id

Co-relating fragmented telecom log files-Info model
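One way to co-relate the three pools is to match a content-filter event to the RADIUS session that held the same framed IP address at that time, then look up the username in the subscriber database. The sketch below assumes that join logic; the record layouts, key names, and sample values (other than the IP address and timestamp from the slide) are hypothetical.

```python
def enrich_event(event, sessions, subscribers):
    """Attach username and subscriber attributes to one content-filter event."""
    for session in sessions:
        same_ip = session["framed_ip"] == event["source_ip"]
        in_session = session["start"] <= event["ts"] <= session["stop"]
        # The IP must match within the session window, since addresses get reassigned.
        if same_ip and in_session:
            profile = subscribers.get(session["username"], {})
            return {**event, "username": session["username"], **profile}
    return dict(event)  # no matching session: return the event unchanged

# Hypothetical records from the three pools.
event = {"ts": 1329031890, "source_ip": "94.200.107.14", "category_code": 37}
sessions = [{"username": "user123", "framed_ip": "94.200.107.14",
             "start": 1329030000, "stop": 1329035000}]
subscribers = {"user123": {"customer_type": "Residential"}}

enriched = enrich_event(event, sessions, subscribers)
```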

Calls to watched countries

Intra Watch list Chatter velocity is high

Call patterns reveal malicious intent


Entity on watch list

NOT on watched list but high level of interactions

Are people ‘n’ degrees away from watched list performing 2 sigma activity across multiple Call dimensions – sms, voice, conference and other behavioral activity ?

CDR: From BTN, To TN, Date/Time, Duration, Call type, Approximate tower location which carried call
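Finding people "n degrees away" from the watch list can be sketched as a breadth-first search over a call graph built from the From BTN and To TN fields. The function name and the sample numbers are hypothetical, and calls are treated as undirected links, which is an assumption.

```python
from collections import defaultdict, deque

def within_n_degrees(cdrs, watch_list, n):
    """Map each number reachable within n call hops of the watch list to its hop count."""
    graph = defaultdict(set)
    for from_btn, to_tn in cdrs:
        graph[from_btn].add(to_tn)
        graph[to_tn].add(from_btn)
    distance = {entity: 0 for entity in watch_list}
    queue = deque(watch_list)
    while queue:
        node = queue.popleft()
        if distance[node] == n:
            continue  # do not expand beyond n hops
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# Hypothetical (From BTN, To TN) pairs extracted from CDRs.
cdrs = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
reach = within_n_degrees(cdrs, watch_list=["A"], n=2)
```

The entities returned here could then be checked for 2 sigma activity across SMS, voice and conference dimensions.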

Watch List Recommender Data Product Modeling Unique behavioural signature

Discarded Telecom data--> Actionable Security patterns

Case Study-4 : Mobile forensics

Mobile funnel data

Analyzing Mobile Sub Channel Behavioural shift to Drive revenues for a leading online travel company

What's the problem being solved ?

• More applications becoming mobile

• There is a dip in transaction completion rate

• Friction points and hot spots exist

• No way to “see” these hot spots and patterns

• Spot friction points

• Mobile funnel drops

• Payment gateway drops

• Airline connector drops

Funnel Analysis
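A funnel analysis of this kind boils down to the drop rate between consecutive stages. The sketch below is illustrative only; the stage names and session counts are hypothetical and not taken from the case study.

```python
def funnel_drop_rates(stage_counts):
    """stage_counts: ordered list of (stage name, sessions reaching that stage)."""
    rates = []
    for (prev_stage, prev_n), (stage, n) in zip(stage_counts, stage_counts[1:]):
        drop = 0.0 if prev_n == 0 else 1 - n / prev_n
        rates.append((f"{prev_stage} -> {stage}", round(drop, 3)))
    return rates

# Hypothetical mobile booking funnel.
funnel = [("search", 1000), ("select_flight", 400), ("payment_gateway", 200), ("confirmed", 150)]
rates = funnel_drop_rates(funnel)

# The transition with the largest drop is the first friction point to inspect.
worst_step = max(rates, key=lambda item: item[1])
```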

Churn Scoring Model

Case Study-5 : Money transmission

Minimizing fund leakages to watched entities

Money transmission event stream Threat matrix Graph Analysis

Money transmission behavioral modeling

Modeling money transmission behavior

Graph analysis to monitor money transmission patterns

• Each account can be modelled as a node in a graph

• Behaviour across nodes can be analyzed

• Proxy behaviours can be easily discerned
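As a sketch of the graph idea, the code below treats each account as a node and each transfer as a directed edge, then looks for senders whose money reaches a watched account through one intermediary. The one-hop definition of "proxy", the function name, and the sample transfers are all assumptions.

```python
from collections import defaultdict

def senders_via_proxy(transfers, watched):
    """Return {sender: set of proxy accounts} for funds reaching a watched account in two hops."""
    outgoing = defaultdict(set)
    for src, dst, _amount in transfers:
        outgoing[src].add(dst)
    # A proxy is an unwatched account that sends directly to a watched account.
    proxies = {acct for acct, dsts in outgoing.items() if dsts & watched and acct not in watched}
    result = {}
    for src, dsts in outgoing.items():
        if src in watched:
            continue
        used = dsts & proxies
        if used:
            result[src] = used
    return result

# Hypothetical transfers: (from account, to account, amount).
transfers = [("acct1", "acct2", 500), ("acct2", "W1", 480), ("acct3", "acct4", 100)]
suspects = senders_via_proxy(transfers, watched={"W1"})
```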

7 Key “gotchas” ( best practices)

Lesson-1 : Think “Polyglot persistence”

Logical Business Model: Asset, Sensor, Parameters, Asset tags, Sensor tags, Events

• Column family ( Hbase/Cassandra ) – Heavy duty write workloads
• Document db ( Mongo ) – Photos, Videos, text
• Graph db ( Neo4j ) – Inter relationships
• RDBMS ( Oracle ) – Low velocity self service

“Different strokes for different folks”

Lesson-2 : Think “pattern extraction”

1. Collaborative filtering

2. Text Mining

3. Scoring Models ( Logistic etc )

Embedding one ML process can help SPOT patterns not previously seen
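For the scoring-model item, a logistic score is a weighted sum of features passed through the sigmoid function. The sketch below shows only the scoring step; the feature names and weights are made up for illustration and are not fitted to any data.

```python
import math

def logistic_score(features, weights, bias=0.0):
    """Return a score between 0 and 1 from a weighted sum of features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights; in practice these would be learned from labelled history.
weights = {"late_night_calls": 0.8, "calls_to_watched_countries": 1.5}
quiet = logistic_score({"late_night_calls": 0, "calls_to_watched_countries": 0}, weights, bias=-3.0)
active = logistic_score({"late_night_calls": 2, "calls_to_watched_countries": 3}, weights, bias=-3.0)
```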

Lesson-3 : Think “Baby steps”

• 60-90 day Hadoop Sandbox

• Build quick wins to build momentum

• Pick a few low hanging use cases to demonstrate impact

No Big Bang !

Lesson-4 : Think “Data Products”

• Data Product = “Action an end user takes”

• EXAMPLE

• Watch List recommender vs tons of “feel good” graphs

• Next best action vs lots of dials, graphs

Focus on Outcomes more than Analysis

Lesson-5 : Think “MVP-Minimum Viable Product”

• Minimalist ... Key is to start simple

• Only core features ... No bells and whistles

• Get feedback from early adopters and enrich features

How can Big Data co-exist with existing DW solutions ?

Existing DW
• Sources: OSS, BSS, CRM
• ETL
• Existing BI tools

Big Data
• Sources: Radius logs, IP traffic logs, Comments
• File copy / Bulk load / Agent based
• Operational App Integration


Lesson-6 : Gracefully Co-exist

Lesson-7 : Think “Biz backward … NOT Tech forward”

1. What is the business problem you are solving ? Tightly framed ?

2. Why is it important to solve this problem ?

3. What happens if we don't solve this problem ?

4. Is status quo an option ?

5. Is the business pain acknowledged ?

6. How would the end user “feel” when the product is deployed ?

7. Are budgets allocated ?

8. What is the actual use case to solve the pain ?

Connect with business @ a deeper level !

To summarize !

1. Think “Polyglot Persistence”

2. Think “Pattern Extraction”

3. Think “Crawl-Walk-Run”

4. Think “Data Products”

5. Think “MVP”

6. Think “Co-existence”

7. Think “Business Impact/Outcomes”

Taming and channelising the data beast is going to be a crucial capability for survival !

Please feel free to reach out …

Derick.jose@fluturasolutions.com
