Big Data Approaches to Cloud Security

Posted on 15-Jan-2015






Slides of a talk given to the Seattle Chapter of the Cloud Security Alliance. The talk looks briefly at architectures, sources of log data, behavioral signatures in the data, and issues and observations around using Big Data products for security.


1. BIG DATA APPROACHES TO CLOUD SECURITY
  Paul Morse, President, WebMall Ventures
  Cloud Security Alliance, Seattle Chapter, 3/28/2013

2. (Opening quotes)
  "Big Data is not just about lots of data, it is about having the ability to extract meaning; to sort through the masses of data elements to discover the hidden pattern, the unexpected correlation," Art Coviello, executive chairman of RSA
  "On the surface, Big Data seems to be all about business intelligence and analytics, but it also affects the nitty-gritty of power and cooling, networking, storage and data center expansion."

3. AGENDA
  • Observations
  • Cloud Architectures/Components
  • Machine-Generated Data
  • Sources of Data
  • Time Sequencing of Events
  • Searching for Behavior
  • Recent Hack Examples

4. OBSERVATIONS
  • Big Data solutions are changing the game for security practitioners and execs
    • They provide the ability to look at discovery, detection and remediation across large portions of the organization in entirely new ways
    • Correlation between seemingly unrelated events in near real time is now relatively easy
  • Growing range of solution types, from simple to highly complex
    • Roll-your-own to pre-packaged solutions
    • On-prem, public cloud-based and hybrid
    • Simple log search to predictive analysis with complex dashboards and reporting
    • Some solutions have extremely short time-to-value propositions
  • "Big Data washing", like "cloud washing", is showing up
  • Prices vary: free to mondo
  • It is NOT the holy grail for security, but it has many advantages over traditional SIEM products: real time, large amounts of data, broad event correlation, etc.

5. SET THE STAGE
  • Many perspectives to cloud computing
  • Main focus for this talk is as a public cloud provider
  • You are the owner of the facility: all of it
  • Infrastructure-centric discussion
  • How do Big Data solutions improve security?

6. YOUR CLOUD DATACENTER

7. DATA SOURCES
  [Diagram; labels recovered from the slide, groupings approximate: SCADA, Backup Generators, Backup Batteries, Door Sensors, Wireless Devices, RFID, PCs, Tablets, Phones, Printers, Card Key, Power Distribution Systems, Storage, Servers, Routers/Switches, Temp Sensors, Water System, Lighting Controls]
  • This is your attack surface
  • I want all the data in one searchable repository and available in near real time

8. SECURE? THINK AGAIN.
  • Internet Mapping Project: "harmless" port ping and bot install
  • 660 million IPs with 71 billion ports tested
  • 460 million devices responded
  • Resulted in 420 thousand bots
  • Stupid uid/pwd combos: admin/admin, admin/no pwd, root/root, root/no pwd
  • What's on your network?

9. CAUSE FOR PAUSE
  "We hope other researchers will find the data we have collected useful and that this publication will help raise some awareness that, while everybody is talking about high class exploits and cyberwar, four simple stupid default telnet passwords can give you access to hundreds of thousands of consumer as well as tens of thousands of industrial devices all over the world."

10. MACHINE DATA
  • Isn't it really all machine data?
  • Machine-generated data (MGD) is the generic term for information which was automatically created from a computer process, application, or other machine without the intervention of a human.
    • Network device log files
    • Event logs
    • Application logs
    • RFID logs
    • Storage logs
    • HVAC logs
    • Sensor data
    • Etc.

11. MACHINE DATA EXAMPLES
  Apache
    [Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] [client] File does not exist: /usr/local/apache2/htdocs/favicon.ico
  Juniper
    Sep 10 07:06:45 host rpd[6451]: bgp_listen_accept: Connection attempt from unconfigured neighbor: 10 07:07:53 host login: 2 LOGIN FAILURES FROM 10 07:08:25 host inetd[2785]: /usr/libexec/telnetd[7251]: exit status 0x100
  Oracle/Siebel
    SQLParseAndExecute Statement 4 0 2003-05-13 14:07:38 select ROW_ID, NEXT_SESSION, MODIFICATION_NUM from dbo.S_SSA_ID
  IIS
    192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1,, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,
    , anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1,, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,
  Card Reader
    10/23/04 06:16:32,Administrator,00000101,Anderman,Penny,00026,01000,10/22/2005
    10/23/04 06:16:32,West Gate,00000100,Peterson,Bob,00954,01000,10/21/2005

12. TIME SEQUENCE OF EVENTS
  [Timeline diagram. Event labels: Login Attempt, Pass, Fail, Command, Upload Small File, Installer runs, Delete logs, Terminate Sess, Outbound Traffic. Components: IP Address/Packet, Front end, LB, TOR, Server. Time axis: T0 to T19.]

13. TIME SEQUENCE OF EVENTS
  [Timeline diagram. Event labels: Login Attempt, Pass, Fail, Command, Upload Small File, Update, Delete logs, Terminate Sess. Components: IP Address/Packet, Front end, LB, TOR, Device. Time axis: T0 to T18.]

14. TIME SEQUENCE OF EVENTS
  [Timeline diagram. Event labels: Login Attempt, Pass, Fail, Command, Upload Small File, Update, Delete logs, Terminate Sess. Components: IP Address/Packet, Device. Time axis: T0 to T18. Second panel: Door 1 to Door 5, with time axis T-30 to T45 in steps of 15.]

15. SOME AREAS TO CONSIDER
  • Ingesting various data formats
    • Many vendors claim it is easy, when it may not be
    • Transforms and connectors may be required (and affect performance)
    • Device companies create add-ons, connectors, dashboards, transforms, queries, etc.
  • Speed of indexing determines real-time abilities
    • Do you need to index ALL machine data?
  • Vendor-specific query languages
    • No standard, some commonality
    • Learning curve for seriously complex queries and for operationalizing the environment
  • Dashboards and visualizations vary
  • A large number of simultaneous queries is required
  • Workflow is critical: what happens when you find something?
  • Implementation architecture: lots of hardware? Bandwidth? Security? Users?
  • Data governance: you found what?

16. HACK EXAMPLES
  • DOJ in January: defacement
    • What specific behavior happened and what did they do?
      • Log in remotely
      • Completely replace index.*
    • Solution: monitor index.*, set up a parsing stream and search for a code in the HTML. Call a workflow if the file changes or the code doesn't match.
  • DDoS: overwhelm the website
    • Solution: compare the request rate of increase to a previous norm. If the disparity is great enough, call a workflow to check the IP addresses of the source(s). Depending on the results, do nothing or script a filter or block.

17. VENDORS AND GETTING STARTED
  • Vendors: Hadoop with Flume, HP ArcSight, Loggly, LogRhythm, LogScape, LogStash, Sawmill, Splunk, Splunk Storm, Sumo Logic
  • Getting started
    • Easiest, cloud based: Loggly, Sumo Logic, Splunk Storm
    • Download and install: LogRhythm, LogScape, LogStash, Sawmill, Splunk
    • Hadoop/Flume/Pig
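
The two remedies described under HACK EXAMPLES (slide 16) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any of the listed products implement it. The function names, the `site-code` marker string, the SHA-256 baseline, and the 5x threshold are all assumptions made for the example. The DDoS check is also simplified to a ratio of the current request rate to a previous norm.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def check_index_page(html: bytes, baseline_sha256: str, marker: bytes) -> Optional[str]:
    """Defacement check: return a reason if the page looks tampered with, else None.

    `marker` is a hypothetical code embedded in the legitimate HTML;
    `baseline_sha256` is the hash of the last known-good copy of index.*.
    """
    if marker not in html:
        return "marker missing"
    if hashlib.sha256(html).hexdigest() != baseline_sha256:
        return "content changed"
    return None


def rate_is_abnormal(current_rps: float, baseline_rps: float, max_ratio: float = 5.0) -> bool:
    """DDoS check: True when the current request rate exceeds the previous norm
    by more than `max_ratio` (an assumed threshold, to be tuned per site)."""
    if baseline_rps <= 0:
        return current_rps > 0
    return current_rps / baseline_rps > max_ratio


def top_sources(source_ips: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Follow-up step: the busiest source IPs, for a workflow to filter or block."""
    return Counter(source_ips).most_common(n)


if __name__ == "__main__":
    good = b"<html><!-- site-code:1234 --><body>Welcome</body></html>"
    baseline = hashlib.sha256(good).hexdigest()

    # A replaced index page loses the embedded code, so a workflow would be called.
    print(check_index_page(b"<html>hacked</html>", baseline, b"site-code:1234"))

    # A sixfold jump over the norm triggers the source-IP check.
    if rate_is_abnormal(current_rps=600, baseline_rps=100):
        print(top_sources(["10.0.0.1", "10.0.0.2", "10.0.0.1"], n=1))
```

In a real deployment the inputs would come from the indexed log stream rather than from in-memory values, and a non-`None` result or an abnormal rate would trigger whatever workflow the chosen product provides.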