Log Analysis Paralysis - meetings.internet2.edu
Post on 07-Aug-2019
Log Analysis Paralysis
Richard Biever, CISO, Duke University
Michael Grinnell, Deputy CISO, University of Virginia
Jimmy Lummis, CISO, Georgia Institute of Technology
Mark McCahill, Architect, Duke University
History
Manual searches (grep)
Text searches
• Great for incidents, speed of search. Bad if logs spread across multiple machines.
SIEMs (QRadar, Nitro, LogRhythm, ArcSight)
• Good for compliance, but generally security-focused, closed, and required heavy investments in professional services or FTE to tune/run.
Home grown/Open source
• Met needs but required ongoing investments to support.
SIEMs vs. Log Analysis
What do we need?
Log aggregation with search and alerting BEFORE SIEM (e.g. Splunk!)
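As a rough illustration of what "log aggregation with search and alerting" means at its smallest scale, the sketch below pools lines from several hosts, searches them with a regular expression, and raises an alert when a source crosses a threshold. The host names, sample lines, and threshold are made up for illustration, and the pattern assumes the usual OpenSSH "Failed password" message format; a real deployment would rely on a log platform for this.

```python
import re
from collections import Counter

# Assumed format: OpenSSH "Failed password for [invalid user] NAME from IP ..."
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def failed_logins_by_source(logs_by_host):
    """Aggregate: count failed logins per source IP across all hosts."""
    counts = Counter()
    for lines in logs_by_host.values():
        for line in lines:
            match = FAILED_LOGIN.search(line)
            if match:
                counts[match.group(2)] += 1
    return counts

def sources_over_threshold(counts, threshold):
    """Alert: return the source IPs seen at least `threshold` times."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)

# Hypothetical sample data standing in for logs collected from two machines.
sample_logs = {
    "web01": [
        "sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4022 ssh2",
        "sshd[812]: Failed password for root from 203.0.113.5 port 4023 ssh2",
        "sshd[813]: Accepted publickey for alice from 198.51.100.7 port 5001 ssh2",
    ],
    "db01": [
        "sshd[120]: Failed password for root from 203.0.113.5 port 4100 ssh2",
        "sshd[121]: Failed password for bob from 198.51.100.9 port 4101 ssh2",
    ],
}

counts = failed_logins_by_source(sample_logs)
alerting_sources = sources_over_threshold(counts, threshold=3)
```

Counting across hosts is the point: no single machine in the sample sees three failures from one address, which is the limitation of per-machine grep noted under History.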
The good
Built a relationship with higher ed; met needs of broad IT, good integration, visualizations, alerting
The not so good
ESM was high maintenance, performance tightly tied to hardware
The ugly
Cost model for large amounts of data (>1 TB/day)
Maturing the space
We have a wealth of network, security, and server operational data available to us
Need real-time updates, big data scale, flexible analysis and visualization tools spanning logs
Vendor licenses based on data volume limit our options, particularly around high volume data
What tools and architectures make sense?
Business drivers/requirements
Financial
• TCO must include licensing/software, infrastructure and FTE
• Have to account for analysis in addition to infrastructure
• Costs should not penalize based on increased consumption
• Research and academic consumption must be taken into account
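The financial points above can be made concrete with a small TCO calculator. This is only a sketch: the class, its field names, and every dollar figure below are invented placeholders, not numbers from any of the institutions. It shows how a per-volume license makes cost grow with consumption while a flat model does not.

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    """Hypothetical annual cost model; all fields are illustrative assumptions."""
    name: str
    license_per_gb_day: float  # annual license cost per GB/day of ingest
    flat_license: float        # annual license cost independent of volume
    infrastructure: float      # annual hardware/cloud cost
    fte_count: float           # staff needed to run and analyze
    fte_cost: float            # annual cost per FTE

    def annual_tco(self, gb_per_day: float) -> float:
        licensing = self.flat_license + self.license_per_gb_day * gb_per_day
        return licensing + self.infrastructure + self.fte_count * self.fte_cost

# Placeholder numbers chosen only to show the shape of the comparison.
volume_licensed = CostModel("volume-licensed", 500.0, 0.0, 100_000.0, 1.0, 120_000.0)
open_stack = CostModel("open source", 0.0, 0.0, 250_000.0, 3.0, 120_000.0)

for gb in (100, 1000):
    print(gb, volume_licensed.annual_tco(gb), open_stack.annual_tco(gb))
```

With these placeholder inputs the volume-licensed model is cheaper at 100 GB/day and more expensive at 1000 GB/day, which is the kind of crossover a TCO comparison needs to surface.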
Infrastructure
• Support on-prem or cloud (virtualized!)
• Stream data in/out of the system
• Support data compression
• Support structured and unstructured data
• Interoperability with a variety of sources and clients
• Hot/warm/cold storage and retention
• Security controls (access control, transport security, integrity)
• Interactive searching, alerting and API support
• Data subscription and anonymization
• Automation integration
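One way the anonymization requirement could be met before log data is shared with subscribers is keyed pseudonymization. The sketch below is an assumption about approach, not something described in the slides: it replaces IPv4 addresses with an HMAC-SHA256 token so the same address always maps to the same token under a given key, while the address itself is not exposed. The function name, token prefix, and key are hypothetical.

```python
import hashlib
import hmac
import re

# Loose IPv4 pattern; it does not validate that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def pseudonymize_ips(line: str, key: bytes) -> str:
    """Replace each IPv4 address with a stable keyed token."""
    def replace(match: re.Match) -> str:
        digest = hmac.new(key, match.group(0).encode("ascii"), hashlib.sha256)
        return "ip-" + digest.hexdigest()[:12]
    return IPV4.sub(replace, line)

key = b"example-key-rotate-me"  # hypothetical; keep real keys in a secret store
original = "Accepted publickey for alice from 198.51.100.7 port 5001"
masked = pseudonymize_ips(original, key)
```

Because the mapping is stable per key, analysts can still count or correlate events per source; rotating the key breaks linkage between releases.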
Functional requirements
Data Analytics
• Support for variety of data analytics tools/processes
• Support for integration with visualization tools
• Support for integration with machine learning modules
Customer base
Security
IT – Operational
IT – Compliance
Data analytics (academic and administrative)
Research
Case Study – Georgia Tech’s approach
1. Build requirements and use-cases for an enterprise log management service
2. Issue RFI and analyze responses from vendors
3. Perform POC testing of most viable solutions to ensure scalability and fit for use
4. Identify best procurement approach
Most Critical Requirements
MUST ALLOW IT OPERATIONS EFFICIENCY GAINS IN A SECURE, LEAST-PRIVILEGE MANNER
MUST BE USABLE AND ALLOW FOR RESEARCH USE AS WELL AS OTHER UNIVERSITY OPERATIONS
COST MODEL MUST FIT HIGHER EDUCATION BUDGETS AND NOT RESTRICT USAGE (PREFER ALL YOU CAN EAT)
Case Study - Duke’s approach
1. Migrate to open architecture with open source tools
2. Phased migration: move network, security, and system logs first
3. Replace Splunk dashboards with Jupyter notebooks, Python visualization tools
4. …meanwhile, legacy Splunk dashboard can submit queries to Spark backend
Architecture (spring)
Diagram components:
• Apache Spark cluster
• Jupyter notebooks
• HDFS filesystem
• Flume
• archival data registry (QuiltData)
• StreamSets
• network & ITSO logging
• Linux/Windows logging
• Livy REST API
• RStudio Shiny
• others…
• syslog
• realtime, low latency
• Kafka message queue
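The diagram lists a Livy REST API in front of the Spark cluster, which is how an external client can submit work to Spark over HTTP. The sketch below only builds such a request and does not send it. It assumes Livy's documented `POST /sessions/{id}/statements` endpoint with a JSON body containing `code` and `kind`; the host name, session id, and `syslog` table are hypothetical.

```python
import json
from urllib.request import Request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.edu:8998"

def build_statement_request(session_id: int, code: str, kind: str = "pyspark") -> Request:
    """Build (but do not send) a request that runs `code` in an existing Livy session."""
    body = json.dumps({"code": code, "kind": kind}).encode("utf-8")
    return Request(
        url=f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed table name `syslog`; the query counts events per host.
query = 'spark.sql("SELECT host, count(*) AS n FROM syslog GROUP BY host").show()'
request = build_statement_request(0, query)
```

Passing `request` to `urllib.request.urlopen` would submit the statement; the client would then poll the statement URL returned by Livy for the result.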
Dashboard tooling
• Strategy: use mainstream data science tools
  • Python, SQL, Jupyter notebooks, RStudio Shiny
• Python visualization / interactive dashboard tools:
  • tabular data: Qgrid (scroll / sort / filter)
  • graphics: PyViz (HoloViews, hvPlot, Panel)
Archival dataset packaging
• Package data + metadata for long-term storage
  • package registry, versions, checksums, etc.
  • registry front-ends an object store (AWS S3)
  • columnar data format via Apache Arrow
• QuiltData: “Docker for data”
  • https://quiltdata.com/
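To illustrate the idea of packaging data plus metadata with versions and checksums, here is a minimal standard-library sketch. It is not the QuiltData API; the manifest layout, function names, package name, and metadata fields are assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path, name: str, version: str, metadata: dict) -> dict:
    """Describe every file under `root` with its relative path, size, and checksum."""
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "size": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {"name": name, "version": version, "metadata": metadata, "files": files}

# Usage with a throwaway directory and a hypothetical package name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2019-08").mkdir()
    (root / "2019-08" / "syslog.txt").write_bytes(b"example log line\n")
    manifest = build_manifest(
        root, "logs/syslog-archive", "1.0.0", {"source": "syslog", "retention": "cold"}
    )
    manifest_json = json.dumps(manifest, indent=2)
```

Storing the manifest next to the data in the object store lets a later reader verify integrity by recomputing the checksums.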
Some not-so hidden benefits
ALIGNMENT WITH MAINSTREAM DATA SCIENCE TOOLS MEANS WE CAN LEVERAGE ADVANCED VISUALIZATION LIBRARIES AND ENGAGE RESEARCHERS
CENTRAL IT STAFF BECOMES CONVERSANT IN THE BIG DATA FRAMEWORKS CAMPUS RESEARCHERS WANT TO USE (EXAMPLES: GENOMICS, PRECISION MEDICINE, …)
BIG DATA SETS THE STAGE FOR USING ML (MACHINE LEARNING) TECHNIQUES TO AUTOMATE IT OPERATIONS
Case Study – UVA’s approach
1. Build requirements and use-cases for an expanded enterprise log management service
2. Perform extended POC of alternative correlation and visualization solutions
3. Conduct TCO and trade-off analysis; select final architectural solution
4. Implement new design and retire legacy systems
Additional requirements
• Publish/Subscribe to data from Splunk
• Multi-tenancy or federation
• Centralized log retention
Some not-so hidden benefits
STANDARD LOG FORMATS ALLOW CROSS-PLATFORM INTEGRATION
MULTI-TENANCY AND FEDERATION SUPPORT RESEARCH COLLABORATION
INTEGRATION WITH INCIDENT TRACKING AND AUTOMATION SYSTEMS FREES STAFF FROM DRUDGEWORK