splunk live university of alberta 2015

Greg Dostatni Team Lead, Application Hosting Splunk at the University of Alberta Copyright © 2015 Splunk Inc.

Post on 15-Aug-2015

TRANSCRIPT

Page 1: Splunk live university of alberta 2015

Greg Dostatni, Team Lead, Application Hosting

Splunk at the University of Alberta

Copyright © 2015 Splunk Inc.

Page 2: Splunk live university of alberta 2015

• At U of A since 2007

• Responsible for 10-person team managing applications and databases university-wide

• Splunk user since 2013

• I’ve eaten BBQ chicken intestines on a stick. Yummy.

• splunk> take the sh out of IT

Page 3: Splunk live university of alberta 2015

The University of Alberta

• Public research university based in Edmonton and founded in 1908

• 39,000+ students and 18,000 employees

• 5 campuses and 18 faculties

• One of the top 100 universities worldwide

Page 4: Splunk live university of alberta 2015

IT at the University of Alberta

Central IT group for authentication, wireless and core services

Independent IT groups for most faculties and departments

University-wide initiative to consolidate more of IT

Need to standardize IT operations and tame diverse technology stacks

Page 5: Splunk live university of alberta 2015

Application Hosting Objectives

• Centralize more of IT

• Build and manage shared environments

• Develop custom services as needed

• Roll out/upgrade applications

• Investigate performance problems

Diagram labels: IT; Libraries; LMS; Public website + CMS; Ticketing; Billing systems; Research group servers; Other applications and databases

Page 6: Splunk live university of alberta 2015

Challenges after Restructuring IT

• More interdependencies among teams

• Massive volume of data, housed in silos

• “Running blind” – no understanding of the data

• Time-consuming to gather data for incidents

Page 7: Splunk live university of alberta 2015

Splunk Timeline

Sept. 2013

• Pilot deployed

• Splunk as syslog target

• Log aggregation test; no need for backup

March 2014

• Data loss concerns from restarting Splunk

• Management relying on Splunk reports

• Splunk not in production

Sept. 2014

• Management notification of syslog data loss

• Incidents escalated

• Splunk in production?

April 2015

• Funding to rebuild Splunk environment

• New hardware, clustering with dedicated storage

• 400 data sources

• 133 sourcetypes

Page 8: Splunk live university of alberta 2015

Splunk at the University of Alberta

Infrastructure Applications (mail, authentication)

Networking and Security (switches, IPS)

Application Hosting (apps, databases)

Page 9: Splunk live university of alberta 2015

Example: Troubleshooting Authentication Systems

Before

• 12GB/day, 20 machines

• No aggregation

• Reactive issue response based on user feedback

• Manual investigations

• Delay in getting data

After

• Centralized data

• ½ hour to troubleshoot

• Proactive alerts for issues

• Easy access to infrastructure data

• Real-time reporting

Page 10: Splunk live university of alberta 2015

Example: Performance Monitoring

Track and correlate request response times to gauge user satisfaction

Page 11: Splunk live university of alberta 2015

Example: First Responders App

Dashboards for initial incident review

Page 12: Splunk live university of alberta 2015

Example: Proactive Alerts

Trigger alerts on both the count and percentage of messages
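The slide does not show the search behind this alert, so the following is only a minimal Python sketch of the idea, not the presenter's implementation. It assumes the alert fires when error messages are both numerous in absolute terms and a meaningful share of all messages; the function name `evaluate_alert` and the thresholds `min_count` and `min_percent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertDecision:
    error_count: int
    error_percent: float
    triggered: bool


def evaluate_alert(total_messages, error_messages, min_count=50, min_percent=5.0):
    """Decide whether to alert, using hypothetical count and percentage thresholds.

    Assumption: both conditions must hold, so a quiet system with a few
    errors and a busy system with a tiny error rate both stay silent.
    """
    if total_messages:
        percent = 100.0 * error_messages / total_messages
    else:
        percent = 0.0  # no traffic at all: nothing to compare against
    triggered = error_messages >= min_count and percent >= min_percent
    return AlertDecision(error_messages, percent, triggered)
```

Checking both numbers is one way to reduce noise: a count-only threshold fires during normal peak load, while a percentage-only threshold fires when a handful of messages fail overnight.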

Page 13: Splunk live university of alberta 2015

Example: Executive Dashboards

Page 14: Splunk live university of alberta 2015

Splunk Deployment Takeaways

Successes

• Visibility cutting through team boundaries

• More advanced initial incident investigation

• Openness - signed standard IT agreement for access to Splunk data

• Management loves reports

• Defusing situations with rapid access to facts

Challenges

• Accepting syslog data directly

• Log standardization

• Figuring out what to look at in the logs to understand “good” system behavior

Page 15: Splunk live university of alberta 2015

Aha! Moments

Transactions

• End-to-end monitoring of 4M+ email messages per day (greylisting, spam filtering, Google)

• Used transactions to combine logs across systems into single, message-centric log

• Ability to easily search for anomalies
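To illustrate what a message-centric log means here, the sketch below groups events from several systems by a shared message identifier. It is written in plain Python rather than as a Splunk search, and the field names `message_id`, `system`, and `time`, along with both function names, are hypothetical; the slides do not show the actual fields or query.

```python
from collections import defaultdict


def build_transactions(events):
    """Group events that share a message_id and order each group by time."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["message_id"]].append(event)

    transactions = {}
    for message_id, items in grouped.items():
        items.sort(key=lambda e: e["time"])
        transactions[message_id] = {
            "stages": [e["system"] for e in items],
            "duration": items[-1]["time"] - items[0]["time"],
            "event_count": len(items),
        }
    return transactions


def find_incomplete(transactions, expected_stages):
    """Return ids of messages that did not pass through every expected stage."""
    expected = set(expected_stages)
    return sorted(
        message_id
        for message_id, tx in transactions.items()
        if not expected <= set(tx["stages"])
    )
```

Once each message is a single record, anomalies such as a message that never reached the last stage, or one with an unusually long duration, become simple filters over the grouped data.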

Generic Alerts

• Created alert to catch errors across systems in real time

• Used existing alert and removed host specification to create the generic alert

• Catches errors that were not in Splunk at the moment the alert was created

10-second Query

• 10-second window = ~35,000 events

• Statistics to rank likely events triggering issues

• New Splunk window to analyze unusual messages

• Ability to examine small slice of time in detail while running statistics over longer period of time
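The slides do not say which statistics were used to rank events, so the sketch below shows one possible approach under stated assumptions: compare how often each message appears in the short window with how often it appears over a longer baseline period, and rank by the ratio. The function name `rank_unusual` and the `smoothing` parameter are hypothetical, and the inputs are assumed to be messages already normalized into comparable strings.

```python
from collections import Counter


def rank_unusual(window_messages, baseline_messages, smoothing=1.0):
    """Rank messages in a short window by how over-represented they are
    relative to a longer baseline period (highest ratio first)."""
    window = Counter(window_messages)
    baseline = Counter(baseline_messages)
    window_total = sum(window.values())
    baseline_total = sum(baseline.values())
    vocabulary = len(set(window) | set(baseline))

    scores = []
    for message, count in window.items():
        # Additive smoothing keeps messages unseen in the baseline
        # from causing a division by zero.
        window_rate = (count + smoothing) / (window_total + smoothing * vocabulary)
        baseline_rate = (baseline[message] + smoothing) / (
            baseline_total + smoothing * vocabulary
        )
        scores.append((message, window_rate / baseline_rate))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Messages near the top of the ranking are common in the 10-second slice but rare over the longer period, which makes them reasonable first candidates to inspect in a separate search window.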

Page 16: Splunk live university of alberta 2015

“Splunk allows us to erase these lines and any analyst can see all the data from anywhere and investigate a problem from end to end.”

Page 17: Splunk live university of alberta 2015

Thank you