splunk bsides

Why it matters to IT MACY CRONKRITE @MACYCRON www.facebook.com/safehex

Upload: macy-cronkrite

Post on 21-May-2015


TRANSCRIPT

Page 1: Splunk bsides

Why it matters to IT

MACY CRONKRITE

@MACYCRON
www.facebook.com/safehex

Page 2: Splunk bsides

Data Mining for Organization Value

Data you are already processing has value

• Audit trail & application status
• Automatic monitoring for errors and warnings
• Helping track down configuration problems
• Helping track down bugs
• Micro analysis of user behavior “click stream” and complex events
• No more email to “monitor a process”
• Get alerts only when something critically fails.

Page 3: Splunk bsides

What is it?

• Search and analysis engine
• Google-like search of your log data

Page 4: Splunk bsides

RDBMS??? ARE YOU KIDDING?

Organization Data is BIG DATA (velocity-variety-volume)

So Map Reduce – Key Value Pairs FTW!!!!

Old way RDBMS>>> New Way (Map Reduce)

Page 5: Splunk bsides

• Could be better than user supplied info? AKA tickets, complaints, unreported errors.

• Behavior Analysis (Good and Bad)

Page 6: Splunk bsides

Versions

• Free
– 500MB/day
– Reporting
– Ad-hoc search

• Enterprise (all of the above, and)
– 500MB/day and more!
– Access controls
– Distributed Search, Load Balancing
– Monitoring & Alerting

Page 7: Splunk bsides

Server 1

• Install Splunkd and SplunkWeb
• Via WebGUI under Manager tab
• Add Receiver Port to enable forwarders

Page 8: Splunk bsides

Setup 2
Forwarder Setup (most common)

• Server1
– Install Splunkd and SplunkWeb

• ServerX
– Install Splunkd

[Diagram: Server1 and several ServerX forwarders]

Page 9: Splunk bsides

MACHINE DATA

• Most sensors create log files
• Anything with a time-stamp
• Unstructured data (many source types)
• Anything that the system does on behalf of a user can be tracked, aggregated, and correlated across servers and applications
• At minimum two keys are needed:
– timestamp, and unique user session id.
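To see why those two keys are enough, here is a rough Python sketch that merges events from two made-up sources and groups them by session id in time order. The sample events, the field names (`ts`, `session`, `source`, `msg`) and the `correlate` helper are all invented for illustration; this is not how Splunk works internally.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events from two different servers; every value is invented.
web_log = [
    {"ts": "2015-05-21 10:00:05", "session": "abc", "source": "web", "msg": "GET /cart"},
    {"ts": "2015-05-21 10:00:01", "session": "abc", "source": "web", "msg": "login"},
]
app_log = [
    {"ts": "2015-05-21 10:00:03", "session": "abc", "source": "app", "msg": "load profile"},
    {"ts": "2015-05-21 10:00:02", "session": "xyz", "source": "app", "msg": "timeout"},
]

def correlate(*logs):
    """Group events from every log by session id, ordered by timestamp."""
    by_session = defaultdict(list)
    for log in logs:
        for event in log:
            by_session[event["session"]].append(event)
    for events in by_session.values():
        events.sort(key=lambda e: datetime.strptime(e["ts"], "%Y-%m-%d %H:%M:%S"))
    return dict(by_session)

sessions = correlate(web_log, app_log)
for event in sessions["abc"]:
    print(event["ts"], event["source"], event["msg"])
```

With only a timestamp and a session id, the web and app events for session "abc" line up into one story, regardless of which server logged them.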

Page 10: Splunk bsides
Page 11: Splunk bsides

Why --- Event Correlation

• It leverages a natural query language to perform searches and analysis of log files.
• A single search can cross multiple disparate logs looking for key words and other structures
• Splunk is licensed per volume of data indexed, not on a per server basis
• Build Apps (custom views) for specific ROLES

Page 12: Splunk bsides

Mix Human Event Reports AND Machine Events
Correlate your 1X / Base case instantly
LOGS are on all layers of your application stack
Alert when the combination of events meets criteria.
Less for a human to parse. Whew!!
Less data overload/ignore you won’t go back

Page 13: Splunk bsides
Page 14: Splunk bsides

What is Splunk?

• Sounds like it’s expensive or it takes weeks to set up.
• There’s a free license. It installs in 15 minutes. On your laptop, while you’re testing it out, search billions of events in seconds. When you’re ready, scale up to your datacenter and search trillions. Basic searching and quite a lot of the reporting will work right out of the box.

• Bullsxxx.

Well I’m not saying that 15 minutes in, it’s going to be emailing your boss a pdf pie chart of “lost revenue – top causes”. But that’s seriously possible in a couple of hours. Out of the box, Splunk will parse your data and extract out a lot of meaning, and if it doesn’t get everything, teaching it how to extract the juicy numbers and names from your events is really pretty straightforward. Then, once all the numbers and names are extracted and ready to be reported on, you’ll be able to do real searches and reports that help your people solve real problems. And when you get to that point, from then on it’s pretty much crack. My goal in this document is to get you addicted. Sorry.

• Download Splunk for free and try it for yourself from splunk.com, right now.

Page 15: Splunk bsides

Uses

• Right now we are using Splunk to calculate our VPN metrics for the Remote Access service

• Total sessions
– index="vpn" user authentication Successful | stats count AS Logins

• Unique users
– index="vpn" %ASA-6-113004 | rex field=_raw "user = (?<Username>.*)" | dedup Username | stats count AS UniqueUsers

• For information usage, “non ‘mm’ machines”
– index="vpn" Received request for DHCP hostname for DDNS | rex field=_raw "hostname for DDNS is: (?<Machine>.*)!" | eval Machine=lower(Machine) | search Machine!="mm*" | rex field=_raw "Username = (?<User>.*), IP" | table User, Machine
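For anyone who wants to see the "unique users" logic outside Splunk, here is a small Python sketch of the same idea: filter on a keyword, pull the username out with a named regex group, then count distinct values. The sample lines are invented and real VPN log messages may be laid out differently.

```python
import re

# Invented sample lines; the exact layout of real VPN log messages may differ.
raw_events = [
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = alice",
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = bob",
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = alice",
    "%ASA-6-722051: some unrelated event",
]

# Python spells named groups (?P<name>...); the rex searches above use (?<name>...).
pattern = re.compile(r"user = (?P<Username>.*)")

usernames = []
for line in raw_events:
    if "%ASA-6-113004" not in line:
        continue  # keyword filter, like putting the message id in the search
    match = pattern.search(line)
    if match:
        usernames.append(match.group("Username"))

# Counting distinct values plays the role of "dedup Username | stats count".
unique_users = len(set(usernames))
print(len(usernames), unique_users)
```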

Page 16: Splunk bsides

Transactions ACROSS devices

• Can we calculate the transaction duration IN SPLUNK, e.g. from the timestamp where a transaction starts to the timestamp where it ends? We can, IF we standardize on the keys for the start and end.

• This is a different approach to solving "duration"
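The duration idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming every device logs a shared session id plus a standardized "start" or "end" marker; the field names (`ts`, `session`, `marker`) and the `durations` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical events; "start" and "end" stand in for whatever standardized keys are chosen.
events = [
    {"ts": "2015-05-21 10:00:00", "session": "s1", "marker": "start"},
    {"ts": "2015-05-21 10:00:02", "session": "s2", "marker": "start"},
    {"ts": "2015-05-21 10:00:45", "session": "s1", "marker": "end"},
    {"ts": "2015-05-21 10:03:02", "session": "s2", "marker": "end"},
]

def durations(events):
    """Return seconds between the start and end marker for each session."""
    starts, result = {}, {}
    # Timestamps in this format sort correctly as plain strings.
    for e in sorted(events, key=lambda e: e["ts"]):
        when = datetime.strptime(e["ts"], "%Y-%m-%d %H:%M:%S")
        if e["marker"] == "start":
            starts[e["session"]] = when
        elif e["marker"] == "end" and e["session"] in starts:
            result[e["session"]] = (when - starts.pop(e["session"])).total_seconds()
    return result

print(durations(events))
```

The whole thing hinges on the start and end keys being the same on every device, which is exactly the standardization point above.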

Page 17: Splunk bsides
Page 18: Splunk bsides
Page 19: Splunk bsides

Index Volume

Page 20: Splunk bsides

Splunk Navigation and Basic Searching REVIEW

• Splunk comes with several Apps, but the only relevant one now is the 'Search' app, which is the interface for generic searching. To begin your Splunk search, type in terms you might expect to find in your data. For example, if you want to find events that might be HTTP 404 errors (i.e., webpage not found), type in the keywords:

• http 404

You'll get back all the events that have both HTTP and 404 in their text. Notice that search terms are implicitly AND'd together. The search was the same as "http AND 404". Let's make the search narrower:

• http 404 "like gecko"

Using quotes tells Splunk to search for the literal phrase "like gecko", which returns more specific results than just searching for "like" and "gecko", because the words must be adjacent as a phrase.

• Splunk supports the Boolean operators AND, OR, and NOT (must be capitalized), as well as parentheses to enforce grouping. To get all HTTP error events (i.e., status code other than 200), not including 403 or 404, use this:

• http NOT (200 OR 403 OR 404)

Again, the AND operator is implied; the previous search is the same as http AND NOT (200 OR 403 OR 404)

• Splunk supports the asterisk (*) wildcard for searching. For example, to retrieve events that have the 40x and 50x classes of HTTP status codes, you could try:

• http (40* OR 50*)
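Here is a toy Python model of the three searches above, just to make the Boolean logic concrete. The tokenizer is a deliberate simplification (split on anything that is not a letter or digit) and the sample events are invented; Splunk's real matching rules are more involved.

```python
import re
from fnmatch import fnmatchcase

def tokens(event):
    """Split an event into lowercase alphanumeric tokens (a simplification)."""
    return [t for t in re.split(r"[^a-z0-9]+", event.lower()) if t]

def has(event, term):
    """True if any token matches the term; '*' acts as a wildcard."""
    return any(fnmatchcase(tok, term.lower()) for tok in tokens(event))

# Invented sample events.
events = [
    "GET /index.html HTTP/1.1 200",
    "GET /missing HTTP/1.1 404",
    "GET /admin HTTP/1.1 500",
    "POST /login HTTP/1.1 403",
]

# http 404  (terms are implicitly AND'd)
both = [e for e in events if has(e, "http") and has(e, "404")]

# http NOT (200 OR 403 OR 404)
other = [e for e in events
         if has(e, "http") and not (has(e, "200") or has(e, "403") or has(e, "404"))]

# http (40* OR 50*)
errors = [e for e in events
          if has(e, "http") and (has(e, "40*") or has(e, "50*"))]

print(both, other, errors, sep="\n")
```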

Page 21: Splunk bsides

Intermediate Searching

• Splunk's search language is much more powerful than you think it is. So far we've only been talking about 'search', which retrieves your indexed data, but there are dozens of other operations you can perform on your data. You can "pipe" (i.e., transfer) the results of a search to other commands to filter, modify, reorder, and group your results.

• If Google were Splunk, you'd be able to search the web for every single page mentioning your ex-girlfriends, extract out geographical information, remove results without location info, sort the results by when they were written, keeping only the most recent page per ex-girlfriend, and finally generate a state by-state count of where Mr. Don Juan's ladies currently live. But Google isn't Splunk, so good luck with that.

• Let's do something similar, though, with our web data: let's find some interesting things about URIs that have 404s. Here's our basic search:

• status=404
• Now let's take the result of that search and sort the results by URI:
• status=404 | sort - uri
• That special "pipe" character ("|") says "take the results of the thing on the left and process it, in this case, with the 'sort' operator".
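If the pipe idea is new to you, think of each command as a function that takes a list of events and returns a new list. The Python sketch below models the search above that way. The helper names (`search`, `sort_desc`, `pipe`) and the sample events are made up, and it assumes the minus sign in the sort asks for descending order.

```python
from functools import reduce

# Invented sample events; field names mirror the search above.
events = [
    {"status": 404, "uri": "/a"},
    {"status": 200, "uri": "/b"},
    {"status": 404, "uri": "/c"},
]

def search(field, value):
    """Stage that keeps events whose field equals value."""
    return lambda evts: [e for e in evts if e.get(field) == value]

def sort_desc(field):
    """Stage that orders events by field, highest first."""
    return lambda evts: sorted(evts, key=lambda e: e[field], reverse=True)

def pipe(evts, *stages):
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, evts)

# status=404 | sort - uri
result = pipe(events, search("status", 404), sort_desc("uri"))
print([e["uri"] for e in result])
```

Adding another command to the search is just adding another stage to the call.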

Page 22: Splunk bsides

Splunk Navigation and Basic Searching REVIEW

• Wildcards can appear anywhere in a term, so "f*ck" will return all events with fack, feck, fick, fock, or flapjack, among others. A search for “*” will return all events. Note that in these searches we’ve been playing fast and loose with precision. Any event that has 50 in it (e.g. “12:18:50”) would also unfortunately match. Let’s fix that.

• When you index data, Splunk automatically adds fields (i.e., attributes) to each of your events. You can always add your own extraction rules for pulling out additional fields. To narrow results with a search, just add attribute=value to your search:

• sourcetype=access_combined status=404

• This search shows a much more precise version of our first search (i.e., "http 404") because it will only return events that come from access_combined sources (i.e., webserver events) and that have a status code of 404, which is different than just having a 404 somewhere in the text. In addition to <attribute>=<value>, you can also do != (not equals), and <, >, >=, and <= for numeric fields.
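The difference between "404 somewhere in the text" and "status is 404" is easy to show with a small Python sketch. The events below are invented, with the fields already extracted into a dict, and the substring check is only a crude stand-in for keyword search.

```python
# Invented events: the raw text plus fields that were extracted from it.
events = [
    {"_raw": '10.0.0.1 [12:18:50] "GET /ok" 200 404',
     "sourcetype": "access_combined", "status": 200, "bytes": 404},
    {"_raw": '10.0.0.2 [12:19:02] "GET /missing" 404 512',
     "sourcetype": "access_combined", "status": 404, "bytes": 512},
    {"_raw": "May 21 12:20:00 host sshd: error after 404 retries",
     "sourcetype": "syslog"},
]

# Keyword style: anything with 404 in the text matches, including false positives.
keyword_hits = [e for e in events if "404" in e["_raw"]]

# sourcetype=access_combined status=404
field_hits = [e for e in events
              if e.get("sourcetype") == "access_combined" and e.get("status") == 404]

# status>=400 (numeric comparison on a field)
numeric_hits = [e for e in events if e.get("status", 0) >= 400]

print(len(keyword_hits), len(field_hits), len(numeric_hits))
```

The keyword version picks up a byte count and a syslog message that have nothing to do with a missing page; the field version returns only the real 404.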

Page 23: Splunk bsides

Continued

• status=404 | top 5 referer_domain | search count>2

OK math geeks, supposing you want to calculate a new field based on other fields, you can use the 'eval' command. Let's make a new field kbytes, on the fly, based on the bytes field:

• * | eval kbytes = bytes/1024
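Both of the searches above boil down to simple operations on a list of events. Here is a Python sketch with invented sample data: counting the most common values and filtering on the count, then adding a computed field to every event.

```python
from collections import Counter

# Invented sample events.
events = [
    {"status": 404, "referer_domain": "a.example", "bytes": 2048},
    {"status": 404, "referer_domain": "a.example", "bytes": 512},
    {"status": 404, "referer_domain": "a.example", "bytes": 1024},
    {"status": 404, "referer_domain": "b.example", "bytes": 4096},
    {"status": 200, "referer_domain": "a.example", "bytes": 8192},
]

# status=404 | top 5 referer_domain | search count>2
counts = Counter(e["referer_domain"] for e in events if e["status"] == 404)
top = [(domain, n) for domain, n in counts.most_common(5) if n > 2]

# * | eval kbytes = bytes/1024
with_kbytes = [{**e, "kbytes": e["bytes"] / 1024} for e in events]

print(top)
print(with_kbytes[0])
```

Note that the new field only exists in the results; the original events are left untouched.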

And now for something completely different: assuming you had indexed data from a dating site, search for the smartest girl of each hair and eye color variation, calculating her bmi:

• gender=female | sort -iq | dedup hair, eyes | eval bmi=weight/(height*height)

• No hate mail.

• We've just shown you a tiny, tiny window of what is possible in a Splunk search. See the Appendix for a quick cheatsheet of search commands and examples.

Page 24: Splunk bsides

SPLUNK >

Page 25: Splunk bsides

Real-time Big Data

• Search and analysis engine
• Google-like search of your ORGANIZATION'S data