analyzing your logs: what are they telling you?

Post on 15-Sep-2014

DESCRIPTION

Use systems thinking and statistical analysis to learn more about your proprietary applications. Analyze their behavior based on the logs they generate. Determine patterns and trends to obviate system downtimes.

TRANSCRIPT

Analyzing Your Logs: What are they telling you?

Gerard Ibarra, PhD
November 2008

◦ Goals
◦ Systems Thinking
◦ Definition of System: This Presentation
◦ Log Analysis
◦ Analysis Summary

Copyright © 2008 Buildwave Technologies, Inc. All rights reserved. 2

◦ Think systems first
◦ Use statistics to understand what is going on
◦ Get a better picture with charts
◦ Include control charts to monitor the system

“A system is an assemblage or combination of elements or parts forming a complex or unitary whole;…” (Blanchard, B. S., and Fabrycky, W. J., Systems Engineering and Analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1990)

Systems could be any of the following:
◦ A transportation network moving items from one place to another – dynamic
◦ A bridge used to connect places together – static
◦ A set of unmanned aerial vehicles (UAVs) located in a strategic region providing intelligence – dynamic
◦ A group of applications and servers acting together to perform a service – dynamic
◦ A motor for a car – static/dynamic

Systems today are more complex than before (Using Systems Engineering to Improve RMS&L Requirements, A Government-Industry Training Workshop, various discussions, Springfield VA: November 12-13, 2008)

Changes in one part of the system affect the system as a whole:
◦ More items to move – extra resources to process
◦ Increased traffic – longer times to cross the bridge
◦ Reduction in UAVs – strategies change if the mission remains the same
◦ Server down – increased load; possible sales loss
◦ New and improved parts – increased inventory to maintain both motors

Why think systems for your network?
◦ Because changes made to its parts affect its overall mission and ultimately the business as a whole. For example, the items below affect how the system operates, which in turn affects how the company can conduct its business:
    Adding or removing applications
    Modifying software/hardware configuration
    Adding or removing hardware from operations
    Improving, adding, or deleting features

For this presentation, a system is the aggregation of applications, servers, and services working in unison to produce a common function for the use, goals, sustainment, and operations of the company

Various ways to analyze logs: Examples
◦ Statistical
    Central Tendency
    Variation
    Skewness
    Kurtosis

Examples Continued:
◦ Graphical
    Bar Chart
    Line Chart
    Pie Chart
    Control Charts

Statistical – Central Tendency
◦ Determine how much central tendency there is in the log data
    Know the average number of events occurring in a system – a quick check of how the system is currently operating
    Compare the average number of events over time – see if there are any patterns
    Look at the startup of a process – determine whether the number of errors differs as time progresses

Statistical – Central Tendency Example
◦ Use the following analytics to generate a report
    Mean
    Median
    Mode
    Quartiles

First Hour (based on 1-min aggregations over 1-hour periods)
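A minimal sketch of how a per-minute aggregation like this might be produced, assuming the error timestamps have already been parsed out of the log. The function name `per_minute_counts` and the sample timestamps are hypothetical; they are not taken from the presentation or from Violog.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean


def per_minute_counts(timestamps, hour_start):
    """Count events in each of the 60 one-minute buckets of the hour starting at hour_start."""
    hour_end = hour_start + timedelta(hours=1)
    buckets = Counter(
        int((ts - hour_start).total_seconds() // 60)
        for ts in timestamps
        if hour_start <= ts < hour_end
    )
    # Minutes with no events count as zero so they still pull the mean down.
    return [buckets.get(minute, 0) for minute in range(60)]


# Hypothetical error timestamps within the first hour of a process.
start = datetime(2008, 11, 1, 0, 0)
errors = [start + timedelta(seconds=s) for s in (5, 20, 50, 310, 3599)]

counts = per_minute_counts(errors, start)
average_per_minute = mean(counts)
```

Including the empty minutes is a design choice: leaving them out would report the average over busy minutes only, which overstates how often the error occurs.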

◦ The mean is 2.3333 – this is the average number of times per minute that the error occurred during the hour, based on one-minute increments; anything well above this should raise a flag when comparing the same events in the same hour on other days

Example
        Day1    Day2    Day3    Day4    Day5
Mean    2.33    2.35    2.21    7.45    2.41
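One way to automate this day-to-day comparison is sketched below. The daily means are the ones from the example; the rule itself (flag a day whose mean exceeds 1.5 times the median of the daily means) and the name `flag_unusual_days` are assumptions made for illustration, not something the presentation prescribes.

```python
from statistics import median

# Daily means copied from the example.
daily_means = {"Day1": 2.33, "Day2": 2.35, "Day3": 2.21, "Day4": 7.45, "Day5": 2.41}


def flag_unusual_days(means, factor=1.5):
    """Return the days whose mean exceeds `factor` times the median of all daily means."""
    baseline = median(means.values())
    return [day for day, value in means.items() if value > factor * baseline]


flagged = flag_unusual_days(daily_means)
```

The median is used as the baseline here because a single bad day such as Day4 would inflate an average of the daily means and could hide itself.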

◦ The median is 2 – this is the midpoint of the per-minute event counts for the hour; it should be somewhat close to the mean unless the data is skewed

Example
Data: (1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25)
Mean = 11
Median = 2
The mean is over five times the median – this should raise a flag; notice that the data clusters around the ones and the twenties
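The example figures can be reproduced with Python's standard `statistics` module, as in the sketch below. The threshold used for the flag (mean more than twice the median) is a hypothetical rule chosen for illustration.

```python
from statistics import mean, median

data = [1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25]

data_mean = mean(data)      # 143 / 13
data_median = median(data)  # 7th of the 13 sorted values
ratio = data_mean / data_median

# Hypothetical rule: flag when the mean is more than twice the median.
skew_flag = ratio > 2
```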

◦ The mode is 1 – this is the most frequently occurring number of events, based on one-minute aggregations over the hour; it shows where most of the data comes from and should make some sense with respect to the mean, the median, or both

Example
Data: (1, 1, 1, 1, 1, 1, 5, 17, 19, 21, 23, 25, 27)
Mode = 1
Mean = 11
Median = 5
There is a wide variation between the three indices – this should raise a flag
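A short sketch that computes the three indices for this data set and puts a number on how far apart they are. The `spread` measure (range of the indices divided by the median) is an assumption made for illustration; the presentation does not define a specific formula.

```python
from statistics import mean, median, multimode

data = [1, 1, 1, 1, 1, 1, 5, 17, 19, 21, 23, 25, 27]

indices = {
    "mode": multimode(data),  # a list, because a data set can have several modes
    "mean": mean(data),
    "median": median(data),
}

# Hypothetical measure: distance between the smallest and largest index,
# relative to the median.
values = indices["mode"] + [indices["mean"], indices["median"]]
spread = (max(values) - min(values)) / indices["median"]
```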

◦ The lower and upper quartiles are 1 and 3.5 – these are the medians of the lower half and the upper half of the data, calculated with the Moore and McCabe or “M-and-M” method (there are various ways to calculate the quartiles)

Example
Data: (0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27)
LQ = 0; UQ = 22
Mean ≈ 10.5
The mean is far from the LQ in terms of percentages – this should raise a flag; it could show that at the startup of the period the number of errors was nil and that, as time increased, so did the errors
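The M-and-M quartiles in this example can be checked with the sketch below. It assumes the usual statement of the method: split the sorted data at the median, leave the median itself out of both halves when the count is odd, and take the median of each half. The function name `mm_quartiles` is hypothetical.

```python
from statistics import median


def mm_quartiles(data):
    """Lower and upper quartiles as the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


lq, uq = mm_quartiles([0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27])
```

Other quartile conventions, including the ones used by spreadsheet and statistics packages, can give different values for the same data, which is why the method should be named alongside the result.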

Statistical – Variation
◦ Determine how much the log data varies from the mean
    The closer the data is to the mean, the less the system varies
    The less variation, typically the smoother the system operates

Statistical – Variation Example
◦ Use the following analytics to generate a report
    Mean
    Variation

First Hour (based on 1-min aggregations over 1-hour periods)

◦ The mean is 2.3333 and the standard deviation is 1.91195 – the standard deviation measures how much the data spreads around the mean, expressed in the original units

Example
Mean = 45
StdDev = 41
The standard deviation is almost the same amount as the mean – this should raise a flag (note that the company could define this type of behavior as normal)
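Comparing the standard deviation with the mean amounts to computing the coefficient of variation. The sketch below does this for the example values; the threshold of 0.5 is a hypothetical setting, since what counts as too much variation depends on what the company defines as normal.

```python
def coefficient_of_variation(mean_value, std_dev):
    """Standard deviation expressed as a fraction of the mean."""
    if mean_value == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean_value


# Values from the example.
cv = coefficient_of_variation(45, 41)

# Hypothetical threshold; a company may define a high ratio as normal.
HIGH_VARIATION_THRESHOLD = 0.5
high_variation = cv > HIGH_VARIATION_THRESHOLD
```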

Statistical – Skewness and Kurtosis
◦ Try to find out the type of distribution the system generates
    Learn if the data is normal – good for predictions
    See how the system operates – determine if there are modes during certain periods

Statistical – Skewness and Kurtosis Example
◦ Use the following analytics to generate a report
    Statistics

(Based on 1-hour aggregations over the range of the data)

◦ The Skewness is -2.34592 – this is a measure of the symmetry of the distribution (negative means that it skews to the left and positive to the right)

◦ The Kurtosis is 8.49086 – this is the measure of how peaked the distribution is (the larger the number, the more “peaked”)
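For reference, a minimal sketch of how skewness and kurtosis can be computed from central moments. It uses the population formulas and the non-excess definition of kurtosis, under which a normal distribution scores 3.0. Whether Violog uses these exact formulas is not stated in the presentation, and tools that apply sample-size corrections will report slightly different numbers.

```python
from statistics import fmean


def skewness_and_kurtosis(data):
    """Population skewness and non-excess kurtosis from central moments."""
    n = len(data)
    mu = fmean(data)
    m2 = sum((x - mu) ** 2 for x in data) / n  # variance
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness and kurtosis are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical hourly event counts with a long tail to the left.
skew, kurt = skewness_and_kurtosis([1, 8, 9, 9, 10, 10, 10])
```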

[Figure: a normal curve centered on the mean (μ) compared with a curve that is skewed left and peaked high; the region with a significant number of events is marked]

Example of possible distribution: most of the events take place at the start of the process and peak in a short interval

◦ A Skewness of 0.0 and a Kurtosis of 3.0 match an ideal normal distribution – great for predicting possible outcomes

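[Figure: normal distribution curve labeled with the mean (μ) and standard deviation (σ), shown against the X and Z scales]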

Graphical – Bar Charts
◦ View the errors based on different periods
◦ Understand the behavior of the systems better

Most errors on Day 2
Least number of errors at 6:00 am and 5:00 pm
Two instances of almost zero errors on Day 5

Graphical – Line Charts
◦ Get a clearer perspective on the error rates
◦ View same data, but from a different perspective

Most errors on Day 2
Least number of errors at 6:00 am and 5:00 pm
Two instances of almost zero errors on Day 5

Graphical – Line Charts
◦ Use it to forecast

Follows the same trend based on periods (Aug01 – Sep01 and Aug02 – Sep02)
Shows an upward trend
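An upward trend like this can also be estimated numerically. The sketch below fits a least-squares line to equally spaced counts and extends it one step. This is only one possible forecasting approach and the presentation does not say which method produced its chart; the function names and the sample counts are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(ys, steps_ahead=1):
    """Extend the fitted line `steps_ahead` periods past the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical error counts per period.
period_errors = [10, 12, 15, 15, 18]
trend_slope, _ = fit_line(period_errors)
next_period = forecast(period_errors)
```

A positive `trend_slope` corresponds to the upward trend on the chart. A straight line ignores the seasonal pattern that repeats between periods, so it is best treated as a rough first estimate.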

Graphical – Pie Charts
◦ Compare to other events
◦ Compare to system as a whole

Errors account for less than 2% of the events in the system
Significant number of errors occurring based on the number of warnings
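The shares behind a pie chart of this kind are simple to compute once each log event has a severity level. The sketch below assumes the levels have already been extracted; the level names and the counts are made up for illustration and are not the figures from the presentation.

```python
from collections import Counter

# Hypothetical severity levels extracted from a log.
levels = ["INFO"] * 950 + ["WARNING"] * 35 + ["ERROR"] * 15

counts = Counter(levels)
total = sum(counts.values())

# Percentage of all events that each level accounts for (the pie slices).
share_of_events = {level: 100 * count / total for level, count in counts.items()}

# Errors relative to warnings, the second comparison on the chart.
errors_per_warning = counts["ERROR"] / counts["WARNING"]
```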

Graphical – Control Charts
◦ Monitor the system or individual subsystems
◦ Anticipate possible problems

Out of compliance
Trending upwards: try to keep it from going above the UCL again
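A minimal sketch of the control-chart idea, assuming the conventional limits of three standard deviations around the mean of a baseline period. The presentation does not state how its upper control limit (UCL) was calculated, so the three-sigma rule, the function names, and the sample counts are all assumptions.

```python
from statistics import fmean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Return (LCL, center line, UCL) computed from an in-control baseline period."""
    center = fmean(baseline)
    spread = sigmas * pstdev(baseline)
    # Event counts cannot be negative, so the lower limit is floored at zero.
    return max(0.0, center - spread), center, center + spread


def out_of_control(observations, lcl, ucl):
    """Indices of observations that fall outside the control limits."""
    return [i for i, value in enumerate(observations) if value < lcl or value > ucl]


# Hypothetical errors-per-minute counts.
baseline = [2, 3, 2, 1, 3, 2, 2, 3, 1, 2]
lcl, center, ucl = control_limits(baseline)
violations = out_of_control([2, 3, 8, 2, 5], lcl, ucl)
```

Computing the limits from a separate baseline period keeps a later spike from widening the limits and masking itself.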

Use analytics and charting to help view and understand what the system and its subsystems may be doing
◦ Look for
    Abnormalities
    Deviations
    Compliances

◦ Learn how to
    Predict
    Anticipate
    Forecast

Most of the chart and result screen shots shown in this presentation were created in Violog. http://www.buildwave.com/violog
