Crime Data 1960-2012 Bijen Patel

Post on 18-Jan-2016

2 Purpose

In my project, I analyze various types of crime data from 1960 to 2012.

I have analyzed the following:

Number of Total Crimes

Number of Violent Crimes

Number of Property Crimes

Number of Murders

Number of Rapes

In my analysis of the data, I show that crime in the USA seems to follow a parabolic (second-degree polynomial) trend. I also explain why this may be happening.

3 Why is this important?

It is important because it gives the government, FBI, CIA, police departments, etc. a goal

The goal is to continue to follow the current downward trend and keep reducing all types of crime

If the trend continues over the coming years, it suggests that law enforcement is doing a very good job

Law enforcement can use my code to project data for the types of crime that I did not project (Robbery, Assault, Burglary, Larceny Theft, Vehicle Theft)

4 Read .CSV File

R Code

crimedata=read.table("c:/data/crimedata.csv",sep=",",header=T)

read.table function reads the csv file as a table

c:/data/crimedata.csv tells R where the csv file is located

.CSV file is a comma separated data file

sep="," tells R that the data is separated by commas

header=T tells R to read the first line of the .CSV file as a header
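As a minimal sketch of the same step, the snippet below reads a tiny comma-separated table from an inline string instead of a file, so it runs anywhere. The column names Year, Total and Violent are assumed from later slides, and only the 1960 Total (3384200) comes from the slides; every other number is a made-up placeholder.

```r
# Hypothetical miniature of crimedata.csv. Only the 1960 Total (3384200)
# appears in the slides; all other values are made-up placeholders.
csv_text <- "Year,Total,Violent
1960,3384200,288000
1961,3400000,289000
1962,3500000,300000"

# read.csv() is read.table() with sep="," and header=TRUE as its defaults
crimedata <- read.csv(text = csv_text)

str(crimedata)    # column names and types
head(crimedata)   # first rows, to confirm the header line was used
```

Checking the result with str() and head() right after reading is a cheap way to catch a wrong separator or a missing header before any plotting.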

5 Result of Reading .CSV File

Table with headers, showing the # of crimes for every year from 1960-2012

6 Plotting Different Types of Crime Data by Year

attach(crimedata)

The attach() function in R makes the columns of a data frame accessible by name, with fewer keystrokes

plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)

plot() function used to make scatterplot of data

first is the x-variable (Year)

second is the y-variable (Total)

main="Total Crimes in USA 1960-2012" is Title of Scatterplot

xlab="Year" is x-axis label

ylab="Number of Crimes" is y-axis label

pch=19 tells R which plotting symbol to use from the list of available symbols (19 is a solid circle)
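One possible alternative to attach() is with(), which evaluates a single call inside the data frame and leaves the search path untouched. The sketch below uses a synthetic stand-in for the real data (the numbers are invented for illustration) and draws to a null device so that it also runs in a non-interactive session.

```r
# Synthetic stand-in for the real data (made-up values, illustration only)
crimedata <- data.frame(Year = 1960:2012)
crimedata$Total <- 14e6 - 11000 * (crimedata$Year - 1992)^2

pdf(NULL)  # null graphics device; remove this line to see the plot on screen

# with() looks up Year and Total inside crimedata, so attach() is not needed
with(crimedata,
     plot(Year, Total,
          main = "Total Crimes in USA 1960-2012",
          xlab = "Year", ylab = "Number of Crimes", pch = 19))

dev.off()
```

The plot call itself is unchanged from the slide; only the way the columns are found differs.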

7 pch Points Available in R

8 Total Crime Scatterplot

9 Parabolic Regression

As you can see, all of the plots seem to follow a parabolic trend, or something close to one

10 All Scatterplots Shown Together

11 Determining Total Crime Regression Equation

FitTotal=lm(Total~poly(Year,2,raw=TRUE))

lm() function is used to fit linear models

On left side of ~ is response variable (Total)

On right side of ~ is x variable

We want poly(x,degree,raw=TRUE)

x=Year

Degree = 2 because parabolic regression

raw=TRUE because the default uses orthogonal polynomials, and we want coefficients that apply directly to Year and Year^2

summary(FitTotal)
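To illustrate what raw=TRUE changes, the sketch below fits the same second-degree model both ways on synthetic data (the Total values are invented; the real ones come from crimedata.csv). The two fits should describe the same curve, so their fitted values agree, while their coefficients differ because orthogonal polynomials use a transformed basis.

```r
# Synthetic stand-in for the real data (made-up values, illustration only)
Year  <- 1960:2012
Total <- 14e6 - 11000 * (Year - 1992)^2 + 2e5 * sin(Year)
crimedata <- data.frame(Year, Total)

# Raw polynomial: coefficients multiply Year and Year^2 directly
fit_raw  <- lm(Total ~ poly(Year, 2, raw = TRUE), data = crimedata)

# Default (orthogonal) polynomial: same curve, different coefficients
fit_orth <- lm(Total ~ poly(Year, 2), data = crimedata)

coef(fit_raw)
coef(fit_orth)

# Same fitted curve either way
all.equal(fitted(fit_raw), fitted(fit_orth))
```

Raw coefficients are the ones that can be copied into an equation of the form Y = a*x^2 + b*x + c, which is why the slides use them.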

12 Regular vs Orthogonal Polynomials

13 Total Crime Regression Result

P-value < 2.2*10^-16

Y = (-1.113*10^4)*(x^2) + (4.436*10^7)*(x) - (4.417*10^10), where x is the Year
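As a rough check on this equation, the year at which the fitted parabola peaks can be read off from its vertex. The calculation below uses the rounded coefficients shown above, so the result is only approximate (rounding in the coefficients can shift it by about a year).

```latex
Y = a x^2 + b x + c, \qquad a = -1.113 \times 10^{4}, \quad b = 4.436 \times 10^{7}

x_{\text{peak}} = -\frac{b}{2a}
               = \frac{4.436 \times 10^{7}}{2 \times 1.113 \times 10^{4}}
               \approx 1992.8
```

So the fitted curve for total crime peaks in the early 1990s. Because the three terms are very large and nearly cancel, plugging a year into the rounded equation by hand does not give reliable crime counts; the full-precision coefficients stored in FitTotal should be used for that.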

14 Imposing the Regression Equation Line onto the Scatterplot

plot(Year,Total,main="Total Crimes in USA 1960-2012 with Fitted Second Degree Polynomial Regression",xlab="Year",ylab="Number of Crimes",pch=19)

Same plot() function used to plot previous scatterplots

Changed title to indicate "With Fitted Second Degree Polynomial Regression"

points(Year,predict(FitTotal),type="l",col="red",lwd=2)

points() function draws sequence of points onto the scatterplot

First variable is x variable (Year)

Second variable is y variable (predict(FitTotal)) because we want to plot the regression equation points

type="l" tells R to draw a line instead of plotting points

col="red" tells R to draw a red line

lwd=2 tells R the width we want the line to be

15 type=() Table

16 Resulting Regression Line and Scatterplot for Total Crimes

17 Total Crime Projection to Year 2020

Total2020 <- function(x) {coef(FitTotal)[3]*x^2 + coef(FitTotal)[2]*x + coef(FitTotal)[1]}

Defining Total2020 as the regression equation (coef(FitTotal) returns the intercept, the Year coefficient, and the Year^2 coefficient, in that order)

year_projection=seq(1960,2020,1)

In year_projection I store the values 1960-2020 in steps of 1 (1960, 1961, … 2020) with the seq() function

result_projection=Total2020(year_projection)

result_projection holds the values of function(x), where x is the year

result_projection

shows the projected values for the years 1960-2020

plot(year_projection,result_projection,main="Total Crimes in USA 1960-2020 (Projected)",xlab="Year",ylab="Number of Crimes",pch=19)
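The same projection can also be sketched with predict() and a newdata data frame, which avoids retyping the equation. The example below assumes the model was fitted with a data= argument and uses synthetic Total values (invented for illustration); the names future and manual are hypothetical. It also checks that predict() agrees with the hand-written equation.

```r
# Synthetic stand-in for the real data (made-up values, illustration only)
Year  <- 1960:2012
Total <- 14e6 - 11000 * (Year - 1992)^2 + 2e5 * sin(Year)
crimedata <- data.frame(Year, Total)

FitTotal <- lm(Total ~ poly(Year, 2, raw = TRUE), data = crimedata)

# Years to project over; the column name must match the predictor (Year)
future <- data.frame(Year = 1960:2020)
future$Projected <- predict(FitTotal, newdata = future)

# Hand-written regression equation, for comparison
b <- unname(coef(FitTotal))
manual <- b[1] + b[2] * future$Year + b[3] * future$Year^2

all.equal(unname(future$Projected), manual)
tail(future, 3)   # projected values for 2018-2020
```

Using predict() keeps the projection tied to the fitted model, so refitting with another crime type needs no change to the projection code.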

18 Result

19 Violent Crime Regression Result

P-value < 2.2*10^-16

Y = (-1.305*10^3)*(x^2) + (5.207*10^6)*(x) - (5.192*10^9), where x is the Year

20 Resulting Regression Line and Scatterplot for Violent Crimes

21 Projection

22 Property Crime 1960-2020 Scatterplot Projection

23 Murder Crime 1960-2020 Scatterplot Projection

24 Rape Crime 1960-2020 Scatterplot Projection

25 Adjusted R^2 and P-Value

All Regression Equations have a P-Value of effectively 0 (R reports it as < 2.2*10^-16)

This means that each model is statistically significant (how well it fits the data is measured by Adjusted R^2)

All Regression Equations have Adjusted R^2 above .90 except the Murder Crime Equation

This means that these regression models explain more than 90% of the variability of the response data around its mean

Murder crime regression model only explains around 80% of the variability of the response data around its mean
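Both quantities can be pulled out of the model summary programmatically rather than read off the printed output. The sketch below uses synthetic data (the Total values are invented, so the numbers it prints are not the ones from the slides) and recomputes the overall p-value from the F statistic stored in the summary.

```r
# Synthetic stand-in for the real data (made-up values, illustration only)
Year  <- 1960:2012
Total <- 14e6 - 11000 * (Year - 1992)^2 + 2e5 * sin(Year)
crimedata <- data.frame(Year, Total)

fit <- lm(Total ~ poly(Year, 2, raw = TRUE), data = crimedata)
s   <- summary(fit)

s$r.squared       # plain R-squared
s$adj.r.squared   # adjusted R-squared

# Overall p-value of the model, from the F statistic and its degrees of freedom
f <- s$fstatistic
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p_value
```

A p-value printed as "< 2.2e-16" is simply smaller than the smallest difference R displays by default; the computed value is tiny but not exactly 0.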

26 Theil’s Adjusted R-squared Formula

There are multiple adjusted R-squared formulas.

The first formula above is the one that R uses, which is Theil's formula.

The second formula is the non-adjusted R-squared formula. Plug this value into Theil’s formula to find the adjusted R-squared value.
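The slide's formula images did not survive in this transcript. For reference, the commonly given forms of the two formulas are shown below, where n is the number of observations, p is the number of predictors (excluding the intercept), SS_res is the residual sum of squares and SS_tot is the total sum of squares. Treat the notation as an assumption; the slide may have used different symbols.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
```

For these models n = 53 (one observation per year from 1960 to 2012) and p = 2 (Year and Year^2), so the correction factor is 52/50 = 1.04. For example, an R^2 of 0.90 would give an adjusted R^2 of 1 - 0.10 * 1.04 = 0.896.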

27 All Regression Equation Lines and Plots

28 Using nlm Function (non-linear minimization) to do the Regression

CrimeData <- read.table("C:/data/crimedata.csv",sep=",",header=T)

Reading the .csv file (explained in slide 4)

attach(CrimeData)

Explained in slide 6

NumberOfYears <- Year-1960

Subtracting 1960 gives each year from 1960-2012 a simple value from 0 to 52, where 0 is 1960 and 52 is 2012.

29 nlm Function Continued

TotalCrimeFunction <- function(coeff,y=Total,x=NumberOfYears){

b0<-coeff[1]

b1<-coeff[2]

b2<-coeff[3]

y_hat <- b0 + b1*x + b2*x^2

return(sum((y-y_hat)^2))

}

Simple function gives the Error Sum of Squares (sum of squared differences between the observed y value and the predicted y value (y_hat)) for Total Crime

Similar function used for other types of crime. "y" value is changed from "Total" to "Violent" or "Property" etc.

30 nlm Function Continued

result <- nlm(TotalCrimeFunction,p=rep(3384200,3))

nlm(f,p)

f is the function to be minimized (TotalCrimeFunction)

p is starting parameter values for minimization (3384200 because this is the 1960 Total Crime value) (repeated 3 times because three parameters b0, b1, and b2)

beta <- result$estimate

gives estimates of b0, b1, and b2

y_hat <- beta[1] + beta[2]*NumberOfYears + beta[3]*(NumberOfYears^2)

Regression equation to predict y (y_hat)

plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)

lines(Year,y_hat,col="red")
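The whole nlm procedure can be sketched end to end on synthetic data as follows. Several choices here are assumptions made for illustration and are not in the slides: the Total values are invented, the response is rescaled to millions to keep the optimizer well scaled, and the starting values are the first observation for b0 and 0 for b1 and b2. The names sse and start are hypothetical.

```r
# Synthetic stand-in for the real data (made-up values, illustration only)
Year  <- 1960:2012
Total <- 14e6 - 11000 * (Year - 1992)^2 + 2e5 * sin(Year)

NumberOfYears <- Year - 1960
TotalMillions <- Total / 1e6   # assumption: rescale to millions of crimes

# Error sum of squares for a second-degree polynomial
sse <- function(coeff, y = TotalMillions, x = NumberOfYears) {
  y_hat <- coeff[1] + coeff[2] * x + coeff[3] * x^2
  sum((y - y_hat)^2)
}

# Assumed starting values: first observation as intercept, no slope or curvature
start  <- c(TotalMillions[1], 0, 0)
result <- nlm(sse, p = start)

result$estimate   # b0, b1, b2 found by the minimizer
result$code       # convergence code; see ?nlm for its meaning

# Least-squares estimates from lm() on the same scale, for comparison
coef(lm(TotalMillions ~ NumberOfYears + I(NumberOfYears^2)))
```

Because the sum of squared errors is exactly what lm() minimizes, the two sets of estimates should be close when nlm converges; a large gap usually points to poor starting values or scaling.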

31 Total Crime Result from nlm Function

32 Repeat Same nlm Procedure for Violent and Property Crimes

33 Same nlm Procedure for Murder and Rape Crimes

34 Why is there a parabolic regression for the crime data?

You would think that crime data since 1960 would follow a linear trend, because the population of the USA has been growing steadily since 1960.

There is linear population growth, so why not linear crime data as well? As there are more people, you would think there would be more crime.

All of the crime data peaks around 1990 and then begins to drop, which creates the parabolic shape. Why does it rise roughly linearly until 1990 and then stop?

35 Linear Population, Parabolic Crime

36 What happened in the 1990s?

Something had to happen in the 1990s for crime to stop following its linear pattern from 1960-1990.

Increase in the number of police

Number of police increased by 50000-60000 in the 1990s, which was greater than in any previous decade

Rising prison population

More than half of the prison growth from 1970-2000 took place in the 1990s

In the 1990s, there was increased parole revocation and longer sentences for crimes, which makes people more fearful of engaging in criminal activities

Waning crack epidemic

Crack market began to decline in the 1990s, which meant fewer drug-related crimes

Legalization of abortion

Unwanted children are at greater risk of engaging in criminal activities

Legalized abortion leads to a reduction in unwanted children

37 What else happened in 1990s?

1990s – Second Generation of Cell Phones

Affordable cell phones become more and more popular every year

1990 – World Wide Web invented

1991 – First Digital Answering Machine

1993 – Text Messaging for Cell Phones

1995 – VoIP (Voice over IP) (Phone service over the internet)

1998 – Google

1999 – First Blackberry Mobile Device

39 What happened after year 2000?

Exponential technological innovation

Third Generation of Cell Phones

More mobile, faster flip phones with screens and added functions

Fourth Generation of Cell Phones

Smartphones (iPhone and Android)

Mobile Apps

As technology continues to expand exponentially, it becomes easier and easier for information to be shared and for people to communicate

41 Reasonable to Project to Year 2020?

I believe it is reasonable to project the parabolic regression models to year 2020 because:

Technological innovation should continue to grow exponentially

This and other factors will cause more and more people to fear engaging in criminal activities

42 Reasonable to Project Longer than 2020?

It is probably not reasonable to project longer than the year 2020 because this trend can’t continue as a parabolic trend forever.

If this trend were to continue as a parabolic trend, we would eventually reach a negative value for crime data, and that is impossible.