
Post on 18-Jan-2016


Crime Data 1960-2012

Bijen Patel

2 Purpose

In my project, I analyze various types of crime data from the year 1960 to 2012.

I have analyzed the following:

Number of Total Crime

Number of Violent Crime

Number of Property Crime

Number of Murder Crime

Number of Rape Crime

In my analysis of the data, I show that crime in the USA seems to follow a parabolic (second-degree polynomial) trend over time. I will also explain why this may be happening.

3 Why is this important?

It is important because it gives the government, FBI, CIA, police departments, etc. a goal

The goal is to continue to follow the current trend and reduce all types of crime exponentially

If the trend continues throughout the upcoming years, this means law enforcement is doing a very good job

Law enforcement can use my code to project data for the types of crime that I did not project (Robbery, Assault, Burglary, Larceny Theft, Vehicle Theft)

4 Read .CSV File

R Code

crimedata=read.table("c:/data/crimedata.csv",sep=",",header=T)

read.table function reads csv file as a table

c:/data/crimedata.csv tells R Program where csv file is located

.CSV file is a comma separated data file

sep="," tells R that data is separated by comma

header=T tells R to read the first line of .CSV file as a header
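The reading step can be tried without the original file. The sketch below writes a tiny placeholder CSV to a temporary location and reads it back. The column names follow the ones used in this deck (Year, Total, Violent), but the counts are invented for illustration and are not the values in crimedata.csv. It uses read.csv(), which is read.table() with sep="," and header=TRUE already set.

```r
# Placeholder rows: these counts are invented for illustration and are
# NOT the values from crimedata.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,Total,Violent",
             "1960,100,10",
             "1961,110,12",
             "1962,125,13"), csv_path)

# read.csv() is read.table() with sep = "," and header = TRUE preset
crimedata <- read.csv(csv_path)

str(crimedata)    # column names and types
head(crimedata)   # first rows of the table
```

Checking str() right after reading is a quick way to confirm that the header line was used for the column names and that the counts were read as numbers.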

5 Result of Reading .CSV File

Table with headers and the number of crimes for every year from 1960-2012

6 Plotting Different Type of Crime Data by Year

attach(crimedata)

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes

plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)

plot() function used to make scatterplot of data

first is the x-variable (Year)

second is the y-variable (Total)

main="Total Crimes in USA 1960-2012" is Title of Scatterplot

xlab="Year" is x-axis label

ylab="Number of Crimes" is y-axis label

pch=19 tells R which plot point to use out of a list of plot points available
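As an alternative to attach(), plot() also accepts a formula together with a data argument, so the columns are looked up inside the data frame. The sketch below uses a small invented data frame (not the real crime series) and writes the plot to a temporary PDF whose location is only an assumption for this example. Dropping the pdf() and dev.off() lines draws the plot on screen instead.

```r
# Invented placeholder counts, not the real crime series
crimedata <- data.frame(Year  = 1960:1969,
                        Total = c(10, 12, 15, 19, 22, 24, 25, 25, 24, 22))

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file)

# Columns are looked up in crimedata, so no attach() is needed
plot(Total ~ Year, data = crimedata,
     main = "Total Crimes (placeholder data)",
     xlab = "Year", ylab = "Number of Crimes", pch = 19)

dev.off()
```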

7 pch Points Available in R

8 Total Crime Scatterplot

9 Parabolic Regression

As you can see, all of the plots seem to follow a parabolic trend or something close to it

10 All Scatterplots Shown Together

11 Determining Total Crime Regression Equation

FitTotal=lm(Total~poly(Year,2,raw=TRUE))

lm() function is used to fit linear models

On left side of ~ is response variable (Total)

On right side of ~ is x variable

We want poly(x,degrees,raw=TRUE)

x=Year

Degrees = 2 because parabolic regression

raw=TRUE because the default uses orthogonal polynomials, and we want coefficients that apply directly to Year and Year^2

summary(FitTotal)

12 Regular vs Orthogonal Polynomials
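A minimal sketch of the difference, using a synthetic series (invented numbers, not the crime data): the raw and orthogonal fits report different coefficients, but they describe the same fitted curve. The names fit_raw, fit_orth and same_curve are only for this illustration.

```r
# Synthetic series (invented numbers), used only to compare the two bases
set.seed(1)
Year  <- 1960:2012
Total <- 1000 - (Year - 1990)^2 + rnorm(length(Year), sd = 20)

fit_raw  <- lm(Total ~ poly(Year, 2, raw = TRUE))  # powers Year, Year^2
fit_orth <- lm(Total ~ poly(Year, 2))              # orthogonal basis (default)

coef(fit_raw)    # intercept, Year and Year^2 coefficients of the parabola
coef(fit_orth)   # coefficients on the orthogonal basis, different numbers

# Both fits describe the same curve: compare fitted values with a tolerance
same_curve <- isTRUE(all.equal(unname(fitted(fit_raw)),
                               unname(fitted(fit_orth)),
                               tolerance = 1e-6))
same_curve
```

The raw form is the one to use when the goal is to write the equation in terms of x and x^2, as on the result slides. The orthogonal form is numerically better behaved but its coefficients cannot be read off as a parabola in Year.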

13 Total Crime Regression Result

P-value < 2.2*10^-16

Y=(-1.113*10^4)*(x^2) + (4.436*10^7)*(x) – (4.417*10^10)

14 Imposing the Regression Equation Line onto the Scatterplot

plot(Year,Total,main="Total Crimes in USA 1960-2012 with Fitted Second Degree Polynomial Regression",xlab="Year",ylab="Number of Crimes",pch=19)

Same plot() function used to plot previous scatterplots

Changed title to indicate “With Fitted Second Degree Polynomial Regression”

p1=points(Year,predict(FitTotal),type="l",col="red",lwd=2)

points() function draws sequence of points onto the scatterplot

First variable is x variable (Year)

Second variable is y variable (predict(FitTotal)) because we want to plot the regression equation points

type="l" tells R to draw a line instead of plotting points

col="red" tells R to draw a red line

lwd=2 tells R the width we want the line to be

15 type=() Table

16 Resulting Regression Line and Scatterplot for Total Crimes

17 Total Crime Projection to Year 2020

Total2020 <- function(x) {FitTotal$coefficients[3]*x^2 + FitTotal$coefficients[2]*x + FitTotal$coefficients[1]}

Defining Total2020 as the regression equation

year_projection=seq(1960,2020,1)

In year_projection I store values 1960-2020 by increase of 1 (1960,1961 … 2020) with seq() function

result_projection=Total2020(year_projection)

result_projection is the values of the function(x), where x is the year

result_projection

shows the projected values from year 1960-2020

plot(year_projection,result_projection,main="Total Crimes in USA 1960-2020 (Projected)",xlab="Year",ylab="Number of Crimes",pch=19)
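The same projection can also be obtained from predict() with a newdata argument, which avoids retyping the equation. The sketch below uses a synthetic series (invented numbers, not the crime data) and checks that predict() and the hand-written equation agree. One caution, stated as an observation rather than a result from the slides: the hand-written route needs the full-precision coefficients from the fit, because the three terms are very large and nearly cancel, so coefficients rounded to four significant digits are likely too coarse.

```r
# Synthetic series (invented numbers), not the real crime data
set.seed(1)
crimedata <- data.frame(Year = 1960:2012)
crimedata$Total <- 1000 - (crimedata$Year - 1990)^2 + rnorm(53, sd = 20)

FitTotal <- lm(Total ~ poly(Year, 2, raw = TRUE), data = crimedata)

year_projection <- seq(1960, 2020, 1)

# Route 1: let predict() evaluate the fitted model on the new years
by_predict <- predict(FitTotal, newdata = data.frame(Year = year_projection))

# Route 2: evaluate the regression equation by hand, full-precision coefficients
b <- coef(FitTotal)
by_hand <- b[1] + b[2] * year_projection + b[3] * year_projection^2

# Largest disagreement between the two routes (should be negligible)
max_diff <- max(abs(by_predict - by_hand))
max_diff
```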

18 Result

19 Violent Crime Regression Result

P-value < 2.2*10^-16

Y=(-1.305*10^3)*(x^2) + (5.207*10^6)*(x) – (5.192*10^9)

20 Resulting Regression Line and Scatterplot for Violent Crimes

21 Projection

22 Property Crime 1960-2020 Scatterplot Projection

23 Murder Crime 1960-2020 Scatterplot Projection

24 Rape Crime 1960-2020 Scatterplot Projection

25 Adjusted R^2 and P-Value

All Regression Equations have a P-Value below 2.2*10^-16, which is effectively 0

This means that each model is statistically significant; how well it fits the data is measured by the Adjusted R^2

All Regression Equations have Adjusted R^2 above .90 except the Murder Crime Equation

This means that these regression models explain more than 90% of the variability of the response data around its mean

The Murder crime regression model only explains around 80% of the variability of the response data around its mean

26 Theil’s Adjusted R-squared Formula

There are multiple adjusted R-squared formulas.

The above first formula is the one that R program uses, which is Theil’s formula.

The second formula is the non-adjusted R-squared formula. Plug this value into Theil’s formula to find the adjusted R-squared value.
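As a sketch of how the two quantities relate, the code below computes the non-adjusted R-squared as 1 - SSE/SST, applies the adjustment 1 - (1 - R^2)(n - 1)/(n - k - 1), and compares the result with the value reported by summary(). Here n is the number of observations and k the number of predictors (2 for a parabola). The data are synthetic (invented numbers), and the variable names are only for this illustration.

```r
# Synthetic series (invented numbers), not the real crime data
set.seed(1)
Year  <- 1960:2012
Total <- 1000 - (Year - 1990)^2 + rnorm(length(Year), sd = 20)
fit <- lm(Total ~ poly(Year, 2, raw = TRUE))

n <- length(Total)   # number of observations
k <- 2               # predictors: Year and Year^2 (intercept not counted)

sse <- sum(residuals(fit)^2)           # error sum of squares
sst <- sum((Total - mean(Total))^2)    # total sum of squares
r2  <- 1 - sse / sst                   # non-adjusted R-squared

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Manual value next to the one R reports
c(manual = adj_r2, from_summary = summary(fit)$adj.r.squared)
```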

27 All Regression Equation Lines and Plots

28 Using nlm Function (non-linear minimization) to do the Regression

CrimeData <- read.table("C:/data/crimedata.csv",sep=",",header=T)

Reading the .csv file (explained in slide 4)

attach(CrimeData)

Explained in slide 6

NumberOfYears <- Year-1960

Simple subtraction gives each year from 1960-2012 a value from 0 to 52, where 0 is 1960 and 52 is 2012.

29 nlm Function Continued

TotalCrimeFunction <- function(coeff,y=Total,x=NumberOfYears){

b0<-coeff[1]

b1<-coeff[2]

b2<-coeff[3]

y_hat <- b0 + b1*x + b2*x^2

return(sum((y-y_hat)^2))

}

Simple function gives the Error Sum of Squares (the sum of squared differences between the observed y value and the expected y value (y_hat)) for Total Crime

Similar function used for other types of crime. “y” value is changed from “Total” to “Violent” or “Property” etc.

30 nlm Function Continued

result <- nlm(TotalCrimeFunction,p=rep(3384200,3))

nlm(f,p)

f is the function to be minimized (TotalCrimeFunction)

p is starting parameter values for minimization (3384200 because this is the 1960 Total Crime value) (repeated 3 times because three parameters b0, b1, and b2)

beta <- result$estimate

gives estimates of b0, b1, and b2

y_hat <- beta[1] + beta[2]*NumberOfYears + beta[3]*(NumberOfYears^2)

Regression equation to predict y (y_hat)

plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)

lines(Year,y_hat,col="red")
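The nlm steps can be tried end to end on a toy problem. The sketch below uses an exact parabola with invented values (not the crime data) so the true coefficients are known, and zero starting values, which are an assumption made for this small example. Because least squares has a closed-form solution, lm() gives a reference to compare the numerical estimates against.

```r
# Synthetic data (invented): an exact parabola, so true b0, b1, b2 are known
NumberOfYears <- 0:10
Total <- 5 + 3 * NumberOfYears - 0.5 * NumberOfYears^2

# Error sum of squares for a candidate coefficient vector
TotalCrimeFunction <- function(coeff, y = Total, x = NumberOfYears) {
  y_hat <- coeff[1] + coeff[2] * x + coeff[3] * x^2
  sum((y - y_hat)^2)
}

start  <- c(0, 0, 0)   # assumed starting values for this toy problem
result <- nlm(TotalCrimeFunction, p = start)

result$estimate   # numerical estimates of b0, b1, b2
result$minimum    # error sum of squares at the estimates
result$code       # convergence code; see ?nlm for its meanings

# Closed-form least squares fit, as a reference for the estimates above
coef(lm(Total ~ NumberOfYears + I(NumberOfYears^2)))
```

Looking at result$code is worthwhile, since nlm is an iterative method and its answer can depend on the starting values and on the scale of the data.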

31 Total Crime Result from nlm Function

32 Repeat Same nlm Procedure for Violent and Property Crimes

33 Same nlm Procedure for Murder and Rape Crimes

34 Why is there a parabolic regression for the crime data?

You might expect crime counts to have grown linearly since 1960, because the population of the USA has been growing steadily since 1960.

There is linear population growth, so why not linear crime data as well? As there are more people, you would think there would be more crime.

All of the crime data peaks around 1990 and then begins to drop, which creates the parabolic shape. Why does it rise roughly linearly until 1990 and then stop?

35 Linear Population, Parabolic Crime

36 What happened in the 1990s?

Something had to happen in the 1990s for crime to stop its linear pattern from 1960-1990.

Increase in the number of police

Number of police increased by 50000-60000 in 1990s, which was greater than any previous decade

Rising prison population

More than half the prison growth from 1970-2000 took place in 1990s

In 1990s, there was increased parole revocation and longer sentences for crimes, which causes people to be more fearful of criminal activities

Waning crack epidemic

Crack market began to decline in 1990s, which meant less drug-related crimes

Legalization of abortion

Unwanted children are at greater risk of engaging in criminal activities

Legalized abortion leads to reduction in unwanted children

37 What else happened in 1990s?

1990s – Second Generation of Cell Phones

Affordable cell phones become more and more popular every year

1990 – World Wide Web invented

1991 – First Digital Answering Machine

1993 – Text Messaging for Cell Phones

1995 – VoIP (Voice over IP) (Phone service over the internet)

1998 – Google

1999 – First Blackberry Mobile Device

38

39 What happened after year 2000?

Exponential technological innovation

Third Generation of Cell Phones

More mobile, faster flip phones with screens and added functions

Fourth Generation of Cell Phones

Smartphones (iPhone and Android)

Mobile Apps

As technology continues to expand exponentially, it becomes easier and easier for information to be shared and people to communicate

40

41 Reasonable to Project to Year 2020?

I believe it is reasonable to project the parabolic regression models to year 2020 because:

Technological innovation should continue to grow exponentially

This and other factors will cause more and more people to fear engaging in criminal activities

42 Reasonable to Project Longer than 2020?

It is probably not reasonable to project longer than the year 2020 because this trend can’t continue as a parabolic trend forever.

If this trend were to continue as a parabolic trend, we would eventually reach a negative value for crime data, and that is impossible.
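The year at which a downward-opening parabola reaches zero can be found from its coefficients with polyroot(). The sketch below uses hypothetical coefficients chosen so that y = 2500 - (x - 1990)^2. They are not the fitted coefficients from the regression slides, which would need to be taken at full precision from the fitted model.

```r
# Hypothetical coefficients (b0, b1, b2) of y = b0 + b1*x + b2*x^2,
# chosen so that y = 2500 - (x - 1990)^2; they are NOT the fitted values
b <- c(-3957600, 3980, -1)

vertex_year <- -b[2] / (2 * b[3])   # year where the parabola peaks
roots       <- Re(polyroot(b))      # years where the parabola crosses zero
zero_year   <- max(roots)           # later crossing, after the peak

vertex_year
zero_year
```

Beyond zero_year the model would predict negative counts, which is one concrete way to mark where the parabolic projection stops being meaningful.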
