Crime Data 1960-2012, Bijen Patel
2 Purpose
In my project, I analyze various types of crime data from the year 1960 to 2012.
I have analyzed the following:
Number of Total Crimes
Number of Violent Crimes
Number of Property Crimes
Number of Murders
Number of Rapes
In my analysis of the data, I show that crime in the USA appears to follow a parabolic (second-degree polynomial) regression. I also explain why this may be happening.
3 Why is this important?
It is important because it gives the government, FBI, CIA, police departments, etc. a goal.
The goal is to keep following the current downward trend and continue reducing all types of crime.
If the trend continues over the upcoming years, it would suggest that law enforcement is doing a very good job.
Law enforcement can use my code to project data for the types of crime that I did not project (Robbery, Assault, Burglary, Larceny Theft, Vehicle Theft).
4 Read .CSV File
R Code: crimedata=read.table("c:/data/crimedata.csv",sep=",",header=T)
The read.table function reads the csv file as a table
c:/data/crimedata.csv tells R where the csv file is located
A .CSV file is a comma-separated data file
sep="," tells R that the data is separated by commas
header=T tells R to read the first line of the .CSV file as a header
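The read step can be tried without the original file. The sketch below writes a tiny made-up CSV to a temporary file and reads it back; the column names and counts are hypothetical and only stand in for the real crimedata.csv. It also shows read.csv(), which assumes a comma separator and a header row by default.

```r
# Hypothetical sample file: column names and counts are made up for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,Total,Violent",
             "1960,100,10",
             "1961,110,12",
             "1962,125,13"), csv_path)

# Same call as on the slide, pointed at the temporary file
crime_demo <- read.table(csv_path, sep = ",", header = TRUE)

# read.csv() uses sep="," and header=TRUE by default
crime_alt <- read.csv(csv_path)

str(crime_demo)                  # column names and types as R parsed them
all.equal(crime_demo, crime_alt) # both calls should give the same table
```

Writing header=TRUE in full is slightly safer than header=T, because T is an ordinary variable that can be reassigned.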
5 Result of Reading .CSV File
Table with Headers
Table with # of Crimes for Every Year from 1960-2012
6 Plotting Different Types of Crime Data by Year
attach(crimedata)
The attach() function in R makes the columns of a data frame accessible by name, with fewer keystrokes
plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)
The plot() function is used to make a scatterplot of the data
The first argument is the x-variable (Year)
The second argument is the y-variable (Total)
main="Total Crimes in USA 1960-2012" is the title of the scatterplot
xlab="Year" is the x-axis label
ylab="Number of Crimes" is the y-axis label
pch=19 tells R which plotting symbol to use out of the list of symbols available
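As an alternative to attach(), the same scatterplot can be drawn with with(), which looks up the column names inside the data frame for a single call only. This is a minimal sketch on a small made-up data frame (crime_demo and its values are hypothetical, not the real counts).

```r
# Made-up numbers for illustration only (not the real crime counts)
crime_demo <- data.frame(
  Year  = c(1960, 1970, 1980, 1990, 2000),
  Total = c(100, 140, 170, 180, 165)
)

# Draw to a null device so the sketch also runs without a display;
# remove pdf(NULL) and dev.off() to see the plot on screen
pdf(NULL)

# with() evaluates plot() inside the data frame, so attach() is not needed
with(crime_demo,
     plot(Year, Total, main = "Total Crimes (made-up data)",
          xlab = "Year", ylab = "Number of Crimes", pch = 19))

invisible(dev.off())
```

Unlike attach(), with() leaves nothing on the search path afterwards, so column names cannot silently mask other variables later in the script.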
7 pch Points Available in R
8 Total Crime Scatterplot
9 Parabolic Regression
All of the plots seem to show a parabolic regression, or something close to one
10 All Scatterplots Shown Together
11 Determining Total Crime Regression Equation
FitTotal=lm(Total~poly(Year,2,raw=TRUE))
The lm() function is used to fit linear models
On the left side of ~ is the response variable (Total)
On the right side of ~ is the x variable. We want poly(x,degree,raw=TRUE)
x=Year
degree=2 because the regression is parabolic
raw=TRUE because the default uses orthogonal polynomials, whose coefficients are not the coefficients of x and x^2 that we want for the regression equation
summary(FitTotal)
12 Regular vs Orthogonal Polynomials
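A minimal sketch of the difference, using made-up data (a noisy downward parabola over 1960-2012, not the real crime counts): the raw and orthogonal fits report different coefficients, but they describe the same fitted curve. The names year, total, fit_raw and fit_orth are chosen here for illustration.

```r
# Made-up data: a noisy downward parabola (not the real crime counts)
set.seed(1)
year  <- 1960:2012
total <- 160 - 0.1 * (year - 1990)^2 + rnorm(length(year), sd = 5)

# raw = TRUE: coefficients multiply 1, year and year^2 directly
fit_raw  <- lm(total ~ poly(year, 2, raw = TRUE))

# default: coefficients multiply an orthogonal polynomial basis
fit_orth <- lm(total ~ poly(year, 2))

coef(fit_raw)   # usable as b0 + b1*x + b2*x^2
coef(fit_orth)  # not directly usable in that form

# Both models give the same fitted values
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_orth)), tolerance = 1e-6)
```

The orthogonal form is numerically better behaved, so it is a reasonable choice whenever only predictions are needed and the equation itself does not have to be written out.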
13 Total Crime Regression Result
P-value < 2.2*10^-16 (R reports it as "< 2.2e-16")
Y = (-1.113*10^4)*(x^2) + (4.436*10^7)*(x) - (4.417*10^10), where x is the year
14 Imposing the Regression Equation Line onto the Scatterplot
plot(Year,Total,main="Total Crimes in USA 1960-2012 with Fitted Second Degree Polynomial Regression",xlab="Year",ylab="Number of Crimes",pch=19)
Same plot() function used to plot the previous scatterplots
Changed the title to indicate "with Fitted Second Degree Polynomial Regression"
p1=points(Year,predict(FitTotal),type="l",col="red",lwd=2)
The points() function draws a sequence of points onto the scatterplot
The first argument is the x variable (Year)
The second argument is the y variable (predict(FitTotal)), because we want to plot the fitted values of the regression equation
type="l" tells R to draw a line instead of plotting points
col="red" tells R to draw a red line
lwd=2 tells R the width we want the line to be
15 type=() Table
16 Resulting Regression Line and Scatterplot for Total Crimes
17 Total Crime Projection to Year 2020
Total2020 <- function(x) {FitTotal$coefficients[3]*x^2 + FitTotal$coefficients[2]*x + FitTotal$coefficients[1]}
Defines Total2020 as the regression equation
year_projection=seq(1960,2020,1)
year_projection stores the values 1960-2020 in steps of 1 (1960, 1961, ..., 2020), generated with the seq() function
result_projection=Total2020(year_projection)
result_projection holds the values of Total2020(x), where x is the year
result_projection
Shows the projected values for the years 1960-2020
plot(year_projection,result_projection,main="Total Crimes in USA 1960-2020 (Projected)",xlab="Year",ylab="Number of Crimes",pch=19)
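The same projection can also be obtained with predict() and a newdata data frame, without writing the polynomial by hand. The sketch below uses made-up data (crime_demo, fit_demo and future are hypothetical names, and the values are not the real crime counts) and checks that both approaches agree.

```r
# Made-up data: a noisy downward parabola (not the real crime counts)
set.seed(1)
crime_demo <- data.frame(Year = 1960:2012)
crime_demo$Total <- 160 - 0.1 * (crime_demo$Year - 1990)^2 +
  rnorm(nrow(crime_demo), sd = 5)

fit_demo <- lm(Total ~ poly(Year, 2, raw = TRUE), data = crime_demo)

# predict() evaluates the fitted model for the years given in newdata
future <- data.frame(Year = 1960:2020)
future$Projected <- predict(fit_demo, newdata = future)

# Hand-written polynomial, as on the slide, for comparison
b <- coef(fit_demo)
by_hand <- b[1] + b[2] * future$Year + b[3] * future$Year^2

all.equal(unname(by_hand), unname(future$Projected), tolerance = 1e-6)
tail(future)  # projected values for the last years, up to 2020
```

Passing the data frame through data= and newdata= keeps the column name Year consistent between fitting and projecting, which predict() relies on.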
18 Result
19 Violent Crime Regression Result
P-value < 2.2*10^-16 (R reports it as "< 2.2e-16")
Y = (-1.305*10^3)*(x^2) + (5.207*10^6)*(x) - (5.192*10^9), where x is the year
20 Resulting Regression Line and Scatterplot for Violent Crimes
21 Projection
22 Property Crime 1960-2020 Scatterplot Projection
23 Murder Crime 1960-2020 Scatterplot Projection
24 Rape Crime 1960-2020 Scatterplot Projection
25 Adjusted R^2 and P-Value
All regression equations have a P-value of approximately 0
This means that the models are statistically significant; how well they fit the data is measured by the adjusted R^2
All regression equations have an adjusted R^2 above .90, except the Murder Crime equation
This means that these regression models explain more than 90% of the variability of the response data around its mean
The Murder Crime regression model explains only around 80% of the variability of the response data around its mean
26 Theil’s Adjusted R-squared Formula
There are multiple adjusted R-squared formulas.
The first formula above is the one that R uses, which is Theil's formula.
The second formula is the non-adjusted R-squared formula. Plug its value into Theil's formula to find the adjusted R-squared value.
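The two-step calculation can be checked against R's own output. This sketch assumes the adjusted formula meant here is the usual one, 1 - (1 - R^2)*(n - 1)/(n - p - 1), with n observations and p predictors (p = 2 for a parabola); the data are made up and the variable names are chosen for illustration.

```r
# Made-up data: a noisy downward parabola (not the real crime counts)
set.seed(1)
year  <- 1960:2012
total <- 160 - 0.1 * (year - 1990)^2 + rnorm(length(year), sd = 5)
fit   <- lm(total ~ poly(year, 2, raw = TRUE))

n <- length(total)  # number of observations
p <- 2              # number of predictors, excluding the intercept

# Non-adjusted R-squared: 1 - SSE / SST
sse <- sum(residuals(fit)^2)
sst <- sum((total - mean(total))^2)
r2  <- 1 - sse / sst

# Adjusted R-squared (assumed form): penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Compare with the values reported by summary()
s <- summary(fit)
all.equal(r2, s$r.squared)
all.equal(adj_r2, s$adj.r.squared)
```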
27 All Regression Equation Lines and Plots
28 Using nlm Function (non-linear minimization) to do the Regression
CrimeData <- read.table("C:/data/crimedata.csv",sep=",",header=T)
Reading the .csv file (explained in slide 4)
attach(CrimeData)
Explained in slide 6
NumberOfYears <- Year-1960
Subtracting 1960 gives each year from 1960-2012 a value from 0 to 52, where 0 is 1960 and 52 is 2012.
29 nlm Function Continued
TotalCrimeFunction <- function(coeff,y=Total,x=NumberOfYears){
b0<-coeff[1]
b1<-coeff[2]
b2<-coeff[3]
y_hat <- b0 + b1*x + b2*x^2
return(sum((y-y_hat)^2))
}
This function returns the error sum of squares (the sum of squared differences between the observed y values and the predicted y values, y_hat) for Total Crime
A similar function is used for the other types of crime; the default "y" is changed from "Total" to "Violent", "Property", etc.
30 nlm Function Continued
result <- nlm(TotalCrimeFunction,p=rep(3384200,3))
nlm(f,p)
f is the function to be minimized (TotalCrimeFunction)
p is the vector of starting parameter values for the minimization (3384200 because this is the 1960 Total Crime value, repeated 3 times because there are three parameters: b0, b1, and b2)
beta <- result$estimate
gives estimates of b0, b1, and b2
y_hat <- beta[1] + beta[2]*NumberOfYears + beta[3]*(NumberOfYears^2)
Regression equation to predict y (y_hat)
plot(Year,Total,main="Total Crimes in USA 1960-2012",xlab="Year",ylab="Number of Crimes",pch=19)
lines(Year,y_hat,col="red")
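Because the model is linear in b0, b1 and b2, minimizing the error sum of squares with nlm() should land on essentially the same coefficients as lm(). The sketch below checks this on made-up data. It rescales the year index to the range 0 to 1 and starts from zeros, both of which are assumptions made here to keep the minimization well conditioned; they are not part of the slides.

```r
# Made-up data (not the real crime counts); x is a year index rescaled to [0, 1]
set.seed(1)
x <- (0:52) / 52
y <- 2 + 3 * x - 1.5 * x^2 + rnorm(length(x), sd = 0.05)

# Error sum of squares for a parabola with coefficients coeff = (b0, b1, b2)
sse <- function(coeff, y, x) {
  y_hat <- coeff[1] + coeff[2] * x + coeff[3] * x^2
  sum((y - y_hat)^2)
}

# nlm() passes the extra arguments y and x through to sse()
fit_nlm <- nlm(sse, p = c(0, 0, 0), y = y, x = x)

# Least-squares fit of the same parabola for comparison
fit_lm <- lm(y ~ x + I(x^2))

rbind(nlm = fit_nlm$estimate, lm = unname(coef(fit_lm)))
fit_nlm$code  # termination code; see ?nlm for its meaning
```

Passing y and x as arguments, rather than relying on attached columns as defaults, makes the same function reusable for every crime type.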
31 Total Crime Result from nlm Function
32 Repeat Same nlm Procedure for Violent and Property Crimes
33 Same nlm Procedure for Murder and Rape Crimes
34 Why is there a parabolic regression for the crime data?
You might expect crime data since 1960 to follow a linear regression, because the US population has been growing steadily since 1960.
Population growth is linear, so why is the crime data not linear as well? With more people, you would expect more crime.
All of the crime data peaks around 1990 and then begins to drop, which creates the parabolic shape. Why does it rise linearly until 1990 and then reverse?
35 Linear Population, Parabolic Crime
36 What happened in the 1990s?
Something had to happen in the 1990s for crime to stop its linear pattern from 1960-1990.
Increase in the number of police
The number of police increased by 50,000-60,000 in the 1990s, more than in any previous decade
Rising prison population
More than half of the prison growth from 1970-2000 took place in the 1990s
In the 1990s, there was increased parole revocation and longer sentences for crimes, which makes people more fearful of engaging in criminal activities
Waning crack epidemic
The crack market began to decline in the 1990s, which meant fewer drug-related crimes
Legalization of abortion
Unwanted children are at greater risk of engaging in criminal activities
Legalized abortion leads to reduction in unwanted children
37 What else happened in 1990s?
1990s – Second Generation of Cell Phones
Affordable cell phones became more and more popular every year
1990 – World Wide Web invented
1991 – First Digital Answering Machine
1993 – Text Messaging for Cell Phones
1995 – VoIP (Voice over IP) (Phone service over the internet)
1998 – Google
1999 – First Blackberry Mobile Device
38
39 What happened after year 2000?
Exponential technological innovation
Third Generation of Cell Phones
More mobile, faster flip phones with screens and added functions
Fourth Generation of Cell Phones
Smartphones (iPhone and Android)
Mobile Apps
As technology continues to expand exponentially, it becomes easier and easier for information to be shared and people to communicate
40
41 Reasonable to Project to Year 2020?
I believe it is reasonable to project the parabolic regression models to year 2020 because:
Technological innovation should continue to grow exponentially
This and other factors will cause more and more people to fear engaging in criminal activities
42 Reasonable to Project Longer than 2020?
It is probably not reasonable to project longer than the year 2020 because this trend can’t continue as a parabolic trend forever.
If this trend were to continue as a parabolic trend, we would eventually reach a negative value for crime data, and that is impossible.
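The point where a downward parabola reaches zero can be computed from its coefficients. The sketch below uses a hypothetical parabola with made-up coefficients (not the fitted crime models), written in years since 1960, and finds its peak and its zero crossing with polyroot().

```r
# Hypothetical parabola: y = 70 + 6*t - 0.1*t^2, with t = years since 1960
# (made-up coefficients, not the fitted crime models)
b <- c(70, 6, -0.1)   # polyroot() expects coefficients in increasing order

# Vertex of the parabola: t = -b1 / (2*b2)
peak_year <- 1960 + (-b[2] / (2 * b[3]))    # 1990

# Real parts of the two roots; the later one is where the curve hits zero
roots     <- Re(polyroot(b))
zero_year <- 1960 + max(roots)              # 2030, up to floating-point rounding

peak_year
zero_year
```

Beyond zero_year the parabola predicts negative counts, which is why the projection horizon has to stop well before that point.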
43 Sources
http://www.disastercenter.com/crime/uscrime.htm
http://pricetheory.uchicago.edu/levitt/Papers/LevittUnderstandingWhyCrime2004.pdf
http://www.brighthub.com/education/homework-tips/articles/123405.aspx
http://www.knowyourmobile.com/nokia/history-mobile-phones/19848/history-mobile-phones-1973-2007