Using R for Building a Simple and Effective Dashboard
TRANSCRIPT
![Page 1: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/1.jpg)
Data Science for Business Applications in R (basic)
How to use open source tools and data science to get insights on business and customers
Andrea Gigli
https://about.me/andrea.gigli
![Page 2: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/2.jpg)
Goal
The goal of this talk is to:
▪ Give you a flavour of what you can do with open source data analysis tools like R or Python
▪ Give you some useful «code snippets» to practise with
▪ Provide a way of reasoning while commenting on code and slides
![Page 3: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/3.jpg)
Final Artifact
![Page 4: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/4.jpg)
Case 1. Building a Dashboard to Communicate Business Insights
The setting
▪ You are an up-and-coming Data Scientist
▪ Someone wants to start a new business in NY, a taxi company (or the new Uber!), and asks you for advice
▪ You want to prepare a beautiful and simple dashboard with the most relevant insights and KPIs
![Page 5: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/5.jpg)
Dashboards for Business Analysis
First things first… Get some data
http://www.nyc.gov/html/tlc/html/home/home.shtml
![Page 6: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/6.jpg)
Dashboard Mockup
Customer behaviour
Economics
Insights & Graphics
Other Insights
Sketch an idea of your Dashboard/Report
![Page 7: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/7.jpg)
Dashboard Mockup
Start Exploring Data
Trip Details Data:
▪ medallion, hack_license, vendor_id, rate_code, store_and_fwd_flag, pickup_datetime, dropoff_datetime, passenger_count, trip_time_in_secs, trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude
Trip Fare Data:
▪ medallion, hack_license, vendor_id, pickup_datetime, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
![Page 8: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/8.jpg)
R & RStudio
In the following I’ll make extensive use of R (https://www.r-project.org), RStudio (https://www.rstudio.com) and the following R libraries:
library(psych)
library(dplyr)
library(ggmap)
library(lattice)
![Page 9: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/9.jpg)
Import Data
▪ Download data in <your folder> from here:
▪ Unzip
▪ Import in an R data frame:
setwd("<your folder>")
Import them in a data frame:
# read trip_data.csv
data_trip <- read.csv("trip_data.csv", sep = ",", header = TRUE, nrows = 500000)
# read trip_fare.csv
data_fares <- read.csv("trip_fare.csv", sep = ",", header = TRUE, nrows = 500000)
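A small self-contained sketch of the import step. It uses a throwaway CSV in a temporary file instead of the real `trip_data.csv`; the three rows and the reduced column set are invented for illustration. It shows how `nrows` caps the number of rows read and why it is worth checking column types right after the import:

```r
# Illustrative stand-in for trip_data.csv (values are invented)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "hack_license,pickup_datetime,trip_time_in_secs,trip_distance",
  "A1,2013-01-01 15:11:48,382,1.0",
  "B2,2013-01-06 00:18:35,259,1.5",
  "C3,2013-01-05 18:49:41,282,1.1"
), tmp)

# Read only the first two data rows, the way nrows = 500000 caps the real file
demo_trip <- read.csv(tmp, header = TRUE, nrows = 2, stringsAsFactors = FALSE)

str(demo_trip)  # check column types right after the import
unlink(tmp)     # remove the temporary file
```

Passing `stringsAsFactors = FALSE` keeps the datetime column as plain text on older R versions too, which makes the later string and date handling more predictable.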
![Page 10: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/10.jpg)
Cleaning and Wrangling
Let’s do some cleansing, for example:
# exclude trips lasting 60 seconds or less
data_trip <- data_trip[data_trip$trip_time_in_secs > 60, ]
# exclude trips of 0.1 miles or less
data_trip <- data_trip[data_trip$trip_distance > 0.1, ]
# exclude trips whose pickup coordinates are recorded as zero
data_trip <- data_trip[!(data_trip$pickup_latitude == 0 | data_trip$pickup_longitude == 0), ]
![Page 11: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/11.jpg)
Cleaning and Wrangling
# work on a selection of the NYC area
data_trip <- data_trip[(data_trip$pickup_latitude > 40.62 & data_trip$pickup_latitude < 40.9 &
  data_trip$pickup_longitude > -74.1 & data_trip$pickup_longitude < -73.75 &
  data_trip$dropoff_latitude > 40.62 & data_trip$dropoff_latitude < 40.9 &
  data_trip$dropoff_longitude > -74.1 & data_trip$dropoff_longitude < -73.75), ]
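The filter above repeats the same four limits for pickups and drop-offs, which makes a sign slip on a longitude easy to miss. A minimal sketch of one way to keep the limits in a single place; the helper `in_nyc_box()` and the sample coordinates are hypothetical and not part of the talk:

```r
# Hypothetical helper (not from the talk): TRUE when a point lies strictly
# inside the latitude/longitude box used in the filter above
in_nyc_box <- function(lat, lon,
                       lat_min = 40.62, lat_max = 40.9,
                       lon_min = -74.1, lon_max = -73.75) {
  lat > lat_min & lat < lat_max & lon > lon_min & lon < lon_max
}

# Invented sample: row 2 has a drop-off outside the box, row 3 has zero pickup coordinates
demo <- data.frame(
  pickup_latitude   = c(40.75, 40.76, 0),
  pickup_longitude  = c(-73.99, -73.98, 0),
  dropoff_latitude  = c(40.74, 41.20, 40.70),
  dropoff_longitude = c(-74.00, -73.90, -73.95)
)

# A trip is kept only when both ends fall inside the box
keep <- in_nyc_box(demo$pickup_latitude, demo$pickup_longitude) &
  in_nyc_box(demo$dropoff_latitude, demo$dropoff_longitude)
demo_nyc <- demo[keep, ]
demo_nyc
```

Because zero coordinates fall outside the box, this filter also drops the rows removed by the earlier zero-coordinate check.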
![Page 12: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/12.jpg)
Cleaning and Wrangling
Build new variables:
# create a column for pickup_hour
data_trip$pickup_hour <- as.POSIXlt(data_trip$pickup_datetime)$hour
# create a column for dropoff_hour
data_trip$dropoff_hour <- as.POSIXlt(data_trip$dropoff_datetime)$hour
# create a column for counting
data_trip$ones <- 1
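A short sketch of the hour extraction on invented timestamps. It assumes the datetime columns use the `YYYY-MM-DD HH:MM:SS` layout, which is an assumption about the file. The explicit `format` and `tz` arguments are additions for illustration: a fixed time zone avoids surprises with local daylight-saving gaps when parsing.

```r
# Invented timestamps in the assumed "YYYY-MM-DD HH:MM:SS" layout
pickup_datetime <- c("2013-01-01 15:11:48",
                     "2013-01-06 00:18:35",
                     "2013-01-05 23:54:15")

# Parse with an explicit format and time zone, then read the hour field (0 to 23)
parsed <- as.POSIXlt(pickup_datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
pickup_hour <- parsed$hour
pickup_hour
```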
![Page 13: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/13.jpg)
Cleaning and Wrangling
Remove some variables:
data_fares$medallion <- NULL
data_fares$vendor_id <- NULL
data_trip$dropoff_datetime <- NULL
data_trip$medallion <- NULL
data_trip$vendor_id <- NULL
data_trip$store_and_fwd_flag <- NULL
data_trip$rate_code <- NULL
![Page 14: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/14.jpg)
Exploratory Data Analysis
Plot some histograms
# Distribution of number of passengers per trip
hist(data_trip$passenger_count, 6, main = "Distribution of Number of Passengers per Trip", xlab = "Number of Passengers p/Trip")
# paint the plot region grey, then redraw the bars on top of it
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
hist(data_trip$passenger_count, 6, add = TRUE, col = "lightgoldenrod2")
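The same three-step pattern (draw, paint the background, redraw) recurs in every histogram of the talk. A sketch of how it could be wrapped once; the function `hist_on_grey()` and the random passenger counts are hypothetical, and the plot is sent to a null PDF device so nothing is written to disk:

```r
# Hypothetical wrapper (not from the talk) around the three-step pattern:
# 1) draw the histogram to set up the axes, 2) paint the plot region grey,
# 3) redraw the bars on top of the grey rectangle
hist_on_grey <- function(x, breaks, ..., bar_col = "lightgoldenrod2") {
  h <- hist(x, breaks, ...)
  usr <- par("usr")  # plot region limits in the order x1, x2, y1, y2
  rect(usr[1], usr[3], usr[2], usr[4], col = "grey")
  hist(x, breaks, add = TRUE, col = bar_col)
  invisible(h)
}

# Invented passenger counts
set.seed(1)
passengers <- sample(1:6, 200, replace = TRUE)

pdf(NULL)  # null device: no file is created
h <- hist_on_grey(passengers, 6,
                  main = "Passengers per Trip", xlab = "Passengers p/Trip")
dev.off()
```

Titles and axis labels go to the first `hist()` call, because a histogram drawn with `add = TRUE` only adds the bars.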
![Page 15: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/15.jpg)
Exploratory Data Analysis
![Page 16: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/16.jpg)
Exploratory Data Analysis
# Distribution of payment_type
barplot(sort(table(data_fares$payment_type), decreasing = TRUE), xaxt = 'n')
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
barplot(sort(table(data_fares$payment_type), decreasing = TRUE), ylab = "Frequency", col = "lightgoldenrod2", add = TRUE, main = "Distribution of Payment Type")
![Page 17: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/17.jpg)
Exploratory Data Analysis
![Page 18: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/18.jpg)
Exploratory Data Analysis
# Distribution of trip time length
hist(data_trip$trip_time_in_secs / 60, 10, xlim = c(0, 100), main = "Distribution of Trip Time", xlab = "Trip Time in minutes")
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
hist(data_trip$trip_time_in_secs / 60, 10, add = TRUE, col = "lightgoldenrod2")
![Page 19: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/19.jpg)
Exploratory Data Analysis
![Page 20: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/20.jpg)
Exploratory Data Analysis
# Distribution of trip distance
hist(data_trip$trip_distance, 100, xlim = c(0, 40), main = "Distribution of Trip Distance", xlab = "Trip Distance")
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
hist(data_trip$trip_distance, 100, add = TRUE, col = "lightgoldenrod2")
![Page 21: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/21.jpg)
Exploratory Data Analysis
![Page 22: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/22.jpg)
Exploratory Data Analysis
#Distribution of fare amount (full domain)
hist(data_fares$fare_amount, main="Distribution of Fare Amount", xlab="Fare Amount")
rect(par("usr")[1], par("usr")[3], par("usr")[2],par("usr")[4], col = "grey")
hist(data_fares$fare_amount,add =TRUE,col="lightgoldenrod2")
![Page 23: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/23.jpg)
Exploratory Data Analysis
![Page 24: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/24.jpg)
Exploratory Data Analysis
#Distribution of fare amount (restricted domain)
hist(data_fares$fare_amount,xlim=c(0,80),200, main="Distribution of Fare Amount", xlab="Fare Amount")
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
hist(data_fares$fare_amount,200, xlim=c(0,80),add = TRUE,col="lightgoldenrod2")
![Page 25: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/25.jpg)
Exploratory Data Analysis
![Page 26: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/26.jpg)
Exploratory Data Analysis
#Distribution of tip amount
hist(data_fares$tip_amount,500,xlim=c(0,20), main="Distribution of Tip Amount", xlab="Tip Amount")
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
hist(data_fares$tip_amount,500,xlim=c(0,20),add = TRUE,col="lightgoldenrod2")
![Page 27: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/27.jpg)
Exploratory Data Analysis
![Page 28: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/28.jpg)
Exploratory Data Analysis
#Distribution of Total Amount
hist(data_fares$total_amount,1000,xlim=c(0,100), main="Distribution of Total Amount", xlab="Total Amount")
rect(par("usr")[1], par("usr")[3], par("usr")[2],par("usr")[4], col = "grey")
hist(data_fares$total_amount,add = TRUE,col="lightgoldenrod2",1000,xlim=c(0,100))
![Page 29: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/29.jpg)
Exploratory Data Analysis
![Page 30: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/30.jpg)
Exploratory Data Analysis
# Distribution of pickups during the day
barplot(table(data_trip$pickup_hour))
rect(par("usr")[1], par("usr")[3], par("usr")[2],par("usr")[4], col = "grey")
barplot(table(data_trip$pickup_hour), add = TRUE,col="lightgoldenrod2",main="Distribution of Pickups in 24H",
ylab="Frequency")
![Page 31: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/31.jpg)
Exploratory Data Analysis
![Page 32: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/32.jpg)
Exploratory Data Analysis
# Distribution of pickups during the day (ordered)
barplot(sort(table(data_trip$pickup_hour), decreasing = TRUE))
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = "grey")
barplot(sort(table(data_trip$pickup_hour), decreasing = TRUE), add = TRUE, col = "lightgoldenrod2", main = "Distribution of Pickups in 24H", ylab = "Frequency")
![Page 33: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/33.jpg)
Exploratory Data Analysis
![Page 34: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/34.jpg)
Wrangling and Cleaning, again
# Top 5 busiest hours of the day
busy_hours <- aggregate(data_trip$ones ~ data_trip$pickup_hour, data_trip, sum)
# select top 5 pickup_hours
busy_hours.top5 <- busy_hours %>% arrange(desc(busy_hours[, 2])) %>% top_n(5)
names(busy_hours.top5)[names(busy_hours.top5) == "data_trip$pickup_hour"] <- "pickup_hour"
names(busy_hours.top5)[names(busy_hours.top5) == "data_trip$ones"] <- "nr_runs"
![Page 35: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/35.jpg)
Exploratory Data Analysis
busy_hours.top5
  pickup_hour nr_runs
1          23   32829
2           0   31392
3          22   27887
4           1   26800
5          12   25711
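The same top-5 ranking can also be sketched in base R without the helper column of ones. The pickup hours below are invented and the object names are illustrative, not the talk's:

```r
# Invented pickup hours standing in for data_trip$pickup_hour
pickup_hour <- c(23, 23, 23, 0, 0, 22, 22, 22, 22, 1, 12, 12, 8)

# Count runs per hour, sort descending, keep the five busiest
hour_counts <- sort(table(pickup_hour), decreasing = TRUE)
top5 <- head(hour_counts, 5)

# Same layout as busy_hours.top5: one row per hour with its number of runs
busy_hours_top5 <- data.frame(
  pickup_hour = as.integer(names(top5)),
  nr_runs     = as.integer(top5)
)
busy_hours_top5
```

`table()` already names each count after its hour, so no renaming of `data_trip$...` columns is needed afterwards.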
![Page 36: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/36.jpg)
Exploratory Data Analysis
# Distribution of pickups during the day in %
names(busy_hours)[names(busy_hours) == "data_trip$pickup_hour"] <- "pickup_hour"
names(busy_hours)[names(busy_hours) == "data_trip$ones"] <- "counter"
hoursum <- sum(busy_hours$counter)
busy_hours$perc <- busy_hours$counter / hoursum
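As a quick check of the share computation, the sketch below uses base R's `prop.table()`, which divides each count by the total just like `counter / hoursum`. The hours are invented:

```r
# Invented pickup hours
pickup_hour <- c(0, 0, 1, 12, 12, 12, 23, 23, 23, 23)

counter <- table(pickup_hour)  # runs per hour
perc <- prop.table(counter)    # same as counter / sum(counter)

# Pickups per hour for every 100 daily pickups
round(100 * perc, 1)
```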
![Page 37: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/37.jpg)
Exploratory Data Analysis
ggplot(busy_hours, aes(x = pickup_hour, y = perc * 100)) +
  geom_ribbon(aes(ymin = 0, ymax = perc * 100), fill = "lightgoldenrod2", color = "lightgoldenrod2") +
  scale_x_continuous(breaks = seq(from = 0, to = 23, by = 1)) +
  geom_point(size = 3, color = "burlywood3") +
  geom_line(color = "burlywood3", lwd = 0.5) +
  ggtitle("Number of Pickups per Hour every 100 Daily Pickups") +
  xlab("Hour of the Day") +
  theme(axis.title.y = element_blank(), panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(), text = element_text(size = 22))
![Page 38: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/38.jpg)
Exploratory Data Analysis
![Page 39: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/39.jpg)
Wrangling and Cleaning, again
# Top 10 busiest locations of the city
# Build variables to define «locations»
data_trip$latpickup <- round(data_trip$pickup_latitude / 0.005) * 0.005
data_trip$slatpickup <- lapply(data_trip$latpickup, toString)
data_trip$lonpickup <- round(data_trip$pickup_longitude / 0.005) * 0.005
data_trip$slonpickup <- lapply(data_trip$lonpickup, toString)
data_trip$trip_start <- paste(data_trip$slatpickup, data_trip$slonpickup, sep = "|")
![Page 40: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/40.jpg)
Wrangling and Cleaning, again
# build a trip identifier concatenating rounded latitude and longitude in string format
data_trip$trip_start <- paste(data_trip$slatpickup, data_trip$slonpickup, sep = "|")
# get rid of variables that are no longer needed
data_trip$latpickup <- NULL
data_trip$lonpickup <- NULL
data_trip$slatpickup <- NULL
data_trip$slonpickup <- NULL
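A worked sketch of the «location» idea on invented coordinates: every point is snapped to the nearest multiple of 0.005 degrees, so nearby pickups share one identifier. It uses the vectorised `as.character()` in place of `lapply(..., toString)`, which is a substitution made here for brevity.

```r
# Invented pickup coordinates; the first two fall in the same grid cell
lat <- c(40.7512, 40.7489, 40.7263)
lon <- c(-73.9911, -73.9893, -73.9902)

# Snap each coordinate to the nearest multiple of 0.005 degrees
cell <- 0.005
lat_cell <- round(lat / cell) * cell
lon_cell <- round(lon / cell) * cell

# Build the "lat|lon" identifier used as trip_start
trip_start <- paste(as.character(lat_cell), as.character(lon_cell), sep = "|")
trip_start
```

A step of 0.005 degrees of latitude is roughly 550 metres, so each location covers a few city blocks.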
![Page 41: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/41.jpg)
Exploratory Data Analysis
# group by trip identifier and count
busy_locations <- aggregate(data_trip$ones ~ data_trip$trip_start, data_trip, sum)
names(busy_locations)[names(busy_locations) == "data_trip$trip_start"] <- "location"
names(busy_locations)[names(busy_locations) == "data_trip$ones"] <- "counter"
![Page 42: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/42.jpg)
Exploratory Data Analysis
# total number of trips
tripsum <- sum(busy_locations$counter)
# share of trips starting at each location
busy_locations$perc <- busy_locations$counter / tripsum
# keep the 10 busiest locations
top10_loc <- busy_locations %>% arrange(desc(busy_locations[, 2])) %>% top_n(10)
![Page 43: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/43.jpg)
Exploratory Data Analysis
# print top 10 busiest locations
top10_loc
   location counter perc
1 40.75|-73.99 8937 0.01846335
2 40.74|-74.005 7705 0.01591811
3 40.76|-73.985 7108 0.01468474
4 40.745|-73.98 6990 0.01444096
5 40.735|-73.99 6585 0.01360425
6 40.725|-73.99 6295 0.01300512
7 40.745|-73.985 6289 0.01299273
8 40.75|-73.975 6287 0.01298860
9 40.765|-73.98 6187 0.01278200
10 40.72|-73.99 6183 0.01277374
![Page 44: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/44.jpg)
How to convert Latitude and Longitude into Location Names in R
# get address of busy locations
# (recent ggmap versions need a Google API key, set with register_google(), before calling revgeocode())
loc_parts <- unlist(strsplit(top10_loc$location, "[|]"))
coordinates <- matrix(as.double(loc_parts), nrow = 10, ncol = 2, byrow = TRUE)
top10_loc$lat <- coordinates[, 1]
top10_loc$lon <- coordinates[, 2]
top10_loc$address <- mapply(FUN = function(lon, lat) revgeocode(c(lon, lat)), top10_loc$lon, top10_loc$lat)
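The parsing half of this step can be tried without any geocoding service. A minimal sketch on three invented identifiers, using `do.call(rbind, ...)` so the number of rows does not have to be hard-coded; this is an alternative to the `matrix(..., nrow = 10)` call, not the talk's own code:

```r
# Invented location identifiers in the "lat|lon" format built earlier
location <- c("40.75|-73.99", "40.74|-74.005", "40.76|-73.985")

# Split on the literal "|" (fixed = TRUE avoids regex escaping),
# then stack the pieces row-wise into a two-column character matrix
parts <- do.call(rbind, strsplit(location, "|", fixed = TRUE))

coords <- data.frame(
  lat = as.numeric(parts[, 1]),
  lon = as.numeric(parts[, 2])
)
coords
```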
![Page 45: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/45.jpg)
How to convert Latitude and Longitude into Location Names in R
top10_loc$address
[1] "137 W 33rd St, New York, NY 10120, USA"
[2] "345 W 13th St, New York, NY 10014, USA"
[3] "1585-1589 Broadway, New York, NY 10036, USA"
[4] "145 E 32nd St, New York, NY 10016, USA"
[5] "10 Union Square E, New York, NY 10003, USA"
[6] "42 2nd Ave, New York, NY 10003, USA"
[7] "110-112 Madison Ave, New York, NY 10016, USA"
[8] "633-637 3rd Ave, New York, NY 10017, USA"
[9] "Carnegie Hall, 152 W 57th St, New York, NY 10019, USA"
[10] "129-131 Allen St, New York, NY 10002, USA"
![Page 46: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/46.jpg)
Playing with Maps in R
# represent busiest addresses in a bar chart
ggplot(top10_loc, aes(x = reorder(address, counter), y = perc * 1000)) +
  geom_bar(stat = 'identity', fill = "lightgoldenrod2") +
  coord_flip() +
  ggtitle("Top 10 Locations with Highest Number\nof Pickups p/1000 Trips")
![Page 47: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/47.jpg)
Playing with Maps in R
![Page 48: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/48.jpg)
Playing with Maps in R
# build maps for busy locations
ny_map <- get_map(location = c(-73.9308, 40.7336), maptype = "satellite", zoom = 11)
ny_map2 <- get_map(location = c(-73.9874, 40.7539), maptype = "satellite", zoom = 13)
ny_map3 <- get_map(location = c(-73.99, 40.75), maptype = "roadmap", zoom = 13)
# represent busiest locations on a map
ggmap(ny_map3) + geom_point(aes(x = top10_loc$lon, y = top10_loc$lat, size = top10_loc$counter), data = top10_loc)
![Page 49: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/49.jpg)
Playing with Maps in R
![Page 50: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/50.jpg)
Playing with Maps in R
# build map for a sample of pickups
data_sample<-data_trip[sample(nrow(data_trip), 400000), ]
ggmap(ny_map, extent = "device") + geom_point(aes(x = data_sample$pickup_longitude, y = data_sample$pickup_latitude), colour = "yellow", alpha = 0.1, size = 1, data = data_sample)
![Page 51: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/51.jpg)
Playing with Maps in R
![Page 52: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/52.jpg)
Playing with Maps in R
# build a heat map of pickups
ggmap(ny_map, extent = "device") + geom_point(aes(x = data_sample$pickup_longitude, y = data_sample$pickup_latitude), colour = "yellow", alpha = 0.1, size = 1, data = data_sample)
![Page 53: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/53.jpg)
Playing with Maps in R
![Page 54: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/54.jpg)
Playing with Maps in R
# build a heat map of pickups
ggmap(ny_map3, extent = "device") +
  geom_density2d(data = data_sample, aes(x = data_sample$pickup_longitude, y = data_sample$pickup_latitude), size = 0.3) +
  stat_density2d(data = data_sample, aes(x = data_sample$pickup_longitude, y = data_sample$pickup_latitude, fill = ..level.., alpha = ..level..), size = 0.01, geom = "polygon") +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_alpha(range = c(0.4, 0.9), guide = FALSE)
![Page 55: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/55.jpg)
Playing with Maps in R
![Page 56: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/56.jpg)
Playing with Maps in R
# append this layer to the heat map code to overlay the top 10 locations
+ geom_point(aes(x = top10_loc$lon, y = top10_loc$lat, size = top10_loc$counter), data = top10_loc)
![Page 57: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/57.jpg)
Play Again with Latitude and Longitude in R
# Trip with highest standard deviation of travel time
# I assume "trip" means "a taxi run with a given trip_start and trip_end".
data_trip$latdropoff <- round(data_trip$dropoff_latitude / 0.005) * 0.005
data_trip$slatdropoff <- lapply(data_trip$latdropoff, toString)
data_trip$londropoff <- round(data_trip$dropoff_longitude / 0.005) * 0.005
data_trip$slondropoff <- lapply(data_trip$londropoff, toString)
data_trip$trip_end <- paste(data_trip$slatdropoff, data_trip$slondropoff, sep = "|")
![Page 58: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/58.jpg)
Play Again with Latitude and Longitude in R
# get rid of variables that are no longer needed
data_trip$latdropoff <- NULL
data_trip$londropoff <- NULL
data_trip$slatdropoff <- NULL
data_trip$slondropoff <- NULL
# trip_id variable
data_trip$trip_id <- paste(data_trip$trip_start, data_trip$trip_end, sep = "|")
![Page 59: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/59.jpg)
Play Again with Latitude and Longitude in R
# compute standard deviation for every trip
trips <- aggregate(data_trip$trip_time_in_secs ~ data_trip$trip_id, data_trip, sd)
# get the trips with highest standard deviation and find pickup and dropoff locations
trips.topsd <- trips %>% arrange(desc(trips[, 2])) %>% top_n(10)
names(trips.topsd)[names(trips.topsd) == "data_trip$trip_id"] <- "trip_id"
names(trips.topsd)[names(trips.topsd) == "data_trip$trip_time_in_secs"] <- "trip_sd"
![Page 60: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/60.jpg)
Play Again with Latitude and Longitude in R
# recover addresses from Google Maps and print the top 10 trips by SD
trip_text <- list()
for (i in 1:10) {
  coords <- matrix(as.double(unlist(strsplit(trips.topsd$trip_id[i], "[|]"))), nrow = 2, ncol = 2, byrow = TRUE)
  from <- coords[1, ]
  to <- coords[2, ]
  origin <- mapply(FUN = function(lon, lat) revgeocode(c(lon, lat)), from[2], from[1])
  destination <- mapply(FUN = function(lon, lat) revgeocode(c(lon, lat)), to[2], to[1])
  trip_text[i] <- paste("Trip", i, "from", origin, "to", destination, "has", round(trips.topsd$trip_sd[i], 2), "SD.")
}
![Page 61: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/61.jpg)
Play Again with Latitude and Longitude in R
print(trip_text)
[[1]]
[1] "Trip 1 from JFK Expressway, Jamaica, NY 11430, USA to JFK Expressway, Jamaica, NY 11430, USA has 3660.94 SD."
[[2]]
[1] "Trip 2 from Perimeter Rd, Jamaica, NY 11430, USA to 826 Greene Ave, Brooklyn, NY 11221, USA has 3436.54 SD."
[[3]]
[1] "Trip 3 from 46-36 54th Rd, Flushing, NY 11378, USA to 107-11 Van Wyck Expy, Jamaica, NY 11435, USA has 3181.98 SD."
……
[[10]]
[1] "Trip 10 from Central Terminal Area, Jamaica, NY 11430, USA to 34-40 E Houston St, New York, NY 10012, USA has 2206.17 SD."
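The per-trip standard deviation behind this ranking can be reproduced on a toy table. The sketch below uses invented runs and passes bare column names with `data =`, so the result columns need less renaming. It also shows that a trip with a single run gets `NA`, which is worth dropping before ranking:

```r
# Invented runs: trip "A" has three runs, "B" two, "C" only one
runs <- data.frame(
  trip_id = c("A", "A", "A", "B", "B", "C"),
  trip_time_in_secs = c(600, 900, 1200, 300, 300, 450)
)

# Standard deviation of travel time per trip; a single run yields NA
trip_sd <- aggregate(trip_time_in_secs ~ trip_id, data = runs, FUN = sd)
names(trip_sd)[2] <- "trip_sd"

# Drop undefined values and sort from most to least variable
trip_sd <- trip_sd[!is.na(trip_sd$trip_sd), ]
trip_sd <- trip_sd[order(trip_sd$trip_sd, decreasing = TRUE), ]
trip_sd
```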
![Page 62: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/62.jpg)
Play Again with Latitude and Longitude in R
# Trip with the lowest fare’s Standard Deviation
# I assume each taxi run is uniquely identified by "hack license" and "pickup time".
# I can build unique run_id's for the data_fares and data_trip tables and join them
data_fares$run_id <- paste(data_fares$hack_license, data_fares$pickup_datetime, sep = "|")
data_trip$run_id <- paste(data_trip$hack_license, data_trip$pickup_datetime, sep = "|")
![Page 63: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/63.jpg)
Play Again with Latitude and Longitude in R
# I create a new dataframe merging data_fares and data_trip on run_id
df_merge <- merge(x = data_trip, y = data_fares, by.x = "run_id", by.y = "run_id", all.x = TRUE)
# group by trip and compute the standard deviation of the fare amount
fares <- aggregate(df_merge$fare_amount ~ df_merge$trip_id, df_merge, sd)
![Page 64: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/64.jpg)
Play Again with Latitude and Longitude in R
# Keep track of the total number of runs for each trip
fares_c <- aggregate(df_merge$ones ~ df_merge$trip_id, df_merge, sum)
fares_merge <- merge(x = fares, y = fares_c, by.x = "df_merge$trip_id", by.y = "df_merge$trip_id", all.x = TRUE)
names(fares_merge)[names(fares_merge) == "df_merge$trip_id"] <- "trip_id"
names(fares_merge)[names(fares_merge) == "df_merge$fare_amount"] <- "fare_sd"
names(fares_merge)[names(fares_merge) == "df_merge$ones"] <- "trip_count"
# exclude trips with 30 runs or fewer and order by fare SD
fares_merge <- fares_merge[fares_merge$trip_count > 30, ]
fares_merge <- fares_merge %>% arrange(fares_merge$fare_sd)
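The join, aggregate and filter sequence above can be followed end to end on two tiny invented tables. This base R sketch uses illustrative object names and a run-count threshold of 1 instead of 30, only because the toy data is so small:

```r
# Invented tables sharing the run_id key
trips <- data.frame(run_id = c("r1", "r2", "r3", "r4"),
                    trip_id = c("A", "A", "B", "B"))
fares <- data.frame(run_id = c("r1", "r2", "r3", "r4"),
                    fare_amount = c(10, 12, 7, 7))

# Left join on run_id, as merge(..., all.x = TRUE) does above
df <- merge(trips, fares, by = "run_id", all.x = TRUE)

# One aggregate per statistic, then combine: fare SD and number of runs per trip
fare_sd <- aggregate(fare_amount ~ trip_id, data = df, FUN = sd)
fare_n  <- aggregate(fare_amount ~ trip_id, data = df, FUN = length)
stats <- merge(fare_sd, fare_n, by = "trip_id", suffixes = c("_sd", "_n"))
names(stats) <- c("trip_id", "fare_sd", "trip_count")

# Keep trips with enough runs and sort by increasing fare SD
min_runs <- 1
stats <- stats[stats$trip_count > min_runs, ]
stats <- stats[order(stats$fare_sd), ]
stats
```

The run-count filter matters because a standard deviation computed on a handful of runs is not a reliable measure of how stable a fare is.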
![Page 65: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/65.jpg)
Play Again with Latitude and Longitude in R
# get some extra information beyond numbers
trip_text <- list()
for (i in 1:10) {
  coords <- matrix(as.double(unlist(strsplit(fares_merge$trip_id[i], "[|]"))), nrow = 2, ncol = 2, byrow = TRUE)
  from <- coords[1, ]
  to <- coords[2, ]
  origin <- mapply(FUN = function(lon, lat) revgeocode(c(lon, lat)), from[2], from[1])
  destination <- mapply(FUN = function(lon, lat) revgeocode(c(lon, lat)), to[2], to[1])
  trip_text[i] <- paste("Trip", i, "starts from", origin, "and end to", destination)
}
![Page 66: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/66.jpg)
Play Again with Latitude and Longitude in R
print(trip_text)
[[1]]
[1] "Trip 1 starts from 1585-1589 Broadway, New York, NY 10036, USA and end to 107-11 Van Wyck Expy, Jamaica, NY 11435, USA"
[[2]]
[1] "Trip 2 starts from 1700 3rd Ave, New York, NY 10128, USA and end to 53 E 124th St, New York, NY 10035, USA"
[[3]]
[1] "Trip 3 starts from 330 W 95th St, New York, NY 10025, USA and end to 534 W 112th St, New York, NY 10025, USA"
……
[[10]]
[1] "Trip 10 starts from 762 Amsterdam Ave, New York, NY 10025, USA and end to 192 Claremont Ave, New York, NY 10027, USA"
![Page 67: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/67.jpg)
Play Again with Latitude and Longitude in R
# prepare points to visualize
nr_points <- 100
ffrom <- matrix(nr_points * 2, nrow = nr_points, ncol = 2)
tto <- matrix(nr_points * 2, nrow = nr_points, ncol = 2)
for (i in 1:nr_points) {
  coords <- matrix(as.double(unlist(strsplit(fares_merge$trip_id[i], "[|]"))), nrow = 2, ncol = 2, byrow = TRUE)
  from <- coords[1, ]
  to <- coords[2, ]
  ffrom[i, 1] <- coords[1, 1]
  ffrom[i, 2] <- coords[1, 2]
  tto[i, 1] <- coords[2, 1]
  tto[i, 2] <- coords[2, 2]
}
![Page 68: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/68.jpg)
Play Again with Latitude and Longitude in R
# transform points in a matrix to points in a dataframe
start_end <- as_data_frame(list(from.lat = ffrom[, 1], from.lon = ffrom[, 2], to.lat = tto[, 1], to.lon = tto[, 2]))
# plot the trip with the lowest fare’s SD
ggmap(ny_map, extent = "device") +
  geom_point(aes(x = start_end$to.lon[1], y = start_end$to.lat[1]), colour = "red", alpha = 0.6, size = 10, data = start_end) +
  geom_point(aes(x = start_end$from.lon[1], y = start_end$from.lat[1]), colour = "yellow", alpha = 0.6, size = 10, data = start_end)
![Page 69: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/69.jpg)
Play Again with Latitude and Longitude in R
![Page 70: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/70.jpg)
Play Again with Latitude and Longitude in R
# plot the other trips around the Manhattan area
ggmap(ny_map3, extent = "device") +
  geom_point(aes(x = start_end$to.lon + 0.00085, y = start_end$to.lat), colour = "red", alpha = 0.2, size = 10, data = start_end) +
  geom_point(aes(x = start_end$from.lon, y = start_end$from.lat), colour = "green", alpha = 0.2, size = 10, data = start_end)
![Page 71: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/71.jpg)
Play Again with Latitude and Longitude in R
![Page 72: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/72.jpg)
From Mockup To Dashboard
Customer behaviour
Economics
Insights & Graphics
Other Insights
We can fill our mockup now
![Page 73: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/73.jpg)
From Mockup To Dashboard
Customer behaviour
Economics
Insights & Graphics
Other Insights
We can fill our mockup
![Page 74: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/74.jpg)
From Mockup To Dashboard
Let’s use some descriptive statistics instead of graphs in the Customer Behaviour section
> summary(data_trip$passenger_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   1.000   1.000   2.182   3.000   6.000
> summary(data_trip$trip_time_in_secs/60)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.083   6.000  10.000   11.97  15.000   128.0
> summary(data_trip$trip_distance)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.110   1.160   1.930   2.943   3.420   45.46
> summary(data_fares$payment_type)
   CRD    CSH    DIS    NOC    UNK
257247 242503      2     16    232
![Page 75: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/75.jpg)
From Mockup To Dashboard
Customer Behaviour entries
Average Number of Passengers p/Trip: 2.18
25th Percentile: 1 | Median: 1.0 | 75th Percentile: 3
Average Time Spent on Taxi p/Trip: 12'
25th Percentile: 6' | Median: 10' | 75th Percentile: 15'
Average Number of Miles p/Trip: 2.94 miles
25th Percentile: 1.2 | Median: 1.9 | 75th Percentile: 3.4
Payment Type: Credit Card (51%)
Cash: 48% | NOC: 0.00% | Other: 1%
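The payment shares can be derived directly from the counts returned by summary(data_fares$payment_type). The sketch below is a minimal example, not necessarily how the slide figures were produced: grouping DIS and UNK into "Other" is an assumption, and the exact shares may differ slightly from the rounded figures shown on the dashboard.

```r
# Counts copied from the summary(data_fares$payment_type) output
payment_counts <- c(CRD = 257247, CSH = 242503, DIS = 2, NOC = 16, UNK = 232)

# Share of each payment type, as a percentage of all trips
payment_share <- 100 * payment_counts / sum(payment_counts)

# Grouping choice (an assumption): "Other" = DIS + UNK
dashboard_share <- c(
  "Credit Card" = payment_share[["CRD"]],
  "Cash"        = payment_share[["CSH"]],
  "NOC"         = payment_share[["NOC"]],
  "Other"       = payment_share[["DIS"]] + payment_share[["UNK"]]
)

print(round(dashboard_share, 2))
```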
![Page 76: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/76.jpg)
From Mockup To Dashboard
Customer behaviour
Economics
Insights & Graphics
Other Insights
We can fill our mockup
![Page 77: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/77.jpg)
From Mockup To Dashboard
Let’s use some descriptive statistics instead of graphs in the Economics section
> summary(data_fares$fare_amount)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.50    6.50    9.50   12.18   14.00  385.00
> summary(data_fares$tip_amount)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    0.00    0.00    1.22    1.90  200.00
> summary(data_fares$total_amount)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.50    8.00   11.00   14.31   16.10  490.80
> summary(data_fares$total_amount-data_fares$tip_amount-data_fares$fare_amount)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.5000  0.5000  0.9158  1.0000 20.0000
![Page 78: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/78.jpg)
From Mockup To Dashboard
Average Tip p/Trip: 1.22 $
25th Percentile: 0 $ | Median: 0 $ | 75th Percentile: 1.9 $
Average Other Earnings p/Trip: 0.92 $
25th Percentile: 0.50 $ | Median: 0.50 $ | 75th Percentile: 1.00 $
Average Amount Earned p/Trip: 14.31 $
25th Percentile: 8.00 $ | Median: 11.00 $ | 75th Percentile: 16.10 $
Average Fare p/Trip: 12.18 $
25th Percentile: 6.5 $ | Median: 9.50 $ | 75th Percentile: 14 $
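Each Economics entry is a mean plus three quartiles, and "other earnings" is what remains of the total once fare and tip are removed. A small helper can produce all four rows in one go. This is only a sketch: the fares data frame below is a toy stand-in for data_fares with made-up values, and kpi_row and other_amount are hypothetical names.

```r
# Toy stand-in for data_fares (values are made up for illustration only)
fares <- data.frame(
  fare_amount  = c(6.5, 9.5, 14.0, 20.0),
  tip_amount   = c(0.0, 0.0, 1.9, 4.0),
  total_amount = c(7.0, 10.0, 16.9, 25.0)
)

# Other earnings = total amount minus tip and fare
fares$other_amount <- fares$total_amount - fares$tip_amount - fares$fare_amount

# Hypothetical helper: mean and quartiles of one numeric column
kpi_row <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(mean = mean(x), p25 = q[1], median = q[2], p75 = q[3])
}

# One row per column of fares, one column per statistic
kpis <- t(sapply(fares, kpi_row))
print(round(kpis, 2))
```

Applied to the real data_fares columns, the same helper should reproduce the figures that summary() reports, since both use R's default quantile definition.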
![Page 79: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/79.jpg)
From Mockup To Dashboard
Customer behaviour
Economics
Insights & Graphics
Other Insights
We can fill our mockup
![Page 80: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/80.jpg)
From Mockup To Dashboard
![Page 81: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/81.jpg)
From Mockup To Dashboard
Customer behaviour
Economics
Insights & Graphics
Other Insights
We can fill our mockup
![Page 82: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/82.jpg)
From Mockup To Dashboard
Include some facts from which you can infer something interesting
Top 5 Busiest Hours: the busiest hours are from 22:00 to 02:00
Trip with Most Volatile Travel Time: the trip from JFK Expressway, Jamaica, NY 11430, USA to JFK Expressway, Jamaica, NY 11430, USA has a travel-time SD of 3660.94
Trip with Most Consistent Fares: from 1585-1589 Broadway, NY 10036 to 107-11 Van Wyck Expy, Jamaica, NY 11435
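A fact such as the busiest hours can be obtained by counting trips per pickup hour. The sketch below uses a handful of made-up timestamps; the column name pickup_datetime is an assumption about the trip data, not something shown in the slides.

```r
# Toy pickup timestamps (made up); pickup_datetime is an assumed column name
trips <- data.frame(
  pickup_datetime = as.POSIXct(
    c("2013-01-01 22:15:00", "2013-01-01 22:40:00", "2013-01-01 23:05:00",
      "2013-01-01 23:30:00", "2013-01-01 23:50:00", "2013-01-02 00:10:00",
      "2013-01-02 08:00:00"),
    tz = "UTC"
  )
)

# Extract the hour of day ("00" to "23") from each pickup time
trips$hour <- format(trips$pickup_datetime, "%H")

# Count trips per hour and keep the five busiest hours (fewer if not available)
trips_per_hour <- sort(table(trips$hour), decreasing = TRUE)
top_hours <- head(trips_per_hour, 5)
print(top_hours)
```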
![Page 83: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/83.jpg)
My version…
NYC Taxi Data Insights
Customer Habits on a Taxi Trip
Average Number of Passengers p/Trip: 2.18
25th Percentile: 1 | Median: 1.0 | 75th Percentile: 3
Average Time Spent on Taxi p/Trip: 12'
25th Percentile: 6' | Median: 10' | 75th Percentile: 15'
Average Number of Miles p/Trip: 2.94 miles
25th Percentile: 1.2 | Median: 1.9 | 75th Percentile: 3.4
Payment Type: Credit Card (51%)
Cash: 48% | NOC: 0.00% | Other: 1%
Economics
Average Amount Earned p/Trip: 14.31 $
25th Percentile: 8.00 $ | Median: 11.00 $ | 75th Percentile: 16.10 $
Average Fare p/Trip: 12.18 $
25th Percentile: 6.5 $ | Median: 9.50 $ | 75th Percentile: 14 $
Average Tip p/Trip: 1.22 $
25th Percentile: 0 $ | Median: 0 $ | 75th Percentile: 1.9 $
Average Other Earnings p/Trip: 0.92 $
25th Percentile: 0.50 $ | Median: 0.50 $ | 75th Percentile: 1.00 $
Taxi Life Insights
Pickup Points Busy Areas
Top 10 Busiest Locations
Top 5 Busiest Hours: the busiest hours are from 22:00 to 02:00
Trip with Most Volatile Travel Time: the trip from JFK Expressway, Jamaica, NY 11430, USA to JFK Expressway, Jamaica, NY 11430, USA has a travel-time SD of 3660.94
Trip with Most Consistent Fares: from 1585-1589 Broadway, NY 10036 to 107-11 Van Wyck Expy, Jamaica, NY 11435
![Page 84: Using R for Building a Simple and Effective Dashboard](https://reader036.vdocuments.net/reader036/viewer/2022070513/588843b91a28ab7a298b6cc7/html5/thumbnails/84.jpg)
Thanks!
Andrea Gigli
https://about.me/andrea.gigli