data visualization - wyamaka.files.wordpress.com · software-and-data-visualization a line graph...

Post on 06-Oct-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

DATA VISUALIZATIONDr.Woraphon Yamaka

OVERVIEW

• When you manage multiple content assets, such as social media or a blog, with multiple sources of data, it can get overwhelming. What should you be tracking? What actually matters? How do you visualize and analyze the data so you can extract insights and actionable information?

• More importantly, how can you make reporting more efficient when you're busy working on multiple projects at once?

BASIC PLOT •Column Chart•Bar Graph•Line Graph•Dual Axis Chart•Area Chart•Stacked Bar Graph•Mekko Chart•Pie Chart•Scatter Plot Chart•Bubble Chart•Waterfall Chart•Funnel Chart•Bullet Chart•Heat Map

COLUMN CHART

• A column chart is used to show a comparison among different items, or it can show a comparison of items over time. You could use this format to see the revenue per landing page or customers by close date.

0e+00

1e+05

2e+05

3e+05

1960 1970 1980 1990 2000

year

GD

P

BAR GRAPH

1960

1970

1980

1990

2000

0e+00 1e+05 2e+05 3e+05

GDP

year

1960

1970

1980

1990

2000

year

A bar graph, basically a horizontal column chart, should be used to avoid clutter when one data label is long or if you have more than 10 items to compare. This type of visualization can also be used to display negative numbers.

LINE GRAPH

2000

4000

6000

1960 1970 1980 1990 2000

year

GD

P

http://www.sthda.com/english/wiki/ggplot2-line-plot-quick-start-guide-r-

software-and-data-visualization

A line graph reveals trends or progress over time and can be used to show many different categories of data. You should use it when you chart a continuous data set.

DUAL AXIS CHART

time

X

0 20 40 60 80 100

-2

-1

0

1

2

0

2

4

6

Y

A dual axis chart allows you to plot data using two y-axes and a shared x-axis. It's used with three data sets, one of which is based on a continuous set of data and another which is better suited to being grouped by category. This should be used to visualize a correlation or the lack thereof between these three data sets.

https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html

AREA CHART

An area chart is basically a line chart, but the space between the x-axis and the line is filled with a color or pattern. It is useful for showing part-to-whole relations, such as showing individual sales reps' contribution to total sales for a year. It helps you analyze both overall and individual trend information.

STACKED BAR CHART

• A stacked area chart is the extension of a basic area chart. It displays the evolution of the value of several groups on the same graphic. The values of each group are displayed on top of each other, what allows to check on the same figure the evolution of both the total of a numeric variable, and the importance of each group.

http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-

stacked-bar-plots.html

0

10000

20000

30000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

country

Japan

Thailand

BASIC PLOT • Mekko Chart

• Also known as a marimekko chart, this type of graph can compare values, measure each one's composition, and show how your data is distributed across each one.

• It's similar to a stacked bar, except the mekko's x-axis is used to capture another dimension of your values -- rather than time progression, like column charts often do. In the graphic below, the x-axis compares each city to one another.

PIE CHART

• A pie chart shows a static number and how categories represent part of a whole -- the composition of something. A pie chart represents numbers in percentages, and the total sum of all segments needs to equal 100%.

25

50

75

0/100

value

x

group

Child

Female

Male

SCATTER PLOT CHART

• A scatter plot or scattergramchart will show the relationship between two different variables or it can reveal the distribution trends. It should be used when there are many different data points, and you want to highlight similarities in the data set. This is useful when looking for outliers or for understanding the distribution of your data.

http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-

start-guide-r-software-and-data-visualization

thai

thai

thai

thai

thai

thai

thai

thai

thai thai

2000

4000

6000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

BUBBLE CHART

• A bubble chart is similar to a scatter plot in that it can show distribution or relationship. There is a third data set, which is indicated by the size of the bubble or circle.

https://www.r-graph-gallery.com/320-the-

basis-of-bubble-plot.html

BASIC PLOT

https://www.data-to-viz.com/graph/stackedarea.html

A waterfall chart should be used to show how an initial value is affected by intermediate values -- either positive or negative -- and resulted in a final value. This should be used to reveal the composition of a number. An example of this would be to showcase how overall company revenue is influenced by different departments and leads to a specific profit number.

FUNNEL CHART

• A funnel chart shows a series of steps and the completion rate for each step. This can be used to track the sales process or the conversion rate across a series of pages or steps.

HEAT MAP

GDP

pop

Life

GDP pop Life

Var1

Var2

0.00

0.25

0.50

0.75

1.00value

A heat map shows the relationship between two items and provides rating information, such as high to low or poor to excellent. The rating information is displayed using varying colors or saturation.

LET PRACTICE

REQUIRED PACKAGES:

# ggplot2install.packages("")

library(ggplot2)

library(dplyr)

library(gapminder)

library(latticeExtra)

COLUMN CHART

# Use the example data from gapminder

data = gapminder

# Prepare data

newdata=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

# Run plot

ggplot(data=newdata, aes(x=year, y=gdpPercap)) +

geom_bar(stat="identity", fill="steelblue")

0e+00

1e+05

2e+05

3e+05

1960 1970 1980 1990 2000

year

GD

P

BAR GRAPH

1960

1970

1980

1990

2000

0e+00 1e+05 2e+05 3e+05

GDP

year

1960

1970

1980

1990

2000

year

ggplot(data=newdata, aes(x=year, y=gdpPercap,fill=year)) +geom_bar(stat="identity",color="blue")+theme_minimal()+ coord_flip()

LINE GRAPH

ggplot(data=newdata, aes(x=year, y=gdpPercap)) +geom_line(linetype = "dashed",color="red", size=3)+geom_point(color="red", size=3)+labs(title="Thailand GDP",x ="year", y = "Value")

2000

4000

6000

1960 1970 1980 1990 2000

year

Valu

e

Thailand GDP

DUAL AXIS CHART

library(latticeExtra)

obj1 =xyplot(gdpPercap~year,newdata1, type = "l", col=c("steelblue") , lwd=2)obj2 =xyplot(gdpPercap~year,newdata2, type = "l", col=c( "#69b3a2") , lwd=2)# Make the plot with second y axis:doubleYScale(obj1, obj2, add.ylab2 = TRUE, use.style=FALSE )

year

gd

pP

erc

ap

1960 1970 1980 1990 2000

1000

2000

3000

4000

5000

6000

5000

10000

15000

20000

25000

gd

pP

erc

ap

AREA CHART

ggplot(newdata, aes(x=year, y=gdpPercap)) +geom_area( fill="blue", alpha=0.4) +geom_line(color="red", size=2) +geom_point(size=3, color="green") +ggtitle("Thailand GDP")

STACKED BAR CHART

newdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))

alldata=data.frame(rbind(newdata1,newdata2))

ggplot(aes(y = gdpPercap, x = year), data = alldata)+

geom_col(aes(fill = country ), width = 0.7) 0

10000

20000

30000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

country

Japan

Thailand

PIE CHART

df <- data.frame(

group = c("Male", "Female", "Child"),

value = c(25, 25, 50)

)

ggplot(df, aes(x="", y=value, fill=group))+

geom_bar(width = 1, stat = "identity")+

coord_polar("y", start=0)+

scale_fill_manual(values=c("red", "blue", "green"))

25

50

75

0/100

value

x

group

Child

Female

Male

SCATTER PLOT CHART

ggplot(newdata, aes(x= year, y=gdpPercap)) +

geom_point(size=2, shape=23, color="red")+

geom_text(label="thai")

thai

thai

thai

thai

thai

thai

thai

thai

thai thai

2000

4000

6000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

BUBBLE CHART

ggplot() +

geom_point(data=data.frame(newdata), aes(x=gdpPercap, y=lifeExp, size=pop, color=continent), alpha=0.5) +

scale_size(range = c(.1, 24), name="Population (M)")

BUBBLE CHARTnewdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))

alldata=data.frame(rbind(newdata1,newdata2))

ggplot() +

geom_point(data=data.frame(alldata), aes(x=gdpPercap, y=lifeExp, size=pop, color= country), alpha=0.5) +

scale_size(range = c(.1, 24), name="Population (M)")

HEAT MAP

GDP

pop

Life

GDP pop Life

Var1

Var2

0.00

0.25

0.50

0.75

1.00value

library(reshape2)Life=newdata$lifeExpGDP =newdata$gdpPercappop =newdata$pop

mydata=cbind(GDP,pop,Life)cormat <- round(cor(mydata),2)head(cormat)melted_cormat <- melt(cormat)

ggplot() + geom_tile(data = melted_cormat, aes(x=Var1, y=Var2,

fill=value))+scale_fill_gradient(low="white", high="red")

SPECIAL PLOT: MAP PLOT

REQUIRED PACKAGES:

# ggplot2install.packages("")

library(ggplot2)

library(ggmap)

library(maps)

library("mapdata")

library("maptools")

library("rworldmap")

library("raster")

library(sp)

PLOT MAP

map('world', 'thai', fill = TRUE, col = "blue")

PLOT MAP

Th <- map_data('world', 'thai')

ggplot() + geom_polygon(data = Th , aes(x=long, y = lat, group = group)) + coord_fixed(1.3)

5

10

15

20

98 100 102 104 106

long

lat

DESIGN MAP# Adding point #https://geocode.xyz/

labs <- data.frame(

long = c(98.979263, 100.523186,99.83594,99.51453 ),

lat = c(18.796143, 13.736717, 19.90728,18.28471),

names = c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),

stringsAsFactors = FALSE

)

### Adding bubble

gg1 +

geom_point(data = labs, aes(x = long, y = lat), color = "black", size = 5) +

geom_point(data = labs, aes(x = long, y = lat), color = "yellow", size = 4)

5

10

15

20

98 100 102 104 106

long

lat

COMBINE MAP AND ECONOMIC DATA## Prepare DATA

mapdf <- data.frame(region= c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),

value=c(1,2,0.5,1),

stringsAsFactors=FALSE)

# Building Thailand geocode

long = c(98.979263, 100.523186,99.83594,99.51453 )

lat = c(18.796143, 13.736717, 19.90728,18.28471)

Country=c("Thailand")

region=c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang")

GDP=c(2,1,0,1.5)

Thaidata=data.frame(long ,lat,Country,region)

# # Building Thailand geocode + GDP

Thaidata1=data.frame(long ,lat,Country,region,GDP)

COMBINE MAP AND ECONOMIC DATA

## Plot GDP Bankok=2 million Baht , Chiang Mai 1 million Baht

gg1 +

geom_polygon(data = Thaidata, aes(x=long, y = lat, group = Country), fill="grey", alpha=0.3) +

geom_point( data=Thaidata1, aes(x=long, y=lat, size=GDP, color=GDP)) +

scale_size_continuous(range=c(1,12))+

scale_color_viridis(trans="log")

5

10

15

20

98 100 102 104 106

long

lat

GDP

0.0

0.5

1.0

1.5

2.0

1.00

1.25

1.50

1.75

GDP

WORLD MAP

worldMap <- getMap()

world.points <- fortify(worldMap)

world.points$region <- world.points$id

world.df <- world.points[,c("long","lat","group", "region")]

worldmap <- ggplot() +

geom_polygon(data = world.df, aes(x = long, y = lat, group = group)) +

scale_y_continuous(breaks = (-2:2) * 30) +

scale_x_continuous(breaks = (-4:4) * 45)

worldmap

-60

-30

0

30

60

-180 -135 -90 -45 0 45 90 135 180

long

lat

QUIZ

REFERENCES

• https://blog.hubspot.com/marketing/types-of-graphs-for-data-visualization

• Teutonico, D. (2015). ggplot2 Essentials. Packt Publishing Ltd.

top related