Data Visualization with R Packages
Data Mining with R Summer School, 07 – 13 September 2015, Muğla, TOVAK International Marmaris Academy
FATMA ÇINAR, MBA, CAPITAL MARKETS BOARD OF TURKEY, e-mail: [email protected] @fatma_cinar_ftm @TRUserGroup
Kutlu MERİH, PhD, e-mail: [email protected] @cortexien https://www.riskonomi.com
Visualization of multidimensional, multifactorial big data. Big data is not merely large data; big data is complex data.
What is big data?
How Big Data Humour is big!
We are training to decipher this complexity through data visualization.
Data visualization packages of R: lattice and ggplot2.
What is data analysis? Why use a programming language? Why use R? Why the lattice package? What is the lattice package's grammar of graphics? Why ggplot2? What is ggplot2's grammar of graphics?
Agenda
• Case study: BRSA NUTS and Sectoral Loans Default Chart of Turkey
• Sectoral Loans Dataset Graphics Data-Mining Analysis
• Action: Real Time Interactive Data Management for Effect and Response Analysis
• Technique: #Lattice and #ggplot2 Graphical Packages using #R Software
library(lattice)
library(ggplot2)
# This example uses the ENGTOVAKLOANS dataset. It is not bundled with ggplot2;
# it is assumed to be already loaded as the data frame `dataset`.
names(dataset)
Wednesday, September 02, 2015
• [1] "NYEAR" "SYEAR" "QUARTERS"
• [4] "CITY" "CITYCODE" "NREGION"
• [7] "REGION" "NUTS3CODE" "NUTS2CODE"
• [10] "NUTS1CODE" "TRNUTS1REGION" "NUTS1REGION"
• [13] "TRGROUP" "SECTORAL" "CASHLOANS"
• [16] "NONCASHLOANS" "TOTALCASHLOANS" "AUTO"
• [19] "MORTGAGE" "OVERDRAFTACCOUNT" "CREDITCARDS"
• [22] "FOOD" "BUILDING" "MINERALS "
• [25] "FINANCIAL" "TEXTILE" "WHOSESALE "
• [28] "TOURISM" "AGRICULTURE" "ENERGY"
• [31] "MARITIME" "OTHERCONSUMER" "DEFRECEIVABLE"
• [34] "DEFCREDITCARDS" "DEFAUTO" "DEFMORTGAGE"
• [37] "DEFOTHERCONSUMER" "DEFFOOD" "DEFBUILDING"
• [40] "DEFMINERALS" "DEFFINANCIAL" "DEFTEXTILE"
• [43] "DEFWHOLESALE " "DEFTOURISM" "DEFAGRICULTURE"
• [46] "DEFENERGY" "DEFMARITIME" "NONCASHFOOD"
• [49] "NONCAHBUILDING" "NONCASHMINERALS" "NONFINANCIAL"
• [52] "NONCASHTEXTILE" "NONCASHWHOLESALE " "NONCASHTOURISM"
• [55] "NONCASHAGRICULTURE" "NONCASHENERGY" "NONCASHMARITIME"
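Several of the column names printed above carry a trailing space (for example "MINERALS " and "WHOSESALE "), which makes them awkward to type in formulas. A minimal sketch of one way to tidy them, using a small hypothetical data frame `toy` in place of the real dataset:

```r
# Hypothetical stand-in whose names mimic the trailing spaces seen above
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)
names(toy) <- c("MINERALS ", "WHOSESALE ", "ENERGY")

# trimws() removes leading and trailing whitespace from each name
names(toy) <- trimws(names(toy))
names(toy)
```

The same call on `names(dataset)` would leave the spelling of the columns untouched and only drop the surrounding whitespace.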
NUTS-1: 12 Regions of Turkey
Mediterranean, Southeast Anatolia, Aegean, Northeast Anatolia, Middle Anatolia, West Black Sea, West Anatolia, East Black Sea, West Marmara, Middle East Anatolia, Istanbul, East Marmara
• NUTS-1: 12 Regions
• NUTS-2: 26 Subregions
• NUTS-3: 81 Provinces
(Nomenclature of Territorial Units for Statistics, NUTS)
Regions, subregions and provinces (each subregion center is marked "(Subregion)"; the provinces listed after it belong to that subregion):
İstanbul Region (1 province): İstanbul (Subregion)
West Marmara Region (5 provinces): Tekirdağ (Subregion), Edirne, Kırklareli; Balıkesir (Subregion), Çanakkale
Aegean Region (8 provinces): İzmir (Subregion); Aydın (Subregion), Denizli, Muğla; Manisa (Subregion), A.Karahisar, Kütahya, Uşak
East Marmara Region (8 provinces): Bursa (Subregion), Eskişehir, Bilecik; Kocaeli (Subregion), Sakarya, Düzce, Bolu, Yalova
West Anatolia Region (3 provinces): Ankara (Subregion); Konya (Subregion), Karaman
Mediterranean Region (8 provinces): Antalya (Subregion), Isparta, Burdur; Adana (Subregion), Mersin; Hatay (Subregion), Kahramanmaraş, Osmaniye
Middle Anatolia Region (8 provinces): Kırıkkale (Subregion), Aksaray, Niğde, Nevşehir, Kırşehir; Kayseri (Subregion), Sivas, Yozgat
West Black Sea Region (10 provinces): Zonguldak (Subregion), Karabük, Bartın; Kastamonu (Subregion), Çankırı, Sinop; Samsun (Subregion), Tokat, Çorum, Amasya
East Black Sea Region (6 provinces): Trabzon (Subregion), Ordu, Giresun, Rize, Artvin, Gümüşhane
Northeast Anatolia Region (7 provinces): Erzurum (Subregion), Erzincan, Bayburt; Ağrı (Subregion), Kars, Iğdır, Ardahan
Middle East Anatolia Region (8 provinces): Malatya (Subregion), Elazığ, Bingöl, Dersim; Van (Subregion), Muş, Bitlis, Hakkari
Southeast Anatolia Region (9 provinces): Gaziantep (Subregion), Adıyaman, Kilis; Şanlıurfa (Subregion), Diyarbakır; Mardin (Subregion), Batman, Şırnak, Siirt
1. Lattice Graphics Packages
How to create basic plots (xyplot scatterplots, histograms, box-and-whisker plots, dotplots and bar charts)
Setting vs. mapping
How to add a group and treat a numerical variable as a factor
Tuesday, May 2, 2023
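1.1. XYPlot Graphic Module
library(lattice)
# Template: p <- xyplot(NUMERIC ~ NUMERIC | FACTOR1, groups = FACTOR2, data = dataset)
# Template: p <- xyplot(NUM ~ NUM | FAC1 + FAC2, groups = FAC3, data = dataset)
# Two numeric variables, conditioned on city and year, grouped by sector
p <- xyplot(DEFENERGY ~ ENERGY | CITY + factor(NYEAR), groups = SECTORAL, data = dataset)
p   # type p and press Enter to draw the plot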
Description of XYPlot Graphs
p<-xyplot(DEFENERGY ~ ENERGY | CITY+factor(NYEAR), group=SECTORAL, data=dataset)
XYPlot graph of the lattice package for 2 numerical variables and 3 factors
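The combination described above (two numerical variables, two conditioning factors and one grouping factor) can be tried without the course data. The sketch below is only an illustration: the data frame `toy`, its sector labels and all of its values are hypothetical, randomly generated stand-ins for the loans dataset.

```r
library(lattice)

# Hypothetical toy data standing in for the loans dataset (random values)
set.seed(1)
toy <- data.frame(
  CITY     = factor(rep(c("IZMIR", "MUGLA"), each = 60)),
  NYEAR    = rep(rep(2013:2014, each = 30), times = 2),
  SECTORAL = factor(rep(c("CONSUMER", "CORPORATE", "SME"), times = 40)),
  ENERGY   = runif(120, 1e3, 1e6)
)
toy$DEFENERGY <- 0.03 * toy$ENERGY * runif(120, 0.5, 1.5)

# One panel per CITY x NYEAR combination; colours distinguish SECTORAL
p <- xyplot(DEFENERGY ~ ENERGY | CITY + factor(NYEAR),
            groups = SECTORAL, data = toy,
            auto.key = list(columns = 3))
print(p)
```

The formula reads "y ~ x | conditioning factors"; everything after the bar defines the panel grid, while `groups` only changes the symbols within each panel.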
p <- xyplot(DEFENERGY ~ ENERGY | SECTORAL + factor(NYEAR), groups = NUTS1REGION, data = dataset)
1.1.1. XYPlot Graphic Module and Legend
library(lattice)
# Template: p <- xyplot(NUMERIC ~ NUMERIC | FACTOR1, groups = FACTOR2, data = dataset)
# Log10-log10 plot without a legend
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL + factor(NYEAR), groups = NUTS1REGION, data = dataset)
p
# auto.key adds a legend for the grouping variable
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL + factor(NYEAR), groups = NUTS1REGION, auto.key = list(border = TRUE), data = dataset)
# Same plot conditioned on sector only and grouped by year
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL, groups = factor(NYEAR), auto.key = list(border = TRUE), data = dataset)
• We begin factor-based analysis with the simplest graphical form, the histogram.
• Histograms of a single numeric variable by one factor are the starting point of factor-based graphical analysis.
1.2. Histogram Graphic Module
Description of Histogram Graphs
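# Histogram of one numeric variable, one panel per sector
p <- histogram(~ log10(DEFENERGY) | SECTORAL, data = dataset)
# Box-and-whisker plots of the same variable by sector (data = dataset is required)
p <- bwplot(SECTORAL ~ log10(DEFENERGY), data = dataset)
p <- bwplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, data = dataset)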
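The histogram and box-and-whisker calls above can be sketched on made-up data as follows. The data frame `toy`, its two sector labels and its log-normal values are assumptions chosen only so that the two sectors look different.

```r
library(lattice)

# Hypothetical toy data: two sectors with different typical default volumes
set.seed(2)
toy <- data.frame(
  SECTORAL  = factor(rep(c("CONSUMER", "CORPORATE"), each = 100)),
  DEFENERGY = c(10^rnorm(100, mean = 3, sd = 0.5),
                10^rnorm(100, mean = 4, sd = 0.5))
)

# One histogram panel per sector, on a log10 scale
h <- histogram(~ log10(DEFENERGY) | SECTORAL, data = toy)
# Box-and-whisker plot of the same variable, one box per sector
b <- bwplot(SECTORAL ~ log10(DEFENERGY), data = toy)

print(h)
print(b)
```

Reading the two plots side by side is a quick check that the factor (here the sector) shifts the distribution, before moving on to plots with more factors.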
1.3. DotPlot Graphic Module
# Template: p <- dotplot(FACTOR1 ~ NUMERIC | FACTOR2, groups = FACTOR3, data = dataset)
# Type p and press Enter to draw each plot. CITY is used as FACTOR1 in the first examples.
p <- dotplot(CITY ~ DEFENERGY | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(CITY ~ log10(DEFENERGY) | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(NUTS1REGION ~ DEFENERGY | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | CITY, groups = factor(NYEAR), data = dataset)
Description of DotPlot Graphs
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, groups = factor(NYEAR), data = dataset)
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, groups = factor(NYEAR), auto.key = list(border = TRUE), data = dataset)
p <- dotplot(SECTORAL ~ DEFENERGY | CITY, groups = factor(NYEAR), data = dataset)
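A dot plot is easiest to read when each factor level carries one value per group. The sketch below therefore averages a hypothetical data frame `toy` by city and year before plotting; the aggregation step, the city list and the random values are assumptions for illustration and are not part of the calls above.

```r
library(lattice)

# Hypothetical toy data: 4 cities, 2 years, 3 observations per combination
set.seed(3)
toy <- data.frame(
  CITY      = factor(rep(c("AYDIN", "DENIZLI", "IZMIR", "MUGLA"), each = 6)),
  NYEAR     = rep(2013:2014, times = 12),
  DEFENERGY = 10^runif(24, 2, 5)
)

# One value per city and year (mean of the toy observations)
agg <- aggregate(DEFENERGY ~ CITY + NYEAR, data = toy, FUN = mean)

# Cities on the vertical axis, one symbol per year, with a bordered legend
d <- dotplot(CITY ~ log10(DEFENERGY), groups = factor(NYEAR),
             data = agg, auto.key = list(border = TRUE))
print(d)
```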
2. Ggplot2 Graphics Packages
How to create basic plots (scatterplots, histograms, balloon, facet, density and violin graphs) using ggplot()
Setting vs. mapping
How to add extra variables with aesthetics (like color, shape, and size) or faceting
https://plot.ly/ggplot2/geom_bar/
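The difference between setting and mapping mentioned above can be sketched in a few lines. The data frame `toy` and the colour "steelblue" are hypothetical choices for illustration: a mapped aesthetic goes inside aes() and varies with a variable, while a set aesthetic is a fixed value passed to the geom.

```r
library(ggplot2)

# Hypothetical toy data (random values)
set.seed(4)
toy <- data.frame(
  ENERGY   = runif(50, 1, 100),
  SECTORAL = factor(rep(c("CONSUMER", "CORPORATE"), times = 25))
)
toy$DEFENERGY <- 0.05 * toy$ENERGY + rnorm(50, sd = 0.5)

# Mapping: colour depends on SECTORAL, so it is declared inside aes()
p_map <- ggplot(toy, aes(ENERGY, DEFENERGY, color = SECTORAL)) + geom_point()

# Setting: every point gets the same colour, so it is given outside aes()
p_set <- ggplot(toy, aes(ENERGY, DEFENERGY)) + geom_point(color = "steelblue")

print(p_map)
print(p_set)
```

Only the mapped version produces a legend, because only there does colour encode data.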
What is ggplot2?
• The grammar of graphics represents an abstraction of graphics ideas/objects
• Think 'verb', 'noun', 'adjective' for graphics
• Allows for a 'theory' of graphics on which to build new graphics and graphics objects
• 'Shorten the distance from mind to page'
Grammar of Graphics?
'In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (color, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.'
Hadley Wickham
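Each part of the quotation corresponds to one term in a ggplot2 expression. The sketch below labels those parts on a hypothetical data frame `toy` (random values, made-up sector labels); it is an illustration of the idea, not a plot from the case study.

```r
library(ggplot2)

# Hypothetical toy data (random values)
set.seed(5)
toy <- data.frame(
  ENERGY   = runif(60, 1, 100),
  SECTORAL = factor(rep(c("CONSUMER", "CORPORATE", "SME"), times = 20))
)
toy$DEFENERGY <- 0.05 * toy$ENERGY + rnorm(60, sd = 0.5)

p <- ggplot(toy) +                                # data
  aes(ENERGY, DEFENERGY, color = SECTORAL) +      # mapping to aesthetic attributes
  geom_point() +                                  # geometric object
  stat_smooth(method = "lm", formula = y ~ x) +   # statistical transformation
  coord_cartesian()                               # coordinate system
print(p)
```

Because every component is a separate object joined with `+`, each one can be stored in a variable and reused, which is the style the modules in this section follow (ds, as, gp, and so on).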
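2.1. Logarithm Module
library(ggplot2)
ds <- ggplot(dataset)
# Template: as <- aes(log10(NUMERIC), log10(NUMERIC), color = FACTOR)
# Either transform inside aes() as in the template, or keep the raw values
# and add log10 scales as done here; applying both would take the logarithm twice.
as <- aes(ENERGY, DEFENERGY, color = SECTORAL)
gp <- geom_point()
lx <- scale_x_log10()
ly <- scale_y_log10()
p <- ds + as + gp + lx + ly
p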
How to add extra variables with aesthetics (like color, shape, and size)
# Template: as <- aes(NUMERIC, NUMERIC, color = FACTOR, shape = factor(NUMERIC), size = NUMERIC)
as <- aes(ENERGY, DEFENERGY, color = NUTS1REGION, shape = factor(NYEAR), size = DEFRECEIVABLE)
gp <- geom_point()
ds <- ggplot(dataset)
p <- ds + as + gp
p   # type p and press Enter
2.1.1. Balloon Graphic Module
Description of Balloon Graphs
Balloon graphs of the ggplot2 package can show 3-dimensional relations distributed according to 1-3 factors in scatterplot form.
With this type of graph, a 2-dimensional numerical relation can be represented under the effect of a 3rd numerical value.
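A minimal sketch of such a balloon graph: two numerical variables on the axes, a third numerical variable mapped to point size, and a factor mapped to colour. The data frame `toy` and its values are hypothetical; only the column names echo the loans dataset.

```r
library(ggplot2)

# Hypothetical toy data (random values)
set.seed(6)
toy <- data.frame(
  ENERGY = 10^runif(80, 3, 6),
  NYEAR  = rep(2013:2014, each = 40)
)
toy$DEFENERGY     <- 0.03 * toy$ENERGY * runif(80, 0.5, 1.5)
toy$DEFRECEIVABLE <- toy$DEFENERGY * runif(80, 2, 10)

# x and y carry the 2-dimensional relation; size carries the 3rd numerical value
p <- ggplot(toy, aes(log10(ENERGY), log10(DEFENERGY),
                     color = factor(NYEAR), size = DEFRECEIVABLE)) +
  geom_point(alpha = 0.6)
print(p)
```

The `alpha` value is set rather than mapped, so that overlapping balloons stay visible.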
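as <- aes(log10(ENERGY), log10(DEFENERGY), color = factor(NYEAR), shape = SECTORAL, size = DEFRECEIVABLE)
# Drop zero values before taking logarithms, then build the plot from the filtered data
dataset <- subset(dataset, ENERGY != 0)
dataset <- subset(dataset, DEFENERGY != 0)
ds <- ggplot(dataset)
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = SECTORAL)
gp <- geom_point()
# Linear fit on the log10-log10 values
ss <- stat_smooth(method = "lm", formula = y ~ x, size = 2)
p <- ds + ae + gp + ss
p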
2.1.2. PowerLaw Graphic Module
Description of Balloon Graphs
# A straight-line fit on log10-log10 values, one line per colour group
ss <- stat_smooth(method = "lm", formula = y ~ x, size = 2)
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = SECTORAL)
p <- ds + ae + gp + ss
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = NUTS1REGION)
p <- ds + ae + gp + ss
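The name "power law" follows from the algebra of the fit: if DEFENERGY = a * ENERGY^b, then log10(DEFENERGY) = log10(a) + b * log10(ENERGY), so a straight line on log10-log10 axes has slope b and intercept log10(a). The sketch below checks this with lm(), the same model that method = "lm" fits; the constants a = 3 and b = 0.8 and the four ENERGY values are hypothetical and noise-free.

```r
# Hypothetical, noise-free power law: DEFENERGY = a * ENERGY^b
ENERGY <- c(10, 100, 1000, 10000)
a <- 3
b <- 0.8
DEFENERGY <- a * ENERGY^b

# Linear regression on the log10-transformed values
fit <- lm(log10(DEFENERGY) ~ log10(ENERGY))

# The intercept estimates log10(a) and the slope estimates b
coef(fit)
```

With real, noisy data the slope of each fitted line can be read as the estimated exponent for that sector or region.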
3. Density Graphic Module
# Template: ad <- aes(NUMERIC, color = FACTOR)
ad <- aes(ENERGY, color = SECTORAL)
# Template: ad <- aes(log10(NUMERIC), fill = FACTOR)
ad <- aes(log10(ENERGY), fill = SECTORAL)
gd <- geom_density()
gd <- geom_density(alpha = 0.5)   # semi-transparent fills
ds <- ggplot(dataset)
p <- ds + ad + gd
p   # type p and press Enter
P.S. A density graph takes one numeric variable.
Description of Density Graphs
• Density graphs are the continuous version of histograms.
• They plot a single numerical variable against its frequency.
• We can detect single or multiple peaks in density graphs and pinpoint the effective factors.
• Superposing density graphs by factor, with different colors, shows the effect of the factors.
• A logarithmic scale leads to more stable density formations for financial data.
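The points above can be tried on made-up data: superposed densities by factor on a log10 scale, followed by a rough location of each peak. The data frame `toy`, its sector labels and the log-normal values are assumptions; locating the peak with base R's density() is one simple option among several.

```r
library(ggplot2)

# Hypothetical toy data: two sectors centred at 10^3 and 10^5
set.seed(7)
toy <- data.frame(
  SECTORAL = factor(rep(c("CONSUMER", "CORPORATE"), each = 200)),
  ENERGY   = c(10^rnorm(200, mean = 3, sd = 0.4),
               10^rnorm(200, mean = 5, sd = 0.4))
)

# Superposed, semi-transparent densities, one colour per sector
p <- ggplot(toy, aes(log10(ENERGY), fill = SECTORAL)) +
  geom_density(alpha = 0.5)
print(p)

# Peak of each sector's density, located with base R's density()
peaks <- sapply(split(log10(toy$ENERGY), toy$SECTORAL), function(v) {
  d <- density(v)
  d$x[which.max(d$y)]
})
peaks
```

On the raw scale the two groups would be squeezed against the left edge of the plot; on the log10 scale the two peaks are clearly separated.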
ad <- aes(log10(ENERGY), fill = SECTORAL)
p <- ds + ad + gd
[Figure: NUTS Aegean Region. Log10 Energy vs. Log10 Default Energy, balloon size Defreceivable, explained by Sectoral and Year factors; density/violin graphics]
3.1. Density Bar Graphic Module
# Template: ad <- aes(NUMERIC, color = FACTOR)
ad <- aes(ENERGY, color = SECTORAL)
# Template: ab <- aes(log10(NUMERIC), fill = FACTOR)
ab <- aes(log10(ENERGY), fill = SECTORAL)
gbd <- geom_bar(position = "dodge")   # bars side by side
gbs <- geom_bar(position = "stack")   # bars stacked
ds <- ggplot(dataset)
p <- ds + ab + gbs
p   # type p and press Enter
p <- ds + ab + gbd
p
4. Facet Graphic Module
# Template: f <- facet_grid(FACTOR ~ NUMERIC)
f <- facet_grid(NUTS1REGION ~ NYEAR)
f <- facet_grid(SECTORAL ~ NYEAR)
f <- facet_grid(NYEAR ~ SECTORAL)   # the layout used in the plots below
ds <- ggplot(dataset)
gv <- geom_violin()
gp <- geom_point()
gd <- geom_density()
p <- ds + as + gp + f
p <- ds + as + gv + f
p <- ds + ad + gd + f   # geom_density() takes one numeric variable, so it uses ad
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL, color = NUTS1REGION)
f <- facet_grid(NYEAR ~ SECTORAL)
# Scales are added one by one; (lx + ly) in parentheses is an error in ggplot2
p <- ds + av + gp + lx + ly + f
# Same plot on linear axes
p <- ds + av + gp + f
Facet graphs of the ggplot2 package can show 3-dimensional graphs distributed according to 3 factors in matrix form.
They let us see in which year, which region and which period anomalies occur.
Here we investigate default energy versus default loans, ballooned by total loans, according to region, year and period factors.
Colors show the period; balloon size shows Total Cash loans.
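The matrix layout described above can be sketched with facet_grid(), which places one factor on the rows and another on the columns while colour carries a third. The data frame `toy`, built with expand.grid(), and all of its values and labels are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: every year x sector x region combination, 5 points each
set.seed(8)
toy <- expand.grid(
  NYEAR       = 2013:2014,
  SECTORAL    = c("CONSUMER", "CORPORATE", "SME"),
  NUTS1REGION = c("AEGEAN", "ISTANBUL"),
  rep         = 1:5
)
toy$ENERGY    <- 10^runif(nrow(toy), 3, 6)
toy$DEFENERGY <- 0.03 * toy$ENERGY * runif(nrow(toy), 0.5, 1.5)

# Rows: year, columns: sector, colour: region, both axes on a log10 scale
p <- ggplot(toy, aes(ENERGY, DEFENERGY, color = NUTS1REGION)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  facet_grid(NYEAR ~ SECTORAL)
print(p)
```

All panels share the same axes by default, so an unusual year or sector stands out as a panel whose points sit away from the others.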
Description of Facet Graphs
4.1. Facet Violin Graphic Module
# Template: f <- facet_grid(FACTOR ~ NUMERIC)
f <- facet_grid(NUTS1REGION ~ NYEAR)
f <- facet_grid(SECTORAL ~ NYEAR)
f <- facet_grid(NYEAR ~ SECTORAL)   # the layout used in the plots below
ds <- ggplot(dataset)
gv <- geom_violin()
gp <- geom_point()
gd <- geom_density()
p <- ds + as + gv + f
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL)
f <- facet_grid(NYEAR ~ SECTORAL)
# Violins on log10 axes
p <- ds + av + gv + lx + ly + f
# Violins on linear axes
p <- ds + av + gv + f
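5. Violin Graphic Module
# Subset first, so that the plot is built from the filtered data
dataset <- subset(dataset, ENERGY != 0)
dataset <- subset(dataset, DEFENERGY != 0)
ds <- ggplot(dataset)
# Check the subset: number of remaining rows
m <- length(dataset[, 1])
m   # type m and press Enter
[1] 3046 ….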
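The zero rows are presumably removed because log10(0) is -Inf, which cannot be placed on a logarithmic axis. The sketch below shows the same filtering and row count on a five-row hypothetical data frame `toy`, with both conditions combined in one subset() call.

```r
# Hypothetical toy data with zeros in either column
toy <- data.frame(
  ENERGY    = c(0, 10, 100, 0, 1000),
  DEFENERGY = c(5, 0, 20, 0, 50)
)

# Keep only rows where both variables are non-zero
clean <- subset(toy, ENERGY != 0 & DEFENERGY != 0)

# Number of remaining rows; nrow() is equivalent to length(clean[, 1])
m <- nrow(clean)
m

# The log10 values of the remaining rows are all finite
log10(clean$ENERGY)
```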
ds <- ggplot(dataset)
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL)
gv <- geom_violin()
gj <- geom_jitter()
p <- ds + av + gv + gj + lx + ly
p   # type p and press Enter
Description of Violin Graphs
• Violin graphs can be seen as two-dimensional density graphs.
• Violin graphs usually come in Mushroom, Potter and Bottle formations.
• Violin graphs are very important for risk analysis of financial data.
• Along the mean of the X-axis, the Y-density curve is drawn together with its mirror copy.
• The Mushroom formation represents a risk concentration on the high-order values of financial data.
• Potter means risk on the medium orders, and Bottle means risk on the lower orders.
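av <- aes(ENERGY, DEFENERGY, fill = NUTS1REGION)
p <- ds + av + gv + gj + ly
# The values are already log10-transformed inside aes(), so lx and ly are not added again
av <- aes(log10(ENERGY), log10(DEFENERGY), fill = NUTS1REGION)
p <- ds + av + gv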
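A minimal violin sketch on made-up data follows. As a simplifying assumption it puts a categorical variable on the x-axis, the most common violin layout, rather than the numeric x used in the calls above; the data frame `toy`, the two region labels and the log-normal values are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: two regions with differently shaped distributions
set.seed(9)
toy <- data.frame(
  NUTS1REGION = factor(rep(c("AEGEAN", "ISTANBUL"), each = 150)),
  DEFENERGY   = c(10^rnorm(150, mean = 3, sd = 0.5),
                  10^rnorm(150, mean = 4, sd = 0.3))
)

# Each violin is the density of DEFENERGY drawn with its mirror copy;
# the jittered points show the underlying observations
p <- ggplot(toy, aes(NUTS1REGION, DEFENERGY, fill = NUTS1REGION)) +
  geom_violin() +
  geom_jitter(width = 0.1, alpha = 0.3) +
  scale_y_log10()
print(p)
```

Where a violin is widest shows where the values concentrate, which is the feature the formation names in the description refer to.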
I would like to express my deep gratitude to Dr. Kutlu MERİH and Dr. C. Coşkun KÜÇÜKÖZMEN for their valuable contributions.
Fatma ÇINAR
Contact
http://www.ieu.edu.tr/tr
http://www.coskunkucukozmen.com
http://www.spk.gov.tr/
http://www.riskonomi.com
@TRUserGroup
@CORTEXIEN
@Riskonometri
@Riskonomi
@datanalitik
@Riskanalitigi
@RiskLabTurkey
@fatma_cinar_ftm
tr.linkedin.com/in/fatmacinar
tr.linkedin.com/pub/kutlu-merih
tr.linkedin.com/in/coskunkucukozmen