02b data structures datasets
Post on 08-Jul-2018
8/19/2019 02b Data Structures Datasets
Data Analysis & Data Science with R
Data structures in R. Built-in Datasets
By Marin Fotache
Al.I. Cuza University of Iași
Faculty of Economics and Business Administration
Department of Accounting, Information Systems and Statistics
Data structures in R
Tutorials (and code) on Data Structures
Data structures (Advanced R by Hadley Wickham)
http://adv-r.had.co.nz/Data-structures.html
1.2 Variables (Variables and Data Structures)
'ttps)**///.youtu#e.com*/atc'0v1D2345f6/
2 - Introduction to R: Atomic Classes
'ttps)**///.youtu#e.com*/atc'0v1-3F8A4av4E
http://repidemiology.wordpress.com/introduction-to-r-code/
1.3 Vectors (Variables and Data Structures)
'ttps)**///.youtu#e.com*/atc'0v19yg:/33$s6
3 - Introduction to R: Vectors
'ttps)**///.youtu#e.com*/atc'0v1;2p'/m
Tutorials on Data Structures (cont.)
1.5 Lists and Data Frames (Variables and Data Structures)
'ttps)**///.youtu#e.com*/atc'0v1Uv#">el7&9
1.4 Logical Vectors and Operators (Variables and Data Structures)
'ttps)**///.youtu#e.com*/atc'0v129#37?-c
4 - Introduction to R: Matrix, List and Data Frame
'ttps)**///.youtu#e.com*/atc'0v1cEi
R script associated with this presentation
02b_data_structures__datasets.R
http://1drv.ms/1sYllLB
Vectors of numbers with sequences
Vectors can also be created with a sequence:
> ten_integers1 <- 5:14
> ten_integers1
 [1]  5  6  7  8  9 10 11 12 13 14
or
> ten_integers2 <- seq(from=5, to=14, by=1)
> ten_integers2
 [1]  5  6  7  8  9 10 11 12 13 14
Declare a vector of descending numbers
> seq(from=5, to=-5, by=-1)
 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5
Combine sequences with the c function:
> a_vector <- c(2:4, 8:14)
> a_vector
 [1]  2  3  4  8  9 10 11 12 13 14
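Both ways of building a sequence print the same values above, but they do not return the same type. A short base-R sketch (the `seq_len()` lines are an addition, not from the slides) contrasting `:` with `seq()`:

```r
# `:` returns an integer vector; seq() with numeric arguments returns doubles
ten_integers1 <- 5:14
ten_integers2 <- seq(from = 5, to = 14, by = 1)

print(class(ten_integers1))                             # "integer"
print(class(ten_integers2))                             # "numeric"
print(identical(ten_integers1, ten_integers2))          # FALSE: same values, different types
print(isTRUE(all.equal(ten_integers1, ten_integers2)))  # TRUE: the values are equal

# seq_len(n) is safer than 1:n when n can be 0
print(seq_len(0))   # integer(0)
print(1:0)          # 1 0 (counts down, usually not what is wanted)
```

The type difference matters later, when `class()` and `typeof()` are used to inspect vectors.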
Vectors containing a range of dates
Generating a vector with dates between September 29th and October 2nd 2014 as "pure" dates
◦ First solution:
> seq(as.Date("2014/09/29"), by = "day", length.out = 4)
◦ Second solution:
> seq(as.Date("2014/09/29"), as.Date("2014/10/02"),
+     "days")
In both cases the result is:
[1] "2014-09-29" "2014-09-30" "2014-10-01" "2014-10-02"
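A base-R sketch checking that the two solutions above produce the same dates, plus a few common operations on a `Date` vector. The display format chosen for `format()` is only an example, and `weekdays()` returns names in the current locale's language, so no output is claimed for it.

```r
d1 <- seq(as.Date("2014/09/29"), by = "day", length.out = 4)
d2 <- seq(as.Date("2014/09/29"), as.Date("2014/10/02"), "days")

print(all(d1 == d2))           # TRUE: both solutions give the same dates
print(as.numeric(diff(d1)))    # 1 1 1: consecutive days
print(format(d1, "%d.%m.%Y"))  # custom display format (day.month.year)
print(weekdays(d1))            # day names, locale-dependent

# Dates are stored as days since 1970-01-01, so arithmetic works directly
print(d1[4] - d1[1])           # a time difference of 3 days
```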
Vectors containing a range of timestamps
Generating a vector with dates between September 29th and October 2nd 2014 as timestamps
◦ First solution
> seq(c(ISOdate(2014,9,29)), by = "DSTday", length.out = 4)
◦ Second solution
> x - ' 0ort('e/(x, y="y", 8engtho9t=),
"?@-?-? ?A")
◦ Third solution
> 1-:%;te(yer=2.14,onth=,y=25,tz="BCT")
> 'e/(0ro=1,y="y",8engtho9t=)
Vectors generated from the normal distribution
Vector object x contains five random values drawn from the standard normal distribution; the values are not ordered
> x <- rnorm(5)
> x
(output: five random values, which differ on every call)
Numbers are drawn randomly, so calling the same function again produces another five numbers:
> x <- rnorm(5)
> x
(output: five other random values)
Vectors created with function rep (repeat)
Vector x.rep contains the sequence of numbers (5, 7, 11) repeated three times
> x.rep <- rep(c(5, 7, 11), 3)
> x.rep
[1]  5  7 11  5  7 11  5  7 11
See the difference with the version which uses the each clause:
> x.rep2 <- rep(c(5, 7, 11), each=2, times=3)
> x.rep2
 [1]  5  5  7  7 11 11  5  5  7  7 11 11  5  5  7  7 11 11
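`rep()` accepts a few more arguments than the two shown on the slide. A base-R sketch reusing the values 5, 7 and 11:

```r
v <- c(5, 7, 11)

print(rep(v, 3))                    # the whole vector three times
print(rep(v, each = 2))             # 5 5 7 7 11 11
print(rep(v, times = c(1, 2, 3)))   # per-element counts: 5 7 7 11 11 11
print(rep(v, length.out = 7))       # recycle until 7 values: 5 7 11 5 7 11 5
print(rep_len(v, 7))                # a faster equivalent of the previous line
```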
Example of built-in (system-defined) vectors
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> state.area
 [1]  51609 589757 113909  53104 ...
Vectors of factors
Factors are nominal variables whose values have a limited number of levels; they are very important in data analysis and visualization. Example with two vectors:
◦ student names
◦ student genders (vector genre)
Both vectors initially contain characters
> names <- c(…
> class(genre)
[1] "character"
Vectors of factors (cont.)
> unclass(genre)
[1] "Female" "Male"   "Female" "Male"   "Female" "Female"
genre can have only two values, so it is converted into a factor
> genre <- as.factor(genre)
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
If a non-existing value is added to vector genre, the vector is automatically converted back into character (the factor's integer codes become strings):
> genre <- c(genre, "Boy")
> class(genre)
[1] "character"
> unclass(genre)
[1] "1"   "2"   "1"   "2"   "1"   "1"   "Boy"
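Because `c(genre, "Boy")` silently turns the factor into character codes, a safer route is to declare the new level first. A base-R sketch; the names `genre2` and `genre3` are illustrative only.

```r
genre <- factor(c("Female", "Male", "Female", "Male", "Female", "Female"))

# Option 1: register the new level, then assign the new value
genre2 <- genre
levels(genre2) <- c(levels(genre2), "Boy")
genre2[length(genre2) + 1] <- "Boy"

# Option 2: go back to character and rebuild the factor (levels are re-sorted)
genre3 <- factor(c(as.character(genre), "Boy"))

print(levels(genre2))   # "Female" "Male" "Boy"
print(levels(genre3))   # "Boy" "Female" "Male"
print(table(genre2))
```

Option 1 keeps the existing level order; option 2 is shorter but lets `factor()` sort the levels alphabetically.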
Functions for getting vector type and length
class returns the class (data type) of the vector; unclass returns the underlying values
> class(ten_integers1)
[1] "integer"
> unclass(ten_integers1)
 [1]  5  6  7  8  9 10 11 12 13 14
Internally, factor levels are stored as integers
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
> typeof(genre)
[1] "integer"
Function length returns the number of elements in a vector
> length(ten_integers1)
[1] 10
Referencing vector elements
First element in vector ten_integers1
> ten_integers1[1]
[1] 5
Last element in vector ten_integers1
> ten_integers1[length(ten_integers1)]
[1] 14
First three elements in vector ten_integers1
> ten_integers1[1:3]
[1] 5 6 7
Last three elements in vector ten_integers1
> ten_integers1[(length(ten_integers1)-2):length(ten_integers1)]
[1] 12 13 14
First, third, fifth and sixth elements
> ten_integers1[c(1, 3, 5, 6)]
[1]  5  7  9 10
Excluding elements from a vector (cont.)
Excluding the first three elements, the 6th element and the 8th element
> ten_integers1[-(c(1:3, 6, 8))]
[1]  8  9 11 13 14
Excluding the first two elements and the last two elements of the vector:
> ten_integers1[-c((1:2), (length(ten_integers1)-1):length(ten_integers1))]
[1]  7  8  9 10 11 12
Vector filtering
Filter vector elements - select only elements greater than 10
> ten_integers1[ten_integers1 > 10]
[1] 11 12 13 14
How many elements are greater than 10
> length(ten_integers1[ten_integers1 > 10])
[1] 4
Display the INDICES of elements greater than 10
> which(ten_integers1 > 10)
[1]  7  8  9 10
Filter vector elements - select only elements greater than 10 - ver. 2
> ind <- which(ten_integers1 > 10)
> ten_integers1[ind]
[1] 11 12 13 14
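Logical indexing and `which()` give the same result above because `ten_integers1` contains no missing values. A base-R sketch with a hypothetical vector containing `NA` shows where the two differ:

```r
x <- c(5, NA, 12, 14)   # hypothetical vector with a missing value

print(x[x > 10])                  # NA 12 14: an NA comparison keeps an NA element
print(x[which(x > 10)])           # 12 14: which() skips NA positions
print(x[!is.na(x) & x > 10])      # 12 14: explicit NA handling
print(sum(x > 10, na.rm = TRUE))  # 2: counting without length()
```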
Sorting/ordering a vector
Initial vector
> names <- c(…
> names <- sort(names)
> names
[1] "ointe : Kico8et" "Benete : #9re8i"":one'c9 M Miore8" "$zr T :on9t"
[5] "
R as a vectorized language
R as a vectorized language (cont.)
Operations can be applied on two or more vectors
> numvec3 <- numvec1 + numvec2
> numvec3
[1] 1.2 1.+ 11. 114 15. 74 14
Compare a vector with a value
> x
(output: five random values; here the 3rd and the 5th are positive)
> x >= 0
[1] FALSE FALSE  TRUE FALSE  TRUE
> x1 <- x >= 0
> x1
[1] FALSE FALSE  TRUE FALSE  TRUE
Testing if at least one of the vector elements fulfills the predicate
> x
(output: the same five values)
> any(x > 0)
[1] TRUE
R as a vectorized language (cont.)
Testing if all the vector elements fulfill the predicate (function all)
> all(x > 0)
[1] FALSE
> all(x > -2.5)
[1] TRUE
For a character vector, display the number of characters for each element
> y
[1] "one"   "two"   "three" "eight"
> nchar(y)
[1] 3 3 5 5
Naming vector elements
Provide a name for each vector element
> num_ro = c(one = "unu", two = "doi", three = "trei", four = "patru")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"
The same result can be accomplished with:
> num_ro = c("unu", "doi", "trei", "patru")
> num_ro
[1] "unu"   "doi"   "trei"  "patru"
> names(num_ro) = c("one", "two", "three", "four")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"
Descriptive statistics on vectors
A vector (age) containing the age of persons (example from Kabacoff)
> age = c(1,3,5,2,11,9,3,9,12,3)
Another vector containing the weight of the above people
> weight = c(4.4,5.3,7.2,5.2,8.5,7.3,6.0,10.4,10.2,6.1)
Suppose the above weights were expressed in pounds (US customary units); convert them from lbs into kg
> weightkg <- weight * 0.454
Compute the mean of people's weight
> mean(weight)
[1] 7.06
Compute the standard deviation of people's weight
> sd(weight)
[1] 2.077498
Compute the correlation between age and weight
> cor(age, weight)
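The statistics above can be bundled into one call. A base-R sketch using the same `age` and `weight` values; `describe()` is a hypothetical helper, not a base R function.

```r
age    <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)
weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)

# Hypothetical helper: common descriptive statistics in one named vector
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    min = min(x), median = median(x), max = max(x))
}

print(round(describe(weight), 3))
print(round(cor(age, weight), 4))           # Pearson correlation (the default method)
print(round(describe(weight * 0.454), 3))   # the same statistics after the lbs-to-kg conversion
```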
Matrices
Two-dimensional arrays where each element has the same type (numeric, character, or logical)
Created with the matrix function. Format:
> mymatrix <- matrix(vector, nrow=number_of_rows,
+     ncol=number_of_columns, byrow=logical_value,
+     dimnames=list(char_vector_rownames, char_vector_colnames))
◦ vector contains the elements for the matrix
◦ nrow and ncol specify the row and column dimensions
◦ dimnames contains optional row and column labels stored in
character vectors
◦ byrow indicates whether the matrix should be filled in by row
(byrow=TRUE) or by column (byrow=FALSE); the default is by
column
Matrices (cont.)
m1 is a 5 x 4 matrix
> m1 <- matrix(1:20, nrow=5, ncol=4)
> m1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
m2 is a 2 x 2 matrix filled by rows
> cells <- c(1,26,24,68)
> rownames <- c("Row1", "Row2")
> colnames <- c("Col1", "Col2")
> m2 <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
+     dimnames=list(rownames, colnames))
Matrices (cont.)
Display m2
> m2
     Col1 Col2
Row1    1   26
Row2   24   68
m3 is a 2 x 2 matrix filled by columns
(list is a data structure presented after the data frame)
> m3 <- matrix(cells, nrow=2, ncol=2,
+     byrow=FALSE,
+     dimnames=list(rownames, colnames))
> m3
     Col1 Col2
Row1    1   24
Row2   26   68
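A base-R sketch confirming the effect of `byrow` on the same four cells. Here the label vectors are named `rnames` and `cnames` (an assumption, chosen so the variables are not confused with the `rownames()` and `colnames()` functions).

```r
cells  <- c(1, 26, 24, 68)
rnames <- c("Row1", "Row2")
cnames <- c("Col1", "Col2")

m2 <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,  dimnames = list(rnames, cnames))
m3 <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE, dimnames = list(rnames, cnames))

print(m2["Row1", "Col2"])   # 26: cells filled row by row
print(m3["Row1", "Col2"])   # 24: cells filled column by column
print(dim(m2))              # 2 2

# For this square case, filling by column gives the transpose of filling by row
print(identical(unname(m3), unname(t(m2))))   # TRUE
```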
Accessing matrix elements
> m1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
Display the 3rd row
> m1[3,]
[1]  3  8 13 18
Display the 3rd column
> m1[,3]
[1] 11 12 13 14 15
Display the element at the intersection of the 2nd row and the 3rd column
> m1[2,3]
[1] 12
Accessing matrix elements (cont.)
Display two elements from the same row: m1[2,3] and m1[2,4]
> m1[2, c(3,4)]
[1] 12 17
Display three elements from the same column: m1[1,2], m1[2,2] and m1[3,2]
> m1[c(1,2,3), 2]
[1] 6 7 8
Display a "submatrix", from m1[2,2] to m1[4,4]
> m1[c(2,3,4), c(2,3,4)]
     [,1] [,2] [,3]
[1,]    7   12   17
[2,]    8   13   18
[3,]    9   14   19
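Selecting a single row or column drops the result to a plain vector. A base-R sketch on the `m1` matrix from the slides showing `drop = FALSE`, which keeps the matrix shape, and logical indexing on a whole matrix:

```r
m1 <- matrix(1:20, nrow = 5, ncol = 4)

r3   <- m1[3, ]                 # dropped to a plain vector
r3_m <- m1[3, , drop = FALSE]   # still a 1 x 4 matrix

print(dim(r3))     # NULL
print(dim(r3_m))   # 1 4

print(m1[m1 > 15])                      # values greater than 15: 16 17 18 19 20
print(which(m1 > 15, arr.ind = TRUE))   # their row/column positions
```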
Basic statistics on a matrix
> m4
     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9
row4   10   11   12
Compute the mean of all the cells in matrix m4
> mean(m4)
[1] 6.5
Compute the mean of all the cells on the third column
> mean(m4[,3])
[1] 7.5
Compute the mean of all the cells on the third row
> mean(m4[3,])
[1] 8
Basic statistics on a matrix (cont.)
Compute the sum of all the cells in matrix m4
> sum(m4)
[1] 78
Compute the sum of all the cells on the third column
> sum(m4[,3])
[1] 30
Compute the sum of all the cells on the third row
> sum(m4[3,])
[1] 24
rowSums/colSums
rowSums calculates the sum of the cells for each row of a matrix
> rowSums(m4)
row1 row2 row3 row4
   6   15   24   33
colSums calculates the sum of the cells for each column of a matrix
> colSums(m4)
col1 col2 col3
  22   26   30
rowMeans/colMeans calculate the mean of every row/column
> rowMeans(m4)
row1 row2 row3 row4
   2    5    8   11
> colMeans(m4)
col1 col2 col3
 5.5  6.5  7.5
Adding total rows and columns to a matrix
> m4
     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9
row4   10   11   12
Add total column
> m4 <- cbind(m4, rowSums(m4))
Setting the name for the total column
> columnnames <- colnames(m4)
> columnnames
[1] "col1" "col2" "col3" ""
> columnnames[length(columnnames)] <-
+     "coltotal"
> colnames(m4) <- columnnames
Adding total rows and columns to a matrix (cont.)
Check the operation
> m4
     col1 col2 col3 coltotal
row1    1    2    3        6
row2    4    5    6       15
row3    7    8    9       24
row4   10   11   12       33
Add total row
> m4 <- rbind(m4, colSums(m4))
Setting the name for the total row
> rownames <- rownames(m4)
> rownames
[1] "row1" "row2" "row3" "row4" ""
> rownames[length(rownames)] <- "rowtotal"
> rownames(m4) <- rownames
Adding total rows and columns to a matrix (cont.)
Check the operation; notice the names of the rows and columns and the
content of the last row and column
> m4
         col1 col2 col3 coltotal
row1        1    2    3        6
row2        4    5    6       15
row3        7    8    9       24
row4       10   11   12       33
rowtotal   22   26   30       78
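The steps above can be wrapped in a single function. A base-R sketch; `add_totals()` is a hypothetical helper, and `m4` is rebuilt from the values printed on the slides.

```r
m4 <- matrix(1:12, nrow = 4, byrow = TRUE,
             dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

# Hypothetical helper: append a total column, then a total row
add_totals <- function(m, row_label = "rowtotal", col_label = "coltotal") {
  m <- cbind(m, rowSums(m))
  colnames(m)[ncol(m)] <- col_label
  m <- rbind(m, colSums(m))
  rownames(m)[nrow(m)] <- row_label
  m
}

m4_tot <- add_totals(m4)
print(m4_tot)
print(m4_tot["rowtotal", "coltotal"])   # 78, the grand total
```

Adding the column first means the total row also sums the total column, so the bottom-right cell holds the grand total.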
Arrays
Similar to matrices but can have more than two dimensions
Elements must be of the same type
Created with the array function:
> myarray <- array(vector, dimensions, dimnames)
◦ vector contains the data for the array
◦ dimensions is a numeric vector giving the maximal
index for each dimension
◦ dimnames is an optional list of dimension labels
Elements in arrays are accessed similarly to
those in matrices
Create and access arrays (cont.)
Display a matrix from
elements of A and B for the first row/column of C
> a1[,,1]
   B1 B2 B3
A1  1  3  5
A2  2  4  6
Display elements of A for the 3rd "row" of B and the 2nd
row/column of C
> a1[,3,2]
A1 A2
11 12
Display a subarray containing all
elements from the first two rows/columns of A, B and C
> a1[c(1,2),c(1,2),c(1,2)]
, , C1
   B1 B2
A1  1  3
A2  2  4
, , C2
   B1 B2
A1  7  9
A2  8 10
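A base-R sketch rebuilding the array used above. The name `a1`, the labels A1-A2, B1-B3, C1-C4 and the data `1:24` are inferred from the printed output and should be treated as assumptions. `apply()` summarises the array along chosen dimensions.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
a1 <- array(1:24, dim = c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

print(dim(a1))                  # 2 3 4
print(a1["A2", "B3", "C2"])     # 12, the same cell as a1[2, 3, 2]
print(apply(a1, 3, sum))        # total of each C slice: 21 57 93 129
print(apply(a1, c(1, 2), sum))  # collapse the third dimension into a 2 x 3 matrix
```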
Data Frames
Most important data structure in R (at least for us)
A data frame is a structure in R that holds data and is similar to the datasets found in
standard statistical packages (for example, SAS, SPSS, and Stata) and databases
The columns are variables and the rows
are observations
Variables can have different types (for
example, numeric, character) in the same data frame
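Create an empty data frame
> student_gi <- data.frame(studentID = numeric(),
+     name = character(), age = numeric(),
+     scholarship = character(),
+     lab_assessment = character(),
+     final_grade = numeric())
> class(student_gi)
[1] "data.frame"
> str(student_gi)
'data.frame':   0 obs. of  6 variables:
 $ studentID     : num
 $ name          : Factor w/ 0 levels:
 $ age           : num
 $ scholarship   : Factor w/ 0 levels:
 $ lab_assessment: Factor w/ 0 levels:
 $ final_grade   : num
(In R 4.0 and later, where stringsAsFactors defaults to FALSE, the character columns are reported as chr instead of Factor.)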
Create a data frame from vectors
Create the vectors
> studentID <- c(1, 2, 3, 4, 5)
> name <- c(…
Useful functions for displaying
Useful functions for displaying some data frame properties
Number of observations (rows)
> nrow(student_gi)
[1] 5
Number of variables (columns)
> ncol(student_gi)
[1] 6
Both the number of observations (rows) and variables (columns)
> dim(student_gi)
[1] 5 6
Display the names of all the variables (columns)
> names(student_gi)
[1] "studentID"      "name"           "age"            "scholarship"
[5] "lab_assessment" "final_grade"
Display the names of the second, third and fourth variables
> names(student_gi[2:4])
[1] "name"        "age"         "scholarship"
Selecting columns (cont.)
Select/display the first two columns (studentID and name) - other solutions
> student_gi[, c("studentID", "name")]
Using a vector for storing the names of the first two columns
> cols <- c("studentID", "name")
> student_gi[cols]
or
> student_gi[, names(student_gi) %in% cols]
Return the "final_grade" variable (column) as a vector
> student_gi$final_grade
[1] .. 45 75 .. +..
or ... see the next slide
Selecting columns (cont.)
Return the "final_grade" variable (column) as a vector (cont.)
> student_gi[ , 6]
or
> student_gi[ , "final_grade"]
Return the "final_grade" variable (column) as a one-column data frame
> student_gi[ , "final_grade", drop=FALSE]
  final_grade
1 ..
2 45
3 75
4 ..
5 +..
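The selection forms above differ in whether they return a vector or a data frame. A base-R sketch summarising them; the small data frame and its values are hypothetical, not the slides' data.

```r
df <- data.frame(studentID   = c(1001, 1002, 1003),
                 name        = c("Ana", "Ion", "Maria"),   # hypothetical names
                 final_grade = c(9, 7.5, 10))               # hypothetical grades

print(class(df$final_grade))                      # "numeric": a vector
print(class(df[, "final_grade"]))                 # "numeric": `[` drops a single column
print(class(df[, "final_grade", drop = FALSE]))   # "data.frame"
print(class(df["final_grade"]))                   # "data.frame": list-style `[`
print(class(df[["final_grade"]]))                 # "numeric": list-style `[[`
```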
Selecting rows
Display the first two observations (rows)
> student_gi[1:2,]
  studentID name age scholarship
1 1
attach function
attach adds the data frame to the R search path
> search()
 [1] ".GlobalEnv"        "tools:rstudio"
 [3] "package:stats"     "package:graphics"
 [5] "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"
 [9] "Autoloads"         "package:base"
When a variable name is encountered, data
frames in the search path are checked in order to locate the variable.
Commands without attach
> student_gi$final_grade
> table(student_gi$lab_assessment, student_gi$final_grade)
> summary(student_gi$final_grade)
attach vs. with
The same commands using attach
> attach(student_gi)
> final_grade
> table(lab_assessment, final_grade)
> summary(final_grade)
> plot(age, final_grade)
detach removes an object from the search path
> detach(student_gi)
It is advisable to use with instead of attach:
> with(student_gi, final_grade)
> with(student_gi, table(lab_assessment, final_grade))
> with(student_gi, plot(lab_assessment,
+     final_grade))
Case (row) identifiers
Act like primary/unique keys in relational tables
Can be specified by the row.names option within the
data.frame function
We allocate new values for studentID (to avoid
confusion with row numbers); the remaining
vectors are identical
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c(…
Case (row) identifiers (cont.)
A (slightly) new version of the data frame:
> student_gi <- data.frame(studentID, name, age,
+     scholarship, lab_assessment,
+     final_grade, row.names = studentID)
studentID is the variable to use in labeling cases
on various printouts and graphics produced with R.
Display the names of the rows (observations)
> rownames(student_gi)
[1] "1001" "1002" "1003" "1004" "1005"
> student_gi
     studentID name age scholarship lab_assessment
1001      1001
Case (row) identifiers (cont.)
Display the names of the rows (observations)
> rownames(student_gi)
[1] "1001" "1002" "1003" "1004" "1005"
Notice the leftmost column of the data frame display
> student_gi
     studentID name age scholarship lab_assessment
1001      1001
Case (row) identifiers (cont.)
Display the observation (row) corresponding to
student Ianos Adriana using her case
identifier ("1002")
> student_gi["1002",]
     studentID name age scholarship lab_assessment
1..2 1..2 :no' N #rin 1 %t9i91 orte ine
     final_grade
1..2 45
Display the observations corresponding to
students Ianos Adriana and Pop Ion using
their case identifiers ("1002" and "1005")
> student_gi[c("1002", "1005"),]
     studentID name age scholarship lab_assessment
1..2 1..2 :no' N #rin 1 %t9i91 orte ine
1005      1005
Factors (reprise)
In a previous presentation, variables were described as
nominal, ordinal, interval, and ratio
Nominal variables are categorical, without an
implied order. Examples: MaritalStatus, Sex, Job,
MasterProgramme
Ordinal variables imply order but not amount.
Examples: Status (poor, improved, excellent), LabAssessment (slab, bine, foarte bine, excelent)
Interval and Ratio variables can take on any
value within some range, and both order and
amount are implied. Examples: LitersPerKm,
Height, Weight, FinalGrade (with decimals)
Categorical (nominal) and ordered categorical
(ordinal) variables are called factors.
Function factor
Factors determine how data will be analyzed and
presented visually
The function factor() stores the categorical
values as a vector of integers in the range [1... N] (where N is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers
Initially vector scholarship is a nominal variable
> scholarship <- c("Social", "Studiu1",
+     "Studiu2", "Merit", "Studiu1")
Now it will be converted into a factor:
> scholarship_f <- factor(scholarship)
> scholarship_f
[1] Social  Studiu1 Studiu2 Merit   Studiu1
Levels: Merit Social Studiu1 Studiu2
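`factor()` sorts levels alphabetically by default. For an ordinal variable such as the lab assessment, the order can be stated explicitly with `levels` and `ordered = TRUE`. A base-R sketch; the five assessment values are inferred from the tables later in this presentation and should be treated as an assumption.

```r
lab_assessment <- c("Bine", "Foarte Bine", "Excelent", "Bine", "Slab")

lab_f <- factor(lab_assessment,
                levels  = c("Slab", "Bine", "Foarte Bine", "Excelent"),
                ordered = TRUE)

print(lab_f)                   # Levels: Slab < Bine < Foarte Bine < Excelent
print(as.integer(lab_f))       # internal codes: 2 3 4 2 1
print(lab_f >= "Foarte Bine")  # comparisons respect the declared order
print(table(lab_f))            # counts listed in level order, not alphabetically
```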
Factors in data frames
Factors in data frames
Re-create the data frame using factors
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c(…
Factors in data frames (cont.)
Another version of the data frame
> student_gi <- data.frame(name, age, scholarship,
+     lab_assessment, final_grade,
+     row.names = studentID)
Display the structure of the data frame
> str(student_gi)
'data.frame':   5 obs. of  5 variables:
 $ name          : Factor w/ 5 levels …: 5 2 3 1 4
 $ age           : num  …
 $ scholarship   : Factor w/ 4 levels "Merit","Social",..: 2 3 4 1 3
 $ lab_assessment: Ord.factor w/ 4 levels "Slab"<"Bine"<..: 2 3 4 2 1
 $ final_grade   : num  …
Factors in data frames (cont.)
Basic statistics about variables in data frame
> '9ry('t9ent&gi) n e ge 'cho8r'hiD Fg : C ri *1 C in *1. C erit *1:no' N #rin *1 1't L 9*21. %oci8 *1Jocz M :o'e0 *1 C ein *22. %t9i91*2
Factors and value labels
> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c(…
Factors and value labels (cont.)
For gender, labels (instead of values) are displayed
> patientdata
  patientID age diabetes status gender
1         1  25    Type1
Matrices and lists
Matrices and lists
The matrix dimension names (dimnames) object is a list
> m3 <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
+     dimnames=list(rownames, colnames))
> m3
     Col1 Col2
Row1    1   24
Row2   26   68
> dimnames(m3)
[[1]]
[1] "Row1" "Row2"
[[2]]
[1] "Col1" "Col2"
> unlist(dimnames(m3))
[1] "Row1" "Row2" "Col1" "Col2"
Creating and displaying simple lists
Create two simple lists
> list1 = list("unu", "doi", "trei")
> list2 = list(c("doi", "trei", "patru"))
Visualizing lists
> list1
[[1]]
[1] "unu"
[[2]]
[1] "doi"
[[3]]
[1] "trei"
> list2
[[1]]
[1] "doi"   "trei"  "patru"
Create a more complex list
list3 contains the two previous lists, a vector (sequence) and a data frame:
> list3 = list(list1, list2, 3:7, patientdata)
> list3
[[1]]
[[1]][[1]]
[1] "unu"
[[1]][[2]]
[1] "doi"
[[1]][[3]]
[1] "trei"
[[2]]
[[2]][[1]]
[1] "doi"   "trei"  "patru"
[[3]]
[1] 3 4 5 6 7
[[4]]
  patientID age diabetes status gender
1         1  25    Type1
reate a !ore co!ple% list (cont.)
Display the structure of list3:
> str(list3)
List of 4
 $ :List of 3
  ..$ : chr "unu"
  ..$ : chr "doi"
  ..$ : chr "trei"
 $ :List of 1
  ..$ : chr [1:3] "doi" "trei" "patru"
 $ : int [1:5] 3 4 5 6 7
 $ :'data.frame':   4 obs. of  5 variables:
  ..$ patientID: num [1:4] 1 2 3 4
  ..$ age      : num [1:4] 25 34 28 52
  ..$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
  ..$ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
  ..$ gender   : Factor w/ 2 levels "male","female": 1 2 2 1
Accessing list components
Display the number of objects in a list
> length(list3)
[1] 4
Access the first object of the list
> list3[[1]]
[[1]]
[1] "unu"
[[2]]
[1] "doi"
[[3]]
[1] "trei"
> class(list3[[1]])
[1] "list"
Accessing list components (cont.)
Access the second component of the list
> list3[[2]]
[[1]]
[1] "doi"   "trei"  "patru"
> class(list3[[2]])
[1] "list"
... and the fourth component
> list3[[4]]
  patientID age diabetes status gender
1         1  25    Type1
List component attributes/names
Function names displays the names of
the components of a list
The first object of list3 is a list whose
components have no names:
> names(list3[[1]])
NULL
The fourth object of list3 is a data frame
called patientdata; this data frame has five
variables (columns) whose names can be
displayed with function names:
> names(list3[[4]])
[1] "patientID" "age"       "diabetes"  "status"
[5] "gender"
Accessing components within components
Display the third object within the first component in list3
> list3[[1]][[3]]
[1] "trei"
Display, in the data frame patientdata (the data frame is the 4th component of the list), the values of column age (this column is the 2nd of the data frame)
> list3[[4]][, 2]
or
> list3[[4]][, "age"]
[1] 25 34 28 52
Display age as a column (not a vector)
> list3[[4]][, "age", drop=FALSE]
  age
1  25
2  34
3  28
4  52
Display the age of the third patient
> list3[[4]][, 2][3]
> list3[[4]][, "age", drop=FALSE]$age[3]
[1] 28
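The difference between single and double brackets is the main source of confusion with lists. A base-R sketch using `list1` and `list2` from the slides (the data frame component is left out); `named_list` is an illustrative name.

```r
list1 <- list("unu", "doi", "trei")
list2 <- list(c("doi", "trei", "patru"))
list3 <- list(list1, list2, 3:7)

print(class(list3[1]))      # "list": single brackets return a sub-list
print(class(list3[[3]]))    # "integer": double brackets return the element itself
print(list3[[3]][2])        # 4, the second value of 3:7
print(list3[[2]][[1]][3])   # "patru", drilling down one level at a time

named_list <- list(nums = 3:7, words = list1)
print(named_list$nums)      # named components can also be reached with $
print(names(named_list))    # "nums" "words"
```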
Uni-dimensional tables
Uni-dimensional tables
Create a table with the frequencies of scholarship in data frame
student_gi
> table1 <- with(student_gi, table(scholarship))
> table1
scholarship
  Merit  Social Studiu1 Studiu2
      1       1       2       1
Display the structure of table1
> str(table1)
 'table' int [1:4(1d)] 1 1 2 1
 - attr(*, "dimnames")=List of 1
  ..$ scholarship: chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
> class(table1)
[1] "table"
Uni-dimensional tables are vectors with labeled elements (each
element's label is a value of the attribute used in function table)
> names(table1)
[1] "Merit"   "Social"  "Studiu1" "Studiu2"
Access/display uni-dimensional tables
table1 is not a data frame, so we cannot qualify the variable using
$ ...
> table1$Merit
Error in table1$Merit : $ operator is invalid for atomic vectors
... but we can access it with vector indices
> table1[1]
Merit
    1
... or list indices
> table1[[1]]
[1] 1
Display both the label and the value of the 3rd element in table table1:
> table1[3]
Studiu1
      2
... or
> unlist(table1)[3]
Studiu1
      2
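A base-R sketch of a few more ways to work with a one-dimensional table; the scholarship values are taken from the factor example earlier in this presentation.

```r
scholarship <- c("Social", "Studiu1", "Studiu2", "Merit", "Studiu1")
table1 <- table(scholarship)

print(as.vector(table1))                      # counts without labels: 1 1 2 1
print(as.vector(table1["Studiu1"]))           # 2: look up one level by name
print(names(table1)[table1 == max(table1)])   # most frequent level(s): "Studiu1"
print(as.data.frame(table1))                  # columns scholarship and Freq
print(prop.table(table1))                     # relative frequencies
```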
Bi-dimensional tables
Bi-dimensional tables
Similar to pivot tables in Excel
Create a contingency (pivot) table with the frequencies of
scholarship by lab_assessment
> table2 <- with(student_gi, table(scholarship, lab_assessment))
> table2
           lab_assessment
scholarship Slab Bine Foarte Bine Excelent
    Merit      0    1           0        0
    Social     0    1           0        0
    Studiu1    1    0           1        0
    Studiu2    0    0           0        1
Structure of table2
> str(table2)
 'table' int [1:4, 1:4] 0 0 1 0 1 1 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ scholarship   : chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
  ..$ lab_assessment: chr [1:4] "Slab" "Bine" "Foarte Bine" "Excelent"
> class(table2)
[1] "table"
Accessing bi-dimensional tables
Any cell can be accessed using the indices of row and column...
> table2[1,2]
[1] 1
... or the names/labels
> table2["Merit", "Bine"]
[1] 1
Display the second column (associated with value Bine of
lab_assessment) as a vector using the index (2)...
> table2[, 2]
  Merit  Social Studiu1 Studiu2
      1       1       0       0
... or the name of the column (Bine)
> table2[, "Bine"]
  Merit  Social Studiu1 Studiu2
      1       1       0       0
Accessin$ "i-di!ensional ta"les (cont.)
Similarly, one can access individual rows (or groups of rows)
Access particular rows and columns in a table
> table2[c("Merit", "Studiu1"), c("Slab", "Excelent")]
           lab_assessment
scholarship Slab Excelent
    Merit      0        0
    Studiu1    1        0
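Marginal totals and proportions are common next steps with a contingency table. A base-R sketch using `addmargins()`, `prop.table()` and `margin.table()`; the two input vectors reproduce the table shown above and are inferred from it.

```r
scholarship <- c("Social", "Studiu1", "Studiu2", "Merit", "Studiu1")
lab_assessment <- factor(c("Bine", "Foarte Bine", "Excelent", "Bine", "Slab"),
                         levels = c("Slab", "Bine", "Foarte Bine", "Excelent"))
table2 <- table(scholarship, lab_assessment)

print(addmargins(table2))              # adds a "Sum" row and a "Sum" column
print(prop.table(table2, margin = 1))  # row proportions: each row sums to 1
print(margin.table(table2, 2))         # column totals per lab_assessment level
```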
Tri-dimensional tables
Create a three-dimensional table with the frequencies of scholarship by
lab_assessment by final_grade
> table3 <- with(student_gi, table(scholarship, lab_assessment, final_grade))
Display table3
> table3
, , final_grade = 6
           lab_assessment
scholarship Slab Bine Foarte Bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    1    0           0        0
    Studiu2    0    0           0        0
, , 0in8&gre =
           lab_assessment
scholarship Slab Bine Foarte Bine Excelent
    Merit      0    1           0        0
    Social     0    1           0        0
    Studiu1    0    0           0        0
    Studiu2    0    0           0        0
Tri-dimensional tables (cont.)
Display table3 (cont.)
, , 0in8&gre = 45
           lab_assessment
scholarship Slab Bine Foarte Bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    0    0           1        0
    Studiu2    0    0           0        0
, , final_grade = 9.75
           lab_assessment
scholarship Slab Bine Foarte Bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    0    0           0        0
    Studiu2    0    0           0        1
ftable
ftable improves the display of three-dimensional tables
> ftable(table3)
0in8&gre + 45 75'cho8r'hiD 8&''e'' ent
Merit   Slab        0 0 0 0
        Bine        0 1 0 0
        Foarte Bine 0 0 0 0
        Excelent    0 0 0 0
Social  Slab        0 0 0 0
        Bine        0 1 0 0
        Foarte Bine 0 0 0 0
        Excelent    0 0 0 0
Studiu1 Slab        1 0 0 0
        Bine        0 0 0 0
        Foarte Bine 0 0 1 0
        Excelent    0 0 0 0
Studiu2 Slab        0 0 0 0
        Bine        0 0 0 0
        Foarte Bine 0 0 0 0
        Excelent    0 0 0 1
Accessing three-dimensional tables
Any cell can be accessed using the indices of the three axes...
> table3[3, 3, 3]
[1] 1
... or the names/labels
> table3["Studiu2", "Excelent", "9.75"]
[1] 1
Display, as a one-dimensional table, the values of
lab_assessment which correspond to value Studiu2 (4th) of
scholarship and the value 9.75 (4th) of final_grade
◦ one can use the indexes ...
> table3[4, , 4]
       Slab        Bine Foarte Bine    Excelent
          0           0           0           1
◦ ... or the labels/names
> table3["Studiu2", , "9.75"]
       Slab        Bine Foarte Bine    Excelent
          0           0           0           1
Accessin$ three di!ensional ta"les (cont.)
Display, as a bi-dimensional table, the values of the first
(scholarship) and the third (final_grade) axes associated with
the 4th value (Excelent) of the second axis (lab_assessment)
◦ one can use the index ...
> table3[, 4, ]
           final_grade
'cho8r'hiD + 45 75 C erit . . . .
    Social  0 0 0 0
    Studiu1 0 0 0 0
    Studiu2 0 0 0 1
◦ ... or the label/name
> table3[, "Excelent", ]
           final_grade
'cho8r'hiD + 45 75
    Merit   0 0 0 0
    Social  0 0 0 0
    Studiu1 0 0 0 0
    Studiu2 0 0 0 1
Built-in datasets
Built-in datasets
Some datasets are available in base (core) R (e.g. faithful)
> head(faithful, 3)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
Most data sets are available in packages (e.g. ggplot2, vcd, ...)
In most cases, data sets are stored as data frames, e.g.
the dataset movies from package ggplot2 (since ggplot2 2.0.0, movies has been moved to the separate package ggplot2movies)
Every package must be installed (once per computer)
> install.packages("ggplot2")
After installation, a package must be loaded (once for
every RStudio session)
> library(ggplot2)
Built-in datasets (cont.)
Display the structure of dataset movies
> str(movies)
'data.frame':   58788 obs. of  24 variables:
 $ title : chr  "$" "$1000 a Touchdown" "$21 a Day Once a Month" "$40,000" ...
 $ year  : int  1971 1939 1941 1996 1975 2000 2002 2002 1987 1917 ...
 $ length: int  121 71 7 70 71 91 93 25 97 61 ...
 $ budget: int  NA NA NA NA NA NA NA NA NA NA ...
 $ rating: num  6.4 6 8.2 8.2 3.4 4.3 5.3 6.7 6.6 6 ...
 $ votes : int  348 20 5 6 17 45 200 24 18 51 ...
 $ r1    : num  4.5 0 0 14.5 24.5 4.5 4.5 4.5 4.5 4.5 ...
 $ r2    : num  4.5 14.5 0 0 4.5 4.5 0 4.5 4.5 0 ...
 $ r3    : num  4.5 4.5 0 0 0 4.5 4.5 4.5 4.5 4.5 ...
Built-in dataset stored as table
Data set HairEyeColor (from base R's datasets package, also used in the vcd tutorial) is stored as a
three-dimensional table (http://cran.us.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf)
> install.packages("vcd")
> library(vcd)
> head(HairEyeColor)
[1] 32 53 10  3 11 50
> str(HairEyeColor)
 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
 - attr(*, "dimnames")=List of 3
  ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
  ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
  ..$ Sex : chr [1:2] "Male" "Female"
> class(HairEyeColor)
[1] "table"
Package datasets
R has a special package called datasets
> library(datasets)
Function data displays the datasets available in the attached packages (including datasets)
> data()
Visualize all the data sets available in all installed packages:
> data(package = .packages(all.available = TRUE))
Display the datasets available in package ggplot2
> try(data(package = "ggplot2"))
...or
> data(package = "ggplot2")$results
A list (made in …) of all datasets in R is available at
http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html
Data structures conversion (cont.)
Data structures conversion (cont.)
Convert matrix m4 into a data frame
> m_to_df1 <- as.data.frame(m4)
> m_to_df1
         col1 col2 col3 coltotal
row1        1    2    3        6
row2        4    5    6       15
row3        7    8    9       24
row4       10   11   12       33
rowtotal   22   26   30       78
> str(m_to_df1)
'data.frame':   5 obs. of  4 variables:
 $ col1    : num  1 4 7 10 22
 $ col2    : num  2 5 8 11 26
 $ col3    : num  3 6 9 12 30
 $ coltotal: num  6 15 24 33 78
Data structures conversion (cont.)
Convert a table into a data frame
> table_to_dataframe = data.frame(unlist(HairEyeColor))
> head(table_to_dataframe, 3)
   Hair   Eye  Sex Freq
1 Black Brown Male   32
2 Brown Brown Male   53
3   Red Brown Male   10
Convert a list into a data frame (list1 has only three elements, so with nrow=132 they are recycled to fill all 132 rows)
> df <- data.frame(matrix(unlist(list1), nrow=132,
+     byrow=T))
> head(df, 3)
  matrix.unlist.list1...nrow...132..byrow...T.
1                                          unu
2                                          doi
3                                         trei
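`as.data.frame()` on a table gives the same long format as above, and `xtabs()` rebuilds the table from it. A base-R sketch using the built-in `HairEyeColor`; the last lines show a list conversion for `list1` that does not depend on a hard-coded `nrow` (the column name `word` is illustrative).

```r
hec_df <- as.data.frame(HairEyeColor)   # one row per cell of the 4 x 4 x 2 table
print(dim(hec_df))          # 32 4
print(names(hec_df))        # "Hair" "Eye" "Sex" "Freq"
print(sum(hec_df$Freq))     # 592, the total count stored in the table

# Back to a contingency table from the long format
hec_back <- xtabs(Freq ~ Hair + Eye + Sex, data = hec_df)
print(all(hec_back == HairEyeColor))   # TRUE

# A list of single values becomes a one-column data frame, one row per element
list1 <- list("unu", "doi", "trei")
words_df <- data.frame(word = unlist(list1))
print(words_df)
```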