package ‘sciencespo’
TRANSCRIPT
Package ‘SciencesPo’August 5, 2016
Type Package
Title A Tool Set for Analyzing Political Behavior Data
Date/Publication 2016-08-05 00:24:29
Version 1.4.1
Date 2016-08-02
Maintainer Daniel Marcelino <[email protected]>
Description Provide functions for analyzing elections and political behaviordata, including measures of political fragmentation, seat apportionment, andsmall data visualization graphs.
URL http://CRAN.R-project.org/package=SciencesPo
License GPL (>= 2)
Encoding UTF-8
Depends R (>= 3.2.0), stats, utils, graphics, grDevices, methods,ggplot2 (>= 2.1.0)
Imports Rcpp (>= 0.12.3), pander (>= 0.6.0), data.table (>= 1.9.4),grid (>= 3.0.0), rstudioapi, htmltools, lubridate, stringr,xtable, shiny
Suggests testthat, devtools, scales, extrafont, gtable, knitr
LinkingTo Rcpp
VignetteBuilder knitr
NeedsCompilation yes
LazyData TRUE
Repository CRAN
BugReports http://github.com/danielmarcelino/SciencesPo/issues
ByteCompile TRUE
RoxygenNote 5.0.1
1
2 R topics documented:
Collate 'Anonymize.R' 'Atkinson.R' 'Bootstrap.R' 'CI.R''CMDcheckVars.R' 'CalibratePlot.R' 'Clear.R' 'Condorcet.R''CountMissing.R' 'CrossValidation.R' 'Crosstable.R' 'DATA.R''DEPRECATED.R' 'Describe.R' 'Dirichlet.R' 'Dummy.R''Frequency.R' 'Geoms.R' 'Gini.R' 'Herfindahl.R''HighestAverages.R' 'Jackknife.R' 'Label.R''LargestRemainders.R' 'Lorenz.R' 'MOMENTS.R' 'MeanFromRange.R''Normalize.R' 'Operators.R' 'Outlier.R' 'Palettes.R''Permutate.R' 'PoliticalDiversity.R' 'Proportionality.R''RcppExports.R' 'Recode.R' 'Rosenbluth.R' 'RunShinyApp.R''ScatterPlot.R' 'SciencesPo.R' 'Stratify.R' 'TESTS.R''Tableplot.R' 'Ternaryplot.R' 'Themes.R' 'Transform.R''UTILS.R' 'Untable.R' 'Voronoi.R' 'Waffleplot.R' 'Winsorize.R''flag.R' 'print.SciencesPo.R' 'zTree.R' 'zzz.R'
Author Daniel Marcelino [aut, cre]
R topics documented:alpha . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Anonymize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Atkinson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6auto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Bartels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8baseball . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10bhodrick93 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10big5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11bollen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13bush . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Calibrateplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14cathedrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15cgreene76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16chismoke . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Clear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Condorcet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19CountMissing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19Crosstable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20CrossValidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21ddirichlet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22Describe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23dHondt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Dummify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25election2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Footnote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Forge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
R topics documented: 3
Formatted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29freq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31galton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32GeomSpotlight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32geom_dumbbell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33geom_lollipop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35Gini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37GiniSimpson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38griliches76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Hamilton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Herfindahl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41HighestAverages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Jackknife . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44JamesStein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Label<- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47LargestRemainders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48Lorenz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50ltaylor96 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51MakeNumeric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51MakeShare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52marriage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52MeanFromRange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53mishkin92 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54nerlove63 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55Normalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55Outlier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57paired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58Palettes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59party_pal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59Permutate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61PoliticalDiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63pollster2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Proportionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66pub_pal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68pub_shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69rdirichlet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70read.zTree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Recode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72religiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73Render . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73Rosenbluth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74RunShinyApp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75scale_color_party . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75scale_color_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 alpha
scale_fill_party . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76scale_fill_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77scale_shape_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Scatterplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78SciencesPo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79SciencesPo-deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79SciencesPoFont . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80sheston91 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82stature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83Stratify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84summary.Crosstable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85swatson93 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85Tableplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86Ternaryplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86theme_blank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87theme_darkside . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88theme_fte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88theme_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89theme_scipo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90titanic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93turnout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93twins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94TwoWay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Untable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96Voronoi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97vote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98Waffleplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98WeightedCorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99WeightedMean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100WeightedStdz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101WeightedVariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101Winsorize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103%nin% . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Index 105
alpha A dataset that contains four test items
Description
A dataset with four test items used in SPSS to to compute Cronbach’s alpha.
• q1:q4 Numeric items of a scale.
Anonymize 5
Usage
data(alpha)
Format
A data.frame object with 4 variables and 60 observations.
Anonymize To Make Anonymous a Data Frame
Description
Replaces factor and character variables by a combination of random sampled letters and numbers.Numeric columns are also transformed, see details.
Usage
Anonymize(.data, col.names = "V", row.names = "")
## Default S3 method:Anonymize(.data, col.names = "V", row.names = "")
Arguments
.data A vector or a data frame.
col.names A string for column names.
row.names A string for rown names, default is empty.
Details
Strings will is replaced by an algorithm with 52/102 chance of choosing a letter and 50/102 chanceof choosing a number; then it joins everything in a 5-digits long character string.
Value
An object of the same type as .data.
Author(s)
Daniel Marcelino, <[email protected]>.
6 Atkinson
Examples
dt <- data.frame(Z = sample(LETTERS,10),X = sample(1:10),Y = sample(c("yes", "no"), 10, replace = TRUE))dt;
Anonymize(dt)
Atkinson Atkinson Index of Inequality
Description
Calculates the Atkinson index A. This inequality measure is especially good at determining whichend of the distribution is contributing most to the observed inequality.
Usage
Atkinson(x, n = rep(1, length(x)), epsilon = NULL, na.rm = FALSE, ...)
## Default S3 method:Atkinson(x, n = rep(1, length(x)), epsilon = NULL,
na.rm = FALSE, ...)
Arguments
x a vector of data values of non-negative elements.
n a vector of frequencies of the same length as x.
epsilon a parameter of the inequality measure (if NULL, the default parameter (0.5) of therespective measure is used).
na.rm logical. Should missing values be removed? Defaults is set to FALSE.
... additional arguements (currently ignored)
Details
epsilon = 0,5: little inequality aversion epsilon = 1,0: medium inequality aversion epsilon = 2,0:great inequality aversion
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
auto 7
See Also
Herfindahl, Rosenbluth, Gini. For more details see the “Indices” vignette.
Examples
if (interactive()) {# generate a vector (of incomes)# y <- c(80, 60, 10, 20, 30)# Entropy 1.392321# Maximum Entropy 1.609438# Normalized Entropy 0.865098# Exponential Index 0.248498# Herfindahl 0.285000# Normalized Herfindahl 0.106250# Gini Coefficient 0.360000# Concentration Coefficient 0.450000
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Atkinson coefficient with epsilon=0.5Atkinson(x, epsilon=0.5)
w <- c(10, 15, 20, 25, 40, 20, 30, 35, 45, 90)}
auto Statistics of 1979 automobile models
Description
The data give the following statistics for 74 automobiles in the 1979 model year as sold in the US.
Usage
data(auto)
Format
A data.frame object with 14 variables and 74 observations.
Model Make and model of car.
Origin a factor with levels A,E,J
Price Price in dollars.
MPG Miles per gallon.
Rep78 Repair record for 1978 on 1 (worst) to 5 (best) scale.
Rep77 Repair record for 1978 on 1 to 5 scale.
8 Bartels
Hroom Headroom in inches.
Rseat Rear seat clearance in inches.
Trunk Trunk volume in cubic feet.
Weight Weight in pounds.
Length Length in inches.
Turn Turning diameter in feet.
Displa Engine displacement in cubic inches.
Gratio Gear ratio for high gear.
Source
This data frame was created from http://euclid.psych.yorku.ca/ftp/sas/sssg/data/auto.sas.
References
Originally published in Chambers, Cleveland, Kleiner, and Tukey, Graphical Methods for DataAnalysis, 1983, pages 352-355.
Bartels Bartels Rank Test of Randomness
Description
Performs Bartels rank test of randomness. The default method for testing the null hypothesis of ran-domness is two.sided. By using the alternative left.sided, the null hypothesis is tested againsta trend. By using the alternative right.sided, the null hypothesis of randomness is tested againsta systematic oscillation in the observed data.
Usage
Bartels(x, alternative = "two.sided", pvalue = "normal")
## Default S3 method:Bartels(x, alternative = "two.sided", pvalue = "normal")
Arguments
x a numeric vector of data values.
alternative a character string for hypothesis testing method; must be one of two.sided(default), left.sided or right.sided.
pvalue A method for asymptotic aproximation used to compute the p-value.
Bartels 9
Details
Missing values are by default removed.
The RVN test statistic is
RV N =
∑n−1i=1 (Ri −Ri+1)
2∑ni=1 (Ri − (n+ 1)/2)
2
where Ri = rank(Xi), i = 1, . . . , n. It is known that (RV N − 2)/σ is asymptotically standardnormal, where σ2 = 4(n−2)(5n2−2n−9)
5n(n+1)(n−1)2 .
Value
statistic The value of the RVN statistic test and the theoretical mean value and varianceof the RVN statistic test.
n the sample size, after the remotion of consecutive duplicate values.
p.value the asymptotic p-value.
method a character string indicating the test performed.
data.name a character string giving the name of the data.
alternative a character string describing the alternative.
References
Bartels, R. (1982). The Rank Version of von Neumann’s Ratio Test for Randomness, Journal of theAmerican Statistical Association, 77(377), 40-46.
Gibbons, J.D. and Chakraborti, S. (2003). Nonparametric Statistical Inference, 4th ed. (pp. 97-98).URL: http://books.google.pt/books?id=dPhtioXwI9cC&lpg=PA97&ots=ZGaQCmuEUq
Examples
# Example 5.1 in Gibbons and Chakraborti (2003), p.98.# Annual data on total number of tourists to the United States for 1970-1982.years <- 1970:1982tourists <- c(12362, 12739, 13057, 13955, 14123, 15698, 17523,18610, 19842, 20310, 22500, 23080, 21916)
# See it graphicallyqplot(factor(years), tourists)+ geom_point()
# Test the null against a trendBartels(tourists, alternative="left.sided", pvalue="beta")
10 bhodrick93
baseball Bradley Efron and Carl Morris ("Stein’s Paradox in Statistics)
Description
A dataset used in Efron and Morris’s 1977 paper, "Stein’s Paradox in Statistics".
• nameFirst and last name of selected player.• atBatsnumber of times at bats.• hitsnumber of hits after 45 at bats.• avg45is the average after 45 at bats.• remainingAtBatsremaining number at bats.• remainingAvgis the remaining average to the end of the season.• seasonAtBatsis the number of times at bats in the end of the season.• seasonHitsis the number of hits in the end of the season.• avgSeasonis the end of the season average.
Usage
data(baseball)
Format
A data.frame object with 9 variables and 18 observations.
bhodrick93 Bekaert’s and Hodrick’s (1993) Data
Description
Data set used by Bekaert and Hodrick (1993) on biases in the measurement of foreign exchangerisk premiums. This dataset contains the following columns:
• date A character vector for date.• jyspot Price of USD in JY, spot.• jyfwd Price of USD in JY, 30-day forward.• jys30 Price of USD in JY, spot market at 30-day forward deliver/date.• dmspot Price of USD in DM, spot.• dmfwd Price of USD in DM, 30-day forward• dms30 Price of USD in DM, spot market at 30-day forward deliver/date.• bpspot Price of USD in BP, spot.• bpfwd Price of USD in BP, 30-day forward.• bps30 Price of USD in BP, spot market at 30-day forward deliver/date.• quote A numeric vector.
big5 11
Usage
data(bhodrick93)
Format
A data.frame object with 11 variables and 778 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Bekaert, G., and Hodrick, R. J. (1993) On biases in the measurement of foreign exchange riskpremiums. Journal of International Money and Finance, 12(2), 115-138.
big5 Big 5 dataset
Description
This is a dataset of the Big 5 personality traits. It is a measurement of the Dutch translation of theNEO-PI-R on 500 first year psychology students (Dolan, Oort, Stoel, Wicherts, 2009). The testconsists of fifty items that you must rate on how true they are about you on a five point scale where1=Disagree, 3=Neutral and 5=Agree. It takes most people 3-8 minutes to complete.
• NeuroticismThe level of neuroticism
• ExtraversionThe level of extraversion
• OpennessThe level of openness
• AgreeablenessThe level of agreeableness
• ConscientiousnessThe level of conscientiousness
Usage
data(big5)
Format
A data.frame object with 5 variables and 500 observations.
12 bollen
References
Hoekstra, H. A., Ormel, J., & De Fruyt, F. (2003). NEO-PI-R/NEO-FFI: Big 5 persoonlijkhei-dsvragenlijst. Handleiding Manual of the Dutch version of the NEO-PI-R/NEO-FFI. Lisse, TheNetherlands: Swets and Zeitlinger.
Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invari-ance in the target rotates multigroup exploratory factor model. Structural Equation Modeling, 16,295<96>314.
bollen Bollen Data on Industrialization and Political Democracy
Description
The data were reported in Bollen (1989, p. 428, Table 9.4) This set includes data from 75 developingcountries each assessed on four measures of democracy measured twice (1960 and 1965), and threemeasures of industrialization measured once (1960).
• y1Freedom of the press, 1960
• y2Freedom of political opposition, 1960
• y3Fairness of elections, 1960
• y4Effectiveness of elected legislature, 1960
• y5Freedom of the press, 1965
• y6Freedom of political opposition, 1965
• y7Fairness of elections, 1965
• y8Effectiveness of elected legislature, 1965
• x1GNP per capita, 1960
• x2Energy consumption per capita, 1960
• x3Percentage of labor force in industry, 1960
@references Bollen, K. A. (1979). Political democracy and the timing of development. AmericanSociological Review, 44, 572-587.
Bollen, K. A. (1980). Issues in the comparative measurement of political democracy. AmericanSociological Review, 45, 370-390.
Usage
data(bollen)
Format
A data.frame object with 11 variables and 75 observations.
Bootstrap 13
Bootstrap Method for Bootstrapping
Description
this method provides bootstrapping statistics.
Usage
Bootstrap(x, ...)
## Default S3 method:Bootstrap(x, nboots = 30, FUN, ...)
## S3 method for class 'model'Bootstrap(x, ...)
Arguments
x is a vector or a fitted model object whose parameters will be used to producebootstrapped statistics. Model objects are assumed to be of class “glm” or “lm”.
nboots an integer for the number of reiteration.
FUN a statistic function name to bootstrap, i.e., ’mean’, ’var’, ’cov’, etc.
... further arguments passed to or used by other methods.
Value
A list with “alpha” and “beta” slots. The “alpha” corresponds to ancillary parameters and “beta” tosystematic components of the model.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x = runif(10, 0, 1)Bootstrap(x, FUN=mean)
14 Calibrateplot
bush Approval Ratings for President George W. Bush
Description
Approval ratings for George W. Bush.
Usage
data(bush)
Format
A data.frame object with 5 variables and 270 observations.
• start.date. Start date of the survey.• end.date. End date of the survey.• approve. Percent which approve of the president.• disapprove. Percent which disapprove of the president.• undecided. Percent undecided about the president.
Details
A data set of approval ratings of George Bush over the time of his presidency, as reported by severalagencies. Most polls were of size approximately 1,000 so the margin of error is about 3 percentagepoints. @source http://www.pollingreport.com/BushJob.htm
Calibrateplot Calibrate Plot
Description
Plots a calibration plot for overlaping observed versus actual values.
Usage
Calibrateplot(.data = NULL, x, y, ci = 0.95,xlab = "Forecast Probability", ylab = "Observed Frequency")
Arguments
.data the data frame.x, y are the forecast and observed data vectors.ci is the size of the confidence intervalxlab a character name for horizontal axis.ylab a character name for vertical axis.
cathedrals 15
Author(s)
Daniel Marcelino
Examples
data <- data.frame(x=c(10,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100),y=c(0,05,10,20,25,30,40,50,60,70,75,80,95,100,100,100,100,100))
Calibrateplot(data, x, y)
cathedrals Cathedrals
Description
Heights and lengths of Gothic and Romanesque cathedrals. This dataset contains the followingcolumns:
• Type Romanesque or Gothic.
• Height Total height, feet.
• Length Total length, feet.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(cathedrals)
Format
A data.frame object with 3 variables and 25 observations.
16 cgreene76
cgreene76 Christensen’s and Greene’s (1976) Data
Description
Data set used by Christensen and Greene (1976) on economies of scale in US electric power gener-ation. This dataset contains the following columns:
• firmid Observation id.
• costs Costs in 1970, MM USD.
• output Output, million KwH.
• plabor Price of labor.
• pkap Price of capital.
• pfuel Price of fuel.
• labshr Labor’s cost share.
• kapshr Capital’s cost share.
Usage
data(cgreene76)
Format
A data.frame object with 8 variables and 99 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Christensen, L. R., and Greene, W. H. (1976) Economies of scale in US electric power generation.The Journal of Political Economy, 655–676.
chismoke 17
chismoke Smoking and Lung Cancer in China
Description
The Chinese smoking and lung cancer data (Agresti, Table 5.12, p. 60).
• Citycity name.
• Smokerif smoker or not.
• Cancerif detected cancer or not.
• Countthe frequencies.
@references Agresti, A. An introduction to categorical Data Analysis, (2nd ed.). Wiley.
Usage
data(chismoke)
Format
A data.frame object with 4 variables and 32 observations.
CI An S4 Class to Confidence Intervals
Description
Calculates the confidence intervals for a vector of data values.
Usage
CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)
CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)
Arguments
x A vector of data values.
level The confidence level. Default is 0.95.
alpha The significance level. Default is 1-level. If alpha equals 0.05, then yourconfidence level is 0.95.
na.rm A logical value, default is FALSE
... Additional arguements (currently ignored)
18 Clear
Value
CI lower
Est. Mean Mean of data.
CI upper Upper bound of interval.
Std. Error Standard Error of the mean.
Slots
lower Lower bound of interval.
mean Estimated mean.
upper Upper bound of interval.
stderr Standard Error of the mean.
Author(s)
Daniel Marcelino, <[email protected]>.
Examples
x <- c(1, 2.3, 2, 3, 4, 8, 12, 43, -1,-4)
CI(x, level=.90)
Clear Clear Memory of All Objects
Description
This function is a wrapper for the command rm(list=ls()).
Usage
Clear(what = NULL, keep = TRUE)
Arguments
what The object (as a string) that needs to be removed (or kept)
keep Should obj be kept (i.e., everything but obj removed)? Or dropped?
Author(s)
Daniel Marcelino, <[email protected]>.
Condorcet 19
Examples
# create objectsa=1; b=2; c=3; d=4; e=5# remove dClear("d", keep=FALSE)ls()# remove all but a and bClear(c("a", "b"), keep=TRUE)ls()
Condorcet Condorcet Voting
Description
Condorcet Voting.
Usage
Condorcet()
CountMissing Count missing data.
Description
Function to count missing data.
Usage
CountMissing(.data, vars = NULL, prop = TRUE, exclude.complete = TRUE)
Arguments
.data the data frame.
vars vector with name of variables.
prop logical. If this is TRUE will show share of missing data by variable.exclude.complete
Logical. Exclude complete variables from the output.
Examples
CountMissing(marriage, prop = FALSE)
20 Crosstable
Crosstable Cross-tabulation
Description
Crosstable produces all possible two-way and three-way tabulations of variables.
Usage
Crosstable(..., digits = 2, chisq = FALSE, fisher = FALSE,mcnemar = FALSE, latex = FALSE, alt = "two.sided", deparse.level = 2)
## Default S3 method:Crosstable(..., digits = 2, chisq = FALSE,fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,deparse.level = 2)
Arguments
digits number of digits after the decimal point.
chisq if TRUE, the results of a chi-square test will be shown below the table.
fisher if TRUE, the results of a Fisher Exact test will be shown below the table.
mcnemar if TRUE, the results of a McNemar test will be shown below the table.
latex if the output is to be printed in latex format.
alt the alternative hypothesis and must be one of "two.sided", "greater" or "less".Only used in the 2 by 2 case.
deparse.level an integer controlling the construction of labels in the case of non-matrix-likearguments. If 0, middle 2 rownames, if 1, 3 rownames, if 2, 4 rownames (de-fault).
... the variables for the cross tabulation.
Value
A cross-tabulated object.
See Also
xtabs, Frequency, table, prop.table
Other Crosstables: summary.Crosstable
CrossValidation 21
Examples
with(titanic, Crosstable( SEX, AGE, SURVIVED))
#' # Agresti (2002), table 3.10, p. 106# 1992 General Social Survey--Race and Party affiliationgss <- data.frame(
expand.grid(Race=c("black", "white"),party=c("dem", "indep", "rep")),count=c(103,341,15,105,11,405))
df <- gss[rep(1:nrow(gss), gss[["count"]]), ]
with(df, Crosstable(Race, party))
# Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinkertea <- data.frame(
expand.grid(poured=c("Yes", "No"),guess=c("Yes", "No")),count=c(3,1,1,3))
# nicer way of recreating long tablesdata = Untable(tea, freq="count")
with(data, Crosstable(poured, guess, fisher=TRUE, alt="greater")) # fisher=TRUE
CrossValidation Cross Validation of SciencesPo Trees and Forests
Description
Cross Validation of SciencesPo Trees and Forests.
Usage
CrossValidation(object, ...)
## S3 method for class 'Tree'CrossValidation(object, FUN, ...)
## S3 method for class 'Forest'CrossValidation(object, FUN, ...)
Arguments
object a fitted Tree or Forest object in the formula.
... additional arguments.
FUN either tree or forest
22 ddirichlet
Value
An object of class CV with elements:
call the function call
Author(s)
Daniel Marcelino, <[email protected]>.
ddirichlet Dirichlet Distribution
Description
Density function and random number generation for the Dirichlet distribution
Usage
ddirichlet(x, alpha, log = FALSE, sum = FALSE)
Arguments
x a matrix containing observations.
alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parametersfor all observations) or a matrix (a different set of parameters for each observa-tion), see “Details”.
log if TRUE, logarithmic densities are returned.
sum if TRUE, the (log-)likelihood is returned.
Value
the ddirichlet returns a vector of densities (if sum = FALSE) or the (log-)likelihood (if sum = TRUE)for the given data and alphas.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
mat <- cbind(1:10, 5, 10:1);mat;x <- rdirichlet(10, mat);
ddirichlet(x, mat);
Describe 23
Describe Summary Description Table
Description
Produce summary description table of a dataframe with the following features: variable names,labels, factor levels, frequencies or summary statistics.
Usage
Describe(.data, style = "multiline", justify = "left", max.number = 10,trim.strings = FALSE, max.width = 15, digits = 2, split.cells = 35,display.labels = FALSE, display.attributes = FALSE, file = NA,append = FALSE, escape.pipe = FALSE, ...)
Arguments
.data a data frame.
style the style to be used in pander table, one of “multiline”, “grid”, “simple”, and“rmarkdown”.
justify pander argument; defaults is justified to “left”.
max.number an integer for the maximum number of items to be displayed in the frequencycell. If variable has more distinct values, no frequency will be shown (only amessage stating the number of distinct values).
trim.strings remove white spaces at the beginning or end of the string. This may impact thefrequencies, so interpret the frequencies cell accordingly. Defaults to FALSE.
max.width Limits the number of characters to display in the frequency tables. Defaults to15.
digits the number of digits for rounding.
split.cells pander argument. Number of characters allowed on a line before splitting thecell. Defaults to 35.
display.labels If TRUE, variable labels will be displayed. Defaults to FALSE.display.attributes
If TRUE, variable attibutes will be displayed. Defaults to FALSE.
file the text file to be written to disk. Defaults to NA.
append When “file” argument is supplied, this indicates whether to append output toexisting file (TRUE) or to overwrite any existing file (FALSE, default). If TRUEand no file exists, a new file will be created.
escape.pipe Only useful when style='grid' and file argument is not NA, in which case itwill escape the pipe character (|) to allow Pandoc to correctly convert multilinecells.
... additional arguments passed to pander.
24 dHondt
Details
The IQR formula used is the R standard IQR(x) = quantile(x, 3/4) - quantile(x, 1/4).The (CV) stands for the coefficient of variation also known as relative standard deviation (RSD),defined as the ratio of the standard deviation to the mean.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
data(religiosity)Describe(religiosity)
dHondt The D’Hondt Method of Allocating Seats Proportionally
Description
The function calculates seats allotment in legislative house, given the total number of seats andthe votes for each party based on the Victor D’Hondt’s method (1878). The D’Hondt’s method ismathematically equivalent to the method proposed by Thomas Jefferson few years before (1792).
Usage
dHondt(parties = NULL, votes = NULL, seats = NULL, ...)
dHondt(parties = NULL, votes = NULL, seats = NULL, ...)
Arguments
parties a vector containig parties labels or candidates accordingly to the votes vectororder.
votes a vector containing the total number of formal votes received by the parties/candidates.
seats an integer for the number of seats to be filled (the district magnitude).
... Additional arguements (currently ignored)
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Note
Adapted from Carlos Bellosta’s replies in the R-list.
Dummify 25
Author(s)
Daniel Marcelino, <[email protected]>.
References
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.
See Also
HighestAverages, LargestRemainders, Hamilton, PoliticalDiversity.
Examples
votes <- sample(1:10000, 5)parties <- sample(LETTERS, 5)
dHondt(parties, votes, seats = 4)
# Example: 2014 Brazilian election for the lower house in# the state of Ceara. Coalitions were leading by the# following parties:
results <- c(DEM=490205, PMDB=1151547, PRB=2449440,PSB=48274, PSTU=54403, PTC=173151)
dHondt(parties=names(results), votes=results, seats=19)
Dummify Generate Dummy Variables in a Data Frame
Description
Quickly create dummies in a data frame.
Usage
Dummify(x, .data = NULL, drop = TRUE)
Arguments
x the column position to generate dummies
.data the a data.frame.
drop A logical value. If TRUE, unused levels will be omitted.
26 election2008
Details
A matrix object
Author(s)
Daniel Marcelino, <[email protected]>
Examples
df <- data.frame(y = rnorm(10), x = runif(10,0,1), sex = sample(1:2, 10, rep=TRUE))
Dummify(df$sex) ;
cbind(df, Dummify(df$sex));
election2008 2008 U.S. presidential election
Description
State-by-state information from the 2008 U.S. presidential election.
Usage
data(election2008)
Format
A data.frame object with 7 variables and 51 observations.
• State Name of the state.
• Abr Abbreviation for the state.
• Income Per capita income in the state as of 2007 (in dollars).
• HS Percentage of adults with at least a high school education.
• BA Percentage of adults with at least a college education.
• Dem.Rep Difference in %Democrat-%Republican (according to 2008 Gallup survey).
• ObamaWin 1= Obama (Democrat) wins state in 2008 or 0=McCain (Republican wins).
@source State income data from: Census Bureau Table 659. Personal Income Per Capita (in 2007).High school data from: U.S. Census Bureau, 1990 Census of Population, http://www.nces.ed.gov/programs/digest/d08/tables/dt08_011.asp. College data from: Census Bureau Table225. Educational Attainment by State (in 2007) Democrat and Republican data from: http://www.gallup.com/poll/114016/state-states-political-party-affiliation.aspx
Flag 27
Flag Add an "id" Variable to a Dataset
Description
Many functions will not work properly if there are duplicated ID variables in a dataset. This functionis a convenience function for .N from the "data.table" package to create an .id. variable that whenused in conjunction with the existing ID variables, should be unique.
Usage
Flag(.data, id.vars = NULL)
Arguments
.data The input data.frame or data.table.
id.vars The variables that should be treated as ID variables. Defaults to NULL, at whichpoint all variables are used to create the new ID variable.
Value
The input dataset (as a data.table) if ID variables are unique, or the input dataset with a newcolumn named ".ID".
Examples
df <- data.frame(A = c("a", "a", "a", "b", "b"),B = c(1, 1, 1, 1, 1), values = 1:5);
df
df = Flag(df, c("A", "B"))
df <- data.frame(A = c("a", "a", "a", "b", "b"),B = c(1, 2, 1, 1, 2), values = 1:5)
df(df <- Flag(df, 1:2) )
Footnote Add Footnote to ggplot2 Objects
Description
Add footnote to ggplot2 objects.
28 Forge
Usage
Footnote(note, x = 0.85, y = 0.014, hjust = 0.5, vjust = 0.5,newlines = TRUE, fontfamily = "serif", fontface = "plain",color = "gray85", size = 9, angle = 0, lineheight = 0.9, alpha = 1)
Arguments
note string or plotmath expression to be drawn.x the x location of the label.y the y location of the label.hjust horizontal justificationvjust vertical justificationnewlines should a new line be appended to the start and end of the string, for spacing?fontfamily the font familyfontface the font face ("plain", "bold", etc.)color text colorsize point size of textangle angle at which text is drawnlineheight line height of textalpha the alpha value of the text
Forge Veritical, left-aligned layout for plots
Description
Left-align the waffle plots by x-axis. Use the pad parameter in waffle to pad each plot to the maxwidth (num of squares), otherwise the plots will be scaled.
Usage
Forge(...)
Arguments
... one or more waffle plots
Examples
parts <- c(80, 30, 20, 10)
w1 <- Waffleplot(parts, rows=8)w2 <- Waffleplot(parts, rows=8)w3 <- Waffleplot(parts, rows=8)chart <- Forge(w1, w2, w3)print(chart)
Formatted 29
Formatted Some Formats for Nicer Display
Description
Some predefined formats for nicer display.
Usage
Formatted(x, style = c("USD", "BRL", "EUR", "Perc"), digits = 2,nsmall = 2, decimal.mark = getOption("OutDec"), flag = "")
Arguments
x a numeric vector.
style a character name for style. One of "USD", "BRL", "EUR", "Perc".
digits an integer for the number of significant digits to be used for numeric and com-plex x
nsmall an integer for the minimum number of digits to the right of the decimal point.
decimal.mark decimal mark style to be used with Percents (%), usually (",") or (".").
flag a character string giving a format modifier as "-", "+", "#".
Examples
x <- as.double(c(0.1, 1, 10, 100, 1000, 10000))Formatted(x)
Formatted(x, "BRL")
Formatted(x, "EUR")
p = c(0.25, 25, 50)
Formatted(p, "Perc", flag="+")
Formatted(p, "Perc", decimal.mark=",")
30 freq
freq Simple Frequency Table
Description
Creates a frequency table or data frame.
Usage
freq(x, weighs = NULL, breaks = graphics::hist(x, plot = FALSE)$breaks,digits = 3, include.lowest = TRUE, order = c("desc", "asc", "level","name"), perc = FALSE, useNA = c("no", "ifany", "always"), ...)
## Default S3 method:freq(x, weighs = NULL, breaks = graphics::hist(x, plot =FALSE)$breaks, digits = 3, include.lowest = TRUE, order = c("desc","asc", "level", "name"), perc = FALSE, useNA = c("no", "ifany", "always"),...)
Arguments
x A vector of values for which the frequency is desired.
weighs A vector of weights.
breaks one of: 1) a vector giving the breakpoints between histogram cells; 2) a functionto compute the vector of breakpoints; 3) a single number giving the number ofcells for the histogram; 4) a character string naming an algorithm to compute thenumber of cells (see ’Details’); 5) a function to compute the number of cells.
digits The number of significant digits required.
include.lowest Logical; if TRUE, an x[i] equal to the breaks value will be included in the first (orlast) category or bin.
order The order method.
perc logical; if TRUE percents are returned.
useNA Logical; if TRUE NA’s values are included.
... Additional arguements (currently ignored)
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Frequency, Crosstable.
Frequency 31
Frequency Frequency Table
Description
Simulating the FREQ procedure of SPSS.
Usage
Frequency(.data, x = NULL, verbose = TRUE, ...)
## Default S3 method:Frequency(.data, x = NULL, verbose = TRUE, ...)
Freq(.data, x = NULL, verbose = TRUE, ...)
Arguments
.data The data.frame.
x A column for which a frequency of values is desired.
verbose A logical value, if TRUE, extra statistics are also provided.
... Additional arguements (currently ignored)
See Also
freq, Crosstable.
Examples
data(cathedrals)
Frequency(cathedrals, Type)
# may work with operators like %>%cathedrals %>% Frequency(Height)
32 GeomSpotlight
galton Galton’s Family Data on Human Stature.
Description
It is a reproduction of the data set used by Galton in his 1885’s paper on correlation between parent’sheight and their children, although the concept of correlation would be only introduced few yearslater, in 1888. This data.frame contains the following columns:
• parent The parents’ average height
• child The child’s height
Usage
data(galton)
Format
A data.frame object with 2 variables and 928 observations.
Details
Regression analysis is the statistical method most often used in political science research. Thereason is that most scholars are interested in identifying “causal” effects from non-experimentaldata and that the linear regression is the method for doing this. The term “regression” (1889) wasfirst crafted by Sir Francis Galton upon investigating the relationship between body size of fathersand sons. Thereby he “invented” regression analysis by estimating: Ss = 85.7 + 0.56SF , meaningthat the size of the son regresses towards the mean.
References
Francis Galton (1886) Regression Towards Mediocrity in Hereditary Stature. The Journal of theAnthropological Institute of Great Britain and Ireland, Vol. 15, pp. 246–263.
GeomSpotlight Automatically enclose points in a polygon
Description
Automatically enclose points in a polygon
Usage
geom_spotlight(mapping = NULL, data = NULL, stat = "identity",position = "identity", na.rm = FALSE, show.legend = NA,inherit.aes = TRUE, ...)
geom_dumbbell 33
Arguments
mapping mapping
data data
stat stat
position position
na.rm na.rm
show.legend show.legend
inherit.aes inherit.aes
... dots
Value
adds a circle around the specified points
Examples
d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100)
gg <- ggplot(d,aes(x,y))gg <- gg + scale_x_continuous(expand=c(0.5,1))gg <- gg + scale_y_continuous(expand=c(0.5,1))gg + geom_spotlight(s_shape=1, expand=0) + geom_point()
gg <- ggplot(mpg, aes(displ, hwy))ss <- subset(mpg,hwy>29 & displ<3)gg + geom_spotlight(data=ss, colour="blue", s_shape=.8, expand=0) +geom_point() + geom_point(data=ss, colour="blue")
geom_dumbbell Dumbell charts
Description
The dumbbell geom is used to create dumbbell charts. Dumbbell dot plots–dot plots with two ormore series of data–are an alternative to the clustered bar chart or slope graph.
Usage
geom_dumbbell(mapping = NULL, data = NULL, ..., point.colour.l = NULL,point.size.l = NULL, point.colour.r = NULL, point.size.r = NULL,na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
34 geom_dumbbell
Arguments
mapping Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE(the default), it is combined with the default mapping at the top level of the plot.You must supply mapping if there is no plot mapping.
data The data to be displayed in this layer. There are three options:If NULL, the default, the data is inherited from the plot data as specified in thecall to ggplot.A data.frame, or other object, will override the plot data. All objects willbe fortified to produce a data frame. See fortify for which variables will becreated.A function will be called with a single argument, the plot data. The returnvalue must be a data.frame., and will be used as the layer data.
... other arguments passed on to layer. These are often aesthetics, used to set anaesthetic to a fixed value, like color = "red" or size = 3. They may also beparameters to the paired geom/stat.
point.colour.l the colour of the left point
point.size.l the size of the left point
point.colour.r the colour of the right point
point.size.r the size of the right point
na.rm If FALSE (the default), removes missing values with a warning. If TRUE silentlyremoves missing values.
show.legend logical. Should this layer be included in the legends? NA, the default, includes ifany aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them.This is most useful for helper functions that define both data and aesthetics andshouldn’t inherit behaviour from the default plot specification, e.g. borders.
Aesthetics
geom_segment understands the following aesthetics (required aesthetics are in bold):
• x
• xend
• y
• yend
• alpha
• colour
• linetype
• size
geom_lollipop 35
Examples
df <- data.frame(trt=LETTERS[1:5],l=c(20, 40, 10, 30, 50),r=c(70, 50, 30, 60, 80))
ggplot(df, aes(y=trt, x=l, xend=r)) + geom_dumbbell()
ggplot(df, aes(y=trt, x=l, xend=r)) +geom_dumbbell(color="#a3c4dc", size=0.75, point.colour.l="#0e668b")
geom_lollipop Lollipop charts
Description
The lollipop geom is used to create lollipop charts. Lollipop charts are the creation of Andy Cot-greave going back to 2011. They are a combination of a thin segment, starting at with a dot at thetop and are a suitable alternative to or replacement for bar charts.
Geom Proto
Usage
geom_lollipop(mapping = NULL, data = NULL, ..., horizontal = FALSE,point.colour = NULL, point.size = NULL, stem.colour = NULL,stem.size = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
Arguments
mapping Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE(the default), it is combined with the default mapping at the top level of the plot.You must supply mapping if there is no plot mapping.
data The data to be displayed in this layer. There are three options:If NULL, the default, the data is inherited from the plot data as specified in thecall to ggplot.A data.frame, or other object, will override the plot data. All objects willbe fortified to produce a data frame. See fortify for which variables will becreated.A function will be called with a single argument, the plot data. The returnvalue must be a data.frame., and will be used as the layer data.
... other arguments passed on to layer. These are often aesthetics, used to set anaesthetic to a fixed value, like color = "red" or size = 3. They may also beparameters to the paired geom/stat.
horizontal If is FALSE (the default), the function will draw the lollipops up from the X axis(i.e. it will set xend to x & yend to 0). If TRUE, it wiill set yend to y & xend to0). Make sure you map the x & y aesthetics accordingly. This parameter helpsavoid the need for coord_flip().
36 geom_lollipop
point.colour the colour of the point
point.size the size of the point
stem.colour the colour of the stem
stem.size the size of the stem
na.rm If FALSE (the default), removes missing values with a warning. If TRUE silentlyremoves missing values.
show.legend logical. Should this layer be included in the legends? NA, the default, includes ifany aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them.This is most useful for helper functions that define both data and aesthetics andshouldn’t inherit behaviour from the default plot specification, e.g. borders.
Details
Use the horizontal parameter to abate the need for coord_flip() (see the Arguments section fordetails).
Aesthetics
geom_point understands the following aesthetics (required aesthetics are in bold):
• x
• y
• alpha
• colour
• fill
• shape
• size
• stroke
Examples
df <- data.frame(trt=LETTERS[1:10],value=seq(100, 10, by=-10))
ggplot(df, aes(trt, value)) + geom_lollipop()
ggplot(df, aes(value, trt)) + geom_lollipop(horizontal=TRUE)
Gini 37
Gini Gini Coefficient G
Description
Computes the unweighted and weighted Gini index of a distribution.
Usage
Gini(x, weights, ...)
## Default S3 method:Gini(x, weights = rep(1, length = length(x)), ...)
Arguments
x a data.frame, a matrix-like, or a vector.
weights a vector containing weights for x.
... additional arguements (currently ignored)
Details
One might say the Gini is oversensitive to changes in the middle, while undersensitive at the ex-tremes. The G coefficient doesn’t capture very explicitly changes in the top 10 inequality researchin the past 10 years – or the bottom 40 most poverty lies. The alternative Palma ratio does.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
GiniSimpson, Lorenz, Herfindahl, Rosenbluth, Atkinson..
Examples
# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)# compute Gini coefficientGini(x)
# For Gini index: Gini(x)*100
Gini(c(100,0,0,0), c(1,33,33,33))
# Considers thisA <- c(20000, 30000, 40000, 50000, 60000)B <- c(9000, 40000, 48000, 48000, 55000)
38 GiniSimpson
Gini(A); Gini(B);
GiniSimpson Gini-Simpson Index
Description
Computes the Gini/Simpson coefficient. NAs from the data are omitted.
Usage
GiniSimpson(x, na.rm = TRUE, ...)
## Default S3 method:GiniSimpson(x, na.rm = TRUE, ...)
Arguments
x a data.frame, a matrix-like, or a vector.na.rm a logical value to deal with NAs.... additional arguements (currently ignored).
Details
The Gini-Simpson quadratic index is a classic measure of diversity, widely used by social scientistsand ecologists. The Gini-Simpson is also known as Gibbs-Martin index in sociology, psychologyand management studies, which in turn is also known as the Blau index. The Gini-Simpson indexis computed as 1− λ = 1−
∑Ri=1 p
2i = 1− 1/2D.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Gini.
Examples
# generate a vector (of incomes)x <- as.table(c(69,50,40,22))# let's say AB coalescedrownames(x) <- c("AB","C","D","E")print(x)
GiniSimpson(x)
griliches76 39
griliches76 Griliches’s (1976) Data
Description
Data set used by Griliches (1976) on wages of very young men. This dataset contains the followingcolumns:
• rns residency in South.• rns80 a numeric vector.• mrt marital status = 1 if married.• mrt80 a numeric vector.• smsa reside metro area = 1 if urban.• smsa80 a numeric vector.• med mother’s education, years.• iq iq score.• kww score on knowledge in world of work test.• year Year.• age a numeric vector.• age80 a numeric vector.• s completed years of schooling.• s80 a numeric vector.• expr experience, years.• expr80 a numeric vector.• tenure tenure, years.• tenure80 a numeric vector.• lw log wage.• lw80 a numeric vector.
Usage
data(griliches76)
Format
A data.frame object with 20 variables and 758 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Griliches, Z. (1976) Wages of very young men. The Journal of Political Economy, 84(4), 69–85.
40 Hamilton
Hamilton The Hamilton Method of Allocating Seats Proportionally
Description
Computes the Alexander Hamilton’s apportionment method (1792), also known as Hare-Niemeyermethod or as Vinton’s method. The Hamilton method is a largest-remainder method which uses theHare Quota.
Usage
Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)
Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)
Arguments
parties A vector containig parties labels or candidates in the same order of votes.
votes A vector with the formal votes received by the parties/candidates.
seats An integer for the number of seats to be returned.
... Additional arguements (currently ignored)
Details
The Hamilton/Vinton Method sets the divisor as the proportion of the total population per houseseat. After each state’s population is divided by the divisor, the whole number of the quotientis kept and the fraction dropped resulting in surplus house seats. Then, the first surplus seat isassigned to the state with the largest fraction after the original division. The next is assigned to thestate with the second-largest fraction and so on.
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.
See Also
dHondt, HighestAverages, LargestRemainders, PoliticalDiversity.
Herfindahl 41
Examples
votes <- sample(1:10000, 5)parties <- sample(LETTERS, 5)Hamilton(parties, votes, seats = 4)
Herfindahl Herfindahl Index of Concentration
Description
Calculates the Herfindahl Index of concentration.
Usage
Herfindahl(x, n = rep(1, length(x)), base = 1, na.rm = FALSE, ...)
## Default S3 method:Herfindahl(x, n = rep(1, length(x)), base = NULL,
na.rm = FALSE, ...)
Arguments
x a vector of data values of non-negative elements.
n a vector of frequencies of the same length as x.
base a parameter of the concentration measure (if NULL, the default parameter (1.0) isused instead).
na.rm a logical. Should missing values be removed? The Default is set to na.rm=FALSE.
... additional arguements (currently ignored)
Details
This index is also known as the Simpson Index in ecology, the Herfindahl-Hirschman Index (HHI)in economics, and as the Effective Number of Parties (ENP) in political science.
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
See Also
Atkinson, Rosenbluth, PoliticalDiversity, Gini. For more details see the Indices vignette:vignette("Indices", package = "SciencesPo")
42 HighestAverages
Examples
# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute the Herfindahl coefficient with base=1Herfindahl(x, base=1)
HighestAverages Highest Averages Methods of Allocating Seats Proportionally
Description
Computes the highest averages method for a variety of formulas of allocating seats proportionally.
Usage
HighestAverages(parties = NULL, votes = NULL, seats = NULL,method = c("dh", "sl", "msl", "danish", "hsl", "hh", "imperiali", "wb","jef", "ad", "hb"), threshold = 0, ...)
## Default S3 method:HighestAverages(parties = NULL, votes = NULL,seats = NULL, method = c("dh", "sl", "msl", "danish", "hsl", "hh","imperiali", "wb", "jef", "ad", "hb"), threshold = 0, ...)
Arguments
parties A character vector for parties labels or candidates in the same order as votes. IfNULL, alphabet will be assigned.
votes A numeric vector for the number of formal votes received by each party or can-didate.
seats The number of seats to be filled (scalar or vector).
method A character name for the method to be used. See details.
threshold A numeric value between (0~1). Default is set to 0.
... Additional arguements (currently ignored)
Details
The following methods are available:
• "dh"d’Hondt method
• "sl"Sainte-Lague method
• "msl"Modified Sainte-Lague method
HighestAverages 43
• "danish"Danish modified Sainte-Lague method
• "hsl"Hungarian modified Sainte-Lague method
• "imperiali"The Italian Imperiali (not to be confused with the Imperiali Quota, which is aLargest remainder method)
• "hh"Huntington-Hill method
• "wb"Webster’s method
• "jef"Jefferson’s method
• "ad"Adams’s method
• "hb"Hagenbach-Bischoff method
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.
See Also
LargestRemainders, Proportionality, PoliticalDiversity. For more details see the Indicesvignette: vignette('Indices', package = 'SciencesPo').
Examples
# Results for the state legislative house of Ceara (2014):votes <- c(187906, 326841, 132531, 981096, 2043217, 15061, 103679,109830, 213988, 67145, 278267)
parties <- c("PCdoB", "PDT", "PEN", "PMDB", "PRB", "PSB", "PSC", "PSTU", "PTdoB", "PTC", "PTN")
HighestAverages(parties, votes, seats = 42, method = "dh")
# Let's create a data.frame with typical election results# with the following parties and votes to return 10 seats:
my_election_data <- data.frame(party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),votes=c(47000, 16000,15900,12000,6000,3100))
HighestAverages(my_election_data$party,my_election_data$votes,
44 Jackknife
seats = 10,method="dh")
# How this compares to the Sainte-Lague Method
(dat= HighestAverages(my_election_data$party,my_election_data$votes,seats = 10,method="sl"))
# Plot it# Barplot(data=dat, "Party", "Seats") +# theme_fte()
Jackknife Resamples Data Using the Jackknife Method
Description
This function is used for estimating standard errors when the distribution is not know.
Usage
Jackknife(x, p)
Arguments
x A vectorp An function name for estimation of parameter as a string
Value
est orignial estimation of parameter
jkest jackknife estimation of parameter
jkvar jackknife estimation of variance
jkbias jackknife estimate of biasness of parameter
jkbiascorr bias corrected parameter estimate
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x = runif(10, 0, 1)mean(x)Jackknife(x,'mean')
JamesStein 45
JamesStein James-Stein Shrunken Estimates
Description
Computes James-Stein shrunken estimates of cell means given a response variable (which may bebinary) and a grouping indicator.
Usage
JamesStein(x, tau2 = 1)
JamesStein(x, tau2 = 1)
Arguments
x a vector, matrix, or array of numerics; the data.
tau2 a positive numeric. The variance, assumed known and defaults to 1. # @paramk the grouping factor.
... extra parameters typically ignored.
Details
Missing values are by default removed.
References
Efron, Bradley and Morris, Carl (1977) “Stein’s Paradox in Statistics.” Scientific American Vol.236 (5): 119-127.
James, W., & Stein, C. (1961, June). Estimation with quadratic loss. In Proceedings of the fourthBerkeley symposium on mathematical statistics and probability (Vol. 1, No. 1961, pp. 361-379).
Kurtosis Compute the Kurtosis
Description
Provides three methods for performing kurtosis test.
Usage
Kurtosis(x, na.rm = FALSE, type = 3)
46 Kurtosis
Arguments
x a numeric vector
na.rm a logical value for na.rm, default is na.rm=FALSE.
type an integer between 1 and 3 selecting one of the algorithms for computing kurto-sis detailed below
Details
Kurtosis measures the peakedness of a data distribution. A distribution with zero kurtosis has ashape as the normal curve (mesokurtic, or mesokurtotic). A positive kurtosis has a curve morepeaked about the mean and the its shape is narrower than the normal curve (leptokurtic, or leptokur-totic). A distribution with negative kurtosis has a curve less peaked about the mean and the its shapeis flatter than the normal curve (platykurtic, or platykurtotic).
Value
An object of the same type as x.
Note
To be consistent with classical use of kurtosis in political science analyses, the default type is thesame equation used in SPSS and SAS, which is the bias-corrected formula: Type 2: G_2 = ((n + 1)g_2+6) * (n-1)/(n-2)(n-3). When you set type to 1, the following equation applies: Type 1: g_2 =m_4/m_2^2-3. When you set type to 3, the following equation applies: Type 3: b_2 = m_4/s^4-3= (g_2+3)(1-1/n)^2-3. You must have at least 4 observations in your vector to apply this function.
Skewness and Kurtosis are functions to measure the third and fourth central moment of a datadistribution.
Author(s)
Daniel Marcelino, <[email protected]>
References
Balanda, K. P. and H. L. MacGillivray. (1988) Kurtosis: A Critical Review. The American Statisti-cian, 42(2), pp. 111–119.
Examples
w<-sample(4,10, TRUE)x <- sample(10, 1000, replace=TRUE, prob=w)
Kurtosis(x, type=1)Kurtosis(x, type=2)Kurtosis(x)
Label 47
Label Get Variable Label
Description
This function returns character value previously stored in variable’s label attribute. If none found,and fallback argument is set to TRUE, the function returns object’s name (retrieved by deparse(substitute(x))),otherwise NA is returned with a warning notice.
Usage
Label(x, fallback = TRUE, simplify = TRUE)
Arguments
x an R object to extract labels from.
fallback a logical indicating if labels should fallback to object name(s). Default is set toTRUE.
simplify a logical indicating if coerce results to a vector, otherwise, a list is returned.Default is set to TRUE.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x <- rnorm(100)Label(x) # returns "x"
Label(twins$IQb) <- "IQ biological scores"
Label(twins, FALSE) # returns NA where no labels are found
Label<- Set Variable Label
Description
This function sets a label to a variable, by storing a character string to its label attribute.
Usage
Label(x) <- value
48 LargestRemainders
Arguments
x a valid variable i.e. a non-NULL atomic vector that has no dimension attribute(see dim for details).
value a character value that is to be set as variable label
See Also
label
Examples
## Not run:Label(religiosity$Religiosity) <- "Religiosity index"
vec <- rnorm(100)Label(vec) <- "random numbers"
## End(Not run)
LargestRemainders Largest Remainders Methods of Allocating Seats Proportionally
Description
Computes the largest remainders method for a variety of formulas of allocating seats proportionally.
Usage
LargestRemainders(parties = NULL, votes = NULL, seats = NULL,method = c("hare", "droop", "hagb", "imperiali", "imperiali.adj"),threshold = 0, ...)
## Default S3 method:LargestRemainders(parties = NULL, votes = NULL,seats = NULL, method = c("hare", "droop", "hagb", "imperiali","imperiali.adj"), threshold = 0, ...)
Arguments
parties A character vector for parties labels or candidates in the order as votes. If NULL,a random combination of letters will be assigned.
votes A numeric vector for the number of formal votes received by each party or can-didate.
seats The number of seats to be filled (scalar or vector).method A character name for the method to be used. See details.threshold A numeric value between (0~1). Default is set to 0.... Additional arguements (currently ignored)
LargestRemainders 49
Details
The following methods are available:
• "droop"Droop quota method
• "hare"Hare method
• "hagb"Hagenbach-Bischoff
• "imperiali"Imperiali quota (do not confuse with the Italian Imperiali, which is a highest aver-ages method)
• "imperiali.adj"Reinforced or adjusted Imperiali quota
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.
See Also
HighestAverages, Proportionality, PoliticalDiversity. For more details see the Indicesvignette: vignette('Indices', package = 'SciencesPo').
Examples
# Let's create a data.frame with typical election results# with the following parties and votes to return 10 seats:
my_election_data <- data.frame(party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),votes=c(47000, 16000,15900,12000,6000,3100))
LargestRemainders(my_election_data$party,my_election_data$votes, seats = 10, method="droop")
with(my_election_data, LargestRemainders(party,votes, seats = 10, method="hare"))
50 Lorenz
Lorenz The Lorenz Curve
Description
Computes the (empirical) ordinary and generalized Lorenz curve of a vector.
Usage
Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)
## Default S3 method:Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)
Arguments
x A vector of non-negative values.
n A vector of frequencies of the same length as x.
plot A logical. If TRUE the Lorenz curve will be plotted.
... Additional arguements (currently ignored)
Details
The Gini coefficient ranges from a minimum value of zero, when all individuals are equal, to atheoretical maximum of one in an infinite population in which every individual except one has asize of zero. It has been shown that the sample Gini coefficients originally defined need to bemultiplied by n/(n-1) in order to become unbiased estimators for the population coefficients.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Gini, GiniSimpson.
Examples
# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)# compute Lorenz valuesLorenz(x)# generate some weights:wgt <- runif(n=length(x))# compute the lorenz with especific weightsLorenz(x, wgt)
ltaylor96 51
ltaylor96 Lothian’s and Taylor’s (1996) Data Set
Description
Data used by Lothian and Taylor (1996) on the real exchange rate behaviour. This dataset containsthe following columns:
• year Year
• spot dollar/sterling exchange rate.
• USwpi U.S. wholesale price index, 1914==100.
• UKwpi U.K. wholesale price index, 1914==100.
Usage
data(ltaylor96)
Format
A data.frame object with 4 variables and 200 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Lothian, J. R., and Taylor, M. P. (1996) Real exchange rate behavior: the recent float from theperspective of the past two centuries. Journal of Political Economy, 488–509.
MakeNumeric Clean up a character vector to make it numeric
Description
Remove commas and whitespace, primarily
Usage
MakeNumeric(x)
Arguments
x character vector to process
52 marriage
Value
numeric vector
MakeShare Clean up a character vector to make it a percent
Description
Remove "
Usage
MakeShare(x)
Arguments
x character vector to process
Value
numeric vector
marriage Same Sex Marriage Public Opinion Data
Description
Data set fielded by the PEW Research Center on same sex marriage support in US. It covers publicopinion on the issue starting from 1996 up to date. This dataset contains the following columns:
• Date The year of the measurement• Oppose Percent opposing same-sex marriage• Favor Percent favoring same-sex marriage• DK Percent of Don’t Know
Usage
data(marriage)
Format
A data.frame object with 4 variables and 18 observations.
References
PEW Research Center. Support for same-sex marriage. http://www.pewresearch.org
MeanFromRange 53
MeanFromRange Estimates Mean and Standard Deviation from Median and Range
Description
When conductig a meta-analysis study, it is not always possible to recover from reports, the meanand standard deviation values, but rather the median and range of values. This function provides amethod for computing the mean and variance estimates from median/range or IQR estimates.
Usage
MeanFromRange(low, med, high, n)
Arguments
low The min of the data.
med The median of the data.
high The max of the data
n The size of the sample.
Note
One should use these calculations ONLY if there are strong hints, that the overall distribution doesnot relevantly deviate from normal distribution. The question is then, why some papers reportmedian and IQR if data are not far from normal distribution.
Author(s)
Daniel Marcelino, <[email protected]>
References
Hozo, Stela P.; et al (2005) Estimating the mean and variance from the median, range, and the sizeof a sample. BMC Medical Research Methodology, 5:13.
Examples
# mean=median and SD = IQR/1.35.# median= 3# Iqr= 2-3# median=mean=3# sd=iqr/1.35 2-3 means 2,5/1.35= sd = 1,85
MeanFromRange(low=5, med=8, high=12, n=10)
54 mishkin92
mishkin92 Mishkin’s (1992) Data
Description
Data from the Frederic S. Mishkin (1992) paper “Is the Fisher Effect for real?”. This dataset con-tains the following columns:
• year Year
• mon a numeric vector
• inf1mo a numeric vector
• inf3mo a numeric vector
• tbill1mo a numeric vector
• tbill3mo a numeric vector
• cpiu a numeric vector
• quote a numeric vector
Usage
data(mishkin92)
Format
A data.frame object with 8 variables and 491 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Mishkin, F. S. (1992) Is the Fisher effect for real?: A reexamination of the relationship betweeninflation and interest rates. Journal of Monetary Economics, 30(2), 195–215.
nerlove63 55
nerlove63 Marc Nerlove’s (1963) data
Description
Data used by Marc Nerlove (1963) on returns of electricity supply. This dataset contains the fol-lowing columns:
• totcost costs in 1970, MM USD.
• output output, billion KwH.
• plabor price of labor.
• pfuel price of fuel.
• pkap price of capital.
#’ @source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University.http://fmwww.bc.edu/ec-p/data/Hayashi/
Usage
data(nerlove63)
Format
A data.frame object with 5 variables and 145 observations.
References
Nerlove, M. (1963) Returns to Scale in Electricity Supply. In Measurement in Economics-Studiesin Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld, edited by Carl F.Christ. Stanford: Stanford.
Normalize Unity-based normalization
Description
Normalizes as feature scaling min - max, or unity-based normalization typically used to bringthe values into the range [0,1]. Other methods are also available, including scoring, centering, andSmithson and Verkuilen (2006) method of dependent variable transformation.
Usage
Normalize(x, method = "range", ...)
56 Normalize
Arguments
x is a vector to be normalized.
method A string for the method used for normalization. Default is method = "range",which coerces values into [0,1] range. See details for other implemented meth-ods.
... Additional arguements (currently ignored).
Details
The following methods are available:
• "range"Ranging is done by coercing x values into [0,1] range. However, this may be general-ized to follow the range of other arbitrary values using a and b:
X ′ = a+(x− xmin)(b− a)
(xmax − xmin)
.
• "scale"Scaling is done by dividing (centered) values of x by their standard deviations.
• "center"Centering is done by subtracting the means (omitting NAs) of x from its observedvalue.
• "z-score"Scoring is done by dividing the values of x from their root mean square.
• "SV"Transform a dependent variable in [0,1] rather than (0, 1) to beta regression as suggestedby Smithson and Verkuilen (2006).
Value
Normalized values in an object of the same class as x.
Author(s)
Daniel Marcelino, <[email protected]>
References
Smithson M, Verkuilen J (2006) A Better Lemon Squeezer? Maximum-Likelihood Regression withBeta-Distributed Dependent Variables. Psychological Methods, 11(1), 54-71.
See Also
scale.
Examples
x <- sample(10)(y = Normalize(x) )
# equivalently to
Outlier 57
(x-min(x))/(max(x)-min(x))
# Smithson and Verkuilen approach(y = Normalize(x, method="SV") )
# look at what happens to the correlation of two independent variables and their "interaction"# With non-centered interactions:a = rnorm(10000,20,2)b = rnorm(10000,10,2)cor(a,b)cor(a,a*b)
# With centered interactions:c = a - 20d = b - 10cor(c,c*d)
Outlier Detect Outliers
Description
Perform exploratory test to detect outliers.
Usage
Outlier(x, index = NULL, ...)
## Default S3 method:Outlier(x, index = NULL, ...)
Arguments
x A numeric object, a vector.
index A numeric value to be considered in the computations.
... Parameters which are typically ignored.
Details
The quantity in min reveals the minimum deviation from the mean, the integer value in the closestindicates the position of that element. The quantity in max is the maximum deviation from themean, and the farthest integer value indicates the position of that value.
Value
Returns the minimum and maximum values, respectively preceded by their positions in the vector,matrix or data.frame.
58 paired
Author(s)
Daniel Marcelino, <[email protected]>
References
Dixon, W.J. (1950) Analysis of extreme values. Ann. Math. Stat. 21(4), 488–506.
See Also
Winsorize for reduce the impact of outliers.
Examples
Outlier(x <- rnorm(20))
#data frame:age <- sample(1:100, 1000, rep=TRUE);Outlier(age)
paired Paired Data
Description
Artificial data for a paired experiment. This dataset contains the following columns:
• patient the patient id.
• before_X before treatment.
• after_Y after treatment.
Usage
data(paired)
Format
A data.frame object with 3 variables and 9 observations.
Palettes 59
Palettes Palette data for the themes used by package
Description
Data used by the palettes in the package.
Usage
Palettes
Format
A list.
party_pal Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
party_pal(palette = "BRA", plot = FALSE, hex = FALSE)
Arguments
palette the palette name, a character string.
plot logical, if TRUE a plot is returned.
hex logical, if FALSE, the associated color name (label) is returned.
See Also
Other color party: scale_color_party
Examples
library(scales)
# Brazilshow_col(party_pal("BRA")(20))
# Argentineshow_col(party_pal("ARG")(12))
60 Permutate
# USshow_col(party_pal("USA")(6))
# Canadashow_col(party_pal("CAN")(10))
party_pal("CAN", plot=TRUE, hex=FALSE)
Permutate Create k random permutations of a vector
Description
Creates a k random permutation of a vector.
Usage
Permutate(x, k)
Arguments
x a vector to be permutated.
k number of permutations to be conducted.
Details
should be used only for length(input)! » k
Author(s)
Daniel Marcelino, <[email protected]>.
Examples
# 5! = 5 x 4 x 3 x 2 x 1 = 120# row wise permutationsPermutate(x=1:5, k=5)
Plotting 61
Plotting Plot Customizing
Description
Several wrapper functions to deal with ggplot2.
Usage
Previewplot()
PreviewTheme()
align_title_left()
align_title_right()
no_title()
no_y_gridlines()
no_x_gridlines()
no_major_x_gridlines()
no_major_y_gridlines()
no_major_gridlines()
no_minor_x_gridlines()
no_minor_y_gridlines()
no_minor_gridlines()
no_gridlines()
rotate_axes_text(angle)
rotate_x_text(angle)
rotate_y_text(angle)
no_axes()
no_x_axis()
62 Plotting
no_y_axis()
no_x_line()
no_y_line()
no_x_text()
no_y_text()
no_ticks()
no_x_ticks()
no_y_ticks()
no_axes_titles()
no_x_title()
no_y_title()
move_legend(position)
legend_bottom()
legend_top()
legend_left()
legend_right()
no_legend()
no_legend_title()
Arguments
angle the angle of rotation.
position the location of the legend (’top’, ’bottom’, ’right’, ’left’ or ’none’).
Author(s)
NA
Examples
Previewplot()
PoliticalDiversity 63
PoliticalDiversity Political Diversity Indices
Description
Computes political diversity indices or fragmentation/concetration measures like the effective num-ber of parties for an electoral unity or across unities. The very intuition of these coefficients is tocounting parties while weighting them by their relative political–or electoral strength.
Usage
PoliticalDiversity(x, index = "laakso/taagepera", margin = 1,base = exp(1))
## Default S3 method:PoliticalDiversity(x, index = "laakso/taagepera",margin = 1, base = exp(1))
Arguments
x A data.frame, a matrix, or a vector containing values for the number of votes orseats each party received.
index The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl","gini", "shannon", "simpson", "invsimpson".
margin The margin for which the index is computed.
base The logarithm base used in some indices, such as the "shannon" index.
Details
Very often, political analysts say things like ‘two-party system’ and ‘multi-party system’ to referto a particular kind of political party system. However, these terms alone does not tell exactlyhow fragmented–or concentrated a party system actually is. For instance, after the 2010 generalelection, 22 parties obtained representation in the Lower Chamber in Brazil. Nonetheless, amongthese 22 parties, nine parties together returned only 28 MPs. Thus, an index to assess the weigh orthe Effective Number of Parties is important and helps to go beyond the simple count of parties ina legislative branch. A widely accepted algorithm was proposed by M. Laakso and R. Taagepera:
N =1∑p2i
, where N denotes the effective number of parties and p_i denotes the ith party’s fraction of theseats.
In fact, this formula may be used to compute the vote share for each party. This formula is thereciprocal of a well-known concentration index (the Herfindahl-Hirschman index) used in eco-nomics to study the degree to which ownership of firms in an industry is concentrated. Laakso
64 PoliticalDiversity
and Taagepera correctly saw that the effective number of parties is simply an instance of the in-verse measurement problem to that one. This index makes rough but fairly reliable internationalcomparisons of party systems possible.
The Inverse Simpson index,
1/λ =1∑R
i=1 p2i
= 2D
Where λ equals the probability that two types taken at random from the dataset (with replacement)represent the same type. This simply equals true fragmentation of order 2, i.e. the effective numberof parties that is obtained when the weighted arithmetic mean is used to quantify average propor-tional diversity of political parties in the election of interest. Another measure is the Least squaresindex (lsq), which measures the disproportionality produced by the election. Specifically, by thedisparity between the distribution of votes and seats allocation.
Recently, Grigorii Golosov proposed a new method for computing the effective number of parties inwhich both larger and smaller parties are not attributed unrealistic scores as those resulted by usingthe Laakso/Taagepera index.I will call this as (Golosov) and is given by the following formula:
N =
n∑i=1
pipi + p2i − p2i
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael and Paul Mitchell (2005) The Politics of Electoral Systems. Oxford UniversityPress.
Golosov, Grigorii (2010) The Effective Number of Parties: A New Approach, Party Politics, 16:171-192.
Laakso, Markku and Rein Taagepera (1979) Effective Number of Parties: A Measure with Appli-cation to West Europe, Comparative Political Studies, 12: 3-27.
Nicolau, Jairo (2008) Sistemas Eleitorais. Rio de Janeiro, FGV.
Taagepera, Rein and Matthew S. Shugart (1989) Seats and Votes: The Effects and Determinants ofElectoral Systems. New Haven: Yale University Press.
Examples
# Here are some examples, help yourself:# The wikipedia examples
A <- c(.75,.25);B <- c(.75,.10,rep(0.01,15))C <- c(.55,.45);
# The index by "laakso/taagepera" is the defaultPoliticalDiversity(A)PoliticalDiversity(B)
pollster2008 65
# Using the method proposed by Golosov gives:PoliticalDiversity(B, index="golosov")PoliticalDiversity(C, index="golosov")
# The 1980 presidential election in the US (vote share):US1980 <- c("Democratic"=0.410, "Republican"=0.507,"Independent"=0.066, "Libertarian"=0.011, "Citizens"=0.003,"Others"=0.003)
PoliticalDiversity(US1980)
# 2010 Brazilian legislative election
votes_2010 = c("PT"=13813587, "PMDB"=11692384, "PSDB"=9421347,"DEM"=6932420, "PR"=7050274, "PP"=5987670, "PSB"=6553345,"PDT"=4478736, "PTB"=3808646, "PSC"=2981714, "PV"=2886633,"PC do B"=2545279, "PPS"=2376475, "PRB"=1659973, "PMN"=1026220,"PT do B"=605768, "PSOL"=968475, "PHS"=719611, "PRTB"=283047,"PRP"=232530, "PSL"=457490,"PTC"=563145)
seats_2010 = c("PT"=88, "PMDB"=79, "PSDB"=53, "DEM"=43,"PR"=41, "PP"=41, "PSB"=34, "PDT"=28, "PTB"=21, "PSC"=17,"PV"=15, "PC do B"=15, "PPS"=12, "PRB"=8, "PMN"=4, "PT do B"=3,"PSOL"=3, "PHS"=2, "PRTB"=2, "PRP"=2, "PSL"=1,"PTC"=1)
PoliticalDiversity(seats_2010)
PoliticalDiversity(seats_2010, index= "golosov")
pollster2008 Polls for 2008 U.S. presidential election
Description
Polls for the 2008 U.S. presidential election. The data includes all presidential polls reported on theinternet site http://www.elections.huffingtonpost.com/pollster that were taken betweenAugust 29th, when John Mc-Cain announced that Sarah Palin would be his running mate as theRepublican nominee for vice president, and the end of September.
Usage
data(pollster2008)
Format
A data.frame object with 11 variables and 102 observations.
• PollTaker Polling organization.
• PollDates Dates the poll data were collected.
66 Proportionality
• MidDate Midpoint of the polling period.
• Days Number of days after August 28th (end of Democratic convention).
• n Sample size for the poll.
• Pop A=all, LV=likely voters, RV=registered voters.
• McCain Percent supporting John McCain.
• Obama Percent supporting Barak Obama.
• Margin Obama percent minus McCain percent.
• Charlie Indicator for polls after Charlie Gibson interview with VP candidate Sarah Palin(9/11).
• Meltdown Indicator for polls after Lehman Brothers bankruptcy (9/15).
@source http://www.elections.huffingtonpost.com/pollster
Proportionality Indexes of (Disproportionality) Proportionality
Description
Calculates several indexes of (dis)-proportionality, which are for the most part used to show therelationship of votes to seats.
Usage
Proportionality(v, s, index = "Gallagher", ...)
## Default S3 method:Proportionality(v, s, index = "Gallagher", margin = 1,
...)
Arguments
v a numeric vector with the percentage share of votes obtained by each party.
s a numeric vector with the percentage share of seats obtained by each party.
index the desired method or type of index, see details below for the correct name.
margin The margin for which the index is computed.
... additional arguments (currently ignored)
Proportionality 67
Details
The following measures are available:
• Loosemore-Hanby (Percent) Loosemore-Hanby Index of disproportionality• Rae (Percent) Rae Index of disproportionality• Cox-Shugart Cox-Shugart Index of proportionality• Inv.Cox-Shugart The inverted Cox-Shugart index• Farina Farina index of proportionality, aka cosine proportionality score• "Gallagher" (Percent) Gallagher index of disproportionality• "Inv.Gallagher" The inverse of Gallagher index• "Grofman" Grofman index of proportionality• "Inv.Grofman" Grofman index of proportionality• "Lijphart" Lijphart index of proportionality• "Inv.Rae" The inverse of Rae index• "Rose" Rose index of disproportionality• "Inv.Rose" The inverse of Rose index• "Sainte-Lague" Sainte-Lague index of disproportionality• "DHondt" D’Hondt index of proportionality• "Gini" Gini index of disproportionality• "Monroe" Monroe index of inequity
Value
Each iteration returns a single score.
Author(s)
Daniel Marcelino <[email protected]>
References
Duncan, O. and Duncan, B. (1955) A methodological analysis of segregation indexes. AmericanSociological Review 20:210-7.
Gallagher, M. (1991) Proportionality, disproportionality and electoral systems. Electoral Studies10(1):33-51.
Loosemore, J. and Hanby, V. (1971) The theoretical limits of maximum distortion: Som analyticalexpressions for electoral systems. British Journal of Political Science 1:467-77.
Koppel, M., and A. Diskin. (2009) Measuring disproportionality, volatility and malapportionment:axiomatization and solutions. Social Choice and Welfare 33, no. 2: 281-286.
Rae, D. (1967) The Political Consequences of Electoral Laws. London: Yale University Press.
Rose, Richard, Neil Munro and Tom Mackie (1998) Elections in Central and Eastern Europe Since1990. Glasgow: Centre for the Study of Public Policy, University of Strathclyde.
Taagepera, R., and B. Grofman. Mapping the indices of seats-votes disproportionality and inter-election volatility. Party Politics 9, no. 6 (2003): 659-77.
68 pub_pal
See Also
PoliticalDiversity, LargestRemainders, HighestAverages. For more details, see the Indicesvignette: vignette("Indices", package = "SciencesPo").
Examples
#' # 2012 Queensland state elecionpvotes= c(49.65, 26.66, 11.5, 7.53, 3.16, 1.47)pseats = c(87.64, 7.87, 2.25, 0.00, 2.25, 0.00)
Proportionality(pvotes, pseats) # default is Gallagher
Proportionality(pvotes, pseats, index="Rae")
# Proportionality(pvotes, pseats, index="Cox-Shugart")
# 2012 Quebec provincial election:pvotes = c(PQ=31.95, Lib=31.20, CAQ=27.05, QS=6.03, Option=1.89, Other=1.88)pseats = c(PQ=54, Lib=50, CAQ=19, QS=2, Option=0, Other=0)
Proportionality(pvotes, pseats, index="Rae")
pub_pal Color Palettes for Publication (discrete)
Description
Color palettes for publication-quality graphs. See details.
Usage
pub_pal(palette = "pub12")
Arguments
palette the palette name, a character string.
Details
The following palettes are available:
• "scipo"A 12-color colorblind discrete palette.
• "gray9"5-tons of gray.
• "carnival"A 5-color palette inspired in the Brazilian samba schools.
• "tableau20"Based on software Tableau
• "tableau10"Based on software Tableau
pub_shapes 69
• "tableau10light"Based on software Tableau
• "tableau10medium"Based on software Tableau
• "fte"fivethirtyeight.com color scales
Examples
library(scales)library(ggplot2)
show_col(pub_pal("scipo")(12))show_col(pub_pal("gray9")(9), labels = FALSE)show_col(pub_pal("carnival")(4))show_col(pub_pal("gdocs")(18))show_col(pub_pal("manyeyes")(20))show_col(pub_pal("tableau10")(10))show_col(pub_pal("tableau10medium")(10))show_col(pub_pal("tableau10light")(10))show_col(pub_pal("cyclic")(20))show_col(pub_pal("bivariate1")(9))
pub_shapes Shape scales for theme_pub (discrete)
Description
Discrete shape scales for theme_pub().
Usage
pub_shapes(palette = "pub")
Arguments
palette the palette name, a character string.
See Also
Other shape pub: scale_shape_pub
Examples
library(scales)
pub_shapes("default")(6)pub_shapes("proportions")(4)pub_shapes("gender")(2)
70 rdirichlet
rdirichlet Dirichlet Distribution
Description
Density function and random number generation for the Dirichlet distribution
Usage
rdirichlet(n, alpha)
Arguments
n number of random observations to draw.
alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parametersfor all observations) or a matrix (a different set of parameters for each observa-tion), see “Details”.If alpha is a matrix, a complete set of α-parameters must be supplied for eachobservation. log returns the logarithm of the densities (therefore the log-likelihood)and sum.up returns the product or sum and thereby the likelihood or log-likelihood.
Value
The rdirichlet returns a matrix with n rows, each containing a single random number accordingto the supplied alpha vector or matrix.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
# 1) General usage:rdirichlet(20, c(1,1,1) );alphas <- cbind(1:10, 5, 10:1);alphas;rdirichlet(10, alphas );alpha.0 <- sum( alphas );test <- rdirichlet(10, alphas );apply( test, 2, mean );alphas / alpha.0;apply( test, 2, var );alphas * ( alpha.0 - alphas ) / ( alpha.0^2 * ( alpha.0 + 1 ) );
# 2) A practical example of usage:# A Brazilian face-to-face poll by Datafolha conducted on Oct 03-04# with 18,116 interviews asking for their vote preferences among the# presidential candidates.
read.zTree 71
## First, draw a sample from the posteriorset.seed(1234);n <- 18116;poll <- c(40,24,22,5,5,4) / 100 * n; # The datamcmc <- 100000;sim <- rdirichlet(mcmc, alpha = poll + 1);
## Second, look at the margins of Aecio over Marina in the very last moment of the campaign:margin <- sim[,2] - sim[,3];mn <- mean(margin); # Bayes estimatemn;s <- sd(margin); # posterior standard deviation
qnts <- quantile(margin, probs = c(0.025, 0.975)); # 90% credible intervalqnts;pr <- mean(margin > 0); # posterior probability of a positive marginpr;
## Third, plot the posterior densityhist(margin, prob = TRUE, # posterior distribution
breaks = "FD", xlab = expression(p[2] - p[3]),main = expression(paste(bold("Posterior distribution of "), p[2] - p[3])));
abline(v=mn, col='red', lwd=3, lty=3);
read.zTree Reads zTree output files
Description
Extracts variables from a zTree output file.
Usage
read.zTree(object, tables = c("globals", "subjects"))
Arguments
object a zTree file or a list of files.
tables the tables of intrest.
Value
A list of dataframes, one for each table
72 Recode
Examples
## Not run: url <-zTables <- read.zTree( "131126_0009.xls" , "contracts" )zTables <- read.zTree( c("131126_0009.xls","131126_0010.xls"), c("globals","subjects", "contracts" ))
## End(Not run)
Recode Recode or Replace Values With New Values
Description
Recodes a value or a vector of values.
Usage
Recode(x, old, into, warn = TRUE)
## Default S3 method:Recode(x, old, into, warn = TRUE)
Arguments
x The vector whose values will be recoded.
old the old value or a vector of old values (numeric/factor/string).
into the new value or a vector of replacement values (1 to 1 recoding).
warn A logical to print a message if any of the old values are not actually present in x.
Examples
x <- LETTERS[1:5]Recode(x, c("B", "D"), c("Beta", "Delta"))
# On numeric vectorsx <- c(1, 4, 5, 9)Recode(x, old = c(1, 4, 5, 9), into = c(10, 40, 50, 90))
religiosity 73
religiosity Data on religiosity of countries
Description
Data from the 2007 Spring Survey conducted through the Pew Global Attitudes Project.
Usage
data(religiosity)
Format
A data.frame object with 9 variables and 44 observations.
• Country Name of country.• Religiosity A measure of degree of religiosity for residents of the country.• GDP Per capita Gross Domestic Product in the country.• Africa Indicator for countries in Africa.• EastEurope Indicator for countries in Eastern Europe.• MiddleEast Indicator for countries in the Middle East.• Asia Indicator for countries in Asia.• WestEurope Indicator for countries in Western Europe.• Americas Indicator for countries in North/South America.
@source The Pew Research Center’s Global Attitudes Project surveyed people around the worldand asked (among many other questions) whether they agreed that "belief in God is necessary formorality," whether religion is very important in their lives, and whether the at least once per day.The variable Religiosity is the sum of the percentage of positive responses on these three items,measured in each of 44 countries. The dataset also includes the per capita GDP for each country andindicator variables that record the part of the world the country is in. http://www.pewglobal.org
Render Render layer on top of a ggplot
Description
Set up a rendering layer on top of a ggplot.
Usage
Render(plot = NULL)
Arguments
plot the plot to use as a starting point, either a ggplot2 or gtable.
74 Rosenbluth
Rosenbluth Rosenbluth Index of Concentration
Description
Calculates the Rosenbluth index of concentration, also known as Hall or Tiedemann Indices.
Usage
Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)
## Default S3 method:Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)
Arguments
x A vector of data values of non-negative elements.
n A vector of frequencies of the same length as x.
na.rm A logical. Should missing values be removed? The Default is set to na.rm=FALSE.
... Additional arguements (currently ignored)
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
See Also
Atkinson, Herfindahl, Gini, Lorenz. For more details see the Indices vignette: vignette("Indices", package = "SciencesPo")
Examples
# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Rosenbluth coefficientRosenbluth(x)
RunShinyApp 75
RunShinyApp Examples from the SciencesPo Package
Description
Launch a Shiny app that shows a demo.
Usage
RunShinyApp(app)
Arguments
app the name of the shiny application.
Examples
# A demo of what \code{\link[SciencesPo]{PoliticalDiversity}} does.if (interactive()) {RunShinyApp("PoliticalDiversity")}
# Other examplesif (interactive()) {RunShinyApp("ConjugateNormal")
}
if (interactive()) {RunShinyApp("Regression")
}
scale_color_party Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
scale_color_party(palette = "BRA", ...)
Arguments
palette the palette name, a character string.
... Other arguments passed on to discrete_scale to control name, limits, breaks,labels and so forth.
76 scale_fill_party
See Also
party_pal for details and references.
Other color party: party_pal
scale_color_pub Publication color scales.
Description
See pub_pal for details.
Usage
scale_color_pub(palette = "pub12", ...)
Arguments
palette the palette name, a character string.... Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
See Also
pub_pal for references.
Other color publication: scale_fill_pub
scale_fill_party Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
scale_fill_party(palette = "BRA", ...)
Arguments
palette the palette name, a character string.... Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
See Also
party_pal for details and references.
scale_fill_pub 77
scale_fill_pub Publication color scales.
Description
See pub_pal for details.
Usage
scale_fill_pub(palette = "pub12", ...)
Arguments
palette the palette name, a character string.
... Other arguments passed on to discrete_scale to control name, limits, breaks,labels and so forth.
See Also
Other color publication: scale_color_pub
scale_shape_pub Shape scales for theme_pub (discrete)
Description
See pub_shapes for details.
Usage
scale_shape_pub(palette = "pub", ...)
Arguments
palette the palette name, a character string.
... common discrete scale parameters: name, breaks, labels, na.value, limitsand guide. See discrete_scale for more details
See Also
Other shape pub: pub_shapes
78 Scatterplot
Examples
library("ggplot2")library("scales")
p <- ggplot(mtcars) +geom_point(aes(x = wt, y = mpg, shape = factor(gear))) +facet_wrap(~am)
p + scale_shape_pub()
Scatterplot Quartile-Frame Scatterplot
Description
Plots a scatterplot with a custom axis that acts as a quartile plot (quartile-frame). Inspired by TheVisual Display of Quantitative Information by Edward R. Tufte.
Usage
Scatterplot(x, y, ...)
Arguments
x, y aesthetics passed into each layer variables # @param data data frame to use(optional).
... additional arguments.
Examples
with(mtcars, Scatterplot(x = wt, y = mpg,main = "Vehicle Weight-Gas Mileage Relationship",xlab = "Vehicle Weight",ylab = "Miles per Gallon",font.family = "serif") )
SciencesPo 79
SciencesPo A Tool Set for Analyzing Political Behavior Data
Description
Provide functions for analyzing elections and political behavior data, including measures of politicalfragmentation, seat apportionment, and small data visualization graphs.
Author(s)
NA
References
Marcelino, Daniel (2013). SciencesPo: A Tool Set for Analyzing Political Behaviour Data. Avail-able at SSRN: http://dx.doi.org/10.2139/ssrn.2320547
SciencesPo-deprecated Deprecated functions in package ‘SciencesPo’
Description
These functions are provided for compatibility with older versions of ‘SciencesPo’ only, and willbe defunct at the next release.
Usage
svTransform(y)
politicalDiversity(x, index = "laakso/taagepera", margin = 1,base = exp(1))
untable(x, ...)
detail(x, ...)
dummy(x, ...)
voronoi(x, ...)
normalize(x, ...)
outliers(x, ...)
destring(x, ...)
winsorize(x, ...)
80 SciencesPoFont
Arguments
y the dependent variable.x A data.frame, a matrix-like, or a vector containing values for the number of
votes or seats each party received.index The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl",
"gini", "shannon", "simpson", "invsimpson".margin The margin for which the index is computed.base The logarithm base used in some indices, such as the "shannon" index.... Extra parameters.
Details
The following functions are deprecated and will be made defunct; use the replacement indicatedbelow:
• svTransform: Normalize• untable: Untable• politicalDiversity: PoliticalDiversity• winsorize: Winsorize• recode: Recode• dummy: Dummify• gini: Gini• clear: Clear• detail: Destring• destring: Destring• outliers: Outlier• voronoi: Voronoi• jackknife: Jackknife• bootstrap: Bootstrap• normalize: Normalize
SciencesPoFont SciencesPo Base Fonts
Description
Used to ascertain required theme fonts.
Usage
SciencesPoFont()
Author(s)
NA
sheston91 81
sheston91 The Penn World Table
Description
The Penn World Table used in Summers and Heston (1991). This dataset contains the followingcolumns:
• year Year
• pop Population (thousands)
• rgdppc Real per capita GDP
• savrat a numeric vector
• country Country
• com Communist regime
• opec OPEC country
• name Country name
Usage
data(sheston91)
Format
A data.frame object with 8 variables and 3250 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Summers, R. and Heston, A. (1991) The Penn World Table (Mark 5): an expanded set of interna-tional comparisons, 1950–1988. The Quarterly Journal of Economics, 106(2), 327–368.
82 Skewness
Skewness Compute the Skewness
Description
Provides three methods for performing skewness test.
Usage
Skewness(x, na.rm = TRUE, type = 3)
Arguments
x a numeric vector containing the values whose skewness is to be computed.
na.rm a logical value for na.rm, default is na.rm=TRUE.
type an integer between 1 and 3 for selecting the algorithms for computing the skew-ness, see details below.
Details
Skewness is a measure of symmetry distribution. Negative skewness (g_1 < 0) indicates that themean of the data distribution is less than the median, and the data distribution is left-skewed. Posi-tive skewness (g_1 > 0) indicates that the mean of the data values is larger than the median, and thedata distribution is right-skewed. Values of g_1 near zero indicate a symmetric distribution.
Value
An object of the same type as x
Note
There are several methods to compute skewness, Joanes and Gill (1998) discuss three of the mosttraditional methods. According to them, type 3 performs better in non-normal population distri-bution, whereas in normal-like population distribution type 2 fits better the data. Such differencebetween the two formulae tend to disappear in large samples. Type 1: g_1 = m_3/m_2^(3/2).
Type 2: G_1 = g_1*sqrt(n(n-1))/(n-2).
Type 3: b_1 = m_3/s^3 = g_1 ((n-1)/n)^(3/2).
Author(s)
Daniel Marcelino, <[email protected]>
References
Joanes, D. N. and C. A. Gill. (1998) Comparing measures of sample skewness and kurtosis. TheStatistician, 47, 183–189.
stature 83
Examples
w <-sample(4,10, TRUE)x <- sample(10, 1000, replace=TRUE, prob=w)Skewness(x, type = 1)Skewness(x, type = 2)Skewness(x)
stature The Measure of US Presidents
Description
The US presidents and their main opponents’ heights (cm). This dataset contains the followingcolumns:
• election The election year.
• winner The winner candidate.
• winner.height The winner candidate’s height in centimeters (cm).
• winner.vote The popular vote support for the winner.
• winner.party The winner’s party.
• opponent The main opponent candidate.
• opponent.height The opponent candidate’s height in centimeters (cm).
• opponent.vote Popular vote support for the main opponent candidate.
• opponent.party The opponent’s party.
• turnout The electorate turnout in percentages.
• winner.bmi The winner Body Mass Index (BMI) estimate (BMI = weight in kg/(height in meter)**2)
@references Murray, G. R. (2014) Evolutionary preferences for physical formidability in leaders.Politics and the Life Science, 33(1), 33-53.
Usage
data(stature)# t.test(stature$winner.height, stature$opponent.height, var.equal=TRUE)
Format
A data.frame object with 11 variables and 57 observations.
Source
US Presidents: http://www.lingerandlook.com/Names/Presidents.php Inside Gov. http://www.us-presidents.insidegov.com.
84 Stratify
Stratify Stratified Sampling
Description
Sample row values of a data frame conditional to some strata attributes.
Usage
Stratify(.data, group, size, select = NULL, replace = FALSE,both.sets = FALSE)
Arguments
.data the data frame.
group the grouping factor, may be a list.
size the sample size.
select if sampling from a specific group or list of groups.
replace should sampling be with replacement?
both.sets if TRUE, both ‘sample‘ and ‘.data‘ are returned.
Examples
data(pollster2008)
# Let's take a 10% sample from all -PollTaker- groups in pollster2008Stratify(pollster2008, "PollTaker", 0.1)
# Let's take a 10% sample from only 'LV' and 'RV' groups from -Pop- in pollster2008Stratify(pollster2008, "Pop", 0.1, select = list(Pop = c("LV", "RV")))
# Let's take 3 samples from all -PollTaker- groups in pollster2008,# specified by column 1
Stratify(pollster2008, group = 1, size = 3)
# Let's take a sample from all -Pop- groups in pollster2008, where we# specify the number wanted from each groupStratify(pollster2008, "Pop", size = c(3, 5, 4))
# Use a two-column strata (-Pop- and -PollTaker-) but only interested in# cases where -Pop- == 'LV'Stratify(pollster2008, c("Pop", "PollTaker"), 0.15, select = list(Pop = "LV"))
summary.Crosstable 85
summary.Crosstable Print Crosstable style
Description
Print Crosstable style
Usage
## S3 method for class 'Crosstable'summary(object, digits = 2, chisq = FALSE,fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,...)
Arguments
object The table object.
digits number of digits after the decimal point.
chisq if TRUE, the results of a chi-square test will be shown below the table.
fisher if TRUE, the results of a Fisher Exact test will be shown below the table.
mcnemar if TRUE, the results of a McNemar test will be shown below the table.
alt the alternative hypothesis and must be one of "two.sided", "greater" or "less".Only used in the 2 by 2 case.
latex if the output is to be printed in latex format.
... extra parameters.
See Also
Other Crosstables: Crosstable
swatson93 Stock’s and Watson’s (1993) Data.
Description
Data set used by Stock and Watson (1993) to estimate co-integration. This dataset contains thefollowing columns:
• lnm1 Log M1.
• lnp Log NNP price deflator.
• lnnnp Log NNP.
• cprate A numeric vector.
• year Commercial paper rate.
86 Ternaryplot
Usage
data(swatson93)
Format
A data.frame object with 5 variables and 90 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/
References
Stock, J. H., and Watson, M. W. (1993) A simple estimator of cointegrating vectors in higher orderintegrated systems. Econometrica: Journal of the Econometric Society, 783–820.
Tableplot Tableplot: a Visualization of Datasets
Description
Visualization of Large Datasets.
Usage
Tableplot(.data = NULL)
Arguments
.data the data frame.
Ternaryplot Ternary Plot Build ternary plots.
Description
Ternary Plot Build ternary plots.
Usage
Ternaryplot(data, ...)
theme_blank 87
Arguments
data Default dataset to use for plot. If not already a data.frame, will be converted toone by fortify. If not specified, must be suppled in each layer added to theplot.
... Other arguments passed on to methods. Not currently used.
theme_blank Create a Completely Empty Theme
Description
The theme created by this function shows nothing but the plot panel.
Usage
theme_blank(base_size = 12, base_family = "serif", legend = "none")
Arguments
base_size overall font size. Default is 12.
base_family a name for default font family.
legend the legend position.
Value
The theme.
Author(s)
NA
Examples
# plot with small amount of remaining paddingqplot(1:10, (1:10)^2) + theme_blank()
# remaining padding removedqplot(1:10, (1:10)^2) + theme_blank() + labs(x = NULL, y = NULL)
# Check that it is a complete themeattr(theme_blank(), "complete")
88 theme_fte
theme_darkside The Dark Side Theme
Description
The dark side of the things.
Usage
theme_darkside(base_size = 12, base_family = "serif", legend = "none")
Arguments
base_size overall font size. Default is 12.
base_family the name for default font family.
legend the position of the legend if any.
Value
The theme. # plot with small amount of padding qplot(1:10, (1:10)^2, color="green") + theme_darkside()
# Check that it is a complete theme attr(theme_darkside(), "complete")
Author(s)
NA
theme_fte Themes for ggplot2 Graphs
Description
Theme for plotting with ggplot2.
Usage
theme_fte(legend = "none", legend_title = FALSE, base_size = 12,horizontal = TRUE, base_family = "", colors = c("#F0F0F0", "#D9D9D9","#60636A", "#525252"))
theme_pub 89
Arguments
legend enables to set legend position, default is "none".
legend_title will the legend have a title? Defaults is FALSE.
base_size overall font size. Default is 13.
horizontal logical. Horizontal axis lines?
base_family a nmae for default font family.
colors default colors used in the plot in the following order: background, lines, text,and title.
Value
The theme.
Author(s)
NA
Examples
qplot(1:10, (1:10)^3) + theme_fte()
mycolors = c("wheat", "#C2AF8D", "#8F6D2F", "darkred")qplot(1:10, (1:10)^3) +theme_fte(colors=mycolors)
# Check that it is a complete themeattr(theme_fte(), "complete")
theme_pub The Default Theme
Description
After loading the SciencesPo package, this theme will be set to default for all subsequent graphsmade with ggplot2.
Usage
theme_pub(legend = "bottom", base_size = 12, base_family = "",horizontal = FALSE, axis_line = FALSE)
90 theme_scipo
Arguments
legend enables to set legend position, default is "bottom".
base_size overall font size. Default is 14.
base_family a name for default font family.
horizontal logical. Horizontal axis lines?
axis_line enables to set x and y axes.
Value
The theme.
Author(s)
NA
See Also
theme, theme_538, theme_blank.
Examples
Previewplot() + theme_pub()
# Anscombe datadat <- data.frame()for(i in 1:4)dat <- rbind(dat, data.frame(set=i, x=anscombe[,i], y=anscombe[,i+4]))
gg <- ggplot(dat, aes(x, y))gg <- gg + geom_point(size=5, color="red", fill="orange", shape=21)gg <- gg + geom_smooth(method="lm", fill=NA, fullrange=TRUE)gg <- gg + facet_wrap(~set, ncol=2)gg <- gg + theme_pub(base_family=SciencesPoFont())gg <- gg + theme(plot.background=element_rect(fill="#f7f7f7"))gg <- gg + theme(panel.background=element_rect(fill="#f7f7f7"))
theme_scipo The SciPo Theme
Description
The scpo theme for using with ggplot2 objects.
theme_scipo 91
Usage
theme_scipo(base_family = "", base_size = 11,strip_text_family = base_family, strip_text_size = 12,title_family = "", title_size = 18, title_margin = 10,margins = margin(10, 10, 10, 10), subtitle_family = "",subtitle_size = 12, subtitle_margin = 15, caption_family = "",caption_size = 9, caption_margin = 10,axis_title_family = subtitle_family, axis_title_size = 9,axis_title_just = "rt", grid = TRUE, axis = FALSE, ticks = FALSE)
Arguments
base_family base font family
base_size base font sizestrip_text_family
facet label font familystrip_text_size
facet label text size
title_family plot tilte family
title_size plot title font size
title_margin plot title margin
margins plot marginssubtitle_family
plot subtitle family
subtitle_size plot subtitle sizesubtitle_margin
plot subtitle margin
caption_family plot caption family
caption_size plot caption size
caption_margin plot caption marginaxis_title_family
axis title font familyaxis_title_size
axis title font sizeaxis_title_just
axis title font justification blmcrt
grid panel grid (TRUE, FALSE, or a combination of X, x, Y, y)
axis axis TRUE, FALSE, [xy]
ticks ticks
Value
The theme.
92 titanic
Examples
# plot with small amount of padding# qplot(1:10, (1:10)^2, color="green") + theme_scipo()
# Check that it is a complete theme# attr(theme_scipo(), "complete")
titanic Titanic
Description
Population at Risk and Death Rates for an Unusual Episode. For each person on board the fatalmaiden voyage of the ocean liner Titanic, this dataset records sex, age [adult/child], economic status[first/second/third class, or crew] and whether or not that person survived. This dataset contains thefollowing columns:
• CLASS Class (0 = crew, 1 = first, 2 = second, 3 = third)• AGE Age (1 = adult, 0 = child)• SEX Sex (1 = male, 0 = female)• SURVIVED Survived (1 = yes, 0 = no)
Usage
data(titanic)
Format
A data.frame object with 4 variables and 2201 observations.
Note
There is not complete agreement among primary sources as to the exact numbers on board, rescued,or lost. STORY BEHIND THE DATA: The sinking of the Titanic is a famous event, and newbooks are still being published about it. Many well-known facts–from the proportions of first-classpassengers to the "women and children first" policy, and the fact that that policy was not entirelysuccessful in saving the women and children in the third class–are reflected in the survival rates forvarious classes of passenger. These data were originally collected by the British Board of Trade intheir investigation of the sinking.
Source
British Board of Trade Inquiry Report (1990). Report on the Loss of the ‘Titanic’ (S.S.)", Gloucester,UK: Allan Sutton Publishing.
References
Dawson (1995). "The ‘Unusual Episode’ Data Revisited" in the Journal of Statistics Education.
Today 93
Today Print the current date in a pretty format
Description
Print the current date in a pretty format.
Usage
Today()
Value
string
Examples
Today()
turnout Turnout Data
Description
Data on voter turnout in the 50 states and D.C. for the 1992 Presidential election and 1990 Con-gressional elections. Per capita income, populations in poverty and populations with no high schooldegree are also given. This dataset contains the following columns:
• v1 state name (alphabetic, 20 characters)
• v2 region of country: 1 Northeast 2 Midwest 3 South 4 West
• v3 division within region: 1 Northeast-New England 2 Northeast-Middle Atlantic 3 Midwest-East North Central 4 Midwest-West North Central 5 South-South Atlantic 6 South-East SouthCentral 7 South-West South Central 8 West-Mountain 9 West-Pacific.
• v4 "Elazar’s state political culture assignments: 1 = moralistic 2 = individualistic 3 = tradi-tionalistic
• v5 percent of population below poverty level, 1992.
• v6 per capita personal income, 1993.
• v7 percent casting votes for presidential electors, 1992.
• v8 percent casting votes for U.S. Representatives, 1990.
• v9 population without a high school degree, of those 25 years or older, 1990.
• v10 population 25 years or older, 1990.
• v11 South = 1; all others = 0.
94 twins
Usage
data(turnout)
Format
A data.frame object with 11 variables and 50 observations.
Source
U.S. Bureau of the Census, Statistical Abstract of the United States, 1994.
twins Burt’s Twin Data
Description
The given data are IQ scores from identical twins; one raised in a foster home, and the other raisedby birth parents. This dataset contains the following columns:
• C Social class, C1=high, C2=medium, C3=low, a factor.
• IQb biological.
• IQf foster.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(twins)
Format
A data.frame object with 3 variables and 27 observations.
Source
Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotictwins reared together and apart. Br. J. Psych., 57, 147-153.
TwoWay 95
TwoWay Cross Tabulation
Description
TwoWay is a modified version of CrossTable (gmodels). TwoWay will print a summary table withcell counts and column proportions (similar to STATA’s tabulate.
Usage
TwoWay(x, y, digits = 3)
Arguments
x, y the variables for the cross tabulation.
digits number of digits for rounding proportions.
unions Poll attitudes towards British trade unions
Description
The British polling company Ipsos MORI conducted several opinion polls in the UK between 1975and 1995 in which they asked whether people agree or disagree with the statement "Trade unionshave too much power in Britain today".
Usage
data(unions)
Format
A data.frame object with 7 variables and 17 observations.
• Date Month of the poll Aug-77 to Sep-79.
• AgreePct Percent who agree (unions have too much power).
• DisagreePct Percent who disagree.
• NetSupport DisagreePct-AgreePct.
• Months Months since August 1975.
• Late 1=after 1986 or 0=before 1986.
• Unemployment Unemployment rate.
@source http://www.ipsos-mori.com/researchpublications/researcharchive/poll.aspx?oItemID=94
96 Untable
Untable Untable an Aggregated Data Frame
Description
A method for recovering a data.frame out of summarized data, i.e. contingency table or aggregateddata.
Usage
Untable(x, ...)
## S3 method for class 'data.frame'Untable(x, freq = "Freq", row.names = NULL, ...)
## Default S3 method:Untable(x, dimnames = NULL, type = NULL,row.names = NULL, col.names = NULL, ...)
Arguments
x the table object as a data.frame, table, or, matrix.
freq the column name of count values.
row.names row names to add to data.frame if any.
dimnames set the dimnames of object if required.
type the type of variable. If NULL, ordered factor is returned.
col.names column names to add to the data.frame.
... Extra parameters ignored.
See Also
expand.grid, gl.
Examples
gss <- data.frame(expand.grid(sex=c("female", "male"),party=c("dem", "indep", "rep")),count=c(279,165,73,47,225,191))
print(gss) # aggregated data.frame
# Then expand it:GSS <- Untable(gss, freq="count")head(GSS)
Voronoi 97
# Expand from a table or xtable object:# Fisher's Tea-Tasting Experiment datatea <- table(poured=c("Yes", "Yes", "Yes", "No","Yes","No", "No", "No"),
guess=c("Yes", "Yes", "Yes", "Yes", "No", "No","No","No"))
Untable(tea)
# Expand with a vector of weightsUntable(c(3,3,3), dimnames=list(c("Brazil","Colombia","Argentina")))
Voronoi Voronoi tessellation diagram
Description
Computes voronoi tessellation diagrams, and Dirichlet tessellation (after Peter Gustav LejeuneDirichlet).
Usage
Voronoi(n = 100, d, dim = 1000, method = NULL, seed = 51, plot = TRUE)
Arguments
n an integer for a finite set of points.
d an integer for a finite set of dimensions.
dim the image dimension.
method the distance computation method. One of euclidean, manhattan, maximum, canberra(currently not implemented).
seed an integer for random seed.
plot logical. If TRUE, a plot is returned, else, a data.frame is returned.
Details
https://en.wikipedia.org/wiki/Voronoi_diagram
Author(s)
Daniel Marcelino, <[email protected]>
Examples
## Not run: Voronoi(n=20, d=5, dim=1000)
98 Waffleplot
vote Voting correlations
Description
A dataset with voting correlations of traits, where each trait is measured for 17 developed countries(Europe, US, Japan, Australia, New Zealand).
Usage
data(vote)
Format
A 12x12 matrix object with 12 variables and 12 observations.
Source
Torben Iversen and David Soskice (2006). Electoral institutions and the politics of coalitions: Whysome democracies redistribute more than others. American Political Science Review, 100, 165-81.Table A2.
References
Using Graphs Instead of Tables. http://www.tables2graphs.com/doku.php?id=03_descriptive_statistics
Waffleplot Make waffle (square pie) charts
Description
Given a named vector, this function will return a ggplot object that represents a waffle chart of thevalues. The individual values will be summed up and each that will be the total number of squaresin the grid.
Usage
Waffleplot(parts, rows = 10, xlab = NULL, title = NULL, colors = NA,size = 2, legend_pos = "right", flip = FALSE, reverse = FALSE,equal = TRUE, pad = 0, use_glyph = FALSE, glyph_size = 12)
WeightedCorrelation 99
Arguments
parts named vector of values to use for the chart
rows number of rows of blocks
xlab text for below the chart. Highly suggested this be used to give the "1 sq == xyz"relationship if it’s not obvious
title chart title
colors exactly the number of colors as values in parts. If omitted, Color Brewer "Set2"colors are used.
size width of the separator between blocks (defaults to 2)
legend_pos position of legend
flip flips x & y axes
reverse reverses the order of the data
equal by default, waffle uses coord_equal; this can cause layout problems, so you anuse this to disable it if you are using ggsave or knitr to control output sizes (ormanually sizing the chart)
pad how many blocks to right-pad the grid with
use_glyph use specified FontAwesome glyph
glyph_size size of the FontAwesome font
Note
If the vector is not named or only partially named, capital letters will be used instead. If youspecify a string (vs FALSE) to use_glyph the function will map the input to a FontAwesome glyphname and use that glyph for the tile instead of a block (making it more like an isotype pictogramthan a waffle chart). You’ll need to actually install FontAwesome and use the extrafont package(https://github.com/wch/extrafont) to be able to use the FontAwesome glyphs.
Examples
tiles <- c(One=80, Two=30, Three=20, Four=10)Waffleplot(tiles, rows=8)
Senate <- c(`Male (44%)`=44, `Female (56%)`=56)Waffleplot(Senate, rows=10, size=0.5, colors=c("#af9139", "#544616"))
WeightedCorrelation Compute Weighted Correlations
Description
Compute the weighted correlation.
100 WeightedMean
Usage
WeightedCorrelation(x, y = NULL, weights = NULL)
Arguments
x a matrix or vector to correlate with y.
y a matrix or vector to correlate with x. If y is NULL, x will be used instead.
weights an optional vector of weights to be used to determining the weighted mean andvariance for calculation of the correlations.
Examples
x <- sample(10,10)y <- sample(10,10)w <- sample(5,10, replace=TRUE)
WeightedCorrelation(x, y, w)
WeightedMean Computes Weighted Mean
Description
Compute the weighted mean of data.
Usage
WeightedMean(x, weights = NULL, normwt = "ignored", na.rm = TRUE)
Arguments
x a numeric vector.
weights a numeric vector of weights of x.
normwt ignored at the moment.
na.rm a logical, if TRUE, missing data will be dropped. If na.rm = FALSE, missingdata will return an error.
Examples
x <- sample(10,10)w <- sample(5,10, replace=TRUE)
WeightedMean(x, w)
WeightedStdz 101
WeightedStdz Computes Weighted Standardized Values
Description
Computes weighted standardized values of data.
Usage
WeightedStdz(x, weights = NULL)
Arguments
x a vector for which a set of standardized values is desired.weights a vector of weights to be used to determining weighted values of x.
Examples
x <- sample(10,10)w <- sample(5,10, replace=TRUE)
WeightedStdz(x, w)
WeightedVariance Weighted Variance
Description
Compute the weighted variance.
Usage
WeightedVariance(x, weights = NULL, na.rm = FALSE)
Arguments
x a numeric vector.weights a numeric vector of weights for x.na.rm a logical if NA should be disregarded.
Examples
wt=c(1.23, 2.12, 1.23, 0.32, 1.53, 0.59, 0.94, 0.94, 0.84, 0.73)x = c(5, 5, 4, 4, 3, 4, 3, 2, 2, 1)
WeightedVariance(x, wt)
102 Winsorize
Winsorize Winsorized Mean
Description
Compute the winsorized mean, which consists of recoding the top k values of a vector.
Usage
Winsorize(x, k = 1, na.rm = TRUE)
Arguments
x The vector to be winsorized
k An integer for the quantity of outlier elements that to be replaced in the calcula-tion process
na.rm a logical value for na.rm, default is na.rm=TRUE.
Details
Winsorizing a vector will produce different results than trimming it. While by trimming a vectorcauses extreme values to be discarded, by winsorizing it in the other hand, causes extreme values tobe replaced by certain percentiles.
Value
An object of the same type as x
References
Dixon, W. J., and Yuen, K. K. (1999) Trimming and winsorization: A review. The AmericanStatistician, 53(3), 267–269.
Dixon, W. J., and Yuen, K. K. (1960) Simplified Estimation from Censored Normal Samples, TheAnnals of Mathematical Statistics, 31, 385–391. @references Wilcox, R. R. (2012) Introduction torobust estimation and hypothesis testing. Academic Press, 30-32. Statistics Canada (2010) SurveyMethods and Practices.
@note One may want to winsorize estimators, however, winsorization tends to be used for one-variable situations.
@author Daniel Marcelino, <[email protected]>
@examples set.seed(51) # for reproducibility x <- rnorm(50) ## introduce outlier x[1] <- x[1] * 10
# Compare to mean: mean(x) Winsorize(x)
words 103
words Word frequencies from Mosteller and Wallace
Description
The data give the frequencies of words in works from four different sources: the political writingsof eighteenth century American political figures Alexander Hamilton, James Madison, and JohnJay, and the book Ulysses by twentieth century Irish writer James Joyce. This dataset contains thefollowing columns:
• Hamilton Hamilton frequency.
• HamiltonRank Hamilton rank.
• Madison Madison frequency.
• MadisonRank Madison rank.
• Jay Jay frequency.
• JayRank Jay rank.
• Ulysses Word frequency in Ulysses.
• UlyssesRank Word rank in Ulysses.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(words)
Format
A data.frame object with 8 variables and 165 observations.
Source
Mosteller, F. and Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Reading,MA: Addison-Wesley.
104 %nin%
%nin% Not in (Find Matching or Non-Matching Elements)
Description
" logical vector indicating if there is a match or not for its left operand. A true vector elementindicates no match in left operand, false indicates a match.
Usage
x %nin% table
Arguments
x A vector of numeric, character, or factor values.
table A vector (numeric, character, factor), matching the mode of x
Examples
c('a','b','c') %nin% c('a','b')
Index
∗Topic Clean-upMakeNumeric, 51MakeShare, 52
∗Topic Concentration,GiniSimpson, 38Lorenz, 50
∗Topic ConcentrationGini, 37
∗Topic Distributionsddirichlet, 22rdirichlet, 70
∗Topic Diversity,Gini, 37GiniSimpson, 38Lorenz, 50
∗Topic ElectoraldHondt, 24HighestAverages, 42LargestRemainders, 48PoliticalDiversity, 63
∗Topic Exploratory,PoliticalDiversity, 63
∗Topic ExploratoryCI, 17Crosstable, 20Describe, 23Outlier, 57WeightedVariance, 101Winsorize, 102
∗Topic InequalityGiniSimpson, 38
∗Topic ManipulationRecode, 72Stratify, 84
∗Topic ModellingBootstrap, 13Normalize, 55
∗Topic ModelsDummify, 25
∗Topic RescalingNormalize, 55
∗Topic SamplingPermutate, 60
∗Topic TestsBartels, 8JamesStein, 45
∗Topic TransformationNormalize, 55
∗Topic VizLorenz, 50
∗Topic datasetsalpha, 4auto, 7baseball, 10bhodrick93, 10big5, 11bollen, 12bush, 14cathedrals, 15cgreene76, 16chismoke, 17election2008, 26galton, 32geom_dumbbell, 33geom_lollipop, 35GeomSpotlight, 32griliches76, 39ltaylor96, 51marriage, 52mishkin92, 54nerlove63, 55paired, 58Palettes, 59pollster2008, 65religiosity, 73sheston91, 81stature, 83swatson93, 85
105
106 INDEX
titanic, 92turnout, 93twins, 94unions, 95vote, 98words, 103
∗Topic ggplot2geom_dumbbell, 33geom_lollipop, 35GeomSpotlight, 32party_pal, 59Plotting, 61pub_pal, 68pub_shapes, 69scale_color_party, 75scale_color_pub, 76scale_fill_party, 76scale_fill_pub, 77scale_shape_pub, 77SciencesPoFont, 80theme_blank, 87theme_darkside, 88theme_fte, 88theme_pub, 89
%nin%, 104
aes, 34, 35aes_, 34, 35align_title_left (Plotting), 61align_title_right (Plotting), 61alpha, 4Anonymize, 5Atkinson, 6, 37, 41, 74auto, 7
Bartels, 8baseball, 10bhodrick93, 10big5, 11bollen, 12Bootstrap, 13, 80borders, 34, 36bush, 14
Calibrateplot, 14cathedrals, 15cgreene76, 16chismoke, 17CI, 17
Clear, 18, 80Condorcet, 19CountMissing, 19CrossTable, 95Crosstable, 20, 30, 31, 85CrossValidation, 21
ddirichlet, 22Describe, 23Destring, 80destring (SciencesPo-deprecated), 79Detail (Describe), 23detail (SciencesPo-deprecated), 79dHondt, 24, 40Dirichlet (rdirichlet), 70discrete_scale, 75–77Disproportionality (Proportionality), 66Dummify, 25, 80dummy (SciencesPo-deprecated), 79
election2008, 26expand.grid, 96
Flag, 27Footnote, 27Forge, 28Formatted, 29fortify, 34, 35, 87Freq (Frequency), 31freq, 30, 31Frequency, 20, 30, 31
galton, 32geom_dumbbell, 33geom_lollipop, 35geom_spotlight (GeomSpotlight), 32GeomDumbbell (geom_dumbbell), 33GeomLollipop (geom_lollipop), 35GeomSpotlight, 32ggplot, 34, 35Gini, 7, 37, 38, 41, 50, 74, 80GiniSimpson, 37, 38, 50gl, 96griliches76, 39
Hamilton, 25, 40Herfindahl, 7, 37, 41, 74HighestAverages, 25, 40, 42, 49, 68
Jackknife, 44, 80
INDEX 107
JamesStein, 45
Kurtosis, 45
Label, 47label, 48Label<-, 47LargestRemainders, 25, 40, 43, 48, 68layer, 34, 35legend_bottom (Plotting), 61legend_left (Plotting), 61legend_right (Plotting), 61legend_top (Plotting), 61Lorenz, 37, 50, 74ltaylor96, 51
MakeNumeric, 51MakeShare, 52marriage, 52MeanFromRange, 53mishkin92, 54move_legend (Plotting), 61
nerlove63, 55no_axes (Plotting), 61no_axes_titles (Plotting), 61no_gridlines (Plotting), 61no_legend (Plotting), 61no_legend_title (Plotting), 61no_major_gridlines (Plotting), 61no_major_x_gridlines (Plotting), 61no_major_y_gridlines (Plotting), 61no_minor_gridlines (Plotting), 61no_minor_x_gridlines (Plotting), 61no_minor_y_gridlines (Plotting), 61no_ticks (Plotting), 61no_title (Plotting), 61no_x_axis (Plotting), 61no_x_gridlines (Plotting), 61no_x_line (Plotting), 61no_x_text (Plotting), 61no_x_ticks (Plotting), 61no_x_title (Plotting), 61no_y_axis (Plotting), 61no_y_gridlines (Plotting), 61no_y_line (Plotting), 61no_y_text (Plotting), 61no_y_ticks (Plotting), 61no_y_title (Plotting), 61
Normalize, 55, 80normalize (SciencesPo-deprecated), 79
Outlier, 57, 80outliers (SciencesPo-deprecated), 79
paired, 58Palettes, 59party_pal, 59, 76Permutate, 60Plotting, 61PoliticalDiversity, 25, 40, 41, 43, 49, 63,
68, 80politicalDiversity
(SciencesPo-deprecated), 79pollster2008, 65Previewplot (Plotting), 61PreviewTheme (Plotting), 61prop.table, 20Proportionality, 43, 49, 66pub_pal, 68, 76, 77pub_shapes, 69, 77
rdirichlet, 70read.zTree, 71Recode, 72, 80religiosity, 73Render, 73Rosenbluth, 7, 37, 41, 74rotate_axes_text (Plotting), 61rotate_x_text (Plotting), 61rotate_y_text (Plotting), 61RunShinyApp, 75
scale, 56scale_color_party, 59, 75scale_color_pub, 76, 77scale_fill_party, 76scale_fill_pub, 76, 77scale_shape_pub, 69, 77Scatterplot, 78SciencesPo, 79SciencesPo-deprecated, 79SciencesPo-package (SciencesPo), 79SciencesPoFont, 80sheston91, 81Simpson (Herfindahl), 41Skewness, 82stature, 83
108 INDEX
Stratify, 84summary.Crosstable, 20, 85svTransform (SciencesPo-deprecated), 79swatson93, 85
table, 20Tableplot, 86Ternaryplot, 86theme, 90theme_538, 90theme_538 (theme_fte), 88theme_blank, 87, 90theme_darkside, 88theme_fte, 88theme_pub, 89theme_scipo, 90titanic, 92Today, 93turnout, 93twins, 94TwoWay, 95
unions, 95Untable, 80, 96untable (SciencesPo-deprecated), 79
Voronoi, 80, 97voronoi (SciencesPo-deprecated), 79vote, 98
Waffleplot, 98WeightedCorrelation, 99WeightedMean, 100WeightedStdz, 101WeightedVariance, 101Winsorize, 58, 80, 102winsorize (SciencesPo-deprecated), 79words, 103
xtabs, 20