package ‘sciencespo’

108
Package ‘SciencesPo’ August 5, 2016 Type Package Title A Tool Set for Analyzing Political Behavior Data Date/Publication 2016-08-05 00:24:29 Version 1.4.1 Date 2016-08-02 Maintainer Daniel Marcelino <[email protected]> Description Provide functions for analyzing elections and political behavior data, including measures of political fragmentation, seat apportionment, and small data visualization graphs. URL http://CRAN.R-project.org/package=SciencesPo License GPL (>= 2) Encoding UTF-8 Depends R (>= 3.2.0), stats, utils, graphics, grDevices, methods, ggplot2 (>= 2.1.0) Imports Rcpp (>= 0.12.3), pander (>= 0.6.0), data.table (>= 1.9.4), grid (>= 3.0.0), rstudioapi, htmltools, lubridate, stringr, xtable, shiny Suggests testthat, devtools, scales, extrafont, gtable, knitr LinkingTo Rcpp VignetteBuilder knitr NeedsCompilation yes LazyData TRUE Repository CRAN BugReports http://github.com/danielmarcelino/SciencesPo/issues ByteCompile TRUE RoxygenNote 5.0.1 1

Upload: others

Post on 16-Jan-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Package ‘SciencesPo’August 5, 2016

Type Package

Title A Tool Set for Analyzing Political Behavior Data

Date/Publication 2016-08-05 00:24:29

Version 1.4.1

Date 2016-08-02

Maintainer Daniel Marcelino <[email protected]>

Description Provide functions for analyzing elections and political behaviordata, including measures of political fragmentation, seat apportionment, andsmall data visualization graphs.

URL http://CRAN.R-project.org/package=SciencesPo

License GPL (>= 2)

Encoding UTF-8

Depends R (>= 3.2.0), stats, utils, graphics, grDevices, methods,ggplot2 (>= 2.1.0)

Imports Rcpp (>= 0.12.3), pander (>= 0.6.0), data.table (>= 1.9.4),grid (>= 3.0.0), rstudioapi, htmltools, lubridate, stringr,xtable, shiny

Suggests testthat, devtools, scales, extrafont, gtable, knitr

LinkingTo Rcpp

VignetteBuilder knitr

NeedsCompilation yes

LazyData TRUE

Repository CRAN

BugReports http://github.com/danielmarcelino/SciencesPo/issues

ByteCompile TRUE

RoxygenNote 5.0.1

1

2 R topics documented:

Collate 'Anonymize.R' 'Atkinson.R' 'Bootstrap.R' 'CI.R''CMDcheckVars.R' 'CalibratePlot.R' 'Clear.R' 'Condorcet.R''CountMissing.R' 'CrossValidation.R' 'Crosstable.R' 'DATA.R''DEPRECATED.R' 'Describe.R' 'Dirichlet.R' 'Dummy.R''Frequency.R' 'Geoms.R' 'Gini.R' 'Herfindahl.R''HighestAverages.R' 'Jackknife.R' 'Label.R''LargestRemainders.R' 'Lorenz.R' 'MOMENTS.R' 'MeanFromRange.R''Normalize.R' 'Operators.R' 'Outlier.R' 'Palettes.R''Permutate.R' 'PoliticalDiversity.R' 'Proportionality.R''RcppExports.R' 'Recode.R' 'Rosenbluth.R' 'RunShinyApp.R''ScatterPlot.R' 'SciencesPo.R' 'Stratify.R' 'TESTS.R''Tableplot.R' 'Ternaryplot.R' 'Themes.R' 'Transform.R''UTILS.R' 'Untable.R' 'Voronoi.R' 'Waffleplot.R' 'Winsorize.R''flag.R' 'print.SciencesPo.R' 'zTree.R' 'zzz.R'

Author Daniel Marcelino [aut, cre]

R topics documented:alpha . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Anonymize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Atkinson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6auto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Bartels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8baseball . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10bhodrick93 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10big5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11bollen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13bush . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Calibrateplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14cathedrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15cgreene76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16chismoke . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Clear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Condorcet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19CountMissing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19Crosstable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20CrossValidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21ddirichlet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22Describe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23dHondt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Dummify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25election2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Footnote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Forge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

R topics documented: 3

Formatted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29freq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31galton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32GeomSpotlight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32geom_dumbbell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33geom_lollipop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35Gini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37GiniSimpson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38griliches76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Hamilton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Herfindahl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41HighestAverages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Jackknife . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44JamesStein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Label . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Label<- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47LargestRemainders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48Lorenz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50ltaylor96 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51MakeNumeric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51MakeShare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52marriage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52MeanFromRange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53mishkin92 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54nerlove63 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55Normalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55Outlier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57paired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58Palettes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59party_pal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59Permutate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61PoliticalDiversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63pollster2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Proportionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66pub_pal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68pub_shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69rdirichlet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70read.zTree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Recode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72religiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73Render . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73Rosenbluth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74RunShinyApp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75scale_color_party . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75scale_color_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4 alpha

scale_fill_party . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76scale_fill_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77scale_shape_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Scatterplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78SciencesPo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79SciencesPo-deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79SciencesPoFont . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80sheston91 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82stature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83Stratify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84summary.Crosstable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85swatson93 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85Tableplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86Ternaryplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86theme_blank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87theme_darkside . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88theme_fte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88theme_pub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89theme_scipo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90titanic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93turnout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93twins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94TwoWay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Untable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96Voronoi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97vote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98Waffleplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98WeightedCorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99WeightedMean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100WeightedStdz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101WeightedVariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101Winsorize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103%nin% . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

Index 105

alpha A dataset that contains four test items

Description

A dataset with four test items used in SPSS to to compute Cronbach’s alpha.

• q1:q4 Numeric items of a scale.

Anonymize 5

Usage

data(alpha)

Format

A data.frame object with 4 variables and 60 observations.

Anonymize To Make Anonymous a Data Frame

Description

Replaces factor and character variables by a combination of random sampled letters and numbers.Numeric columns are also transformed, see details.

Usage

Anonymize(.data, col.names = "V", row.names = "")

## Default S3 method:Anonymize(.data, col.names = "V", row.names = "")

Arguments

.data A vector or a data frame.

col.names A string for column names.

row.names A string for rown names, default is empty.

Details

Strings will is replaced by an algorithm with 52/102 chance of choosing a letter and 50/102 chanceof choosing a number; then it joins everything in a 5-digits long character string.

Value

An object of the same type as .data.

Author(s)

Daniel Marcelino, <[email protected]>.

6 Atkinson

Examples

dt <- data.frame(Z = sample(LETTERS,10),X = sample(1:10),Y = sample(c("yes", "no"), 10, replace = TRUE))dt;

Anonymize(dt)

Atkinson Atkinson Index of Inequality

Description

Calculates the Atkinson index A. This inequality measure is especially good at determining whichend of the distribution is contributing most to the observed inequality.

Usage

Atkinson(x, n = rep(1, length(x)), epsilon = NULL, na.rm = FALSE, ...)

## Default S3 method:Atkinson(x, n = rep(1, length(x)), epsilon = NULL,

na.rm = FALSE, ...)

Arguments

x a vector of data values of non-negative elements.

n a vector of frequencies of the same length as x.

epsilon a parameter of the inequality measure (if NULL, the default parameter (0.5) of therespective measure is used).

na.rm logical. Should missing values be removed? Defaults is set to FALSE.

... additional arguements (currently ignored)

Details

epsilon = 0,5: little inequality aversion epsilon = 1,0: medium inequality aversion epsilon = 2,0:great inequality aversion

References

Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.

Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.

auto 7

See Also

Herfindahl, Rosenbluth, Gini. For more details see the “Indices” vignette.

Examples

if (interactive()) {# generate a vector (of incomes)# y <- c(80, 60, 10, 20, 30)# Entropy 1.392321# Maximum Entropy 1.609438# Normalized Entropy 0.865098# Exponential Index 0.248498# Herfindahl 0.285000# Normalized Herfindahl 0.106250# Gini Coefficient 0.360000# Concentration Coefficient 0.450000

x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

# compute Atkinson coefficient with epsilon=0.5Atkinson(x, epsilon=0.5)

w <- c(10, 15, 20, 25, 40, 20, 30, 35, 45, 90)}

auto Statistics of 1979 automobile models

Description

The data give the following statistics for 74 automobiles in the 1979 model year as sold in the US.

Usage

data(auto)

Format

A data.frame object with 14 variables and 74 observations.

Model Make and model of car.

Origin a factor with levels A,E,J

Price Price in dollars.

MPG Miles per gallon.

Rep78 Repair record for 1978 on 1 (worst) to 5 (best) scale.

Rep77 Repair record for 1978 on 1 to 5 scale.

8 Bartels

Hroom Headroom in inches.

Rseat Rear seat clearance in inches.

Trunk Trunk volume in cubic feet.

Weight Weight in pounds.

Length Length in inches.

Turn Turning diameter in feet.

Displa Engine displacement in cubic inches.

Gratio Gear ratio for high gear.

Source

This data frame was created from http://euclid.psych.yorku.ca/ftp/sas/sssg/data/auto.sas.

References

Originally published in Chambers, Cleveland, Kleiner, and Tukey, Graphical Methods for DataAnalysis, 1983, pages 352-355.

Bartels Bartels Rank Test of Randomness

Description

Performs Bartels rank test of randomness. The default method for testing the null hypothesis of ran-domness is two.sided. By using the alternative left.sided, the null hypothesis is tested againsta trend. By using the alternative right.sided, the null hypothesis of randomness is tested againsta systematic oscillation in the observed data.

Usage

Bartels(x, alternative = "two.sided", pvalue = "normal")

## Default S3 method:Bartels(x, alternative = "two.sided", pvalue = "normal")

Arguments

x a numeric vector of data values.

alternative a character string for hypothesis testing method; must be one of two.sided(default), left.sided or right.sided.

pvalue A method for asymptotic aproximation used to compute the p-value.

Bartels 9

Details

Missing values are by default removed.

The RVN test statistic is

RV N =

∑n−1i=1 (Ri −Ri+1)

2∑ni=1 (Ri − (n+ 1)/2)

2

where Ri = rank(Xi), i = 1, . . . , n. It is known that (RV N − 2)/σ is asymptotically standardnormal, where σ2 = 4(n−2)(5n2−2n−9)

5n(n+1)(n−1)2 .

Value

statistic The value of the RVN statistic test and the theoretical mean value and varianceof the RVN statistic test.

n the sample size, after the remotion of consecutive duplicate values.

p.value the asymptotic p-value.

method a character string indicating the test performed.

data.name a character string giving the name of the data.

alternative a character string describing the alternative.

References

Bartels, R. (1982). The Rank Version of von Neumann’s Ratio Test for Randomness, Journal of theAmerican Statistical Association, 77(377), 40-46.

Gibbons, J.D. and Chakraborti, S. (2003). Nonparametric Statistical Inference, 4th ed. (pp. 97-98).URL: http://books.google.pt/books?id=dPhtioXwI9cC&lpg=PA97&ots=ZGaQCmuEUq

Examples

# Example 5.1 in Gibbons and Chakraborti (2003), p.98.# Annual data on total number of tourists to the United States for 1970-1982.years <- 1970:1982tourists <- c(12362, 12739, 13057, 13955, 14123, 15698, 17523,18610, 19842, 20310, 22500, 23080, 21916)

# See it graphicallyqplot(factor(years), tourists)+ geom_point()

# Test the null against a trendBartels(tourists, alternative="left.sided", pvalue="beta")

10 bhodrick93

baseball Bradley Efron and Carl Morris ("Stein’s Paradox in Statistics)

Description

A dataset used in Efron and Morris’s 1977 paper, "Stein’s Paradox in Statistics".

• nameFirst and last name of selected player.• atBatsnumber of times at bats.• hitsnumber of hits after 45 at bats.• avg45is the average after 45 at bats.• remainingAtBatsremaining number at bats.• remainingAvgis the remaining average to the end of the season.• seasonAtBatsis the number of times at bats in the end of the season.• seasonHitsis the number of hits in the end of the season.• avgSeasonis the end of the season average.

Usage

data(baseball)

Format

A data.frame object with 9 variables and 18 observations.

bhodrick93 Bekaert’s and Hodrick’s (1993) Data

Description

Data set used by Bekaert and Hodrick (1993) on biases in the measurement of foreign exchangerisk premiums. This dataset contains the following columns:

• date A character vector for date.• jyspot Price of USD in JY, spot.• jyfwd Price of USD in JY, 30-day forward.• jys30 Price of USD in JY, spot market at 30-day forward deliver/date.• dmspot Price of USD in DM, spot.• dmfwd Price of USD in DM, 30-day forward• dms30 Price of USD in DM, spot market at 30-day forward deliver/date.• bpspot Price of USD in BP, spot.• bpfwd Price of USD in BP, 30-day forward.• bps30 Price of USD in BP, spot market at 30-day forward deliver/date.• quote A numeric vector.

big5 11

Usage

data(bhodrick93)

Format

A data.frame object with 11 variables and 778 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Bekaert, G., and Hodrick, R. J. (1993) On biases in the measurement of foreign exchange riskpremiums. Journal of International Money and Finance, 12(2), 115-138.

big5 Big 5 dataset

Description

This is a dataset of the Big 5 personality traits. It is a measurement of the Dutch translation of theNEO-PI-R on 500 first year psychology students (Dolan, Oort, Stoel, Wicherts, 2009). The testconsists of fifty items that you must rate on how true they are about you on a five point scale where1=Disagree, 3=Neutral and 5=Agree. It takes most people 3-8 minutes to complete.

• NeuroticismThe level of neuroticism

• ExtraversionThe level of extraversion

• OpennessThe level of openness

• AgreeablenessThe level of agreeableness

• ConscientiousnessThe level of conscientiousness

Usage

data(big5)

Format

A data.frame object with 5 variables and 500 observations.

12 bollen

References

Hoekstra, H. A., Ormel, J., & De Fruyt, F. (2003). NEO-PI-R/NEO-FFI: Big 5 persoonlijkhei-dsvragenlijst. Handleiding Manual of the Dutch version of the NEO-PI-R/NEO-FFI. Lisse, TheNetherlands: Swets and Zeitlinger.

Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invari-ance in the target rotates multigroup exploratory factor model. Structural Equation Modeling, 16,295<96>314.

bollen Bollen Data on Industrialization and Political Democracy

Description

The data were reported in Bollen (1989, p. 428, Table 9.4) This set includes data from 75 developingcountries each assessed on four measures of democracy measured twice (1960 and 1965), and threemeasures of industrialization measured once (1960).

• y1Freedom of the press, 1960

• y2Freedom of political opposition, 1960

• y3Fairness of elections, 1960

• y4Effectiveness of elected legislature, 1960

• y5Freedom of the press, 1965

• y6Freedom of political opposition, 1965

• y7Fairness of elections, 1965

• y8Effectiveness of elected legislature, 1965

• x1GNP per capita, 1960

• x2Energy consumption per capita, 1960

• x3Percentage of labor force in industry, 1960

@references Bollen, K. A. (1979). Political democracy and the timing of development. AmericanSociological Review, 44, 572-587.

Bollen, K. A. (1980). Issues in the comparative measurement of political democracy. AmericanSociological Review, 45, 370-390.

Usage

data(bollen)

Format

A data.frame object with 11 variables and 75 observations.

Bootstrap 13

Bootstrap Method for Bootstrapping

Description

this method provides bootstrapping statistics.

Usage

Bootstrap(x, ...)

## Default S3 method:Bootstrap(x, nboots = 30, FUN, ...)

## S3 method for class 'model'Bootstrap(x, ...)

Arguments

x is a vector or a fitted model object whose parameters will be used to producebootstrapped statistics. Model objects are assumed to be of class “glm” or “lm”.

nboots an integer for the number of reiteration.

FUN a statistic function name to bootstrap, i.e., ’mean’, ’var’, ’cov’, etc.

... further arguments passed to or used by other methods.

Value

A list with “alpha” and “beta” slots. The “alpha” corresponds to ancillary parameters and “beta” tosystematic components of the model.

Author(s)

Daniel Marcelino, <[email protected]>

Examples

x = runif(10, 0, 1)Bootstrap(x, FUN=mean)

14 Calibrateplot

bush Approval Ratings for President George W. Bush

Description

Approval ratings for George W. Bush.

Usage

data(bush)

Format

A data.frame object with 5 variables and 270 observations.

• start.date. Start date of the survey.• end.date. End date of the survey.• approve. Percent which approve of the president.• disapprove. Percent which disapprove of the president.• undecided. Percent undecided about the president.

Details

A data set of approval ratings of George Bush over the time of his presidency, as reported by severalagencies. Most polls were of size approximately 1,000 so the margin of error is about 3 percentagepoints. @source http://www.pollingreport.com/BushJob.htm

Calibrateplot Calibrate Plot

Description

Plots a calibration plot for overlaping observed versus actual values.

Usage

Calibrateplot(.data = NULL, x, y, ci = 0.95,xlab = "Forecast Probability", ylab = "Observed Frequency")

Arguments

.data the data frame.x, y are the forecast and observed data vectors.ci is the size of the confidence intervalxlab a character name for horizontal axis.ylab a character name for vertical axis.

cathedrals 15

Author(s)

Daniel Marcelino

Examples

data <- data.frame(x=c(10,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100),y=c(0,05,10,20,25,30,40,50,60,70,75,80,95,100,100,100,100,100))

Calibrateplot(data, x, y)

cathedrals Cathedrals

Description

Heights and lengths of Gothic and Romanesque cathedrals. This dataset contains the followingcolumns:

• Type Romanesque or Gothic.

• Height Total height, feet.

• Length Total length, feet.

@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.

Usage

data(cathedrals)

Format

A data.frame object with 3 variables and 25 observations.

16 cgreene76

cgreene76 Christensen’s and Greene’s (1976) Data

Description

Data set used by Christensen and Greene (1976) on economies of scale in US electric power gener-ation. This dataset contains the following columns:

• firmid Observation id.

• costs Costs in 1970, MM USD.

• output Output, million KwH.

• plabor Price of labor.

• pkap Price of capital.

• pfuel Price of fuel.

• labshr Labor’s cost share.

• kapshr Capital’s cost share.

Usage

data(cgreene76)

Format

A data.frame object with 8 variables and 99 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Christensen, L. R., and Greene, W. H. (1976) Economies of scale in US electric power generation.The Journal of Political Economy, 655–676.

chismoke 17

chismoke Smoking and Lung Cancer in China

Description

The Chinese smoking and lung cancer data (Agresti, Table 5.12, p. 60).

• Citycity name.

• Smokerif smoker or not.

• Cancerif detected cancer or not.

• Countthe frequencies.

@references Agresti, A. An introduction to categorical Data Analysis, (2nd ed.). Wiley.

Usage

data(chismoke)

Format

A data.frame object with 4 variables and 32 observations.

CI An S4 Class to Confidence Intervals

Description

Calculates the confidence intervals for a vector of data values.

Usage

CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)

CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)

Arguments

x A vector of data values.

level The confidence level. Default is 0.95.

alpha The significance level. Default is 1-level. If alpha equals 0.05, then yourconfidence level is 0.95.

na.rm A logical value, default is FALSE

... Additional arguements (currently ignored)

18 Clear

Value

CI lower

Est. Mean Mean of data.

CI upper Upper bound of interval.

Std. Error Standard Error of the mean.

Slots

lower Lower bound of interval.

mean Estimated mean.

upper Upper bound of interval.

stderr Standard Error of the mean.

Author(s)

Daniel Marcelino, <[email protected]>.

Examples

x <- c(1, 2.3, 2, 3, 4, 8, 12, 43, -1,-4)

CI(x, level=.90)

Clear Clear Memory of All Objects

Description

This function is a wrapper for the command rm(list=ls()).

Usage

Clear(what = NULL, keep = TRUE)

Arguments

what The object (as a string) that needs to be removed (or kept)

keep Should obj be kept (i.e., everything but obj removed)? Or dropped?

Author(s)

Daniel Marcelino, <[email protected]>.

Condorcet 19

Examples

# create objectsa=1; b=2; c=3; d=4; e=5# remove dClear("d", keep=FALSE)ls()# remove all but a and bClear(c("a", "b"), keep=TRUE)ls()

Condorcet Condorcet Voting

Description

Condorcet Voting.

Usage

Condorcet()

CountMissing Count missing data.

Description

Function to count missing data.

Usage

CountMissing(.data, vars = NULL, prop = TRUE, exclude.complete = TRUE)

Arguments

.data the data frame.

vars vector with name of variables.

prop logical. If this is TRUE will show share of missing data by variable.exclude.complete

Logical. Exclude complete variables from the output.

Examples

CountMissing(marriage, prop = FALSE)

20 Crosstable

Crosstable Cross-tabulation

Description

Crosstable produces all possible two-way and three-way tabulations of variables.

Usage

Crosstable(..., digits = 2, chisq = FALSE, fisher = FALSE,mcnemar = FALSE, latex = FALSE, alt = "two.sided", deparse.level = 2)

## Default S3 method:Crosstable(..., digits = 2, chisq = FALSE,fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,deparse.level = 2)

Arguments

digits number of digits after the decimal point.

chisq if TRUE, the results of a chi-square test will be shown below the table.

fisher if TRUE, the results of a Fisher Exact test will be shown below the table.

mcnemar if TRUE, the results of a McNemar test will be shown below the table.

latex if the output is to be printed in latex format.

alt the alternative hypothesis and must be one of "two.sided", "greater" or "less".Only used in the 2 by 2 case.

deparse.level an integer controlling the construction of labels in the case of non-matrix-likearguments. If 0, middle 2 rownames, if 1, 3 rownames, if 2, 4 rownames (de-fault).

... the variables for the cross tabulation.

Value

A cross-tabulated object.

See Also

xtabs, Frequency, table, prop.table

Other Crosstables: summary.Crosstable

CrossValidation 21

Examples

with(titanic, Crosstable( SEX, AGE, SURVIVED))

#' # Agresti (2002), table 3.10, p. 106# 1992 General Social Survey--Race and Party affiliationgss <- data.frame(

expand.grid(Race=c("black", "white"),party=c("dem", "indep", "rep")),count=c(103,341,15,105,11,405))

df <- gss[rep(1:nrow(gss), gss[["count"]]), ]

with(df, Crosstable(Race, party))

# Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinkertea <- data.frame(

expand.grid(poured=c("Yes", "No"),guess=c("Yes", "No")),count=c(3,1,1,3))

# nicer way of recreating long tablesdata = Untable(tea, freq="count")

with(data, Crosstable(poured, guess, fisher=TRUE, alt="greater")) # fisher=TRUE

CrossValidation Cross Validation of SciencesPo Trees and Forests

Description

Cross Validation of SciencesPo Trees and Forests.

Usage

CrossValidation(object, ...)

## S3 method for class 'Tree'CrossValidation(object, FUN, ...)

## S3 method for class 'Forest'CrossValidation(object, FUN, ...)

Arguments

object a fitted Tree or Forest object in the formula.

... additional arguments.

FUN either tree or forest

22 ddirichlet

Value

An object of class CV with elements:

call the function call

Author(s)

Daniel Marcelino, <[email protected]>.

ddirichlet Dirichlet Distribution

Description

Density function and random number generation for the Dirichlet distribution

Usage

ddirichlet(x, alpha, log = FALSE, sum = FALSE)

Arguments

x a matrix containing observations.

alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parametersfor all observations) or a matrix (a different set of parameters for each observa-tion), see “Details”.

log if TRUE, logarithmic densities are returned.

sum if TRUE, the (log-)likelihood is returned.

Value

the ddirichlet returns a vector of densities (if sum = FALSE) or the (log-)likelihood (if sum = TRUE)for the given data and alphas.

Author(s)

Daniel Marcelino, <[email protected]>

Examples

mat <- cbind(1:10, 5, 10:1);mat;x <- rdirichlet(10, mat);

ddirichlet(x, mat);

Describe 23

Describe Summary Description Table

Description

Produce summary description table of a dataframe with the following features: variable names,labels, factor levels, frequencies or summary statistics.

Usage

Describe(.data, style = "multiline", justify = "left", max.number = 10,trim.strings = FALSE, max.width = 15, digits = 2, split.cells = 35,display.labels = FALSE, display.attributes = FALSE, file = NA,append = FALSE, escape.pipe = FALSE, ...)

Arguments

.data a data frame.

style the style to be used in pander table, one of “multiline”, “grid”, “simple”, and“rmarkdown”.

justify pander argument; defaults is justified to “left”.

max.number an integer for the maximum number of items to be displayed in the frequencycell. If variable has more distinct values, no frequency will be shown (only amessage stating the number of distinct values).

trim.strings remove white spaces at the beginning or end of the string. This may impact thefrequencies, so interpret the frequencies cell accordingly. Defaults to FALSE.

max.width Limits the number of characters to display in the frequency tables. Defaults to15.

digits the number of digits for rounding.

split.cells pander argument. Number of characters allowed on a line before splitting thecell. Defaults to 35.

display.labels If TRUE, variable labels will be displayed. Defaults to FALSE.display.attributes

If TRUE, variable attibutes will be displayed. Defaults to FALSE.

file the text file to be written to disk. Defaults to NA.

append When “file” argument is supplied, this indicates whether to append output toexisting file (TRUE) or to overwrite any existing file (FALSE, default). If TRUEand no file exists, a new file will be created.

escape.pipe Only useful when style='grid' and file argument is not NA, in which case itwill escape the pipe character (|) to allow Pandoc to correctly convert multilinecells.

... additional arguments passed to pander.

24 dHondt

Details

The IQR formula used is the R standard IQR(x) = quantile(x, 3/4) - quantile(x, 1/4).The (CV) stands for the coefficient of variation also known as relative standard deviation (RSD),defined as the ratio of the standard deviation to the mean.

Author(s)

Daniel Marcelino, <[email protected]>

Examples

data(religiosity)Describe(religiosity)

dHondt The D’Hondt Method of Allocating Seats Proportionally

Description

The function calculates seats allotment in legislative house, given the total number of seats andthe votes for each party based on the Victor D’Hondt’s method (1878). The D’Hondt’s method ismathematically equivalent to the method proposed by Thomas Jefferson few years before (1792).

Usage

dHondt(parties = NULL, votes = NULL, seats = NULL, ...)

dHondt(parties = NULL, votes = NULL, seats = NULL, ...)

Arguments

parties a vector containig parties labels or candidates accordingly to the votes vectororder.

votes a vector containing the total number of formal votes received by the parties/candidates.

seats an integer for the number of seats to be filled (the district magnitude).

... Additional arguements (currently ignored)

Value

A data.frame of length parties containing apportioned integers (seats) summing to seats.

Note

Adapted from Carlos Bellosta’s replies in the R-list.

Dummify 25

Author(s)

Daniel Marcelino, <[email protected]>.

References

Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.

See Also

HighestAverages, LargestRemainders, Hamilton, PoliticalDiversity.

Examples

votes <- sample(1:10000, 5)parties <- sample(LETTERS, 5)

dHondt(parties, votes, seats = 4)

# Example: 2014 Brazilian election for the lower house in# the state of Ceara. Coalitions were leading by the# following parties:

results <- c(DEM=490205, PMDB=1151547, PRB=2449440,PSB=48274, PSTU=54403, PTC=173151)

dHondt(parties=names(results), votes=results, seats=19)

Dummify Generate Dummy Variables in a Data Frame

Description

Quickly create dummies in a data frame.

Usage

Dummify(x, .data = NULL, drop = TRUE)

Arguments

x the column position to generate dummies

.data the a data.frame.

drop A logical value. If TRUE, unused levels will be omitted.

26 election2008

Details

A matrix object

Author(s)

Daniel Marcelino, <[email protected]>

Examples

df <- data.frame(y = rnorm(10), x = runif(10,0,1), sex = sample(1:2, 10, rep=TRUE))

Dummify(df$sex) ;

cbind(df, Dummify(df$sex));

election2008 2008 U.S. presidential election

Description

State-by-state information from the 2008 U.S. presidential election.

Usage

data(election2008)

Format

A data.frame object with 7 variables and 51 observations.

• State Name of the state.

• Abr Abbreviation for the state.

• Income Per capita income in the state as of 2007 (in dollars).

• HS Percentage of adults with at least a high school education.

• BA Percentage of adults with at least a college education.

• Dem.Rep Difference in %Democrat-%Republican (according to 2008 Gallup survey).

• ObamaWin 1= Obama (Democrat) wins state in 2008 or 0=McCain (Republican wins).

@source State income data from: Census Bureau Table 659. Personal Income Per Capita (in 2007).High school data from: U.S. Census Bureau, 1990 Census of Population, http://www.nces.ed.gov/programs/digest/d08/tables/dt08_011.asp. College data from: Census Bureau Table225. Educational Attainment by State (in 2007) Democrat and Republican data from: http://www.gallup.com/poll/114016/state-states-political-party-affiliation.aspx

Flag 27

Flag Add an "id" Variable to a Dataset

Description

Many functions will not work properly if there are duplicated ID variables in a dataset. This functionis a convenience function for .N from the "data.table" package to create an .id. variable that whenused in conjunction with the existing ID variables, should be unique.

Usage

Flag(.data, id.vars = NULL)

Arguments

.data The input data.frame or data.table.

id.vars The variables that should be treated as ID variables. Defaults to NULL, at whichpoint all variables are used to create the new ID variable.

Value

The input dataset (as a data.table) if ID variables are unique, or the input dataset with a newcolumn named ".ID".

Examples

df <- data.frame(A = c("a", "a", "a", "b", "b"),B = c(1, 1, 1, 1, 1), values = 1:5);

df

df = Flag(df, c("A", "B"))

df <- data.frame(A = c("a", "a", "a", "b", "b"),B = c(1, 2, 1, 1, 2), values = 1:5)

df(df <- Flag(df, 1:2) )

Footnote Add Footnote to ggplot2 Objects

Description

Add footnote to ggplot2 objects.

28 Forge

Usage

Footnote(note, x = 0.85, y = 0.014, hjust = 0.5, vjust = 0.5,newlines = TRUE, fontfamily = "serif", fontface = "plain",color = "gray85", size = 9, angle = 0, lineheight = 0.9, alpha = 1)

Arguments

note string or plotmath expression to be drawn.x the x location of the label.y the y location of the label.hjust horizontal justificationvjust vertical justificationnewlines should a new line be appended to the start and end of the string, for spacing?fontfamily the font familyfontface the font face ("plain", "bold", etc.)color text colorsize point size of textangle angle at which text is drawnlineheight line height of textalpha the alpha value of the text

Forge Veritical, left-aligned layout for plots

Description

Left-align the waffle plots by x-axis. Use the pad parameter in waffle to pad each plot to the maxwidth (num of squares), otherwise the plots will be scaled.

Usage

Forge(...)

Arguments

... one or more waffle plots

Examples

parts <- c(80, 30, 20, 10)

w1 <- Waffleplot(parts, rows=8)w2 <- Waffleplot(parts, rows=8)w3 <- Waffleplot(parts, rows=8)chart <- Forge(w1, w2, w3)print(chart)

Formatted 29

Formatted Some Formats for Nicer Display

Description

Some predefined formats for nicer display.

Usage

Formatted(x, style = c("USD", "BRL", "EUR", "Perc"), digits = 2,nsmall = 2, decimal.mark = getOption("OutDec"), flag = "")

Arguments

x a numeric vector.

style a character name for style. One of "USD", "BRL", "EUR", "Perc".

digits an integer for the number of significant digits to be used for numeric and com-plex x

nsmall an integer for the minimum number of digits to the right of the decimal point.

decimal.mark decimal mark style to be used with Percents (%), usually (",") or (".").

flag a character string giving a format modifier as "-", "+", "#".

Examples

x <- as.double(c(0.1, 1, 10, 100, 1000, 10000))Formatted(x)

Formatted(x, "BRL")

Formatted(x, "EUR")

p = c(0.25, 25, 50)

Formatted(p, "Perc", flag="+")

Formatted(p, "Perc", decimal.mark=",")

30 freq

freq Simple Frequency Table

Description

Creates a frequency table or data frame.

Usage

freq(x, weighs = NULL, breaks = graphics::hist(x, plot = FALSE)$breaks,digits = 3, include.lowest = TRUE, order = c("desc", "asc", "level","name"), perc = FALSE, useNA = c("no", "ifany", "always"), ...)

## Default S3 method:freq(x, weighs = NULL, breaks = graphics::hist(x, plot =FALSE)$breaks, digits = 3, include.lowest = TRUE, order = c("desc","asc", "level", "name"), perc = FALSE, useNA = c("no", "ifany", "always"),...)

Arguments

x A vector of values for which the frequency is desired.

weighs A vector of weights.

breaks one of: 1) a vector giving the breakpoints between histogram cells; 2) a functionto compute the vector of breakpoints; 3) a single number giving the number ofcells for the histogram; 4) a character string naming an algorithm to compute thenumber of cells (see ’Details’); 5) a function to compute the number of cells.

digits The number of significant digits required.

include.lowest Logical; if TRUE, an x[i] equal to the breaks value will be included in the first (orlast) category or bin.

order The order method.

perc logical; if TRUE percents are returned.

useNA Logical; if TRUE NA’s values are included.

... Additional arguements (currently ignored)

Author(s)

Daniel Marcelino, <[email protected]>.

See Also

Frequency, Crosstable.

Frequency 31

Frequency Frequency Table

Description

Simulating the FREQ procedure of SPSS.

Usage

Frequency(.data, x = NULL, verbose = TRUE, ...)

## Default S3 method:Frequency(.data, x = NULL, verbose = TRUE, ...)

Freq(.data, x = NULL, verbose = TRUE, ...)

Arguments

.data The data.frame.

x A column for which a frequency of values is desired.

verbose A logical value, if TRUE, extra statistics are also provided.

... Additional arguements (currently ignored)

See Also

freq, Crosstable.

Examples

data(cathedrals)

Frequency(cathedrals, Type)

# may work with operators like %>%cathedrals %>% Frequency(Height)

32 GeomSpotlight

galton Galton’s Family Data on Human Stature.

Description

It is a reproduction of the data set used by Galton in his 1885’s paper on correlation between parent’sheight and their children, although the concept of correlation would be only introduced few yearslater, in 1888. This data.frame contains the following columns:

• parent The parents’ average height

• child The child’s height

Usage

data(galton)

Format

A data.frame object with 2 variables and 928 observations.

Details

Regression analysis is the statistical method most often used in political science research. Thereason is that most scholars are interested in identifying “causal” effects from non-experimentaldata and that the linear regression is the method for doing this. The term “regression” (1889) wasfirst crafted by Sir Francis Galton upon investigating the relationship between body size of fathersand sons. Thereby he “invented” regression analysis by estimating: Ss = 85.7 + 0.56SF , meaningthat the size of the son regresses towards the mean.

References

Francis Galton (1886) Regression Towards Mediocrity in Hereditary Stature. The Journal of theAnthropological Institute of Great Britain and Ireland, Vol. 15, pp. 246–263.

GeomSpotlight Automatically enclose points in a polygon

Description

Automatically enclose points in a polygon

Usage

geom_spotlight(mapping = NULL, data = NULL, stat = "identity",position = "identity", na.rm = FALSE, show.legend = NA,inherit.aes = TRUE, ...)

geom_dumbbell 33

Arguments

mapping mapping

data data

stat stat

position position

na.rm na.rm

show.legend show.legend

inherit.aes inherit.aes

... dots

Value

adds a circle around the specified points

Examples

d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100)

gg <- ggplot(d,aes(x,y))gg <- gg + scale_x_continuous(expand=c(0.5,1))gg <- gg + scale_y_continuous(expand=c(0.5,1))gg + geom_spotlight(s_shape=1, expand=0) + geom_point()

gg <- ggplot(mpg, aes(displ, hwy))ss <- subset(mpg,hwy>29 & displ<3)gg + geom_spotlight(data=ss, colour="blue", s_shape=.8, expand=0) +geom_point() + geom_point(data=ss, colour="blue")

geom_dumbbell Dumbell charts

Description

The dumbbell geom is used to create dumbbell charts. Dumbbell dot plots–dot plots with two ormore series of data–are an alternative to the clustered bar chart or slope graph.

Usage

geom_dumbbell(mapping = NULL, data = NULL, ..., point.colour.l = NULL,point.size.l = NULL, point.colour.r = NULL, point.size.r = NULL,na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

34 geom_dumbbell

Arguments

mapping Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE(the default), it is combined with the default mapping at the top level of the plot.You must supply mapping if there is no plot mapping.

data The data to be displayed in this layer. There are three options:If NULL, the default, the data is inherited from the plot data as specified in thecall to ggplot.A data.frame, or other object, will override the plot data. All objects willbe fortified to produce a data frame. See fortify for which variables will becreated.A function will be called with a single argument, the plot data. The returnvalue must be a data.frame., and will be used as the layer data.

... other arguments passed on to layer. These are often aesthetics, used to set anaesthetic to a fixed value, like color = "red" or size = 3. They may also beparameters to the paired geom/stat.

point.colour.l the colour of the left point

point.size.l the size of the left point

point.colour.r the colour of the right point

point.size.r the size of the right point

na.rm If FALSE (the default), removes missing values with a warning. If TRUE silentlyremoves missing values.

show.legend logical. Should this layer be included in the legends? NA, the default, includes ifany aesthetics are mapped. FALSE never includes, and TRUE always includes.

inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them.This is most useful for helper functions that define both data and aesthetics andshouldn’t inherit behaviour from the default plot specification, e.g. borders.

Aesthetics

geom_segment understands the following aesthetics (required aesthetics are in bold):

• x

• xend

• y

• yend

• alpha

• colour

• linetype

• size

geom_lollipop 35

Examples

df <- data.frame(trt=LETTERS[1:5],l=c(20, 40, 10, 30, 50),r=c(70, 50, 30, 60, 80))

ggplot(df, aes(y=trt, x=l, xend=r)) + geom_dumbbell()

ggplot(df, aes(y=trt, x=l, xend=r)) +geom_dumbbell(color="#a3c4dc", size=0.75, point.colour.l="#0e668b")

geom_lollipop Lollipop charts

Description

The lollipop geom is used to create lollipop charts. Lollipop charts are the creation of Andy Cot-greave going back to 2011. They are a combination of a thin segment, starting at with a dot at thetop and are a suitable alternative to or replacement for bar charts.

Geom Proto

Usage

geom_lollipop(mapping = NULL, data = NULL, ..., horizontal = FALSE,point.colour = NULL, point.size = NULL, stem.colour = NULL,stem.size = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

Arguments

mapping Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE(the default), it is combined with the default mapping at the top level of the plot.You must supply mapping if there is no plot mapping.

data The data to be displayed in this layer. There are three options:If NULL, the default, the data is inherited from the plot data as specified in thecall to ggplot.A data.frame, or other object, will override the plot data. All objects willbe fortified to produce a data frame. See fortify for which variables will becreated.A function will be called with a single argument, the plot data. The returnvalue must be a data.frame., and will be used as the layer data.

... other arguments passed on to layer. These are often aesthetics, used to set anaesthetic to a fixed value, like color = "red" or size = 3. They may also beparameters to the paired geom/stat.

horizontal If is FALSE (the default), the function will draw the lollipops up from the X axis(i.e. it will set xend to x & yend to 0). If TRUE, it wiill set yend to y & xend to0). Make sure you map the x & y aesthetics accordingly. This parameter helpsavoid the need for coord_flip().

36 geom_lollipop

point.colour the colour of the point

point.size the size of the point

stem.colour the colour of the stem

stem.size the size of the stem

na.rm If FALSE (the default), removes missing values with a warning. If TRUE silentlyremoves missing values.

show.legend logical. Should this layer be included in the legends? NA, the default, includes ifany aesthetics are mapped. FALSE never includes, and TRUE always includes.

inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them.This is most useful for helper functions that define both data and aesthetics andshouldn’t inherit behaviour from the default plot specification, e.g. borders.

Details

Use the horizontal parameter to abate the need for coord_flip() (see the Arguments section fordetails).

Aesthetics

geom_point understands the following aesthetics (required aesthetics are in bold):

• x

• y

• alpha

• colour

• fill

• shape

• size

• stroke

Examples

df <- data.frame(trt=LETTERS[1:10],value=seq(100, 10, by=-10))

ggplot(df, aes(trt, value)) + geom_lollipop()

ggplot(df, aes(value, trt)) + geom_lollipop(horizontal=TRUE)

Gini 37

Gini Gini Coefficient G

Description

Computes the unweighted and weighted Gini index of a distribution.

Usage

Gini(x, weights, ...)

## Default S3 method:Gini(x, weights = rep(1, length = length(x)), ...)

Arguments

x a data.frame, a matrix-like, or a vector.

weights a vector containing weights for x.

... additional arguements (currently ignored)

Details

One might say the Gini is oversensitive to changes in the middle, while undersensitive at the ex-tremes. The G coefficient doesn’t capture very explicitly changes in the top 10 inequality researchin the past 10 years – or the bottom 40 most poverty lies. The alternative Palma ratio does.

Author(s)

Daniel Marcelino, <[email protected]>.

See Also

GiniSimpson, Lorenz, Herfindahl, Rosenbluth, Atkinson..

Examples

# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)# compute Gini coefficientGini(x)

# For Gini index: Gini(x)*100

Gini(c(100,0,0,0), c(1,33,33,33))

# Considers thisA <- c(20000, 30000, 40000, 50000, 60000)B <- c(9000, 40000, 48000, 48000, 55000)

38 GiniSimpson

Gini(A); Gini(B);

GiniSimpson Gini-Simpson Index

Description

Computes the Gini/Simpson coefficient. NAs from the data are omitted.

Usage

GiniSimpson(x, na.rm = TRUE, ...)

## Default S3 method:GiniSimpson(x, na.rm = TRUE, ...)

Arguments

x a data.frame, a matrix-like, or a vector.na.rm a logical value to deal with NAs.... additional arguements (currently ignored).

Details

The Gini-Simpson quadratic index is a classic measure of diversity, widely used by social scientistsand ecologists. The Gini-Simpson is also known as Gibbs-Martin index in sociology, psychologyand management studies, which in turn is also known as the Blau index. The Gini-Simpson indexis computed as 1− λ = 1−

∑Ri=1 p

2i = 1− 1/2D.

Author(s)

Daniel Marcelino, <[email protected]>.

See Also

Gini.

Examples

# generate a vector (of incomes)x <- as.table(c(69,50,40,22))# let's say AB coalescedrownames(x) <- c("AB","C","D","E")print(x)

GiniSimpson(x)

griliches76 39

griliches76 Griliches’s (1976) Data

Description

Data set used by Griliches (1976) on wages of very young men. This dataset contains the followingcolumns:

• rns residency in South.• rns80 a numeric vector.• mrt marital status = 1 if married.• mrt80 a numeric vector.• smsa reside metro area = 1 if urban.• smsa80 a numeric vector.• med mother’s education, years.• iq iq score.• kww score on knowledge in world of work test.• year Year.• age a numeric vector.• age80 a numeric vector.• s completed years of schooling.• s80 a numeric vector.• expr experience, years.• expr80 a numeric vector.• tenure tenure, years.• tenure80 a numeric vector.• lw log wage.• lw80 a numeric vector.

Usage

data(griliches76)

Format

A data.frame object with 20 variables and 758 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Griliches, Z. (1976) Wages of very young men. The Journal of Political Economy, 84(4), 69–85.

40 Hamilton

Hamilton The Hamilton Method of Allocating Seats Proportionally

Description

Computes the Alexander Hamilton’s apportionment method (1792), also known as Hare-Niemeyermethod or as Vinton’s method. The Hamilton method is a largest-remainder method which uses theHare Quota.

Usage

Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)

Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)

Arguments

parties A vector containig parties labels or candidates in the same order of votes.

votes A vector with the formal votes received by the parties/candidates.

seats An integer for the number of seats to be returned.

... Additional arguements (currently ignored)

Details

The Hamilton/Vinton Method sets the divisor as the proportion of the total population per houseseat. After each state’s population is divided by the divisor, the whole number of the quotientis kept and the fraction dropped resulting in surplus house seats. Then, the first surplus seat isassigned to the state with the largest fraction after the original division. The next is assigned to thestate with the second-largest fraction and so on.

Value

A data.frame of length parties containing apportioned integers (seats) summing to seats.

Author(s)

Daniel Marcelino, <[email protected]>.

References

Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.

See Also

dHondt, HighestAverages, LargestRemainders, PoliticalDiversity.

Herfindahl 41

Examples

votes <- sample(1:10000, 5)parties <- sample(LETTERS, 5)Hamilton(parties, votes, seats = 4)

Herfindahl Herfindahl Index of Concentration

Description

Calculates the Herfindahl Index of concentration.

Usage

Herfindahl(x, n = rep(1, length(x)), base = 1, na.rm = FALSE, ...)

## Default S3 method:Herfindahl(x, n = rep(1, length(x)), base = NULL,

na.rm = FALSE, ...)

Arguments

x a vector of data values of non-negative elements.

n a vector of frequencies of the same length as x.

base a parameter of the concentration measure (if NULL, the default parameter (1.0) isused instead).

na.rm a logical. Should missing values be removed? The Default is set to na.rm=FALSE.

... additional arguements (currently ignored)

Details

This index is also known as the Simpson Index in ecology, the Herfindahl-Hirschman Index (HHI)in economics, and as the Effective Number of Parties (ENP) in political science.

References

Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.

Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.

See Also

Atkinson, Rosenbluth, PoliticalDiversity, Gini. For more details see the Indices vignette:vignette("Indices", package = "SciencesPo")

42 HighestAverages

Examples

# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

# compute the Herfindahl coefficient with base=1Herfindahl(x, base=1)

HighestAverages Highest Averages Methods of Allocating Seats Proportionally

Description

Computes the highest averages method for a variety of formulas of allocating seats proportionally.

Usage

HighestAverages(parties = NULL, votes = NULL, seats = NULL,method = c("dh", "sl", "msl", "danish", "hsl", "hh", "imperiali", "wb","jef", "ad", "hb"), threshold = 0, ...)

## Default S3 method:HighestAverages(parties = NULL, votes = NULL,seats = NULL, method = c("dh", "sl", "msl", "danish", "hsl", "hh","imperiali", "wb", "jef", "ad", "hb"), threshold = 0, ...)

Arguments

parties A character vector for parties labels or candidates in the same order as votes. IfNULL, alphabet will be assigned.

votes A numeric vector for the number of formal votes received by each party or can-didate.

seats The number of seats to be filled (scalar or vector).

method A character name for the method to be used. See details.

threshold A numeric value between (0~1). Default is set to 0.

... Additional arguements (currently ignored)

Details

The following methods are available:

• "dh"d’Hondt method

• "sl"Sainte-Lague method

• "msl"Modified Sainte-Lague method

HighestAverages 43

• "danish"Danish modified Sainte-Lague method

• "hsl"Hungarian modified Sainte-Lague method

• "imperiali"The Italian Imperiali (not to be confused with the Imperiali Quota, which is aLargest remainder method)

• "hh"Huntington-Hill method

• "wb"Webster’s method

• "jef"Jefferson’s method

• "ad"Adams’s method

• "hb"Hagenbach-Bischoff method

Value

A data.frame of length parties containing apportioned integers (seats) summing to seats.

Author(s)

Daniel Marcelino, <[email protected]>.

References

Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.

Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.

See Also

LargestRemainders, Proportionality, PoliticalDiversity. For more details see the Indicesvignette: vignette('Indices', package = 'SciencesPo').

Examples

# Results for the state legislative house of Ceara (2014):votes <- c(187906, 326841, 132531, 981096, 2043217, 15061, 103679,109830, 213988, 67145, 278267)

parties <- c("PCdoB", "PDT", "PEN", "PMDB", "PRB", "PSB", "PSC", "PSTU", "PTdoB", "PTC", "PTN")

HighestAverages(parties, votes, seats = 42, method = "dh")

# Let's create a data.frame with typical election results# with the following parties and votes to return 10 seats:

my_election_data <- data.frame(party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),votes=c(47000, 16000,15900,12000,6000,3100))

HighestAverages(my_election_data$party,my_election_data$votes,

44 Jackknife

seats = 10,method="dh")

# How this compares to the Sainte-Lague Method

(dat= HighestAverages(my_election_data$party,my_election_data$votes,seats = 10,method="sl"))

# Plot it# Barplot(data=dat, "Party", "Seats") +# theme_fte()

Jackknife Resamples Data Using the Jackknife Method

Description

This function is used for estimating standard errors when the distribution is not know.

Usage

Jackknife(x, p)

Arguments

x A vectorp An function name for estimation of parameter as a string

Value

est orignial estimation of parameter

jkest jackknife estimation of parameter

jkvar jackknife estimation of variance

jkbias jackknife estimate of biasness of parameter

jkbiascorr bias corrected parameter estimate

Author(s)

Daniel Marcelino, <[email protected]>

Examples

x = runif(10, 0, 1)mean(x)Jackknife(x,'mean')

JamesStein 45

JamesStein James-Stein Shrunken Estimates

Description

Computes James-Stein shrunken estimates of cell means given a response variable (which may bebinary) and a grouping indicator.

Usage

JamesStein(x, tau2 = 1)

JamesStein(x, tau2 = 1)

Arguments

x a vector, matrix, or array of numerics; the data.

tau2 a positive numeric. The variance, assumed known and defaults to 1. # @paramk the grouping factor.

... extra parameters typically ignored.

Details

Missing values are by default removed.

References

Efron, Bradley and Morris, Carl (1977) “Stein’s Paradox in Statistics.” Scientific American Vol.236 (5): 119-127.

James, W., & Stein, C. (1961, June). Estimation with quadratic loss. In Proceedings of the fourthBerkeley symposium on mathematical statistics and probability (Vol. 1, No. 1961, pp. 361-379).

Kurtosis Compute the Kurtosis

Description

Provides three methods for performing kurtosis test.

Usage

Kurtosis(x, na.rm = FALSE, type = 3)

46 Kurtosis

Arguments

x a numeric vector

na.rm a logical value for na.rm, default is na.rm=FALSE.

type an integer between 1 and 3 selecting one of the algorithms for computing kurto-sis detailed below

Details

Kurtosis measures the peakedness of a data distribution. A distribution with zero kurtosis has ashape as the normal curve (mesokurtic, or mesokurtotic). A positive kurtosis has a curve morepeaked about the mean and the its shape is narrower than the normal curve (leptokurtic, or leptokur-totic). A distribution with negative kurtosis has a curve less peaked about the mean and the its shapeis flatter than the normal curve (platykurtic, or platykurtotic).

Value

An object of the same type as x.

Note

To be consistent with classical use of kurtosis in political science analyses, the default type is thesame equation used in SPSS and SAS, which is the bias-corrected formula: Type 2: G_2 = ((n + 1)g_2+6) * (n-1)/(n-2)(n-3). When you set type to 1, the following equation applies: Type 1: g_2 =m_4/m_2^2-3. When you set type to 3, the following equation applies: Type 3: b_2 = m_4/s^4-3= (g_2+3)(1-1/n)^2-3. You must have at least 4 observations in your vector to apply this function.

Skewness and Kurtosis are functions to measure the third and fourth central moment of a datadistribution.

Author(s)

Daniel Marcelino, <[email protected]>

References

Balanda, K. P. and H. L. MacGillivray. (1988) Kurtosis: A Critical Review. The American Statisti-cian, 42(2), pp. 111–119.

Examples

w<-sample(4,10, TRUE)x <- sample(10, 1000, replace=TRUE, prob=w)

Kurtosis(x, type=1)Kurtosis(x, type=2)Kurtosis(x)

Label 47

Label Get Variable Label

Description

This function returns character value previously stored in variable’s label attribute. If none found,and fallback argument is set to TRUE, the function returns object’s name (retrieved by deparse(substitute(x))),otherwise NA is returned with a warning notice.

Usage

Label(x, fallback = TRUE, simplify = TRUE)

Arguments

x an R object to extract labels from.

fallback a logical indicating if labels should fallback to object name(s). Default is set toTRUE.

simplify a logical indicating if coerce results to a vector, otherwise, a list is returned.Default is set to TRUE.

Author(s)

Daniel Marcelino, <[email protected]>

Examples

x <- rnorm(100)Label(x) # returns "x"

Label(twins$IQb) <- "IQ biological scores"

Label(twins, FALSE) # returns NA where no labels are found

Label<- Set Variable Label

Description

This function sets a label to a variable, by storing a character string to its label attribute.

Usage

Label(x) <- value

48 LargestRemainders

Arguments

x a valid variable i.e. a non-NULL atomic vector that has no dimension attribute(see dim for details).

value a character value that is to be set as variable label

See Also

label

Examples

## Not run:Label(religiosity$Religiosity) <- "Religiosity index"

vec <- rnorm(100)Label(vec) <- "random numbers"

## End(Not run)

LargestRemainders Largest Remainders Methods of Allocating Seats Proportionally

Description

Computes the largest remainders method for a variety of formulas of allocating seats proportionally.

Usage

LargestRemainders(parties = NULL, votes = NULL, seats = NULL,method = c("hare", "droop", "hagb", "imperiali", "imperiali.adj"),threshold = 0, ...)

## Default S3 method:LargestRemainders(parties = NULL, votes = NULL,seats = NULL, method = c("hare", "droop", "hagb", "imperiali","imperiali.adj"), threshold = 0, ...)

Arguments

parties A character vector for parties labels or candidates in the order as votes. If NULL,a random combination of letters will be assigned.

votes A numeric vector for the number of formal votes received by each party or can-didate.

seats The number of seats to be filled (scalar or vector).method A character name for the method to be used. See details.threshold A numeric value between (0~1). Default is set to 0.... Additional arguements (currently ignored)

LargestRemainders 49

Details

The following methods are available:

• "droop"Droop quota method

• "hare"Hare method

• "hagb"Hagenbach-Bischoff

• "imperiali"Imperiali quota (do not confuse with the Italian Imperiali, which is a highest aver-ages method)

• "imperiali.adj"Reinforced or adjusted Imperiali quota

Value

A data.frame of length parties containing apportioned integers (seats) summing to seats.

Author(s)

Daniel Marcelino, <[email protected]>.

References

Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.

Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democra-cies, 1945-1990. Oxford University Press.

See Also

HighestAverages, Proportionality, PoliticalDiversity. For more details see the Indicesvignette: vignette('Indices', package = 'SciencesPo').

Examples

# Let's create a data.frame with typical election results# with the following parties and votes to return 10 seats:

my_election_data <- data.frame(party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),votes=c(47000, 16000,15900,12000,6000,3100))

LargestRemainders(my_election_data$party,my_election_data$votes, seats = 10, method="droop")

with(my_election_data, LargestRemainders(party,votes, seats = 10, method="hare"))

50 Lorenz

Lorenz The Lorenz Curve

Description

Computes the (empirical) ordinary and generalized Lorenz curve of a vector.

Usage

Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)

## Default S3 method:Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)

Arguments

x A vector of non-negative values.

n A vector of frequencies of the same length as x.

plot A logical. If TRUE the Lorenz curve will be plotted.

... Additional arguements (currently ignored)

Details

The Gini coefficient ranges from a minimum value of zero, when all individuals are equal, to atheoretical maximum of one in an infinite population in which every individual except one has asize of zero. It has been shown that the sample Gini coefficients originally defined need to bemultiplied by n/(n-1) in order to become unbiased estimators for the population coefficients.

Author(s)

Daniel Marcelino, <[email protected]>.

See Also

Gini, GiniSimpson.

Examples

# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)# compute Lorenz valuesLorenz(x)# generate some weights:wgt <- runif(n=length(x))# compute the lorenz with especific weightsLorenz(x, wgt)

ltaylor96 51

ltaylor96 Lothian’s and Taylor’s (1996) Data Set

Description

Data used by Lothian and Taylor (1996) on the real exchange rate behaviour. This dataset containsthe following columns:

• year Year

• spot dollar/sterling exchange rate.

• USwpi U.S. wholesale price index, 1914==100.

• UKwpi U.K. wholesale price index, 1914==100.

Usage

data(ltaylor96)

Format

A data.frame object with 4 variables and 200 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Lothian, J. R., and Taylor, M. P. (1996) Real exchange rate behavior: the recent float from theperspective of the past two centuries. Journal of Political Economy, 488–509.

MakeNumeric Clean up a character vector to make it numeric

Description

Remove commas and whitespace, primarily

Usage

MakeNumeric(x)

Arguments

x character vector to process

52 marriage

Value

numeric vector

MakeShare Clean up a character vector to make it a percent

Description

Remove "

Usage

MakeShare(x)

Arguments

x character vector to process

Value

numeric vector

marriage Same Sex Marriage Public Opinion Data

Description

Data set fielded by the PEW Research Center on same sex marriage support in US. It covers publicopinion on the issue starting from 1996 up to date. This dataset contains the following columns:

• Date The year of the measurement• Oppose Percent opposing same-sex marriage• Favor Percent favoring same-sex marriage• DK Percent of Don’t Know

Usage

data(marriage)

Format

A data.frame object with 4 variables and 18 observations.

References

PEW Research Center. Support for same-sex marriage. http://www.pewresearch.org

MeanFromRange 53

MeanFromRange Estimates Mean and Standard Deviation from Median and Range

Description

When conductig a meta-analysis study, it is not always possible to recover from reports, the meanand standard deviation values, but rather the median and range of values. This function provides amethod for computing the mean and variance estimates from median/range or IQR estimates.

Usage

MeanFromRange(low, med, high, n)

Arguments

low The min of the data.

med The median of the data.

high The max of the data

n The size of the sample.

Note

One should use these calculations ONLY if there are strong hints, that the overall distribution doesnot relevantly deviate from normal distribution. The question is then, why some papers reportmedian and IQR if data are not far from normal distribution.

Author(s)

Daniel Marcelino, <[email protected]>

References

Hozo, Stela P.; et al (2005) Estimating the mean and variance from the median, range, and the sizeof a sample. BMC Medical Research Methodology, 5:13.

Examples

# mean=median and SD = IQR/1.35.# median= 3# Iqr= 2-3# median=mean=3# sd=iqr/1.35 2-3 means 2,5/1.35= sd = 1,85

MeanFromRange(low=5, med=8, high=12, n=10)

54 mishkin92

mishkin92 Mishkin’s (1992) Data

Description

Data from the Frederic S. Mishkin (1992) paper “Is the Fisher Effect for real?”. This dataset con-tains the following columns:

• year Year

• mon a numeric vector

• inf1mo a numeric vector

• inf3mo a numeric vector

• tbill1mo a numeric vector

• tbill3mo a numeric vector

• cpiu a numeric vector

• quote a numeric vector

Usage

data(mishkin92)

Format

A data.frame object with 8 variables and 491 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Mishkin, F. S. (1992) Is the Fisher effect for real?: A reexamination of the relationship betweeninflation and interest rates. Journal of Monetary Economics, 30(2), 195–215.

nerlove63 55

nerlove63 Marc Nerlove’s (1963) data

Description

Data used by Marc Nerlove (1963) on returns of electricity supply. This dataset contains the fol-lowing columns:

• totcost costs in 1970, MM USD.

• output output, billion KwH.

• plabor price of labor.

• pfuel price of fuel.

• pkap price of capital.

#’ @source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University.http://fmwww.bc.edu/ec-p/data/Hayashi/

Usage

data(nerlove63)

Format

A data.frame object with 5 variables and 145 observations.

References

Nerlove, M. (1963) Returns to Scale in Electricity Supply. In Measurement in Economics-Studiesin Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld, edited by Carl F.Christ. Stanford: Stanford.

Normalize Unity-based normalization

Description

Normalizes as feature scaling min - max, or unity-based normalization typically used to bringthe values into the range [0,1]. Other methods are also available, including scoring, centering, andSmithson and Verkuilen (2006) method of dependent variable transformation.

Usage

Normalize(x, method = "range", ...)

56 Normalize

Arguments

x is a vector to be normalized.

method A string for the method used for normalization. Default is method = "range",which coerces values into [0,1] range. See details for other implemented meth-ods.

... Additional arguements (currently ignored).

Details

The following methods are available:

• "range"Ranging is done by coercing x values into [0,1] range. However, this may be general-ized to follow the range of other arbitrary values using a and b:

X ′ = a+(x− xmin)(b− a)

(xmax − xmin)

.

• "scale"Scaling is done by dividing (centered) values of x by their standard deviations.

• "center"Centering is done by subtracting the means (omitting NAs) of x from its observedvalue.

• "z-score"Scoring is done by dividing the values of x from their root mean square.

• "SV"Transform a dependent variable in [0,1] rather than (0, 1) to beta regression as suggestedby Smithson and Verkuilen (2006).

Value

Normalized values in an object of the same class as x.

Author(s)

Daniel Marcelino, <[email protected]>

References

Smithson M, Verkuilen J (2006) A Better Lemon Squeezer? Maximum-Likelihood Regression withBeta-Distributed Dependent Variables. Psychological Methods, 11(1), 54-71.

See Also

scale.

Examples

x <- sample(10)(y = Normalize(x) )

# equivalently to

Outlier 57

(x-min(x))/(max(x)-min(x))

# Smithson and Verkuilen approach(y = Normalize(x, method="SV") )

# look at what happens to the correlation of two independent variables and their "interaction"# With non-centered interactions:a = rnorm(10000,20,2)b = rnorm(10000,10,2)cor(a,b)cor(a,a*b)

# With centered interactions:c = a - 20d = b - 10cor(c,c*d)

Outlier Detect Outliers

Description

Perform exploratory test to detect outliers.

Usage

Outlier(x, index = NULL, ...)

## Default S3 method:Outlier(x, index = NULL, ...)

Arguments

x A numeric object, a vector.

index A numeric value to be considered in the computations.

... Parameters which are typically ignored.

Details

The quantity in min reveals the minimum deviation from the mean, the integer value in the closestindicates the position of that element. The quantity in max is the maximum deviation from themean, and the farthest integer value indicates the position of that value.

Value

Returns the minimum and maximum values, respectively preceded by their positions in the vector,matrix or data.frame.

58 paired

Author(s)

Daniel Marcelino, <[email protected]>

References

Dixon, W.J. (1950) Analysis of extreme values. Ann. Math. Stat. 21(4), 488–506.

See Also

Winsorize for reduce the impact of outliers.

Examples

Outlier(x <- rnorm(20))

#data frame:age <- sample(1:100, 1000, rep=TRUE);Outlier(age)

paired Paired Data

Description

Artificial data for a paired experiment. This dataset contains the following columns:

• patient the patient id.

• before_X before treatment.

• after_Y after treatment.

Usage

data(paired)

Format

A data.frame object with 3 variables and 9 observations.

Palettes 59

Palettes Palette data for the themes used by package

Description

Data used by the palettes in the package.

Usage

Palettes

Format

A list.

party_pal Political Parties Color Palette (Discrete) and Scales

Description

An N-color discrete palette for political parties.

Usage

party_pal(palette = "BRA", plot = FALSE, hex = FALSE)

Arguments

palette the palette name, a character string.

plot logical, if TRUE a plot is returned.

hex logical, if FALSE, the associated color name (label) is returned.

See Also

Other color party: scale_color_party

Examples

library(scales)

# Brazilshow_col(party_pal("BRA")(20))

# Argentineshow_col(party_pal("ARG")(12))

60 Permutate

# USshow_col(party_pal("USA")(6))

# Canadashow_col(party_pal("CAN")(10))

party_pal("CAN", plot=TRUE, hex=FALSE)

Permutate Create k random permutations of a vector

Description

Creates a k random permutation of a vector.

Usage

Permutate(x, k)

Arguments

x a vector to be permutated.

k number of permutations to be conducted.

Details

should be used only for length(input)! » k

Author(s)

Daniel Marcelino, <[email protected]>.

Examples

# 5! = 5 x 4 x 3 x 2 x 1 = 120# row wise permutationsPermutate(x=1:5, k=5)

Plotting 61

Plotting Plot Customizing

Description

Several wrapper functions to deal with ggplot2.

Usage

Previewplot()

PreviewTheme()

align_title_left()

align_title_right()

no_title()

no_y_gridlines()

no_x_gridlines()

no_major_x_gridlines()

no_major_y_gridlines()

no_major_gridlines()

no_minor_x_gridlines()

no_minor_y_gridlines()

no_minor_gridlines()

no_gridlines()

rotate_axes_text(angle)

rotate_x_text(angle)

rotate_y_text(angle)

no_axes()

no_x_axis()

62 Plotting

no_y_axis()

no_x_line()

no_y_line()

no_x_text()

no_y_text()

no_ticks()

no_x_ticks()

no_y_ticks()

no_axes_titles()

no_x_title()

no_y_title()

move_legend(position)

legend_bottom()

legend_top()

legend_left()

legend_right()

no_legend()

no_legend_title()

Arguments

angle the angle of rotation.

position the location of the legend (’top’, ’bottom’, ’right’, ’left’ or ’none’).

Author(s)

NA

Examples

Previewplot()

PoliticalDiversity 63

PoliticalDiversity Political Diversity Indices

Description

Computes political diversity indices or fragmentation/concetration measures like the effective num-ber of parties for an electoral unity or across unities. The very intuition of these coefficients is tocounting parties while weighting them by their relative political–or electoral strength.

Usage

PoliticalDiversity(x, index = "laakso/taagepera", margin = 1,base = exp(1))

## Default S3 method:PoliticalDiversity(x, index = "laakso/taagepera",margin = 1, base = exp(1))

Arguments

x A data.frame, a matrix, or a vector containing values for the number of votes orseats each party received.

index The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl","gini", "shannon", "simpson", "invsimpson".

margin The margin for which the index is computed.

base The logarithm base used in some indices, such as the "shannon" index.

Details

Very often, political analysts say things like ‘two-party system’ and ‘multi-party system’ to referto a particular kind of political party system. However, these terms alone does not tell exactlyhow fragmented–or concentrated a party system actually is. For instance, after the 2010 generalelection, 22 parties obtained representation in the Lower Chamber in Brazil. Nonetheless, amongthese 22 parties, nine parties together returned only 28 MPs. Thus, an index to assess the weigh orthe Effective Number of Parties is important and helps to go beyond the simple count of parties ina legislative branch. A widely accepted algorithm was proposed by M. Laakso and R. Taagepera:

N =1∑p2i

, where N denotes the effective number of parties and p_i denotes the ith party’s fraction of theseats.

In fact, this formula may be used to compute the vote share for each party. This formula is thereciprocal of a well-known concentration index (the Herfindahl-Hirschman index) used in eco-nomics to study the degree to which ownership of firms in an industry is concentrated. Laakso

64 PoliticalDiversity

and Taagepera correctly saw that the effective number of parties is simply an instance of the in-verse measurement problem to that one. This index makes rough but fairly reliable internationalcomparisons of party systems possible.

The Inverse Simpson index,

1/λ =1∑R

i=1 p2i

= 2D

Where λ equals the probability that two types taken at random from the dataset (with replacement)represent the same type. This simply equals true fragmentation of order 2, i.e. the effective numberof parties that is obtained when the weighted arithmetic mean is used to quantify average propor-tional diversity of political parties in the election of interest. Another measure is the Least squaresindex (lsq), which measures the disproportionality produced by the election. Specifically, by thedisparity between the distribution of votes and seats allocation.

Recently, Grigorii Golosov proposed a new method for computing the effective number of parties inwhich both larger and smaller parties are not attributed unrealistic scores as those resulted by usingthe Laakso/Taagepera index.I will call this as (Golosov) and is given by the following formula:

N =

n∑i=1

pipi + p2i − p2i

Author(s)

Daniel Marcelino, <[email protected]>.

References

Gallagher, Michael and Paul Mitchell (2005) The Politics of Electoral Systems. Oxford UniversityPress.

Golosov, Grigorii (2010) The Effective Number of Parties: A New Approach, Party Politics, 16:171-192.

Laakso, Markku and Rein Taagepera (1979) Effective Number of Parties: A Measure with Appli-cation to West Europe, Comparative Political Studies, 12: 3-27.

Nicolau, Jairo (2008) Sistemas Eleitorais. Rio de Janeiro, FGV.

Taagepera, Rein and Matthew S. Shugart (1989) Seats and Votes: The Effects and Determinants ofElectoral Systems. New Haven: Yale University Press.

Examples

# Here are some examples, help yourself:# The wikipedia examples

A <- c(.75,.25);B <- c(.75,.10,rep(0.01,15))C <- c(.55,.45);

# The index by "laakso/taagepera" is the defaultPoliticalDiversity(A)PoliticalDiversity(B)

pollster2008 65

# Using the method proposed by Golosov gives:PoliticalDiversity(B, index="golosov")PoliticalDiversity(C, index="golosov")

# The 1980 presidential election in the US (vote share):US1980 <- c("Democratic"=0.410, "Republican"=0.507,"Independent"=0.066, "Libertarian"=0.011, "Citizens"=0.003,"Others"=0.003)

PoliticalDiversity(US1980)

# 2010 Brazilian legislative election

votes_2010 = c("PT"=13813587, "PMDB"=11692384, "PSDB"=9421347,"DEM"=6932420, "PR"=7050274, "PP"=5987670, "PSB"=6553345,"PDT"=4478736, "PTB"=3808646, "PSC"=2981714, "PV"=2886633,"PC do B"=2545279, "PPS"=2376475, "PRB"=1659973, "PMN"=1026220,"PT do B"=605768, "PSOL"=968475, "PHS"=719611, "PRTB"=283047,"PRP"=232530, "PSL"=457490,"PTC"=563145)

seats_2010 = c("PT"=88, "PMDB"=79, "PSDB"=53, "DEM"=43,"PR"=41, "PP"=41, "PSB"=34, "PDT"=28, "PTB"=21, "PSC"=17,"PV"=15, "PC do B"=15, "PPS"=12, "PRB"=8, "PMN"=4, "PT do B"=3,"PSOL"=3, "PHS"=2, "PRTB"=2, "PRP"=2, "PSL"=1,"PTC"=1)

PoliticalDiversity(seats_2010)

PoliticalDiversity(seats_2010, index= "golosov")

pollster2008 Polls for 2008 U.S. presidential election

Description

Polls for the 2008 U.S. presidential election. The data includes all presidential polls reported on theinternet site http://www.elections.huffingtonpost.com/pollster that were taken betweenAugust 29th, when John Mc-Cain announced that Sarah Palin would be his running mate as theRepublican nominee for vice president, and the end of September.

Usage

data(pollster2008)

Format

A data.frame object with 11 variables and 102 observations.

• PollTaker Polling organization.

• PollDates Dates the poll data were collected.

66 Proportionality

• MidDate Midpoint of the polling period.

• Days Number of days after August 28th (end of Democratic convention).

• n Sample size for the poll.

• Pop A=all, LV=likely voters, RV=registered voters.

• McCain Percent supporting John McCain.

• Obama Percent supporting Barak Obama.

• Margin Obama percent minus McCain percent.

• Charlie Indicator for polls after Charlie Gibson interview with VP candidate Sarah Palin(9/11).

• Meltdown Indicator for polls after Lehman Brothers bankruptcy (9/15).

@source http://www.elections.huffingtonpost.com/pollster

Proportionality Indexes of (Disproportionality) Proportionality

Description

Calculates several indexes of (dis)-proportionality, which are for the most part used to show therelationship of votes to seats.

Usage

Proportionality(v, s, index = "Gallagher", ...)

## Default S3 method:Proportionality(v, s, index = "Gallagher", margin = 1,

...)

Arguments

v a numeric vector with the percentage share of votes obtained by each party.

s a numeric vector with the percentage share of seats obtained by each party.

index the desired method or type of index, see details below for the correct name.

margin The margin for which the index is computed.

... additional arguments (currently ignored)

Proportionality 67

Details

The following measures are available:

• Loosemore-Hanby (Percent) Loosemore-Hanby Index of disproportionality• Rae (Percent) Rae Index of disproportionality• Cox-Shugart Cox-Shugart Index of proportionality• Inv.Cox-Shugart The inverted Cox-Shugart index• Farina Farina index of proportionality, aka cosine proportionality score• "Gallagher" (Percent) Gallagher index of disproportionality• "Inv.Gallagher" The inverse of Gallagher index• "Grofman" Grofman index of proportionality• "Inv.Grofman" Grofman index of proportionality• "Lijphart" Lijphart index of proportionality• "Inv.Rae" The inverse of Rae index• "Rose" Rose index of disproportionality• "Inv.Rose" The inverse of Rose index• "Sainte-Lague" Sainte-Lague index of disproportionality• "DHondt" D’Hondt index of proportionality• "Gini" Gini index of disproportionality• "Monroe" Monroe index of inequity

Value

Each iteration returns a single score.

Author(s)

Daniel Marcelino <[email protected]>

References

Duncan, O. and Duncan, B. (1955) A methodological analysis of segregation indexes. AmericanSociological Review 20:210-7.

Gallagher, M. (1991) Proportionality, disproportionality and electoral systems. Electoral Studies10(1):33-51.

Loosemore, J. and Hanby, V. (1971) The theoretical limits of maximum distortion: Som analyticalexpressions for electoral systems. British Journal of Political Science 1:467-77.

Koppel, M., and A. Diskin. (2009) Measuring disproportionality, volatility and malapportionment:axiomatization and solutions. Social Choice and Welfare 33, no. 2: 281-286.

Rae, D. (1967) The Political Consequences of Electoral Laws. London: Yale University Press.

Rose, Richard, Neil Munro and Tom Mackie (1998) Elections in Central and Eastern Europe Since1990. Glasgow: Centre for the Study of Public Policy, University of Strathclyde.

Taagepera, R., and B. Grofman. Mapping the indices of seats-votes disproportionality and inter-election volatility. Party Politics 9, no. 6 (2003): 659-77.

68 pub_pal

See Also

PoliticalDiversity, LargestRemainders, HighestAverages. For more details, see the Indicesvignette: vignette("Indices", package = "SciencesPo").

Examples

#' # 2012 Queensland state elecionpvotes= c(49.65, 26.66, 11.5, 7.53, 3.16, 1.47)pseats = c(87.64, 7.87, 2.25, 0.00, 2.25, 0.00)

Proportionality(pvotes, pseats) # default is Gallagher

Proportionality(pvotes, pseats, index="Rae")

# Proportionality(pvotes, pseats, index="Cox-Shugart")

# 2012 Quebec provincial election:pvotes = c(PQ=31.95, Lib=31.20, CAQ=27.05, QS=6.03, Option=1.89, Other=1.88)pseats = c(PQ=54, Lib=50, CAQ=19, QS=2, Option=0, Other=0)

Proportionality(pvotes, pseats, index="Rae")

pub_pal Color Palettes for Publication (discrete)

Description

Color palettes for publication-quality graphs. See details.

Usage

pub_pal(palette = "pub12")

Arguments

palette the palette name, a character string.

Details

The following palettes are available:

• "scipo"A 12-color colorblind discrete palette.

• "gray9"5-tons of gray.

• "carnival"A 5-color palette inspired in the Brazilian samba schools.

• "tableau20"Based on software Tableau

• "tableau10"Based on software Tableau

pub_shapes 69

• "tableau10light"Based on software Tableau

• "tableau10medium"Based on software Tableau

• "fte"fivethirtyeight.com color scales

Examples

library(scales)library(ggplot2)

show_col(pub_pal("scipo")(12))show_col(pub_pal("gray9")(9), labels = FALSE)show_col(pub_pal("carnival")(4))show_col(pub_pal("gdocs")(18))show_col(pub_pal("manyeyes")(20))show_col(pub_pal("tableau10")(10))show_col(pub_pal("tableau10medium")(10))show_col(pub_pal("tableau10light")(10))show_col(pub_pal("cyclic")(20))show_col(pub_pal("bivariate1")(9))

pub_shapes Shape scales for theme_pub (discrete)

Description

Discrete shape scales for theme_pub().

Usage

pub_shapes(palette = "pub")

Arguments

palette the palette name, a character string.

See Also

Other shape pub: scale_shape_pub

Examples

library(scales)

pub_shapes("default")(6)pub_shapes("proportions")(4)pub_shapes("gender")(2)

70 rdirichlet

rdirichlet Dirichlet Distribution

Description

Density function and random number generation for the Dirichlet distribution

Usage

rdirichlet(n, alpha)

Arguments

n number of random observations to draw.

alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parametersfor all observations) or a matrix (a different set of parameters for each observa-tion), see “Details”.If alpha is a matrix, a complete set of α-parameters must be supplied for eachobservation. log returns the logarithm of the densities (therefore the log-likelihood)and sum.up returns the product or sum and thereby the likelihood or log-likelihood.

Value

The rdirichlet returns a matrix with n rows, each containing a single random number accordingto the supplied alpha vector or matrix.

Author(s)

Daniel Marcelino, <[email protected]>

Examples

# 1) General usage:rdirichlet(20, c(1,1,1) );alphas <- cbind(1:10, 5, 10:1);alphas;rdirichlet(10, alphas );alpha.0 <- sum( alphas );test <- rdirichlet(10, alphas );apply( test, 2, mean );alphas / alpha.0;apply( test, 2, var );alphas * ( alpha.0 - alphas ) / ( alpha.0^2 * ( alpha.0 + 1 ) );

# 2) A practical example of usage:# A Brazilian face-to-face poll by Datafolha conducted on Oct 03-04# with 18,116 interviews asking for their vote preferences among the# presidential candidates.

read.zTree 71

## First, draw a sample from the posteriorset.seed(1234);n <- 18116;poll <- c(40,24,22,5,5,4) / 100 * n; # The datamcmc <- 100000;sim <- rdirichlet(mcmc, alpha = poll + 1);

## Second, look at the margins of Aecio over Marina in the very last moment of the campaign:margin <- sim[,2] - sim[,3];mn <- mean(margin); # Bayes estimatemn;s <- sd(margin); # posterior standard deviation

qnts <- quantile(margin, probs = c(0.025, 0.975)); # 90% credible intervalqnts;pr <- mean(margin > 0); # posterior probability of a positive marginpr;

## Third, plot the posterior densityhist(margin, prob = TRUE, # posterior distribution

breaks = "FD", xlab = expression(p[2] - p[3]),main = expression(paste(bold("Posterior distribution of "), p[2] - p[3])));

abline(v=mn, col='red', lwd=3, lty=3);

read.zTree Reads zTree output files

Description

Extracts variables from a zTree output file.

Usage

read.zTree(object, tables = c("globals", "subjects"))

Arguments

object a zTree file or a list of files.

tables the tables of intrest.

Value

A list of dataframes, one for each table

72 Recode

Examples

## Not run: url <-zTables <- read.zTree( "131126_0009.xls" , "contracts" )zTables <- read.zTree( c("131126_0009.xls","131126_0010.xls"), c("globals","subjects", "contracts" ))

## End(Not run)

Recode Recode or Replace Values With New Values

Description

Recodes a value or a vector of values.

Usage

Recode(x, old, into, warn = TRUE)

## Default S3 method:Recode(x, old, into, warn = TRUE)

Arguments

x The vector whose values will be recoded.

old the old value or a vector of old values (numeric/factor/string).

into the new value or a vector of replacement values (1 to 1 recoding).

warn A logical to print a message if any of the old values are not actually present in x.

Examples

x <- LETTERS[1:5]Recode(x, c("B", "D"), c("Beta", "Delta"))

# On numeric vectorsx <- c(1, 4, 5, 9)Recode(x, old = c(1, 4, 5, 9), into = c(10, 40, 50, 90))

religiosity 73

religiosity Data on religiosity of countries

Description

Data from the 2007 Spring Survey conducted through the Pew Global Attitudes Project.

Usage

data(religiosity)

Format

A data.frame object with 9 variables and 44 observations.

• Country Name of country.• Religiosity A measure of degree of religiosity for residents of the country.• GDP Per capita Gross Domestic Product in the country.• Africa Indicator for countries in Africa.• EastEurope Indicator for countries in Eastern Europe.• MiddleEast Indicator for countries in the Middle East.• Asia Indicator for countries in Asia.• WestEurope Indicator for countries in Western Europe.• Americas Indicator for countries in North/South America.

@source The Pew Research Center’s Global Attitudes Project surveyed people around the worldand asked (among many other questions) whether they agreed that "belief in God is necessary formorality," whether religion is very important in their lives, and whether the at least once per day.The variable Religiosity is the sum of the percentage of positive responses on these three items,measured in each of 44 countries. The dataset also includes the per capita GDP for each country andindicator variables that record the part of the world the country is in. http://www.pewglobal.org

Render Render layer on top of a ggplot

Description

Set up a rendering layer on top of a ggplot.

Usage

Render(plot = NULL)

Arguments

plot the plot to use as a starting point, either a ggplot2 or gtable.

74 Rosenbluth

Rosenbluth Rosenbluth Index of Concentration

Description

Calculates the Rosenbluth index of concentration, also known as Hall or Tiedemann Indices.

Usage

Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)

## Default S3 method:Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)

Arguments

x A vector of data values of non-negative elements.

n A vector of frequencies of the same length as x.

na.rm A logical. Should missing values be removed? The Default is set to na.rm=FALSE.

... Additional arguements (currently ignored)

References

Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Hand-book of Income Distribution. Amsterdam.

Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.

See Also

Atkinson, Herfindahl, Gini, Lorenz. For more details see the Indices vignette: vignette("Indices", package = "SciencesPo")

Examples

# generate a vector (of incomes)x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

# compute Rosenbluth coefficientRosenbluth(x)

RunShinyApp 75

RunShinyApp Examples from the SciencesPo Package

Description

Launch a Shiny app that shows a demo.

Usage

RunShinyApp(app)

Arguments

app the name of the shiny application.

Examples

# A demo of what \code{\link[SciencesPo]{PoliticalDiversity}} does.if (interactive()) {RunShinyApp("PoliticalDiversity")}

# Other examplesif (interactive()) {RunShinyApp("ConjugateNormal")

}

if (interactive()) {RunShinyApp("Regression")

}

scale_color_party Political Parties Color Palette (Discrete) and Scales

Description

An N-color discrete palette for political parties.

Usage

scale_color_party(palette = "BRA", ...)

Arguments

palette the palette name, a character string.

... Other arguments passed on to discrete_scale to control name, limits, breaks,labels and so forth.

76 scale_fill_party

See Also

party_pal for details and references.

Other color party: party_pal

scale_color_pub Publication color scales.

Description

See pub_pal for details.

Usage

scale_color_pub(palette = "pub12", ...)

Arguments

palette the palette name, a character string.... Other arguments passed on to discrete_scale to control name, limits, breaks,

labels and so forth.

See Also

pub_pal for references.

Other color publication: scale_fill_pub

scale_fill_party Political Parties Color Palette (Discrete) and Scales

Description

An N-color discrete palette for political parties.

Usage

scale_fill_party(palette = "BRA", ...)

Arguments

palette the palette name, a character string.... Other arguments passed on to discrete_scale to control name, limits, breaks,

labels and so forth.

See Also

party_pal for details and references.

scale_fill_pub 77

scale_fill_pub Publication color scales.

Description

See pub_pal for details.

Usage

scale_fill_pub(palette = "pub12", ...)

Arguments

palette the palette name, a character string.

... Other arguments passed on to discrete_scale to control name, limits, breaks,labels and so forth.

See Also

Other color publication: scale_color_pub

scale_shape_pub Shape scales for theme_pub (discrete)

Description

See pub_shapes for details.

Usage

scale_shape_pub(palette = "pub", ...)

Arguments

palette the palette name, a character string.

... common discrete scale parameters: name, breaks, labels, na.value, limitsand guide. See discrete_scale for more details

See Also

Other shape pub: pub_shapes

78 Scatterplot

Examples

library("ggplot2")library("scales")

p <- ggplot(mtcars) +geom_point(aes(x = wt, y = mpg, shape = factor(gear))) +facet_wrap(~am)

p + scale_shape_pub()

Scatterplot Quartile-Frame Scatterplot

Description

Plots a scatterplot with a custom axis that acts as a quartile plot (quartile-frame). Inspired by TheVisual Display of Quantitative Information by Edward R. Tufte.

Usage

Scatterplot(x, y, ...)

Arguments

x, y aesthetics passed into each layer variables # @param data data frame to use(optional).

... additional arguments.

Examples

with(mtcars, Scatterplot(x = wt, y = mpg,main = "Vehicle Weight-Gas Mileage Relationship",xlab = "Vehicle Weight",ylab = "Miles per Gallon",font.family = "serif") )

SciencesPo 79

SciencesPo A Tool Set for Analyzing Political Behavior Data

Description

Provide functions for analyzing elections and political behavior data, including measures of politicalfragmentation, seat apportionment, and small data visualization graphs.

Author(s)

NA

References

Marcelino, Daniel (2013). SciencesPo: A Tool Set for Analyzing Political Behaviour Data. Avail-able at SSRN: http://dx.doi.org/10.2139/ssrn.2320547

SciencesPo-deprecated Deprecated functions in package ‘SciencesPo’

Description

These functions are provided for compatibility with older versions of ‘SciencesPo’ only, and willbe defunct at the next release.

Usage

svTransform(y)

politicalDiversity(x, index = "laakso/taagepera", margin = 1,base = exp(1))

untable(x, ...)

detail(x, ...)

dummy(x, ...)

voronoi(x, ...)

normalize(x, ...)

outliers(x, ...)

destring(x, ...)

winsorize(x, ...)

80 SciencesPoFont

Arguments

y the dependent variable.x A data.frame, a matrix-like, or a vector containing values for the number of

votes or seats each party received.index The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl",

"gini", "shannon", "simpson", "invsimpson".margin The margin for which the index is computed.base The logarithm base used in some indices, such as the "shannon" index.... Extra parameters.

Details

The following functions are deprecated and will be made defunct; use the replacement indicatedbelow:

• svTransform: Normalize• untable: Untable• politicalDiversity: PoliticalDiversity• winsorize: Winsorize• recode: Recode• dummy: Dummify• gini: Gini• clear: Clear• detail: Destring• destring: Destring• outliers: Outlier• voronoi: Voronoi• jackknife: Jackknife• bootstrap: Bootstrap• normalize: Normalize

SciencesPoFont SciencesPo Base Fonts

Description

Used to ascertain required theme fonts.

Usage

SciencesPoFont()

Author(s)

NA

sheston91 81

sheston91 The Penn World Table

Description

The Penn World Table used in Summers and Heston (1991). This dataset contains the followingcolumns:

• year Year

• pop Population (thousands)

• rgdppc Real per capita GDP

• savrat a numeric vector

• country Country

• com Communist regime

• opec OPEC country

• name Country name

Usage

data(sheston91)

Format

A data.frame object with 8 variables and 3250 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Summers, R. and Heston, A. (1991) The Penn World Table (Mark 5): an expanded set of interna-tional comparisons, 1950–1988. The Quarterly Journal of Economics, 106(2), 327–368.

82 Skewness

Skewness Compute the Skewness

Description

Provides three methods for performing skewness test.

Usage

Skewness(x, na.rm = TRUE, type = 3)

Arguments

x a numeric vector containing the values whose skewness is to be computed.

na.rm a logical value for na.rm, default is na.rm=TRUE.

type an integer between 1 and 3 for selecting the algorithms for computing the skew-ness, see details below.

Details

Skewness is a measure of symmetry distribution. Negative skewness (g_1 < 0) indicates that themean of the data distribution is less than the median, and the data distribution is left-skewed. Posi-tive skewness (g_1 > 0) indicates that the mean of the data values is larger than the median, and thedata distribution is right-skewed. Values of g_1 near zero indicate a symmetric distribution.

Value

An object of the same type as x

Note

There are several methods to compute skewness, Joanes and Gill (1998) discuss three of the mosttraditional methods. According to them, type 3 performs better in non-normal population distri-bution, whereas in normal-like population distribution type 2 fits better the data. Such differencebetween the two formulae tend to disappear in large samples. Type 1: g_1 = m_3/m_2^(3/2).

Type 2: G_1 = g_1*sqrt(n(n-1))/(n-2).

Type 3: b_1 = m_3/s^3 = g_1 ((n-1)/n)^(3/2).

Author(s)

Daniel Marcelino, <[email protected]>

References

Joanes, D. N. and C. A. Gill. (1998) Comparing measures of sample skewness and kurtosis. TheStatistician, 47, 183–189.

stature 83

Examples

w <-sample(4,10, TRUE)x <- sample(10, 1000, replace=TRUE, prob=w)Skewness(x, type = 1)Skewness(x, type = 2)Skewness(x)

stature The Measure of US Presidents

Description

The US presidents and their main opponents’ heights (cm). This dataset contains the followingcolumns:

• election The election year.

• winner The winner candidate.

• winner.height The winner candidate’s height in centimeters (cm).

• winner.vote The popular vote support for the winner.

• winner.party The winner’s party.

• opponent The main opponent candidate.

• opponent.height The opponent candidate’s height in centimeters (cm).

• opponent.vote Popular vote support for the main opponent candidate.

• opponent.party The opponent’s party.

• turnout The electorate turnout in percentages.

• winner.bmi The winner Body Mass Index (BMI) estimate (BMI = weight in kg/(height in meter)**2)

@references Murray, G. R. (2014) Evolutionary preferences for physical formidability in leaders.Politics and the Life Science, 33(1), 33-53.

Usage

data(stature)# t.test(stature$winner.height, stature$opponent.height, var.equal=TRUE)

Format

A data.frame object with 11 variables and 57 observations.

Source

US Presidents: http://www.lingerandlook.com/Names/Presidents.php Inside Gov. http://www.us-presidents.insidegov.com.

84 Stratify

Stratify Stratified Sampling

Description

Sample row values of a data frame conditional to some strata attributes.

Usage

Stratify(.data, group, size, select = NULL, replace = FALSE,both.sets = FALSE)

Arguments

.data the data frame.

group the grouping factor, may be a list.

size the sample size.

select if sampling from a specific group or list of groups.

replace should sampling be with replacement?

both.sets if TRUE, both ‘sample‘ and ‘.data‘ are returned.

Examples

data(pollster2008)

# Let's take a 10% sample from all -PollTaker- groups in pollster2008Stratify(pollster2008, "PollTaker", 0.1)

# Let's take a 10% sample from only 'LV' and 'RV' groups from -Pop- in pollster2008Stratify(pollster2008, "Pop", 0.1, select = list(Pop = c("LV", "RV")))

# Let's take 3 samples from all -PollTaker- groups in pollster2008,# specified by column 1

Stratify(pollster2008, group = 1, size = 3)

# Let's take a sample from all -Pop- groups in pollster2008, where we# specify the number wanted from each groupStratify(pollster2008, "Pop", size = c(3, 5, 4))

# Use a two-column strata (-Pop- and -PollTaker-) but only interested in# cases where -Pop- == 'LV'Stratify(pollster2008, c("Pop", "PollTaker"), 0.15, select = list(Pop = "LV"))

summary.Crosstable 85

summary.Crosstable Print Crosstable style

Description

Print Crosstable style

Usage

## S3 method for class 'Crosstable'summary(object, digits = 2, chisq = FALSE,fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,...)

Arguments

object The table object.

digits number of digits after the decimal point.

chisq if TRUE, the results of a chi-square test will be shown below the table.

fisher if TRUE, the results of a Fisher Exact test will be shown below the table.

mcnemar if TRUE, the results of a McNemar test will be shown below the table.

alt the alternative hypothesis and must be one of "two.sided", "greater" or "less".Only used in the 2 by 2 case.

latex if the output is to be printed in latex format.

... extra parameters.

See Also

Other Crosstables: Crosstable

swatson93 Stock’s and Watson’s (1993) Data.

Description

Data set used by Stock and Watson (1993) to estimate co-integration. This dataset contains thefollowing columns:

• lnm1 Log M1.

• lnp Log NNP price deflator.

• lnnnp Log NNP.

• cprate A numeric vector.

• year Commercial paper rate.

86 Ternaryplot

Usage

data(swatson93)

Format

A data.frame object with 5 variables and 90 observations.

Source

Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/

References

Stock, J. H., and Watson, M. W. (1993) A simple estimator of cointegrating vectors in higher orderintegrated systems. Econometrica: Journal of the Econometric Society, 783–820.

Tableplot Tableplot: a Visualization of Datasets

Description

Visualization of Large Datasets.

Usage

Tableplot(.data = NULL)

Arguments

.data the data frame.

Ternaryplot Ternary Plot Build ternary plots.

Description

Ternary Plot Build ternary plots.

Usage

Ternaryplot(data, ...)

theme_blank 87

Arguments

data Default dataset to use for plot. If not already a data.frame, will be converted toone by fortify. If not specified, must be suppled in each layer added to theplot.

... Other arguments passed on to methods. Not currently used.

theme_blank Create a Completely Empty Theme

Description

The theme created by this function shows nothing but the plot panel.

Usage

theme_blank(base_size = 12, base_family = "serif", legend = "none")

Arguments

base_size overall font size. Default is 12.

base_family a name for default font family.

legend the legend position.

Value

The theme.

Author(s)

NA

Examples

# plot with small amount of remaining paddingqplot(1:10, (1:10)^2) + theme_blank()

# remaining padding removedqplot(1:10, (1:10)^2) + theme_blank() + labs(x = NULL, y = NULL)

# Check that it is a complete themeattr(theme_blank(), "complete")

88 theme_fte

theme_darkside The Dark Side Theme

Description

The dark side of the things.

Usage

theme_darkside(base_size = 12, base_family = "serif", legend = "none")

Arguments

base_size overall font size. Default is 12.

base_family the name for default font family.

legend the position of the legend if any.

Value

The theme. # plot with small amount of padding qplot(1:10, (1:10)^2, color="green") + theme_darkside()

# Check that it is a complete theme attr(theme_darkside(), "complete")

Author(s)

NA

theme_fte Themes for ggplot2 Graphs

Description

Theme for plotting with ggplot2.

Usage

theme_fte(legend = "none", legend_title = FALSE, base_size = 12,horizontal = TRUE, base_family = "", colors = c("#F0F0F0", "#D9D9D9","#60636A", "#525252"))

theme_pub 89

Arguments

legend enables to set legend position, default is "none".

legend_title will the legend have a title? Defaults is FALSE.

base_size overall font size. Default is 13.

horizontal logical. Horizontal axis lines?

base_family a nmae for default font family.

colors default colors used in the plot in the following order: background, lines, text,and title.

Value

The theme.

Author(s)

NA

Examples

qplot(1:10, (1:10)^3) + theme_fte()

mycolors = c("wheat", "#C2AF8D", "#8F6D2F", "darkred")qplot(1:10, (1:10)^3) +theme_fte(colors=mycolors)

# Check that it is a complete themeattr(theme_fte(), "complete")

theme_pub The Default Theme

Description

After loading the SciencesPo package, this theme will be set to default for all subsequent graphsmade with ggplot2.

Usage

theme_pub(legend = "bottom", base_size = 12, base_family = "",horizontal = FALSE, axis_line = FALSE)

90 theme_scipo

Arguments

legend enables to set legend position, default is "bottom".

base_size overall font size. Default is 14.

base_family a name for default font family.

horizontal logical. Horizontal axis lines?

axis_line enables to set x and y axes.

Value

The theme.

Author(s)

NA

See Also

theme, theme_538, theme_blank.

Examples

Previewplot() + theme_pub()

# Anscombe datadat <- data.frame()for(i in 1:4)dat <- rbind(dat, data.frame(set=i, x=anscombe[,i], y=anscombe[,i+4]))

gg <- ggplot(dat, aes(x, y))gg <- gg + geom_point(size=5, color="red", fill="orange", shape=21)gg <- gg + geom_smooth(method="lm", fill=NA, fullrange=TRUE)gg <- gg + facet_wrap(~set, ncol=2)gg <- gg + theme_pub(base_family=SciencesPoFont())gg <- gg + theme(plot.background=element_rect(fill="#f7f7f7"))gg <- gg + theme(panel.background=element_rect(fill="#f7f7f7"))

theme_scipo The SciPo Theme

Description

The scpo theme for using with ggplot2 objects.

theme_scipo 91

Usage

theme_scipo(base_family = "", base_size = 11,strip_text_family = base_family, strip_text_size = 12,title_family = "", title_size = 18, title_margin = 10,margins = margin(10, 10, 10, 10), subtitle_family = "",subtitle_size = 12, subtitle_margin = 15, caption_family = "",caption_size = 9, caption_margin = 10,axis_title_family = subtitle_family, axis_title_size = 9,axis_title_just = "rt", grid = TRUE, axis = FALSE, ticks = FALSE)

Arguments

base_family base font family

base_size base font sizestrip_text_family

facet label font familystrip_text_size

facet label text size

title_family plot tilte family

title_size plot title font size

title_margin plot title margin

margins plot marginssubtitle_family

plot subtitle family

subtitle_size plot subtitle sizesubtitle_margin

plot subtitle margin

caption_family plot caption family

caption_size plot caption size

caption_margin plot caption marginaxis_title_family

axis title font familyaxis_title_size

axis title font sizeaxis_title_just

axis title font justification blmcrt

grid panel grid (TRUE, FALSE, or a combination of X, x, Y, y)

axis axis TRUE, FALSE, [xy]

ticks ticks

Value

The theme.

92 titanic

Examples

# plot with small amount of padding# qplot(1:10, (1:10)^2, color="green") + theme_scipo()

# Check that it is a complete theme# attr(theme_scipo(), "complete")

titanic Titanic

Description

Population at Risk and Death Rates for an Unusual Episode. For each person on board the fatalmaiden voyage of the ocean liner Titanic, this dataset records sex, age [adult/child], economic status[first/second/third class, or crew] and whether or not that person survived. This dataset contains thefollowing columns:

• CLASS Class (0 = crew, 1 = first, 2 = second, 3 = third)• AGE Age (1 = adult, 0 = child)• SEX Sex (1 = male, 0 = female)• SURVIVED Survived (1 = yes, 0 = no)

Usage

data(titanic)

Format

A data.frame object with 4 variables and 2201 observations.

Note

There is not complete agreement among primary sources as to the exact numbers on board, rescued,or lost. STORY BEHIND THE DATA: The sinking of the Titanic is a famous event, and newbooks are still being published about it. Many well-known facts–from the proportions of first-classpassengers to the "women and children first" policy, and the fact that that policy was not entirelysuccessful in saving the women and children in the third class–are reflected in the survival rates forvarious classes of passenger. These data were originally collected by the British Board of Trade intheir investigation of the sinking.

Source

British Board of Trade Inquiry Report (1990). Report on the Loss of the ‘Titanic’ (S.S.)", Gloucester,UK: Allan Sutton Publishing.

References

Dawson (1995). "The ‘Unusual Episode’ Data Revisited" in the Journal of Statistics Education.

Today 93

Today Print the current date in a pretty format

Description

Print the current date in a pretty format.

Usage

Today()

Value

string

Examples

Today()

turnout Turnout Data

Description

Data on voter turnout in the 50 states and D.C. for the 1992 Presidential election and 1990 Con-gressional elections. Per capita income, populations in poverty and populations with no high schooldegree are also given. This dataset contains the following columns:

• v1 state name (alphabetic, 20 characters)

• v2 region of country: 1 Northeast 2 Midwest 3 South 4 West

• v3 division within region: 1 Northeast-New England 2 Northeast-Middle Atlantic 3 Midwest-East North Central 4 Midwest-West North Central 5 South-South Atlantic 6 South-East SouthCentral 7 South-West South Central 8 West-Mountain 9 West-Pacific.

• v4 "Elazar’s state political culture assignments: 1 = moralistic 2 = individualistic 3 = tradi-tionalistic

• v5 percent of population below poverty level, 1992.

• v6 per capita personal income, 1993.

• v7 percent casting votes for presidential electors, 1992.

• v8 percent casting votes for U.S. Representatives, 1990.

• v9 population without a high school degree, of those 25 years or older, 1990.

• v10 population 25 years or older, 1990.

• v11 South = 1; all others = 0.

94 twins

Usage

data(turnout)

Format

A data.frame object with 11 variables and 50 observations.

Source

U.S. Bureau of the Census, Statistical Abstract of the United States, 1994.

twins Burt’s Twin Data

Description

The given data are IQ scores from identical twins; one raised in a foster home, and the other raisedby birth parents. This dataset contains the following columns:

• C Social class, C1=high, C2=medium, C3=low, a factor.

• IQb biological.

• IQf foster.

@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.

Usage

data(twins)

Format

A data.frame object with 3 variables and 27 observations.

Source

Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotictwins reared together and apart. Br. J. Psych., 57, 147-153.

TwoWay 95

TwoWay Cross Tabulation

Description

TwoWay is a modified version of CrossTable (gmodels). TwoWay will print a summary table withcell counts and column proportions (similar to STATA’s tabulate.

Usage

TwoWay(x, y, digits = 3)

Arguments

x, y the variables for the cross tabulation.

digits number of digits for rounding proportions.

unions Poll attitudes towards British trade unions

Description

The British polling company Ipsos MORI conducted several opinion polls in the UK between 1975and 1995 in which they asked whether people agree or disagree with the statement "Trade unionshave too much power in Britain today".

Usage

data(unions)

Format

A data.frame object with 7 variables and 17 observations.

• Date Month of the poll Aug-77 to Sep-79.

• AgreePct Percent who agree (unions have too much power).

• DisagreePct Percent who disagree.

• NetSupport DisagreePct-AgreePct.

• Months Months since August 1975.

• Late 1=after 1986 or 0=before 1986.

• Unemployment Unemployment rate.

@source http://www.ipsos-mori.com/researchpublications/researcharchive/poll.aspx?oItemID=94

96 Untable

Untable Untable an Aggregated Data Frame

Description

A method for recovering a data.frame out of summarized data, i.e. contingency table or aggregateddata.

Usage

Untable(x, ...)

## S3 method for class 'data.frame'Untable(x, freq = "Freq", row.names = NULL, ...)

## Default S3 method:Untable(x, dimnames = NULL, type = NULL,row.names = NULL, col.names = NULL, ...)

Arguments

x the table object as a data.frame, table, or, matrix.

freq the column name of count values.

row.names row names to add to data.frame if any.

dimnames set the dimnames of object if required.

type the type of variable. If NULL, ordered factor is returned.

col.names column names to add to the data.frame.

... Extra parameters ignored.

See Also

expand.grid, gl.

Examples

gss <- data.frame(expand.grid(sex=c("female", "male"),party=c("dem", "indep", "rep")),count=c(279,165,73,47,225,191))

print(gss) # aggregated data.frame

# Then expand it:GSS <- Untable(gss, freq="count")head(GSS)

Voronoi 97

# Expand from a table or xtable object:# Fisher's Tea-Tasting Experiment datatea <- table(poured=c("Yes", "Yes", "Yes", "No","Yes","No", "No", "No"),

guess=c("Yes", "Yes", "Yes", "Yes", "No", "No","No","No"))

Untable(tea)

# Expand with a vector of weightsUntable(c(3,3,3), dimnames=list(c("Brazil","Colombia","Argentina")))

Voronoi Voronoi tessellation diagram

Description

Computes voronoi tessellation diagrams, and Dirichlet tessellation (after Peter Gustav LejeuneDirichlet).

Usage

Voronoi(n = 100, d, dim = 1000, method = NULL, seed = 51, plot = TRUE)

Arguments

n an integer for a finite set of points.

d an integer for a finite set of dimensions.

dim the image dimension.

method the distance computation method. One of euclidean, manhattan, maximum, canberra(currently not implemented).

seed an integer for random seed.

plot logical. If TRUE, a plot is returned, else, a data.frame is returned.

Details

https://en.wikipedia.org/wiki/Voronoi_diagram

Author(s)

Daniel Marcelino, <[email protected]>

Examples

## Not run: Voronoi(n=20, d=5, dim=1000)

98 Waffleplot

vote Voting correlations

Description

A dataset with voting correlations of traits, where each trait is measured for 17 developed countries(Europe, US, Japan, Australia, New Zealand).

Usage

data(vote)

Format

A 12x12 matrix object with 12 variables and 12 observations.

Source

Torben Iversen and David Soskice (2006). Electoral institutions and the politics of coalitions: Whysome democracies redistribute more than others. American Political Science Review, 100, 165-81.Table A2.

References

Using Graphs Instead of Tables. http://www.tables2graphs.com/doku.php?id=03_descriptive_statistics

Waffleplot Make waffle (square pie) charts

Description

Given a named vector, this function will return a ggplot object that represents a waffle chart of thevalues. The individual values will be summed up and each that will be the total number of squaresin the grid.

Usage

Waffleplot(parts, rows = 10, xlab = NULL, title = NULL, colors = NA,size = 2, legend_pos = "right", flip = FALSE, reverse = FALSE,equal = TRUE, pad = 0, use_glyph = FALSE, glyph_size = 12)

WeightedCorrelation 99

Arguments

parts named vector of values to use for the chart

rows number of rows of blocks

xlab text for below the chart. Highly suggested this be used to give the "1 sq == xyz"relationship if it’s not obvious

title chart title

colors exactly the number of colors as values in parts. If omitted, Color Brewer "Set2"colors are used.

size width of the separator between blocks (defaults to 2)

legend_pos position of legend

flip flips x & y axes

reverse reverses the order of the data

equal by default, waffle uses coord_equal; this can cause layout problems, so you anuse this to disable it if you are using ggsave or knitr to control output sizes (ormanually sizing the chart)

pad how many blocks to right-pad the grid with

use_glyph use specified FontAwesome glyph

glyph_size size of the FontAwesome font

Note

If the vector is not named or only partially named, capital letters will be used instead. If youspecify a string (vs FALSE) to use_glyph the function will map the input to a FontAwesome glyphname and use that glyph for the tile instead of a block (making it more like an isotype pictogramthan a waffle chart). You’ll need to actually install FontAwesome and use the extrafont package(https://github.com/wch/extrafont) to be able to use the FontAwesome glyphs.

Examples

tiles <- c(One=80, Two=30, Three=20, Four=10)Waffleplot(tiles, rows=8)

Senate <- c(`Male (44%)`=44, `Female (56%)`=56)Waffleplot(Senate, rows=10, size=0.5, colors=c("#af9139", "#544616"))

WeightedCorrelation Compute Weighted Correlations

Description

Compute the weighted correlation.

100 WeightedMean

Usage

WeightedCorrelation(x, y = NULL, weights = NULL)

Arguments

x a matrix or vector to correlate with y.

y a matrix or vector to correlate with x. If y is NULL, x will be used instead.

weights an optional vector of weights to be used to determining the weighted mean andvariance for calculation of the correlations.

Examples

x <- sample(10,10)y <- sample(10,10)w <- sample(5,10, replace=TRUE)

WeightedCorrelation(x, y, w)

WeightedMean Computes Weighted Mean

Description

Compute the weighted mean of data.

Usage

WeightedMean(x, weights = NULL, normwt = "ignored", na.rm = TRUE)

Arguments

x a numeric vector.

weights a numeric vector of weights of x.

normwt ignored at the moment.

na.rm a logical, if TRUE, missing data will be dropped. If na.rm = FALSE, missingdata will return an error.

Examples

x <- sample(10,10)w <- sample(5,10, replace=TRUE)

WeightedMean(x, w)

WeightedStdz 101

WeightedStdz Computes Weighted Standardized Values

Description

Computes weighted standardized values of data.

Usage

WeightedStdz(x, weights = NULL)

Arguments

x a vector for which a set of standardized values is desired.weights a vector of weights to be used to determining weighted values of x.

Examples

x <- sample(10,10)w <- sample(5,10, replace=TRUE)

WeightedStdz(x, w)

WeightedVariance Weighted Variance

Description

Compute the weighted variance.

Usage

WeightedVariance(x, weights = NULL, na.rm = FALSE)

Arguments

x a numeric vector.weights a numeric vector of weights for x.na.rm a logical if NA should be disregarded.

Examples

wt=c(1.23, 2.12, 1.23, 0.32, 1.53, 0.59, 0.94, 0.94, 0.84, 0.73)x = c(5, 5, 4, 4, 3, 4, 3, 2, 2, 1)

WeightedVariance(x, wt)

102 Winsorize

Winsorize Winsorized Mean

Description

Compute the winsorized mean, which consists of recoding the top k values of a vector.

Usage

Winsorize(x, k = 1, na.rm = TRUE)

Arguments

x The vector to be winsorized

k An integer for the quantity of outlier elements that to be replaced in the calcula-tion process

na.rm a logical value for na.rm, default is na.rm=TRUE.

Details

Winsorizing a vector will produce different results than trimming it. While by trimming a vectorcauses extreme values to be discarded, by winsorizing it in the other hand, causes extreme values tobe replaced by certain percentiles.

Value

An object of the same type as x

References

Dixon, W. J., and Yuen, K. K. (1999) Trimming and winsorization: A review. The AmericanStatistician, 53(3), 267–269.

Dixon, W. J., and Yuen, K. K. (1960) Simplified Estimation from Censored Normal Samples, TheAnnals of Mathematical Statistics, 31, 385–391. @references Wilcox, R. R. (2012) Introduction torobust estimation and hypothesis testing. Academic Press, 30-32. Statistics Canada (2010) SurveyMethods and Practices.

@note One may want to winsorize estimators, however, winsorization tends to be used for one-variable situations.

@author Daniel Marcelino, <[email protected]>

@examples set.seed(51) # for reproducibility x <- rnorm(50) ## introduce outlier x[1] <- x[1] * 10

# Compare to mean: mean(x) Winsorize(x)

words 103

words Word frequencies from Mosteller and Wallace

Description

The data give the frequencies of words in works from four different sources: the political writingsof eighteenth century American political figures Alexander Hamilton, James Madison, and JohnJay, and the book Ulysses by twentieth century Irish writer James Joyce. This dataset contains thefollowing columns:

• Hamilton Hamilton frequency.

• HamiltonRank Hamilton rank.

• Madison Madison frequency.

• MadisonRank Madison rank.

• Jay Jay frequency.

• JayRank Jay rank.

• Ulysses Word frequency in Ulysses.

• UlyssesRank Word rank in Ulysses.

@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.

Usage

data(words)

Format

A data.frame object with 8 variables and 165 observations.

Source

Mosteller, F. and Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Reading,MA: Addison-Wesley.

104 %nin%

%nin% Not in (Find Matching or Non-Matching Elements)

Description

" logical vector indicating if there is a match or not for its left operand. A true vector elementindicates no match in left operand, false indicates a match.

Usage

x %nin% table

Arguments

x A vector of numeric, character, or factor values.

table A vector (numeric, character, factor), matching the mode of x

Examples

c('a','b','c') %nin% c('a','b')

Index

∗Topic Clean-upMakeNumeric, 51MakeShare, 52

∗Topic Concentration,GiniSimpson, 38Lorenz, 50

∗Topic ConcentrationGini, 37

∗Topic Distributionsddirichlet, 22rdirichlet, 70

∗Topic Diversity,Gini, 37GiniSimpson, 38Lorenz, 50

∗Topic ElectoraldHondt, 24HighestAverages, 42LargestRemainders, 48PoliticalDiversity, 63

∗Topic Exploratory,PoliticalDiversity, 63

∗Topic ExploratoryCI, 17Crosstable, 20Describe, 23Outlier, 57WeightedVariance, 101Winsorize, 102

∗Topic InequalityGiniSimpson, 38

∗Topic ManipulationRecode, 72Stratify, 84

∗Topic ModellingBootstrap, 13Normalize, 55

∗Topic ModelsDummify, 25

∗Topic RescalingNormalize, 55

∗Topic SamplingPermutate, 60

∗Topic TestsBartels, 8JamesStein, 45

∗Topic TransformationNormalize, 55

∗Topic VizLorenz, 50

∗Topic datasetsalpha, 4auto, 7baseball, 10bhodrick93, 10big5, 11bollen, 12bush, 14cathedrals, 15cgreene76, 16chismoke, 17election2008, 26galton, 32geom_dumbbell, 33geom_lollipop, 35GeomSpotlight, 32griliches76, 39ltaylor96, 51marriage, 52mishkin92, 54nerlove63, 55paired, 58Palettes, 59pollster2008, 65religiosity, 73sheston91, 81stature, 83swatson93, 85

105

106 INDEX

titanic, 92turnout, 93twins, 94unions, 95vote, 98words, 103

∗Topic ggplot2geom_dumbbell, 33geom_lollipop, 35GeomSpotlight, 32party_pal, 59Plotting, 61pub_pal, 68pub_shapes, 69scale_color_party, 75scale_color_pub, 76scale_fill_party, 76scale_fill_pub, 77scale_shape_pub, 77SciencesPoFont, 80theme_blank, 87theme_darkside, 88theme_fte, 88theme_pub, 89

%nin%, 104

aes, 34, 35aes_, 34, 35align_title_left (Plotting), 61align_title_right (Plotting), 61alpha, 4Anonymize, 5Atkinson, 6, 37, 41, 74auto, 7

Bartels, 8baseball, 10bhodrick93, 10big5, 11bollen, 12Bootstrap, 13, 80borders, 34, 36bush, 14

Calibrateplot, 14cathedrals, 15cgreene76, 16chismoke, 17CI, 17

Clear, 18, 80Condorcet, 19CountMissing, 19CrossTable, 95Crosstable, 20, 30, 31, 85CrossValidation, 21

ddirichlet, 22Describe, 23Destring, 80destring (SciencesPo-deprecated), 79Detail (Describe), 23detail (SciencesPo-deprecated), 79dHondt, 24, 40Dirichlet (rdirichlet), 70discrete_scale, 75–77Disproportionality (Proportionality), 66Dummify, 25, 80dummy (SciencesPo-deprecated), 79

election2008, 26expand.grid, 96

Flag, 27Footnote, 27Forge, 28Formatted, 29fortify, 34, 35, 87Freq (Frequency), 31freq, 30, 31Frequency, 20, 30, 31

galton, 32geom_dumbbell, 33geom_lollipop, 35geom_spotlight (GeomSpotlight), 32GeomDumbbell (geom_dumbbell), 33GeomLollipop (geom_lollipop), 35GeomSpotlight, 32ggplot, 34, 35Gini, 7, 37, 38, 41, 50, 74, 80GiniSimpson, 37, 38, 50gl, 96griliches76, 39

Hamilton, 25, 40Herfindahl, 7, 37, 41, 74HighestAverages, 25, 40, 42, 49, 68

Jackknife, 44, 80

INDEX 107

JamesStein, 45

Kurtosis, 45

Label, 47label, 48Label<-, 47LargestRemainders, 25, 40, 43, 48, 68layer, 34, 35legend_bottom (Plotting), 61legend_left (Plotting), 61legend_right (Plotting), 61legend_top (Plotting), 61Lorenz, 37, 50, 74ltaylor96, 51

MakeNumeric, 51MakeShare, 52marriage, 52MeanFromRange, 53mishkin92, 54move_legend (Plotting), 61

nerlove63, 55no_axes (Plotting), 61no_axes_titles (Plotting), 61no_gridlines (Plotting), 61no_legend (Plotting), 61no_legend_title (Plotting), 61no_major_gridlines (Plotting), 61no_major_x_gridlines (Plotting), 61no_major_y_gridlines (Plotting), 61no_minor_gridlines (Plotting), 61no_minor_x_gridlines (Plotting), 61no_minor_y_gridlines (Plotting), 61no_ticks (Plotting), 61no_title (Plotting), 61no_x_axis (Plotting), 61no_x_gridlines (Plotting), 61no_x_line (Plotting), 61no_x_text (Plotting), 61no_x_ticks (Plotting), 61no_x_title (Plotting), 61no_y_axis (Plotting), 61no_y_gridlines (Plotting), 61no_y_line (Plotting), 61no_y_text (Plotting), 61no_y_ticks (Plotting), 61no_y_title (Plotting), 61

Normalize, 55, 80normalize (SciencesPo-deprecated), 79

Outlier, 57, 80outliers (SciencesPo-deprecated), 79

paired, 58Palettes, 59party_pal, 59, 76Permutate, 60Plotting, 61PoliticalDiversity, 25, 40, 41, 43, 49, 63,

68, 80politicalDiversity

(SciencesPo-deprecated), 79pollster2008, 65Previewplot (Plotting), 61PreviewTheme (Plotting), 61prop.table, 20Proportionality, 43, 49, 66pub_pal, 68, 76, 77pub_shapes, 69, 77

rdirichlet, 70read.zTree, 71Recode, 72, 80religiosity, 73Render, 73Rosenbluth, 7, 37, 41, 74rotate_axes_text (Plotting), 61rotate_x_text (Plotting), 61rotate_y_text (Plotting), 61RunShinyApp, 75

scale, 56scale_color_party, 59, 75scale_color_pub, 76, 77scale_fill_party, 76scale_fill_pub, 76, 77scale_shape_pub, 69, 77Scatterplot, 78SciencesPo, 79SciencesPo-deprecated, 79SciencesPo-package (SciencesPo), 79SciencesPoFont, 80sheston91, 81Simpson (Herfindahl), 41Skewness, 82stature, 83

108 INDEX

Stratify, 84summary.Crosstable, 20, 85svTransform (SciencesPo-deprecated), 79swatson93, 85

table, 20Tableplot, 86Ternaryplot, 86theme, 90theme_538, 90theme_538 (theme_fte), 88theme_blank, 87, 90theme_darkside, 88theme_fte, 88theme_pub, 89theme_scipo, 90titanic, 92Today, 93turnout, 93twins, 94TwoWay, 95

unions, 95Untable, 80, 96untable (SciencesPo-deprecated), 79

Voronoi, 80, 97voronoi (SciencesPo-deprecated), 79vote, 98

Waffleplot, 98WeightedCorrelation, 99WeightedMean, 100WeightedStdz, 101WeightedVariance, 101Winsorize, 58, 80, 102winsorize (SciencesPo-deprecated), 79words, 103

xtabs, 20