Outline Introduction Social Networks Linear Models that Use Social Network Data Simulation Study Conclusions References
Using Social Network Information in Survey Estimation
Thomas Suße and Raymond Chambers
National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong
2013 Graybill Conference, Fort Collins, Colorado
11 June 2013
Outline
1 Introduction
2 Social Networks
3 Linear Models that Use Social Network Data
4 Simulation Study
5 Conclusions
Introduction
Population $U$ of size $N$
Sample $s$ of size $n$; remainder of population $r := U \setminus s$ of size $N - n$
Survey variable $Y$ with realisations $y_i$, $i \in U$
Focus on estimating population total $t_y = \sum_{i \in U} y_i$
Auxiliary variables $X_1, \ldots, X_p$
Non-informative sampling method given population values of auxiliaries
Model-based approach
A Place To Start
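Simple linear model for $Y$ in terms of $X_1, \ldots, X_p$
$y_i = x_{1i}\beta_1 + \cdots + x_{pi}\beta_p + \varepsilon_i$
$\varepsilon_i \sim (0, \sigma^2)$
In matrix terms
$y_i = X_i^T \beta + \varepsilon_i$ or $Y_U = X_U \beta + \varepsilon_U$
Best linear unbiased predictor (BLUP) for population total $t_y$
$\hat{t}_y = \sum_{i \in s} y_i + \sum_{i \in r} \hat{y}_i = 1_s^T Y_s + 1_r^T (X_r \hat{\beta})$
$\hat{\beta} = (X_s^T X_s)^{-1} X_s^T Y_s$, $Y_U = \begin{pmatrix} Y_s \\ Y_r \end{pmatrix}$, $X_U = \begin{bmatrix} X_s \\ X_r \end{bmatrix}$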
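The BLUP for the simple linear model can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `blup_total_ols` and the toy population are hypothetical, and the toy regression values are chosen only for illustration.

```python
import numpy as np

def blup_total_ols(X_s, y_s, X_r):
    """Predicted population total: observed sample sum plus predicted non-sample sum."""
    # Least squares estimate of beta from the sample, i.e. (Xs' Xs)^{-1} Xs' ys
    beta_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    return y_s.sum() + (X_r @ beta_hat).sum(), beta_hat

# Hypothetical toy population (illustrative values only)
rng = np.random.default_rng(0)
N, n = 200, 40
X_U = np.column_stack([np.ones(N), rng.integers(1, 10, size=N)])
y_U = X_U @ np.array([40.0, 5.0]) + rng.normal(size=N)
in_sample = np.zeros(N, dtype=bool)
in_sample[rng.choice(N, size=n, replace=False)] = True

t_hat, beta_hat = blup_total_ols(X_U[in_sample], y_U[in_sample], X_U[~in_sample])
```

Only the non-sampled units are predicted; the sampled values enter the total as observed.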
A More Complex Reality: Hierarchical Data
Data available at enumeration district (ED) and ward level
Individuals $i$ (level 1); EDs $j$ (level 2); wards $k$ (level 3)
Multilevel model:
$y_{ijk} = X_{ijk}^T \beta + u_k^{(3)} + u_{jk}^{(2)} + u_{ijk}^{(1)}$
with $u_k^{(3)} \sim (0, \tau^{(3)})$, $u_{jk}^{(2)} \sim (0, \tau^{(2)})$, $u_{ijk}^{(1)} \sim (0, \tau^{(1)})$
A Patterned Covariance Structure
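$\mathrm{Var}(y_{ijk}) = \sigma^2 = \tau^{(3)} + \tau^{(2)} + \tau^{(1)}$
$\mathrm{Cov}(y_{ijk}, y_{lmn}) = \begin{cases} \tau^{(2)} + \tau^{(3)} & \text{different people, same ED} \\ \tau^{(3)} & \text{different EDs, same ward} \\ 0 & \text{different wards} \end{cases}$
Linear model for population has the form
$Y_U = X_U \beta + \varepsilon_U$, $\varepsilon_U \sim (0, \sigma^2 V_U)$
where $V_U$ has a nested block diagonal structure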
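The nested block structure can be made concrete with a small sketch. The function `nested_covariance` and its arguments are hypothetical names; it assumes ED labels are unique across wards, so that two people in the same ED are automatically in the same ward. It returns the full covariance $\sigma^2 V_U$.

```python
import numpy as np

def nested_covariance(ward, ed, tau1, tau2, tau3):
    """Covariance matrix of y under the three-level model.

    ward[i] and ed[i] are the ward and ED labels of individual i.
    Assumes ED labels are unique across wards (EDs nested in wards).
    """
    ward = np.asarray(ward)
    ed = np.asarray(ed)
    same_ward = ward[:, None] == ward[None, :]
    same_ed = ed[:, None] == ed[None, :]
    # tau3 is shared within a ward, tau2 within an ED, tau1 only on the diagonal
    return tau3 * same_ward + tau2 * same_ed + tau1 * np.eye(len(ward))
```

Sorting individuals by ward and then by ED puts the non-zero entries into ward blocks that each contain smaller ED blocks.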
The General BLUP
BLUP for dependent responses
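$\hat{t}_y = 1_s^T Y_s + 1_r^T \{ X_r \hat{\beta}_s + V_{rs} V_{ss}^{-1} (Y_s - X_s \hat{\beta}_s) \}$
with best linear unbiased estimator (BLUE)
$\hat{\beta}_s = (X_s^T V_{ss}^{-1} X_s)^{-1} X_s^T V_{ss}^{-1} Y_s$
and
$V_U = \begin{bmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{bmatrix}$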
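A minimal NumPy sketch of the general BLUP follows. The function name `blup_total` is hypothetical, and the blocks `V_ss` and `V_rs` are assumed known (any common scale factor such as $\sigma^2$ cancels).

```python
import numpy as np

def blup_total(X_s, y_s, X_r, V_ss, V_rs):
    """BLUP of the population total for correlated responses."""
    # BLUE (generalised least squares) of beta based on the sample block V_ss
    Vinv_X = np.linalg.solve(V_ss, X_s)
    Vinv_y = np.linalg.solve(V_ss, y_s)
    beta_hat = np.linalg.solve(X_s.T @ Vinv_X, X_s.T @ Vinv_y)
    # Predict non-sample values: regression part plus correction from sample residuals
    resid = y_s - X_s @ beta_hat
    y_r_hat = X_r @ beta_hat + V_rs @ np.linalg.solve(V_ss, resid)
    return y_s.sum() + y_r_hat.sum(), beta_hat
```

With `V_ss` equal to the identity and `V_rs` equal to zero, the correction term vanishes and the predictor reduces to the BLUP for the simple linear model.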
Using Social Networks to Characterise Non-Hierarchical Dependence
Widespread (Facebook, LinkedIn, Google, family, friends, colleagues, etc.)
$N$ actors or nodes
Simplest characterisation via adjacency matrix $Z_U = [Z_{ij}]_{i,j=1}^{N}$ with $Z_{ij} = 1$ if relationship ('edge') exists between $i$ and $j$; $Z_{ij} = 0$ otherwise
$Z_U$ has zero main diagonal and is symmetric (undirected network) or asymmetric (directed network)
Extensions exist for multiple types of relationships and count or continuous values for $Z_{ij}$, e.g. level/strength of communication between two nodes
Example: Law Firm Collaborations
Working relations among $N = 36$ partners in a law firm (Lazega, 2001)
An edge exists between two partners if, and only if, both indicate that they collaborate with the other
Undirected network
Numbers of edges (row and column sums) associated with each of the $N = 36$ nodes range from 0 to 16, with an average of 6.4
Node attributes (covariates collected on each partner) include seniority (rank number of entry into the firm), gender, office (three offices in different cities), and practice (litigation = 0, and corporate law = 1)
Example: Adjacency Matrix for Law Firm Collaborations
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 2  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0
 3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0
 4  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0  1  1  0  1  0  0  0  1  0  1  1  0  1  0  0  0  0  0
 5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  1  0  0  1  1  1  0  0  0
 6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  1  1  1  0  0  0  0
 7  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 9  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  1  0  1  0  0  1  0  0
11  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
12  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  1  1  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0  0  1  0  0
13  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0
14  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  1  0  0  1  0  1  0  1  0  0  0  0
15  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  1  1  0  1  0  1  0  1  0  0  1  0  0  1  0  0  1  1
16  0  1  0  0  0  0  0  0  1  0  0  1  0  1  1  0  1  0  0  0  0  1  0  0  0  1  1  0  1  0  0  1  0  1  0  1
17  1  1  0  1  0  0  0  0  0  0  1  1  0  1  0  1  0  0  1  0  0  1  0  1  1  1  0  1  1  0  0  0  0  1  0  0
18  0  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1  1  0  1  0
19  0  0  0  1  0  0  0  0  0  0  0  1  0  0  1  0  1  0  0  0  0  1  0  1  0  1  0  1  0  0  0  0  0  1  1  0
20  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0
21  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
22  0  1  0  1  0  0  0  0  0  0  0  0  0  0  1  1  1  0  1  1  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0
23  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
24  0  0  0  0  1  1  0  0  0  1  0  0  0  0  1  0  1  0  1  0  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  1
25  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0
26  0  1  0  1  0  0  0  0  0  1  0  1  0  0  1  1  1  0  1  1  0  0  0  1  0  0  1  0  0  0  0  1  0  0  0  0
27  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
28  0  0  1  1  1  1  0  0  0  0  0  0  0  1  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  1  1  1  0  0  1  0
29  0  1  0  1  0  0  0  0  1  1  0  1  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
30  0  0  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0
31  0  0  0  1  1  1  0  0  0  1  0  0  1  0  0  0  0  1  0  0  0  1  0  1  0  0  0  1  0  1  0  1  1  0  1  0
32  0  0  0  0  1  1  0  0  0  0  0  0  0  1  1  1  0  1  0  0  0  1  0  0  0  1  0  1  0  0  1  0  1  0  1  0
33  0  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0
34  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  1  1  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  1  0  0  1  0  0  1  1  0  0  0  0
36  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
Example: Graph of Law Firm Collaborations
[Figure: graph of the law firm collaboration network, with nodes labelled 1 to 36]
Modelling ZU : Exponential Random Graph Models
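Widely used family of models for network data
Probability distribution generated by an ERGM is
$\Pr(Z_U = z) = \exp\{\eta^T g(z) - \kappa(\eta)\} = \dfrac{\exp\{\eta^T g(z)\}}{\sum_{\zeta \in \mathcal{Z}} \exp\{\eta^T g(\zeta)\}}$
$\eta$: vector of model parameters
$g(z)$: vector of network statistics
$\kappa$ is the normalising constant
$\kappa(\eta) = \log \left\{ \sum_{\zeta \in \mathcal{Z}} \exp(\eta^T g(\zeta)) \right\}$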
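For a very small network the ERGM distribution can be computed exactly by enumerating every graph. The sketch below does this for an undirected network on 4 nodes; the choice of statistics (edges and triangles), the parameter values and all function names are illustrative assumptions, not taken from the talk.

```python
import itertools
import numpy as np

def graph_stats(z):
    """g(z) = (number of edges, number of triangles) for a symmetric 0/1 matrix z."""
    edges = np.triu(z, 1).sum()
    triangles = np.trace(z @ z @ z) / 6
    return np.array([edges, triangles], dtype=float)

def all_undirected_graphs(n_nodes):
    """Yield every symmetric 0/1 adjacency matrix with zero diagonal."""
    pairs = list(itertools.combinations(range(n_nodes), 2))
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        z = np.zeros((n_nodes, n_nodes), dtype=int)
        for (i, j), b in zip(pairs, bits):
            z[i, j] = z[j, i] = b
        yield z

def ergm_distribution(eta, n_nodes):
    """Return all graphs, kappa(eta) and Pr(Z = z), by exhaustive enumeration."""
    graphs = list(all_undirected_graphs(n_nodes))
    scores = np.array([eta @ graph_stats(z) for z in graphs])
    kappa = np.log(np.exp(scores).sum())
    return graphs, kappa, np.exp(scores - kappa)

# Hypothetical parameter values, for illustration only
graphs, kappa, probs = ergm_distribution(np.array([-1.0, 0.5]), n_nodes=4)
```

Four nodes give $2^6 = 64$ graphs. For an undirected network on 36 nodes the sum defining $\kappa(\eta)$ has $2^{630}$ terms, so enumeration is only feasible for toy examples.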
Examples of Network Statistics
Edges Statistic [figure: nodes A, B]
Two-Star Statistic [figure: nodes A, B, C]
Edgewise Shared-Partner Statistic [figure: nodes A, B, C, D, E]
Triangle Statistic [figure: nodes A, B, C]
The GWESP Statistic
$EP_k(Z_U)$ is the number of edges ($Z_{ij} = 1$ with $i < j$) that share exactly $k$ neighbors in common
$EP_0 + \cdots + EP_{N-2}$ = number of edges
Geometrically weighted edgewise shared partner (GWESP) statistic defined as
$\mathrm{GWESP}(Z_U, \theta) = \exp(\theta) \sum_{k=1}^{N-2} \{ 1 - (1 - \exp(-\theta))^k \} EP_k(Z_U)$
Geometrically weighted sum of $EP_k(Z_U)$ values, with parameter $\theta$ controlling distribution of weights
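The definition translates directly into code. The sketch below assumes an undirected network stored as a symmetric 0/1 matrix; the function names are hypothetical.

```python
import numpy as np

def edgewise_shared_partners(z):
    """EP_k for k = 0..N-2: number of edges whose endpoints share exactly k neighbours."""
    z = np.asarray(z, dtype=int)
    n = z.shape[0]
    shared = z @ z                      # shared[i, j] = number of common neighbours of i and j
    iu = np.triu_indices(n, 1)          # pairs with i < j
    on_edges = shared[iu][z[iu] == 1]   # shared-partner counts for existing edges only
    return np.bincount(on_edges, minlength=n - 1)

def gwesp(z, theta):
    """Geometrically weighted edgewise shared partner statistic."""
    ep = edgewise_shared_partners(z)
    k = np.arange(len(ep))
    weights = np.exp(theta) * (1.0 - (1.0 - np.exp(-theta)) ** k)
    return float(weights @ ep)          # the weight for k = 0 is zero, so the sum starts at k = 1
```

The weight on $EP_1$ is exactly 1 for any $\theta$, and the weight on $EP_2$ is $2 - \exp(-\theta)$, so each additional shared partner contributes less than the previous one.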
Fitting ERGMs
ERGMs are difficult to fit because the normalising constant $\kappa$ cannot be calculated explicitly in any realistic application
MCMC techniques are typically used to approximate the log-likelihood
Geometrically weighted statistics (e.g. GWESP) generate MCMC samples that are degenerate less often
Three Questions
(a) Is embedding social network information into linear models useful for survey estimation based on these models?
(b) If the answer to (a) is yes, then
(b1) Which network-based linear models are potentially useful?
(b2) How much network data needs to be collected in order to obtain potentially higher precision for survey estimation?
Linear Models with Embedded Social Network Information
Three types of linear model use the information in the adjacency matrix $Z_U$ generated by a social network
1. Contextual Network models
2. Autocorrelation models
3. Network Disturbance models
Contextual Network (CN) Models
Basic idea is to add one or more network-based contextual covariates to the model
Motivation: Student academic performance (AP) as a function of socio-economic status (SES)
Network: Student friendship network
Model student's AP as a function of his/her SES and average SES of his/her friends (Friedkin, 1990)
Contextual Network (CN) Models
CN model can be written as
$Y_U = X_U \beta + W_U T_U \gamma + \varepsilon_U$
where
$T_U$ is the population matrix of covariates measured on the network
$W_U$ is a row-normalised version of $Z_U$, i.e. the rows of $W_U$ sum to one
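Row normalisation and the resulting contextual covariate $W_U T_U$ can be sketched as follows. The function name `row_normalise` is hypothetical, and leaving the rows of isolated nodes (no links) as zeros is an assumption of this sketch; the slides do not say how such nodes are handled.

```python
import numpy as np

def row_normalise(Z):
    """Divide each row of Z by its row sum; rows with no links stay zero (an assumption)."""
    Z = np.asarray(Z, dtype=float)
    degree = Z.sum(axis=1, keepdims=True)
    return np.divide(Z, degree, out=np.zeros_like(Z), where=degree > 0)

# Path network 0 - 1 - 2 with one covariate measured on each node
Z = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
T = np.array([1.0, 2.0, 6.0])
contextual = row_normalise(Z) @ T   # average of T over each node's neighbours
```

For node 1 the contextual value is the average of its two neighbours, $(1 + 6)/2 = 3.5$; nodes 0 and 2 each take the value of their single neighbour.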
Autocorrelation (AR) Models
The matrix $T_U$ can be any set of measurements on the individuals in the network, and in particular it can be $Y_U$
Autocorrelation (AR) models, also known as network effects models (Ord, 1975; Doreian et al., 1984; Duke, 1993; Leenders, 2002), are defined by
$Y_U = X_U \beta + \lambda W_U Y_U + \varepsilon_U$
where $\lambda \in (-1, +1)$
The conditional (on $X_U$) mean and variance of $Y_U$ are $\mu = D_U^{-1} X_U \beta$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$, where $D_U = I_U - \lambda W_U$
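The mean and variance follow by solving the AR equation for $Y_U$. The steps below assume $\varepsilon_U \sim (0, \sigma^2 I_U)$ and that $D_U$ is invertible, which holds for a row-normalised $W_U$ when $|\lambda| < 1$.

```latex
\begin{aligned}
Y_U &= X_U\beta + \lambda W_U Y_U + \varepsilon_U \\
(I_U - \lambda W_U)\,Y_U &= X_U\beta + \varepsilon_U \\
Y_U &= D_U^{-1} X_U\beta + D_U^{-1}\varepsilon_U, \qquad D_U = I_U - \lambda W_U \\
E(Y_U \mid X_U) &= D_U^{-1} X_U\beta \\
\operatorname{Var}(Y_U \mid X_U) &= \sigma^2 D_U^{-1}\,(D_U^{-1})^T = \sigma^2 (D_U^T D_U)^{-1}
\end{aligned}
```

The network therefore enters both the mean (through $D_U^{-1} X_U$) and the covariance of the response.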
Network Disturbance (ND) Models
The linear model errors are assumed to have an AR structure, i.e. $Y_U = X_U \beta + \varepsilon_U$, where $\varepsilon_U = \lambda W_U \varepsilon_U + v_U$ and $v_U \sim (0, \sigma^2 I_U)$
Conditional on $X_U$, the mean and variance of $Y_U$ are $\mu = X_U \beta$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$ respectively. That is, the network induces correlation structure but does not affect mean structure (Ord, 1975; Leenders, 2002)
AR and ND models are similar to conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models commonly used for spatial data (Banerjee et al., 2004)
BLUP/EBLUP Specification
In order to calculate the BLUP, we generally need to specify
A design matrix $H_U$ such that the conditional mean $\mu$ of $Y_U$ given $X_U$ satisfies $\mu = H_U \xi$
A positive definite matrix $V_U$ proportional to the conditional variance of $Y_U$ given $X_U$
When these quantities themselves depend on unknown parameters, we first estimate these parameters from the sample data and then substitute in $H_U$ and $V_U$ before calculating the BLUP. This is the ‘plug-in’ version of the EBLUP
Model Specification
Standard: $H_U = X_U$ and $V_U = \sigma^2 I_U$. The residual mean squared error is an unbiased estimator of $\sigma^2$
CN: $H_U = [X_U, W_U T_U]$ and $V_U = \sigma^2 I_U$. Again, the residual mean squared error is an unbiased estimator of $\sigma^2$
AR: $H_U = D_U^{-1} X_U$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$ with $D_U = I_U - \lambda W_U$. Estimates of $\sigma^2$ and $\lambda$ can be obtained by maximum likelihood (ML)
ND: $H_U = X_U$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$. Both $\sigma^2$ and $\lambda$ can be estimated via ML
Imputation of Missing Network Information
Calculation of the EBLUPs defined by the CN, AR and ND models assumes that the population network $Z_U$ is known
In practice this is extremely unlikely, and it is more realistic to consider situations where $Z_U$ is partially known
SS: We only know $Z_{ss}$, i.e. the sub-network of relationships between the $n$ sampled individuals in $s$
SS+SR: We also know the links between the sampled individuals and the remaining $N - n$ non-sampled individuals in the population, i.e. we know $Z_{sr}$. Note that for an undirected network this means that we know $Z_{rs}$ as well
We use model-based imputation to ‘fill in’ the rest of $Z_U$ in either case
Optimum Imputation
An optimal model-based approach is to assume that $Z_U$ can be adequately modelled via an ERGM and to use the minimum mean squared error predictor $E(Z_U^{mis} \mid Z_U^{obs} = z^{obs})$
In this case the conditional distribution of $Z_U^{mis}$ is defined by
$\Pr(Z_U^{mis} = z^{mis} \mid Z_U^{obs} = z^{obs}) = \dfrac{\exp\{\eta^T g(z^{mis}, z^{obs}; \theta)\}}{\sum_{\zeta^{mis} \in \mathcal{Z}^{mis}} \exp\{\eta^T g(\zeta^{mis}, z^{obs}; \theta)\}}$
where $\mathcal{Z}^{mis}$ is the sample space of $Z_U^{mis}$
In theory, MCMC techniques can be used to sample from this conditional distribution, with $\eta$ and $\theta$ replaced by estimates based on the observed network. However, this is impractical at present
Practical Imputation: Method 1
Suppose that, conditionally on $z^{obs}$, $Z_{ij}^{mis}$ and $Z_{kl}^{mis}$ are independent for any two distinct pairs $ij$ and $kl$, i.e. $\Pr(Z^{mis} = z^{mis} \mid Z^{obs} = z^{obs}) = \prod_{ij} \Pr(Z_{ij}^{mis} \mid Z^{obs} = z^{obs})$
This leads to
$\dfrac{\Pr(Z_{ij}^{mis} = 1 \mid Z^{obs} = z^{obs})}{\Pr(Z_{ij}^{mis} = 0 \mid Z^{obs} = z^{obs})} = \exp(\eta^T \Delta g_{ij}^{mis})$
where $\Delta g_{ij}^{mis}$ is the change statistic, i.e. the difference in $g$ between $(z_{ij}^{mis}, z^{obs}) = (1, z^{obs})$ and $(z_{ij}^{mis}, z^{obs}) = (0, z^{obs})$
Practical Imputation: Method 1
Re-arranging this equation gives the minimum mean squared error predictor (MMSEP) under conditional independence,
$E(Z_{ij}^{mis} \mid Z^{obs} = z^{obs}) = \Pr(Z_{ij}^{mis} = 1 \mid Z^{obs} = z^{obs}) = \mathrm{expit}(\eta^T \Delta g_{ij}^{mis})$
with $\mathrm{expit}(x) = \exp(x)/(1 + \exp(x))$
It is only necessary to compute $\Delta g_{ij}^{mis}$ in order to obtain this MMSEP for any distinct pair $ij \in mis$
Since the conditional independence assumption is generally unwarranted, this approach can only be considered as defining an approximation to $\Pr(Z^{mis} \mid Z^{obs} = z^{obs})$
However, it is computationally feasible for realistic sample and population sizes
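A rough sketch of this imputation for an undirected network is given below. It is illustrative only: it uses a hypothetical ERGM with just edges and triangle statistics (not the GWESP model of the talk), takes `eta` as already estimated, and treats all other unobserved dyads as absent when computing the change statistic, which is an assumption of this sketch.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def impute_method1(z_obs, observed, eta):
    """Fill unobserved dyads with expit(eta' * change statistic).

    z_obs    : symmetric 0/1 matrix; entries of unobserved dyads are ignored
    observed : symmetric boolean mask, True where the dyad is observed
    eta      : (eta_edges, eta_triangles), assumed already estimated
    """
    z0 = np.where(observed, z_obs, 0).astype(float)  # unobserved dyads set to 0 (assumption)
    np.fill_diagonal(z0, 0.0)
    common = z0 @ z0                                  # common[i, j] = shared observed neighbours
    # Switching dyad ij from 0 to 1 adds one edge and common[i, j] triangles
    prob = expit(eta[0] + eta[1] * common)
    z_imp = np.where(observed, z0, prob)
    np.fill_diagonal(z_imp, 0.0)
    return z_imp
```

Each unobserved entry is replaced by a probability rather than a 0/1 draw, in line with using the conditional expectation as the predictor.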
Practical Imputation: Method 2
A very simple approach is to calculate the proportion of $Z_{ij} = 1$ in $z^{obs}$ and use this proportion (the network density) to impute $Z^{mis}$
This corresponds to imputing on the basis of an ERGM defined by just the EDGES statistic, i.e. the number of edges in the network
Equivalent to assuming that each $Z_{ij}$ in the network matrix $Z_U$ is an independent Bernoulli variable with a common probability of a ‘success’
If the network model also contains exogenous effects, then this simple approach corresponds to imputation on the basis of the logistic regression model defined by these effects
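Density imputation takes only a few lines. The sketch assumes an undirected network and a symmetric boolean mask of observed dyads; the function name `impute_method2` is hypothetical.

```python
import numpy as np

def impute_method2(z_obs, observed):
    """Replace every unobserved dyad by the density of the observed dyads."""
    z_obs = np.asarray(z_obs, dtype=float)
    iu = np.triu_indices(z_obs.shape[0], 1)       # each dyad counted once (i < j)
    density = z_obs[iu][observed[iu]].mean()      # proportion of observed dyads with an edge
    z_imp = np.where(observed, z_obs, density)
    np.fill_diagonal(z_imp, 0.0)
    return z_imp, density

# Four nodes: sample s = {0, 1}, non-sample r = {2, 3}; the dyad (2, 3) is unobserved
z = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])
observed = np.ones((4, 4), dtype=bool)
observed[2, 3] = observed[3, 2] = False
z_imp, density = impute_method2(z, observed)
```

Here two of the five observed dyads carry an edge, so the unobserved dyad is imputed as 0.4.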
Simulation Study - Model Specification
Standard: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\beta_0 = 40$, $\beta_1 = 5$ and $X_i$ is drawn randomly from $1, \ldots, 9$
CN: $Y_i = \beta_0 + X_i \beta_1 + U_i \gamma + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\gamma = 2$ and $U_i$ is the contextual variable defined by the average value of $X$ for all individuals in the network that are linked to individual $i$
AR: $Y_i = \beta_0 + X_i \beta_1 + U_i \lambda + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\lambda = 0.5$ and $U_i$ is the average value of $Y$ for all individuals in the network that are linked to individual $i$
ND: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\varepsilon_i = U_i \lambda + v_i$; $v_i \sim N(0, 1)$ and $U_i$ is the average value of $\varepsilon$ for all individuals in the network that are linked to individual $i$
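One way to generate responses under the four models, given a row-normalised network matrix `W`, is sketched below. The function name is hypothetical. Two points are assumptions of this sketch rather than statements from the slides: $X_i$ is drawn uniformly from the integers 1 to 9, and the ND model reuses $\lambda = 0.5$ (no value is given for it above).

```python
import numpy as np

def simulate_response(W, model, rng, beta0=40.0, beta1=5.0, gamma=2.0, lam=0.5):
    """Draw X and Y for one population under the 'standard', 'CN', 'AR' or 'ND' model."""
    N = W.shape[0]
    x = rng.integers(1, 10, size=N).astype(float)   # uniform on 1, ..., 9 (assumption)
    noise = rng.normal(size=N)                       # N(0, 1) errors
    mean = beta0 + beta1 * x
    D = np.eye(N) - lam * W
    if model == "standard":
        y = mean + noise
    elif model == "CN":
        y = mean + gamma * (W @ x) + noise           # U_i = average X of linked individuals
    elif model == "AR":
        y = np.linalg.solve(D, mean + noise)         # solves y = mean + lam * W y + noise
    elif model == "ND":
        y = mean + np.linalg.solve(D, noise)         # errors solve eps = lam * W eps + v
    else:
        raise ValueError(f"unknown model: {model}")
    return x, y
```

The AR and ND responses are defined implicitly, since $U_i$ depends on the other responses or errors, so they are obtained by solving a linear system rather than by direct substitution.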
Simulation Study - Network Specification
Two types of networks were simulated
An ERGM network. Here $Z_U$ was generated as a random draw from an ERGM with a density of about 15 network links for each subject (EDGES coefficient equal to −4.18 on the logit scale) and a weight parameter of $\theta = 1$ for the GWESP statistic
A Gang network, where $Z_U$ defined a network of 100 ‘gangs’, each of size 10. In this network each gang member knows every other member of his/her gang and nobody else, so $Z_U$, after re-ordering rows, is block diagonal. This is analogous to the network defined by members of the same household.
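The Gang network is simple to construct explicitly. A minimal sketch (the function name `gang_network` is hypothetical) builds one fully connected block and repeats it along the diagonal:

```python
import numpy as np

def gang_network(n_gangs=100, gang_size=10):
    """Block diagonal adjacency matrix: everyone is linked to every other member of their gang."""
    block = np.ones((gang_size, gang_size), dtype=int) - np.eye(gang_size, dtype=int)
    return np.kron(np.eye(n_gangs, dtype=int), block)

Z = gang_network()   # 1,000 x 1,000 matrix, 9 links per individual
```

Row-normalising this matrix gives weights of 1/9 on the other members of each individual's gang.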
Simulation Study - Characteristics & Notation
Population of size $N = 1{,}000$ was independently simulated 2,000 times
Independent simple random samples of size $n = 100$ and $n = 200$ were independently selected without replacement from each simulated population
SS denotes the case where only $Z_{ss}$ is observed
SS+SR/1 denotes where $Z_{ss}$ and $Z_{sr}$ are observed and imputation method 1 is used
SS+SR/2 denotes where $Z_{ss}$ and $Z_{sr}$ are observed and imputation method 2 is used
Table: Monte Carlo Bias of EBLUP (average population total ≈ 65K); n = 100

                      ERGM network (true model)  Gang network (true model)
Prediction based on         CN      AR      ND      CN      AR      ND
BLUP                      1.81    2.19    2.27    2.14   −2.11   −1.93
Full network known        1.81    2.50    2.14    2.14   −1.84   −1.83
CN  SS                    1.96    3.06    2.22    2.90    0.93   −1.12
CN  SS+SR/1               0.92    2.11    2.07       –       –       –
CN  SS+SR/2               1.05    2.37    2.12    2.10   −1.24   −1.17
AR  SS                    0.51    2.29    2.08    2.34   −1.57   −0.84
AR  SS+SR/1               1.94    2.71    2.08       –       –       –
AR  SS+SR/2               1.26    1.27    2.05    2.24   −3.47   −1.15
ND  SS                    0.78    1.48    2.20    2.39   −1.54   −1.63
ND  SS+SR/1               0.80    1.79    2.23       –       –       –
ND  SS+SR/2               0.85    1.49    2.13    3.09    14.1   −1.83
Standard model            0.78    1.59    2.13    2.98    1.21   −1.06
Table: Monte Carlo MSE of EBLUP relative to MSE of BLUP; n = 100

                      ERGM network (true model)  Gang network (true model)
Prediction based on         CN      AR      ND      CN      AR      ND
BLUP - actual MSE        9,390   8,739   8,736   9,421  12,330  12,315
Full network known        1.00    1.10    1.02    1.00    1.08    1.01
CN  SS                    2.28    3.38    1.00    2.66    9.57    1.05
CN  SS+SR/1               1.19    1.39    1.00       –       –       –
CN  SS+SR/2               1.14    1.31    1.00    1.01    1.07    1.06
AR  SS                    2.30    3.34    1.01    2.57    8.67    1.05
AR  SS+SR/1               1.45    1.42    1.00       –       –       –
AR  SS+SR/2               1.31    1.30    1.00    1.24    1.10    1.06
ND  SS                    2.30    3.52    1.02    2.54    8.79    1.01
ND  SS+SR/1               2.30    3.48    1.02       –       –       –
ND  SS+SR/2               2.30    3.49    1.02    3.14    11.2    1.01
Standard model            2.30    3.50    1.00    2.87    10.5    1.04
Table: Monte Carlo average length of nominal 95% Gaussian confidence interval generated by EBLUP relative to that generated by BLUP; n = 100; corresponding coverage (%) shown in parentheses

                        ERGM network (true model)                Gang network (true model)
Prediction based on            CN           AR           ND           CN           AR           ND
BLUP - average length  370 (94.5)   382 (96.2)   382 (96.4)   370 (94.0)   418 (93.9)   418 (93.4)
Full network known    1.00 (94.4)  0.99 (95.0)  0.98 (95.3)  1.00 (94.0)  1.00 (92.8)  1.27 (94.1)
CN  SS                1.54 (93.5)  1.79 (95.4)  0.99 (95.5)  1.65 (94.4)  3.07 (91.8)  1.00 (93.2)
CN  SS+SR/1           1.00 (92.3)  1.01 (92.2)  0.99 (95.6)            –            –            –
CN  SS+SR/2           1.00 (92.9)  1.01 (92.7)  0.99 (95.8)  1.00 (93.7)  1.01 (93.0)  1.01 (93.7)
AR  SS                1.54 (94.1)  1.77 (95.3)  0.99 (95.4)  1.62 (94.2)  2.89 (92.4)  1.00 (93.4)
AR  SS+SR/1           1.10 (92.2)  1.01 (92.2)  0.99 (95.5)            –            –            –
AR  SS+SR/2           1.10 (93.1)  1.01 (92.8)  0.99 (95.6)  1.06 (93.7)  0.99 (92.4)  1.00 (93.4)
ND  SS                1.59 (94.8)  1.83 (95.7)  0.98 (95.1)  1.59 (93.5)  2.93 (92.3)  0.98 (93.2)
ND  SS+SR/1           1.59 (94.8)  1.77 (94.4)  0.97 (94.4)            –            –            –
ND  SS+SR/2           1.60 (94.8)  1.80 (95.2)  0.98 (94.9)  2.01 (90.1)  2.88 (89.5)  1.27 (94.1)
Standard model        1.59 (94.8)  1.84 (96.0)  0.99 (95.7)  1.72 (93.8)  3.23 (92.8)  1.01 (93.6)
Tentative Answers to Three Questions
(a) Is embedding social network information into linear models useful for survey estimation based on these models?
Yes
(b1) Which network-based linear models are potentially useful?
CN and AR models are useful when either model is true, since in both cases the mean of the response depends on the network
Ignoring the network does not result in a significant loss of efficiency when the ND model is true
(b2) How much network data needs to be collected in order to obtain potentially higher precision for survey estimation?
Both $Z_{ss}$ and $Z_{sr}$ must be available in order to obtain efficiency gains. Knowledge of $Z_{ss}$ alone is not enough
A Recommendation & A Caution
The AR model can be difficult to fit, see Suesse (2012), so we recommend that the CN model be used if it is a reasonable fit to the data and relevant population level auxiliary network data are available. Otherwise ignoring the network might be the best option
Note that we have assumed that the method of sampling is independent of the network structure given the available population auxiliary information
There are important applications, see Thompson and Seber (1996), where inclusion in sample depends on being linked to another sampled individual via a network
In these cases we cannot treat the observed network structure in $Z_{ss}$ and $Z_{sr}$ as ancillary (as we have here), and this ‘informative’ method of sampling needs to be taken into account when we impute the unknown components of $Z_U$
References
Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004) Hierarchical modelling and analysis for spatial data. Boca Raton, Fla.: Chapman & Hall/CRC Press.
Doreian, P., Teuter, K. and Wang, C. H. (1984) Network auto-correlation models - some monte-carlo results. Sociological Methods & Research 13, 155–200.
Duke, J. B. (1993) Estimation of the network effects model in a large data set. Sociological Methods & Research 21, 465–481.
Friedkin, N. E. (1990) Social networks in structural equation models. Social Psychology Quarterly 53, 316–328.
Lazega, E. (2001) The Collegial Phenomenon: The Social Mechanism of Cooperation Among Peers in a Corporate Law Partnership. Oxford: Oxford University Press.
Leenders, R. (2002) Modeling social influence through network autocorrelation: constructing the weight matrix. Social Networks 24, 21–47.
Ord, K. (1975) Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120–126.
Suesse, T. (2012) Estimation in autoregressive population models. In Proceedings of Fifth Annual ASEARC Research Conference. University of Wollongong: ASEARC. 2-3 February 2012.
Thompson, S. K. and Seber, G. A. F. (1996) Adaptive sampling. Wiley series in probability and mathematical statistics. New York: Wiley.