
Estimating Trade Elasticities Using Product-Level Data

(Very preliminary; please do not cite)

Juyoung Cheong*, Do Won Kwak†, and Kam Ki Tang‡

Abstract

This paper estimates trade cost (tariff) elasticities using bilateral tariff data at the HS 6-digit level for 47 countries from 2002 to 2010. It extends the Helpman et al. (2008) model to incorporate firms’ fixed costs of exporting that vary at the pair-product level. We control for firm heterogeneity and product-level multilateral resistance directly within a multi-way error components framework, treating them as high-dimensional unobserved factors at the most disaggregated level (5,310 products). These unobserved factors are removed by a novel use of the within estimator, because the conventional within estimator in a multi-way error components model is inconsistent with unbalanced data. The empirical results show substantial upward bias (in absolute value) in the estimates of trade cost elasticities in the literature. Properly accounting for zero trade flows and firm heterogeneity using more disaggregated data yields significantly smaller estimates of trade cost elasticities, which imply much larger welfare gains from trade.

JEL Code: C23, F14

Keywords: Trade Cost Elasticity, Tariff, Unobserved Firm Heterogeneity, Within Estimator

*School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]
†School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]
‡School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]


1 Introduction

Trade elasticities are key parameters for quantifying trade costs and the welfare gains from trade. Arkolakis et al. (2012) show that, for a range of trade models, the gains from trade can be estimated using two variables: the import-penetration ratio (or domestic share) and the trade elasticity with respect to variable trade costs obtained from a gravity equation.¹ Import-penetration ratios are regularly compiled by national statistical agencies and are readily available across countries. By contrast, the trade elasticity is not directly observable and, in general, must be estimated. Precise estimation of trade elasticities is, however, difficult because of data limitations. In this paper, we approximate trade elasticities by the responsiveness of imports to tariff changes. We identify trade elasticities from tariff changes, treating them as representative of all sources of price change, because tariffs suffer less from measurement error than other variable trade cost elements such as transportation costs.²

Many studies have attempted to estimate trade elasticities empirically (e.g., Anderson (1979), Harrigan (1993), Hummels (2001), Baier and Bergstrand (2001), Broda and Weinstein (2006), and Bergstrand et al. (2013)).

Unlike most of these studies, which estimate the elasticities using product- or firm-level data for a single exporting country because of the limited availability of disaggregated data, this study uses bilateral tariff and trade flow data for 5,310 products among 47 countries. Using data for many country pairs at the most disaggregated level is an advance for two reasons. On the one hand, data for a single country cannot account for unobserved heterogeneity at the importer level, which includes multilateral resistance terms (MRTs) as in Anderson and van Wincoop (2004) (denoted as AvW (2004) hereafter). On the other hand, using aggregate data is even more problematic, because two potentially serious biases can arise in the estimation of trade elasticities. First, at any given time tariffs vary substantially across products within each country, but aggregation discards the information on substitution between products, causing severe bias in the trade elasticity estimates.³ Second, unfiltered bilateral trade data typically feature a large proportion of zero observations, and aggregation hides many of these zero flows. Information on zero trade flows at the product level is lost in aggregate data, and as a result firms’ self-selection into exporting at the product level cannot be addressed. Previous studies also point out problems associated with zero observations. For instance, AvW (2004) and Broda and Weinstein (2006) illustrate that ignoring zero trade flows when estimating trade cost elasticities in a gravity equation with aggregate data can cause upward bias.

¹ Arkolakis et al. (2012) note that the perfect competition models in Anderson and van Wincoop (2003) and Eaton and Kortum (2002) and the monopolistic competition models in Melitz (2003) are all in a broad class of models sharing Dixit–Stiglitz preferences, one factor of production, linear cost functions, complete specialization, and iceberg trade costs. They show that if three “macro-level” restrictions hold, then all four models will share a common estimator of the gains from trade, which depends only on the import-penetration ratio and a gravity-equation-based estimate of the “trade-cost” elasticity of trade flows (of which $\sigma_v$ is a measure in many models). (Bergstrand et al. (2013), p. 111)

² We assume that a tariff is a form of variable trade cost, and that a change in tariffs has the same effect on trade flows as an equal change in any other source of variable costs, such as transportation costs, due to the fungibility of capital. Focusing on trade cost changes caused by tariff changes is beneficial, especially at a very disaggregated product level, because transportation and other costs are very hard to obtain without serious measurement error, while tariffs are measured more precisely in a relative sense.

Our elasticity estimates are most comparable to Broda and Weinstein’s (2006) estimates of the elasticities of substitution, which are typically interpreted as trade cost elasticities. However, unlike Broda and Weinstein (2006), who make simplifying assumptions on the demand and supply structure to implement their method empirically, our model directly addresses factors such as firm heterogeneity and other unobserved determinants of firms’ export market entry decisions without making any simplifying assumptions on the demand and supply structure.

In this paper, we propose a method to estimate trade cost (tariff) elasticities using product-level data that avoids biases from three sources: (i) pile-ups at zero trade flows, (ii) heterogeneous pair-product fixed costs of exporting, and (iii) product-level MRTs. Our estimation results show that with unbalanced data it is crucial to model the mechanism through which zero observations emerge. This issue is particularly important for product-level bilateral trade data, in which more than half of the observations can be zero, as countries typically trade only specific products with individual partners. Helpman et al. (2008) (denoted as HMR hereafter) show that zero trade flows in aggregate data result from firms’ self-selection out of foreign (export) markets due to firms’ fixed costs of exporting, and suggest a two-stage procedure to account for the bias caused by this self-selection. However, their method is difficult to implement with product-level data, because the exclusion restriction in their first-stage estimation requires a variable that determines firms’ export market entry decisions but does not affect the volume of exports.⁴ An innovation of this paper is that it extends the HMR approach of controlling for firms’ fixed costs of exporting to a product-level model, using the within estimator for unbalanced data in multi-way error components models.

³ Anderson and van Wincoop (2004) demonstrate with a numerical example that “the elasticity of substitution at the more aggregated level is entirely irrelevant” (p. 727). They advise that “one should choose elasticities at a sufficiently disaggregated level at which firms truly compete.”

The proposed estimator is attractive for two reasons. First, it allows us to account for various unobserved factors without requiring any exclusion restriction. Second, unlike the conventional within estimator, which is inconsistent with unbalanced data, it can provide a consistent estimator even when applied directly to multiple error components with high-dimensional indexes (e.g., importer-product-year, exporter-product-year, importer-exporter-product) in unbalanced data (Wansbeek and Kapteyn (1989); Baltagi et al. (2001); Davis (2002)). We examine the properties of the proposed within estimator with unbalanced data in multi-way error components models and show that it has the following main properties. First, it can completely remove only one error component without causing any inconsistency. Second, the order of demeaning matters for the magnitude of the inconsistency, in the sense that only the first demeaning removes its error component completely, while the second and subsequent demeanings remove error components only partially and cause inconsistency. Third, using partially balanced data, i.e., data that are balanced in only two or three of the four dimensions, we can obtain certain within estimators that remove more than one error component completely. Using these properties of the within estimator, we show how to obtain a consistent estimate of trade cost elasticities from the unbalanced panel data that are typically available for bilateral trade. In particular, within estimators with partially balanced data can avoid the biases from unbalanced data as well as the computational difficulty of dealing with high-dimensional unobserved factors, while directly accounting for unobserved product-level heterogeneity using multiple high-dimensional fixed effects.

⁴ Recent theoretical studies such as Chaney (2008) and Krautheim (2012) pay attention to the role of fixed costs in heterogeneous firms’ entry decisions in new export markets, which is empirically supported by Koenig et al. (2010).

This paper contributes to the literature in the following ways. First, because bilateral tariff data at this level are not readily available, we construct a new bilateral tariff dataset for 47 countries and about 5,310 products at the HS 6-digit level from 2002 to 2010. Few studies have used product-level tariff data for multiple countries to estimate trade cost elasticities.⁵ In the corresponding trade flow data, however, about 70% of the observations are zero, meaning that many pairs trade only a limited range of products. Disaggregated product-level trade data (e.g., SITC 4- or 5-digit, HS 6-digit) for multiple countries’ trading partners are publicly available from sources such as WITS and UN Comtrade, but typically only positive flows are used in estimation.⁶

Second, using multi-way error components models and within estimators, we explicitly account for the factors important for firms’ foreign market entry decisions, including pair-product-level fixed costs and pair-year-specific fixed costs, following HMR and Chaney (2008). In our dataset, tariffs and trade flows vary across four dimensions, namely importing country, exporting country, product, and year. This allows us to directly control for (i) unobserved demand-side factors such as consumer tastes using importer-product-year fixed effects, (ii) unobserved supply-side factors such as productivity changes using exporter-product-year fixed effects, (iii) unobserved pair-product-specific fixed costs of exporting such as information costs using importer-exporter-product fixed effects, and (iv) unobserved pair-time-specific fixed costs such as political alliances using importer-exporter-year fixed effects. To our knowledge, our empirical strategy is the most comprehensive in accounting for these four distinct unobserved factors that matter for the estimation of trade elasticities. Our novel use of the within estimator with partially balanced data also avoids both the inconsistency of the conventional within estimator and the computational burden of including a large number of dummy variables for high-dimensional unobserved factors. As a result, we can perform regression analyses with product-level data while avoiding the omitted variable bias (OVB) caused by product-level MRTs, as emphasized in Anderson and Yotov (2011, 2012), and by pair-product-level unobserved fixed costs arising from information barriers, interest-group lobbying, and government red tape, to name just a few, as modelled in Chaney (2008) and Krautheim (2012).

⁵ Exceptions are Hummels (2001), which uses 1992 sectoral import data for six countries, and Caliendo and Parro (2014), which estimates trade elasticities for 20 sectors using 1993 data for 30 countries.

⁶ See Broda and Weinstein (2006), Baier et al. (2014), and Dutt et al. (2013) for examples.

Our empirical results show that ignoring zero observations, or any one of the demand-side factors, supply-side factors, product-level MRTs, or fixed costs at the disaggregated level, can lead to overestimating the trade elasticities in absolute value (i.e., the absolute values of the estimates exceed their ‘true values’). This implies that welfare gains from trade are much larger when measured with a trade cost elasticity estimated from product-level data that include zero observations and proper controls. As an illustration, we provide examples of welfare effects computed from trade cost elasticities and import-penetration ratios following Arkolakis et al. (2012), and compare our results with those from previous studies. Furthermore, our results imply that substantial heterogeneity exists in the elasticity estimates across products. We examine this heterogeneity across products that differ in market structure and in the skill level of their inputs.
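To make the welfare link concrete, the sketch below evaluates the Arkolakis et al. (2012) formula, assuming their result that the change in real income is $\hat{W} = \hat{\lambda}^{1/\varepsilon}$, where $\lambda$ is the domestic expenditure share and $\varepsilon = 1-\sigma < 0$ is the trade elasticity. Moving to autarky sets $\lambda$ to one, so $\hat{\lambda} = 1/\lambda$ and the real-income loss is $1 - \lambda^{-1/\varepsilon}$. The function name and all input values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
def acr_gains_from_trade(domestic_share: float, trade_elasticity: float) -> float:
    """Real-income loss from moving to autarky, 1 - lambda**(-1/epsilon).

    Assumes the ACR (2012) formula W_hat = lambda_hat**(1/epsilon), with
    epsilon = 1 - sigma < 0 and lambda the domestic expenditure share.
    """
    if not 0.0 < domestic_share <= 1.0:
        raise ValueError("domestic share must lie in (0, 1]")
    if trade_elasticity >= 0.0:
        raise ValueError("trade elasticity (1 - sigma) must be negative")
    return 1.0 - domestic_share ** (-1.0 / trade_elasticity)


if __name__ == "__main__":
    lam = 0.9  # hypothetical domestic expenditure share
    for eps in (-5.0, -2.5):  # hypothetical elasticities; smaller |eps| -> larger gains
        print(eps, round(acr_gains_from_trade(lam, eps), 4))
```

With these illustrative numbers, halving the absolute elasticity from 5 to 2.5 roughly doubles the implied loss from autarky (about 2.1% versus 4.1%), which is the mechanism behind the statement that smaller elasticity estimates imply larger welfare gains.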

The rest of the paper is organized as follows. Section 2 extends the HMR model to derive an estimable product-level gravity equation in the form of a multi-way error components model. Section 3 describes the data. Section 4 explains our empirical framework and presents the estimation results. The last section provides concluding remarks and policy implications. Simulations and sensitivity analyses for the proposed estimator are provided in the Appendix.

2 Gravity Model

2.1 Gravity Model at the Sectoral Level

In this section, we derive the gravity model for sectoral (or product-level) estimation. We adopt the HMR model but extend it to multiple sectors indexed by $k$ ($k = 1, 2, \dots, K$). For aggregate data, $K = 1$; for HS 2-digit data, $K = 98$; and for HS 6-digit data, $K \approx 5{,}400$. Because each sector produces one product (with multiple varieties), in what follows we use “sector $k$” and “product $k$” interchangeably. As in Chaney (2008) and Arkolakis et al. (2012), we further assume that the consumer’s utility function is Cobb-Douglas across products and Dixit-Stiglitz across varieties within a product.

A representative consumer in country $j$ ($j = 1, 2, \dots, J$) has the following utility function at time $t$:

$$U_{jt} = \prod_{k=1}^{K} \left( \int_{h \in H_{jkt}} x_{jkt}(h)^{(\sigma_k - 1)/\sigma_k} \, dh \right)^{\theta_k \sigma_k / (\sigma_k - 1)} \tag{1}$$

where $x_{jkt}(h)$ is the consumption of differentiated variety $h$ of product $k$ available in $j$, $H_{jkt}$ is the set of varieties of product $k$ available in country $j$, $\sigma_k > 1$ is the elasticity of substitution between varieties of product $k$ and is allowed to vary across products, and $\theta_k$ is the share of income $Y_{jt}$ spent on goods from sector $k$, with $\sum_{k=1}^{K} \theta_k = 1$.

Although consumption can change over time as $Y_{jt}$ changes, for simplicity we assume no borrowing or lending, so that the consumer maximizes utility period by period rather than intertemporally. Unlike in Alessandria et al. (2014), there are also no dynamics on the production side. Then, with country $j$’s income $Y_{jt}$ equal to its expenditure in every period, we obtain country $j$’s demand for variety $h$ of product $k$ from (1):

$$x_{jkt}(h) = \frac{p^{*}_{jkt}(h)^{-\sigma_k} \, E_{jkt}}{P_{jkt}^{1-\sigma_k}} \tag{2}$$

where $p^{*}_{jkt}(h)$ is the price of variety $h$ of product $k$ in country $j$, $E_{jkt} = \theta_k Y_{jt}$ is $j$’s expenditure on goods from sector $k$ from all origins, and $P_{jkt}$ is the price index associated with the varieties of sector $k$ in $j$, given by

$$P_{jkt} = \left[ \int_{h \in H_{jkt}} p^{*}_{jkt}(h)^{1-\sigma_k} \, dh \right]^{1/(1-\sigma_k)} \tag{3}$$
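As a quick consistency check on (2) and (3), the sketch below replaces the continuum of varieties with a finite list (an assumption made only for illustration) and verifies two standard CES properties: spending on all varieties adds up to $E_{jkt}$, and relative demand for two varieties equals their relative price raised to $-\sigma_k$. The function names and the prices are hypothetical.

```python
def ces_price_index(prices, sigma):
    """Discrete analogue of eq. (3): P = (sum p^(1 - sigma))^(1 / (1 - sigma))."""
    return sum(p ** (1.0 - sigma) for p in prices) ** (1.0 / (1.0 - sigma))


def ces_demand(prices, sigma, expenditure):
    """Discrete analogue of eq. (2): x(h) = p(h)^(-sigma) * E / P^(1 - sigma)."""
    P = ces_price_index(prices, sigma)
    return [p ** (-sigma) * expenditure / P ** (1.0 - sigma) for p in prices]


if __name__ == "__main__":
    prices = [1.0, 1.5, 2.0, 3.0]  # hypothetical variety prices
    sigma, E = 4.0, 100.0          # hypothetical elasticity (> 1) and sector expenditure
    x = ces_demand(prices, sigma, E)
    spending = sum(p * q for p, q in zip(prices, x))
    print(round(spending, 10))                                # equals E up to rounding
    print(x[0] / x[1], (prices[0] / prices[1]) ** (-sigma))   # the two ratios coincide
```

The adding-up property holds because $\sum_h p_h x_h = E \sum_h p_h^{1-\sigma_k} / P^{1-\sigma_k} = E$ by the definition of the price index.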

Using firms’ standard markup pricing rule in the monopolistic competition model, we obtain, for each product $k$, essentially the same expression as HMR (2008) for $i$’s imports of product $k$ from $j$:

$$T_{ijkt} = \left( \frac{c_{jkt} \, \tau_{ijkt}}{\alpha_k P_{ikt}} \right)^{1-\sigma_k} E_{ikt} N_{jkt} V_{ijkt} \tag{4}$$

where $T_{ijkt}$ is country $i$’s imports of product $k$ from country $j$, $N_{jkt}$ is the number of firms in sector $k$ in $j$, and $V_{ijkt}$ is a monotone function of the fraction of $j$’s firms in sector $k$ that export to $i$ (which depends on firms’ relative productivity and fixed export costs to $i$). $c_{jkt}$ is a measure of the average productivity of firms in sector $k$ in $j$, but individual firms differ in productivity according to $a_{kt}$; $\tau_{ijkt}$ is the variable trade cost for sector $k$ exports from $j$ to $i$; and $(1-\sigma_k)$ is the trade elasticity with respect to variable trade costs for product $k$.

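As in HMR, we also have

$$P_{ikt}^{1-\sigma_k} = \sum_{j=1}^{J} \left( \frac{c_{jkt} \, \tau_{ijkt}}{\alpha_k} \right)^{1-\sigma_k} N_{jkt} V_{ijkt} \tag{5}$$

$$V_{ijkt} = \int_{a_{Lkt}}^{a_{ijkt}} a_{kt}^{1-\sigma_k} \, dG(a_{kt}) = \frac{\gamma \, a_{Lkt}^{\gamma-\sigma_k+1}}{(\gamma-\sigma_k+1)\left(a_{Hkt}^{\gamma} - a_{Lkt}^{\gamma}\right)} \, W_{ijkt} \tag{6}$$

$$W_{ijkt} = \max\left\{ \left( \frac{a_{ijkt}}{a_{Lkt}} \right)^{\gamma-\sigma_k+1} - 1, \; 0 \right\} \tag{7}$$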

In their baseline specification, HMR assume that firm productivity $1/a_{kt}$ follows a truncated Pareto distribution, with $a_{kt}$ supported on $[a_{Lkt}, a_{Hkt}]$, so that $G(a_{kt}) = \left(a_{kt}^{\gamma} - a_{Lkt}^{\gamma}\right) / \left(a_{Hkt}^{\gamma} - a_{Lkt}^{\gamma}\right)$ with $\gamma > \sigma_k - 1$. $P_{ikt}$ corresponds to $i$’s MRT in product $k$ as introduced by Anderson and van Wincoop (2003), but it additionally takes into account the impact of firm heterogeneity (due to the fixed costs of exporting) on the prices of product $k$.
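The closed form in (6) and (7) can be checked numerically. The sketch below, under the truncated Pareto assumption just stated, compares the closed-form $V_{ijkt}$ with a direct composite Simpson's-rule integration of $\int a^{1-\sigma_k}\,dG(a)$ using the density $G'(a) = \gamma a^{\gamma-1}/(a_{Hkt}^{\gamma}-a_{Lkt}^{\gamma})$. The parameter values and function names are hypothetical, and the export cutoff $a_{ijkt}$ is assumed to lie below $a_{Hkt}$.

```python
def v_closed_form(a_cut, a_low, a_high, sigma, gamma):
    """Eqs. (6)-(7): gamma*a_L^e / (e*(a_H^gamma - a_L^gamma)) * W, e = gamma - sigma + 1."""
    e = gamma - sigma + 1.0
    w = max((a_cut / a_low) ** e - 1.0, 0.0)  # eq. (7): zero if no firm clears the cutoff
    return gamma * a_low ** e / (e * (a_high ** gamma - a_low ** gamma)) * w


def v_numeric(a_cut, a_low, a_high, sigma, gamma, n=2000):
    """Composite Simpson's rule for the integral in eq. (6); n must be even."""
    if a_cut <= a_low:
        return 0.0
    norm = a_high ** gamma - a_low ** gamma

    def integrand(a):
        # a^(1 - sigma) times the truncated Pareto density gamma * a^(gamma - 1) / norm
        return a ** (1.0 - sigma) * gamma * a ** (gamma - 1.0) / norm

    h = (a_cut - a_low) / n
    total = integrand(a_low) + integrand(a_cut)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * integrand(a_low + m * h)
    return total * h / 3.0


if __name__ == "__main__":
    params = dict(a_low=0.5, a_high=2.0, sigma=3.5, gamma=4.2)  # gamma > sigma - 1
    print(v_closed_form(1.2, **params), v_numeric(1.2, **params))
```

The two values agree to high precision, and both are zero when the cutoff falls below $a_{Lkt}$, which is how (7) generates zero trade flows.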

Taking logs of (4) yields the following estimable equation:

$$\ln T_{ijkt} = (\sigma_k-1)\ln \alpha_k - (\sigma_k-1)\ln c_{jkt} + \ln N_{jkt} + (\sigma_k-1)\ln P_{ikt} + \ln E_{ikt} - (\sigma_k-1)\ln \tau_{ijkt} + \ln V_{ijkt} \tag{8}$$

where $P_{ikt}$, $N_{jkt}$, and $c_{jkt}$ are unobserved but can be accounted for using (time-varying) exporter-sector-specific and importer-sector-specific factors. Since $W_{ijkt}$ is a monotonic function of $V_{ijkt}$, we obtain the following estimable equation:

$$\ln T_{ijkt} = -(\sigma_k-1)\ln \tau_{ijkt} + \ln W_{ijkt} + u^{(1)}_{ikt} + v^{(1)}_{jkt} \tag{9}$$

where $u^{(1)}_{ikt} = (\sigma_k-1)\ln P_{ikt} + \ln E_{ikt} + (\sigma_k-1)\ln \alpha_k$ and $v^{(1)}_{jkt} = \ln N_{jkt} - (\sigma_k-1)\ln c_{jkt} + \ln\left(\gamma a_{Lkt}^{\gamma-\sigma_k+1}\right) - \ln(\gamma-\sigma_k+1) - \ln\left(a_{Hkt}^{\gamma} - a_{Lkt}^{\gamma}\right)$.
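The step from (8) to (9), which is left implicit, follows from taking logs of (6). This worked step is added for clarity and holds only where $W_{ijkt} > 0$, i.e., where some firms export:

$$\ln V_{ijkt} = \underbrace{\ln\left(\gamma a_{Lkt}^{\gamma-\sigma_k+1}\right) - \ln(\gamma-\sigma_k+1) - \ln\left(a_{Hkt}^{\gamma}-a_{Lkt}^{\gamma}\right)}_{\text{varies only over } (k,\,t)} + \ln W_{ijkt}$$

The bracketed terms do not vary with $i$ or $j$, so they are absorbed into $v^{(1)}_{jkt}$ together with $\ln N_{jkt} - (\sigma_k-1)\ln c_{jkt}$, while the importer-side terms of (8) form $u^{(1)}_{ikt}$.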

Here we depart from HMR and further assume that $\ln W_{ijkt}$ consists of several additively separable components:

$$\ln W_{ijkt} = \omega^{(2)}_{ijk} + \eta^{(2)}_{ijt} + u^{(2)}_{ikt} + v^{(2)}_{jkt} + \varepsilon^{(2)}_{ijkt} \tag{10}$$

where $\omega^{(2)}_{ijk}$ represents time-invariant pair-sector-specific fixed costs, $\eta^{(2)}_{ijt}$ is a pair-time-specific random shock to fixed costs, $u^{(2)}_{ikt}$ is an importer-sector-time-specific random shock to fixed costs, $v^{(2)}_{jkt}$ is an exporter-sector-time-specific random shock to fixed costs, and $\varepsilon^{(2)}_{ijkt}$ captures all remaining random elements that could affect firms’ export market entry decisions. Together, these five factors determine the fraction of firms in sector $k$ that export from $j$ to $i$ at time $t$.

Likewise, we further assume that, apart from tariffs, variable trade costs consist of five additively separable non-tariff factors:

$$\ln \tau_{ijkt} = \ln(1+\mathit{tariff}_{ijkt}) + \omega^{(3)}_{ijk} + \eta^{(3)}_{ijt} + u^{(3)}_{ikt} + v^{(3)}_{jkt} + \varepsilon^{(3)}_{ijkt} \tag{11}$$

where $\omega^{(3)}_{ijk} + \eta^{(3)}_{ijt} + u^{(3)}_{ikt} + v^{(3)}_{jkt} + \varepsilon^{(3)}_{ijkt}$ is a linear approximation of the non-tariff-related variable trade costs.

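Substituting (10) and (11) into (9), we obtain the following equation:

$$\ln T_{ijkt} = \alpha_0 - (\sigma_k-1)\ln(1+\mathit{tariff}_{ijkt}) + \omega_{ijk} + \eta_{ijt} + u_{ikt} + v_{jkt} + \varepsilon_{ijkt} \tag{12}$$

where $\omega_{ijk} \equiv \omega^{(2)}_{ijk} - (\sigma_k-1)\omega^{(3)}_{ijk}$, $\eta_{ijt} \equiv \eta^{(2)}_{ijt} - (\bar\sigma-1)\eta^{(3)}_{ijt}$, $u_{ikt} \equiv u^{(1)}_{ikt} + u^{(2)}_{ikt} - (\sigma_k-1)u^{(3)}_{ikt}$, $v_{jkt} \equiv v^{(1)}_{jkt} + v^{(2)}_{jkt} - (\sigma_k-1)v^{(3)}_{jkt}$, and $\varepsilon_{ijkt} \equiv \varepsilon^{(2)}_{ijkt} - (\sigma_k-1)\varepsilon^{(3)}_{ijkt} + (\bar\sigma-\sigma_k)\eta^{(3)}_{ijt}$, where $\bar\sigma$ is the average of $\sigma_k$. As $\omega_{ijk}$, $\eta_{ijt}$, $u_{ikt}$, and $v_{jkt}$ are unobservable, we model them using fixed effects.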
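Why the term $(\bar\sigma-\sigma_k)\eta^{(3)}_{ijt}$ appears in $\varepsilon_{ijkt}$ can be seen by splitting the pair-time part of $-(\sigma_k-1)\ln\tau_{ijkt}$ into a component common to all products and a product-specific deviation (a worked identity, added for clarity):

$$-(\sigma_k-1)\,\eta^{(3)}_{ijt} = \underbrace{-(\bar\sigma-1)\,\eta^{(3)}_{ijt}}_{\text{varies over } (i,\,j,\,t)} + \underbrace{(\bar\sigma-\sigma_k)\,\eta^{(3)}_{ijt}}_{\text{varies over } (i,\,j,\,k,\,t)}$$

Only the first term can be absorbed by the pair-time effect $\eta_{ijt}$; the second varies with $k$ through $\sigma_k$ and is therefore left in the idiosyncratic error. The other components need no such split, because $\omega^{(3)}_{ijk}$, $u^{(3)}_{ikt}$, and $v^{(3)}_{jkt}$ already carry the index $k$.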

Here $(\sigma_k-1)$ can be consistently estimated if we assume

$$\operatorname{Cov}\left(\ln(1+\mathit{tariff}_{ijkt}), \, \varepsilon_{ijkt} \mid \omega_{ijk}, \eta_{ijt}, u_{ikt}, v_{jkt}\right) = 0 \tag{13}$$

That is, we assume that once we have controlled for all $ijt$-, $ijk$-, $ikt$-, and $jkt$-varying factors, the remaining trade cost factors that vary simultaneously in all four dimensions $i$, $j$, $k$, and $t$ (i.e., $\varepsilon_{ijkt}$) are uncorrelated with $\mathit{tariff}_{ijkt}$. One scenario in which this assumption is violated is when a country idiosyncratically lowers its tariffs on certain imports from a specific source country, for instance as a result of trade agreement negotiations, and then assists the affected industries or erects non-tariff barriers on those products as compensation (Ray, 1981; Limao and Tovar, 2011). However, as long as the assistance or non-tariff barriers are industry-specific rather than both industry- and source-specific, they are controlled for by $u_{ikt}$.

Also, unlike in the HMR method, the consistency condition (13) is not conditional on $T_{ijkt} > 0$. To incorporate zero trade flows directly in the estimation, we replace $\ln T_{ijkt}$ in (9) with $\ln(T_{ijkt}+1)$ and use all zero observations in the estimation.

2.2 Self-Selection and Firm Heterogeneity

A key insight of HMR is that the zero trade flows commonly observed in bilateral trade data result from firms’ self-selection into (or out of) foreign markets, and that this decision is not uniform across countries because of firm heterogeneity.⁷ HMR suggest a two-stage procedure to account for self-selection and firm heterogeneity in aggregate data. The first-stage estimation requires an exclusion restriction, i.e., a variable that affects firms’ entry into a foreign market but not their trade volume.⁸ To extend the HMR approach to product-level models, the first stage would require an exclusion restriction variable that affects firms’ entry into individual product markets in a foreign country but not their trade volumes in each of those markets. However, finding an exclusion restriction that varies over pair-product, importer-product-time, and/or exporter-product-time is extremely difficult due to data limitations.

To avoid these data problems, we propose an estimator that allows us to bypass the first-stage estimation of HMR’s two-stage procedure while still controlling for self-selection and firm heterogeneity. We assume that the unobserved factors associated with self-selection and firm heterogeneity can be approximated by four additively separable factors, $\omega_{ijk}$, $\eta_{ijt}$, $u_{ikt}$, and $v_{jkt}$, so that we can write an estimable equation as a multi-way error components model. This model can then be estimated by a within estimator that treats these four unobserved factors as fixed effects.

When using disaggregated panel data, our estimator has several advantages over the two-stage HMR method. First, it avoids the data limitations mentioned above. Second, fixed effects are free from the measurement error and validity concerns associated with an exclusion restriction variable (e.g., the religion variable). Third, besides pair-time unobserved factors associated with self-selection and firm heterogeneity, it allows us to additionally control for pair-product, importer-product-time, and exporter-product-time unobserved factors associated with self-selection and firm heterogeneity. Lastly, the multi-dimensional fixed effects account for other unobserved factors as well. For instance, importer-product-time and exporter-product-time fixed effects additionally control for product-time-varying MRTs, as well as importers’ demand-side and exporters’ supply-side factors, respectively. These factors are crucial for obtaining unbiased estimates, but data on them are very difficult to collect at a fairly disaggregated product level.

⁷ As we use product-level data, we assume for the rest of the paper that firms’ self-selection into exporting occurs at the pair-product level.

⁸ HMR first consider a measure of bilateral regulation as the exclusion restriction variable. Besides the strong assumption that it is excludable from the trade volume equation, a practical limitation of this exclusion restriction is data availability, which prompts HMR to consider an alternative variable, an index of common religion between the two countries of a pair. Data on common religion were available only for sporadic periods until Maoz and Henderson (2013) constructed a world religion dataset for every five years from 1945 to 2010. Without diminishing the value of this improved dataset, we should be cautious about measurement error, given how challenging religion is to measure in the first place.

3 Data

We construct two datasets. The first is HS 6-digit import data from UN Comtrade for 47 countries from 2002 to 2010, covering about 5,310 products. The second is bilateral tariff data from UNCTAD’s TRAINS database for the same sample. More detailed explanations of the data are provided in the Appendix.

For each country pair and each product with available tariffs, unreported trade flow entries are replaced with zeros. Only positive observations are reported in our trade data. Because at the aggregate level every country in our sample imports from the other sample countries in all sample periods, we presume that unreported entries at the product level are zeros.
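The zero-imputation rule, together with the $\ln(T_{ijkt}+1)$ and $\ln(1+\mathit{tariff}_{ijkt})$ transformations used later, can be sketched as follows. The data layout (dictionaries keyed by importer, exporter, HS 6-digit code, and year), the country and product codes, and the treatment of tariffs as ad valorem rates in decimal form are hypothetical assumptions for illustration, not the authors’ actual data pipeline. Flows without a tariff record are dropped, mirroring the rule that zeros are imputed only where tariffs are available.

```python
import math


def build_panel(tariffs, flows):
    """Combine tariff records with reported (positive) import values.

    tariffs: {(importer, exporter, hs6, year): ad valorem rate, e.g. 0.05}
    flows:   {(importer, exporter, hs6, year): reported positive imports}
    Returns rows (i, j, k, t, T, ln(T + 1), ln(1 + tariff)).
    """
    rows = []
    for key, rate in sorted(tariffs.items()):
        trade = flows.get(key, 0.0)  # unreported entry -> zero trade flow
        rows.append((*key, trade, math.log1p(trade), math.log1p(rate)))
    return rows


if __name__ == "__main__":
    tariffs = {("AUS", "ARG", "020130", 2005): 0.0,   # hypothetical records
               ("AUS", "ARG", "100190", 2005): 0.05,
               ("AUS", "BRA", "020130", 2005): 0.0}
    flows = {("AUS", "ARG", "020130", 2005): 1250.0,
             ("AUS", "CHL", "080610", 2005): 90.0}    # no tariff record: not used
    for row in build_panel(tariffs, flows):
        print(row)
```

`math.log1p` computes $\ln(1+x)$ directly, so imputed zeros map to exactly zero on the log scale.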

Conceptually, balanced data mean that observations on both the dependent variable ($y$) and all explanatory variables ($X$) are available for all country pairs ($ij$), products ($k$), and years ($t$). In practice, bilateral trade data are never truly balanced. First, the data are ‘naturally unbalanced’ because countries do not trade with themselves, so $T_{ii}$ is not defined for any $i$. Second, in some cases we do not observe a country’s tariff rate for a specific product in any sample year. For example, for Argentina no tariffs are reported for about 20 of the 5,310 products. We leave these as missing to remain faithful to the data.⁹ Finally, over time some product codes are removed, combined with others, or newly introduced, which also creates missing data. We define partially balanced data to deal with this type of missing data.

⁹ These two types of missing data together make up less than 3% of all data. Because the missing proportion is very small, we regard the data as (almost) balanced in a practical sense if there are no other sources of missing data. We perform extensive simulation experiments with various data generating processes to examine the effect of missing data on our proposed within estimators; the results show no appreciable bias in the estimators. Results can be found in Kwak et al. (2012) for the three-way error components model and in the Appendix for the four-way error components model.

Besides balanced data, we also consider unbalanced and partially balanced data (i.e., data balanced in only some of the four dimensions) in order to compare our estimator with the conventional within estimator. The use of unbalanced data in most previous studies is inevitable because they use only positive trade flows, i.e., zero trade flows are not imputed in estimation. The degree of unbalancedness depends on the level of data aggregation, rising to about 70% for HS 6-digit data in our sample. We show that the bias of the conventional within estimator increases with the proportion of missing observations in unbalanced data, so more care is required when using the within estimator with disaggregated data.
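The role of missing cells can be illustrated with a small simulation. The sketch below applies successive demeaning over $t$, $k$, $j$, and $i$ using only observed cells (in the spirit of the selection-indicator averages defined in Section 4.1.1) to each error component in isolation and reports the largest residual left behind. It assumes a small synthetic panel with independent normal components and cells missing completely at random; the array sizes, missing shares, and function names are hypothetical. With balanced data every component is annihilated; with missing cells only the component removed by the first demeaning ($\omega_{ijk}$, demeaned over $t$) disappears exactly, in line with the properties stated in the Introduction.

```python
import numpy as np


def masked_demean(a, s, axis):
    """Subtract the mean over observed cells (s == True) along one axis."""
    count = s.sum(axis=axis, keepdims=True)
    total = np.where(s, a, 0.0).sum(axis=axis, keepdims=True)
    mean = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return np.where(s, a - mean, 0.0)  # unobserved cells stay at zero


def successive_within(a, s, order=(3, 2, 1, 0)):
    """Demean over t, k, j, i (axes 3, 2, 1, 0 of an (i, j, k, t) array) in turn."""
    for axis in order:
        a = masked_demean(a, s, axis)
    return a


def leftover(missing_share, shape=(6, 6, 8, 5), seed=1):
    """Largest residual of each error component after the transformation."""
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k, n_t = shape
    components = {
        "omega_ijk": rng.normal(size=(n_i, n_j, n_k, 1)),
        "eta_ijt": rng.normal(size=(n_i, n_j, 1, n_t)),
        "u_ikt": rng.normal(size=(n_i, 1, n_k, n_t)),
        "v_jkt": rng.normal(size=(1, n_j, n_k, n_t)),
    }
    observed = rng.random(shape) >= missing_share  # missing completely at random
    return {
        name: float(np.abs(successive_within(
            np.broadcast_to(c, shape).astype(float), observed)).max())
        for name, c in components.items()
    }


if __name__ == "__main__":
    for share in (0.0, 0.3):
        print(share, leftover(share))
```

The sketch isolates the mechanical failure of the transformation; it does not by itself measure the bias in the estimated elasticity, which also depends on how the tariff regressor co-moves with the leftover components.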

4 Empirical Framework

4.1 A Gravity Equation with 4-way Error Components

Our estimation equation is obtained from (12) in Section 2:

$$\ln(T_{ijkt}+1) = \alpha_0 + \alpha \ln(1+\mathit{tariff}_{ijkt}) + \omega_{ijk} + \eta_{ijt} + u_{ikt} + v_{jkt} + \varepsilon_{ijkt} \tag{14}$$

where $\ln(T_{ijkt}+1)$ is log trade flows and $\mathit{tariff}_{ijkt}$ is the tariff that importing country $i$ imposes on product $k$ from exporting country $j$ at time $t$. $\omega_{ijk}$, $\eta_{ijt}$, $u_{ikt}$, and $v_{jkt}$ are pair-product, pair-time, importer-product-time, and exporter-product-time heterogeneity, respectively, and $\varepsilon_{ijkt}$ is an idiosyncratic error term. Our primary objective is to obtain a consistent estimate of $\alpha = 1-\sigma$, the trade cost elasticity. In contrast to the theoretical model in Section 2, this specification assumes that the trade elasticity $1-\sigma_k$ is homogeneous across all products, so that $1-\sigma_k = 1-\sigma$. We relax this assumption later to adhere to the theoretical model.

To control for unobserved factors, we adopt fixed effects following the standard approach in the literature. However, unlike in the 3-way error components (3WEC) model used by most studies in the literature (e.g., Baier and Bergstrand (2007); Magee (2008); Baier et al. (2014)), using fixed effects in the 4-way error components (4WEC) model may not be trivial, depending on the degree of balancedness of the data. The dummy variable method is computationally challenging and infeasible in most statistical packages, while the within estimator can be inconsistent unless the data are balanced.¹⁰ In this subsection, we discuss within estimators for different degrees of balancedness of the data and suggest the method we adopt in our estimations.
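The scale of the dummy-variable problem follows from simple arithmetic on the sample dimensions reported above (47 countries, 5,310 HS 6-digit products, 9 years from 2002 to 2010). The sketch assumes that every ordered pair $i \neq j$ is present for every product and year, so the counts are upper bounds, and it ignores the redundant dummies that would have to be dropped for identification. The function name is hypothetical.

```python
def fe_dummy_counts(n_countries, n_products, n_years):
    """Upper-bound counts of observations and fixed-effect dummies in eq. (14)."""
    pairs = n_countries * (n_countries - 1)  # ordered pairs i != j (no T_ii)
    counts = {
        "observations": pairs * n_products * n_years,
        "ijk (pair-product)": pairs * n_products,
        "ijt (pair-year)": pairs * n_years,
        "ikt (importer-product-year)": n_countries * n_products * n_years,
        "jkt (exporter-product-year)": n_countries * n_products * n_years,
    }
    counts["all dummies"] = sum(v for k, v in counts.items() if k != "observations")
    return counts


if __name__ == "__main__":
    c = fe_dummy_counts(47, 5310, 9)  # 2002-2010
    for name, value in c.items():
        print(f"{name:>30}: {value:,}")
    # a dense float64 design matrix would need observations * dummies * 8 bytes
    print(f"dense bytes: {c['observations'] * c['all dummies'] * 8:.3e}")
```

With these inputs the sketch gives about 103 million observations and roughly 16 million dummies, about 11.5 million of which are pair-product effects, which illustrates why within transformations are preferred to explicit dummies.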

4.1.1 Within Estimator for Balanced Data

As shown in the Appendix, with balanced data and the conditional exogeneity assumption on $\varepsilon_{ijkt}$, the conventional LSDV (least squares dummy variables) estimator of eq. (14) is numerically equivalent to the pooled least squares (LS) estimator (see Wooldridge (2009) and Abrevaya (2013) for two-dimensional panel data cases) of the following equation in within-transformed variables:

$$\tilde{y}_{ijkt} = \tilde{X}_{ijkt}\,\alpha + \tilde{\varepsilon}_{ijkt} \tag{15}$$

where the combined transformation that removes all four unobserved factors is defined as

$$\begin{aligned}
\tilde{y}_{ijkt} = {} & y_{ijkt} - \bar{y}_{ijk\cdot} - \bar{y}_{ij\cdot t} - \bar{y}_{i\cdot kt} - \bar{y}_{\cdot jkt} \\
& + \bar{y}_{ij\cdot\cdot} + \bar{y}_{i\cdot k\cdot} + \bar{y}_{i\cdot\cdot t} + \bar{y}_{\cdot jk\cdot} + \bar{y}_{\cdot j\cdot t} + \bar{y}_{\cdot\cdot kt} \\
& - \bar{y}_{i\cdot\cdot\cdot} - \bar{y}_{\cdot j\cdot\cdot} - \bar{y}_{\cdot\cdot k\cdot} - \bar{y}_{\cdot\cdot\cdot t} + \bar{y}_{\cdot\cdot\cdot\cdot}
\end{aligned} \tag{16}$$

¹⁰ Multi-way error components models are typically estimated using dummy variables (e.g., Antweiler (2001); Baltagi et al. (2001); Davis (2002), among others) or within transformations of variables that remove all or some of the error components (e.g., Matyas and Balazsi (2012) and Kwak et al. (2012)). However, extending these methods to product-level data is very difficult. First, the dummy variable method faces computational difficulties arising from the large size of the data set. Antweiler (2001), Baltagi et al. (2001), and Davis (2002) propose estimators that can be applied to unnested and unbalanced bilateral trade data; in practice, however, implementing their methods requires specifying the transformation matrix and generating a very large number of dummy variables. As a result, we have not been able to find any empirical study of gravity models implementing this type of estimator. On the other hand, within-transformation-based estimators are generally biased with unbalanced panel data, although they do not suffer from computational difficulty.

In eq. (16), a dot in the subscript denotes the dimension being averaged over. For instance, $\bar{y}_{ijk\cdot} = \frac{1}{T_{ijk}} \sum_{t=1}^{T} s_{ijkt}\, y_{ijkt}$, where $s_{ijkt}$ is an indicator for the observability of both $y_{ijkt}$ and $X_{ijkt}$ ($s_{ijkt} = 1$ if both are observed, and zero otherwise) and $T_{ijk} = \sum_{t=1}^{T} s_{ijkt}$, so that $T_{ijk} = T$ for balanced data but $T_{ijk} < T$ for unbalanced data.¹¹ Averages over the other dimensions are denoted as follows: $\bar{y}_{i\cdot kt} = \frac{1}{n_{ikt}} \sum_{j=1}^{n} s_{ijkt}\, y_{ijkt}$ with $n_{ikt} = \sum_{j=1}^{n} s_{ijkt}$; $\bar{y}_{\cdot jkt} = \frac{1}{m_{jkt}} \sum_{i=1}^{m} s_{ijkt}\, y_{ijkt}$ with $m_{jkt} = \sum_{i=1}^{m} s_{ijkt}$; and $\bar{y}_{i\cdot k\cdot} = \frac{1}{T_{ik}} \sum_{t=1}^{T} \frac{1}{n_{ikt}} \sum_{j=1}^{n} s_{ijkt}\, y_{ijkt}$. The remaining averages in eq. (16) are defined similarly.

The four error components, $w_{ijk}$, $h_{ijt}$, $u_{ikt}$ and $v_{jkt}$, can be removed by demeaning over the dimensions $t$, $k$, $j$ and $i$, respectively. Our estimation strategy is to remove all four error components using four distinct within transformations. Four successive demeanings need to be applied, and they yield $\tilde{y}_{ijkt}$ in eq. (16) as the dependent variable. Note also that the order of demeaning over $t$, $k$, $j$ and $i$ does not matter for balanced data, but it affects the size of the bias for unbalanced data. The transformations of $X$ and $e$, denoted $\tilde{X}_{ijkt}$ and $\tilde{e}_{ijkt}$ respectively, are defined analogously to (16). We call eq. (16) the four-way within transformation, as it removes all four error components (4WECs). The four-way within estimator is consistent only when the data are balanced and the conditional exogeneity assumption on $e_{ijkt}$ holds; for unbalanced data it is inconsistent.¹²
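For balanced data, eq. (16) can be checked numerically. The following is a minimal NumPy sketch, assuming the data are stored as a dense array with axes (i, j, k, t); the panel sizes, the coefficient 1.5 and the data-generating process are hypothetical and serve only as an illustration. It verifies that the closed form equals four successive one-way demeanings (in any order), that it wipes out $w_{ijk}$, $h_{ijt}$, $u_{ikt}$ and $v_{jkt}$, and that the within slope recovers the coefficient even when the regressor is correlated with $w_{ijk}$ and $h_{ijt}$, whereas pooled OLS does not.

```python
from itertools import combinations

import numpy as np


def four_way_within(y):
    """Closed-form four-way within transformation of eq. (16) for a balanced array.

    y has axes (i, j, k, t). The mean over every subset S of axes enters with
    sign (-1)**|S|, which reproduces the 1 - 4 + 6 - 4 + 1 pattern of eq. (16).
    """
    out = np.zeros_like(y, dtype=float)
    for r in range(5):
        for axes in combinations(range(4), r):
            term = y.mean(axis=axes, keepdims=True) if axes else y
            out = out + (-1) ** r * term
    return out


def successive_demeaning(y, order=(3, 2, 1, 0)):
    """Demean over one axis at a time; for balanced data the order is irrelevant."""
    for axis in order:
        y = y - y.mean(axis=axis, keepdims=True)
    return y


rng = np.random.default_rng(42)
I, J, K, T = 6, 6, 5, 8                     # hypothetical panel sizes
w = rng.normal(size=(I, J, K, 1))           # w_ijk
h = rng.normal(size=(I, J, 1, T))           # h_ijt
u = rng.normal(size=(I, 1, K, T))           # u_ikt
v = rng.normal(size=(1, J, K, T))           # v_jkt
x = rng.normal(size=(I, J, K, T)) + w + h   # regressor correlated with w and h
a_true = 1.5
y = a_true * x + w + h + u + v + 0.1 * rng.normal(size=(I, J, K, T))

x_t, y_t = four_way_within(x), four_way_within(y)
a_within = (x_t * y_t).sum() / (x_t ** 2).sum()
a_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]   # slope from pooled OLS
print(f"pooled OLS: {a_pooled:.3f}, four-way within: {a_within:.3f}")
```

Building the transformation from subsets of axes keeps the sign pattern of eq. (16) explicit; for unbalanced data the plain `mean` calls would have to be replaced by selection-weighted means, and the equivalence with successive demeaning no longer holds.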

Our sample includes the 47 largest countries in terms of GDP in 2007, but excludes some countries, such as India, that lack tariff data for at least one partner country. Therefore, for each country in a given year, both tariff data and trade flow data (with missing flows imputed as zeros) are available for HS 6-digit products for all partner countries. As a result, the tariff data are balanced in the pair and product dimensions but not in the year dimension, because the set of product codes changes over time. For instance, some HS 6-digit products were reclassified in 2007: some product codes were removed, combined with others, or newly introduced. Therefore, the data are not balanced in the strict sense. We call this partially balanced data and discuss it further in section 4.1.3.

¹¹ Using selection indicator notation is very useful for presenting our algebraic results, as it makes explicit the sources of bias of the conventional within estimators in unbalanced data.

¹² We assume conditional exogeneity of the idiosyncratic error terms throughout the paper. For the within estimator in (15), non-zero terms remain after the within transformation. Moreover, these terms are not mean-zero and do not vanish as the sample size increases. These terms are all zero if $s_{ijkt} = 1$ for all observations, as in the case of balanced data, but once $s_{ijkt}$ becomes random, additional non-zero terms remain after the transformation and cause inconsistency. See Matyas and Balazsi (2012) and Kwak et al. (2012) for a detailed discussion of within transformations for 3WEC models and Monte Carlo evidence.

4.1.2 Within Estimator for Unbalanced Data

In general, the within estimator that removes multiple error components is inconsistent for unbalanced data. Bias arises when eq. (16) is applied to unbalanced data, and its size depends on which error components are the source of bias and on the order of demeaning. Unlike the case of balanced data, the values of the last eleven terms in eq. (16) (i.e. all terms that average over two or more dimensions, $\bar{y}_{ij\cdot\cdot}, \bar{y}_{i\cdot k\cdot}, \bar{y}_{i\cdot\cdot t}, \bar{y}_{\cdot jk\cdot}, \ldots, \bar{y}_{\cdot\cdot\cdot\cdot}$) now depend on the order of averaging. For these eleven terms, we use superscripts to denote the order of averaging over the various dimensions. For instance, $\bar{y}^{jt}_{i\cdot k\cdot}$ is obtained by averaging over the $j$-dimension first and the $t$-dimension second, i.e.

$$\bar{y}^{jt}_{i\cdot k\cdot} = \frac{1}{T_{ik}}\sum_{t=1}^{T}\frac{1}{n_{ikt}}\sum_{j=1}^{n} s_{ijkt}\, y_{ijkt},$$

while $\bar{y}^{tj}_{i\cdot k\cdot}$ is obtained by reversing the order of averaging over the two dimensions, i.e.

$$\bar{y}^{tj}_{i\cdot k\cdot} = \frac{1}{n_{ik}}\sum_{j=1}^{n}\frac{1}{T_{ijk}}\sum_{t=1}^{T} s_{ijkt}\, y_{ijkt}.$$

The two means differ, $\bar{y}^{jt}_{i\cdot k\cdot} \neq \bar{y}^{tj}_{i\cdot k\cdot}$, except by coincidence. This is because $s_{ijkt}$ depends on all of $i$, $j$, $k$ and $t$, so each observed $y_{ijkt}$ receives the weight $1/(T_{ik}\, n_{ikt})$ under the first order but $1/(n_{ik}\, T_{ijk})$ under the second, and these weights generally differ. Similarly, the values of the other terms among these eleven depend on the order of averaging. Finally, for the grand mean $\bar{y}_{\cdot\cdot\cdot\cdot}$ (i.e. $\bar{y}^{ijkt}_{\cdot\cdot\cdot\cdot}, \bar{y}^{ijtk}_{\cdot\cdot\cdot\cdot}, \bar{y}^{itjk}_{\cdot\cdot\cdot\cdot}, \bar{y}^{itkj}_{\cdot\cdot\cdot\cdot}, \bar{y}^{iktj}_{\cdot\cdot\cdot\cdot}, \ldots$), there are 24 distinct sequences of four-dimensional averaging, and their values generally differ. For a 4WEC model, there are thus a total of 24 different orders of averaging and 24 different transformations based on eq. (16).
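The dependence of a two-dimensional mean on the order of averaging can be seen on a single $(i, k)$ slice. The sketch below is illustrative only: the slice sizes, the stand-in values and the two missing cells are hypothetical, and it assumes that $n_{ik}$ and $T_{ik}$ count the partners and years with at least one observation (here, all of them).

```python
import numpy as np

# One hypothetical (i, k) slice with n partners (rows j) and T years (columns t).
n, T = 4, 5
y = np.arange(n * T, dtype=float).reshape(n, T)   # stand-in values y_ijkt


def mean_j_then_t(y, s):
    """bar y^{jt}_{i.k.}: mean over observed j within each t, then mean over t."""
    per_t = (s * y).sum(axis=0) / s.sum(axis=0)    # divide by n_ikt
    return per_t.mean()                             # T_ik = T here


def mean_t_then_j(y, s):
    """bar y^{tj}_{i.k.}: mean over observed t within each j, then mean over j."""
    per_j = (s * y).sum(axis=1) / s.sum(axis=1)    # divide by T_ijk
    return per_j.mean()                             # n_ik = n here


balanced = np.ones((n, T), dtype=bool)             # s_ijkt = 1 everywhere
unbalanced = balanced.copy()
unbalanced[3, 4] = False                           # two hypothetical missing cells
unbalanced[2, 1] = False

for name, s in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, round(mean_j_then_t(y, s), 4), round(mean_t_then_j(y, s), 4))
# balanced 9.5 9.5
# unbalanced 8.8333 9.4375
```

With only two cells missing, the two averaging orders already disagree, because the observations in the incomplete columns and rows are reweighted differently.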

For any unbalanced data, we use the property of the within estimator that one of the error components in eq. (14) can be removed completely by the within transformation without causing any bias. For our purpose, the important implication is that $t$-dimensional demeaning, if performed first, removes the unobserved factor $w_{ijk}$:

$$s_{ijkt}w_{ijk} - \frac{1}{T_{ijk}}\sum_{t=1}^{T} s_{ijkt}w_{ijk} = s_{ijkt}w_{ijk} - w_{ijk}\frac{1}{T_{ijk}}\sum_{t=1}^{T} s_{ijkt} = s_{ijkt}w_{ijk} - w_{ijk}\frac{1}{T_{ijk}}T_{ijk} = s_{ijkt}w_{ijk} - w_{ijk},$$

where the first equality holds because $w_{ijk}$ does not depend on $t$. Thus, for an observed value, $s_{ijkt} = 1$,

$$s_{ijkt}w_{ijk} - \frac{1}{T_{ijk}}\sum_{t=1}^{T} s_{ijkt}w_{ijk} = s_{ijkt}w_{ijk} - w_{ijk} = 0. \qquad (17)$$

Similarly, $k$-, $j$- and $i$-dimensional demeaning, if performed first, remove $h_{ijt}$, $u_{ikt}$ and $v_{jkt}$, respectively. Since only one of these four demeanings can be performed first, three of the unobserved factors cannot be removed completely and thus potentially become sources of bias in the estimation.
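The two claims above, that the first demeaning removes its error component exactly even in unbalanced data while a component handled by a later demeaning generally survives, can be checked with a small simulation. This sketch assumes that unobserved cells ($s_{ijkt} = 0$) are coded as NaN and that means are taken over observed cells only; the panel sizes and the 20% missing rate are hypothetical.

```python
import warnings

import numpy as np

AXIS = {"i": 0, "j": 1, "k": 2, "t": 3}


def masked_demean(y, axis):
    """Subtract the mean over `axis`, taken over observed (non-NaN) cells only.

    Unobserved cells are NaN (s_ijkt = 0) and stay NaN.
    """
    with warnings.catch_warnings():
        # A slice with no observed cell gives an all-NaN mean; its cells are NaN anyway.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(y, axis=axis, keepdims=True)
    return y - mean


def within(y, order):
    """Successive one-way demeanings, e.g. order="tkji" demeans over t first."""
    for dim in order:
        y = masked_demean(y, AXIS[dim])
    return y


rng = np.random.default_rng(3)
shape = (6, 6, 5, 8)                         # hypothetical sizes of (i, j, k, t)
observed = rng.random(shape) > 0.2           # s_ijkt: about 20% of cells missing
balanced = np.ones(shape, dtype=bool)

w = rng.normal(size=(6, 6, 5, 1))            # w_ijk, constant over t
u = rng.normal(size=(6, 1, 5, 8))            # u_ikt, constant over j


def max_residual(component, order, mask):
    """Largest absolute value of the component left after the transformation."""
    y = np.where(mask, np.broadcast_to(component, shape), np.nan)
    return float(np.nanmax(np.abs(within(y, order))))


print("w_ijk, t first          :", max_residual(w, "tkji", observed))
print("u_ikt, j first          :", max_residual(u, "jikt", observed))
print("u_ikt, i first          :", max_residual(u, "ijkt", observed))
print("u_ikt, i first, balanced:", max_residual(u, "ijkt", balanced))
```

In the unbalanced case, $u_{ikt}$ is removed only up to rounding error when $j$-demeaning comes first; when $i$-demeaning comes first, a non-trivial residual remains, although it disappears again once the mask is balanced.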

Furthermore, less bias is retained if the error component that causes inconsistency is taken out by an earlier demeaning.¹³ For instance, suppose that $h_{ijt}$ is the only unobserved factor correlated with $\text{tariff}_{ijkt}$ in equation (14). Consider demeaning over product and four classes of demeaning orders: $k\cdot\cdot\cdot$, $\cdot k\cdot\cdot$, $\cdot\cdot k\cdot$ and $\cdot\cdot\cdot k$. Here $k\cdot\cdot\cdot$ indicates demeaning over the $k$ dimension first (e.g. $kijt$, $kjti$, $ktji$, etc.), $\cdot k\cdot\cdot$ demeaning over the $k$ dimension second, and so forth. Within estimators with each of these demeaning orders have different magnitudes of bias due to the omitted variable $h_{ijt}$. We have already shown that the class $k\cdot\cdot\cdot$ completely removes the omitted variable bias (OVB) due to $h_{ijt}$. Simulation results in the Appendix show that, for the other classes, the absolute magnitude of the OVB due to $h_{ijt}$ increases the later $k$ appears in the demeaning order: $k\cdot\cdot\cdot$ causes no bias, $\cdot k\cdot\cdot$ the second smallest bias and $\cdot\cdot\cdot k$ the largest bias. A corollary of these findings is that within transformations can remove one error component completely, but the other error components only partially. Nevertheless, compared with pooled OLS estimates, the within transformations reduce the absolute magnitude of the bias from each source.¹⁴

¹³ Appendix C provides some algebraic results.

¹⁴ Matyas and Balazsi (2012) suggest removing one error component using the within transformation and then modelling the remaining error components using dummy variables. However, with pair-product-level panel data, there are still far too many dummy variables to estimate feasibly, even using their method.


4.1.3 Within Estimator for Partially Balanced Data

Our sample is balanced in the pair and product dimensions but unbalanced in the time dimension. In this case, we can still design within transformations that take out multiple error components completely. By investigating the sources of bias and how bias arises from each demeaning step of the within transformation, we can obtain a consistent within estimator for partially balanced data, $s_{ijkt} = s_t$. Our main insight is to remove $w_{ijk}$ through $t$-dimensional demeaning (note that $t$ is not a balanced dimension), and to remove $u_{ikt}$ and $v_{jkt}$ through $j$- and $i$-dimensional demeaning, respectively (note that the data are balanced at the pair level).

Let us consider transformations that remove $w_{ijk}$, $u_{ikt}$ and $v_{jkt}$ completely. The demeaning order we use as an illustration is $i$-$j$-$t$. First, demeaning over dimension $i$, performed first, removes $v_{jkt}$, as shown in section 4.1.2. Next, we consider $u_{ikt}$ under double demeaning over dimensions $i$ and $j$:

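$$
\begin{aligned}
& s_t u_{ikt} - \frac{1}{N}\sum_{i=1}^{N} s_t u_{ikt} - \frac{1}{N}\sum_{j=1}^{N}\Big(s_t u_{ikt} - \frac{1}{N}\sum_{i=1}^{N} s_t u_{ikt}\Big) \\
&= s_t u_{ikt} - s_t\frac{1}{N}\sum_{i=1}^{N} u_{ikt} - s_t u_{ikt}\frac{1}{N}\sum_{j=1}^{N} 1 + s_t\frac{1}{N}\sum_{j=1}^{N} 1\cdot\frac{1}{N}\sum_{i=1}^{N} u_{ikt} \\
&= s_t u_{ikt} - s_t\frac{1}{N}\sum_{i=1}^{N} u_{ikt} - s_t u_{ikt} + s_t\frac{1}{N}\sum_{i=1}^{N} u_{ikt}\cdot\frac{1}{N}\sum_{j=1}^{N} 1 \\
&= s_t u_{ikt} - s_t\frac{1}{N}\sum_{i=1}^{N} u_{ikt} - s_t u_{ikt} + s_t\frac{1}{N}\sum_{i=1}^{N} u_{ikt} = 0
\end{aligned}
$$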

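Finally, we consider $w_{ijk}$ under triple demeaning over dimensions $i$, $j$ and $t$, where $T_{ijk} = \sum_{t=1}^{T} s_t$ is the number of observed periods, so that $\frac{1}{T_{ijk}}\sum_{t=1}^{T} s_t = 1$:

$$
\begin{aligned}
& s_t w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} s_t w_{ijk} - \frac{1}{N}\sum_{j=1}^{N}\Big(s_t w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} s_t w_{ijk}\Big) \\
&\quad - \frac{1}{T_{ijk}}\sum_{t=1}^{T}\Big[s_t w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} s_t w_{ijk} - \frac{1}{N}\sum_{j=1}^{N}\Big(s_t w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} s_t w_{ijk}\Big)\Big] \\
&= s_t w_{ijk} - s_t\frac{1}{N}\sum_{i=1}^{N} w_{ijk} - s_t\frac{1}{N}\sum_{j=1}^{N} w_{ijk} + s_t\frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk} \\
&\quad - \Big(w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} w_{ijk} - \frac{1}{N}\sum_{j=1}^{N} w_{ijk} + \frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk}\Big)\frac{1}{T_{ijk}}\sum_{t=1}^{T} s_t \\
&= s_t w_{ijk} - s_t\frac{1}{N}\sum_{i=1}^{N} w_{ijk} - s_t\frac{1}{N}\sum_{j=1}^{N} w_{ijk} + s_t\frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk} \\
&\quad - w_{ijk} + \frac{1}{N}\sum_{i=1}^{N} w_{ijk} + \frac{1}{N}\sum_{j=1}^{N} w_{ijk} - \frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk}
\end{aligned}
$$

For $s_t = 1$, the last expression equals zero because all terms cancel each other out:

$$w_{ijk} - \frac{1}{N}\sum_{i=1}^{N} w_{ijk} - \frac{1}{N}\sum_{j=1}^{N} w_{ijk} + \frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk} - w_{ijk} + \frac{1}{N}\sum_{i=1}^{N} w_{ijk} + \frac{1}{N}\sum_{j=1}^{N} w_{ijk} - \frac{1}{N}\sum_{j=1}^{N}\frac{1}{N}\sum_{i=1}^{N} w_{ijk} = 0$$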

In sum, using $s_t = 1$ and triple demeaning over $i$-$j$-$t$ (the same can be shown for demeaning over $j$-$i$-$t$), we algebraically show that within transformations can remove $w_{ijk}$, $u_{ikt}$ and $v_{jkt}$ completely. Thus, for all our estimations in the results section, we obtain elasticity estimates using within transformations over $i$-$j$-$t$ and dummy variables for $h_{ijt}$, unless indicated otherwise. Note that the number of dummy variables for $h_{ijt}$ is $20 \times 19 \times 9 = 3{,}420$, and no computational difficulty arises with fewer than 4,000 explanatory variables. We provide extensive simulation results for within estimators with triple transformations and partially balanced data, $s_{ijkt} = s_t$, in the Appendix.
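The algebra above can also be checked numerically. The sketch below is an illustration, not the estimation code used for the results: it assumes a partially balanced mask of the form $s_{ijkt} = s_t$ (two hypothetical missing years), codes unobserved cells as NaN, and uses hypothetical sizes. It applies demeaning in the order $i$-$j$-$t$ to each error component separately, confirming that $w_{ijk}$, $u_{ikt}$ and $v_{jkt}$ vanish while $h_{ijt}$ survives, which is why $h_{ijt}$ is handled with dummy variables.

```python
import warnings

import numpy as np

rng = np.random.default_rng(5)
N, K, T = 5, 4, 9                            # hypothetical: countries, products, years
year_observed = np.ones(T, dtype=bool)
year_observed[[2, 6]] = False                # s_t: two whole years are missing
mask = np.broadcast_to(year_observed, (N, N, K, T))

components = {
    "w_ijk": rng.normal(size=(N, N, K, 1)),
    "u_ikt": rng.normal(size=(N, 1, K, T)),
    "v_jkt": rng.normal(size=(1, N, K, T)),
    "h_ijt": rng.normal(size=(N, N, 1, T)),
}


def demean(y, axis):
    """Subtract the mean over `axis`, computed from observed (non-NaN) cells only."""
    with warnings.catch_warnings():
        # Missing years are all-NaN slices; their mean is NaN, which is harmless here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return y - np.nanmean(y, axis=axis, keepdims=True)


def ijt_within(y):
    """Demean over i, then j, then t (axes 0, 1, 3); the k axis is not demeaned."""
    for axis in (0, 1, 3):
        y = demean(y, axis)
    return y


def max_residual(component):
    """Largest absolute value of a component left after the i-j-t transformation."""
    y = np.where(mask, np.broadcast_to(component, (N, N, K, T)), np.nan)
    return float(np.nanmax(np.abs(ijt_within(y))))


for name, component in components.items():
    print(f"{name}: {max_residual(component):.1e}")
```

The first three residuals are at the level of floating-point rounding; the residual for $h_{ijt}$ is of the same order as the component itself.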

Monte Carlo simulation results show the following. First, demeaning can completely remove only one error component without causing inconsistency. Second, if there are two or more sources of bias, no demeaning sequence can completely eliminate the bias, but it can reduce the bias significantly, so that the within estimator improves upon the pooled estimator. Third, the order of demeaning matters for the magnitude of inconsistency: only the first demeaning can remove an error component completely, while the second and later demeanings remove error components only partially and cause inconsistency. Fourth, other things being equal, a larger proportion of missing observations increases the absolute magnitude of the bias. Fifth, with partially balanced data (i.e. data balanced in only two or three of the four dimensions), certain within estimators can remove more than one error component completely.

4.2 Estimation

4.2.1 Trade Elasticities with Disaggregate data

Table 1 shows estimation results using HS6-digit disaggregated data. In column (1), only non-zero observations are used, while in columns (2)-(5) zero observations are included; the columns differ in terms of the controls for unobserved factors.

At the HS6-digit level of disaggregation, about 67% of trade flow observations are zeros. The inclusion of such a large number of zero observations dramatically reduces the trade elasticity estimate in absolute value, from -3.6 to -0.7. The reason is that almost all country pairs trade only a limited number of HS6-digit products at any time. For instance, countries without oil reserves cannot export crude oil, and developing countries may lack the skilled labour to export high-technology products. When zero trade flows are excluded from the estimation, the trade elasticity estimate is determined entirely by the response of the intensive product margin of trade to the reduction in tariffs. According to column (1), a 1% reduction in tariffs is expected to increase the flows of an existing trade link between two countries by 3.6%. When zero trade flows are included, the estimate is determined by the responses of both the intensive and the extensive product margins of trade. Even if some firms respond to a reduction in variable trade costs and start to export to a new market, the initial volume is likely to be small compared with the export volume to established markets. As a result, a 1% reduction in tariffs increases the average trade flow per product by a much smaller 0.7%. However, column (2) does not properly account for product-specific unobserved heterogeneity, including product-specific fixed costs and MRTs, and therefore suffers from omitted variable bias.

As explained in Section 2, we assume these unobserved factors can be collectively decomposed into $h_{ijt}$, $u_{ikt}$, $v_{jkt}$ and $w_{ijk}$. We start by controlling for $ijt$-varying unobserved factors in column (3), which marginally reduces the trade elasticity estimate to -0.68. In column (4), we further control for $ikt$- and $jkt$-varying unobserved factors, and the absolute value of the elasticity estimate falls from 0.68 to 0.57. In column (5), we have the full set of controls for unobserved heterogeneity at the product level; the additional control for $ijk$-varying factors further reduces the elasticity estimate to -0.43.

A comparison of columns (2) to (5) shows that the progressively more comprehensive control

of unobserved heterogeneity reduces the estimation bias. In particular, the absolute value of the

elasticity estimate is reduced by nearly 40% from -0.71 in column (2) to -0.43 in column (5).

Arkolakis et al. (2012) show that for a range of trade models, including the Armington model and new trade models with micro-foundations such as Eaton and Kortum (2002) and Melitz and Ottaviano (2008), the welfare gains from trade (compared with autarky) can be measured using just two statistics: the share of expenditure on domestic goods, $\lambda$, and the elasticity of imports with respect to variable trade costs, $\phi$. Using their welfare formula, $1-\lambda^{-1/\phi}$, to evaluate the US welfare gain in 2000 relative to autarky with $\lambda = 0.93$, they illustrate that the percentage change in real income needed to compensate a representative consumer for going back to autarky is between 0.7 percent and 1.4 percent for trade elasticities ranging from -10 to -5, as surveyed in Anderson and van Wincoop (2004). If we use the estimate obtained from non-zero observations, as in column (1), the implied US gains from trade in 2000 are 2 percent, slightly higher than the figure in Arkolakis et al. (2012) but still quite small. When we use the estimate obtained from data including zero observations with proper product-level controls, as in column (5), the implied US gains from trade in 2000 increase to 16 percent. This simple numerical example illustrates that, to evaluate the welfare impact of trade liberalization, it is paramount to have an unbiased estimate of the trade elasticity.
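The welfare figures quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch of the formula $1-\lambda^{-1/\phi}$ as used in the text, with $\lambda = 0.93$ and a negative trade elasticity $\phi$; the function name is illustrative.

```python
def gains_from_trade(domestic_share: float, trade_elasticity: float) -> float:
    """Welfare gain relative to autarky, 1 - lambda**(-1/phi), with phi < 0."""
    if not 0 < domestic_share <= 1:
        raise ValueError("domestic share must lie in (0, 1]")
    if trade_elasticity >= 0:
        raise ValueError("trade elasticity must be negative")
    return 1 - domestic_share ** (-1 / trade_elasticity)


LAMBDA_US_2000 = 0.93  # US domestic expenditure share used in the text

for label, phi in [("ACR, phi = -10", -10.0),
                   ("ACR, phi = -5", -5.0),
                   ("Table 1, column (1)", -3.606),
                   ("Table 1, column (5)", -0.427)]:
    print(f"{label:22s} {100 * gains_from_trade(LAMBDA_US_2000, phi):5.1f}%")
```

The four cases give roughly 0.7, 1.4, 2.0 and 15.6 percent, matching the rounded figures in the text. They also show why a smaller absolute elasticity implies much larger gains: $-1/\phi$ enters as an exponent on $\lambda$.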


Table 1: Trade elasticities: HS 6-digit

| Disaggregate: HS6 | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Elasticity (1-σ) | -3.606*** (0.354) | -0.705*** (0.123) | -0.681*** (0.125) | -0.571** (0.268) | -0.427*** (0.101) |
| ij, it, jt | Yes | Yes | Yes | Yes | Yes |
| ijt | No | No | Yes | Yes | Yes |
| ikt, jkt | No | No | No | Yes | Yes |
| ijk | No | No | No | No | Yes |
| Zero obs. | No | Yes | Yes | Yes | Yes |
| Model | log-linear | log-linear | log-linear | log-linear | log-linear |
| No. of obs. | 6,016,280 | 18,158,824 | 18,158,824 | 18,158,824 | 18,158,824 |

Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

4.2.2 The Extensive Margin

Given that the elasticity estimate is substantially smaller when the extensive product margin of trade is added to the intensive product margin, it is of interest to know how the extensive product margin itself responds to changes in variable trade costs.

We use the following model to estimate the effect on the probability of starting to import:

$$I(T_{ijkt} > 0) = \Phi(a_0 + a\cdot \text{tariff}_{ijkt} + w_{ijk} + h_{ijt} + u_{ikt} + v_{jkt}) \qquad (18)$$

where $I(\cdot)$ is an indicator that equals one if trade flows $T_{ijkt} > 0$ and zero otherwise, and $\Phi(\cdot)$ is the standard normal CDF. We estimate (18) using a linear probability model, which allows us to control for the four multi-way error components. Table 2 shows the probability of a country starting to export a new product to a partner country in response to a tariff cut by the partner on that product. Because this is a probability model, the tariff enters in levels rather than in logarithms. The estimated probability varies depending on which types of unobserved heterogeneity are accounted for. In column (3), where the control for unobserved heterogeneity is most comprehensive, we find that, on average, the probability that a country starts importing a new product from a partner country rises by about 3.8 percentage points if the partner cuts its tariff on that product by 1 percentage point.

Table 2: Probability of starting new product trade

| Disaggregate: HS6 | (1) | (2) | (3) |
|---|---|---|---|
| Probability | -0.011 (0.009) | -0.068*** (0.023) | -0.038*** (0.011) |
| ijt | Yes | Yes | Yes |
| ikt, jkt | No | Yes | Yes |
| ijk | No | No | Yes |
| Model | linear probability model | linear probability model | linear probability model |
| No. of obs. | 18,158,824 | 18,158,824 | 18,158,824 |

Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

4.2.3 Homogeneous vs Heterogeneous Products

The more homogeneous a product is, the higher the substitutability among its varieties should be. Rauch (1999) tests this hypothesis by classifying goods into three groups: commodities, reference-priced goods and differentiated goods. Commodities are goods traded on organized exchanges, reference-priced goods are goods whose reference prices are published in trade journals, and the rest are differentiated goods. He finds that the trade elasticity of commodities is the largest and that of differentiated goods the smallest. Broda and Weinstein (2006), using a model of import demand and supply equations, also find that commodities have a higher elasticity of substitution than other goods. However, Broda and Weinstein (2006) caution that commodities are not necessarily perfectly substitutable goods, citing the example of tea: tea products, despite being highly differentiated, are traded on organized exchanges and are thus classified as commodities by Rauch. They further argue that such product classifications may not truly reflect the homogeneity of a product, with the example that "dried, salted, or smoked fish" is classified as a commodity, while "fresh fish" is a reference-priced good and "frozen fish" a differentiated good.

In this section, we reexamine this hypothesis by applying the classification of Rauch (1999)¹⁵ to our data.

In our sample, 5.1%, 30.7% and 64.2% of HS6 products are commodities, reference-priced goods and differentiated goods, respectively.

Table 3 shows the results. In column (1), commodities have the largest trade elasticity in absolute value, followed by reference-priced goods and then differentiated goods. This is consistent with the expectation in Rauch (1999) and the finding in Broda and Weinstein (2006). However, the trade elasticity of differentiated goods has an unexpected positive sign. This could be because we have not fully accounted for all product-level unobserved heterogeneity, so that the estimate is biased.

When we progressively control for more unobserved factors, the relative trade elasticities of the three types of goods surprisingly reverse. In column (3), differentiated goods now have the largest elasticity, with the correct sign, while commodities have the smallest and an insignificant one. A potential explanation for this reversal is as follows. As the world markets for commodities are more integrated via organized exchanges, a change in the tariff on a product between a pair of countries has greater effects on the prices of that product elsewhere in the world, and the feedback (or general equilibrium) effect on trade between the original pair is also stronger. In the Anderson and van Wincoop (2003) framework, the relative prices of goods in the rest of the world for a pair are captured by the MRTs. In our model, product-level MRTs are captured by $ikt$ and $jkt$ fixed effects. Therefore, once we control for $ikt$- and $jkt$-unobserved heterogeneity, we eliminate this feedback effect. Because the feedback effect could be larger for goods sold in more organized markets, its removal may reverse the relative trade elasticities of the three types of products in column (3) compared with column (1).

¹⁵ We use the conservative definition from Rauch (1999). Using his liberal definition yields very similar results.


Table 3: Trade elasticities heterogeneity according to homogeneity of goods with unobserved heterogeneity

| | (1) | (2) | (3) |
|---|---|---|---|
| Differentiated goods | 0.483*** (0.169) | -0.512 (0.366) | -0.482*** (0.143) |
| Reference priced goods | -1.444*** (0.151) | -0.519* (0.281) | -0.300*** (0.080) |
| Commodity goods | -1.887*** (0.217) | 0.097 (0.416) | -0.041 (0.183) |
| ijk | No | No | Yes |
| Model | log-linear | log-linear | log-linear |
| No. of obs. | 15,835,089 | 15,835,089 | 15,835,089 |

Notes: Some sectors that were not categorized in these three groups are dropped in the estimations. Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

4.2.4 Products of Different Skill-Intensity

Firms' markups depend on the trade elasticity: a higher trade elasticity yields lower markups. The skill intensity of products may indicate firms' market power, where greater skill intensity, and thus market power, gives firms room to set a higher markup. Firms producing higher skill-intensity products, such as contact lenses, may have more market power than firms producing lower skill-intensity products, such as toilet paper.

In this section, we divide products according to the skill-intensity classification of Basu (2011).¹⁶ In our sample, 25.5%, 34.8%, 16.7% and 23% of products are minerals, resource and low-skill intensive manufacturing (e.g. toilet paper), medium-skill intensive manufacturing (e.g. air conditioners) and high-skill intensive manufacturing (e.g. microscopes), respectively. Table 4 shows the results. In column (1), without proper controls at the product level, the trade elasticity estimates for medium- and high-skill intensive manufacturing have the wrong sign and are statistically significant. However, once all fixed effects are controlled for in column (3), the estimates change dramatically and have the correct signs. Resource and low-skill intensive manufacturing products have the largest elasticity in absolute value, while among the manufacturing categories, high-skill intensive manufacturing products have the smallest elasticity, which is statistically insignificant. This implies that if countries' final goods imports are weighted more heavily towards low- and medium-skill intensive manufacturing goods, the welfare gains from trade would be smaller for the same trade share.

¹⁶ See Appendix A for details of the classification.

Table 4: Trade elasticities heterogeneity according to skill-intensity of goods with unobserved heterogeneity

| | (1) | (2) | (3) |
|---|---|---|---|
| Minerals | -1.379*** (0.158) | -0.214 (0.216) | -0.187** (0.078) |
| Resource & low-skill intensive manufacturing | -0.475* (0.252) | 0.410 (0.425) | -0.474*** (0.159) |
| Medium-skill intensive manufacturing | 6.722*** (0.458) | -5.129*** (0.762) | -0.368** (0.186) |
| High-skill intensive manufacturing | 1.692*** (0.248) | -0.571 (0.654) | -0.216 (0.141) |
| ijk | No | No | Yes |
| Model | log-linear | log-linear | log-linear |
| No. of obs. | 14,712,838 | 14,712,838 | 14,712,838 |

Notes: Some sectors that were not categorized in these four groups are dropped in the estimations. Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

5 Conclusion

Trade elasticity is a key variable in assessing the gains from trade. In particular, gains from trade have been shown to be a power function of the import penetration ratio and the trade elasticity, so that a small decrease in the absolute value of the trade elasticity implies a large increase in the gains from trade. The previous empirical literature almost uniformly suggests that the trade elasticity is far larger than unity in absolute value, implying questionably small gains from trade. In this paper, we show that previous estimates suffer from various sources of bias: data aggregation, pile-ups at zero trade flows, heterogeneous pair-product fixed costs of exporting, and product-level MRTs.

To account for these sources of bias, this paper estimates the trade elasticity using a new dataset on tariffs, a novel empirical model and an innovative estimation method. The new dataset developed in this paper contains tariff and bilateral trade data for 47 countries from 2002 to 2010 at the HS 6-digit product level, covering over 5,300 products. These are the most disaggregated tariff and bilateral trade data currently available for such a large number of countries and years. This highly disaggregated dataset provides a platform for tackling the bias due to data aggregation on the one hand, but exacerbates the problem of pile-ups at zero trade flows on the other, because each country pair tends to trade only a limited range of HS6-digit products. A proper treatment of zero trade flows at the product level requires us to account for heterogeneous pair-product fixed costs of exporting. We propose to deal with this problem, as well as with product-level MRTs, using a gravity model with four-way error components. These error components control for observed and unobserved heterogeneity in the $ijt$, $ikt$, $jkt$ and $ijk$ dimensions using fixed effects. Estimating such 4WEC models is not trivial, especially with unbalanced data, as is the case here. We therefore propose a novel application of the within estimator so that it is consistent even with unbalanced data.

We show that, once these sources of bias are addressed, the trade elasticity estimate falls to little more than one-tenth of its original magnitude (from -3.6 to -0.43), implying much larger welfare gains from trade. We also find that the response of the extensive product margin of trade to tariff cuts is quite small, at least in the short run, leaving the intensive product margin as the main channel through which tariff changes affect trade flows. Lastly, there is evidence of large variation in trade elasticities across sectors with different market structures and skill intensities.


A Data descriptions

For our analysis, we need bilateral tariff data at the product level. Bilateral tariff data at sectoral levels, however, are not readily available and need to be newly constructed. We construct a dataset of time-varying bilateral tariffs at the HS6-digit level for every pair in our sample as follows.¹⁷

First, we download each importing country's HS6-digit simple average applied MFN tariffs from the World Integrated Trade Solution (WITS). For countries in a Customs Union (CU), tariff schedules are available at the union level only. We therefore use these CU tariff data to generate country-level MFN tariff data for the member countries of a CU. Also, for some countries, MFN tariffs are not available for certain years. In those cases, we use the data from the closest previous available year as a substitute. MFN tariffs are unilateral rather than bilateral. To obtain bilateral tariff data, for each year we apply an importing country's MFN tariffs to all its trading partners. This procedure is justifiable because our sample includes WTO members only.

Preferential tariffs are collected separately for applicable pairs for each year, also at the HS6-digit level. If preferential tariffs that an importing country imposes on certain products apply to multiple exporting countries simultaneously, the data are observed at the group level only. To identify which countries belong to a specific group, we use the Preference Beneficiaries data from the Trade Analysis and Information System (TRAINS), and then generate preferential tariff data for each pair. As with MFN tariffs, for countries in a CU we only observe the preferential external tariff schedules at the union level, so we generate each member country's bilateral tariff data from the CU tariff data. Preferential tariffs are also not reported for all years for many countries, so we use the data from the closest previous available year as a substitute for an unavailable year.

For a given pair, product and year, if both MFN and preferential tariffs are available, the latter is lower with only a few exceptions. We therefore use the lower of the two available rates.¹⁸

¹⁷ Tariff data are standardized across countries only up to the HS6-digit level. At more disaggregated levels, countries do not always use the same codes to define products.
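The two tariff rules described above, filling a missing year with the closest previous available year and then taking the lower of the MFN and preferential rates, can be sketched for a single importer-exporter-product triple. The function names and the sample rates below are hypothetical; the sketch leaves out the customs-union and beneficiary-group steps.

```python
from typing import Dict, Iterable, Optional


def carry_forward(rates: Dict[int, float], years: Iterable[int]) -> Dict[int, float]:
    """Fill each missing year with the rate from the closest previous year with data.

    Years before the first reported rate stay missing.
    """
    filled: Dict[int, float] = {}
    last: Optional[float] = None
    for year in sorted(years):
        if year in rates:
            last = rates[year]
        if last is not None:
            filled[year] = last
    return filled


def applied_tariff(mfn: Dict[int, float], pref: Dict[int, float],
                   years: Iterable[int]) -> Dict[int, float]:
    """Lower of the MFN and preferential rates when both exist, else whichever exists."""
    years = sorted(years)
    mfn_filled = carry_forward(mfn, years)
    pref_filled = carry_forward(pref, years)
    applied: Dict[int, float] = {}
    for year in years:
        available = [r[year] for r in (mfn_filled, pref_filled) if year in r]
        if available:
            applied[year] = min(available)
    return applied


# Hypothetical rates (%) for one importer-exporter-product triple.
mfn = {2002: 10.0, 2003: 10.0, 2005: 8.0}
pref = {2004: 5.0, 2006: 9.0}
print(applied_tariff(mfn, pref, range(2002, 2008)))
# {2002: 10.0, 2003: 10.0, 2004: 5.0, 2005: 5.0, 2006: 8.0, 2007: 8.0}
```

The 2006-2007 entries illustrate one of the "few exceptions" mentioned in the text: the carried-forward preferential rate (9%) exceeds the MFN rate (8%), so the MFN rate applies.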

All data are converted to HS2007 6-digit codes using UN correspondence tables. Trade and tariff data span 2002 to 2010, so all data for 2002-2006 are converted into HS2007 product codes. For Table 3, Rauch (1999)'s SITC Revision 2 4-digit data on product homogeneity/heterogeneity are also converted into HS2007 product codes. For Table 4, we use Basu (2011)'s classification to determine the skill requirement level. He classifies products into 7 categories: (A) non-fuel primary commodities, (B) resource-intensive manufactures, (C) low skill- and technology-intensive manufactures, (D) medium skill- and technology-intensive manufactures, (E) high skill- and technology-intensive manufactures, (F) mineral fuels, and (G) unclassified products. For presentation purposes, we combine (A) and (F) and label them "Minerals", combine (B) and (C), and drop (G).

¹⁸ Alternatively, we could use preferential tariffs whenever they are available and MFN tariffs otherwise. The difference this makes to the tariff variable after averaging over all products is negligible.


B The equivalence between LSDV method and the within estimator for balanced data

For simplicity of illustration, we consider a 3-way error component model instead of a 4-way error component model. Since both 3-way and 4-way error component models have multiple error components, both require multiple within transformations, so the procedure is basically the same, and the extension to the 4-way error component model is straightforward. The following equation is considered to illustrate the transformation:

$$y_{ijt} = X_{ijt}a + u_{it} + v_{jt} + w_{ij} + e_{ijt}$$

where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$ are country identifiers and $t = 1, 2, \ldots, T$ is the time identifier.

In matrix format, we can rewrite this as

$$y = Xa + u + v + D_w w + e \qquad (19)$$

where $y = (y_{111}, y_{112}, \ldots, y_{11T}, y_{121}, \ldots, y_{12T}, y_{131}, \ldots, y_{1m1}, \ldots, y_{1mT}, y_{211}, \ldots, y_{21T}, y_{221}, \ldots, y_{nm1}, \ldots, y_{nmT})'$ is an $mnT \times 1$ vector. In matrix format, we stack over $t$ first, then over $j$ and, finally, over $i$.

$$Q_w = I_{mnT} - D_w(D_w'D_w)^{-1}D_w'$$

$$Q_w = \begin{pmatrix} I_T & & \\ & \ddots & \\ & & I_T \end{pmatrix} - \begin{pmatrix} i_T & & \\ & \ddots & \\ & & i_T \end{pmatrix} \left[ \begin{pmatrix} i_T' & & \\ & \ddots & \\ & & i_T' \end{pmatrix} \begin{pmatrix} i_T & & \\ & \ddots & \\ & & i_T \end{pmatrix} \right]^{-1} \begin{pmatrix} i_T' & & \\ & \ddots & \\ & & i_T' \end{pmatrix}$$

$$Q_w = \begin{pmatrix} I_T & & \\ & \ddots & \\ & & I_T \end{pmatrix} - \begin{pmatrix} \frac{1}{T} i_T i_T' & & \\ & \ddots & \\ & & \frac{1}{T} i_T i_T' \end{pmatrix}$$

$$Q_w = \begin{pmatrix} I_T - \frac{1}{T} i_T i_T' & & \\ & \ddots & \\ & & I_T - \frac{1}{T} i_T i_T' \end{pmatrix}$$

where $i_T$ is a $T \times 1$ column vector of ones, $Q_w$ is an $mnT \times mnT$ matrix and $D_w$ is an $mnT \times mn$ matrix.

Therefore, we obtain:

$$Q_w y = \begin{pmatrix} I_T - \frac{1}{T} i_T i_T' & & \\ & \ddots & \\ & & I_T - \frac{1}{T} i_T i_T' \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1m} \\ y_{21} \\ \vdots \\ y_{nm} \end{pmatrix}$$

where $y_{11} = (y_{111}, y_{112}, \ldots, y_{11T})'$ and, similarly, $y_{nm} = (y_{nm1}, y_{nm2}, \ldots, y_{nmT})'$.

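$$Q_w y = \begin{pmatrix} y_{111} - \bar{y}_{11\cdot} \equiv y^*_{111} \\ y_{112} - \bar{y}_{11\cdot} \equiv y^*_{112} \\ \vdots \\ y_{1mT} - \bar{y}_{1m\cdot} \equiv y^*_{1mT} \\ y_{211} - \bar{y}_{21\cdot} \equiv y^*_{211} \\ \vdots \\ y_{nmT} - \bar{y}_{nm\cdot} \equiv y^*_{nmT} \end{pmatrix}$$

where $\bar{y}_{11\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{11t}$, $\bar{y}_{1m\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{1mt}$ and $\bar{y}_{nm\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{nmt}$. Therefore, pre-multiplying by $Q_w$ demeans each variable by its mean over $t$.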
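As a numerical sanity check of the algebra above, the following minimal Python/NumPy sketch builds $D_w$ and $Q_w$ for a small balanced panel. It confirms two things: $Q_w$ has the block-diagonal form just derived, and $Q_w y$ subtracts each pair's time mean. The panel sizes ($n = 3$, $m = 4$, $T = 5$), the random seed and all variable names are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

# Illustrative sizes: n exporters (i), m importers (j), T periods (t)
n, m, T = 3, 4, 5
rng = np.random.default_rng(0)

# Stack y as in (19): t runs fastest, then j, then i (C-order of an (n, m, T) array)
y_cube = rng.normal(size=(n, m, T))
y = y_cube.reshape(-1)

# D_w has one dummy column per (i, j) pair: I_{mn} kron i_T
i_T = np.ones((T, 1))
D_w = np.kron(np.eye(n * m), i_T)

# Q_w = I_{mnT} - D_w (D_w' D_w)^{-1} D_w'
Q_w = np.eye(n * m * T) - D_w @ np.linalg.inv(D_w.T @ D_w) @ D_w.T

# Block-diagonal form derived above: I_{mn} kron (I_T - (1/T) i_T i_T')
Q_w_blocks = np.kron(np.eye(n * m), np.eye(T) - (i_T @ i_T.T) / T)

# Direct demeaning over t within each (i, j) pair
y_star = (y_cube - y_cube.mean(axis=2, keepdims=True)).reshape(-1)

print(np.allclose(Q_w, Q_w_blocks), np.allclose(Q_w @ y, y_star))  # True True
```

Forming $Q_w$ explicitly is only sensible for toy sizes. In practice the same result comes from the array mean in the last step, which never builds an $mnT \times mnT$ matrix.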

We can rewrite (19), after pre-multiplying it by $Q_w$, as follows:

$$y^*_{ijt} = X^*_{ijt}\alpha + u^*_{it} + v^*_{jt} + \varepsilon^*_{ijt}$$

$$y^* = X^*\alpha + D_u u^* + v^* + \varepsilon^* \qquad (20)$$

where

$$y^* = (y^*_{111}, y^*_{121}, \dots, y^*_{1m1}, y^*_{112}, \dots, y^*_{1m2}, y^*_{113}, \dots, y^*_{1m3}, \dots, y^*_{1mT}, y^*_{211}, \dots, y^*_{2m1}, y^*_{212}, \dots, y^*_{n1T}, \dots, y^*_{nmT})'$$

is an $mnT \times 1$ vector. Here we stack over $j$ first, then over $t$ and, finally, over $i$.

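$$Q_u = I_{mnT} - D_u (D_u' D_u)^{-1} D_u'$$

$$Q_u = \begin{pmatrix} I_m & & \\ & \ddots & \\ & & I_m \end{pmatrix} - \begin{pmatrix} i_m & & \\ & \ddots & \\ & & i_m \end{pmatrix} \left[ \begin{pmatrix} i_m' & & \\ & \ddots & \\ & & i_m' \end{pmatrix} \begin{pmatrix} i_m & & \\ & \ddots & \\ & & i_m \end{pmatrix} \right]^{-1} \begin{pmatrix} i_m' & & \\ & \ddots & \\ & & i_m' \end{pmatrix}$$

$$Q_u = \begin{pmatrix} I_m & & \\ & \ddots & \\ & & I_m \end{pmatrix} - \begin{pmatrix} \frac{1}{m} i_m i_m' & & \\ & \ddots & \\ & & \frac{1}{m} i_m i_m' \end{pmatrix}$$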

where $i_m$ is an $m \times 1$ column vector of ones, $Q_u$ is an $mnT \times mnT$ matrix and $D_u$ is an $mnT \times nT$ matrix.


$$Q_u y^* = \begin{pmatrix} I_m - \frac{1}{m} i_m i_m' & & \\ & \ddots & \\ & & I_m - \frac{1}{m} i_m i_m' \end{pmatrix} \begin{pmatrix} y^*_{11} \\ y^*_{12} \\ \vdots \\ y^*_{1T} \\ y^*_{21} \\ \vdots \\ y^*_{nT} \end{pmatrix}$$

where $y^*_{11} = (y^*_{111}, y^*_{121}, \dots, y^*_{1m1})'$ and, similarly, $y^*_{nT} = (y^*_{n1T}, y^*_{n2T}, \dots, y^*_{nmT})'$.

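$$Q_u y^* = \begin{pmatrix} y^*_{111} - \bar{y}^*_{1\cdot 1} \equiv y^{\#}_{111} \\ y^*_{121} - \bar{y}^*_{1\cdot 1} \equiv y^{\#}_{121} \\ \vdots \\ y^*_{1mT} - \bar{y}^*_{1\cdot T} \equiv y^{\#}_{1mT} \\ y^*_{211} - \bar{y}^*_{2\cdot 1} \equiv y^{\#}_{211} \\ \vdots \\ y^*_{nmT} - \bar{y}^*_{n\cdot T} \equiv y^{\#}_{nmT} \end{pmatrix}$$

where $\bar{y}^*_{1\cdot 1} = \frac{1}{m}\sum_{j=1}^{m} y^*_{1j1}$, $\bar{y}^*_{1\cdot T} = \frac{1}{m}\sum_{j=1}^{m} y^*_{1jT}$ and $\bar{y}^*_{n\cdot T} = \frac{1}{m}\sum_{j=1}^{m} y^*_{njT}$. Therefore, pre-multiplying by $Q_u$ demeans each variable by its mean over $j$. We can rewrite (20), after pre-multiplying it by $Q_u$, as follows: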
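The restacking step is the part most easily gotten wrong, so here is a minimal NumPy sketch of it. It uses an illustrative panel with only $u_{it}$, $w_{ij}$ and noise, with sizes and seed chosen arbitrarily. The sketch applies the $t$-demeaning and then reorders $y^*$ so that $j$ runs fastest, as in (20). It builds $Q_u = I_{mnT} - D_u(D_u'D_u)^{-1}D_u'$ and checks that the result equals a plain mean over $j$ within each $(i, t)$ cell. It also checks that, after these two steps, only the demeaned idiosyncratic term remains ($w_{ij}$ and $u_{it}$ are gone). All names are hypothetical.

```python
import numpy as np

n, m, T = 3, 4, 5
rng = np.random.default_rng(1)

u = rng.normal(size=(n, 1, T))   # u_it: constant over j
w = rng.normal(size=(n, m, 1))   # w_ij: constant over t
e = rng.normal(size=(n, m, T))   # idiosyncratic part
y = u + w + e                    # broadcasts to an (i, j, t) array

# Step 1 (Q_w): demean over t, which removes w_ij exactly
y_star = y - y.mean(axis=2, keepdims=True)

# Restack as in (20): j fastest, then t, then i, i.e. an (n, T, m) array in C-order
y_star_vec = y_star.transpose(0, 2, 1).reshape(-1)

# D_u has one dummy column per (i, t) cell: I_{nT} kron i_m
i_m = np.ones((m, 1))
D_u = np.kron(np.eye(n * T), i_m)
Q_u = np.eye(n * m * T) - D_u @ np.linalg.solve(D_u.T @ D_u, D_u.T)

# Step 2: Q_u demeans over j within each (i, t) cell; map back to (i, j, t)
y_hash = (Q_u @ y_star_vec).reshape(n, T, m).transpose(0, 2, 1)

# Same result from array means, and only the demeaned idiosyncratic part survives
direct = y_star - y_star.mean(axis=1, keepdims=True)
e_only = e - e.mean(axis=2, keepdims=True)
e_only = e_only - e_only.mean(axis=1, keepdims=True)
print(np.allclose(y_hash, direct), np.allclose(y_hash, e_only))  # True True
```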

$$y^{\#}_{ijt} = X^{\#}_{ijt}\alpha + v^{\#}_{jt} + \varepsilon^{\#}_{ijt}$$

$$y^{\#} = X^{\#}\alpha + D_v v^{\#} + \varepsilon^{\#} \qquad (21)$$

We stack over i first, then over t and, finally, over j.

$$Q_v = I_{mnT} - D_v (D_v' D_v)^{-1} D_v'$$

$$Q_v = \begin{pmatrix} I_n & & \\ & \ddots & \\ & & I_n \end{pmatrix} - \begin{pmatrix} \frac{1}{n} i_n i_n' & & \\ & \ddots & \\ & & \frac{1}{n} i_n i_n' \end{pmatrix}$$

where $i_n$ is an $n \times 1$ column vector of ones, $Q_v$ is an $mnT \times mnT$ matrix and $D_v$ is an $mnT \times mT$ matrix.



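$$Q_v y^{\#} = \begin{pmatrix} I_n - \frac{1}{n} i_n i_n' & & \\ & \ddots & \\ & & I_n - \frac{1}{n} i_n i_n' \end{pmatrix} \begin{pmatrix} y^{\#}_{11} \\ y^{\#}_{12} \\ \vdots \\ y^{\#}_{1T} \\ y^{\#}_{21} \\ \vdots \\ y^{\#}_{mT} \end{pmatrix}$$

where $y^{\#}_{11} = (y^{\#}_{111}, y^{\#}_{211}, \dots, y^{\#}_{n11})'$ and, similarly, $y^{\#}_{mT} = (y^{\#}_{1mT}, y^{\#}_{2mT}, \dots, y^{\#}_{nmT})'$.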

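$$Q_v y^{\#} = \begin{pmatrix} y^{\#}_{111} - \bar{y}^{\#}_{\cdot 11} \equiv \tilde{y}_{111} \\ y^{\#}_{211} - \bar{y}^{\#}_{\cdot 11} \equiv \tilde{y}_{211} \\ \vdots \\ y^{\#}_{n1T} - \bar{y}^{\#}_{\cdot 1T} \equiv \tilde{y}_{n1T} \\ y^{\#}_{121} - \bar{y}^{\#}_{\cdot 21} \equiv \tilde{y}_{121} \\ \vdots \\ y^{\#}_{nmT} - \bar{y}^{\#}_{\cdot mT} \equiv \tilde{y}_{nmT} \end{pmatrix}$$

where $\bar{y}^{\#}_{\cdot jt} = \frac{1}{n}\sum_{i=1}^{n} y^{\#}_{ijt}$.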

Therefore, pre-multiplying by $Q_v$ demeans each variable by its mean over $i$. We can rewrite (21), after pre-multiplying it by $Q_v$, as follows:

$$\tilde{y}_{ijt} = \tilde{X}_{ijt}\alpha + \tilde{\varepsilon}_{ijt} \qquad (22)$$

In conclusion, applying the least squares dummy variable (LSDV) approach sequentially to (19), (20) and (21) is equivalent to applying the within estimator to the variables transformed as in (22).
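The equivalence can be checked numerically on a small balanced panel. The NumPy sketch below is a minimal illustration under assumed values: panel sizes, seed, the loading of 0.25 on the heterogeneity terms, and $\alpha = 1$ are all chosen for illustration only. It estimates $\alpha$ in two ways:

- by least squares on $x$ plus full sets of $it$, $jt$ and $ij$ dummies (LSDV);
- by OLS on the variables demeaned over $t$, then $j$, then $i$ (the order of $Q_w$, $Q_u$, $Q_v$ above).

By the Frisch-Waugh-Lovell theorem the two estimates should coincide for balanced data.

```python
import numpy as np

n, m, T = 4, 3, 5
rng = np.random.default_rng(2)
alpha = 1.0

u = rng.normal(size=(n, 1, T))   # u_it
v = rng.normal(size=(1, m, T))   # v_jt
w = rng.normal(size=(n, m, 1))   # w_ij
x = 0.25 * (u + v + w) + 0.25 * rng.normal(size=(n, m, T))   # endogenous regressor
y = alpha * x + u + v + w + rng.normal(size=(n, m, T))

# LSDV: regress y on x and on full sets of it, jt and ij dummies
N = n * m * T
ii, jj, tt = (g.ravel() for g in np.meshgrid(np.arange(n), np.arange(m), np.arange(T), indexing="ij"))
rows = np.arange(N)
D_u = np.zeros((N, n * T))
D_u[rows, ii * T + tt] = 1.0
D_v = np.zeros((N, m * T))
D_v[rows, jj * T + tt] = 1.0
D_w = np.zeros((N, n * m))
D_w[rows, ii * m + jj] = 1.0
Z = np.column_stack([x.ravel(), D_u, D_v, D_w])
# Z is rank deficient (the three dummy sets overlap), but the coefficient on x is
# identified, so every least-squares solution has the same first entry.
coef, *_ = np.linalg.lstsq(Z, y.ravel(), rcond=None)
alpha_lsdv = coef[0]

def within(a):
    """Demean over t, then j, then i (the order of Q_w, Q_u, Q_v above)."""
    for axis in (2, 1, 0):
        a = a - a.mean(axis=axis, keepdims=True)
    return a

x_t, y_t = within(x), within(y)
alpha_within = (x_t * y_t).sum() / (x_t ** 2).sum()
print(np.isclose(alpha_lsdv, alpha_within))  # True
```

With balanced data the order of the three demeaning steps does not matter. Appendix C shows why this stops being true once cells are missing.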


C Within transformation for unbalanced data

For simplicity of illustration, we consider a 3-way error component model instead of a 4-way error component model. As in Appendix B, both 3-way and 4-way error component models require multiple within transformations to remove multiple unobserved heterogeneity factors, so the procedure of within transformations is basically the same. Consider the following 3-way error component model:

$$y_{ijt} = Z_{ijt}\beta + w_{ij} + u_{it} + v_{jt} + \varepsilon_{ijt}$$

As shown in Section 4.3.2, the first-order demeaning removes one type of unobserved heterogeneity completely, whereas the second- and third-order demeaning can only partially eliminate the remaining unobserved heterogeneity. The elimination of $u_{it}$ and $v_{jt}$ is incomplete because, when there are missing observations in the $i$ and $j$ dimensions, nested (non-first-order) averages over $i$, $j$ and $t$ take different values depending on the order of summation. Equations (23), (24) and (25) give the demeaned values of the unobserved heterogeneity terms $w_{ij}$, $v_{jt}$ and $u_{it}$, respectively, after the $t$-$i$-$j$ within transformation. Here $s_{ijt}$ equals one if cell $ijt$ is observed and zero otherwise, $T_{ij}$ is the number of observed periods for pair $ij$, and $N_{it}$ ($N_{jt}$) is the number of observed $j$ ($i$) for a given $it$ ($jt$).
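To see the claim above in action, the NumPy sketch below applies the $t$-$i$-$j$ sequence to each heterogeneity component separately, once on a balanced grid and once with cells missing. Missing cells are stored as NaN, and each mean is taken over observed cells only. The grid size follows Appendix D ($20 \times 20 \times 5$). Several choices are illustrative assumptions rather than the paper's design:

- cells are dropped independently with probability 0.25;
- the seed is arbitrary;
- `demean` is a hypothetical helper.

The expected pattern is that every component is removed when the data are balanced. Under missingness, only $w_{ij}$ (removed by the first-order $t$-demeaning) disappears exactly, while $v_{jt}$ and $u_{it}$ leave a residual.

```python
import warnings
import numpy as np

AXIS = {"i": 0, "j": 1, "t": 2}

def demean(a, order):
    """Hypothetical helper: sequentially demean a NaN-padded (i, j, t) array, e.g. order="tij".

    Means are taken over observed (non-NaN) cells only.
    """
    with warnings.catch_warnings():
        # A slice with no observed cell yields an all-NaN mean; such slices hold only NaNs.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        for d in order:
            a = a - np.nanmean(a, axis=AXIS[d], keepdims=True)
    return a

n, m, T = 20, 20, 5
rng = np.random.default_rng(3)
shape = (n, m, T)
components = {
    "w_ij": np.broadcast_to(rng.normal(size=(n, m, 1)), shape),
    "v_jt": np.broadcast_to(rng.normal(size=(1, m, T)), shape),
    "u_it": np.broadcast_to(rng.normal(size=(n, 1, T)), shape),
}
masks = {
    "balanced": np.ones(shape, dtype=bool),
    "25% missing": rng.random(shape) >= 0.25,   # True = observed (s_ijt = 1)
}

# Is the component (numerically) gone after the t-i-j transformation?
removed = {}
for mask_name, mask in masks.items():
    for comp_name, comp in components.items():
        left = demean(np.where(mask, comp, np.nan), "tij")
        removed[(mask_name, comp_name)] = np.nanmax(np.abs(left)) < 1e-10
print(removed)
```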

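$$\begin{aligned}
\tilde{w}_{ij} ={}& s_{ijt}w_{ij} - \frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}w_{ij} - \frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}w_{ij} - \frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}w_{ij} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}w_{ij} \\
&+ \frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}w_{ij} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}w_{ij} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}w_{ij} \qquad (23) \\
={}& s_{ij}w_{ij} - s_{ij}w_{ij} - \frac{1}{N_i}\sum_{j=1}^{N} s_{ij}w_{ij} - \frac{1}{N_j}\sum_{i=1}^{N} s_{ij}w_{ij} + \frac{1}{N_i}\sum_{j=1}^{N} s_{ij}w_{ij} \\
&+ \frac{1}{N_j}\sum_{i=1}^{N} s_{ij}w_{ij} + \frac{1}{N_i}\sum_{j=1}^{N}\frac{1}{N_j}\sum_{i=1}^{N} s_{ij}w_{ij} - \frac{1}{N_i}\sum_{j=1}^{N}\frac{1}{N_j}\sum_{i=1}^{N} s_{ij}w_{ij} \\
={}& 0
\end{aligned}$$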

where, for the second equality in (23), we use $\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}w_{ij} = \frac{1}{T_{ij}} T_{ij} \cdot s_{ijt}w_{ij}$, and the fact that, for $ij$-variant variables, $s_{ijt}$, $N$ and $T$ do not depend on $t$ (i.e. $s_{ijt} = s_{ij}$, $N_{it} = N_i$, $N_{jt} = N_j$).


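$$\begin{aligned}
\tilde{v}_{jt} ={}& s_{ijt}v_{jt} - \frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} - \frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}v_{jt} - \frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}v_{jt} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} \\
&+ \frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}v_{jt} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} \\
={}& s_{ijt}v_{jt} - \frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} - \frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}v_{jt} - s_{ijt}v_{jt} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} \\
&+ \frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} + \frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}v_{jt} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}v_{jt} \qquad (24) \\
={}& \left(-\frac{1}{T_j}\sum_{t=1}^{T} s_{jt}v_{jt} + \frac{1}{N_t}\sum_{j=1}^{N}\frac{1}{T_j}\sum_{t=1}^{T} s_{jt}v_{jt}\right) - \left(-\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_j}\sum_{t=1}^{T} s_{jt}v_{jt} + \frac{1}{N_t}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_j}\sum_{t=1}^{T} s_{jt}v_{jt}\right)
\end{aligned}$$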

where, for the second equality in (24), we use the fact that $\frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}v_{jt} = \frac{1}{N_{jt}} N_{jt} \cdot s_{ijt}v_{jt} = s_{ijt}v_{jt}$. We also use the fact that, for $jt$-variant variables, $s_{ijt}$, $N$ and $T$ do not depend on $i$ (i.e. $s_{ijt} = s_{jt}$, $N_{it} = N_t$, $T_{ij} = T_j$). In the last line of (24), the first term is in general nonzero when there are missing observations in the $j$ dimension, and the second term is in general nonzero when there are missing observations in the $i$ and $j$ dimensions.

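$$\begin{aligned}
\tilde{u}_{it} ={}& s_{ijt}u_{it} - \frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} - \frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}u_{it} - \frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} \\
&+ \frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}u_{it} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} \\
={}& s_{ijt}u_{it} - \frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} - s_{ijt}u_{it} - \frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} \\
&+ \frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N} s_{ijt}u_{it} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_{jt}}\sum_{i=1}^{N}\frac{1}{T_{ij}}\sum_{t=1}^{T} s_{ijt}u_{it} \qquad (25) \\
={}& \left(-\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it} + \frac{1}{N_t}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it}\right) + \left(-\frac{1}{N_t}\sum_{i=1}^{N} s_{it}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_t}\sum_{i=1}^{N} s_{it}u_{it}\right) \\
&+ \left(\frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it} - \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_t}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it}\right) \\
={}& \left(-\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it} + \frac{1}{N_t}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it}\right) - \left(-\frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it} + \frac{1}{N_{it}}\sum_{j=1}^{N}\frac{1}{N_t}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T} s_{it}u_{it}\right)
\end{aligned}$$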

where, for the second equality in (25), we use the fact that $\frac{1}{N_{it}}\sum_{j=1}^{N} s_{ijt}u_{it} = \frac{1}{N_{it}} N_{it} \cdot s_{ijt}u_{it} = s_{ijt}u_{it}$. We also use the fact that, for $it$-variant variables, $s_{ijt}$, $N$ and $T$ do not depend on $j$ (i.e. $s_{ijt} = s_{it}$, $N_{jt} = N_t$, $T_{ij} = T_i$). In the last line of (25), the first term is in general nonzero when there are missing observations in the $i$ dimension, and the second term is in general nonzero when there are missing observations in the $i$ and $j$ dimensions.
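The first line of each of (23), (24) and (25) is the $t$-$i$-$j$ sequential demeaning written out term by term. Write $M_t$, $M_i$ and $M_j$ for the observed-cell means over $t$, $i$ and $j$. The first line then reads $a - M_t a - M_j a - M_i a + M_j M_t a + M_i M_t a + M_j M_i a - M_j M_i M_t a$, which is the expansion of $(I - M_j)(I - M_i)(I - M_t)a$.

The NumPy sketch below checks this identity on randomly masked data. It also shows that an $it$-varying term survives the $t$-$i$-$j$ order but is removed exactly when $j$ is demeaned first. The independent 25% missingness, the seed and the helper names (`M`, `eight_terms`, `sequential`) are illustrative assumptions.

```python
import warnings
import numpy as np

n, m, T = 20, 20, 5
rng = np.random.default_rng(4)
shape = (n, m, T)
observed = rng.random(shape) >= 0.25          # s_ijt: True if cell ijt is observed
AXIS = {"i": 0, "j": 1, "t": 2}

def M(a, d):
    """Mean over observed cells along dimension d, broadcast back and re-masked by s_ijt."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)   # all-missing slices
        mean = np.nanmean(a, axis=AXIS[d], keepdims=True)
    return np.where(observed, np.broadcast_to(mean, a.shape), np.nan)

def eight_terms(a):
    """First line of (23)-(25), written with M_t, M_i and M_j."""
    Mt = M(a, "t")
    return (a - Mt - M(a, "j") - M(a, "i")
            + M(Mt, "j") + M(Mt, "i") + M(M(a, "i"), "j")
            - M(M(Mt, "i"), "j"))

def sequential(a, order):
    """Demean over one dimension at a time, in the given order."""
    for d in order:
        a = a - M(a, d)
    return a

x = np.where(observed, rng.normal(size=shape), np.nan)
u = np.where(observed, np.broadcast_to(rng.normal(size=(n, 1, T)), shape), np.nan)

same = np.allclose(eight_terms(x), sequential(x, "tij"), equal_nan=True)
u_left_tij = np.nanmax(np.abs(sequential(u, "tij")))
u_left_jit = np.nanmax(np.abs(sequential(u, "jit")))
print(same, u_left_tij > 1e-3, u_left_jit < 1e-10)   # True True True
```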

D Monte Carlo Simulation

For the 3WEC model with unbalanced data, we reproduce the simulation results for the within estimators reported in Kwak et al. (2012). The statistical properties of the within estimator are the same for both the 3WEC and 4WEC models with unbalanced data.

D.1 Monte Carlo design: 3WEC model with unbalanced data

The model we use for all simulations in this section is:

$$y_{ijt} = \alpha x_{1ijt} + \beta z_{1it} + \gamma z_{2jt} + u_{it} + v_{jt} + w_{ij} + \varepsilon_{ijt} \qquad (26)$$

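where $x_1$ is an endogenous variable because it is correlated with $u_{it}$, $v_{jt}$ and $w_{ij}$. The error components are $u_{it} \sim \text{iid } N(0,1)$, $v_{jt} \sim \text{iid } N(0,1)$ and $w_{ij} \sim \text{iid } N(0,1)$, and the idiosyncratic error is $\varepsilon_{ijt} \sim \text{iid } N(0,1)$. The number of observations is 2,000, with $i = 1, \dots, 20$, $j = 1, \dots, 20$ and $t = 1, \dots, 5$.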

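Our DGP is:

$$x_{1ijt} = \frac{1}{4}\left[a \cdot u_{it} + b \cdot v_{jt} + c \cdot w_{ij}\right] + \frac{N(0,1)}{4}; \quad z_{1it} = d \cdot u_{it} + \frac{N(0,1)}{2}; \quad z_{2jt} = e \cdot v_{jt} + \frac{N(0,1)}{2} \qquad (27)$$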
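The DGP in (26) and (27) can be sketched in a few lines of NumPy. The sketch draws balanced data and estimates $\alpha$ by pooled OLS of $y$ on a constant, $x_1$, $z_1$ and $z_2$, averaging over 50 replications. Two aspects are assumptions on our part rather than statements of the paper:

- each $N(0,1)$ term in (27) is drawn independently at the level of the variable's own indices ($ijt$ for $x_1$, $it$ for $z_1$, $jt$ for $z_2$);
- `simulate` and `pooled_ols_alpha` are hypothetical helpers.

For case (ii), $(a, b, c, d, e) = (1, 0, 0, 0, 0)$, the population bias is $\operatorname{cov}(x_1, u_{it}) / \operatorname{var}(x_1) = (1/4)/(1/8) = 2$. This is in line with the 198% shown for the $u_{it}$-only source in Table 5.

```python
import numpy as np

def simulate(a, b, c, d, e, rng, n=20, m=20, T=5, alpha=1.0, beta=-0.5, gamma=0.2):
    """One balanced draw from (26)-(27); returns (y, x1, z1, z2) on an (i, j, t) grid."""
    shape = (n, m, T)
    u_it = rng.normal(size=(n, 1, T))
    v_jt = rng.normal(size=(1, m, T))
    w_ij = rng.normal(size=(n, m, 1))
    x1 = 0.25 * (a * u_it + b * v_jt + c * w_ij) + rng.normal(size=shape) / 4
    z1 = d * u_it + rng.normal(size=(n, 1, T)) / 2      # varies over (i, t) only
    z2 = e * v_jt + rng.normal(size=(1, m, T)) / 2      # varies over (j, t) only
    y = (alpha * x1 + beta * z1 + gamma * z2
         + u_it + v_jt + w_ij + rng.normal(size=shape))
    return [np.broadcast_to(arr, shape) for arr in (y, x1, z1, z2)]

def pooled_ols_alpha(y, x1, z1, z2):
    X = np.column_stack([np.ones(y.size), x1.ravel(), z1.ravel(), z2.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1]

rng = np.random.default_rng(5)
# Case (ii): (a, b, c, d, e) = (1, 0, 0, 0, 0), so all endogeneity comes from u_it
estimates = [pooled_ols_alpha(*simulate(1, 0, 0, 0, 0, rng)) for _ in range(50)]
bias = np.mean(estimates) - 1.0
print(f"mean pooled OLS bias: {bias:.2f}")   # close to 2, i.e. about 200%
```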

We consider the following nine sets of values for $(a, b, c, d, e)$: (i) $(0,0,1,0,0)$, (ii) $(1,0,0,0,0)$, (iii) $(0,1,0,0,0)$, (iv) $(1,0,1,0,0)$, (v) $(1,1,0,0,0)$, (vi) $(0,1,1,0,0)$, (vii) $(\frac{1}{2},\frac{1}{4},1,0,0)$, (viii) $(1,\frac{1}{2},\frac{1}{4},0,0)$ and (ix) $(\frac{1}{2},1,\frac{1}{4},0,0)$. For (i), (ii) and (iii), the bias of the within estimators can only be caused by one type of unobserved heterogeneity. For (iv), (v) and (vi), the bias is caused by two types of unobserved heterogeneity with equal magnitude. Finally, for (vii), (viii) and (ix), the bias is caused by all three types of unobserved heterogeneity with different magnitudes. The last three processes examine how the bias of the different within estimators changes with the relative importance of the different types of unobserved heterogeneity.$^{19}$

Thus, given $x_{1ijt}$, $z_{1it}$, $z_{2jt}$, $u_{it}$, $v_{jt}$, $w_{ij}$ and $\varepsilon_{ijt}$, and with true parameter values $\alpha = 1$, $\beta = -0.5$ and $\gamma = 0.2$, we obtain $y_{ijt}$ from (26). We are interested in consistently estimating $\alpha$ with the proposed within estimators.

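The probability limit of the pooled OLS estimator of $\alpha$ is given by:

$$\operatorname{plim}\hat{\alpha} = \alpha + \underbrace{\frac{\operatorname{cov}(x_1^*, u_{it}^*)}{\operatorname{var}(x_1^*)}}_{\text{bias due to } u_{it}} + \underbrace{\frac{\operatorname{cov}(x_1^*, v_{jt}^*)}{\operatorname{var}(x_1^*)}}_{\text{bias due to } v_{jt}} + \underbrace{\frac{\operatorname{cov}(x_1^*, w_{ij}^*)}{\operatorname{var}(x_1^*)}}_{\text{bias due to } w_{ij}} \qquad (28)$$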
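A finite-sample analogue of (28) holds exactly, not just in the limit, which makes it easy to check. By the Frisch-Waugh-Lovell theorem the pooled OLS coefficient equals $x_1^{*\prime} y / x_1^{*\prime} x_1^*$. Since $x_1^*$ is orthogonal to the constant, $z_1$ and $z_2$, the estimation error $\hat{\alpha} - \alpha$ splits into one term for each of $u_{it}$, $v_{jt}$, $w_{ij}$ and $\varepsilon_{ijt}$.

The NumPy sketch below verifies this for one draw of case $(a, b, c, d, e) = (1, 1, 1, 0, 0)$, where each of the three bias terms should be close to one. The seed, sizes and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, T = 20, 20, 5
shape = (n, m, T)

def cells(arr):
    """Broadcast a component to the full (i, j, t) grid and flatten it."""
    return np.broadcast_to(arr, shape).ravel()

u = cells(rng.normal(size=(n, 1, T)))
v = cells(rng.normal(size=(1, m, T)))
w = cells(rng.normal(size=(n, m, 1)))
eps = rng.normal(size=n * m * T)
# DGP (27) with (a, b, c, d, e) = (1, 1, 1, 0, 0)
x1 = 0.25 * (u + v + w) + rng.normal(size=n * m * T) / 4
z1 = cells(rng.normal(size=(n, 1, T)) / 2)
z2 = cells(rng.normal(size=(1, m, T)) / 2)
y = 1.0 * x1 - 0.5 * z1 + 0.2 * z2 + u + v + w + eps

# Partial out a constant, z1 and z2 (Frisch-Waugh-Lovell)
Z = np.column_stack([np.ones(n * m * T), z1, z2])

def resid(a):
    return a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]

x_star = resid(x1)
alpha_hat = (x_star @ y) / (x_star @ x_star)   # equals the pooled OLS coefficient

# Sample analogues of the terms in (28), plus the idiosyncratic-error term
terms = {name: (x_star @ resid(comp)) / (x_star @ x_star)
         for name, comp in [("u_it", u), ("v_jt", v), ("w_ij", w), ("eps_ijt", eps)]}
print({k: round(val, 2) for k, val in terms.items()})
print(np.isclose(alpha_hat - 1.0, sum(terms.values())))   # True
```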

where $x_1^*$ is the residual from the regression of $x_1$ on $z_{1it}$ and $z_{2jt}$, and $u_{it}^*$, $v_{jt}^*$ and $w_{ij}^*$ are similarly defined as the residuals from regressions of these generated variables on $z_{1it}$ and $z_{2jt}$. Our primary focus is to quantify how much each within transformation increases or reduces the bias in (28), and to identify which within estimator leaves the smallest bias.

For the $t$-$i$-$j$ within transformation, the probability limit of the within estimator of $\alpha$ is given by:

$$\operatorname{plim}\hat{\alpha}_1 = \alpha + \underbrace{\frac{\operatorname{cov}(\tilde{x}_1^*, \tilde{u}_{it}^* + \tilde{v}_{jt}^*)}{\operatorname{var}(\tilde{x}_1^*)}}_{\text{bias after transformation}} = \alpha + v_2 + u_3 \qquad (29)$$

where $v_2$ is the bias due to $v_{jt}$, which is not eliminated completely by the second-order ($i$-dimensional) demeaning, and $u_3$ is the bias due to $u_{it}$, which is not eliminated completely by the third-order ($j$-dimensional) demeaning. The probability limits of the other five within estimators can be obtained in similar ways.
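The contrast between demeaning orders in (29) can be illustrated with a small simulation. The NumPy sketch uses case (ii), where all endogeneity comes from $u_{it}$, and compares the $t$-$i$-$j$ order with the $j$-$i$-$t$ order on data with 25% of cells missing. It should show a clearly positive remaining bias for $t$-$i$-$j$ (the $u_3$ term) and essentially none for $j$-$i$-$t$, because demeaning over $j$ first removes $u_{it}$ exactly.

For this case Table 5 reports 30.5% and 0%. The sketch's numbers will differ somewhat, for three reasons:

- it drops cells independently with probability 0.25 instead of using (30);
- it averages only 40 replications;
- it regresses the transformed $y$ on the transformed $x_1$ alone. This simplification should be harmless here, because $z_1$ and $z_2$ are independent of $x_1$ when $d = e = 0$.

All helper names are hypothetical.

```python
import warnings
import numpy as np

AXIS = {"i": 0, "j": 1, "t": 2}

def demean(a, order):
    """Hypothetical helper: sequential demeaning over observed (non-NaN) cells."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)   # all-missing slices
        for d in order:
            a = a - np.nanmean(a, axis=AXIS[d], keepdims=True)
    return a

def within_alpha(y, x1, order):
    """Within estimator of alpha from the transformed x1 only (z1, z2 omitted)."""
    yt, xt = demean(y, order), demean(x1, order)
    ok = ~np.isnan(yt)
    return (xt[ok] @ yt[ok]) / (xt[ok] @ xt[ok])

def one_draw(rng, n=20, m=20, T=5, missing=0.25):
    """Case (ii), (a, b, c, d, e) = (1, 0, 0, 0, 0), with cells dropped at random."""
    shape = (n, m, T)
    u_it = rng.normal(size=(n, 1, T))
    v_jt = rng.normal(size=(1, m, T))
    w_ij = rng.normal(size=(n, m, 1))
    x1 = 0.25 * u_it + rng.normal(size=shape) / 4
    z1 = rng.normal(size=(n, 1, T)) / 2
    z2 = rng.normal(size=(1, m, T)) / 2
    y = x1 - 0.5 * z1 + 0.2 * z2 + u_it + v_jt + w_ij + rng.normal(size=shape)
    drop = rng.random(shape) < missing
    return np.where(drop, np.nan, y), np.where(drop, np.nan, x1)

rng = np.random.default_rng(7)
draws = [one_draw(rng) for _ in range(40)]
bias = {order: np.mean([within_alpha(y, x1, order) for y, x1 in draws]) - 1.0
        for order in ("tij", "jit")}
print({k: round(val, 3) for k, val in bias.items()})
```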

In the first Monte Carlo (MC) simulation, missing observations are assigned randomly to $y$, $x$, $z_1$ and $z_2$ using the following process. We randomly assign a number $q$ from 1 to 2,000 to each observation and then randomly draw 500 missing observations (i.e. 25% missing) from a uniform distribution:

$$d_k^* = \text{uniform}(0,1) \times 2000 + 1, \qquad d_k = [d_k^*]; \qquad s_q = 0 \text{ if } d_k = q \qquad (30)$$

where $k = 1, 2, \dots, 500$ and $[x]$ is the largest integer not greater than $x$. In the second MC simulation, we increase the proportion of missing data to 40%.

19 We also consider DGPs that allow $z_{1it}$ and $z_{2jt}$ to be correlated with the unobserved heterogeneity and with $x_{1ijt}$, but the qualitative implications are the same as the results in Tables 5 and 6.
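Rule (30) translates directly into NumPy, as sketched below with an arbitrary seed. Read literally, (30) draws the 500 indices independently, so the same $q$ can be drawn more than once. The expected number of distinct missing cells is then $2000[1 - (1 - 1/2000)^{500}] \approx 442$, about 22% rather than exactly 25%. The last two lines show the without-replacement alternative, which always yields exactly 500 distinct cells. Which of the two the original simulations used is not stated, so treat this as an illustration only.

```python
import numpy as np

rng = np.random.default_rng(8)
N_OBS, N_DRAWS = 2000, 500

# (30): d*_k = uniform(0,1) * 2000 + 1 and d_k = [d*_k], for k = 1, ..., 500
d_star = rng.uniform(0.0, 1.0, size=N_DRAWS) * N_OBS + 1
d = np.floor(d_star).astype(int)          # integers in 1, ..., 2000

# s_q = 0 if some d_k equals q, and 1 otherwise (q = 1, ..., 2000)
s = np.ones(N_OBS + 1, dtype=int)         # position 0 is unused so that s[q] refers to q
s[d] = 0
s = s[1:]

n_missing = int((s == 0).sum())
print(d.min(), d.max(), n_missing, n_missing / N_OBS)

# Exactly 25% missing would instead need sampling without replacement:
exact = rng.choice(N_OBS, size=N_DRAWS, replace=False) + 1
print(len(np.unique(exact)))   # 500
```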

D.2 Simulation results

Table 5 reports the finite-sample behavior of the within estimators of $\alpha$. The left-hand side of the table shows the sources of bias, and the right-hand side shows the bias remaining after each transformation. For instance, in the first row, where the 198% bias is entirely due to $u_{it}$, the bias is reduced to 30.5% after the $t$-$i$-$j$ transformation, to 25.8% after the $t$-$j$-$i$ transformation, and so forth. The following implications can be drawn from Table 5:

(i) As long as the biases due to $u_{it}$, $v_{jt}$ and $w_{ij}$ have the same direction, all six within estimators reduce the bias substantially. Even in the worst case (the fifth row), the within estimator reduces the bias by four-fifths, from 264% to 53.2%.

(ii) The extent of bias reduction varies among the six estimators. For instance, in the first row, the remaining bias ranges from 0% to 30.5%.

(iii) A bias from a single source can be eliminated completely if an appropriate estimator is used. For instance, in the first row, where the bias is entirely due to $u_{it}$, doing the $j$-dimensional demeaning first reduces the bias to 0%, while doing it last reduces the bias the least (30.5% bias remains).

(iv) When there are two or more sources of bias, no demeaning sequence can completely eliminate the bias.

(v) Whenever the bias due to $w_{ij}$ does not dominate those due to $u_{it}$ or $v_{jt}$, doing the $t$-dimensional demeaning later leads to a larger bias reduction. For instance, except for the third row, in which the only source of bias is $w_{ij}$, the bias reduction from the $i$-$j$-$t$ demeaning sequence tends to be larger than that from $i$-$t$-$j$, which in turn tends to be larger than that from $t$-$i$-$j$, and so on.

Table 6 shows the same results for data with 40% of observations missing. As the missing proportion increases, the bias remaining after the within transformations also increases, regardless of the order of demeaning. However, the relationship between the relative importance of the biases due to $u_{it}$, $v_{jt}$ and $w_{ij}$ and the order of transformation remains the same as with 25% missing data in Table 5.

Table 7 reports the accuracy of $t$-test inference when the within estimator is consistent. For the pooled OLS estimator, the bias due to unobserved heterogeneity, shown in the first column of Table 7, is about .80: the mean estimate of $\alpha$ is 1.80, while its true value is one. The second column examines a consistent within estimator for balanced panel data, and the third and fourth columns examine two consistent within estimators for unbalanced panel data with 25% of the data missing. The mean rejection probability of the $t$-test of $H_0: \alpha = 1$ is reported in the fourth row of each panel, and its coverage interval in the fifth row. The results show that if a within estimator completely eliminates the unobserved heterogeneity, it is consistent, and the usual $t$-test with the degrees-of-freedom adjustment in Section 2.2 provides valid inference. The estimated mean rejection probability of the $t$-test of $H_0: \alpha = 1$ is very close to .05, and its coverage interval always contains the nominal type-I error rate of .05. These simulation results confirm that our proposed adjustment properly corrects the bias of the standard error estimates.
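The text does not state how the coverage intervals for the rejection rates are computed. One common construction, assumed here, is a normal-approximation binomial interval over the 2,000 replications, $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/2000}$. The Python sketch computes it and checks that every mean rejection rate reported for the within estimators in Table 7 yields an interval containing .05. Because the formula is our assumption, it need not reproduce every interval in Table 7 to the last digit.

```python
import math

def rejection_interval(p_hat, reps=2000, z=1.96):
    """Normal-approximation 95% interval for a Monte Carlo rejection rate."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / reps)
    return p_hat - z * se, p_hat + z * se

def covers_nominal(p_hat, nominal=0.05, reps=2000):
    lo, hi = rejection_interval(p_hat, reps)
    return lo <= nominal <= hi

lo, hi = rejection_interval(0.05)
print(round(lo, 4), round(hi, 4))   # 0.0404 0.0596

# Mean rejection rates reported in Table 7 for the within estimators
reported = [0.051, 0.056, 0.050, 0.049, 0.052, 0.057, 0.056, 0.046, 0.045]
print(all(covers_nominal(p) for p in reported))   # True
```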


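Table 5: Tracing source biases, 25% random missing

The first five columns give the source of bias; the last six columns give the remaining bias for each order of demeaning dimension of the within estimators.

| Source of bias | $u_{it}+v_{jt}+w_{ij}$ | $u_{it}$ | $v_{jt}$ | $w_{ij}$ | tij | tji | jti | itj | ijt | jit |
|---|---|---|---|---|---|---|---|---|---|---|
| $(u_{it}, 0, 0)$ | 198% | 198% | 0% | 0% | 30.5% | 25.8% | 0% | 30.5% | 5.5% | 0% |
| $(0, v_{jt}, 0)$ | 197% | 0% | 197% | 0% | 25.7% | 30.4% | 30.6% | 0% | 0% | 5.6% |
| $(0, 0, w_{ij})$ | 200% | 0% | 0% | 200% | 0% | 0% | 5.9% | 5.9% | 11.6% | 11.6% |
| $(u_{it}, 0, w_{ij})$ | 265% | 131% | 0% | 134% | 30.9% | 26.2% | 6.0% | 36.0% | 17.1% | 11.5% |
| $(u_{it}, v_{jt}, 0)$ | 264% | 132% | 132% | 0% | 53.2% | 53.2% | 31.2% | 31.2% | 6.1% | 6.1% |
| $(0, v_{jt}, w_{ij})$ | 265% | 0% | 131% | 134% | 26.2% | 30.9% | 36.0% | 6.0% | 11.6% | 17.0% |
| $(u_{it}, v_{jt}, w_{ij})$ | 299% | 99% | 99% | 101% | 52.9% | 52.9% | 35.9% | 35.9% | 17.0% | 17.0% |
| $(u_{it}, v_{jt}/2, w_{ij}/4)$ | 300% | 170% | 86% | 44% | 42.8% | 40.7% | 17.9% | 32.2% | 8.7% | 5.9% |
| $(u_{it}, v_{jt}/4, w_{ij}/2)$ | 300% | 170% | 43% | 87% | 36.9% | 33.5% | 11.3% | 33.4% | 11.6% | 7.3% |
| $(u_{it}/2, v_{jt}, w_{ij}/4)$ | 300% | 86% | 170% | 44% | 40.7% | 42.8% | 32.3% | 17.9% | 5.9% | 8.7% |
| $(u_{it}/2, v_{jt}/4, w_{ij})$ | 300% | 85% | 43% | 173% | 23.0% | 21.7% | 14.1% | 21.9% | 14.4% | 12.9% |
| $(u_{it}/4, v_{jt}, w_{ij}/2)$ | 301% | 43% | 170% | 88% | 33.6% | 36.9% | 33.6% | 11.3% | 7.4% | 11.6% |
| $(u_{it}/4, v_{jt}/2, w_{ij})$ | 301% | 43% | 85% | 173% | 21.8% | 23.1% | 22.1% | 14.2% | 13.0% | 14.4% |

Note: The number of replications is 2,000. $(\cdot, \cdot, \cdot)$ represents the common element in the unobserved heterogeneity and the endogenous $x$. The first element is $u_p$, the second element is $v_p$ and the third element is $w_p$. $u_{it}$, $v_{jt}$ and $w_{ij}$ are sources of bias as in equation (3.11).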

D.3 Monte Carlo design: 4WEC model with partially balanced data

We use the 4WEC model to examine the performance of the within estimator with partially balanced data.


Table 6: Tracing source biases, 40% random missing

The first five columns give the source of bias; the last six columns give the remaining bias for each order of demeaning dimension of the within estimators.

| Source of bias | $u_{it}+v_{jt}+w_{ij}$ | $u_{it}$ | $v_{jt}$ | $w_{ij}$ | tij | tji | jti | itj | ijt | jit |
|---|---|---|---|---|---|---|---|---|---|---|
| $(u_{it}, 0, 0)$ | 197% | 197% | 0% | 0% | 49.5% | 42.5% | 0% | 49.4% | 9.8% | 0% |
| $(0, v_{jt}, 0)$ | 197% | 0% | 197% | 0% | 42.7% | 49.7% | 49.7% | 0% | 0% | 9.8% |
| $(0, 0, w_{ij})$ | 200% | 0% | 0% | 200% | 0% | 0% | 10.2% | 10.2% | 19.5% | 19.5% |
| $(u_{it}, 0, w_{ij})$ | 265% | 131% | 0% | 134% | 49.7% | 42.8% | 10.2% | 57.6% | 28.4% | 19.5% |
| $(u_{it}, v_{jt}, 0)$ | 264% | 133% | 132% | 0% | 82.8% | 82.8% | 49.7% | 49.8% | 9.9% | 9.9% |
| $(0, v_{jt}, w_{ij})$ | 265% | 0% | 131% | 135% | 42.7% | 49.7% | 57.6% | 10.2% | 19.5% | 28.4% |
| $(u_{it}, v_{jt}, w_{ij})$ | 299% | 99% | 101% | 100% | 82.8% | 82.8% | 57.6% | 57.6% | 28.4% | 28.4% |
| $(u_{it}, v_{jt}/2, w_{ij}/4)$ | 300% | 170% | 86% | 44% | 68.2% | 65.3% | 29.5% | 51.2% | 14.2% | 9.7% |
| $(u_{it}, v_{jt}/4, w_{ij}/2)$ | 301% | 170% | 43% | 87% | 59.8% | 55.0% | 19.0% | 53.9% | 19.3% | 12.3% |
| $(u_{it}/2, v_{jt}, w_{ij}/4)$ | 301% | 85% | 171% | 44% | 66.0% | 68.9% | 52.0% | 29.9% | 9.9% | 14.7% |
| $(u_{it}/2, v_{jt}/4, w_{ij})$ | 301% | 85% | 43% | 173% | 38.7% | 36.7% | 23.7% | 36.4% | 23.8% | 21.6% |
| $(u_{it}/4, v_{jt}, w_{ij}/2)$ | 301% | 43% | 171% | 87% | 55.0% | 59.8% | 54.1% | 19.0% | 12.4% | 19.4% |
| $(u_{it}/4, v_{jt}/2, w_{ij})$ | 301% | 42% | 85% | 173% | 36.7% | 38.7% | 36.6% | 23.6% | 21.5% | 23.9% |

Note: The number of replications is 2,000. $(\cdot, \cdot, \cdot)$ represents the common element in the unobserved heterogeneity and the endogenous $x$. The first element is $u_p$, the second element is $v_p$ and the third element is $w_p$. $u_{it}$, $v_{jt}$ and $w_{ij}$ are sources of bias as in equation (3.11).

Table 7: Inference accuracy

Panel A: $(u_{it}, 0, 0)$

| Statistic | bias | balanced | unbalanced-jit | unbalanced-jti |
|---|---|---|---|---|
| mean of bias | .79 | .00 | .00 | .00 |
| SD | (.043) | (.053) | (.062) | (.063) |
| coverage of coefficient | | (.997, 1.001) | (.997, 1.002) | (.997, 1.002) |
| rejection% mean | | .051 | .056 | .050 |
| rejection% coverage | | (.042, .061) | (.046, .066) | (.040, .059) |

Panel B: $(0, v_{jt}, 0)$

| Statistic | bias | balanced | unbalanced-ijt | unbalanced-itj |
|---|---|---|---|---|
| mean of bias | .79 | .00 | .00 | .00 |
| SD | (.044) | (.054) | (.065) | (.065) |
| coverage of coefficient | | (.998, 1.003) | (.998, 1.003) | (.998, 1.003) |
| rejection% mean | | .049 | .052 | .057 |
| rejection% coverage | | (.039, .059) | (.041, .062) | (.046, .067) |

Panel C: $(0, 0, w_{ij})$

| Statistic | bias | balanced | unbalanced-tij | unbalanced-tji |
|---|---|---|---|---|
| mean of bias | .80 | .00 | .00 | .00 |
| SD | (.040) | (.053) | (.063) | (.063) |
| coverage of coefficient | | (.998, 1.002) | (.997, 1.003) | (.997, 1.003) |
| rejection% mean | | .056 | .046 | .045 |
| rejection% coverage | | (.046, .066) | (.037, .055) | (.036, .054) |

Note: The number of replications is 2,000. $(\cdot, \cdot, \cdot)$ represents the common element in the unobserved heterogeneity and the endogenous $x$. The first element is $u_{it}$, the second element is $v_{jt}$ and the third element is $w_{ij}$. The balanced column reports the estimates of the within estimator for balanced panel data.


References

Abrevaya, J., 2013. The projection approach for unbalanced panel data. The Econometrics Journal 16, 161–178.

Alessandria, G., Choi, H., Ruhl, K., 2014. Trade adjustment dynamics and the welfare gains from trade. NBER Working Paper Series 20663.

Anderson, J. E., 1979. A theoretical foundation for the gravity equation. American Economic Review 69 (1), 106–116.

Anderson, J. E., van Wincoop, E., 2003. Gravity with gravitas: a solution to the border puzzle. American Economic Review 93 (1), 170–192.

Anderson, J. E., van Wincoop, E., 2004. Trade costs. Journal of Economic Literature 42 (3), 691–751.

Anderson, J. E., Yotov, Y. V., Apr. 2011. Terms of Trade and Global Efficiency Effects of Free Trade Agreements, 1990-2002. NBER Working Papers 17003, National Bureau of Economic Research.

Anderson, J. E., Yotov, Y. V., Apr. 2012. Gold standard gravity. NBER Working Papers 17835, National Bureau of Economic Research.

Antweiler, W., 2001. Nested random effects estimation in unbalanced panel data. Journal of Econometrics 101, 295–313.

Arkolakis, C., Costinot, A., Rodriguez-Clare, A., February 2012. New trade models, same old gains? American Economic Review 102 (1), 94–130.

Baier, S. L., Bergstrand, J. H., 2001. The growth of world trade: tariffs, transport costs, and income similarity. Journal of International Economics 53 (1), 1–27.

Baier, S. L., Bergstrand, J. H., 2007. Do free trade agreements actually increase members' international trade? Journal of International Economics 71 (1), 72–95.

Baier, S. L., Bergstrand, J. H., Feng, M., 2014. Economic integration agreements and the margins of international trade. Journal of International Economics 93 (2), 339–350.

Baltagi, B. H., Song, S., Jung, B. C., 2001. The unbalanced nested error component regression model. Journal of Econometrics 101, 357–381.

Basu, S., 2011. Retooling trade policy in developing countries: Does technology intensity of exports matter for GDP per capita. Policy Issues in International Trade and Commodities 56.

Bergstrand, J. H., Egger, P., Larch, M., 2013. Gravity redux: Estimation of gravity-equation coefficients, elasticities of substitution, and general equilibrium comparative statics under asymmetric bilateral trade costs. Journal of International Economics 89 (1), 110–121.

Broda, C., Weinstein, D. E., May 2006. Globalization and the gains from variety. The Quarterly Journal of Economics 121 (2), 541–585.

Caliendo, L., Parro, F., 2014. Estimates of the trade and welfare effects of NAFTA. The Review of Economic Studies, rdu035.

Chaney, T., 2008. Distorted gravity: The intensive and extensive margins of international trade. American Economic Review 98 (4), 1707–21.

Davis, P., 2002. Estimating multi-way error components models with unbalanced data structures. Journal of Econometrics 106, 67–95.

Dutt, P., Mihov, I., Van Zandt, T., 2013. Does WTO matter for the extensive and the intensive margins of trade? Journal of International Economics forthcoming.

Eaton, J., Kortum, S., 2002. Consistent estimation from partially consistent observations. Econometrica 70, 1741–1779.

Harrigan, J., 1993. OECD imports and trade barriers in 1983. Journal of International Economics 35 (1-2), 91–111.

Helpman, E., Melitz, M., Rubinstein, Y., 2008. Estimating trade flows: Trading partners and trading volumes. The Quarterly Journal of Economics 123 (2), 441–487.

Hummels, D., 2001. Toward a geography of trade costs. Tech. rep., Mimeo, Purdue University.

Koenig, P., Mayneris, F., Poncet, S., 2010. Local export spillovers in France. The European Economic Review 54 (4), 622–641.

Krautheim, S., 2012. Heterogeneous firms, exporter networks and the effect of distance on international trade. Journal of International Economics 87 (1), 27–35.

Kwak, D. W., Cheong, J., Tang, K. K., 2012. A within estimator for three-level data: An application to the WTO effect on trade flows. mimeo, School of Economics Discussion Paper No. 501, 2012.

Limao, N., Tovar, P., 2011. Policy choice: Theory and evidence from commitment via international trade agreements. Journal of International Economics 85 (2), 186–205.

Magee, C. S., 2008. New measures of trade creation and trade diversion. Journal of International Economics 75 (2), 349–362.

Maoz, Z., Henderson, E. A., 2013. The world religion dataset, 1945-2010: Logic, estimates, and trends. International Interactions 39 (3), 265–291.

Matyas, L., Balazsi, L., 2012. The estimation of multi-dimensional fixed effects panel data models. mimeo.

Melitz, M. J., November 2003. The impact of trade on intra-industry reallocations and aggregate industry productivity. Econometrica 71 (6), 1695–1725.

Melitz, M. J., Ottaviano, G. I., 2008. Market size, trade, and productivity. The Review of Economic Studies 75 (1), 295–316.

Rauch, J. E., 1999. Networks versus markets in international trade. Journal of International Economics 48 (1), 7–35.

Ray, E. J., 1981. The determinants of tariff and nontariff trade restrictions in the United States. Journal of Political Economy 89 (1), 105–21.

Wansbeek, T., Kapteyn, A., 1989. Estimation of the error-components model with incomplete panels. Journal of Econometrics 41, 341–361.

Wooldridge, J., 2009. Correlated random effects models with unbalanced panels. mimeo.
