Federico Belotti's niche on the web

CEIS, Centre for Economics and International Studies - Tor Vergata is hosting the 1st ERASMUS Intensive Programme on Efficiency and Productivity Analysis Programme (EPASP) organised with CHE, Centre for Health Economics - University of York and COHERE, Centre for Health Economics Research - University of Southern Denmark.

This course presents the latest contributions and developments in the measurement of efficiency and productivity growth in manufacturing and services. The objective is to provide a comprehensive and up-to-date survey of the existing models, together with an extensive discussion of data and of the core methods for measuring efficiency and productivity in practice. First, students are introduced to the measurement of partial and total factor productivity growth. Different parametric and non-parametric approaches to productivity measurement in the context of firm-specific modeling are discussed. Second, a detailed survey of the econometric approach to efficiency analysis is presented, focusing on modeling, distributional assumptions and estimation methods. The correspondence between a number of hypotheses and empirical findings is examined through a variety of relevant empirical applications. Third, the measurement of inputs and outputs in manufacturing and services is discussed, with particular emphasis on the analysis of efficiency and productivity growth in the service sector.

Morning classes will cover theoretical topics, while the afternoon computer sessions will focus on applied issues, analysed mainly with the Stata commands sfcross and sfpanel, two new user-written packages documented in the Stata Journal.

For this reason, a good knowledge of Stata (data management, do-file programming) is a prerequisite for admission.

Apply now!

Missing data can pose major problems when estimating econometric models, since missing values are generally unlikely to be Missing Completely At Random. One strategy for addressing this issue without resorting to more complex econometric approaches is multiple imputation, that is, the process of replacing missing values with multiple sets of plausible values. This post provides a simple example in which xsmle is used together with mi, Stata's suite of commands for multiple imputation. Consider the following data, in which the only regressor (x1) has 14 missing values:

. sum y x1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       940    4.144785    1.674932  -.7510805   9.170195
          x1 |       926    1.375773    1.202928  -2.039139   4.966914

Since xsmle requires a balanced panel and these missing values make the panel unbalanced, xsmle cannot be run on these data directly. Nonetheless, we can overcome the obstacle by using mi and xsmle jointly.
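Before imputing, it can help to confirm how many values are missing and in which variables. A minimal sketch using official Stata commands on the two variables shown above (y and x1):

```stata
* Count the observations for which the regressor is missing
count if missing(x1)

* Tabulate missing-value counts; only variables with missing values are listed
misstable summarize y x1
```

With the data above, both commands should point to the same 14 observations of x1.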

The first step is to declare the dataset as an mi dataset, since data must be mi set before other mi commands can be used. It does not matter which mi style you choose, because you can always change it later using mi convert. In this example, I choose the wide style, so the command is mi set wide.

. mi set wide                                                       

. mi register imputed x1                                 

. set seed 1712                                                      

. mi impute regress x1 =, add(10)   

Univariate imputation                   Imputations =       10
Linear regression                             added =       10
Imputed: m=1 through m=10                   updated =        0

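--------------------------------------------------------------
               |              Observations per m
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
            x1 |        926           14        14 |       940
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)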

Once the dataset has been mi set, the second step is to register the variables with missing values. In this example, x1 has missing values, and mi register imputed x1 declares it as a variable to be imputed. A good practice for reproducible results is to set the seed of Stata's pseudorandom number generator using the command set seed #, where # is any number between 0 and 2^31-1.

Then, the command mi impute regress x1 =, add(10) can be used to fill in the missing values of x1 using the set of dummy variables from the categorical variable cat through the regress method (see help mi impute for details on the available methods). The option add(10) specifies the number of imputations to add to the mi data (currently, the total number of imputations cannot exceed 1,000). After mi impute regress x1 =, add(10) has been executed, ten new variables _#_x1 (with # = 1,...,10) are created in the dataset, each containing an imputed version of x1.
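To check what the imputation step produced, something along the following lines can be used. This is only a sketch with official mi commands; it assumes the wide style chosen above, in which the imputed copies are stored as _1_x1, _2_x1, and so on.

```stata
* Report the mi style, the number of imputations and the registered variables
mi describe

* Summarize x1 in the original data (m=0) and in the first two imputations
mi xeq 0 1 2: summarize x1

* In the wide style the imputed copies are ordinary variables
summarize _1_x1 _2_x1
```

The imputed copies should show 940 observations each, against 926 for the original x1.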

. mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)

Imputations (10):
  .........10 done

Multiple-imputation estimates                     Imputations     =         10
SDM with spatial fixed-effects                    Number of obs   =        940
                                                  Average RVI     =     0.0409
                                                  Complete DF     =        748
DF adjustment:   Small sample                     DF:     min     =     421.51
                                                          avg     =     624.79
                                                          max     =     732.54
Model F test:       Equal FMI                     F(   7,  730.5) =     214.64
Within VCE type:          OIM                     Prob > F        =     0.0000

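------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Main         |
          x1 |   .3927311   .0358267    10.96   0.000     .3223164    .4631458
-------------+----------------------------------------------------------------
Wx           |
          x1 |   .7172546   .0759222     9.45   0.000     .5682036    .8663056
-------------+----------------------------------------------------------------
Spatial      |
         rho |   .3342481   .0404076     8.27   0.000     .2548957    .4136005
-------------+----------------------------------------------------------------
Variance     |
    sigma2_e |   .8201919   .0385477    21.28   0.000     .7445121    .8958717
-------------+----------------------------------------------------------------
Direct       |
          x1 |   .4641954   .0315439    14.72   0.000     .4021925    .5261983
-------------+----------------------------------------------------------------
Indirect     |
          x1 |   1.215779   .1155787    10.52   0.000     .9888711    1.442687
-------------+----------------------------------------------------------------
Total        |
          x1 |   1.679974    .127172    13.21   0.000     1.430302    1.929647
------------------------------------------------------------------------------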

Finally, as documented in help mi estimate, the prefix command mi estimate: estimation_command can be used to run estimation_command on each of the imputed _#_x1 variables. It then combines the results, adjusting coefficients and standard errors for the variability between imputations according to the combination rules of Rubin (1987). In this example, the command

mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)

estimates a spatial fixed effects Durbin model on the ten imputed versions of the x1 variable.
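For reference, Rubin's combination rules for a single coefficient can be sketched as follows. The notation is chosen here for illustration and is not taken from the Stata manual: \hat{q}_m and \hat{u}_m denote the point estimate and its estimated variance from the m-th completed dataset, and M is the number of imputations (M = 10 in this example).

```latex
\bar{q} = \frac{1}{M} \sum_{m=1}^{M} \hat{q}_m
\bar{u} = \frac{1}{M} \sum_{m=1}^{M} \hat{u}_m
b = \frac{1}{M-1} \sum_{m=1}^{M} (\hat{q}_m - \bar{q})^2
T = \bar{u} + \left(1 + \frac{1}{M}\right) b
```

Here \bar{q} is the combined coefficient, \bar{u} the within-imputation variance, b the between-imputation variance and T the total variance, so the combined standard error is \sqrt{T}.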


Official Stata manuals and help files.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Given the high number of requests, my colleagues and I have decided to provide a brief tutorial on how to get started with xsmle. First of all, you have to install the command by typing

net install xsmle, all from(

It is essential to use the all option: when you install a user-written package without it, the ancillary files (in this case "product.dta" and "usaww.spmat") are not downloaded. With the all option, the ancillary files are downloaded (provided an internet connection is available) to either 1) your current working directory or 2) the directory you specified for ancillary files using the net set other command.
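If you are unsure where the ancillary files will end up, the destination can be checked or changed before installing. A minimal sketch, in which the folder path is purely hypothetical and must be replaced with an existing folder on your machine:

```stata
* Show the current working directory (the default destination for ancillary files)
pwd

* Optionally redirect ancillary files to another folder
* (hypothetical path, used here for illustration only)
net set other "C:/data/xsmle_files"

* Display the current net settings
net query
```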

Now, let's assume that you haven't changed your "ancillary files" directory and that your current Stata working directory is the Desktop. You should then find two new files on your Desktop: "product.dta" and "usaww.spmat". These files contain information on public capital productivity in 48 US states observed over 17 years, as well as the spatial weights matrix for the US states (Munnell, 1990). The examples reported in the xsmle help file make use of these data, which are also available in R through the Ecdat package.

The first step is to load "product.dta" into memory by typing

use product.dta, clear

Then, you have to do the same for the spatial weights matrix contained in the "usaww.spmat" file. Since it is an spmat file, this can be done by typing

spmat use W using "usaww.spmat"

At this stage, after computing the logarithm of the dependent and independent variables, we have all the ingredients to estimate a spatial panel data model. For instance, the syntax to estimate a Spatial AutoRegressive (SAR) model with random effects is

xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sar)
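The log transformation mentioned above might look like the sketch below. The raw variable names gsp, pcap, pc and emp are an assumption, based on the names used for the same data in the Ecdat package; run describe first to check the names in your copy of "product.dta".

```stata
* List the variables in memory to confirm the raw names
describe

* Assumed raw names: gsp, pcap, pc, emp (unemp enters the model in levels)
generate lngsp  = log(gsp)
generate lnpcap = log(pcap)
generate lnpc   = log(pc)
generate lnemp  = log(emp)
```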

Notice that the dataset "product.dta" has already been declared to be a panel. In general, you need to declare your own dataset to be a panel dataset by using the xtset command before running xsmle.
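For your own data, the declaration could look like this. The names panel_id and year are hypothetical placeholders for your unit and time identifiers.

```stata
* panel_id and year are hypothetical names for the unit and time identifiers
xtset panel_id year

* With no arguments, xtset reports the current panel settings,
* including whether the panel is balanced
xtset
```

Since xsmle works on balanced panels, it is worth checking that xtset reports the panel as balanced.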


Munnell, A. H. 1990. Why Has Productivity Growth Declined? Productivity and Public Investment. New England Economic Review, 1990, 3–22.

A new command for estimating and forecasting spatial panel data models using Stata is now available: xsmle.

xsmle fits fixed or random effects spatial models for balanced panel data. See the mi prefix command in order to use xsmle in the unbalanced case. Consider the following general specification for the spatial panel data model:

 y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + D Z_{it} \theta + a_i + \gamma_t + v_{it}
 v_{it} = \lambda E v_{it} + u_{it}

where u_{it} is a normally distributed error term, W is the spatial matrix for the autoregressive component, D is the spatial matrix for the spatially lagged independent variables, and E is the spatial matrix for the idiosyncratic error component; a_i is the individual fixed or random effect and \gamma_t is the time effect. xsmle fits the following nested models:

i) The SAR model with lagged dependent variable (\theta=\lambda=0)

y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + a_i + \gamma_t + u_{it},

where the standard SAR model is obtained by setting \tau=0.

ii) The SDM model with lagged dependent variable (\lambda=0)

y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + D Z_{it} \theta + a_i + \gamma_t + u_{it},

where the standard SDM model is obtained by setting \tau=0. xsmle allows the use of different weighting matrices for the spatially lagged dependent variable (W) and the spatially lagged regressors (D), together with different sets of explanatory variables (X_{it}) and spatially lagged regressors (Z_{it}). The default is to use W=D and X_{it}=Z_{it}.

iii) The SAC model (\theta=\tau=0)

y_{it} = \rho W y_{it} + X_{it} \beta + a_i + \gamma_t + v_{it},
v_{it} = \lambda E v_{it} + u_{it},

for which xsmle allows the use of different weighting matrices for the spatially lagged dependent variable (W) and the error term (E).

iv) The SEM model (\rho=\theta=\tau=0)

y_{it} = X_{it} \beta + a_i + \gamma_t + v_{it},
v_{it} = \lambda E v_{it} + u_{it}.

v) The GSPRE model (\rho=\theta=\tau=0)

y_{it} = X_{it} \beta + a_i + v_{it},
a_i = \phi W a_i + \mu_i,
v_{it} = \lambda E v_{it} + u_{it},

where the random effects also have a spatial autoregressive form.
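As a rough guide, the model is selected through the model() option. The sketch below reuses the variables and the weights matrix W from the product.dta example. The first two calls follow option patterns that appear elsewhere on this site; the emat() option in the third call is an assumption on my part, so check help xsmle for the exact syntax before relying on it.

```stata
* SAR model with random effects
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sar)

* SDM with individual fixed effects
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sdm) fe type(ind)

* SEM with individual fixed effects
* (assumption: emat() supplies the error weights matrix E)
xsmle lngsp lnpcap lnpc lnemp unemp, emat(W) model(sem) fe type(ind)
```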

The command was written together with Andrea Piano Mortari and Gordon Hughes.

You may install it by typing

net install xsmle, all from(

in your Stata command bar.

