# econometrics.it

Federico Belotti's niche on the web

## A brief xsmle tutorial

Given the high number of requests, my colleagues and I have decided to provide a brief tutorial on how to get started with xsmle. First of all, you have to install the command by typing

net install xsmle, all from(http://www.econometrics.it/stata)

The all option is essential. Without it, net install does not download the package's ancillary files (in this case "product.dta" and "usaww.spmat"). With it, and provided you have an internet connection, the ancillary files are downloaded either to 1) your current working directory or 2) the directory you have set for ancillary files with the net set other command.
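As a minimal sketch of the second case, the lines below redirect ancillary files to a dedicated folder before installing. The folder name xsmle_files is purely illustrative; net set other, net install and net query are official Stata commands.

```stata
* Hypothetical folder for ancillary files; any writable directory works
capture mkdir "xsmle_files"
net set other "xsmle_files"

* Install the command and download the ancillary files into that folder
net install xsmle, all replace from(http://www.econometrics.it/stata)

* Report the current net settings
net query
```

With this setup, the data would be loaded with use "xsmle_files/product.dta", clear rather than from the working directory.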

Now, let's assume that you haven't changed your "ancillary files" directory and that Stata's current working directory is the Desktop. You should then find two new files on your Desktop: "product.dta" and "usaww.spmat". These files contain data on public capital productivity in 48 US states observed over 17 years (Munnell, 1990), together with the spatial weights matrix for those states. The examples reported in the xsmle help file use these files; the data are also available in R through the Ecdat package.

The first step is to load the "product.dta" into memory by typing

use product.dta, clear

Then do the same for the spatial weights matrix stored in "usaww.spmat". Because it is an spmat file, it can be loaded by typing

spmat use W using "usaww.spmat"

At this stage, after computing the logarithm of the dependent and independent variables, we have all the ingredients to estimate a spatial panel data model. For instance, the syntax to estimate a Spatial AutoRegressive (SAR) model with random effects is

xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sar)

Notice that the dataset "product.dta" has already been declared to be a panel. In general, you must first declare your dataset to be a panel by using the xtset command.
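Putting the steps above together, a do-file for this example might look like the sketch below. It assumes that the raw variables in "product.dta" are named gsp, pcap, pc and emp (so that their logs match the names used in the xsmle call), and that the user-written sppack package, which provides spmat, is already installed (for example via ssc install sppack).

```stata
* Load the Munnell (1990) panel and check its panel declaration
use product.dta, clear
xtset                          // with no arguments, displays the current settings

* Load the spatial weights matrix into the spmat object W
capture spmat drop W           // avoid an error if W already exists
spmat use W using "usaww.spmat"
spmat summarize W

* Logs of the dependent and independent variables
* (assumed raw names: gsp, pcap, pc, emp)
foreach v in gsp pcap pc emp {
    generate ln`v' = log(`v')
}

* SAR model with random effects, as in the tutorial
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sar)
```

If your own data have not been declared as a panel, replace the bare xtset line with xtset followed by your panel and time identifiers.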

References

Munnell AH (1990). “Why Has Productivity Growth Declined? Productivity and Public Investment.” New England Economic Review, 1990, 3–22.

## Spatial panel data models using Stata

A new command for estimating and forecasting spatial panel data models using Stata is now available: xsmle.

xsmle fits fixed or random effects spatial models for balanced panel data. To use xsmle with unbalanced panels, see the mi prefix command. Consider the following general specification for the spatial panel data model:

$y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + D Z_{it} \theta + a_i + \gamma_t + v_{it}$
$v_{it} = \lambda E v_{it} + u_{it}$

where $u_{it}$ is a normally distributed error term, $W$ is the spatial matrix for the autoregressive component, $D$ is the spatial matrix for the spatially lagged independent variables, and $E$ is the spatial matrix for the idiosyncratic error component; $a_i$ is the individual fixed or random effect and $\gamma_t$ is the time effect. xsmle fits the following nested models:

i) The SAR model with lagged dependent variable ($\theta=\lambda=0$)

$y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + a_i + \gamma_t + u_{it}$,

where the standard SAR model is obtained by setting $\tau=0$.

ii) The SDM model with lagged dependent variable ($\lambda=0$)

$y_{it} = \tau y_{it-1} + \rho W y_{it} + X_{it} \beta + D Z_{it} \theta + a_i + \gamma_t + u_{it}$,

where the standard SDM model is obtained by setting $\tau=0$. xsmle allows the use of different weighting matrices for the spatially lagged dependent variable ($W$) and the spatially lagged regressors ($D$), together with different sets of explanatory variables ($X_{it}$) and spatially lagged regressors ($Z_{it}$). The default is to use $W=D$ and $X_{it}=Z_{it}$.

iii) The SAC model ($\theta=\tau=0$)

$y_{it} = \rho W y_{it} + X_{it} \beta + a_i + \gamma_t + v_{it}$,
$v_{it} = \lambda E v_{it} + u_{it}$,

for which xsmle allows the use of different weighting matrices for the spatially lagged dependent variable ($W$) and the error term ($E$).

iv) The SEM model ($\rho=\theta=\tau=0$)

$y_{it} = X_{it} \beta + a_i + \gamma_t + v_{it}$,
$v_{it} = \lambda E v_{it} + u_{it}$.

v) The GSPRE model ($\rho=\theta=\tau=0$)

$y_{it} = X_{it} \beta + a_i + v_{it}$,
$a_i = \phi W a_i + \mu_i$,
$v_{it} = \lambda E v_{it} + u_{it}$,

where the random effects also have a spatial autoregressive form.
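As a rough illustration, the nested models above map onto xsmle calls along the following lines. This is a sketch under stated assumptions, not a verified transcript: the option names fe, re, dlag(), durbin() and emat(), and the fixed or random effects choice available for each model, should be checked against help xsmle. The data steps assume the ancillary files "product.dta" and "usaww.spmat" are in the working directory and that the raw variables are named gsp, pcap, pc and emp.

```stata
* Data and weights (assumed raw variable names: gsp, pcap, pc, emp)
use product.dta, clear
capture spmat drop W
spmat use W using "usaww.spmat"
foreach v in gsp pcap pc emp {
    generate ln`v' = log(`v')
}

* i) SAR with fixed effects; dlag(1) is assumed to add the time-lagged
*    dependent variable (the tau term)
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sar) fe dlag(1) nolog

* ii) SDM with fixed effects; durbin() is assumed to restrict the spatially
*     lagged regressors Z to unemp, with D defaulting to W
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) model(sdm) fe durbin(unemp) nolog

* iii) SAC, here using the same matrix for W and E
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) emat(W) model(sac) fe nolog

* iv) SEM with random effects
xsmle lngsp lnpcap lnpc lnemp unemp, emat(W) model(sem) re nolog

* v) GSPRE: spatially autocorrelated random effects and errors
xsmle lngsp lnpcap lnpc lnemp unemp, wmat(W) emat(W) model(gspre) nolog
```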

The command was written together with Andrea Piano Mortari and Gordon Hughes.

You may install it by typing

net install xsmle, all from(http://www.econometrics.it/stata)

HTH,
Federico

## sfcross and sfpanel: stochastic frontier analysis using Stata

sfcross and sfpanel are two new Stata commands for the estimation and post-estimation of cross-sectional and panel data stochastic frontier models. sfcross extends the capabilities of the official frontier command by including additional models (Greene 2003; Wang 2002) and functionality, such as support for complex survey data. Similarly, sfpanel estimates a much wider range of time-varying inefficiency models than the official xtfrontier command. In particular, when estimation is done with likelihood-based methods, the SF model is:

$y_{it} = \alpha + X_{it}\beta + v_{it} \pm u$

where $v_{it}$ is a normally distributed error term and $u$ is a one-sided strictly non-negative term representing inefficiency. The sign of the $u$ term is positive or negative depending on whether the frontier describes a cost or production function, respectively. Among the time-varying inefficiency models $(u=u_{it})$, sfpanel fits:

i) the true fixed-effects (TFE) and the true random-effects (TRE) models developed by Greene (2005), in which both time-invariant unmeasured heterogeneity $(\alpha=\alpha_i)$ and time-varying firm inefficiency are considered;

ii) the Battese and Coelli (1995) model, in which $u_{it}$ is obtained by truncating at zero a normal distribution with mean $Z_{it} \delta$, where $Z_{it}$ is a set of covariates explaining the mean of inefficiency;

iii) the time decay model by Battese and Coelli (1992), in which $u_{it}=u_i B(t)$, and $B(t)=\{\exp[-\eta(t-T_i)]\}$. $u_i$ is assumed to be truncated-normally distributed with non-zero mean and constant variance, while $\eta$ governs the temporal pattern of inefficiency.

iv) the flexible parametric model by Kumbhakar (1990), in which $u_{it}=u_i B(t)$ , and $B(t)=[1+\exp(bt+ct^2)]^{-1}$.

Among the time-invariant inefficiency models $(u=u_i)$, sfpanel fits:

v) the Battese and Coelli (1988) model, in which $u_i$ is truncated-normally distributed with non-zero mean and constant variance;

vi) the Pitt and Lee (1981) model, in which $u_i$ is half-normally distributed with constant variance.

When estimation is done with least squares methods, the SF production model is:

$y_{it} = \alpha + X_{it}\beta + v_{it}$

Among the time-varying inefficiency models $(\alpha=\alpha_{it})$, sfpanel fits:

vii) the Lee and Schmidt (1993) model, in which $\alpha_{it} = \theta_t \delta_i$ and $\theta_t$ are parameters to be estimated. This model is a special case of Kumbhakar (1990), in which $B(t)$ is represented by a set of dummy variables for time.

viii) the Cornwell et al. (1990) model, in which $\alpha_{it} = \delta_{i0} + \delta_{i1} t + \delta_{i2} t^2$.

Among the time-invariant inefficiency models $(\alpha=\alpha_i)$, sfpanel fits:

ix) the Schmidt and Sickles (1984) model in which $\alpha_i$ can be either fixed or random.
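To make the list above concrete, the sketch below simulates a small production panel with time-invariant half-normal inefficiency and fits the Pitt and Lee (1981) model. The data-generating values are arbitrary, and the keyword pl81 is an assumption about how the model() option names this estimator; it should be confirmed in help sfpanel, together with the keywords for the other models.

```stata
* Simulated production panel: 100 firms observed over 8 periods
clear
set seed 12345
set obs 100
generate id = _n
generate u_i = abs(rnormal(0, 0.5))    // time-invariant half-normal inefficiency
expand 8
bysort id: generate t = _n
xtset id t

* Production frontier: inefficiency enters with a negative sign
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal(0, 0.3) - u_i

* Pitt and Lee (1981) model (keyword pl81 is assumed; see help sfpanel)
sfpanel y x, model(pl81)
```

For a cost frontier the simulated inefficiency would be added rather than subtracted, in line with the sign convention described above.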

The two commands were written together with Silvio Daidone, Giuseppe Ilardi and Vincenzo Atella.

You may install them by typing

net install sfcross, all from(http://www.econometrics.it/stata)
net install sfpanel, all from(http://www.econometrics.it/stata)

An accompanying paper is also available.

HTH,
Federico

## twopm: estimating two-part models using Stata

A new Stata command to estimate two-part models for mixed discrete-continuous outcomes is now available at SSC/econometrics.it.

In two-part models, a binary choice model is estimated for the probability of observing a zero versus a positive outcome. Then, conditional on a positive outcome, an appropriate regression model is estimated for the positive outcome.

twopm focuses on continuous outcomes modeled using regress or glm. When the outcome is a count variable, such models are known as hurdle models. Of special note is that twopm allows the user to leverage the capabilities of predict and margins to calculate predictions and marginal effects from the combined first- and second-part models.
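The combined prediction rests on $E(y|x) = \Pr(y>0|x) \, E(y|y>0,x)$. The sketch below builds it by hand with logit and glm on simulated data, then fits the same specification with twopm. The simulated coefficients are arbitrary, and the firstpart() and secondpart() options are shown as an assumption about the command's syntax that should be checked against help twopm.

```stata
* Simulated mixed discrete-continuous outcome
clear
set seed 2024
set obs 2000
generate x = rnormal()
generate byte pos = runiform() < invlogit(-0.5 + 0.8*x)            // zero vs positive
generate double y = cond(pos, rgamma(2, exp(1 + 0.3*x)/2), 0)      // gamma, mean exp(1+0.3x)

* Two parts by hand, using official commands only
logit pos x
predict double p_pos, pr              // Pr(y > 0 | x)
glm y x if y > 0, family(gamma) link(log)
predict double mu_pos, mu             // E(y | y > 0, x), predicted for all observations
generate double ey = p_pos * mu_pos   // combined prediction E(y | x)

* Same specification in one step (option names assumed; see help twopm)
twopm y x, firstpart(logit) secondpart(glm, family(gamma) link(log))
margins, dydx(x)
```

The hand-built version gives point predictions only; the appeal of the one-step command is that margins can then account for both parts when computing marginal effects and their standard errors.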

twopm was written together with Partha Deb.
You may install the command by typing

ssc install twopm