Missing data can pose major problems when estimating econometric models, since missing values are generally unlikely to be *Missing Completely At Random*. Multiple imputation, that is, the process of replacing missing values with multiple sets of plausible values, is a strategy to address this issue without resorting to more complex econometric approaches. This post provides a simple example in which `xsmle` is used together with `mi`, Stata's suite of commands for multiple imputation. Consider the following data, in which the only regressor (`x1`) has 14 missing values:

```
. sum y x1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |        940    4.144785    1.674932  -.7510805   9.170195
          x1 |        926    1.375773    1.202928  -2.039139   4.966914
```

Because these missing values make the panel unbalanced, `xsmle`, which requires a balanced panel, cannot be run directly on these data. Nonetheless, we can get around the problem by using `mi` and `xsmle` jointly.

The first step is to declare the dataset as an `mi` dataset with `mi set`: data must be `mi set` before any other `mi` command can be used. Which `mi` *style* you choose (`wide`, `mlong`, `flong`, or `flongsep`) does not matter much, since you can always change it later with `mi convert`. In this example, I use the `wide` style, hence `mi set wide`.

```
. mi set wide
. mi register imputed x1
. set seed 1712
. mi impute regress x1 = i.cat, add(10)

Univariate imputation                       Imputations =       10
Linear regression                                 added =       10
Imputed: m=1 through m=10                       updated =        0

               |               Observations per m
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
            x1 |        926           14        14 |       940
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)
```

Once the dataset has been `mi set`, the second step is to *register* the variables with missing values. In this example, `x1` is the only variable with missing values, and `mi register imputed x1` declares it as a variable to be imputed. For reproducible results, it is good practice to set the seed of Stata's pseudorandom-number generator with `set seed #`, where `#` is any integer between 0 and 2^31-1.

Then, `mi impute regress x1 = i.cat, add(10)` fills in the missing values of `x1` with the linear regression (`regress`) method, using as predictors the indicator variables for the categories of `cat` (see `help mi impute` for details on the available methods). The option `add(10)` specifies the number of imputations to add to the `mi` data (currently, the total number of imputations cannot exceed 1,000). Because the data are in `wide` style, once the command has run, ten new variables `_#_x1` (with # = 1,...,10) are created in the dataset, each holding one imputed version of `x1`.
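As a rough guide to what the `regress` method does for each imputation, here is a sketch of the Gaussian normal-regression imputation step as I understand it from the Stata documentation (`help mi impute regress` gives the authoritative statement). The notation is mine: $z_i$ collects a constant and the `cat` indicators (base category omitted), $Z$ stacks $z_i'$ over the $n_0 = 926$ complete cases, and $k$ is the number of columns of $Z$. For each $m = 1,\dots,10$:

```latex
% 1. Fit the regression of x1 on the cat indicators using the complete cases
x_{1i} = z_i'\gamma + u_i, \qquad
\hat{\gamma} = (Z'Z)^{-1} Z' x_1, \qquad
\hat{\sigma}^2 = \frac{(x_1 - Z\hat{\gamma})'(x_1 - Z\hat{\gamma})}{n_0 - k}

% 2. Draw new parameters from their posterior distribution
\sigma_{*}^2 = \frac{\hat{\sigma}^2 (n_0 - k)}{g}, \quad g \sim \chi^2_{n_0 - k}, \qquad
\gamma_{*} \sim N\!\left(\hat{\gamma},\; \sigma_{*}^2 (Z'Z)^{-1}\right)

% 3. Replace each missing x_{1i} with a draw from the predictive distribution
\tilde{x}_{1i}^{(m)} = z_i'\gamma_{*} + \tilde{u}_i, \qquad \tilde{u}_i \sim N(0, \sigma_{*}^2)
```

Drawing fresh parameters in step 2 for every $m$, rather than reusing $\hat{\gamma}$ and $\hat{\sigma}^2$, is what lets the differences among the `_#_x1` variables reflect uncertainty about the imputation model as well as residual noise.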

```
. mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)

Imputations (10):
  .........10 done

Multiple-imputation estimates                   Imputations       =         10
SDM with spatial fixed-effects                  Number of obs     =        940
                                                Average RVI       =     0.0409
                                                Complete DF       =        748
DF adjustment:   Small sample                   DF:     min       =     421.51
                                                        avg       =     624.79
                                                        max       =     732.54
Model F test:       Equal FMI                   F(   7,  730.5)   =     214.64
Within VCE type:          OIM                   Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Main         |
          x1 |   .3927311   .0358267    10.96   0.000     .3223164    .4631458
-------------+----------------------------------------------------------------
Wx           |
          x1 |   .7172546   .0759222     9.45   0.000     .5682036    .8663056
-------------+----------------------------------------------------------------
Spatial      |
         rho |   .3342481   .0404076     8.27   0.000     .2548957    .4136005
-------------+----------------------------------------------------------------
Variance     |
    sigma2_e |   .8201919   .0385477    21.28   0.000     .7445121    .8958717
-------------+----------------------------------------------------------------
Direct       |
          x1 |   .4641954   .0315439    14.72   0.000     .4021925    .5261983
-------------+----------------------------------------------------------------
Indirect     |
          x1 |   1.215779   .1155787    10.52   0.000     .9888711    1.442687
-------------+----------------------------------------------------------------
Total        |
          x1 |   1.679974    .127172    13.21   0.000     1.430302    1.929647
------------------------------------------------------------------------------
```
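For readers less familiar with this output, the blocks of the table map onto the spatial Durbin model with individual fixed effects. In the notation below (mine, not taken from the `xsmle` documentation), $w_{ij}$ are the elements of the $n \times n$ weights matrix `W`, $\mu_i$ are the unit fixed effects, *Main* reports $\beta$, *Wx* reports $\theta$, *Spatial* reports $\rho$, and *Variance* reports $\sigma^2_e$:

```latex
y_{it} = \rho \sum_{j=1}^{n} w_{ij}\, y_{jt} + \beta\, x_{1it}
       + \theta \sum_{j=1}^{n} w_{ij}\, x_{1jt} + \mu_i + \varepsilon_{it},
\qquad \varepsilon_{it} \sim \text{i.i.d. } N(0, \sigma^2_e)

% Matrix of partial derivatives of E[y_t] with respect to x_{1t}
S(W) = (I_n - \rho W)^{-1} \left(\beta I_n + \theta W\right)

\text{Direct} = \frac{1}{n}\operatorname{tr} S(W), \qquad
\text{Total} = \frac{1}{n}\, \iota_n' S(W)\, \iota_n, \qquad
\text{Indirect} = \text{Total} - \text{Direct}
```

These are the standard definitions of average direct, indirect, and total impacts in the spatial econometrics literature, which I assume are the ones behind the *Direct*, *Indirect*, and *Total* rows. The last identity can be checked on the table: .4641954 + 1.215779 = 1.6799744, matching the reported total of 1.679974.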

Finally, as documented in `help mi estimate`, the prefix `mi estimate: estimation_command` runs `estimation_command` on each imputed dataset, that is, once for each imputed version `_#_x1`, and then combines the results using the combination rules of Rubin (1987): the point estimates are averaged across imputations, and the standard errors are adjusted to account for the variability between imputations. In this example, the command

`mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)`

estimates a spatial Durbin model with spatial (individual) fixed effects on each of the ten imputed versions of `x1`; the `dots` option simply displays a dot as each estimation is completed.
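For a scalar parameter $Q$ (for example, the coefficient on `x1` in the *Main* block), Rubin's (1987) rules combine the $M = 10$ per-imputation estimates $\hat{Q}_m$ and their squared standard errors $U_m$ as follows. The notation is mine; the last line is the relative variance increase, whose average over the estimated parameters is the *Average RVI* shown in the header of the output:

```latex
% Combined point estimate (the reported coefficient)
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M} \hat{Q}_m

% Within-imputation variance
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m

% Between-imputation variance
B = \frac{1}{M-1}\sum_{m=1}^{M} \left(\hat{Q}_m - \bar{Q}\right)^2

% Total variance; the reported Std. Err. is \sqrt{T}
T = \bar{U} + \left(1 + \frac{1}{M}\right) B

% Relative variance increase due to nonresponse
r = \frac{(1 + 1/M)\, B}{\bar{U}}
```

Since $T = \bar{U}(1 + r)$, an Average RVI of 0.0409 means that imputation uncertainty inflates the variances by only about 4% on average, which plausibly reflects the fact that just 14 of the 940 values of `x1` had to be imputed.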

**References**

Official Stata manuals and help files.

Rubin, D. B. 1987. *Multiple Imputation for Nonresponse in Surveys*. New York: Wiley.