Missing data can pose major problems when estimating econometric models, since missing values are rarely Missing Completely At Random. One way to address the issue without resorting to more complex econometric approaches is multiple imputation, that is, the process of replacing missing values with multiple sets of plausible values. This post provides a simple example in which xsmle is used together with mi, Stata's suite of commands for multiple imputation. Consider the following data, in which the only regressor (x1) has 14 missing values:

. sum y x1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       940    4.144785    1.674932  -.7510805   9.170195
          x1 |       926    1.375773    1.202928  -2.039139   4.966914


Since these missing values make the panel unbalanced, xsmle cannot be run on the data as they stand. Nonetheless, we can overcome the obstacle by using mi and xsmle jointly.
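Before imputing, it can help to confirm where the missing values sit. The following is a minimal sketch, assuming the dataset from this post is in memory and has already been xtset. The commands misstable, count and xtdescribe are official Stata commands, but this check is an illustration and not part of the original workflow.

```stata
* Assumes the example data (y, x1) are in memory and have been xtset.

* Report missing-value counts for the listed variables that have any
misstable summarize y x1

* Count the observations with x1 missing (14 in this example)
count if missing(x1)

* Show the panel pattern once observations with missing x1 are excluded;
* gaps in that pattern are what leave the estimation sample unbalanced
xtdescribe if !missing(x1)
```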

The first step is to declare the dataset as mi data, since data must be mi set before other mi commands can be used. The command mi set wide does this. It does not matter which mi style you choose, since you can always change it later using mi convert. In this example, I choose the wide style.

. mi set wide                                                       

. mi register imputed x1                                 

. set seed 1712                                                      

. mi impute regress x1 = i.cat, add(10)   

Univariate imputation                   Imputations =       10
Linear regression                             added =       10
Imputed: m=1 through m=10                   updated =        0

               |              Observations per m              
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
            x1 |        926           14        14 |       940
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)



Once the dataset has been mi set, the second step is to register the variables with missing values. In this example, x1 has missing values, and mi register imputed x1 declares it as a variable to be imputed. Good practice for reproducible results is to set the seed of Stata's pseudorandom-number generator using the command set seed #, where # is any number between 0 and 2^31-1.

Then, the command mi impute regress x1 = i.cat, add(10) can be used to fill in the missing values of x1 by linear regression (the regress method) on the set of dummy variables generated from the categorical variable cat (see help mi impute for details on the available methods). The option add(10) specifies the number of imputations to add to the mi data (currently, the total number of imputations cannot exceed 1,000). After the command has been executed, ten new variables _#_x1 (with # = 1,...,10) are created in the dataset, as is the convention in the wide style, each containing an imputed version of x1.
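To check what the imputation step produced, the mi data can be inspected directly. This is a minimal sketch, assuming the three commands described so far (mi set wide, mi register imputed x1 and mi impute) have already been run on the example data. It only uses official mi commands, and the choice of imputations 1 and 10 is arbitrary.

```stata
* Assumes mi set wide, mi register imputed x1, and mi impute have been run.

* Summarize the mi settings: style, number of imputations, registered variables
mi describe

* In the wide style the imputations are ordinary variables, so they can be
* compared directly with the original x1, which keeps its missing values
summarize x1 _1_x1 _10_x1

* Run the same command on the original data (m=0) and on selected imputations
mi xeq 0 1 10: summarize x1
```

The original x1 should still report 926 observations, while each imputed version should report 940, in line with the mi impute output.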

. mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)

Imputations (10):
  .........10 done

Multiple-imputation estimates                     Imputations     =         10
SDM with spatial fixed-effects                    Number of obs   =        940
                                                  Average RVI     =     0.0409
                                                  Complete DF     =        748
DF adjustment:   Small sample                     DF:     min     =     421.51
                                                          avg     =     624.79
                                                          max     =     732.54
Model F test:       Equal FMI                     F(   7,  730.5) =     214.64
Within VCE type:          OIM                     Prob > F        =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Main         |
          x1 |   .3927311   .0358267    10.96   0.000     .3223164    .4631458
-------------+----------------------------------------------------------------
Wx           |
          x1 |   .7172546   .0759222     9.45   0.000     .5682036    .8663056
-------------+----------------------------------------------------------------
Spatial      |
         rho |   .3342481   .0404076     8.27   0.000     .2548957    .4136005
-------------+----------------------------------------------------------------
Variance     |
    sigma2_e |   .8201919   .0385477    21.28   0.000     .7445121    .8958717
-------------+----------------------------------------------------------------
Direct       |
          x1 |   .4641954   .0315439    14.72   0.000     .4021925    .5261983
-------------+----------------------------------------------------------------
Indirect     |
          x1 |   1.215779   .1155787    10.52   0.000     .9888711    1.442687
-------------+----------------------------------------------------------------
Total        |
          x1 |   1.679974    .127172    13.21   0.000     1.430302    1.929647
------------------------------------------------------------------------------


Finally, as documented in help mi estimate, the prefix command mi estimate: estimation_command can be used to run estimation_command on each of the imputed datasets, that is, using each of the _#_x1 variables in place of x1. The results are then pooled, with coefficients and standard errors adjusted for the variability between imputations according to the combination rules of Rubin (1987). In this example, the command

mi estimate, dots: xsmle y x1, wmat(W) model(sdm) fe type(ind)

estimates a spatial Durbin model with spatial fixed effects on the ten imputed versions of the x1 variable.
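For reference, the combination rules mentioned above can be written out for a single coefficient. The notation is my own and is not taken from the Stata output: M is the number of imputations (10 here), and the hatted quantities are the estimate and its estimated variance obtained from imputation m.

```latex
% \hat{q}_m: estimate of a single coefficient from imputation m = 1, ..., M
% \hat{U}_m: its estimated variance (squared standard error) from imputation m
\begin{align*}
\bar{q} &= \frac{1}{M}\sum_{m=1}^{M} \hat{q}_m
        && \text{pooled point estimate} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} \hat{U}_m
        && \text{within-imputation variance} \\
B       &= \frac{1}{M-1}\sum_{m=1}^{M} \left(\hat{q}_m - \bar{q}\right)^2
        && \text{between-imputation variance} \\
T       &= \bar{U} + \left(1 + \frac{1}{M}\right) B
        && \text{total variance} \\
r       &= \frac{\left(1 + 1/M\right) B}{\bar{U}}
        && \text{relative variance increase}
\end{align*}
```

The pooled standard error is the square root of T. The Average RVI line in the output header should correspond to r averaged over the coefficients, so a small value such as the 0.0409 reported here suggests that the imputation adds little to the sampling variance.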

References

Official Stata manuals and help files.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.