Some code for estimating clustered SEs in mlogit models

There’s a well-known bit of code for estimating Liang and Zeger (1986) type cluster robust standard errors for GLM models in R (see also Rogers 1993), but it doesn’t work exactly right off-the-shelf for multinomial models estimated in the mlogit package. In support of another project, I’ve modified it slightly to work with mlogit models. I thought it might be nice to provide this code to anyone interested in using it:

# load in a function to create clustered standard errors for mlogit models
# initial code by Mahmood Ara: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/
# slightly modified for mlogit models by Justin Esarey on 3/3/2015

cl.mlogit   <- function(fm, cluster){

# fm: a fitted mlogit model
# cluster: a data vector with the cluster
#          identity of each observation in fm

require(sandwich, quietly = TRUE)
require(lmtest, quietly = TRUE)
M <- length(unique(cluster))
N <- length(cluster)
K <- length(coefficients(fm))
# edit 6/22/2015: change dfc
# dfc <- (M/(M-1))*((N-1)/(N-K))
dfc <- (M/(M-1))
uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
vcovCL <- dfc*sandwich(fm, meat.=crossprod(uj)/N)
coeftest(fm, vcovCL) }

And here’s a little example you can test it on:

require(VGAMdata)
data(vtinpat)
vtinpat$hos.num <- as.numeric(vtinpat$hospital)
vtinpat$age <- as.numeric(vtinpat$age.group)
vtinpat.mlogit <- mlogit.data(vtinpat, choice = "admit", shape = "wide")
vt.mod <- mlogit(admit ~ 0 | age + sex, data = vtinpat.mlogit)
summary(vt.mod)

cl.mlogit(vt.mod, vtinpat$hos.num)

You can verify that this yields the same results as mlogit with cluster in Stata by writing the data file out with library(foreign) and read.dta, then running an identical model.

The reason for writing this code is to eventually use it as a part of another project:

http://cran.r-project.org/web/packages/clusterSEs/index.html

Pairs cluster bootstrapping seems to work much better when you use cluster-robust standard errors for the replicates, and until now I couldn’t estimate them in R (had to do it in Stata). With this code, I should be able to adapt the pairs cluster bootstrap procedure for mlogit models in R.

[Update 6/22/2015]: Further checking indicated that Stata actually uses a slightly different degrees of freedom correction for mlogit (and MLE models in general) than was initially included in this code. The commented (original) code includes the “regression-like” correction while the uncommented (updated) code includes the “asymptotic-like” correction referenced on pp. 1865-1866 of the Stata 13 [R] manual; see also p. 54 here. The answers with the updated dfc code are closer to Stata’s answers for mlogit models.

References

Liang, Kung-Yee and Scott L. Zeger. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika 73(1):13-22.

Rogers, William. 1993. “Regression standard errors in clustered samples.” Stata Technical Bulletin 13: 19-23.

About Justin Esarey

Associate Professor of Political Science at Rice University.
This entry was posted in Software. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s