## The Alternative Specification of Interaction Models With a Discrete Modifying Variable

Since Brambor, Clark and Golder’s (2006) article in Political Analysis (hereafter BCG), our understanding of interaction models has improved significantly, and most empirical scholars have now integrated the tools to execute and interpret interaction models properly. In particular, one of the main recommendations of BCG was to include all constitutive terms of the interaction in the model specification. However, BCG acknowledge (in the text surrounding equation 7 of their paper) that there is a mathematically equivalent model specification that allows researchers to exclude certain constitutive terms from an interaction model when one of the modifying variables is discrete. A recent review experience made me realize that this exception is not as widely recognized as BCG’s core advice to include all constitutive terms, which suggests that a brief note to the scholarly community might help publicize it. In the next section, I show the equivalency of BCG’s standard specification and this alternative specification. I then provide a brief example of both approaches applied to a substantive case, Adams et al.’s (2006) study “Are Niche Parties Fundamentally Different from Mainstream Parties?”, and show that we get the same results using either BCG’s approach or the alternative approach.

Overall, I show that while the two model specifications are equivalent, each has some advantages in terms of the interpretation of the regression results. On the one hand, the advantage of the standard specification is to present directly in the regression results whether the difference in the marginal effects of X on Y between the categories of the modifying variable Z is statistically significant. On the other hand, the main benefit of the alternative approach is to present directly in the regression results the marginal effects of X under each category of the modifying variable Z. Researchers may thus choose between the two equivalent specifications depending on the results they want to present and emphasize.

1  Equivalency of the Standard and Alternative Specifications

To show the equivalency of BCG’s standard specification and the alternative specification when one of the modifying variables is discrete, I take as an example a dependent variable Y which is a function of an interaction effect between a continuous variable X and a dummy variable D. BCG’s standard approach to interaction models indicates that we must multiply the variables X and D and include this interaction term, as well as the constitutive terms X and D, in the regression model. Specifically, the standard specification is the following:

Y = b0 + b1D + b2X + b3XD + ϵ                                                   (1)

where X is continuous and D is a dummy variable (0,1). The marginal effect of X when D = 0 is given by b2, while the marginal effect of X when D = 1 is given by b2 + b3.

The alternative approach, explained briefly in BCG (see equation 7) and Wright (1976), consists in treating the dummy variable D as two dummy variables: D, the original variable, which takes the values 0 and 1, and D0, the inverse of D, which equals 1 when D = 0 and 0 when D = 1. For example, if D is a dummy variable where 1 represents democratic countries and 0 authoritarian countries, D0 would simply be the inverse dummy variable where 1 represents authoritarian countries and 0 democratic countries. Consequently, D + D0 = 1 and D0 = 1 − D. The alternative approach consists in multiplying X by D and by D0 and including all constitutive terms in the regression model except X and one of the dummy variables, D or D0. Only one of D and D0 can be included because, together with the constant, these variables are perfectly collinear. X cannot be included either, because it is perfectly collinear with XD and XD0 (X = XD + XD0).

The alternative specification is thus the following:

Y = a0 + a1D + a2XD + a3XD0 + ϵ                                                     (2)

Equation 2 could be rewritten as

Y = a0 + a1D + a2XD + a3X(1 − D) + ϵ                                              (3)

Equation 3 makes explicit that we do not necessarily need to create D0; it is enough to multiply X by (1 − D). In equations 2 and 3, the marginal effect of X when D = 0 (i.e. when D0 = 1) is given by a3, while the marginal effect of X when D = 1 is given by a2.
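The correspondence between the two sets of coefficients can be checked numerically. The following is a minimal sketch, assuming simulated data and a small hand-written least-squares helper (`ols` is a hypothetical name introduced here, not part of any library, and the coefficient values used to simulate Y are arbitrary); it fits equation 1 and equation 2 on the same data and compares the estimates.

```python
import random

def ols(columns, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated data (illustrative assumption, not the Adams et al. data)
random.seed(0)
n = 200
D = [random.randint(0, 1) for _ in range(n)]
X = [random.gauss(0, 1) for _ in range(n)]
Y = [1.0 + 0.5 * d + 2.0 * x - 3.0 * x * d + random.gauss(0, 1)
     for d, x in zip(D, X)]

ones = [1.0] * n
XD = [x * d for x, d in zip(X, D)]          # X times D
XD0 = [x * (1 - d) for x, d in zip(X, D)]   # X times (1 - D)

b = ols([ones, D, X, XD], Y)     # equation 1: b0, b1, b2, b3
a = ols([ones, D, XD, XD0], Y)   # equation 2: a0, a1, a2, a3

print("effect of X when D = 0:", b[2], "vs", a[3])
print("effect of X when D = 1:", b[2] + b[3], "vs", a[2])
print("difference in effects: ", b[3], "vs", a[2] - a[3])
```

Each printed pair should agree up to floating-point error, since the two design matrices span the same column space.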

The main advantage of this alternative specification is that, for each category of the discrete modifying variable D (0 and 1 in this case), the marginal effect of X and its associated standard error are provided directly in the regression results. This is not the case in the standard approach, where only one of these results is directly provided (i.e. b2, the effect of X when D = 0). Consequently, we need to add up b2 and b3 to obtain the effect of X when D = 1. This is easy to do in Stata with the command lincom (lincom _b[coef] + _b[coef]).

A disadvantage of the alternative specification is that the regression results do not indicate whether the difference between the marginal effects of X when D = 0 and when D = 1 is statistically significant. This is the advantage of the standard approach, which provides this information in the regression results. If the coefficient b3 is statistically significant in equation 1, this indicates that the marginal effect of X when D = 1 is statistically different from the marginal effect of X when D = 0. To answer this question with the alternative approach, we must test the equality of a2 and a3. This is also straightforward in Stata with the command test (test _b[coef] = _b[coef]) or lincom (lincom _b[coef] - _b[coef]).
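For readers who want to see what these post-estimation commands compute, the standard errors follow from the usual variance formula for a linear combination of estimates. A sketch in the notation of equations 1 and 2 (the hats, denoting estimates, are a notational assumption added here):

```latex
\begin{align*}
\widehat{\operatorname{Var}}(\hat{b}_2 + \hat{b}_3)
  &= \widehat{\operatorname{Var}}(\hat{b}_2) + \widehat{\operatorname{Var}}(\hat{b}_3)
     + 2\,\widehat{\operatorname{Cov}}(\hat{b}_2, \hat{b}_3), \\
\widehat{\operatorname{Var}}(\hat{a}_2 - \hat{a}_3)
  &= \widehat{\operatorname{Var}}(\hat{a}_2) + \widehat{\operatorname{Var}}(\hat{a}_3)
     - 2\,\widehat{\operatorname{Cov}}(\hat{a}_2, \hat{a}_3).
\end{align*}
```

The first line gives the variance of the effect of X when D = 1 in the standard specification; the second gives the variance of the difference in effects in the alternative specification. The standard error in each case is the square root of the corresponding variance.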

The specification of the alternative approach could be easily generalized to discrete variables with multiple categories whether the discrete variable is nominal or ordinal. The procedure is the same. We need first to create a dummy variable for each category of the discrete modifying variable. We then multiply X with each of these dummy variables and include all constitutive terms in the equation (except X and one of the dummy variables). This specification will also allow researchers to evaluate directly the magnitude of the substantive effect of X across the different values of the discrete modifying variables without including all constitutive terms of the interaction explicitly.
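The procedure for a modifying variable with several categories can be sketched as follows. This is only an illustration under stated assumptions: the function name `alternative_design`, the column labels, and the toy data with categories "A", "B" and "C" are all hypothetical, and the first category (in sorted order) is arbitrarily taken as the omitted reference dummy.

```python
def alternative_design(x, z):
    """Build regressors for the alternative specification.

    x: values of the continuous variable X
    z: category labels of the discrete modifying variable Z
    Returns a dict mapping column names to lists of values.
    """
    categories = sorted(set(z))
    columns = {"const": [1.0] * len(x)}
    # One dummy per category, omitting the first as the reference category.
    for c in categories[1:]:
        columns[f"D_{c}"] = [1.0 if zi == c else 0.0 for zi in z]
    # X multiplied by the dummy of every category; X itself is left out.
    for c in categories:
        columns[f"X*D_{c}"] = [xi if zi == c else 0.0 for xi, zi in zip(x, z)]
    return columns

# Toy data for illustration only
x = [0.5, -1.0, 2.0, 1.5, 0.0, -0.5]
z = ["A", "B", "C", "B", "A", "C"]

design = alternative_design(x, z)
for name, values in design.items():
    print(name, values)
```

In a regression on these columns, the coefficient on each `X*D_c` column would be the marginal effect of X within category c. The interaction columns sum to X row by row, which is why X cannot be added as a separate regressor.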

2  Replication of “Are Niche Parties Fundamentally Different From Mainstream Parties?”

In this section, I compare the results of the standard and alternative approaches to interaction models by replicating Adams et al.’s (2006) study “Are Niche Parties Fundamentally Different from Mainstream Parties?” published in the American Journal of Political Science. Two main research questions are examined in that article. First, the authors examine whether mainstream parties are more responsive than niche parties to shifts in public opinion in adjusting their policy programs. Second, and building on this prediction, they examine whether niche parties are penalized more heavily at the polls than mainstream parties when they moderate their policy positions. Here, I only replicate the model associated with the first question.

Adams et al. (2006) tested these hypotheses in seven advanced democracies (Italy, Britain, Greece, Luxembourg, Denmark, Netherlands, and Spain) over the 1976-1998 period. They measure parties’ policy position on the left-right ideological scale with data from the Comparative Manifesto Project (CMP). Surveys from the Eurobarometer are used to locate respondents on the corresponding left-right ideological dimension. Public opinion is measured as the average of all respondents’ self-placement. Finally, the authors coded Communist, Green, and Nationalist parties as niche parties with a dummy variable.

In Table 1, I examine party responsiveness to public opinion and present the results of the standard and alternative approaches. Adams et al. (2006) use the standard approach and interact the variable public opinion shift with the dummy variable niche party. The dependent variable is the change in a party’s left-right position. Adams et al. (2006) thus specify a dynamic model in which they assess whether a change in public opinion influences a change in party positions between two elections. The models include fixed effects for countries and a number of control variables (see the original study for the justifications). The specification of the standard approach in column (1) is the following:

party position = b0 + b1 ∆public opinion + b2 niche party + b3 (∆public opinion X niche party) + controls

In column (1) of Table 1, I display the same results as those published in Table 1 of Adams et al.’s (2006) article. The results in column (1) support the authors’ argument that niche parties are less responsive than mainstream parties to changes in public opinion. The coefficient of public opinion shift (0.97) is positive and statistically significant, indicating that when public opinion moves to the left (right), mainstream parties (niche party = 0) adjust their policy positions accordingly to the left (right). The coefficient of public opinion shift X niche party (-1.52) indicates that the response of niche parties to a shift in public opinion is 1.52 points lower on the left-right scale than that of mainstream parties, and the difference is statistically significant (p<0.01).

In column (2) of Table 1, I display the results of the alternative approach. The specification is now the following:

party position = b0 + b1 niche party + b2 (∆public opinion X niche party) + b3 (∆public opinion X mainstream party) + controls

where mainstream party equals (1 − niche party).

It is important to highlight that the results in columns (1) and (2) are mathematically equivalent. For example, the coefficients of the control variables are exactly the same in both columns. There are some differences, however, in terms of the interpretation of the interaction effect. In column (2), the coefficient of public opinion shift – mainstream party (0.97) equals the coefficient of public opinion shift in column (1). This is because public opinion shift – mainstream party in column (2) indicates the impact of a change in public opinion on the positions of mainstream parties, just as public opinion shift does in column (1). On the other hand, the coefficient of public opinion shift – niche party in column (2) equals -0.55 and is statistically significant at the 0.05 level. This indicates that when public opinion moves to the left (right), niche parties adjust their policy positions in the opposite direction, to the right (left). This result is not explicitly displayed in column (1) when using the standard approach. The coefficient of public opinion shift – niche party in column (2) actually equals the sum of the coefficients of public opinion shift and public opinion shift X niche party in column (1), i.e. 0.97 + (-1.52) = -0.55. In column (2), a Wald test indicates that the difference between the effects of public opinion shift – niche party and public opinion shift – mainstream party is statistically significant at the 0.01 level, exactly as indicated by the coefficient of public opinion shift X niche party in column (1).

Overall, researchers may choose between two equivalent specifications when one of the modifying variables is discrete in an interaction model: BCG’s specification, which includes all constitutive terms of the interaction, and an alternative specification that does not include all constitutive terms of the interaction explicitly. Each specification has its advantages in terms of the interpretation of the interaction effect. The advantage of the alternative approach is to present directly the marginal effects of an independent variable X on Y for each category of the discrete modifying variable Z. On the other hand, the advantage of BCG’s approach is to present directly whether the difference in the marginal effects of X on Y between the categories of Z is statistically significant. In either case, researchers need one additional step: a test of whether the difference in the marginal effects is statistically significant (in the alternative specification) or a calculation of the marginal effects under each category of the discrete modifying variable (in the standard specification).

Notes

¹Assistant Professor, School of Political Studies, University of Ottawa, 120 University, Ottawa, ON, K1N 6N5, Canada (bferland@uottawa.ca). I thank James Adams, Michael Clark, Lawrence Ezrow, and Garrett Glasgow for sharing their data. I also thank Justin Esarey for his helpful comments on the paper.

² The marginal effect of X in equation 1 is given by b2 + b3D.

³ The marginal effect of X in equations 2 and 3 is given by a2D + a3D0.

Note also that equation 1 on the left-hand side equals either equation 2 or 3 on the right-hand side:

b0 + b1D + b2X + b3XD + ϵ=a0 + a1D + a2XD + a3X(1 − D) + ϵ

b0 + b1D + b2X + b3XD + ϵ=a0 + a1D + a2XD + a3X − a3XD + ϵ

It is possible then to isolate XD on the right-hand side:

b0 + b1D + b2X + b3XD + ϵ=a0 + a1D + a3X + (a2 − a3)XD + ϵ

Assuming that the models on the left-hand side and right-hand side are estimated with the same data, b0 would equal a0, b1 would equal a1, b2 would equal a3 (i.e. the estimated parameter of X(1 − D)), and b3 would equal (a2 − a3).

References

Adams, James, Michael Clark, Lawrence Ezrow and Garrett Glasgow. 2006. “Are Niche Parties Fundamentally Different from Mainstream Parties? The Causes and the Electoral Consequences of Western European Parties’ Policy Shifts, 1976-1998.” American Journal of Political Science 50(3):513–529.

Brambor, Thomas, William Roberts Clark and Matt Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14:63–82.

Wright, Gerald C. 1976. “Linear Models for Evaluating Conditional Relationships.” American Journal of Political Science 20(2):349–373.

## Questions and Answers: Reproducibility and a Stricter Threshold for Statistical Significance

“Redefine statistical significance,” a paper recently published in Nature Human Behaviour (Benjamin et al., 2017), generated a substantial amount of discussion in methodological circles. This paper proposes to lower the $\alpha$ threshold for statistical significance from the conventional level of $0.05$ to a new, more stringent level of $0.005$ and to apply this threshold specifically to newly discovered relationships (i.e., relationships that have not yet been demonstrated in multiple studies). This proposal touched off a debate, in which many statisticians and social scientists have participated, about the effect that null hypothesis significance testing (NHST) has on published work in the social and behavioral sciences. Some have proposed alternative reforms that they believe will be more effective at improving the replicability of published results.

To facilitate further discussion of these proposals—and perhaps to begin to develop an actionable plan for reform—the International Methods Colloquium (IMC) hosted a panel discussion on “reproducibility and a stricter threshold for statistical significance” on October 27, 2017. The one-hour discussion included six panelists and over 240 attendees, with each panelist giving a brief initial statement concerning the proposal to “redefine statistical significance” and the remainder of the time being devoted to questions and answers from the audience. The event was recorded and can be viewed online for free at the International Methods Colloquium website.

Unfortunately, the IMC’s time limit of one hour prevented many audience members from asking their questions and having a chance to hear our panelists respond. Panelists and audience members alike agreed that the time limit was not adequate to fully explore all the issues raised by Benjamin et al. (2017). Consequently, questions that were not answered during the presentation were forwarded to all panelists, who were given a chance to respond.

The questions and answers, both minimally edited for clarity, are presented in this article. The full series of questions and answers (and this introduction) are embedded in the PDF below.

## New Print Edition Released!

Volume 24, Number 2 of The Political Methodologist has just been released!

You can find a direct link to a downloadable version of the print edition here [update: a version with a minor correction has been added as of 5:23 PM on 9/26/2017]:

https://thepoliticalmethodologist.com/v24-n2-fix/

## Minnesota Political Methodology Colloquium Graduate Student Conference 2018

(Post by Carly Potz-Nielsen and Robert Ralston)

We are very excited to announce a new Minnesota Political Methodology Colloquium (MPMC) initiative: the Minnesota Political Methodology Graduate Student Conference.  The conference is scheduled for May 4 & May 5, 2018.

The Minnesota Political Methodology Graduate Student Conference is designed to provide doctoral students with feedback on their research from peers and faculty. Research papers may focus on any substantive topic, employ any research methodology, and/or be purely methodological. We are particularly interested in novel applied work to interesting and important questions in political science, sociology, psychology, and related fields.

The conference represents a unique opportunity for graduate students in different programs, across different disciplines, and with different substantive interests to network and receive feedback on their work.  Papers will receive feedback from a faculty discussant, written feedback from other panelists, and comments/suggestions from audience members.

The conference will occur over two days (May 4 and May 5, 2018) and feature at least 24 presentations in 6 panels. Proposals are due December 1, 2017.

Our keynote speaker for the event is Sara Mitchell, F. Wendell Miller Professor of Political Science at the University of Iowa.

Details about the conference may be found here.

Questions should be addressed to mpmc@umn.edu

## Response to MacKinnon and Webb

MacKinnon and Webb offer a useful analysis of how the uncertainty of causal effects can be underestimated when observations are clustered and the treatment is applied to a very large or very small share of the clusters. Their mathematical exposition, simulation exercises, and replication analysis provide a helpful guide for how to proceed when data are poorly behaved in this way. These are valuable lessons for researchers studying impacts of policy in observational data where policies tend to be sluggish and thus do not generate much variability in the key explanatory variables.

Correction of Two Errors

MacKinnon and Webb find two errors in our analysis, while nonetheless concluding “we do not regard these findings as challenging the conclusions of Burden et al. (2017).” Although we are embarrassed by the mistakes, we are also grateful for their discovery.1 Our commitment to transparency is reflected by the fact that the data have been public for replication purposes since well before the article was published. We have posted corrected versions of the replication files and published a corrigendum with the journal where the article was originally published.

Fortunately, none of the other analyses in our article were affected. It is only Table 7 where errors affect the analysis. Tables 2 through 6 remain intact.

We concede that when corrections are made the effect of early voting drops from statistical significance in the model of the difference in the Democratic vote between 2008 and 2012. All of the various standard errors they report are far too large to reject the null hypothesis.

The Problem of Limited Variation

The episode highlights the tradeoffs that researchers face between applying what appears to be a theoretically superior estimation technique (i.e., difference-in-difference) and the practical constraints of a particular application (i.e., limited variation in treatment variables) that make its use intractable. In the case of our analysis, election laws do not change rapidly, and the conclusions of our analysis were largely based on cross-sectional analyses (Tables 2-6), with the difference-in-difference largely offered as a supplemental analysis.

We are in agreement with MacKinnon and Webb that models designed to estimate causal effects (or even simple relationships) may be quite tenuous when the number of clusters is small and the clusters are treated in a highly unbalanced fashion. In fact, we explained our reluctance to apply the difference-in-difference model to our data because of the limited leverage available. We were explicit about our reservations in this regard. As our article stated:

“A limitation of the difference-in-difference approach in our application is that few states actually changed their election laws between elections. As Table A1 (see Supplemental Material) shows, for some combinations of laws there are no changes at all. For others, the number of states changing is as low as one or two. As result, we cannot include some of the variables in the model because they do not change. For some other variables, the interpretation of the coefficients would be ambiguous given the small number of states involved; the dummy variables essentially become fixed effects for one or two states” (p. 572).

This is unfortunate in our application because the difference-in-difference models are likely to be viewed as more convincing than the cross-sectional models. This is why we offered theory suggesting that the more robust cross-sectional results were not likely to suffer from endogeneity.

The null result in the difference-in-difference models is not especially surprising given our warning above about the limited leverage provided by the dataset. Indeed, the same variable was insignificant in our model of the Democratic vote between 2004 and 2008 that we also reported in Table 7. We are left to conclude that the data are not amenable to detecting effects using difference-in-difference models. Perhaps researchers will collect data from more elections to provide more variation in the key variable and estimate parameters more efficiently.

In addition to simply replicating our analysis, MacKinnon and Webb also conduct an extension to explore asymmetric effects. They separate the treated states into those where early voting was adopted and those where early voting was repealed. We agree that researchers ought to investigate such asymmetries. We recommended as much in our article: “As early voting is being rolled back in some states, future research should explore the potential asymmetry between the expansion and contraction of election practices” (p. 573). However, we think this is not feasible with existing data. As MacKinnon and Webb note, only two states adopted early voting and only one state repealed early voting. As a result, analyzing these cases separately, as they do, essentially renders the treatment variables little more than fixed effects for one or two states, as we warned in our article. The coefficients might be statistically significant using various standard error calculations, but it is not clear that MacKinnon and Webb are actually estimating the treatment effects rather than something idiosyncratic about one or two states.

Conclusion

While the errors made in our difference-in-difference analysis were regrettable, we think the greater lesson from the skilled analysis of MacKinnon and Webb is to raise further doubt about whether this tool is suitable in such a policy setting. All else equal, it may offer a superior mode of analysis; but all else is not equal. Researchers need to find the mode of analysis that best fits the limitations of the data.

Footnotes

1. The mistake in coding Alaska is inconsequential because, as MacKinnon and Webb note, observations from Alaska and Hawaii are dropped from the multivariate analysis.

## Pitfalls when Estimating Treatment Effects Using Clustered Data

James G. MacKinnon, Department of Economics, Queen’s University1
Matthew D. Webb, Department of Economics, Carleton University

Extended Abstract

There is a large and rapidly growing literature on inference with clustered data, that is, data where the disturbances (error terms) are correlated within clusters. This type of correlation is commonly observed whenever multiple observations are associated with the same political jurisdictions. Observations might also be clustered by time periods, industries, or institutions such as hospitals or schools.

When estimating regression models with clustered data, it is very common to use a “cluster-robust variance estimator” or CRVE. However, inference for estimates of treatment effects with clustered data requires great care when treatment is assigned at the group level. This is true for both pure treatment models and difference-in-differences regressions, where the data have both a time dimension and a cross-section dimension and it is common to cluster at the cross-section level.
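As a point of reference, the basic form of a cluster-robust variance estimator for the OLS estimate in a linear regression can be written as follows. This is a sketch in generic notation that is assumed here rather than taken from the paper: $X$ is the full regressor matrix, $G$ is the number of clusters, and $X_g$ and $\hat{u}_g$ are the regressors and OLS residuals for cluster $g$. Software implementations typically multiply this expression by a finite-sample correction factor, which is omitted.

```latex
\[
\widehat{\operatorname{Var}}(\hat{\beta})
  = (X^{\top}X)^{-1}
    \left( \sum_{g=1}^{G} X_g^{\top}\,\hat{u}_g\,\hat{u}_g^{\top}\,X_g \right)
    (X^{\top}X)^{-1}
\]
```

The middle term sums over clusters rather than observations, so its accuracy depends on the number of clusters that contribute information about each coefficient, not on the total number of observations.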

Even when the number of clusters is quite large, cluster-robust standard errors can be much too small if the number of treated (or control) clusters is small. Standard errors also tend to be too small when cluster sizes vary a lot, resulting in too many false positives. Bootstrap methods based on the wild bootstrap generally perform better than t-tests, but they can also yield very misleading inferences in some cases. In particular, what would otherwise be the best variant of the wild bootstrap can underreject extremely severely when the number of treated clusters is very small. Other bootstrap methods can overreject extremely severely in that case.

In Section 2, we briefly review the key ideas of cluster-robust covariance matrices and standard errors. In Section 3, we then explain why inference based on these standard errors can fail when there are few treated clusters. In Section 4, we discuss bootstrap methods for cluster-robust inference. In Section 5, we report (graphically) the results of several simulation experiments which illustrate just how severely both conventional and bootstrap methods can overreject or underreject when there are few treated clusters. In Section 6, the implications of these results are illustrated using an empirical example from Burden, Canon, Mayer, and Moynihan (2017). The final section concludes and provides some recommendations for empirical work.

Replication File

Replication files for the Monte Carlo simulations and the empirical example can be found at: doi:10.7910/DVN/GBEKTO.

1. We are grateful to Justin Esarey for several very helpful suggestions and to Joshua Roxborough for valuable research assistance. This research was supported, in part, by a grant from the Social Sciences and Humanities Research Council of Canada. Some of the computations were performed at the Centre for Advanced Computing at Queen’s University.