But Shouldn’t That Work Against Me?

The refrain is ubiquitous in seminars, workshops, and the discussion sections of quantitative studies. An audience member or reviewer might raise one of the following objections:

  • The purported effect/mechanism seems implausible.
  • It seems unlikely that your design could detect the effect/mechanism of interest.
  • Your estimates are likely biased toward zero.

The common, often enthusiastic retorts are the subject of this essay:

  • But shouldn’t that work against me?
  • Shouldn’t that make it harder for me to detect an effect?
  • Doesn’t that make it all the more likely that I have, in fact, detected a genuine effect?
  • Don’t my statistically significant results negate your concern?

In this note, I explain why these retorts are typically not satisfying. As a preview, the answers to these four questions are, respectively, you’re a scientist not a lawyer, yes, no, and no.

For convenience, I’ll refer to the author’s line of argument as the work-against-me defense. The author believes that the surprisingness of her results should make her audience, if anything, even more persuaded by the analysis. The problems with this argument are two-fold. First, because we’re scholars and not activists, we often care about biases in either direction. Underestimating a substantively significant phenomenon could be just as problematic as overestimating a substantively trivial phenomenon. Second, even if we don’t care about substantive effect sizes, Bayes’ rule tells us that more surprising results are also more likely to be wrong. All else equal, the lower the power of the study or the lower our prior beliefs, the lower our posterior beliefs should be conditional upon detecting a positive result. Consistent with the author’s intuition, low power and low prior beliefs make it harder to obtain a positive estimate, but inconsistent with the author’s intuition, this also means that conditional upon obtaining a positive estimate, it’s more likely that we’ve obtained a false positive.

We Often Care About Biases in Either Direction

The goals of social science are many and complex. Among other things, we want to better understand how the social and political world works, and when possible, we want to help people make better decisions and contribute to policy debates. Our ability to do those things would be quite limited if we could only produce a list of binary results: e.g., prices affect choices, electoral institutions influence policy outcomes, legislative institutions can cause gridlock, and legal institutions can enhance economic growth. 

One setting in which quantitative estimates from social scientific studies are particularly relevant is the vaunted cost-benefit analysis. The CBO, EPA, FDA, or UN (pick your favorite acronym) might look to academic studies to figure out if a proposed policy’s benefits are worth the costs. It’s fairly obvious that in this setting, we would like unbiased estimates of the costs and the benefits. And generally speaking, there’s no compelling argument that underestimating a quantity of interest is somehow better than overestimating it. Of course, there could be a particular situation where the costs are well understood and we have a biased estimate of the benefits that exceeds the costs. This would be a situation in which the work-against-me defense might be persuasive (although this assumes there is no selective reporting, a topic to which I’ll return). But there would be still many other situations in which the work-against-me defense would be counterproductive.

Even when we’re not engaging in cost-benefit analyses, we often care about the substantive magnitude of our estimates. Presumably, we don’t just want to know, for example, whether political polarization influences inequality, whether campaign contributions distort the behavior of elected officials, whether the loss of health insurance increases emergency room utilization, or whether taste-based discrimination influences hiring decisions. We want to know how much. Are these big problems or small problems?

Bayes’ Rule and False Positive Results

Suppose that we don’t care about substantive magnitudes. We just want to know if an effect exists or not. Is the work-against-me defense compelling in this case? The answer is still no. Let’s see why.

My concerns about the work-against-me defense are closely related to concerns about selective reporting and publication bias. Authors typically only use the work-against-me defense when they have obtained a non-zero result, and they likely would never have reported the result if it were null. After all, the work-against-me defense wouldn’t make much sense if the result were null (although as we’ll see, it rarely makes sense). As we already know, selective reporting means that many of our published results are false positives or overestimates (e.g., see Ioannidis 2005; Simmons, Nelson, and Simonsohn 2011), and as it turns out, things that work against the author’s purported hypothesis can exacerbate the extent of this problem.

Since we’re now considering situations in which we don’t care about effect sizes, let’s assume that effects and estimates are binary. A purported effect is either genuine or not, and our statistical test either detects it or not. When we see a result, we’d like to know the likelihood that the purported effect is genuine. Since the work-against-me defense is only ever used when a researcher has detected an effect, let’s think about our posterior beliefs conditional upon obtaining a positive result:

$$\Pr(\text{genuine effect} \mid \text{positive result}) = \frac{\text{prior} \times \text{power}}{\text{prior} \times \text{power} + (1 - \text{prior}) \times \alpha}.$$

Conditional upon positively detecting an effect, the probability that the effect is genuine is a function of our prior beliefs (our ex-ante beliefs about the likelihood the effect is genuine before seeing this estimate), the power of the study (the probability of a positive result if the effect is genuine), and the probability of a positive result if the effect is not genuine, which I refer to as α for convenience.

The equation above is just a simple application of Bayes’ rule. It tells us that, all else equal, we’re more confident that an estimated effect is genuine when (1) the prior is higher, (2) the power is higher, and (3) α is lower. All of this sounds obvious, although the implications for the work-against-me defense may not be.

How does the exchange between the seminar critic and the author at the outset of this essay relate to the equation? “The purported mechanism seems implausible” is just another way of saying that one’s prior is low. “Your design is unlikely to detect this effect” is another way of saying that the power is low. Conditional upon having obtained a positive result, low power and/or low priors make it less likely that the estimated effect is genuine.
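To get a feel for the magnitudes, consider a hypothetical set of numbers (the values here are purely illustrative). Suppose the critic is right that the prior is only 0.1 and the power is only 0.3, and the test is conducted at α = 0.05. Then

$$\Pr(\text{genuine effect} \mid \text{positive result}) = \frac{0.1 \times 0.3}{0.1 \times 0.3 + 0.9 \times 0.05} = \frac{0.03}{0.075} = 0.4.$$

With a more plausible hypothesis and a better-powered design, say a prior of 0.5 and power of 0.8, the same positive result would instead imply a posterior of 0.4/(0.4 + 0.025) ≈ 0.94. The positive result in the first case is more surprising, but it is also far more likely to be a false positive.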

“Your estimates are likely biased toward zero” is a trickier case. How can we think about a bias toward zero in the context of this discussion of Bayes’ rule? A bias toward zero clearly reduces power, making an analyst less likely to reject the null if the effect of interest is genuine. However, some biases toward zero also reduce α, making an analyst less likely to reject the null if the effect of interest is not genuine. Suppose we’re conducting a one-sided hypothesis test for a purportedly positive effect. Attenuation bias (perhaps resulting from measurement error) lowers power without lowering α, so the presence of attenuation bias has an unambiguously negative effect on our posterior beliefs. Now imagine a systematic bias that lowers all estimates without increasing noise (e.g., omitted variables bias with a known negative sign). This lowers power, but it also lowers α. So this kind of bias has an ambiguous effect on our posterior beliefs conditional on a positive result: reducing power decreases our belief that the result is genuine, while reducing α increases it. And it will likely be hard to know which effect dominates. But suffice it to say that a bias toward zero has either negative or ambiguous implications for the credibility of the result, meaning that the work-against-me defense is misguided even in this context.
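A quick simulation makes the attenuation case concrete. The sketch below (all of the numbers, sample sizes, and function names are made up for illustration) regresses an outcome on a noisily measured predictor and records how often a one-sided test at roughly α = 0.05 rejects the null: classical measurement error leaves the rejection rate near 0.05 when the true effect is zero, but lowers it substantially when the true effect is positive.

// Rough Monte Carlo sketch (illustrative values only): regress y on a noisily
// measured x and track one-sided rejections at (approximately) alpha = 0.05.
function randn() { // standard normal draw via the Box-Muller transform
    var u = 1 - Math.random(), v = Math.random();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function rejectionRate(trueBeta, errorSD, nSims, n) {
    var rejections = 0;
    for (var s = 0; s < nSims; s++) {
        var xObs = [], y = [];
        for (var i = 0; i < n; i++) {
            var x = randn();
            xObs.push(x + errorSD * randn()); // classical measurement error in x
            y.push(trueBeta * x + randn());
        }
        var mx = 0, my = 0;
        for (var i = 0; i < n; i++) { mx += xObs[i] / n; my += y[i] / n; }
        var sxx = 0, sxy = 0;
        for (var i = 0; i < n; i++) {
            sxx += (xObs[i] - mx) * (xObs[i] - mx);
            sxy += (xObs[i] - mx) * (y[i] - my);
        }
        var bhat = sxy / sxx; // OLS slope (attenuated when errorSD > 0)
        var sse = 0;
        for (var i = 0; i < n; i++) {
            var resid = y[i] - my - bhat * (xObs[i] - mx);
            sse += resid * resid;
        }
        var se = Math.sqrt(sse / (n - 2) / sxx);
        if (bhat / se > 1.645) rejections++; // one-sided test at alpha = 0.05
    }
    return rejections / nSims;
}

// False-positive rate is roughly 0.05 with or without measurement error:
console.log(rejectionRate(0, 0, 5000, 100), rejectionRate(0, 1, 5000, 100));
// Power falls once the same measurement error attenuates a genuine effect:
console.log(rejectionRate(0.3, 0, 5000, 100), rejectionRate(0.3, 1, 5000, 100));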

Low power and low priors make it less likely that the author will detect an effect. That’s where the author’s intuition is right. Given the critic’s concerns about the prior and/or the power, we’re surprised that the author obtained a positive result. But seeing a positive result doesn’t negate these concerns. Rather, these concerns mean that conditional upon detecting an effect, the result is more likely to be a false positive. Presumably, we don’t want to reward researchers for being lucky and obtaining surprising estimates. We want to publish studies with reliable estimates (i.e., our posteriors are high) or those that meaningfully shift our beliefs (i.e., our posteriors are far from our priors). When the prior beliefs and/or power are really low, then our posterior beliefs should be low as well.

Discussion

The work-against-me defense is rarely if ever persuasive. In cases where we care about the substantive magnitude of the effect of interest, we typically care about biases in either direction. And in cases where we don’t care about the substantive magnitude but we just want to know whether an effect or mechanism exists, low power and/or a low prior make it more likely that a reported result is a false positive.

Loken and Gelman (2017) conduct simulations that make the same point made here regarding attenuation bias. What I call the work-against-me defense, they refer to as “the fallacy of assuming that that which does not kill statistical significance makes it stronger” (p. 584).  However, I believe it’s illustrative to see the same points made using only Bayes’ rule and to see that all forms of the work-against-me defense are unpersuasive, whether in regard to low power, biases toward zero, or low priors.

The problem with the work-against-me defense is likely not intuitive for most of us. Loken and Gelman also use an analogy that is helpful for thinking about why our intuitions can lead us astray. “If you learned that a friend had run a mile in 5 minutes, you would be respectful; if you learned that she had done it while carrying a heavy backpack, you would be awed” (p. 584). So why does the same intuition not work in the context of scientific studies? It has to do with both noise and selective reporting. If you suspected that your friend’s mile time was measured with significant error and that you would have only learned about her performance if her time was 5 minutes or less, learning about the heavy backpack should lead you to put more weight on the possibility that it was measurement error rather than your friend’s running ability that explains this seemingly impressive time. In light of measurement error or noise, the effect of learning about the backpack on your inference about your friend’s running ability is ambiguous at best.

Where does this leave us? Careful researchers should think about biases in either direction, they should think about their priors, and they should think about the power of their study. If there’s little reason to suspect a genuine effect or if the test isn’t likely to distinguish genuine effects from null effects, then there’s little point in conducting the test. Even if the test favors the hypothesis, it will likely be a false positive. And referees and seminar attendees should probably ask these cranky questions even more often, assuming their concerns are well justified.

I suspect a common time to invoke the work-against-me defense is in the planning stages of a research project. An eager, career-oriented researcher might anticipate a number of objections to their study. But they reassure themselves, “But that should work against me, making it all the more impressive if I find the result I’m hoping for. I might as well run the test on the off chance that I get a positive result.” Bayes’ rule tells us why this is a bad idea. In general, we probably shouldn’t be running tests that we know we’d only report if they were statistically significant. And the credibility of our results is compromised when our prior or power is low.

For all of these reasons, let’s dispense with the work-against-me defense.

References

Ioannidis, John P.A. 2005. Why Most Published Research Findings Are False. PLOS Medicine 2(8):696-701.

Loken, Eric and Andrew Gelman. 2017. Measurement Error and the Replication Crisis: The Assumption that Measurement Error Always Reduces Effect Sizes is False. Science 355(6325):584-585.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science 22(11):1359-1366.


Utilizing Javascript in Qualtrics for Survey Experimental Designs

The use of experiments in political science has undeniably increased over the last twenty years (Druckman et al., 2006). The survey experiment is an especially frequent part of this rise in experimental research. Survey experiments are an effective method for exploiting random assignment to estimate causal effects while keeping research costs low and data collection fast (Mullinix et al., 2015). Moreover, survey experiments are accessible to many: students and faculty with varying degrees of resources can often use convenience samples to assess experimental treatment effects without great cost.

Yet while survey experiments, and in particular convenience samples such as Amazon.com’s Mechanical Turk (MTurk), can make causal inference cheap and fast, the hurdles associated with designing and implementing the actual experiment can sometimes stymie researchers. Accomplishing basic randomization is not always easy within standard survey software platforms, and more complex randomization can be even more difficult (or impossible). In particular, complex schemes such as assigning multiple options from a larger set of options (n choose p) and block randomization are not easy to implement using off-the-shelf survey software.

Moreover, standard survey practices such as using a “lookup table” to present different information to respondents based on their earlier answers can be difficult to implement. Yet practices such as blocked randomization are standard in experimental research more broadly and can help researchers improve statistical efficiency (e.g., Horiuchi, Imai, and Taniguchi, 2007; Imai, King, and Stuart, 2008; Moore, 2012). Ensuring that all researchers conducting survey experiments have access to these practices is therefore important if the discipline is to produce better experimental results.

In this short post, I walk through several examples of how survey researchers can use Javascript to accomplish commonly needed survey tasks without advanced knowledge of the language. Much of this more complex functionality can be accomplished via Javascript embedded into the survey questions in Qualtrics, a common survey software platform. Qualtrics is an industry standard for running online public opinion, marketing, and experimental survey research. Researchers — in academia and elsewhere — often rely on the platform to host their surveys because their institutions or companies hold site licenses, eliminating the cost to individual researchers. The platform makes question layout and basic experimental designs easy to implement via a WYSIWYG user interface.

The first of these is simple randomization: picking one option out of many, which can be time-consuming and difficult to implement using the built-in Qualtrics randomization functionality when the potential options are very numerous. Second, this method can be extended to accomplish n choose p randomization — that is, choosing a set number of options from a larger set of many options — which is not possible via built-in Qualtrics randomization features. Third, I show how to accomplish block randomization among blocks created by respondent-entered characteristics. Finally, I demonstrate a technique useful for survey researchers in both experimental and non-experimental settings: using a lookup table to present certain information based on a unique identifier, respondents’ panel characteristics, or respondents’ answers.

Applications of Javascript in Qualtrics

Randomization by picking one option from many options

Basic randomization can easily be accomplished in Qualtrics via the “Survey Flow” tool of any survey project, which has a built-in “Randomizer” element that presents a researcher-specified number of its elements to each respondent. Using embedded data elements, the researcher can then randomly assign a respondent to one of n treatment groups. If the researcher would like complete randomization (e.g., with smaller sample sizes), they should check the “evenly present elements” option (Figure 1).

Figure 1: Basic Randomization via Survey Flow

This is a simple and easily implemented solution for the researcher, yet it becomes much more difficult as the number of experimental treatments (n) increases. For instance, a researcher might want to present many different campaign ads to respondents, or vary a treatment along multiple dimensions. When the number of treatment options is large, using the built-in Survey Flow randomizer to add options becomes tedious and prone to human error. In contrast, the same functionality can be easily accomplished by embedding Javascript in the question text of the survey prior to when the experimental treatments are to appear. Simply put, the researcher pastes the full list of n treatment options into a Javascript array, the array is randomly shuffled using a Fisher-Yates shuffle, and the first element is assigned as the treatment condition. The Javascript code for this simple randomization is below, and is also available publicly on GitHub: “Randomize QT pickone.js.”

Qualtrics.SurveyEngine.addOnload(function(){
    function shuffle(array){
        var counter = array.length,
            temp, index;
        while (counter > 0){
            index = Math.floor(Math.random() * counter);
            counter = counter-1;
            temp = array[counter];
            array[counter] = array[index];
            array[index] = temp;
        }
        return array;
    }

    var myArray=["treatment1", "treatment2", "treatment3",
        "treatment4", "treatment5", "treatment6", "treatment7",
        "treatment8", "treatment9", "treatment10", "treatment11",
        "treatment12", "treatment13", "treatment14", "control"];

    shuffle(myArray);
    Qualtrics.SurveyEngine.setEmbeddedData("treatment",myArray[0]);
});
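Once the embedded data field is set, the assigned condition can be referenced later in the survey, either through Qualtrics’ piped-text syntax (${e://Field/treatment}) in question text or display logic, or in the Javascript of a later question. A hypothetical example of the latter is below (the field name “treatment” comes from the code above; the control-group behavior is purely illustrative):

Qualtrics.SurveyEngine.addOnload(function(){
    // read the condition assigned by the randomization code above
    var treatment = Qualtrics.SurveyEngine.getEmbeddedData("treatment");
    // for example, hide this question for respondents in the control condition
    if (treatment === "control") {
        this.getQuestionContainer().style.display = "none";
    }
});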

Randomization by picking more than one option from many options

Sometimes, a researcher might wish to randomize each respondent into more than one condition — for instance, when having respondents view multiple articles to ascertain the effects of media on political preferences. In this case, the researcher would want to use an n choose p technique to assign possible options to a respondent. The Javascript used in the first example above can be easily adapted to serve this purpose. In this instance, I use the hypothetical example of a researcher wishing to assign a respondent five media articles from a list of fifteen. The Javascript code for this is below, and is also available on GitHub: “Randomize QT pickmany.js.”

Qualtrics.SurveyEngine.addOnload(function(){
    function shuffle(array){
        var counter = array.length,
            temp, index;
        while (counter > 0){
            index = Math.floor(Math.random() * counter);
            counter = counter-1;
            temp = array[counter];
            array[counter] = array[index];
            array[index] = temp;
        }
        return array;
    }
    var myArray=["treatment1", "treatment2", "treatment3",
        "treatment4", "treatment5", "treatment6", "treatment7",
        "treatment8", "treatment9", "treatment10", "treatment11",
        "treatment12", "treatment13", "treatment14", "treatment15"];
    
    shuffle(myArray);
Qualtrics.SurveyEngine.setEmbeddedData("first_article",myArray[0]);
    Qualtrics.SurveyEngine.setEmbeddedData("second_article",myArray[1]);
    Qualtrics.SurveyEngine.setEmbeddedData("third_article",myArray[2]);
    Qualtrics.SurveyEngine.setEmbeddedData("fourth_article",myArray[3]);
    Qualtrics.SurveyEngine.setEmbeddedData("fifth_article",myArray[4]);

});

Block Randomization

This Javascript can also be adapted and combined with a web-hosted Shiny app to accomplish block randomization, a useful practice for increasing statistical efficiency in experiments. For instance, a researcher might want to block-randomize a treatment condition within different categories of respondents — usually a characteristic that the researcher believes will induce variation in the outcome. To block-randomize within a respondent characteristic, the researcher simply needs to create a question measuring that characteristic, and then randomize within values of that characteristic. For a limited number of characteristics or a small number of experimental conditions, this is easily accomplished in the Qualtrics Survey Flow by branching on respondent characteristics and completely randomizing within each branch.

However, for a large number of conditions or background characteristics, adding these elements to the Survey Flow becomes tedious. A combination of Javascript and a Shiny app hosted (for free) on shinyapps.io can make this a little easier. The Javascript below calls my Shiny app to randomize among a (user-specified) number of experimental conditions within each block and pulls that assignment back into Qualtrics, balancing assignment across conditions within each category of respondents.

The R code to set up this Shiny app is available on GitHub under shinyapp/app.R and can be forked by users who wish to use their own url for randomization. Users hoping to use my Shiny app url to randomize in their own surveys should make sure to create a unique surveyid in the Javascript. Users may also need to reset the treatment assignment randomization after testing. To do so, simply use a web browser to go to the following url, replacing “[surveyid]” with the surveyid for which you want to reset randomization: https://jdbk.shinyapps.io/blockrandomize/?surveyid=[surveyid]&reset=1

In this case, I wanted to block randomize the treatment used in the first example above by respondents’ eye color, which I asked about in my Qualtrics survey (Q56). Javascript code that can be adapted for this purpose is below, and is also available on GitHub: “Randomize QT byblock.js.”

Qualtrics.SurveyEngine.addOnload(function(){
    // replace the QID below with the ID of variable you want to block on:
    var blockvar = "${q://QID56/ChoiceGroup/SelectedChoices}";
    
    // replace with your name + a unique name for your survey:
    var surveyid = "jdbk-sample_block_randomize";
    
    // replace with number of experimental conditions to randomize among:
    var nconditions = "15";
    var myArray=["treatment1", "treatment2", "treatment3",
     "treatment4", "treatment5", "treatment6", "treatment7",
     "treatment8", "treatment9", "treatment10", "treatment11",
     "treatment12", "treatment13", "treatment14", "control"];

    let xmlHttp = new XMLHttpRequest();
    // the url below hits the customizable Shiny app with the arguments set above
    xmlHttp.open('GET', 'https://jdbk.shinyapps.io/blockrandomize/?surveyid=' +
        surveyid + '&blockvalue=' + blockvar + '&nconditions=' + nconditions, false);
    xmlHttp.send(null); // synchronous request: waits for the randomizer's response
    var condition_assigned = xmlHttp.responseText; // index of the assigned condition
    Qualtrics.SurveyEngine.setEmbeddedData("treatment", myArray[condition_assigned]);
});

Lookup tables

Finally, researchers sometimes wish to look up information about respondents based on previous panel information (for instance, information that corresponds to an ID number). Or perhaps a researcher wants to present certain information to respondents based on their location, drawing on saved information about their representatives in that location. Doing so requires a lookup table — that is, a table of information that can be queried with a key. This is common in data analysis but less common (though still useful) in survey research. It can, however, be easily implemented in Qualtrics using a custom Javascript function and a user-inputted set of vectors corresponding to the matched information. Sample code that accomplishes this is below and also on GitHub: “Lookup QT.js.”

Qualtrics.SurveyEngine.addOnload(function()
{
var ID_block_table = {
  ID: ['1', '2', '3', '4', '5'],
  block: ['a', 'b', 'c', 'd', 'e']
};
function getLookupTableByID(mytable, IDfield, ID, returnfield) {
  // find the position of ID in the IDfield vector, then return the
  // corresponding entry from the returnfield vector
  var matchindex = null;
  var matchreturn = null;
  try {
    matchindex = mytable[IDfield].indexOf(ID);
    matchreturn = mytable[returnfield][matchindex];
  } catch (ex) {
    console.log(ex);
  }
  return matchreturn;
}

var MID = Qualtrics.SurveyEngine.getEmbeddedData("MID");

var blockmatch = getLookupTableByID(ID_block_table, "ID", MID, "block");
Qualtrics.SurveyEngine.setEmbeddedData('block', blockmatch);
});

References

Druckman, James N, Donald P Green, James H Kuklinski, and Arthur Lupia. 2006. “The Growth and Development of Experimental Research in Political Science.” American Political Science Review 100(4): 627–635.

Horiuchi, Yusaku, Kosuke Imai, and Naoko Taniguchi. 2007. “Designing and Analyzing Randomized Experiments: Application to a Japanese Election Survey Experiment.” American Journal of Political Science 51(3): 669–687.

Imai, Kosuke, Gary King, and Elizabeth A. Stuart. 2008. “Misunderstandings Between Experimentalists and Observationalists about Causal Inference.” Journal of the Royal Statistical Society: Series A 171(2): 481–502.

Moore, Ryan T. 2012. “Multivariate Continuous Blocking to Improve Political Science Experiments.” Political Analysis 20(4): 460–479.

Mullinix, Kevin J, Thomas J Leeper, James N Druckman, and Jeremy Freese. 2015. “The Generalizability of Survey Experiments.” Journal of Experimental Political Science 2(2): 109–138.


The 2019 Asian Political Methodology Meeting

[This post was contributed by Kentaro Fukumoto, Professor of Political Science at Gakushuin University.]

We held the joint conference of the 6th Asian Political Methodology Meeting and the second annual meeting of the Japanese Society for Quantitative Political Science at Doshisha University in Kyoto, Japan, on January 5 and 6, 2019. It was a small, intensive conference focusing on innovative quantitative methods and their applications. Although the conference seeks to promote the advancement of quantitative social science research in Asia, all regions of the world were represented.

The program is at:

https://www.cambridge.org/core/membership/spm/conferences/asian-political-methodology-meeting/apmm-2019-program

where you can download papers. Presented posters are available at:

https://www.cambridge.org/core/membership/services/aop-file-manager/file/5c195cd72b51be8c6cf4ec46/APMM-2019-Senior-Posters.pdf

and

https://www.cambridge.org/core/membership/services/aop-file-manager/file/5c195cb62b51be8c6cf4ec45/APMM-2019-Junior-Posters.pdf

We called for papers from July to September and received 56 applications for paper (oral) presentations and 40 for posters from 19 countries. The program was composed of a keynote address, 14 panels (29 papers), and two poster sessions (42 posters). We had 117 registrants from 12 countries and ended up with 105 attendees, 68 of them visitors from abroad. Prof. Robert J. Franzese Jr. (University of Michigan) delivered the keynote address, titled “21st Century Political Methodology: Advances in All Modes of Empirical Analysis in Political Science.” For the first time, the conference hosted papers from graduate students and post-docs.

This year we established the Best Poster Award, sponsored by the Japanese Journal of Political Science and the Society for Political Methodology (SPM). The winners, Soichiro Yamauchi (Harvard University) and Naijia Liu (Princeton University), receive $200 worth of paperback books from Cambridge University Press and the SPM invites them to its annual summer meeting.

This conference is sponsored by Doshisha University, Egusa Foundation for International Cooperation in the Social Sciences, Gakushuin University, the Kajima Foundation, Princeton University (Program for the Quantitative and Analytical Political Science), and SPM. The Program Committee is composed of Kentaro Fukumoto (Gakushuin University, Japan, committee chair), Fang-Yi Chiou (Academia Sinica, Taiwan), Benjamin Goldsmith (Australian National University, Australia), Takeshi Iida (Doshisha University, Japan), Kosuke Imai (Harvard University, U.S.A.), Koji Kagotani (Osaka University of Economics, Japan), Xun Pang (Tsinghua University, China), and Jong Hee Park (Seoul National University, Korea). We plan to have the next meeting in Hong Kong in 2020, which will be hosted by Jiangnan Zhu (University of Hong Kong).


Corrigendum to “Lowering the Threshold of Statistical Significance to p < 0.005 to Encourage Enriched Theories of Politics” and “Questions and Answers: Reproducibility and a Stricter Threshold for Statistical Significance”

Although The Political Methodologist is a newsletter and blog, not a peer-reviewed publication, I still think it’s important for us to recognize and correct substantively important errors.  In this case, I’m sad to report such errors in two things I wrote for TPM. The error is the same in both cases.

In “Lowering the Threshold of Statistical Significance to p < 0.005 to Encourage Enriched Theories of Politics,” I claimed that:

When K-many statistically independent tests are performed on pre-specified hypotheses that must be jointly confirmed in order to support a theory, the chance of simultaneously rejecting them all by chance is α^K where p < α is the critical condition for statistical significance in an individual test. As K increases, the α value for each individual study can fall and the overall power of the study often (though not always) increases.

This argument is offered to support the conclusion that “moving the threshold for statistical significance from α = 0.05 to α = 0.005 would benefit political science if we adapt to this reform by developing richer, more robust theories that admit multiple predictions.”

Similarly, in “Questions and Answers: Reproducibility and a Stricter Threshold for Statistical Significance,” I claimed that:

Another measure to lower Type I error (and the one that I discuss in my article in The Political Methodologist) is to pre-specify a larger number of different hypotheses from a theory and to jointly test these hypotheses. Because the probability of simultaneously confirming multiple disparate predictions by chance is (almost always) lower than the probability of singly confirming one of them, the size of each individual test can be larger than the overall size of the test, allowing for the possibility that the overall test is substantially more powerful at a given size.

This reasoning, which is similar to reasoning offered in Esarey and Sumner (2018b), is incorrect; it would only be true when all predicted parameters were equal to zero. When the alternative hypothesis consists of multiple directional predictions for parameters, for example βi > 0 for i = 1, …, K, separate t-tests rejecting each individual null hypothesis (βi ≤ 0) with size α will jointly reject all the null hypotheses at most α proportion of the time. The key insight is that the joint null hypothesis space includes the possibility that some βi parameters match the predictions while others do not; if (for example) β1 = 0 and all the other βi are very large, the probability of falsely rejecting the joint null hypothesis is the α for the test of β1. As we note in Esarey and Sumner (2018a), this is discussed and proved in Silvapulle and Sen (2005, Section 5.3), especially in Proposition 5.3.1, and in Casella and Berger (2002, Sections 8.2.3 and 8.3.3). Silvapulle and Sen cite Lehmann (1952); Berger (1982); Cohen, Gatsonis and Marden (1983); and Berger (1997) (among others) as sources for this argument. Associated calculations (such as that in Figure 4 of “Lowering the Threshold of Statistical Significance to p < 0.005 to Encourage Enriched Theories of Politics”) are based on the same error and are therefore incorrect.
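To make the size calculation explicit in the simplest case, consider K = 2 statistically independent directional tests, each run at size α with one-sided critical value c_α (this illustration uses purely hypothetical parameter values). If β1 = 0 while β2 is so large that its test rejects with probability near one, then

$$\Pr(\text{reject both nulls}) = \Pr(t_1 > c_\alpha)\Pr(t_2 > c_\alpha) \approx \alpha \times 1 = \alpha,$$

not α². The size of the joint test is therefore still α, which is why the individual tests cannot be run at a larger size without inflating the overall Type I error rate.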

The upshot is that my argument for making additional theoretical predictions in order to facilitate lowering the threshold for statistical significance to α = 0.005 is based on faulty reasoning and is therefore incorrect.

I plan to post this correction as an addendum to both of the print editions featuring these articles.

References

Berger, Roger L. 1982. “Multiparameter Hypothesis Testing and Acceptance Sampling.” Technometrics 24(4):295–300.

Berger, Roger L. 1997. Likelihood ratio tests and intersection-union tests. In Advances in statistical decision theory and applications, ed. Subramanian Panchapakesan and Narayanaswamy Balakrishnan. Boston: Birkhäuser, pp. 225–237.

Casella, George and Roger L. Berger. 2002. Statistical Inference, Second Edition. Belmont, CA: Brooks/Cole.

Cohen, Arthur, Constantine Gatsonis and John I. Marden. 1983. “Hypothesis testing for marginal probabilities in a 2 x 2 x 2 contingency table with conditional independence.” Journal of the American Statistical Association 78(384):920–929.

Esarey, Justin and Jane Lawrence Sumner. 2018a. “Corrigendum to Marginal Effects in Interaction Models: Determining and Controlling the False Positive Rate.” Online. URL: http://justinesarey.com/interaction-overconfidence-corrigendum.pdf.

Esarey, Justin and Jane Lawrence Sumner. 2018b. “Marginal Effects in Interaction Models: Determining and Controlling the False Positive Rate.” Comparative Political Studies 51(9):1144–1176. DOI: https://doi.org/10.1177/0010414017730080.

Lehmann, Erich L. 1952. “Testing multiparameter hypotheses.” The Annals of Mathematical Statistics pp. 541–552.

Silvapulle, Mervyn J. and Pranab K. Sen. 2005. Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Hoboken, NJ: Wiley.


Papers Written by Women Authors Are Cited Less Frequently, but the Etiology of this Finding is Complex

Justin Esarey, Wake Forest University
Kristin Bryant, Rice University

Synopsis

A recent symposium in Political Analysis, anchored around Dion, Sumner and Mitchell (2018), discusses their finding that articles authored by women are more likely to cite at least one paper authored by women. Our contribution to this symposium (Esarey and Bryant, 2018) noted that articles in the Dion, Sumner, and Mitchell (2018) data set with at least one female author are cited no more or less often than male-authored articles once we control for the publishing journal and the number of authors. In this paper, we present additional findings that place the results of our original paper into a broader context. This context is important to fully understand how scholarship by women is utilized by the discipline, how scholars’ careers are impacted as a result of this utilization, and how we might achieve greater gender parity in the field.

In the unadjusted data set, articles with at least one woman author are in fact cited fewer times on average. It is plausible that this citation gap does represent a substantively meaningful barrier to the advancement of women in the discipline. As we reported in Political Analysis, papers with women authors are no more or less likely to be cited once the number of authors and the publishing journal are controlled for via linear regression. However, simply controlling for author count is insufficient to eliminate the gender disparity in citations: controlling for the publishing journal is crucial. An implication is that women may be systematically disadvantaged in the field, but that this disadvantage is not a function of discrimination against women when articles are chosen to be cited. Instead, consistent with the findings of Teele and Thelen (2017), we find that articles in the most-cited journals of the discipline are less likely to have women authors. The etiology of that relationship (and the citation gender gap that it creates among political scientists) is difficult to unravel.

Full Text


Replication File

A replication file for this paper is available at https://doi.org/10.7910/DVN/XC76G3.


International Methods Colloquium: 2018-2019 Schedule!

On behalf of the advisory board (Michelle Dion, Cassy Dorff, Jeff Harden, Dustin Tingley, and Chris Zorn), I am pleased to announce the schedule of International Methods Colloquium series talks for the 2018-2019 academic year!

The International Methods Colloquium (IMC) is a weekly seminar series of methodology-related talks and roundtable discussions focusing on political methodology; the series is supported by Wake Forest University and was previously supported by Rice University and a grant from the National Science Foundation. The IMC is free to attend from anywhere around the world using a PC or Mac, a broadband internet connection, and our free software. You can find out more about the IMC at our website:

http://www.methods-colloquium.com/

where you can join a talk in progress using the “Watch Now!” link. You can also watch archived talks from previous IMC seasons at this site. Registration in advance for a talk is encouraged, but not required.

Note that all talks begin at 12:00 p.m. Eastern Time and last precisely one hour.

Here is our schedule of presenters (and a link to our Google Calendar):

Fall Semester
  1. Oct 12: Matthew Blackwell, Harvard [register to attend]
  2. Oct 19: Masha Krupenkin, Stanford [register to attend]
  3. Nov 2: Roundtable on Gender, Citations, and the Methodology Community with Michelle Dion (McMaster), Sara Mitchell (Iowa), Dave Peterson (Iowa State), and Barbara Walter (UCSD) [register to attend]
  4. Nov 9: Luke Keele, University of Pennsylvania [register to attend]
  5. Nov 16: Kevin Munger, Princeton/Penn State [register to attend]
  6. Nov 30: Pablo Barbera, London School of Economics [register to attend]
Spring Semester
  1. Feb 1: Michelle Torres, Rice [register to attend]
  2. Feb 8: Marcel Neunhoeffer, Mannheim [register to attend]
  3. Feb 15: Winston Chou, Princeton [register to attend]
  4. Feb 22: Erin Rossiter, WUSTL [register to attend]
  5. Mar 1: Matthew Tyler/Christian Fong, Stanford [register to attend]
  6. Mar 8: Rob Carroll, Florida State [register to attend]
  7. March 29: Carlos Carvalho, University of Texas Statistics [register to attend]

Additional information for each talk (including a title and a link to a relevant paper) will be released closer to its date.

Please contact me if you need any more information; I hope to see many of you there!


Using Sequence Analysis to Understand Career Progression: An Application to the UK House of Commons

Matia Vannoni, IGIER, Bocconi University
Peter John, Department of Political Economy, King’s College London

Abstract: We argue that sequence analysis, mainly used in sociology, may be effectively deployed to investigate political careers inside legislatures. Career progression is a classic topic in political science, but political scientists have mainly examined access to legislatures. Although data reduction methods, for instance, can provide insight, we argue that sequence analysis can be used to understand better the career patterns inside parliaments. In this paper, we explain the method. Then we show how it can describe steps in political careers and map different patterns of advancement. We apply sequence analysis to a case study of MPs in the UK House of Commons from 1997 to 2015. We describe the variety of career paths and carry out regression analysis on the determinants of MP career progression.

Full Article

 

Online Appendix

 

Replication File

Replication files for this paper are located at: https://doi.org/10.7910/DVN/I8YHPT.
