IMC: Roundtable on Preregistration of Research Designs on Friday, 2/5 at 12:00 eastern

This Friday, 2/5 at 12:00 noon Eastern, the International Methods Colloquium will host a roundtable discussion on the movement to encourage preregistration of research designs. Our scheduled roundtable participants are:

  1. Macartan Humphreys of Columbia University
  2. James Monogan of the University of Georgia
  3. Arthur Lupia of the University of Michigan
  4. Ryan T. Moore of American University

To tune in to the roundtable and participate in the discussion after the talk, visit the IMC website and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

The IMC uses GoToWebinar, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The roundtable will last for a total of one hour.

Posted in Uncategorized | Leave a comment

2015 TPM Annual Most Viewed Post Award Winner: Nicholas Eubank

On behalf of the editorial team at The Political Methodologist, I am proud to announce the 2015 winner of the Annual Most Viewed Post Award: Nicholas Eubank of Stanford University! Nick won with his very informative post “A Decade of Replications: Lessons from the Quarterly Journal of Political Science.” This award entitles the recipient to a line on his or her curriculum vitae and one (1) high-five from the incumbent TPM editor (to be collected at the next annual meeting of the Society for Political Methodology).[1]

The award is determined by examining the total accumulated page views for any piece published between December 1, 2014 and December 31, 2015; pieces published in December are examined twice to give them a fair chance to garner page views. The page views for December 2014 and calendar year 2015 are shown below; orange hash marks next to the post indicate that it was published during the time period (and thus eligible to receive the award).

[Figures: page-view statistics for December 2014 and calendar year 2015]

Nick’s contribution was viewed 2,758 times during the eligible time period, making him the clear winner for this year’s award. Congratulations, Nick!

Our runner-up is Brendan Nyhan and his post “A Checklist Manifesto for Peer Review,” which was viewed 1,421 times during the eligibility period. However, because Brendan’s piece was published in December 2015, he’ll be eligible for consideration for next year’s award as well!

In related news, Thomas Leeper‘s post “Making High-Resolution Graphics for Academic Publishing” continues to dominate our viewership statistics; this post also won the 2014 TPM Most-Viewed Post Award. Although we don’t have a formal award for him, I’m happy to recognize that Thomas’s post has had a lasting impact among the readership and pleased to congratulate him for that achievement.

I am also happy to report that The Political Methodologist continues to expand its readership. Our 2015 statistics are here:


This represents a considerable improvement over last year’s numbers, which gave us 43,924 views and 29,562 visitors. I’m especially happy to report the substantial increase in unique visitors to our site: over 8,000 new unique viewers in 2015 compared to 2014! Our success is entirely attributable to the excellent quality of the contributions published in TPM. So, thank you contributors!


[1] A low-five may be substituted upon request.

Posted in Editorial Message, The Discipline | Leave a comment

Call for Paper Proposals: Visions in Methodology 2016 at University of California, Davis

[Ed. note: this post is taken from a message to the PolMeth listserv by Amber E. Boydstun, Assistant Professor at UC Davis. The 2016 VIM conference is hosted by UC Davis and UC Merced.]

The 2016 Visions in Methodology (VIM) Program Committee is now accepting
proposals for papers for the VIM 2016 Workshop to be held at the University
of California, Davis from May 16 to 18, 2016. We invite submissions from
female graduate students and faculty that address measurement, causal
inference, and the application of advanced statistical methods to
substantive research questions.

Participants are expected to arrive at UC Davis by 5:00 pm on May 16. The
workshop consists of two days of research presentations and opportunities
for networking and mentoring and concludes with a dinner on May 18.
Participants are welcome to depart on May 19 or to take part in an optional
trip to Napa Valley on May 19. VIM offers financial support for travel,
lodging, and food for selected participants for activities from May 16 to
May 18.

To apply, please send the title and description of your research project
(100–150 words) along with a curriculum vitae. The application deadline is January 31,
2016. Female graduate students and faculty at all ranks are encouraged to
apply, although preference will be given to those at earlier stages of
their careers.

The Visions in Methodology (VIM) workshop is part of a broad effort to
support women in the field of political methodology. VIM provides
opportunities for scholarship, networking, and mentoring for women in the
political methodology community. In addition to providing a forum to
present research, VIM aims to connect women in a subfield of political
science where they are underrepresented.
Program Co-Chairs
Amber Boydstun (UCD); Courtenay Conrad, Emily Ritter (UCM)

Program Committee
Cheryl Boudreau, Adrienne Hosek, Heather McKibben, Lauren Peritz (UCD);
Jessica Trounstine (UCM)

Posted in Call for Papers / Conference, The Discipline | Leave a comment

Acceptance rates and the aesthetics of peer review

Based on the contributions to The Political Methodologist’s special issue on peer review, it seems that many political scientists are not happy with the kind of feedback they receive from the peer review process. A recurring theme is that reviewers focus less on the scientific merits of a piece–viz., what can be learned from the evidence offered–and more on whether the piece is to the reviewer’s taste in terms of questions asked and methodologies employed. While I agree that this feedback is unhelpful and undesirable, I am also concerned that it is a fundamental feature of the way our peer review system works. More specifically, I believe that a system of journals with prestige tiers enforced by extreme selectivity creates a review system where scientific soundness is a necessary but far from sufficient criterion for publication, meaning that fundamentally aesthetic and sociological factors ultimately determine what gets published and inform the content of our reviews.

As Brendan Nyhan says, “authors frequently despair not just about timeliness of the reviews they receive but their focus.” Nyhan seeks to improve the focus of reviews by offering a checklist of questions that reviewers should answer as a part of their reviews (omitting those questions that, presumably, they should not seek to answer). These questions revolve around ensuring that evidence offered is consistent with conclusions (“Does the author control for or condition on variables that could be affected by the treatment of interest?”) and that statistical inferences are unlikely to be spurious (“Are any subgroup analyses adequately powered and clearly motivated by theory rather than data mining?”).

The other contributors express opinions in sync with Nyhan’s point of view. For example, Tom Pepinsky says:

“I strive to be indifferent to concerns of the type ‘if this manuscript is published, then people will work on this topic or adopt this methodology, even if I think it is boring or misleading?’ Instead, I try to focus on questions like ‘is this manuscript accomplishing what it sets out to accomplish?’ and ‘are there ways my comments can make it better?’ My goal is to judge the manuscript on its own terms.”

Relatedly, Sara Mitchell argues that reviewers should focus on “criticisms internal to the project rather than moving to a purely external critique.” This is explored more fully in the piece by Krupnikov and Levine, where they argue that simply writing “external validity concern!” next to any laboratory experiment hardly addresses whether the article’s evidence actually answers the questions offered; in a way, the attitude they criticize comes uncomfortably close to arguing that any question that can be answered using laboratory experiments doesn’t deserve to be asked, ipso facto.

My own perspective on what a peer review ought to be has changed during my career. Like Tom Pepinsky, I once thought my job was to “protect” the discipline from “bad research” (whatever that means). Now, I believe that a peer review ought to answer just one question: What can we learn from this article? [1]

Specifically, I think that every sentence in a review ought to be:

  1. a factual statement about what the author believes can be learned from his/her research, or
  2. a factual statement of what the reviewer thinks actually can be learned from the author’s research, or
  3. an argument about why something in particular can (or cannot) be learned from the author’s research, supported by evidence.

This feedback helps an editor learn the marginal contribution that the submitted paper makes to our understanding, informing his/her judgment for publication. It also helps the author understand what s/he is communicating in the piece and whether claims must be trimmed or hedged to ensure congruence with the offered evidence (or more evidence must be offered to support claims that are central to the article).

Things that I think shouldn’t be addressed in a review include:

  1. whether the reviewer thinks the contribution is sufficiently important to be published in the journal
  2. whether the reviewer thinks other questions ought to have been asked and answered
  3. whether the reviewer believes that an alternative methodology would have been able to answer different or better questions
  4. whether the paper comprehensively reviews extant literature on the subject (unless the paper defines itself as a literature review)

In particular, I think that the editor is the person in the most appropriate position to decide whether the contribution is sufficiently important for publication, as that is a part of his/her job; I also think that such a decision should be made (whenever possible) by the editorial staff before reviews are solicited. (Indeed, in another article I offer simulation evidence that this system actually produces better journal content, as evaluated by the overall population of political scientists, compared to a more reviewer-focused decision procedure.) Alternatively, the importance of a publication could be decided (as Danilo Freire alludes) by the discipline at large, as expressed in readership and citation rates, and not by one editor (or a small number of anonymous reviewers); such a system is certainly conceivable in the post-scarcity publication environment created by online publishing.

Of course, as our suite of contributions to TPM make clear, most of us do not receive reviews that are focused narrowly on the issues that I have outlined. Naturally, this is a frustrating experience. I think it is particularly trying to read a review that says something like, “this paper makes a sound scientific contribution to knowledge, but that contribution is simply not important enough to be published in journal X.” It is annoying precisely because the review acknowledges that the paper isn’t flawed, but simply isn’t to the reviewer’s taste. It is the academic equivalent of being told that the reviewer is “just not that into you.” It is a fundamentally unactionable criticism.

Unfortunately, I believe that authors are likely to receive more, not less, of such feedback in the future regardless of what I or anyone else may think. The reason is that journal acceptance rates are so low, and the proportion of manuscripts that make sound contributions to knowledge is so high, that other criteria must necessarily be used to select the papers that will be published from the set of those that could credibly be published.

Consider that in 2014, the American Journal of Political Science accepted only 9.6% of submitted manuscripts and International Studies Quarterly accepted about 14%. The trend is typically downward: at Political Research Quarterly, acceptance rates fell by 1/3 between 2006 and 2011 (to just under 12 percent acceptance in 2011). I speculate that far more than 10-14% of the manuscripts received by AJPS, ISQ, and PRQ were scientifically sound contributions to political science that could have easily been published in those journals–at least, this is what editors tend to write in their rejection letters!

When (let us say, for argument’s sake) 25% of submitted articles are scientifically sound but journal acceptance rates are less than half that value, editors (and, by extension, reviewers) must choose based on criteria other than soundness when selecting articles for publication. It is natural that the slippery and socially-constructed criterion of “importance” in its many guises would come to the fore in such an environment. Did the paper address questions you think are the most “interesting?” Did the paper use what you believe are the most “cutting edge” methodologies? “Interesting” questions and “cutting edge” methodologies are aesthetic judgments, at least in part, and defined relative to the group of people making them. Consequently, I fear that the peer review process must become as much a function of sociology as of science because of the increasingly competitive nature of journal publication. Insofar as I am correct, I would prefer that these aesthetic judgments come from the discipline at large (as embodied in readership rates and citations) and not from two or three anonymous colleagues.
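The arithmetic behind this point can be made explicit with a small sketch. All numbers here are hypothetical, chosen only for illustration, and the function name is mine rather than anything from the source:

```python
def share_of_sound_papers_published(sound_rate, acceptance_rate):
    """Upper bound on the fraction of sound submissions that can be
    accepted, assuming only sound papers are ever accepted."""
    return min(acceptance_rate / sound_rate, 1.0)

# Hypothetical journal: 25% of submissions are sound, 10% are accepted.
bound = share_of_sound_papers_published(0.25, 0.10)
print(f"At most {bound:.0%} of sound submissions can be published.")
# -> At most 40% of sound submissions can be published.
```

Under these assumed numbers, at least 60% of sound submissions must be rejected on grounds other than soundness, which is exactly the structural pressure toward aesthetic criteria described here.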

Still, as long as there are tiers of journal prestige and these tiers are a function of selectivity, I would guess that the power of aesthetic criteria to influence the peer review process has to persist. Indeed, I speculate that the proportion of sound contributions in the submission pool is trending upward because of the intensive focus of many PhD programs on rigorous research design training and the ever-increasing requirements of tenure and promotion committees. At the very least, the number of submissions is going up (from 134 in 2001 to 478 in 2014 at ISQ), so even if quality is stable selectivity must rise if the number of journal pages stays constant. Consequently, I fear that a currently frustrating situation is likely to get worse over time, with articles being selected for publication in the most prominent journals of our discipline on progressively more whimsical criteria.

What can be done? At the least, I think we can recognize that the “tiers” of journal prestige do not necessarily mean what they once did in terms of scientific quality or even interest to a broad community of political scientists and policy makers. Beyond this, I am not sure. Perhaps a system that rewards authors more for citation rates and less for the “prestige” of the publication outlet might help. But undoubtedly such systems would also have unanticipated and undesirable properties, and it remains to be seen whether they would truly improve scholarly satisfaction with the peer review system.


[1] Our snarkier readers may be thinking that this question can be answered in just one word for many papers they review: “nothing.” I cannot exclude that possibility, though it is inconsistent with my own experience as a reviewer. I would say that, if a reviewer believes that nothing can be learned from a paper, I would hope that the reviewer would provide feedback that is lengthy and detailed enough to justify that conclusion.

Posted in Peer Review | Leave a comment

Call for Proposals for the 33rd Annual Meeting of the Society for Political Methodology at Rice University, July 21–23, 2016

The Program Committee invites applications to attend and / or participate in the 33rd Annual Meeting of the Society for Political Methodology. This year’s meeting will be hosted by Rice University’s Department of Political Science in Houston, TX. The conference will include panel presentations of a single project and poster sessions for faculty members and graduate students that will provide ample opportunity for rich interaction and scholarly interchange.

Who should apply:

  • Faculty members who wish to present their research as a paper or a poster
  • Faculty members who wish to serve as a discussant
  • Faculty members who wish to attend the conference without presenting
  • Junior faculty members who wish to be considered for travel funding
  • Graduate students who wish to attend or present a poster (important note: graduate students also require a letter of support from an advisor; see below)

The deadline for applications is Monday, March 28th, 2016. This deadline is firm and will not be extended.

Click here to apply to attend the summer meeting:

What can be proposed: Faculty applicants are invited to propose a poster or a paper; to act as a discussant; or simply to attend the meeting. Graduate students are strongly encouraged to propose a poster presentation—which has a long history of being career-enhancing—but may also apply just to attend. All research topics relevant to political methodology are welcome.

Funding opportunities: Courtesy of the National Science Foundation, we are able to fund about 30 graduate students whose poster proposals are selected as the most promising. All graduate students need to supply a letter from a faculty sponsor. Also through this NSF grant, we are able to fund, as part of a diversity initiative, a limited number of junior faculty and graduate students from historically under-represented groups as well as junior faculty at departments with restricted travel funds. Other faculty are expected to seek support from their home institutions or otherwise use their own support funds to cover their travel and hotel expenses.

Notice to graduate students and their advisors: Each graduate student attendee must have his or her application supported by a faculty recommendation. These recommendations are submitted through a webform and are due on Monday, March 28, 2016. They are typically 2 to 4 paragraphs in length and discuss the student’s poster proposal, methodological training, and any other information that would be relevant for the Program Committee. We strongly encourage students to give their potential recommenders as much advance notice as possible and to inform them of the March 28, 2016 deadline.

Recommendations can be submitted at this link:

Questions or concerns about the host site and program logistics of the meeting can be directed to Justin Esarey.

Questions regarding general procedures, practices, and other matters of the Society or the Conference can be directed to Jeffrey B. Lewis.

Questions regarding the WUSTL application form can be directed to Jacob M. Montgomery.

This year’s Program Committee is: Justin Esarey (Chair, Rice University); Molly Roberts (UC San Diego); Rick Wilson (Rice University); Maya Sen (Harvard University); Paul Kellstedt (Texas A&M); Michelle Dion (McMaster University); Kevin Clarke (University of Rochester); Meg Shannon (UC Boulder); Jeff Lewis (ex-officio, UCLA / Society President).

Posted in Call for Papers / Conference, The Discipline | Leave a comment

Offering (constructive) criticism when reviewing (experimental) research

by: Yanna Krupnikov and Adam Seth Levine

No research manuscript is perfect, and indeed peer reviews can often read like a laundry list of flaws. Some of the flaws are minor and can be easily eliminated by an additional analysis or a descriptive sentence. Other flaws often stand – at least in the mind of the reviewer – as a fatal blow to the manuscript.

Identifying a manuscript’s flaws is part of a reviewer’s job. And reviewers can potentially critique every kind of research design. For instance, they can make sweeping claims that survey responses are contaminated by social desirability motivations, formal models rest on empirically-untested assumptions, “big data” analyses are not theoretically-grounded, observational analyses suffer from omitted variable bias, and so on.

Yet, while the potential for flaws is ever-present, the key for reviewers is to go beyond this potential and instead ascertain whether such flaws actually limit the contribution of the manuscript at hand. And, at the same time, authors need to communicate why we can learn something useful and interesting from their manuscript despite the research design’s potential for flaws.

In this essay we focus on one potential flaw that is often mentioned in reviews of behavioral research, especially research that uses experiments: critiques about external validity based on characteristics of the sample.

In many ways it is self-evident to suggest that the sample (and, by extension, the population that one is sampling from) is a pivotal aspect of behavioral research. Thus it is not surprising that reviewers often raise questions not only about the theory, research design, and method of data analysis, but also the sample itself. Yet critiques of the sample are often stated in terms of potential flaws – that is, they are based on the possibility that a certain sample could affect the conclusions drawn from an experiment rather than stating how the author’s particular sample affects the inferences that we can draw from his or her particular study.

Here we identify a concern with certain types of sample-focused critiques and offer recommendations for a more constructive path forward. Our goals are complementary and twofold: first, to clarify authors’ responsibilities when justifying the use of a particular sample in their work and, second, to offer constructive suggestions for how reviewers should evaluate these samples. Again, while our arguments could apply to all manuscripts containing behavioral research, we pay particular attention to work that uses experiments.

What’s the concern?

Researchers rely on convenience samples for experimental research because it is often the most feasible way to recruit participants (both logistically and financially). Yet, when faced with convenience samples in manuscripts, reviewers may bristle. At the heart of such critiques is often the concern that the sample is too “narrow” (Sears 1986). To argue that a sample is narrow means that the recruited participants are homogenous in a way that differs from other populations to which authors might wish to generalize their results (and in a way that affects how participants respond to the treatments in the study). Although undergraduate students were arguably the first sample to be classified as a “narrow database” (Sears 1986), more recently this label has been applied to other samples, such as university employees, residents of a single town, travelers at a particular airport, and so on.

Concerns regarding the narrowness of a sample typically stem from questions of external validity (Druckman and Kam 2011). External validity refers to whether a “causal relationship holds over variations in persons, settings, treatments and outcomes” (Shadish, Cook and Campbell 2002, 83). If, for example, a scholar observes a result in one study, it is reasonable to wonder whether the same result could be observed in a study that altered the participants or slightly adjusted the experimental context. While the sample is just one of many aspects that reviewers might use when judging the generalizability of an experiment’s results – others might include variations in the setting of the experiment, its timing, and/or the way in which theoretical entities are operationalized – sample considerations have often proved focal.

At times during the review process, the type of sample has become a “heuristic” for evaluating the external validity of a given experiment. Relatively “easy” critiques of the sample – those that dismiss the research simply because they involve a particular convenience sample – have evolved over time. A decade ago such critiques were used to dismiss experiments altogether, as McDermott (2002:334) notes: “External validity…tend[s] to preoccupy critics of experiments. This near obsession…tend[s] to be used to dismiss experiments.” More recently, Druckman and Kam (2011) noted such concerns were especially likely to be directed toward experiments with student samples: “For political scientists who put particular emphasis on generalizability, the use of student participants often constitutes a critical, and according to some reviewers, fatal problem for experimental studies.” Even more recently, reviewers lodge this critique against other convenience samples such as those from Amazon’s Mechanical Turk.

Note that, although they are writing almost a decade apart, both McDermott and Druckman and Kam are observing the same underlying phenomenon: reviewers dismissing experimental research simply because it involves a particular sample. The review might argue that the participants (for example, undergraduate students, Mechanical Turk workers, or any other convenience sample) are generally problematic, rather than arguing that they pose a problem for the specific study in the manuscript.

Such general critiques that identify a broad potential problem with using a certain sample can, in some ways, be even more damning than other types of concerns that reviewers might raise. An author could address questions of analytic methods by offering robustness checks. In a well-designed experiment, the author could reason through questions of alternative explanations using manipulation checks and alternative measures. When a review suggests that the core of the problem is that a sample is generally “bad,” however, the reviewer is (indirectly) stating that readers cannot glean much about the research question from the authors’ study and that the reviewer him/herself is unlikely to be convinced by any additional arguments the author could make (save a new experiment on a different sample).

None of the above is to suggest that critiques of samples should not be made during the review process. Rather, we believe that they should adhere to a similar structure as concerns that reviewers might raise about other parts of a manuscript. Just as reviewers evaluate experimental treatments and measures within the context of the authors’ hypotheses and specific experimental design, evaluations of the sample also benefit from being experiment-specific. Rather than asking “is this a ‘good’ or ‘bad’ sample?”, we suggest that reviewers ask a more specific question: “is this a ‘good’ or ‘bad’ sample given the author’s research goals, hypotheses, measures, and experimental treatments?”

A constructive way forward

When reviewing a manuscript that relies on a convenience sample, reviewers sometimes dismiss the results based on the potential narrowness of a sample. Such a dismissal, we argue, is a narrow critique. The narrowness of a sample certainly can threaten the generalizability of the results, but it does not do so unconditionally. Indeed, as Druckman and Kam (2011) note, the narrowness of a sample is limiting if the sample lacks variance on characteristics that affect the way a participant responds to the particular treatments in a given study.

Consider, for example, a study that examines the attitudinal impact of alternative ways of framing health care policies. Suppose the sample is drawn from the undergraduate population at a local university, but the researcher argues (either implicitly or explicitly) that the results can help us understand how the broader electorate might respond to these alternative framings.

In this case, one potential source of narrowness might stem from personal experience. We might (reasonably) assume that undergraduate students are highly likely to have experience interacting with a doctor or a nurse (just like non-undergraduate adults). Yet, they are perhaps far less likely to have experience interacting with health insurance administrators (unlike non-undergraduate adults). When might this difference threaten the generalizability of the claims that the author wishes to make?

The answer depends upon the specifics of the study. If we believe that personal experience with health care providers and insurance administrators does not affect how people respond to the treatments, then we would not have reason to believe that the narrowness of the undergraduate sample would threaten the authors’ ability to generalize the results. If instead we only believe that experience with a doctor or nurse may affect how people respond to the treatments (e.g. perhaps how they comprehend the treatments, the kinds of considerations that come to mind, and so on) then again we would not have reason to believe that the narrowness of the undergraduate sample would threaten the ability to generalize. Lastly, however, if we also believe that experience with insurance administrators affects how people respond to the treatments, then that would be a situation in which the narrowness might limit the generalizability of the results.

What does this mean for reviewers? The general point is that, even if we have reason to believe that the results would differ if a sample were drawn from a different population, this fact does not render the study or its results entirely invalid. Instead, it changes the conclusions we can draw. Returning to the example above, a study in which experience with health insurance administrators affects responses still offers some political implications about health policy messages. But (for example) its scope may be limited to those with very little experience interacting with insurance administrators.

It’s worth noting that in some cases narrowness might be based on more abstract, psychological factors that apply across several experimental contexts. For instance, perhaps reviewers are concerned that undergraduates are narrow because they are both homogeneous and different in their reasoning capacity from several other populations to which authors often wish to generalize. In that case, the most constructive review would explain why these reasoning capacities would affect the manuscript’s conclusions and contribution.

More broadly, reviewers may also consider the researcher’s particular goals. Given that some relationships are otherwise difficult to capture, experimental approaches often offer the best means for identifying a “proof of concept” – that is, whether under theorized conditions a “particular behavior emerges” (McKenzie 2011). These types of “proof of concept” studies may initially be performed only in the laboratory and often with limited samples. Then, once scholars observe some evidence that a relationship exists, more generalizable studies may be carried out. Under these conditions, a reviewer may want to weigh the possibility of publishing a “flawed” study against the possibility of publishing no evidence of a particularly elusive concept.

What does this mean for authors? The main point is that it is the author’s responsibility to clarify why the sample is appropriate for the research question and the degree to which the results may generalize or perhaps be more limited. It is also the author’s responsibility to explicitly note why the result is important even despite the limitations of the sample.

What about Amazon’s Mechanical Turk?

Thus far we have (mostly) avoided mentioning Amazon’s Mechanical Turk (MTurk). We have done so deliberately, as MTurk is an unusual case. On the one hand, MTurk provides a platform for a wide variety of people to participate in tasks such as experimental studies for money. One result is that MTurk typically provides samples that are much more heterogeneous than other convenience samples and are thus less likely to be “narrow” on important theoretical factors (Huff and Tingley 2015). These participants often behave much like people recruited in more traditional ways (Berinsky, Huber and Lenz 2012). On the other hand, MTurk participants are individuals who were somehow motivated to join the platform in the first place and over time (due to the potentially unlimited number of studies they can take) have become professional survey takers (Krupnikov and Levine 2014; Paolacci and Chandler 2014). This latter characteristic in particular suggests that MTurk can produce an unusual set of challenges for both authors and reviewers during the manuscript review process.

Much as we argued that a narrow sample is not in and of itself a reason to advocate for a manuscript’s rejection (though the interaction between the narrowness of the sample and the author’s goals, treatments and conclusions may provide such a reason), so too when it comes to MTurk we believe that this recruitment approach does not provide prima facie evidence to advocate rejection.

When using MTurk samples, it is the author’s responsibility to acknowledge and address any potential narrowness that might stem from this mode of recruitment. It is also the author’s responsibility to design a study that accounts for the fact that MTurkers are professionalized participants (Krupnikov and Levine 2014) and to explain why a particular study is not limited by the characteristics that make MTurk unusual. At the same time, we believe that reviewers should avoid using MTurk as an unconditional heuristic for rejection and instead should always consider the relationship between treatment and sample in the study at hand.

We are not the first to note that reviewers can voice concerns about experiments and/or the samples used in experiments. These types of sample critiques often seem unconditional: no amount of information the author could offer would lead the reviewer to reconsider his or her position on the sample. Put another way, the sample type is used as a heuristic, with little consideration of the specific experimental context in the manuscript.

We are not arguing that reviewers should never critique samples. Rather, our argument is that the fact that researchers chose to recruit a convenience sample from the population of undergraduates at a local university, the population of MTurk workers, and so on is not a justifiable reason on its own for recommending rejection of a paper. Instead, the validity of the sample depends upon the author’s goals, the experimental design, and the interpretation of the results. The use of undergraduate students may have few limitations for one experiment but may prove largely crippling for another. And, echoing Druckman and Kam (2011), even a nationally representative sample is no guarantee of external validity.

The reviewer’s task, then, is to examine how the sample interacts with all the other components of the manuscript. The author’s responsibility, in turn, is to clarify such matters. And both the reviewer and the author should acknowledge that the only way to truly answer questions about generalizability is to continue examining the question in different settings as part of an ongoing research agenda (McKenzie 2011).

Lastly, while we have focused on a common critique of experimental research, this is just one example of a broader phenomenon. All research designs are imperfect in one way or another, and thus the potential for flaws is always present. Constructive reviews should evaluate such flaws in the context of the manuscript at hand, and then decide if the manuscript credibly contributes to our knowledge base. And, similarly, authors are responsible for communicating the value of their manuscript despite any potential flaws stemming from their research design.


Berinsky, A. J., Huber, G. A., and Lenz, G. S. 2012. Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk. Political Analysis 20: 351–68.

Druckman, J. N., and Kam, C. D. 2011. Students as Experimental Participants: A Defense of the ‘Narrow Data Base’. In Handbook of Experimental Political Science. eds. Druckman, Green, Kuklinski, and Lupia, New York: Cambridge University Press.

Huff, C., and Tingley, D. 2015. “Who Are These People?” Evaluating the Demographic Characteristics and Political Preferences of MTurk Survey Respondents. Working Paper.

Krupnikov, Y., and Levine, A. S. 2014. Cross-Sample Comparisons and External Validity. Journal of Experimental Political Science 1: 59–80.

McDermott, R. 2002. Experimental Methodology in Political Science. Political Analysis 10: 325–42.

McKenzie, D. 2011. “A Rant on the External Validity Double Double-Standard.” Development Impact: The World Bank. (Accessed: Dec 10, 2015).

Paolacci, G. and Chandler, J. 2014. Inside the Turk: Understanding Mechanical Turk as a Participant Pool. Current Directions in Psychological Science 23: 184-188.

Sears, D. 1986. College Sophomores in the Laboratory: Influences of a Narrow Data Base on Social Psychology’s View of Human Nature. Journal of Personality and Social Psychology 51: 515–30.

Shadish, W. R., Cook, T. D., and Campbell, D. T. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.




Posted in Peer Review | 1 Comment

An Editor’s Thoughts on the Peer Review Process

[Ed. note: This post is contributed by Sara McLaughlin Mitchell, Professor and Chair of the Department of Political Science at the University of Iowa.]

For academics, the peer review process can be one of the most rewarding and frustrating experiences of our careers. Detailed and careful reviews of our work can significantly improve the quality of our published research and identify new avenues for future research. Negative reviews of our work, while also helpful in terms of identifying weaknesses in our research, can be devastating to our egos and our mental health. My perspectives on peer review have been shaped by twenty years of experience submitting my work to journals and book publishers and by serving as an Associate Editor for two journals, Foreign Policy Analysis and Research & Politics. In this piece, I will 1) discuss the qualities of good reviews, 2) provide advice for how to improve the chances for publication in the peer review process, and 3) discuss some systemic issues that our discipline faces for ensuring high quality peer review.

Let me begin by arguing that we need to train scholars to write quality peer reviews. When I teach upper level graduate seminars, I have students submit a draft of their research paper about one month before the class ends. I then anonymously assign two other students as peer reviewers for each paper and serve as the third reviewer myself. I send students examples of reviews I have written for journals and provide general guidelines about what improves the quality of peer reviews. After I distribute the three peer reviews, students have two weeks to revise their papers and write a memo describing their revisions. Their final research paper grade is based on the quality of their work at each of these stages, including their efforts to review classmates’ research projects.[1]

Writing High Quality Peer Reviews

What qualities do good peer reviews share? My first observation is that tone is essential to helping an author improve their research. If you make statements such as “this was clearly written by a graduate student” or “this paper is not important enough to be published in journal X” or “this person knows nothing about the literature on this topic”, you are not being helpful. These kinds of blanket negative statements can only serve to discourage junior (and senior!) scholars from submitting work to peer reviewed outlets.[2] Thus one should always consider what contributions a paper is making to the discipline and then proceed with ideas for making the final product better.

In addition to crafting reviews with a positive tone, I also recommend that reviewers focus on criticisms internal to the project rather than moving to a purely external critique. For example, suppose an author was writing a paper on the systemic democratic peace. An internal critique might point to other systemic research in international relations that would help improve the author’s theory or identify alternative ways to measure systemic democracy. An external critique, however, might argue that systemic research is not useful for understanding the democratic peace and that the author should abandon this perspective in favor of dyadic analyses. If you find yourself writing reviews where you are telling authors to dramatically change their research question or theoretical perspective, you are not helping them produce publishable research. As an editor, it is much more helpful to have reviews that accept the author’s research goals and then provide suggestions for improvement. Along these lines, it is very common for reviewers to say things like “this person does not know the literature on the democratic peace” and then fail to provide a single citation for research that is missing in the bibliography.[3] If you think an author is not engaging with an important literature for their topic, help the scholar by citing some of that work in your review. If you do not have time to add full citations, even providing authors’ last names and the years of publication can be helpful.

Another common strategy that reviewers take is to ask for additional analyses or robustness checks, something I find very useful as a reader of scholarly work. However, reviewers should identify new analyses or data that are essential for checking the robustness of the particular relationship being tested, rather than worrying about all other control variables in the literature or all alternative statistical estimation techniques for a particular problem. A person reviewing a paper on the systemic democratic peace could reasonably ask for alternative democracy measures or control variables for other major systemic dynamics (e.g. World Wars, hegemonic power). Asking the scholar to develop a new measure for democracy or to test her model against all other major international relations systemic theories is less reasonable. I understand the importance of checking the robustness of empirical relationships, but I also think we can press this too far when we expect an author to conduct dozens of additional models to demonstrate their findings. In fact, authors are anticipating that reviewers will ask for such things and they are preemptively responding by including appendices with additional models. In conversations with my junior colleagues (who love appendices!), I have noted that they are doing a lot of extra work on the front end and getting potentially fewer publications from these materials when they relegate so much of their work to appendices. Had Bruce Russett and John Oneal adopted this strategy, they would have published one paper on the Kantian tripod for peace, rather than multiple papers that focused on different legs of the tripod. I also feel that really long appendices place additional burdens on reviewers who are already paying costs to read a 30+ page paper.[4]

Converting R&Rs to Publications

Once an author receives an invitation from a journal to revise and resubmit (R&R) a research paper, what strategies can they take to improve their chances for successfully converting the R&R to a publication? My first recommendation is to go through each review and the editors’ decision letter and identify each point being raised. I typically move each point into a document that will become the memo describing my revisions and then proceed to work on the revisions. My memos have a general section at the beginning that provides an overview of the major revisions I have undertaken, followed by separate sections for the editors’ letter and each of the reviews. Each point raised by the editors or reviewers is presented, and then I follow that with information about how I revised the paper in light of that comment and the page number where the revised text or results can be found. It is a good idea to identify criticisms that are raised by multiple reviewers because these issues will be especially important to address in your revisions. You should also read the editors’ letter carefully because they often provide ideas about which criticisms are most important to address from their perspective. Additional robustness checks that you have conducted can be included in an appendix that will be submitted with the memo and your revised paper.

As an associate editor, I have observed authors failing at this stage of the peer review process. One mistake I often see is for authors to become defensive in response to the reviewers’ advice. This leads them to argue against each point in their memo rather than to learn constructively from the reviews about how to improve the research. Another mistake is for authors to ignore advice that the editors explicitly provide. The editors are making the final decision on your manuscript, so you cannot afford to alienate them. You should be aware of the journal’s approach to handling manuscripts with R&R decisions. Some journals send the manuscript to the original reviewers plus a new reviewer, while other journals either send it back only to the original reviewers or make an in-house editorial decision. These procedures can dramatically influence your chances for success at the R&R stage. If the paper is sent to a new reviewer, you should expect another R&R decision to be very likely.[5]

Getting a revise and resubmit decision is exciting for an author but also a daunting process when one sees how many revisions might be expected. You have to determine how to strike a balance between defending your ideas and revising your work in response to the criticisms you have received in the peer review process. My observation is that authors who are open to criticism and can learn from reviewers’ suggestions are more successful in converting R&Rs to publications.

Peer Review Issues in our Discipline

Peer review is an essential part of our discipline for ensuring that political science publications are of the highest quality possible. In fact, I would argue that journal publishing, especially in the top journals in our field, is one of the few processes where a scholar’s previous publication record or pedigree is not terribly important. My chances of getting accepted at APSR or AJPS have not changed over the course of my career. However, once I published a book with Cambridge University Press, I had many acquisitions editors asking me about ideas for future book publications. There are clearly many books in our discipline that have important influences on the way we think about political science research questions, but I would contend that journal publications are the ultimate currency for high caliber research given the high degree of difficulty for publishing in the best journals in our discipline.[6]

Having said that, I recognize that there are biases in the journal peer review process. One thing that surprised me in my career was how the baseline probability for publishing varied dramatically across different research areas. I worked in some areas where R&R or conditional acceptance was the norm and in other research areas where almost every piece was rejected.[7] For example, topics that have been very difficult for me to publish journal articles on include international law, international norms, human rights, and maritime conflicts. One of my early articles on the systemic democratic peace (Mitchell, Gates, and Hegre 1999) was published in a good IR journal despite all three reviewers being negative; the editor at the time (an advocate of the democratic peace himself) took a chance on the paper. Papers I have written on maritime conflicts have been rejected at six or more journals before getting a single R&R decision. My work that crosses over into international law also tends to be rejected multiple times because satisfying both political science and international law reviewers can be difficult. Other topics I have written on have experienced smoother sailing through journal review processes. Work on territorial and cross-border river conflicts has been more readily accepted, which is interesting given that maritime issues are also geopolitical in nature. Diversionary conflict and alliance scholars are quite supportive of each other’s work in the review process. Other areas of my research agenda fall in between these extremes. My empirical work on militarized conflict (e.g. the issue approach) or peaceful conflict management (e.g. mediation) can draw either supportive or really tough reviewers, a function I believe of the large number of potential reviewers in these fields. I have seen similar patterns in advising PhD students. Some students who were working in emerging topics like civil wars or terrorism found their work well-received as junior scholars, while others working on topics like foreign direct investment and foreign aid experienced more difficulties in converting their dissertation research into published journal articles.

Smaller and more insulated research communities can be helpful for junior scholars if the junior members are accepted into the group, as the chances for publication can be higher. On the other hand, some research areas have a much lower baseline publication rate. Anything that is interdisciplinary in my experience lowers the probability of success, which is troubling from a diversity perspective given the tendency for women and minority scholars to be drawn to interdisciplinary research. As noted above, I have also observed that certain types of work (e.g. empirical conflict work or research on gender) face more obstacles in the review process because there are a larger number of potential reviewers, which also increases the risks that at least one person will dislike your research. In more insulated communities, the number of potential reviewers is small and they are more likely to agree on what constitutes good research. Junior scholars may not know the baseline probability of success in their research area, thus it is important to talk with senior scholars about their experiences publishing on specific topics. I typically recommend a portfolio strategy with journal publishing, where junior scholars seek to diversify their substantive portfolio, especially if the research community for their dissertation project is not receptive to publishing their research.

I also think that journal editors have a collective responsibility to collect data across research areas and determine if publication rates vary dramatically. We often report on general subfield areas in annual journal reports, but we do not typically break down the data into more fine-grained research communities. The move to having scholars click on specific research areas for reviewing may facilitate the collection of this information. If reviewers’ recommendations for R&R or acceptance vary across research topics, then having this information would assist new journal editors in making editorial decisions. Once we collect this kind of data, we could also see how these intra-community reviewing patterns influence the long term impact of research fields. Are broader communities with lower probabilities of publication success more effective in the long run in terms of garnering citations to the research? We need additional data collection to assess my hypothesis that baseline publication rates vary across substantive areas of our discipline.

We also need to remain vigilant in ensuring representation of women and minority scholars in political science journals. While women constitute about 30% of faculty in our discipline (Mitchell and Hesli 2013), the publication rate by women in top political science journals is closer to 20% of all published authors (Breuning and Sanders 2007). Much of this dynamic is driven by a selection effect process whereby women spend less time on research relative to their male peers and submit fewer papers to top journals (Allen 1998; Link, Swann, and Bozeman 2008; Hesli and Lee 2011). Journal editors need to be more proactive in soliciting submissions by female and minority scholars in our field. Editors may also need to be more independent from reviewers’ recommendations, especially in low success areas that comprise a large percentage of minority scholars. It is disturbing to me that the most difficult areas for me to publish in my career have been those that have the highest representation of women (even though it is still small!). We cannot know whether my experience generalizes more broadly without collection of data on topics for conference presentations, submissions of those projects to journals, and the average “toughness” of reviewers in such fields. I believe in the peer review process and I will continue to provide public goods to protect it. I also believe that we need to determine if the process is generating biases that influence the chances for certain types of scholars or certain types of research to dominate our best journals.



Allen, Henry L. 1998. “Faculty Workload and Productivity: Gender Comparisons.” In The NEA Almanac of Higher Education. Washington, DC: National Education Association.

Breuning, Marijke, Jeremy Backstrom, Jeremy Brannon, Benjamin Isaak Gross, and Michael Widmeier. 2015. “Reviewer Fatigue? Why Scholars Decline to Review their Peers’ Work.” PS: Political Science & Politics 48(4): 595-600.

Breuning, Marijke, and Kathryn Sanders. 2007. “Gender and Journal Authorship in Eight Prestigious Political Science Journals.” PS: Political Science and Politics 40(2): 347–51.

Djupe, Paul A. 2015. “Peer Reviewing in Political Science: New Survey Results.” PS: Political Science & Politics 48(2): 346-352.

Hesli, Vicki L., and Jae Mook Lee. 2011. “Faculty Research Productivity: Why Do Some of Our Colleagues Publish More than Others?” PS: Political Science and Politics 44(2): 393–408.

Link, Albert N., Christopher A. Swann, and Barry Bozeman. 2008. “A Time Allocation Study of University Faculty.” Economics of Education Review 27(4): 363–74.

Miller, Beth, Jon Pevehouse, Ron Rogowski, Dustin Tingley, and Rick Wilson. 2013. “How To Be a Peer Reviewer: A Guide for Recent and Soon-to-be PhDs.” PS: Political Science & Politics 46(1): 120-123.

Mitchell, Sara McLaughlin, Scott Gates, and Håvard Hegre. 1999. “Evolution in Democracy-War Dynamics.” Journal of Conflict Resolution 43(6): 771-792.

Mitchell, Sara McLaughlin and Vicki L. Hesli. 2013. “Women Don’t Ask? Women Don’t Say No? Bargaining and Service in the Political Science Profession.” PS: Political Science & Politics 46(2): 355-369.



[1] While I am fairly generous in my grading of students’ peer reviews given their lack of experience, I find that I am able to discriminate in the grading process. Some students more effectively demonstrate that they read the paper carefully, offering very concrete and useful suggestions for improvement. Students with lower grades tend to be those who are reluctant to criticize their peers. Even though I make the review process double blind, PhD students in my department tend to reveal themselves as reviewers of each other’s work in the middle of the semester.

[2] In a nice piece that provides advice on how to be a peer reviewer, Miller et al (2013:122) make a similar point: “There may be a place in life for snide comments; a review of a manuscript is definitely not it.”

[3] As Miller et al (2013:122) note: “Broad generalizations—for instance, claiming an experimental research design ‘has no external validity’ or merely stating ‘the literature review is incomplete’—are unhelpful.”

[4] Djupe’s (2015: 346-347) survey of APSA members shows that 90% of tenured or tenure-track faculty reviewed for a journal in the past calendar year, with the average number of reviews varying by rank (assistant professors-5.5, associate professors-7, and full professors-8.3). In an analysis of review requests for the American Political Science Review, Breuning et al (2015) find that while 63.6% of review requests are accepted, scholars declining the journal’s review requests often note that they are too busy with other reviews. There is reasonable evidence that many political scientists feel overburdened by reviews, although the extent to which extra appendices influence those attitudes is unclear from these studies.

[5] I have experienced this process myself at journals like Journal of Peace Research which send a paper to a new reviewer after the first R&R is resubmitted. I have only experienced three or more rounds of revisions on a journal article at journals that adopt this policy. My own personal preference as an editor is to make the decision in-house. I have a high standard for giving out R&Rs and thus feel qualified to make the final decision myself. One could argue, however, that by soliciting advice from new reviewers, the final published products might be better.

[6] Clearly there are differences in what can be accomplished in a 25-40 page journal article versus a longer book manuscript. Books provide space for additional analyses, in-depth case studies, and more intensive literature reviews. However, many books in my field that have been influential in the discipline have been preceded by a journal article summarizing the primary argument in a top ranked journal.

[7] This observation is based on my own personal experience submitting articles to journals and thus qualifies as more of a hypothesis to be tested rather than a settled fact.

Posted in Peer Review, The Discipline | 4 Comments