The problems with peer review are increasingly recognized across the scientific community. Failures to provide timely reviews often lead to interminable delays for authors, especially when editors force authors to endure multiple rounds of review (e.g., Smith 2014). Other scholars simply refuse to contribute reviews of their own, which recently prompted the AJPS editor to propose a rule stating that he “reserves the right to refuse submissions from authors who repeatedly fail to provide reviews for the Journal when invited to do so” (Jacoby 2015).
Concerns over delays in the publication process have prompted a series of proposals intended to improve the peer review system. Diana Mutz and I recently outlined how frequent flier-type systems might improve on the status quo by rewarding scholars who provide rapid, high-quality reviews (Mutz 2015; Nyhan 2015). Similarly, Chetty, Saez, and Sándor (2014) report favorable results from an experiment testing the effects of requesting reviews on shorter timelines, promising to publish reviewer turnaround times, and offering financial incentives.
While these efforts are worthy, their primary goal is to speed the review process and reduce the frequency with which scholars decline to participate rather than to improve the reviews that journals receive. However, there are also reasons for concern about the value of the content of reviews under the status quo, especially given heterogeneous definitions of quality among reviewers. Esarey (N.d.), for instance, uses simulations to show that it is unclear to what extent (if at all) reviews help select the best articles. Similarly, Price (2014) found only modest overlap in the papers selected for a computer science conference when they were assigned to two sets of reviewers (see also Mahoney 1977).
More generally, authors frequently despair not just about the timeliness of the reviews they receive but also about their focus. Researchers often report that reviewers tend to focus on the way that articles are framed (e.g., Cohen 2015; Lindner 2015). These anecdotes are consistent with the findings of Goodman et al. (1994), who estimate that three of the five areas where medical manuscripts showed statistically significant improvements in quality after peer review were related to framing (“discussion of limitations,” “acknowledgement and justification of generalizations,” and “appropriateness of the strength or tone of the conclusions”). While sometimes valuable, these suggestions are largely aesthetic and can often bloat published articles, especially in social science journals with higher word limits. Useful suggestions for improving measurement or statistical analyses are seemingly rarer. One study found, for instance, that authors were most likely to cite their discussion sections as having been improved by peer review; methodology and statistics were much less likely to be cited (Mulligan, Hall, and Raphael 2013).
Why not try to shift the focus of reviews in a more valuable direction? I propose that journals try to nudge reviewers to focus on areas where they can most effectively improve the scientific quality of the manuscript under consideration using checklists, which are being adopted in medicine after widespread use in aviation and other fields (e.g., Haynes et al. 2009; Gawande 2009). In this case, reviewers would be asked to check off a set of yes or no items indicating that they had assessed whether both the manuscript and their review meet a set of specific standards before their review would be complete. (Answers indicating a problem would prompt the journal website to ask the reviewer to elaborate further.) This process could help bring the quality standards of reviews into closer alignment.
The specific checklist items listed below have two main goals. First, they seek to reduce the disproportionate focus on framing, minimize demands on authors to include reviewer-appeasing citations, and deter unreasonable reviewer requests. Second, the items on the checklists seek to cue reviewers to identify and correct recurring statistical errors and to remind them not to introduce such mistakes in their own feedback. Though this step might add time to the review process, the resulting improvement in review quality could be significant.
- Does the author properly interpret any interaction terms and include the necessary interactions to test differences in relationships between subsamples? (Brambor, Clark, and Golder 2006)
- Does the author interpret a null finding as evidence that the true effect equals 0 or otherwise misinterpret p-values and/or confidence intervals? (Gill 1999)
- Does the author provide their questionnaire and any other materials necessary to replicate the study in an appendix?
- Does the author use causal language to describe a correlational finding?
- Does the author specify the assumptions necessary to interpret their findings as causal?
- If a mediation model is proposed, does the author specify and test it properly using current best practices? (Imai, Keele, and Tingley 2010)
- Does the author control for or condition on variables that could be affected by the treatment of interest? (Rosenbaum 1984; Elwert and Winship 2014)
- Does the author have sufficient statistical power to test the hypothesis of interest reliably? (Simmons 2014)
- Are any subgroup analyses adequately powered and clearly motivated by theory rather than data mining?
- Did you request that a control variable be included in a statistical model without specifying how it would confound the author’s proposed causal inference?
- Did you request any sample restrictions or control variables that would induce post-treatment bias? (Rosenbaum 1984; Elwert and Winship 2014)
- Did you request a citation to or discussion of an article without explaining why it is essential to the author’s argument?
- Could the author’s literature review be shortened? Please justify specifically why any additions are required and note areas where corresponding reductions could be made elsewhere.
- Are your comments about the article’s framing relevant to the scientific validity of the paper? How specifically?
- Did you request a replication study? Why is one necessary given the costs to the author?
- Does your review include any unnecessary or ad hominem criticism of the author?
- If you are concerned about whether the sample is sufficiently general or representative, did you provide specific reasons why the author’s results would not generalize and/or propose a feasible design that would overcome these limitations? (Druckman and Kam 2011; Aronow and Samii 2015)
- Do your comments penalize the authors in any way for null findings or suggest ways they can find a significant relationship?
- If your review is positive, did you explain the contributions of the article and the reasons you think it is worthy of publication in sufficient detail? (Nexon 2015)
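Two of the items above concern post-treatment bias: conditioning on variables affected by the treatment, whether in the author’s own models or at a reviewer’s request. The simulation below is a minimal sketch of why this matters, not drawn from the cited sources; the data-generating process, variable names, and parameter values are all hypothetical. A randomized treatment T affects an outcome Y (true effect 2.0) and also a post-treatment variable M, which shares an unobserved cause U with Y. “Controlling” for M then distorts the estimated treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all names illustrative):
# T: randomized treatment; U: unobserved cause; M: post-treatment
# variable affected by both T and U; Y: outcome. True effect of T is 2.0.
T = rng.binomial(1, 0.5, n).astype(float)
U = rng.normal(size=n)
M = T + U + rng.normal(size=n)              # affected by the treatment
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)

def ols(y, *covs):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(Y, T)[1]        # Y regressed on T alone
adjusted = ols(Y, T, M)[1]  # "controlling" for the post-treatment variable

print(f"Y ~ T     : {naive:.2f}")     # close to the true effect of 2.0
print(f"Y ~ T + M : {adjusted:.2f}")  # biased (about 1.25 asymptotically)
```

Because T is randomized, the unadjusted regression recovers the true effect; adding M blocks part of the causal path and opens a spurious path through U, so the adjusted estimate is biased even though the extra control seems “safer.”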
I thank John Carey, Justin Esarey, Jacob Montgomery, and Thomas Zeitzoff for helpful comments.
Aronow, Peter M., and Cyrus Samii. 2015. “Does Regression Produce Representative Estimates of Causal Effects?” American Journal of Political Science.
Brambor, Thomas, William Roberts Clark, and Matthew Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14 (1): 63–82.
Chetty, Raj, Emmanuel Saez, and László Sándor. 2014. “What Policies Increase Prosocial Behavior? An Experiment with Referees at the Journal of Public Economics.” Journal of Economic Perspectives 28 (3): 169–188.
Cohen, Philip N. 2015. “Our broken peer review system, in one saga.” October 5, 2015. Accessed November 3, 2015 from https://familyinequality.wordpress.com/2015/10/05/our-broken-peer-review-system-in-one-saga/.
Druckman, James N., and Cindy D. Kam. 2011. “Students as Experimental Participants.” In Cambridge Handbook of Experimental Political Science, ed. James N. Druckman, Donald P. Green, James H. Kuklinski, and Arthur Lupia. Cambridge University Press.
Elwert, Felix, and Christopher Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40: 31–53.
Esarey, Justin. N.d. “How Does Peer Review Shape Science? A Simulation Study of Editors, Reviewers, and the Scientific Publication Process.” Unpublished manuscript. URL: http://jee3.web.rice.edu/peer-review.pdf
Gawande, Atul. 2009. The Checklist Manifesto: How to Get Things Right. New York: Metropolitan Books.
Gill, Jeff. 1999. “The Insignificance of Null Hypothesis Significance Testing.” Political Research Quarterly 52 (3): 647–674.
Goodman, Steven N., Jesse Berlin, Suzanne W. Fletcher, and Robert H. Fletcher. 1994. “Manuscript quality before and after peer review and editing at Annals of Internal Medicine.” Annals of Internal Medicine 121 (1): 11–21.
Haynes, Alex B., Thomas G. Weiser, William R. Berry, Stuart R. Lipsitz, AbdelHadi S. Breizat, E. Patchen Dellinger, Teodoro Herbosa, Sudhir Joseph, Pascience L. Kibatala, Marie Carmela M. Lapitan et al. 2009. “A surgical safety checklist to reduce morbidity and mortality in a global population.” New England Journal of Medicine 360 (5): 491–499.
Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. “A general approach to causal mediation analysis.” Psychological Methods 15 (4): 309–334.
Jacoby, William G. 2015. “Editor’s Midterm Report to the Executive Council of the Midwest Political Science Association.” August 27, 2015. Accessed November 5, 2015 from https://ajpsblogging.files.wordpress.com/2015/09/ajps-2015-midyear-report-8-27-15.pdf.
Lindner, Andrew M. 2015. “How the Internet Scooped Science (and What Science Still Has to Offer).” March 17, 2015. Accessed November 4, 2015 from https://academics.skidmore.edu/blogs/alindner/2015/03/17/how-the-internet-scooped-science-and-what-science-still-has-to-offer/.
Mahoney, Michael J. 1977. “Publication prejudices: An experimental study of confirmatory bias in the peer review system.” Cognitive Therapy and Research 1 (2): 161–175.
Mulligan, Adrian, Louise Hall, and Ellen Raphael. 2013. “Peer review in a changing world: An international study measuring the attitudes of researchers.” Journal of the American Society for Information Science and Technology 64 (1): 132–161.
Mutz, Diana C. 2015. “Incentivizing the manuscript-review system using REX.” PS: Political Science & Politics 48 (S1): 73–77.
Nexon, Daniel. 2015. “Notes for Reviewers: The Pitfalls of a First Round Rave Review.” November 4, 2015. Accessed November 5, 2015 from http://www.isanet.org/Publications/ISQ/Posts/ID/4918/Notes-for-Reviewers-The-Pitfalls-of-a-First-Round-Rave-Review.
Nyhan, Brendan. 2015. “Increasing the Credibility of Political Science Research: A Proposal for Journal Reforms.” PS: Political Science & Politics 48 (S1): 78–83.
Price, Eric. 2014. “The NIPS Experiment.” December 15, 2014. Accessed November 4, 2015 from http://blog.mrtz.org/2014/12/15/the-nips-experiment.html.
Rosenbaum, Paul R. 1984. “The Consequences of Adjustment for a Concomitant Variable That Has Been Affected by the Treatment.” Journal of the Royal Statistical Society, Series A (General) 147 (5): 656–666.
Simmons, Joe. 2014. “MTurk vs. the lab: Either way we need big samples.” April 4, 2014. Accessed November 6, 2015 from http://datacolada.org/2014/04/04/18-mturk-vs-the-lab-either-way-we-need-big-samples/.
Smith, Robert Courtney. 2014. “Black Mexicans, Conjunctural Ethnicity, and Operating Identities: Long-Term Ethnographic Analysis.” American Sociological Review 79 (3): 517–548.