By the Numbers: Toward More Precise Numerical Summaries

By Gaurav Sood and Andrew Guess

Unlike in the natural sciences, there are few true zeros in the social sciences (Gelman, 2011, p. 960). All sorts of variables are often weakly related to one another, and however lightly social scientists, exogenous events, or other actors intervene, the effects of those interventions are rarely exactly zero. When there are few true zeros, categorical statements like “two variables are significantly related” or “the intervention had a significant effect” convey little information beyond sample size and luck (and manufactured luck). Yet these kinds of statements are the norm in abstracts of the top political science journal, the American Political Science Review (APSR). As we show below, only 10% of the empirical articles in recent volumes of the APSR have abstracts with precise quantitative statements. The comparable number for the American Economic Review (AER) is 35%.

Informal inspection suggests that coarse descriptions are also common in other sections of social science articles, and they are exceedingly frequent in the social science vernacular. (We would like to quantify both.) Studies are often summarized in the form “the study shows this broad phenomenon exists.” For instance, Pasek, Sood, and Krosnick (2015) write, “Moreover, much past research shows that guessing is often biased.” More problematically, comparisons of results often take the same form. For instance, a study on motivated learning contextualizes its results as follows: “Our finding is consistent with other research showing that even when people agree on factual information, they often still interpret the information in a motivated manner” (Khanna and Sood, 2015). The phrase “finding is consistent with” yields nearly 300,000 results on Google Scholar, and informal inspection suggests that “consistent” covers a surprisingly large range of effect sizes and measures.

None of this is to say that the decision to summarize imprecisely is made without deliberation. Undoubtedly, some researchers resort to coarse summaries because they are not confident about their theories, measures, or models. Others likely use them because they believe coarse summaries communicate more effectively. There is some support for the latter thesis: in a survey of undergraduate and graduate students, 77% of students thought that people prefer to receive verbal expressions of uncertainty over numerical expressions in their everyday lives (Wallsten et al., 1993). (However, the same paper notes that 85% of students felt comfortable switching to the other mode if they thought it better suited their needs.)

Our hunch, however, is that the most common form of coarse summary in scientific communication—categorical statements around statistical significance—arises as a natural consequence of scientists thinking in terms of the Null Hypothesis Significance Testing (NHST) framework, which in turn is likely underpinned by a Popperian understanding of science (Gelman and Shalizi, 2013). For instance, hypotheses written as coarse categorical statements around statistical significance, such as “X will be significantly associated with Y,” are exceedingly frequent. Such hypotheses reflect an understanding of science in which progress comes from falsification rather than from improvements in measurement.

Whatever the root cause, the use of coarse summaries likely leads to serious problems. First, coarse summaries risk misinterpretation. Partly because the mapping between verbal phrases and numerical ranges varies across communicators and recipients (Capriotti and Waldrup, 2011), recipients map the same verbal expression to very different numbers (Beyth-Marom, 1982; Bocklisch et al., 2010; Brun and Teigen, 1988; Simpson, 1944, 1963). For instance, Beyth-Marom (1982) elicited numerical mappings of 30 verbal expressions on a 100-point scale and found that the average inter-quartile range of the numerical mappings for a phrase was 14.4. Analogous numbers—standard deviations of the numerical mappings—from Brun and Teigen (1988), based on 27 phrases, and Bocklisch et al. (2010), based on 13 phrases, were 14.2% (translated from a 0–6 scale) and 11.15%, respectively. Relatedly, mapping numerical ranges to verbal phrases in a way that minimizes misclassification error still yields an error rate of nearly 28% (Elsaesser and Henrion, 2013; see also Bocklisch et al., 2010).
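
To make these statistics concrete, here is a minimal sketch, in Python, of how such variability is computed from elicitation data. The elicited values in it are hypothetical, invented purely for illustration; none of them come from the studies cited above.

import numpy as np

# Hypothetical elicitations (invented for illustration): each array holds
# respondents' numerical mappings of one verbal phrase on a 0-100 scale.
mappings = {
    "likely":   np.array([60, 70, 75, 80, 65, 85, 70, 72]),
    "doubtful": np.array([10, 25, 30, 15, 40, 20, 35, 18]),
}

iqrs = []
for phrase, values in mappings.items():
    q1, q3 = np.percentile(values, [25, 75])
    iqrs.append(q3 - q1)
    print(f"{phrase}: IQR = {q3 - q1:.1f}, SD = {values.std(ddof=1):.1f}")

# Beyth-Marom's 14.4 is this quantity averaged over 30 phrases.
print(f"Average IQR across phrases: {np.mean(iqrs):.1f}")

# Brun and Teigen elicited responses on a 0-6 scale; the 14.2% above
# assumes a linear rescaling to percentages: sd_percent = sd_raw / 6 * 100.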

Not only are the mappings variable, but the variation is also systematic: numerical mappings of verbal phrases vary as a function of the phrases used and the characteristics of the recipient. Prominently, numerical mappings of verbal phrases about infrequent events (e.g., “seldom,” “rarely,” “uncommon”) tend to be much less reliable (Wallsten, Fillenbaum, and Cox, 1986). Interpretation of vague verbal summaries is also subject to cognitive errors. The vaguer a statement, the less information it carries, but also the greater the opportunity to fill in the missing detail. And it is plausible, perhaps likely, that people do not resist the opportunity to impute, relying on common heuristics such as overweighting accessible information and interpreting evidence in ways that confirm prior beliefs (Nickerson, 1998; Tomz and Van Houweling, 2009; Brun and Teigen, 1988; Wright, Gaskell and O’Muircheartaigh, 1994).

Use of such cognitive shortcuts is liable to lead to systematic biases in inference. In the extreme, confirmation bias implies that people will read an uninformative vague statement as evidence that their priors are correct, and will walk away from it with yet greater certainty. Consider a person who initially believes that a law allowing concealed carry would increase gun crime. On reading a study summary reporting a positive effect (“the study shows that laws allowing concealed carry increase gun crime”), she may optimistically conclude that allowing concealed carry increases gun crime by 20%. On reading a summary reporting the opposite effect (“the study shows that laws allowing concealed carry decrease gun crime”), the same person may concede that the decline is real but conclude that it is of much lower magnitude.

Finally, coarse summaries may lead to erroneous inferences because of how people use language ordinarily. For instance, a person with flat priors about selective exposure may reasonably interpret a vague summary (“people engage in selective exposure”) as implying that most people read news stories from sources they believe are aligned with their party. A more precise numerical statement, one that gives the proportion of news stories consumed from ideologically congenial sources, would preempt such misinterpretation.

Besides misinterpretations of topical effect sizes, coarse summaries also risk conveying misleading ways of thinking about science—as falsification or simple directional claims rather than as a constant effort to obtain less biased and more precise estimates of actual quantities of interest. Presenting more precise estimates may instill in readers a better appreciation of the point that Donald Green made in an interview in the aftermath of the LaCour scandal: “That’s what makes the study interesting. Everybody knows that there’s some degree of truth in these propositions, and the reason you do an experiment is you want to measure the quantity.”[1]

Making more precise numerical statements may also improve how we understand the results of studies. And over the longer term, by making us think more carefully about our priors, precise numerical summaries may improve how we think about science and interpret scientific results. For instance, presenting precise numerical summaries in abstracts may help us to more quickly filter studies in which results appear “too big.”[2]

With this preface, we proceed to examine the frequency of vague judgments in social science abstracts.

How Common Are Precise Numerical Summaries in Abstracts?

To assess how common coarse summaries are vis-à-vis more precise numerical summaries of results, we coded 310 abstracts: 117 from the APSR, 100 from the AER, and 93 from AER Papers & Proceedings (AER P & P). The AER and AER P & P samples span June 2014–June 2016 (Vol. 104, 6 through Vol. 105, 6), while the sample of APSR abstracts spans February 2013–May 2015 (Vol. 107, 1 through Vol. 109, 2). Since we are interested only in articles whose results can be summarized precisely, we subset to empirical papers, which leaves us with 66 AER, 68 AER P & P, and 81 APSR abstracts.

What we mean by precise numerical summaries of results deserves careful attention. Precision lies on a continuum, with summaries ranging from very imprecise to very precise, but for clarity and convenience, our coding scheme captures only one end of the scale. We code summaries of results that take the following form as precise: “A% change in X caused a B% change in Y” or “the intervention caused a B% change in Y.” For instance, we code the following statements as precise: “The average proportion of ‘no’ votes is about 40% higher for applicants from (the former) Yugoslavia and Turkey” and “I find that a one-percentage-point increase in the personal vote received by a gubernatorial candidate increases the vote share of their party’s secretary of state and attorney general candidates by 0.1 to 0.2 percentage points.” The complementary set includes statements like “increasing numbers of armed military troops are associated with reduced battlefield deaths” and “We find support for these arguments using original data from Uganda.”
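
The coding for this note was done by hand. Still, the rule is mechanical enough that a crude automated first pass is easy to sketch. The regular expression below is our own illustration, not the procedure behind the counts we report; it merely flags abstracts that pair a numeric magnitude with an effect word, and it will miss some precise statements and catch some imprecise ones.

import re

# A numeric magnitude: "40%", "0.2 percentage points", "5 percent", etc.
MAGNITUDE = r"\d+(?:\.\d+)?\s*(?:%|percent(?:age point)?s?)"
# A directional effect word near the magnitude.
EFFECT = r"(?:increase[sd]?|decrease[sd]?|reduce[sd]?|raise[sd]?|higher|lower)"
PRECISE = re.compile(
    rf"{MAGNITUDE}.{{0,100}}{EFFECT}|{EFFECT}.{{0,100}}{MAGNITUDE}",
    re.IGNORECASE,
)

def looks_precise(abstract: str) -> bool:
    """Flag abstracts that pair a numeric magnitude with an effect word."""
    return bool(PRECISE.search(abstract))

print(looks_precise("The average proportion of 'no' votes is about 40% higher."))  # True
print(looks_precise("We find support for these arguments using original data."))   # False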

Only about 10% of the empirical articles in recent volumes of the APSR have abstracts with precise quantitative statements, similar to the percentage for AER P & P. The comparable number for the AER is 35% (see Figure 1). None of the numbers is appealing, but those for the APSR stand out.
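
For a sense of the sampling uncertainty around these proportions, here is a minimal sketch. The denominators are the sample sizes reported above; the numerators are backed out of the reported percentages (and, for AER P & P, from its similarity to the APSR), so they are approximate.

import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Denominators from the text; numerators backed out of the reported
# percentages, so treat them as approximations.
samples = {"APSR": (8, 81), "AER": (23, 66), "AER P & P": (7, 68)}
for journal, (k, n) in samples.items():
    lo, hi = wilson_interval(k, n)
    print(f"{journal}: {k / n:.0%} precise, 95% CI [{lo:.0%}, {hi:.0%}]")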

Figure 1: Proportion of Precise Numerical Statements in Abstracts of Empirical Papers in the APSR, AER, and the AER P & P


The frequency of coarse summaries of empirical results in abstracts is, however, an imperfect indicator of the dominance of NHST-inspired reasoning. The disparity between the APSR and the AER likely also stems from the lack of widely understood measures in political science. In economics, variables like unemployment, inflation, and GDP are widely understood and studied; in political science, only a few variables, such as turnout, come close to being widely understood.

In all, the data shed much-needed, but still weak, light on the issue. It is our hope, however, that this note will stimulate discussion about social scientific writing and increase efforts to address (what we contend is) the root cause of a particularly common kind of coarse description: categorical statements around statistical significance.

Acknowledgment: We thank Justin Esarey, Andrew Gelman, Don Green, Kabir Khanna, Brendan Nyhan, and Daniel Stone for useful comments. The data and the scripts behind the analysis presented here can be downloaded at: https://github.com/soodoku/quant-discipline/tree/master/tpm

[1] “An Interview With Donald Green, the Co-Author of the Faked Gay-Marriage Study.” Jesse Singal. New York Magazine. Published on May 21, 2015. http://nymag.com/scienceofus/2015/05/co-author-of-the-faked-study-speaksout.html. Green gave the interview in the aftermath of revelations that his co-author had fabricated data in a highly publicized study of the persuasive effects of canvassing on attitudes toward gays (Broockman, Kalla, and Aronow, 2015; McNutt, 2015).

[2] “Pushing at an open door: When can personal stories change minds on gay rights?” Andrew Gelman. Monkey Cage, The Washington Post. Published on December 19, 2014.

References

Beyth-Marom, Ruth. 1982. “How probable is probable? A numerical translation of verbal probability expressions.” Journal of Forecasting 1(3):257–269.

Bocklisch, Franziska, Steffen F Bocklisch, Martin RK Baumann, Agnes Scholz and Josef F Krems. 2010. “The role of vagueness in the numerical translation of verbal probabilities: A fuzzy approach.” In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, pp. 1974–1979.

Broockman, David, Joshua Kalla and Peter Aronow. 2015. “Irregularities in LaCour (2014).” Working paper, Stanford University. http://stanford.edu/dbroock/broockman_kalla_aronow_lg_irregularities.pdf

Brun, Wibecke and Karl Halvor Teigen. 1988. “Verbal probabilities: ambiguous, context-dependent, or both?” Organizational Behavior and Human Decision Processes 41(3):390–404.

Capriotti, Kim and Bobby E Waldrup. 2011. “Miscommunication of uncertainties in financial statements: a study of preparers and users.” Journal of Business & Economics Research (JBER) 3(1).

Elsaesser, Christopher and Max Henrion. 2013. “How much more probable is ‘much more probable’? Verbal expressions for probability updates.” arXiv preprint arXiv:1304.1501.

Gelman, Andrew. 2011. “Causality and Statistical Learning.” American Journal of Sociology 117(3):955–966.

Gelman, Andrew and Cosma Rohilla Shalizi. 2013. “Philosophy and the practice of Bayesian statistics.” British Journal of Mathematical and Statistical Psychology 66(1):8–38.

Khanna, Kabir and Gaurav Sood. 2015. “Motivated Learning or Motivated Responding? Using Incentives to Distinguish Between the Two Processes.” Typescript.

McNutt, Marcia. 2015. “Editorial retraction.” Science 348(6239):1100.

Nickerson, Raymond S. 1998. “Confirmation bias: A ubiquitous phenomenon in many guises.” Review of General Psychology 2(2):175–220.

Pasek, Josh, Gaurav Sood and Jon Krosnick. 2015. “Misinformed About the Affordable Care Act? Leveraging Certainty to Assess the Prevalence of Misinformation.” Journal of Communication.

Simpson, Ray H. 1944. “The specific meanings of certain terms indicating differing degrees of frequency.”

Simpson, Ray H. 1963. “Stability in meanings for quantitative terms: A comparison over 20 years.” Quarterly Journal of Speech 49(2):146–151.

Tomz, Michael and Robert P Van Houweling. 2009. “The electoral implications of candidate ambiguity.” American Political Science Review 103(1):83–98.

Wallsten, Thomas S, David V Budescu, Rami Zwick and Steven M Kemp. 1993. “Preferences and reasons for communicating probabilistic information in verbal or numerical terms.” Bulletin of the Psychonomic Society 31(2):135–138.

Wallsten, Thomas S, Samuel Fillenbaum and James A Cox. 1986. “Base rate effects on the interpretations of probability and frequency expressions.” Journal of Memory and Language 25(5):571–587.

Wright, Daniel B, George D Gaskell and Colm A O’Muircheartaigh. 1994. “How much is ‘quite a bit’? Mapping between numerical values and vague quantifiers.” Applied Cognitive Psychology 8(5):479–496.


2017 Joint Quantitative Political Science Conference for Asia and Australasia

The first Joint Quantitative Political Science Conference for Asia and Australasia was held from January 9th to 11th at the University of Sydney, Australia. It combined the 4th annual Asian Political Methodology conference (Asian Polmeth IV) and the 5th annual meeting of the Australian Society for Quantitative Political Science (ASQPS V). The conference was co-sponsored by the Program for Quantitative and Analytical Political Science, Princeton University; the School of Social and Political Sciences and the United States Studies Centre, University of Sydney; the School of Politics and International Relations, Australian National University; and the School of Social and Political Sciences, University of Melbourne. Professor Simon Tormey, Head of SSPS at the University of Sydney, welcomed participants and opened the conference.

The Asian Polmeth meeting is devoted to promoting research and education in quantitative social sciences in Asia, and providing opportunities for individual quantitative social scientists and local research communities around the globe to exchange research ideas and build academic ties. ASQPS is committed to the promotion and development of quantitative research in political science in Australia and the wider region, with an emphasis on expanding the use of quantitative methods, supporting the training of PhD students, and fostering the community of active quantitative researchers and their connections with scholars around the world.

The Program Committee issued a Call for Papers in March 2016 and received 2 panel proposals, 39 paper proposals, and 14 poster proposals by September 2016. The final program included 36 papers in 15 panels and 13 poster presentations. Conference attendance, including presenters and non-presenters, totalled 71 people. Participants were from Australia, Canada, China, Germany, Japan, South Korea, the United Arab Emirates, the United Kingdom, and the United States (the online program can be found here: http://asiapolmeth.princeton.edu/online-program).

The keynote speaker was Professor Douglas Rivers of Stanford University, whose talk, “Estimating the Mean Square Error of a Survey with Unknown Selection Bias,” focused on the challenges of using survey data for analysis or prediction in an era of low survey response rates and changing patterns in political participation. The talk was a highlight of the conference, emphasizing cutting-edge approaches to vexing problems within and beyond academia. It was an honor to host Prof. Rivers.

The conference papers and posters focused on new research in and applications of quantitative methods in political science. Methodology topics covered included estimating treatment effects, text as data, prediction, causal inference from observational data, and dimensionality and measurement. Applied topics included intra- and inter-state conflict, parties and elections, human rights, and international political economy. There was also a special panel on Australian Survey Data and Data Sources organized by Professor Ian McAllister of the Australian National University. The poster sessions provided many opportunities for PhD students and junior scholars to discuss their research and receive feedback from peers and senior colleagues.

On January 8th there were two excellent pre-conference workshops for PhD students and early career researchers. Professor Leonie Huddy (SUNY Stony Brook) led a workshop on Population Survey Experiments and Professor Stanley Feldman (SUNY Stony Brook) led a workshop on Measurement Theory. Sponsored by ASQPS, these were attended by 18 students and faculty.

The Program Committee for this conference was composed of Benjamin E. Goldsmith (University of Sydney; 2017 host and organizer), Svitlana Chernykh (Australian National University), Fang-Yi Chiou (Academia Sinica), Richard Frank (Australian National University), Kentaro Fukumoto (Gakushuin University), Leonie Huddy (SUNY Stony Brook), Kosuke Imai (Princeton University), Aaron Martin (University of Melbourne), Xun Pang (Tsinghua University), Jong Hee Park (Seoul National University), Shaun Ratcliff (University of Sydney), and Shawn Treier (Australian National University).

The Asian Political Methodology Conference was first organized by Kosuke Imai in 2013, and the first ASQPS meeting was organized by Ben Goldsmith, Aaron Martin, and Yusaku Horiuchi in 2012.

The 2018 Asian Polmeth Meeting will be held in Seoul, South Korea in January. The next ASQPS meeting will be held in Wellington, New Zealand, in December 2017.

We look forward to seeing you in Seoul and Wellington!

Ben Goldsmith is Associate Professor, Department of Government & International Relations, University of Sydney, Australia.

Conference photos: Day 1, Day 2, and Day 3.


IMC: Women Also Know Stuff, “A Discussion of the Role of Gender in Political Science with Women Also Know Stuff — Methodology and Beyond” Friday 11/18 at noon

This Friday, November 18 at noon Eastern time, we will host a panel discussion with Women Also Know Stuff (Emily Beaulieu, Amber Boydstun, Yanna Krupnikov, Melissa Michelson, Christina Wolbrecht). The panel is titled “A Discussion of the Role of Gender in Political Science with Women Also Know Stuff — Methodology and Beyond.”

To tune in to the presentation and participate in the discussion after the talk, visit http://www.methods-colloquium.com/ and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

https://riceuniversity.zoom.us/webinar/register/355419004757de0ccde7dc3c8da9331e

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.


IMC: Pete Mohanty, “Messy Data, Robust Inference? Navigating Obstacles to Inference with bigKRLS” Friday 11/11 at noon

This Friday, November 11th at noon Eastern time, the International Methods Colloquium will host a presentation by Pete Mohanty of Stanford University. Pete’s presentation is titled “Messy Data, Robust Inference? Navigating Obstacles to Inference with bigKRLS.” The presentation will draw on the following paper, joint work with Robert B. Shaffer of the University of Texas:

http://jee3.web.rice.edu/mohanty_shaffer_bigKRLS_paper.pdf

To tune in to the presentation and participate in the discussion after the talk, visit http://www.methods-colloquium.com/ and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

https://riceuniversity.zoom.us/webinar/register/fbbce4af80db33507510d14dfea9e911

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.


IMC: Marc Ratkovic, “Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Designs” this Friday, 11/4 at noon Eastern

This Friday, November 4th at noon Eastern time, the International Methods Colloquium will host a presentation by Marc Ratkovic of Princeton University. Marc’s presentation is titled “Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Designs.” The presentation will draw on the following paper, joint work with Dustin Tingley of Harvard University:

https://www.princeton.edu/~ratkovic/public/sparsereg.pdf

To tune in to the presentation and participate in the discussion after the talk, visit http://www.methods-colloquium.com/ and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

https://riceuniversity.zoom.us/webinar/register/c52325f3b06b75587c24e00bf0acd2b8

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.


IMC: Jacob Montgomery, “Human computation scaling for measuring meaningful latent traits in political texts”

This Friday, October 28th at noon Eastern time, the International Methods Colloquium will host a presentation by Jacob Montgomery of Washington University in St. Louis. Professor Montgomery’s presentation is titled “Human computation scaling for measuring meaningful latent traits in political texts.” The presentation will draw on a paper of the same name, co-authored with David Carlson:

http://jee3.web.rice.edu/SentimentIt.pdf

To tune in to the presentation and participate in the discussion after the talk, visit http://www.methods-colloquium.com/ and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

https://riceuniversity.zoom.us/webinar/register/be56c6d1438e3d478c34be5db4a05ad8

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.


IMC: Sarah Bouchat, “Engaging Experts: Dealing with Divergent Elicited Priors in Political Science” this Friday, 10/21 at 12:00 PM Eastern

This Friday, October 21st at noon Eastern time, the International Methods Colloquium will host a presentation by Sarah Bouchat of the University of Wisconsin-Madison. Sarah’s presentation is titled “Engaging Experts: Dealing with Divergent Elicited Priors in Political Science.” The presentation will draw on this paper (of the same title):

http://bouchat.github.io/Bouchat-IMCpaper.pdf

To tune in to the presentation and participate in the discussion, visit http://www.methods-colloquium.com/ and click “Watch Now!” on the day of the talk. To register for the talk in advance, click here:

https://riceuniversity.zoom.us/webinar/register/c15c083b1f333260dc2040ba88984b7b

The IMC uses Zoom, which is free to use for listeners and works on PCs, Macs, and iOS and Android tablets and phones. You can be a part of the talk from anywhere around the world with access to the Internet. The presentation and Q&A will last for a total of one hour.
