Tim Harford headshot and Data Detective book cover

The Data Detective: notes from Tim Harford’s 2020 guide on numeracy

Our preconceptions warp our interpretation of statistics, and our political decisions shape what data we even collect.

It can seem that statistics aren’t worth the trouble. But statistics show us things we cannot see any other way, certainly not at the human or anecdotal scale. So goes economics journalist Tim Harford’s 2020 celebration of, and guide through, statistical analysis for the layperson: The Data Detective: Ten Easy Rules To Make Sense of Statistics.

“It’s easy to lie with statistics,” goes the line that gets attributed to American mathematician Frederick Mosteller (1916-2006). “But it’s even easier to lie without them.”

I’ve read a few of Harford’s books, which are friendly, fun and readable. They’re full of stories and collected wisdom for those interested in overcoming statistical trickery.

Below I share my notes from the book for future reference.

My notes:

  • The 1954 book “How to Lie with Statistics” is a classic of the form, but the author worries it made statistics seem like a “stage magician’s trick”. Why was our bestselling statistics book entirely about disinformation and not about how to use statistics properly?
  • That same year (1954), Hill and Doll used excellent statistical analysis to show that cigarettes caused lung cancer, at a time when many thought cars were the cause. They did methodical and important work we take for granted.
  • “Good statistics are not a trick, although they are a kind of magic”
  • Steve Bannon: Flood the zone with shit
  • What’s worse: a world where people will believe anything? Or one in which people will believe nothing?
  • Darrell Huff, the author of “How to Lie with Statistics,” was later paid by the tobacco lobby to compare smoking to storks and babies (big places have lots of both)
  • “It’s easy to lie with statistics — but it’s even easier to lie without them.” Disputed, but often credited to Frederick Mosteller (19)
  • Ziva Kunda showed that women who drank coffee were less likely to believe research suggesting coffee posed a health risk to women
  • Guy Mayraz 2011: wishful thinking study
  • Motivated reasoning: Linda Babcock and George Loewenstein
  • Moliere: “A learned fool is more foolish than an ignorant one”
  • Palmer and Crandall: showed people fake research to test how they responded to findings that ran against their political views
  • Taber and Lodge: better informed people are more likely (and able) to evade information contrary to their opinions  
  • Often liberals trust their scientists and conservatives trust their economists but not the other way around
  • Naive realism
  • Daniel Kahneman writes in his book Thinking, Fast and Slow: “When faced with a difficult question, we often answer an easier one instead, usually without noticing the substitution.” “Fast statistics” include news reports of extreme events and simple, shocking stories that we rely on instead of working through difficult questions
  • Economist Friedrich Hayek had a phrase for the kind of awareness that is hard to capture in metrics and maps: the “knowledge of the particular circumstances of time and place”
  • Charles Goodhart wrote in 1975: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes,” or, as it is sometimes shortened, “when a measure becomes a target, it ceases to be a good measure.”
  • Donald T. Campbell, the psychologist, wrote: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor”
  • Microfinance leader Muhammad Yunus writes about balancing the “bird’s eye view” and “worm’s eye view” (this is just like journalism!)
  • Anna Rosling: Dollar Street combined both personal and data
  • Hans Rosling, a famous statistician (who also happened to be Anna Rosling’s father-in-law): “Numbers will never tell the full story of what life on earth is all about”
  • Some of the infant mortality gap between the United States and other countries is attributed to definitions: in the US, a pregnancy that ends at 22 weeks is more likely to be recorded as a live birth followed by a death, while in a country like Finland it would be recorded as a miscarriage
  • Steven Pinker: The “curse of knowledge” keeps many from communicating clearly (though a bit of recent research noted that even lawyers are confused by legalese, which suggests it is better understood as a product of habit and copy-and-edit)
  • Wealth inequality numbers: all of the people in the world with $0 net worth just keep totaling zero, no matter how many of them you add together.
  • Income inequality: the 50/10 split and other indicators tell you different things, but income inequality is likely more helpful than wealth inequality. In the UK between 1990 and 2017, the top 1% saw income rise and poor households caught up to the middle. Is that good? That’s why data analysis is so messy
  • In 1965, Johan Galtung and Mari Ruge showed that frequency of publication affects what is considered newsworthy (daily publishers chose differently than weekly ones)
  • With time-scale in mind, Max Roser (Our World in Data) then conceived of 50-year newspapers, or publishing every half-century to consider what would be included and what wouldn’t
  • News doesn’t focus on the negative but the novel, and since we are overconfident it’s the negative that is more often novel
  • In his 2018 book Factfulness, Hans Rosling writes of “the negativity instinct,” our tendency to pursue negative news because it surprises us
  • Steven Pinker argues negative news more often happens suddenly; good news is slower. Amos Tversky asked Pinker to imagine the best and worst things that could happen to a person, and one is more sudden than the other
  • Daniel Kahneman says that overconfidence is “the most significant of the cognitive biases”
  • Sheena Iyengar and Mark Lepper: research that set up a roadside stall to sell jam; the version with fewer options sold more than the version with more options (though the larger selection attracted more visitors)
  • Abraham Wald and the famous WW2 story of the survivorship bias of planes (don’t repair where the bullet holes are on the planes that return but consider the ones that don’t come back)
  • Journal of Personality’s publication bias is like the 2014 $55k potato salad on Kickstarter: something so wildly unusual that it is widely reported gives an inaccurate view of the world
  • Reproducibility crisis: Nosek’s replication project found that only 39 of 100 psychology experiments replicated
  • Derren Brown was filmed flipping 10 heads in a row, but only because he kept filming for 8 hours until it happened
  • HARKing: hypothesizing after the results are known (120)
  • Burton Malkiel’s research: remove survivorship bias of investment funds by including the ones that fail and the industry’s returns are more mediocre
  • Cochrane Collaboration
  • The Asch conformity experiments at Swarthmore in 1951: famous, though their subjects were all white men (even though women attended the school). Diverse opinions allow for dissent, but his experiments didn’t show that
  • Bias of WEIRD subjects (Western, Educated, Industrialized, Rich and Democratic)
  • Sampling error vs sampling bias: the classic Gallup vs Literary Digest prediction of the 1936 FDR election; Gallup had a far smaller sample size, but it weighted for a more accurate representation of the country
  • David Hand: dark data
  • Our societal relationship to data shifted in the 2010s. For example, 2013’s Big Data was an optimistic book, and by 2016 came Cathy O’Neil’s more pessimistic Weapons of Math Destruction (which referenced Technical.ly reporting)
  • “In 2013, the relatively few people who were paying attention to big data often imagined themselves to be the carpenters; by 2016, many of us realized that we were nails.”
  • In 2011, DC schools fired 206 teachers based on an algorithm (as referenced in Cathy O’Neil’s book)
  • The rule of thumb of a standard 98.6 Fahrenheit body temperature comes from an early data set, but it was never measured to a tenth of a degree; the precision is just a consequence of converting 37 degrees centigrade
  • Amazon “women” sorting algorithm fiasco, 2014-2018: the company had hired more men in the past, so the algorithm favored men in recruiting
  • Why did chemistry advance science but alchemy did not? Historian of science David Wootton says it was because alchemy was pursued in secret while science depended on open debate
  • Marin Mersenne: circulated letters among the early scientific community, like a science journalist
  • Onora O’Neill, a philosopher on trust: trust should be discriminating, meaning we should choose which specific algorithms (or people) we trust.
    • information should be accessible
    • decisions should be understandable
    • information should be usable and
    • decisions should be assessable
  • Should important algorithms have to undergo peer review or RCT?
  • Alice Rivlin gave the influential Congressional Budget Office its independent founding culture (she got her job in some sense because a legendary congressman was caught with a stripper and her opponents changed over)
  • CBO analysis is routinely more accurate than that of the US Treasury, which has presidential appointees and faces pressure (Reagan called CBO numbers phony; the Carter and Clinton administrations also challenged them)
  • Andreas Georgiou: a statistician who tried to revive Greece’s statistics
  • Florence Nightingale was one of the first data visualizers, including her famous rose diagram
  • Adolphe Quetelet: created the idea of an average
  • David McCandless’s popular Debtris infographic confused stocks and flows (i.e., wealth vs. income), and the author notes this shows the misleading nature of beautiful data visualization
  • A graph “duck” is a nickname for a chart that looks like its topic (an orange that charts sales of oranges), named after a duck-shaped store on Long Island, New York
  • Andy Cotgreave reimagined Scarr’s Iraq war chart: both versions are true, but they make different arguments
  • Irving Fisher is an earlier famous economist whose 1891 PhD thesis is thought to have established the form
  • Philip Tetlock: expert political judgment research over 18 years and then for the Good Judgment Project
  • To make a good prediction, start with a base rate: if you’re guessing the probability that a marriage will last, first find data on comparable couples, then judge whether this couple is likely to do better or worse (Daniel Kahneman calls this “the outside view and the inside view”)
  • Keynes may not have actually said his famous quote “the market can stay irrational longer than you can stay solvent.”
  • Two weeks before the stock market crash, Irving Fisher was quoted by the New York Times saying “stocks have reached what looks like a permanently high plateau”
  • Research by Morton Deutsch and Harold Gerard in 1955 showed that those who wrote their predictions in erasable marker were more likely to be willing to change them than those who wrote in permanent marker
  • Late in his life, influential economist John Maynard Keynes reflected: “My only regret is that I have not drunk more champagne in my life.”
  • Roger Babson described Irving Fisher’s failing by saying: “He thinks the world is ruled by figures instead of feelings”
  • The philosopher, Onora O’Neill, said “well-placed trust grows out of active inquiry, rather than blind acceptance.”
  • Dan Kahan’s protest audience research shows that what we think we see matters: the research was called “They Saw a Protest,” which was reminiscent of the 1954 research called “They Saw a Game”
  • Curiosity was the best trait to combat it all
  • George Loewenstein’s “information gap”: when there’s a gap “between what we know and what we want to know”
  • Rozenblit and Keil: “the illusion of explanatory depth”
  • Orson Welles once wrote of an audience: “once they are interested, they understand anything in the world.”
  • Kathleen Hall Jamieson showed that viewers learned from comedian Stephen Colbert’s reporting on PACs
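
One note above that rewards a little arithmetic is the Derren Brown coin trick. For a fair coin, the expected number of flips before you first see n heads in a row is 2^(n+1) - 2, which works out to 2,046 flips for a streak of 10. Here is a minimal Python sketch of that calculation plus a quick simulation. The function names and the seconds-per-flip figure are my own assumptions for illustration, not details from the book.

```python
import random


def expected_flips_for_streak(n: int) -> int:
    """Expected number of fair-coin flips until n heads in a row first appear."""
    return 2 ** (n + 1) - 2


def flips_until_streak(n: int, rng: random.Random) -> int:
    """Simulate flipping a fair coin until n consecutive heads show up."""
    flips = 0
    streak = 0
    while streak < n:
        flips += 1
        if rng.random() < 0.5:  # heads extends the streak
            streak += 1
        else:  # tails resets the streak
            streak = 0
    return flips


if __name__ == "__main__":
    n = 10
    expected = expected_flips_for_streak(n)

    # Average many simulated attempts to compare against the formula.
    rng = random.Random(0)
    trials = [flips_until_streak(n, rng) for _ in range(2000)]
    average = sum(trials) / len(trials)

    # Assumption for illustration only: each filmed flip takes about 10 seconds.
    seconds_per_flip = 10
    hours = expected * seconds_per_flip / 3600

    print(f"Expected flips for {n} heads in a row: {expected}")
    print(f"Simulated average over {len(trials)} attempts: {average:.0f}")
    print(f"Rough filming time at {seconds_per_flip}s per flip: {hours:.1f} hours")
```

The exact filming time depends entirely on the assumed pace, but the point survives any reasonable guess: a streak that looks like magic in a short clip is what you should expect to get eventually if you keep the camera rolling for hours and show only the success.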

