Moneyball, the Oakland A’s, and the history of “data-driven” anything
Sometime in the early 2010s, “data-driven” became the new management buzzword meaning “let’s stop making decisions based on hunches, and let’s start using all the data we have sitting around somewhere”. It all started some ten years earlier in … professional baseball.
Anecdote time. I didn’t become an Oakland Athletics fan because of “Moneyball” (which didn’t exist yet). I started going to games when they offered one-dollar bleacher tickets and dollar hot dogs to fill seats at the mostly empty Coliseum, at a time when the new San Francisco Giants ballpark across the Bay was humming.
By sheer luck, my first game in 2000 was a comeback win against the then division-leading Seattle Mariners, after which the A’s went on a tear to take the AL West for the first time in seemingly forever. Needless to say, I was hooked.
I quickly joined the fan forums, where traditional A’s fans were fighting a rearguard action against encroaching quant types emboldened by Billy Beane’s embrace of “statistical analysis” (sabermetrics) for player selection.
At that time, Silicon Valley was still dominated not by the dot-commers but by old-school engineers from Intel or Hewlett-Packard, who knew their statistics from Six Sigma quality assurance.
The Silicon Valley engineers slowly displaced the traditional fans by means of technical superiority: the typical chatter about player trades was no longer “Hey, I like his tools” but arcane sabermetric comparisons.
In this game of one-upmanship, the traditional fans were soon outnumbered, and eventually proven wrong by the A’s astounding run of regular-season successes (and postseason collapses) that came to define the Moneyball era.
Fun side note: for all the bluster with which early sabermetric scores were wielded in these battles, they were almost universally badly constructed.
At that point, this was really just the replacement of one belief system by another, which claimed to be on the side of “science” but merely put its faith in a different set of counterfactuals.
The A’s spectacular string of postseason collapses meant that this new belief system didn’t get its ultimate validation until, a few time zones farther east, sabermetrics disciple Theo Epstein took over first the Boston Red Sox and then the Chicago Cubs and helped break two of baseball’s most storied curses.
At about the same time, a Chicago-trained statistician, quant, and poker player named Nate Silver developed a player-projection system called PECOTA for the baseball analysis site Baseball Prospectus. Nate later branched out into politics with a website called 538.
Even though he doesn’t like the label, Nate Silver did a lot to popularize both the term and the profession of “data scientist”, loosely described as “if you sit on a huge pile of data, sifting through it mostly aimlessly will eventually produce some insight”.
Over time, data science branched out not only into politics but into all kinds of other fields that didn’t seem amenable to statistical analysis. Today, even top European football clubs run sizeable quant divisions.
The business-school obsession with KPIs, which followed two decades of de-emphasizing management science as a driver of business success, dates from around the same time. Education is all quantified now, too.
But the lessons from those early fan forums still apply: much of “data science” is about not getting pushed into decision paralysis by a massive amount of data; many attempts to quantify selection processes are undermined by poorly constructed metrics; and…
Building decision tools on “data, not theory” works well as long as the conditions underlying your implicit model hold, but it puts you at risk of getting blindsided when circumstances change, such as…
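The collapse of the Hotelling consensus in American politics, the long-standing assumption that two parties competing for votes will converge on the political center. History does not repeat, but it rhymes.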