What Does the American Prairie Have to Do with the Modern Data Stack?
Successful data analytics teams are emergent; they are rarely the result of conscious human design, reasoning, and planning. Instead, the best teams often emerge spontaneously, in response to the needs of internal customers and the tradeoffs of serving those customers efficiently over both the short run and the long run. They tend to start from modest beginnings and grow through virtuous cycles, rather than by mirroring the best practices of successful companies.
I recently re-listened to an EconTalk episode on the American prairie, a storied and romanticized ecological system that today is virtually extinct. The host, Russ Roberts, urged listeners to picture the prairie. There “would be an abundance of plants, long grasses of various colors, a palette of flowers, some trees. You'd see the original ‘amber waves of grain’ [and] you'd also see a variety of animals going about their daily ritual.”
Roberts then challenged listeners to recreate a prairie. So try the thought experiment yourself: with an unlimited budget, how might you construct a prairie, and what steps would you need to take to produce one? He then proposed a solution:
“Buy a plot of land where prairie likely thrived in the past.
Check the libraries. Look for old photos of the prairie. Obtain the most complete list available of all the plant and animal members of a prairie ecosystem.
Collect samples of all the relevant species, e.g., seeds of plants, male and female pairs of animals.
Clear the plot of land and plant your seeds, along with a few trees.
Release the animals into the plot of land.
Watch and wait.
Perhaps you added a few steps with more intervention, like fertilization or watering. But overall, you likely suggested some kind of approach that we will loosely call ‘Assemble.’ That is, the steps you listed were to clear out your workspace, get the component plants and animals, lay out the blueprint, follow the directions, and start assembling. You'd then piece together the various components of the prairie and hope that somehow a prairie emerges. The approach is quite reasonable. It seems intuitively correct. If you were assembling a car, house, or a toaster, it would probably work. All you'd have to do is to assemble the components of the desired system on a reasonably attractive plot of land, and eventually a prairie would emerge. It makes sense, right?”
The host raises this strategy only to expose its flaws: it fails. As he explains, a “prairie is something that grows. It has to start small. It has pieces that interact and build on each other. Once it's up and running, the prairie works as a complex system. It is dependent on the intricate interaction of all the components of the system. A prairie cannot be brought to life with one abracadabra, one wave of the magic wand. Ecologists have in fact experimented with trying to grow prairies. Early experimenters took the assemble approach. But they ran into complications. Urban weeds are one such complication. Relative to most prairie species, these noxious weeds are aggressive and fast-growing. Given a chance, these tough weeds will muscle out the more timid prairie species and prohibit them from thriving.”
Ultimately, ecologists learned a beautiful lesson in building a prairie: “not all of the essential ingredients to growing a prairie savannah are visible at the end.” Fire, absent from the finished prairie, turned out to be a necessary ingredient for a thriving one. “Fire triggers certain prairie seeds to sprout and eliminates many fire-intolerant urban plants. Without fire, there is no prairie.”
This made me think of my own world of data teams and data infrastructure. The modern data stack is incredibly powerful, and like the “amber waves of grain,” it’s beautiful. It powers companies moving at an unprecedented pace, free of many of the infrastructure burdens of the past. It gives teams access to countless data sources without needing to write ETL, manage servers, or grab a coffee while waiting for an answer. Aside from being beautiful, it’s also “seamless”: the pieces fit together, and there are many recipes, consultants, and best practices to help stand up this infrastructure.
In 2010, I joined Yammer to run their data organization. A former member of the team, Benn Stancil, recently wrote a retrospective on our team. It was “a data organization that looks a lot like those of 2022, but without the glossy sheen from ten years of IT consumerization, YC-backed data startups, and trendy thought leadership. In other words, we were a modern data team without the modern data stack… We ingested data from a handful of sources into a centralized warehouse; we transformed it in the warehouse using a DAG of SQL scripts; we pushed that data out through ad hoc analysis, BI reports, and a handful of department-specific tools… We didn’t recognize any of these patterns by their current brands — ELT, analytics engineering, data apps—but the shapes were the same. The experience of using it, however, was not.”
Reminiscing on the past, Stancil poses a question: “Nearly [every] part of the industry is breathtakingly easier, faster, more powerful, and more reliable than it was a few short years ago. Except us. Because there’s one nagging inconvenience in the comparison between today’s data teams and the one I was on in 2012: Yammer’s data team was as impactful as any that I’ve ever worked with. It was a key part of the product development process; its members were honorary members of the marketing and customer success leadership groups; it was respected, in-demand, and had a voice in the strategic direction of the company. And all this was done on top of technology that was, relative to what’s available today, fragile, narrow, expensive, and powered by now-archaic computing capacity. That’s the paradox we need to solve. Why has data technology advanced so much further than [the] value a data team provides?”
Stancil posits two hypotheses: either the modern data stack distracts us from the highest-value data by lowering the cost of accessing lower-value data, or “the industry’s talent [has] not caught up with the capacity of its tools, and we just need to be patient.” Either way, the explanation points to a shortcoming, whether distraction or skill, in the consumers of the modern data stack.
My answer, instead, is the prairie.
Just as a prairie cannot be assembled, impactful data teams are not assembled from data tools. Like fire on the prairie, data teams have a missing ingredient: data wins. And much like fire, data wins need tinder and kindling, which take the form of demand from operating teams for data insights. Generating these data wins (the fire) builds trust, and trust is the currency that creates more demand for insights and pays for the tooling that speeds up anomaly detection and delivers more data wins faster. This is the virtuous cycle that creates effective data teams.
Armed with powerful modern tools, we very often jump to universally applied solutions: perhaps an approach to measuring LTV or a best practice for designing a dimension table. With modern tools, this can be done absurdly quickly (that’s the promise of these tools, and also the feature that often sells them). In practice, however, these solutions often deliver trivial value. Knowing LTV is great when you can profitably buy ads that reach reliable populations, but that is rarely the case. Often, knowing something subtle about the data or the populations matters more than a basic estimate of LTV. Just as you can’t put a bison on a plot of land to make a prairie, you can’t simply import a data person to solve an LTV estimation problem, because an LTV estimate is rarely what you really need.
Building a team solely around the tools meant to make it successful ignores the urban weeds: data errors, whether a broken pipeline, a mistaken line of code, or a confused causal relationship. Surviving these mistakes, and remembering their sting, often makes data practitioners and their internal customers more resilient, like the prairie grasses and flowers.
One should not discount the value of modern tooling entirely; in fact, I work for Mozart Data, which offers a set of tools that equip teams with the modern data stack. Modern tooling can bring about success with fewer resources and at lower cost, in the same way that tractors and modern seeds could speed up replicating a prairie. But one cannot ignore the emergent nature of successful data teams or the essential ingredients that often remain hidden.
** An enormous amount of credit belongs to Russ Roberts and Pete Geddes (EconTalk, September 28, 2015) and Benn Stancil (“Should we be grateful for the modern data stack?”, November 25, 2022).