David Weinberger
KMWorld Archive
This column is part of an archive of David Weinberger's columns for KMWorld. Used with permission. Thanks, KMWorld!

Data Is Never Just Data

September 4, 2021

Data seems like a simple answer to complex problems, for data itself is simple, even elemental. And data is what it is, no matter how much we’d like it to be different. These stubborn elemental marbles reveal patterns and enable predictions that we would be wise to heed.

The only part of that paragraph I actually agree with is its conclusion: We are wise to heed data and are fools to ignore it. Usually. But data is not itself facts that exist independent of us. Data is never just pure, simple data. Data is never just data.

For one thing, data is not a thing in the world independent of us. For example, the warmth of your body is not data. The 98.6 on a thermometer in your mouth is data.

Likewise, the tiny pits on a DVD are not data. Without a DVD player, they’re just pits.

Data is the readings on the instruments we chose to build and deploy because they’ll help us in a project we have deemed worthwhile. It is a tool.

Context matters

As with all tools, data has uses because of complex contexts that include other objects, physics, social norms, social institutions, and human intentions. A can opener is only a tool because we created cans, and cans make sense only because a class of stuff that humans desire is subject to spoiling if exposed to the atmosphere, and we have a metal industry and an economy that lets us seal things in cans quite cheaply. The instruments we create to measure things also only make sense in complex contexts. Without those instruments, data would be no more real than all the holes we could dig in the ground if only we had invented shovels.

Without a context, data is just a number. For example, the number 2.4534 is not data unless I give you a context. So, watch me turn 2.4534 into data: that happens to be the average number of times a day I lose my mobile phone. Boom, data! Like magic!

We should heed data, but because data is always part of complex contexts, it is dangerous to forget that data often simplifies the environments in which it serves.

What is significant

For example, as I write this in July, a local suburb of Boston gathers and publishes data about local coronavirus infections. What that data measures makes it a useful tool for some purposes, but an inadequate one for others. The decisions about what to measure and report tell us a lot about this upscale town.

First, the existence of this data confirms the obvious fact that the town thinks the virus is significant enough to warrant collecting and posting information about its progress. We generally don’t gather data on matters of no conceivable relevance, such as how many basketball bounces there were in Massachusetts this past week.

That the COVID-19 data is updated weekly tells us about the speed of the spread of infection; if there were a town page about local heart attacks, it would be unlikely to be updated every week, and if it were, we’d suspect something significant was happening with local hearts.