So, there’s a bit of a debate amongst language geeks and professionals about the word “data.”
You see, one of the ways to split up nouns is between singular nouns (which have separate singular and plural forms) and collective nouns (which don’t). Linguists would call these count nouns and mass nouns, but the idea is the same. “Cat” is a singular noun. It’s possible to have one cat, or many cats. The unit of cats is a cat. “Milk” is a collective noun; you can’t have one milk or many milks, you just have less milk or more milk. The unit of milk is hard to define, but I suppose it would be one each of the molecules that make up milk? Or one molecule of the least common ingredient, plus however many of the others you need to get the right ratio?
“Data” used to be the plural of a singular noun, “datum.” A datum is the information contained in a single point on a graph or a single cell in a table. It’s a clearly defined unit, and when you have a bunch of them, that’s data. You can have one datum, or many data; the unit of data is a datum.
Except… then computers happened. Now data is a thing your hard drive is full of. You don’t have one data or many data; you have less data or more data. The unit of data isn’t a datum; it’s a bit, or possibly a byte, depending on how you look at it.
Why does this matter? Well, some of us write for a living, and we might end up having to write about data. Grammar is important, not only for clarity of communication, but also as a matter of professional pride and a measure of quality. We don’t want our bosses or coworkers telling us we made a mistake on something as simple as subject-verb agreement, and the possible collectivity of data creates an issue there. If “data” is the plural of “datum,” then “The data are reliable,” is correct grammar. But if “data” is a collective noun, then “The data is reliable,” is correct.
Cue years of debate.
I have, generally speaking, come down heavily on the collective noun side of the debate. I think data, in the modern world, behaves more like a fluid than a collection of solid objects (because the Internet isn’t a truck, it’s a series of tubes; that’s what Senator Ted Stevens was trying to say!).
But I recently started a new job, and a lot of the writing there involves communicating with statisticians and working with statistical tables, and most of what I’ve seen consistently treats “data” as the plural of “datum.” It bugged me at first, and I chalked it up to the typical lag of government standards behind the times.
But then I thought about it, and I realized that my milk example is incomplete. It’s not true that no one talks about “one milk” or “five milks.” In a restaurant, “one milk” is a glass of milk. In that context, it’s meaningful and sensible to treat milk as a singular noun and to say “The milks are ready.”
And in the context of making statistical tables, well, isn’t that exactly what a datum is? One cell of a table? So wouldn’t the contents of many cells be many data? It sounds weird to me because I’m not used to the context, but that’s my problem, not theirs. So rather than trying to force the community I’ve just joined to adapt to my ways, maybe I should try to see the sense in theirs.
I dunno, this just felt like something that ended up having wider applicability than I expected.