The Data Debate

So, there’s a bit of a debate amongst language geeks and professionals about the word “data.”

You see, one of the ways to split up nouns is between singular nouns (which have separate singular and plural forms), and collective nouns (which don’t). “Cat” is a singular noun. It’s possible to have one cat, or many cats. The unit of cats is a cat. “Milk” is a collective noun; you can’t have one milk or many milks, you just have less milk or more milk. The unit of milk is hard to define, but I suppose it would be one each of the molecules that make up milk? Or one of the least common molecule, and however many of the others you need to get the right ratio?

“Data” used to be the plural of a singular noun, “datum.” A datum is the information contained in a single point on a graph or a single cell in a table. It’s a clearly defined unit, and when you have a bunch of them, that’s data. You can have one datum, or many data; the unit of data is a datum.

Except… then computers happened. Now data is a thing your hard drive is full of. You don’t have one data or many data, you have less data or more data. The unit of data isn’t a datum, it’s a bit, or possibly a byte depending on how you look at it.

Why does this matter? Well, some of us write for a living, and we might end up having to write about data. Grammar is important, not only for clarity of communication, but also as a matter of professional pride and a measure of quality. We don’t want our bosses or coworkers telling us we made a mistake on something as simple as subject-verb agreement, and the possible collectivity of data creates an issue there. If “data” is the plural of “datum,” then “The data are reliable,” is correct grammar. But if “data” is a collective noun, then “The data is reliable,” is correct.

Cue years of debate.

I have, generally speaking, come down heavily on the collective noun side of the debate. I think data, in the modern world, behaves more like a fluid than a collection of solid objects (because the Internet isn’t a truck, it’s a series of tubes–that’s what he was trying to say!).

But I recently started a new job, and a lot of the writing there involves working with statisticians and statistical tables, and most of what I’ve seen consistently treats “data” as the plural of “datum.” It bugged me at first, and I chalked it up to the typical lag of government standards behind the times.

But then I thought about it, and I realized that my milk example is incomplete. It’s not true that no one talks about “one milk” or “five milks.” In a restaurant, “one milk” is a glass of milk. It’s meaningful and sensible, in that context, to treat milk as a singular noun, to say “The milks are ready.”

And in the context of making statistical tables, well, isn’t that exactly what a datum is? One cell of a table? So wouldn’t the contents of many cells be many data? It sounds weird to me because I’m not used to the context, but that’s my problem, not theirs. So rather than try to force this community I’ve just entered to adapt to my ways, maybe I should try to see the sense behind theirs.

I dunno, just felt like something that ended up having a wider applicability than I expected.


7 thoughts on “The Data Debate”

  1. chris the cynic April 10, 2013 at 12:42 pm

    I think you oversimplify a bit. It is possible to have something that divides between singular and collective rather than singular and plural. I’d argue that data is the collective side of a singular-collective noun.

    It is also the case that some collective nouns are treated as singular, and that collective nouns can have plural forms. For example, milk: “We make 7 milks here at the factory: skim, two-percent, whole, chocolate, strawberry, blueberry, banana.” That’s a collective noun with a plural form, because it became necessary to distinguish between different types of collectives.

    Which is also how we end up with plural-plural (peoples) and a similar thing leads to singular-plural (persons) resulting in a strange framework of person–>persons–>people–>peoples.

    Now, whether one is discussing “people” or “a people,” there is a unit: an individual person, as we expect from a plural, even though the second is singular in form (note the article) and thus verbed as a collective-singular.

    Data doesn’t work that way. Unlike milk, it can’t be pluralized (no such thing as “datas”) and it can’t take an indefinite article (no such thing as “a data,” unless you mean Star Trek, but then you didn’t capitalize properly). On the other hand, a definite article (“the data”) works just fine, which gives it an apparent singular form. Instead, you have to make it into an adjective for plurals (data sets) and use “of” to make things partitive for the singular (a piece of data), but there is a word for “a piece of data,” and that would be datum. So to an extent it makes sense to see data as the collective-plural of datum. After all, once you’ve got many individual datums (not a word; I couldn’t think of a good way to phrase it using real words), what you have is data.

    I think “group” is an example of a noun like milk. You can have “a group” just like you can have “a milk” (camel is a milk I like much better than goat/camel is a much better milk than goat’s), you can have “the group” just like you can have “the milk” (“get me the milk”), you can have groups just like you can have milks (I already did this one), and the governing verbs are the same. The difference, though, is that “my group” has discrete units whereas “my milk” does not. The units of a group are the individuals within it. Or members, if you prefer. “My group has three members” defines the size of the group (myself, a dragon, and a unicorn) in a way that “My milk has three [constituent parts]” cannot. Just try it: “My milk has three pints” is clearly wrong and painful even to write. Milk does not divide that way even though in every other respect it is identical in usage to group.

    Random other things, just quickly said because I have to leave:

    Formerly plural now usually singular:
    Agenda.

    Agendum, the singular form, would be “Thing to be done.” Thus agenda means things to be done.

  2. froborr April 10, 2013 at 2:55 pm

    The whole point of collective nouns is that the concept of number doesn’t apply; you can have an amount of a collective noun but not a quantity. Put another way, for a singular/plural noun the question “How many?” has meaning; for a collective noun the correct question is “How much?” As such, a “singular-collective noun” is a contradiction in terms.

    I don’t see much distinction between using “a milk” as shorthand for “a type of milk” and “a milk” as shorthand for “a glass of milk.” In both cases, it’s using the object of a preposition to imply both the preposition and the singular noun modified by that preposition, with the whole construct treated as a singular noun that takes a regular plural. In the case of milk, the object happens to be a collective noun normally; it’s not that it’s a collective noun with a plural form, simply that it’s being used in a construction that elides the singular noun and therefore the plural has to be applied to a normally collective noun.

    Person/people, on the other hand, is a plain old-fashioned irregular plural. There’s nothing collective about “people”; as you note, the word has a singular that indicates a single unit, “person.” The reason we have the construction “a people” is because it’s the same elision as above, a shorthand for “a group of people.” Thus, we end up applying the regular plural to what is already an irregular plural, resulting in “peoples.” (“Persons” is just a result of lawyers and bureaucrats erroneously thinking a regular plural is somehow more “official” than an irregular one, and then doing it long enough that it becomes the consensus.)

    There’s no reason that same construction can’t be used for data. If, for example, you worked for an organization that sold files of raw data and files of calculated statistics, you might well say “Today we sent out five datas and three statistics.” In such a context you could have a [file of] data just as easily as you can have a [group of] people or a [type of] milk.

    The existence of a definite article implies nothing about number: This is the hamster. These are the hamsters. That is the stuff.

    “Group” is a singular noun that takes a regular plural, “groups.” Grammatically speaking, it’s a discrete entity, not a collective like milk, which is why you can have one group or many groups without needing to introduce a preposition (stated or implied).

    As for “agendum”/“agenda”: I pretty much agree. I think “agendum” is more or less officially (insofar as anything regarding the English language can be regarded as official) an archaic usage, with agenda now a singular (meaning roughly “to-do list”) that takes the regular plural agendas. I imagine this annoys pedants to no end, which makes me very happy.

  3. chris the cynic April 10, 2013 at 3:22 pm

    The existence of a definite article implies nothing about number

    Yeah, I’m not sure why I wrote that. I meant to delete it; the only reason I didn’t is that I thought I already had. No matter how many times I read over something, it seems like I should have given it one last check.

  4. froborr April 10, 2013 at 3:26 pm

    No worries. Nobody can edit themselves all that well; a second set of eyes will always catch things the first missed.

  5. chris the cynic April 10, 2013 at 9:39 pm

    I’ve written too many versions of this but it never comes out quite right.

    First off, your usage of the term “collective noun,” which I was originally fine going along with as a colloquial form, is really starting to grate on me. Not you using it, but me using it when I want to respond to you. (Thus I won’t be using it.) A collective noun is a noun that denotes a group but is singular in form (a flock of birds, a gaggle of geese, a murder of crows, a set of action figures). What you’ve been talking about is an uncountable noun (water, air, earth, fire, beryllium).

    Second, things don’t divide up nearly as nicely as I would like. The dictionary example given for an uncountable noun is “information,” which is only usually uncountable. As for milk, the dictionary naturally points out the same thing you did: it can be used informally to designate servings.

    Your test for whether a noun is countable or not has some problems. Sand is going to be my example. If you ask, “How much?” I can answer in all the same ways I might with milk. If you ask, “How many?” I can count up grains. Probably more easily than I can count cats, because cats don’t stay in the “I already counted you” pile. But the point is that for every “How many?” I can ask of cats, I can do the same of sand. Does “sand” have its feet in both worlds?

    But more than that, the difference between an uncountable noun and a countable one is how many categories things are divided into. Once upon a time English (not Modern English) had a dual so we divided things into four categories:
    –None, One, Two, More than that
    Now we commonly divide into three:
    –None, One, More than that
    Uncountable nouns divide things into two:
    –None, More than that.

    If a countable noun loses its singular form, as has largely happened with datum, then there are two possibilities. One would be to simply label it defective but keep on using the plural as if there were a singular attached. The other is to notice that you just went from None-One-More to None-More, and there’s an entire category of nouns that are set up on a None-More basis, adjust verb use appropriately, and transform it into an uncountable noun.

    That’s what you argued happened with data to make it not-the-plural-of-datum.

    I agree. But there’s another side to this transformation. A countable noun losing its singular sends it off to become uncountable (or defective); an uncountable noun gaining a singular should have the reverse effect.

    If “data” became uncountable, and then “datum” was reintroduced (say by increased contact with people who never stopped using it) then you’re left with the uncountable “data” and the singular “datum”, which are both definitely part of the same word, but rather than nil and countable or nil, singular, and plural you’ve got a noun with a singular form and an uncountable form but no plural form.

    This situation may be unstable. It could lead to the word splitting so that the singular has a plural based on it even though the original plural is still in use (i.e., datum = singular, datums = plural, data = uncountable); it could lead to data ceasing to be uncountable and becoming plural again. It could lead to datum being dropped again. It could lead to the end of the world. But until one of those things happens, you’ve got a singular-uncountable word.

    Or, for a different thing, I already brought up sand. Say someone got sick of having to say “grains of” all the time and invented a singular (“sund”?). Even assuming it’s adopted into the lexicon, that’s not going to make people say, “There are 16 sand on my plate,” because it’s not going to stop “sand” from being an uncountable noun. It’ll still be, “There’s sand on my plate.” You’d, again, have a singular and an uncountable with no plural.

    And leaving the hypothetical, that’s where I think data stands now. Datum is the singular, but the former plural, data, is no longer countable. So if you have more than one datum you have data, which sounds plural enough until you realize that you can’t say, “We’ve got 27 data here,” because data is uncountable, not plural.

  6. inquisitiveraven April 11, 2013 at 8:18 am

    Also, the correct verb form for a singular collective noun is not necessarily the singular form of the verb, at least in English. Outside the US, the plural form is typically used, e.g., “the team are.” Interestingly, the American use of the singular verb in this context goes back to the Civil War, although the distinction between “the United States is” vs. “the United States are” has more significance than just the choice of verb form. It’s a statement about the nature of the entity known as “the United States.” Before the Civil War, the United States = a collection of states united under a single government; after the Civil War, the United States = a single country with smaller administrative units called “states.”

  7. chris the cynic April 11, 2013 at 3:04 pm

    That’s interesting. What about indefinite cases? Not “the team” but just “a team”: would you say, “A team are coming to handle things,” or, for that matter, “one team”? Basically, does the adjective change the verb to singular, or does the verb stay plural?
