Pros and cons of the Google book deal

01 May 2009

The Google Book Search settlement is huge, complex and overall a big step forward. But it’s also quite scary. The world of print is about to change, mainly for the better.

The settlement came about because Google was scanning books willy-nilly, bless its informational heart. Google indexed the entire book, which meant that you could search in it—or across entire libraries—to find the gobbet of information you needed. Sometimes, it’s true, that was enough to satisfy your desire. But often it meant that you’d been given a strong incentive to buy the book so you could read all that it had to say.

But the Authors Guild and the Association of American Publishers didn’t see it that way. They sued. At first Google said it was going to fight the suit on Fair Use grounds. After all, Google had constrained people’s access to the scanned books precisely to stay within Fair Use: readers could only see a few pages before the site clammed up. But then Google stopped fighting and agreed to a settlement. That settlement is now in federal court, awaiting approval by the same judge who oversaw the Bernie Madoff court case. There’s no particular reason to think it won’t go through, although many people are objecting to various parts of it.

We’ll get to the objections and why it’s a scary deal. First, the goodness.

The publishers are likely to make submitting their books for indexing a regular part of publishing. That means that we’ll be able to search them via Google, see a preview and press a button to buy a copy. Books that are out of copyright will be fully readable and downloadable for free, as is only proper.

The most significant part of the settlement has to do with books that are still under copyright but are out of print. Unless the rights owner opts out, they will be available for preview and purchase. And, most interestingly, if a book is in copyright and out of print and the rights holder can’t be found—say it’s a book published in 1930, and the author is long dead—users will still be able to purchase a digital copy. The money the settlement generates will go to support the Book Rights Registry being established to track books’ rights, and the leftover money will be split between Google and the authors and publishers. So, the treasure trove of books that have been held shut by the hands of dead rights holders (because the length of copyright has been extended absurdly) will now become available.

There’s more goodness. The entire corpus of scanned work will be available to researchers who want to run algorithmic analyses. Want to track the development of the word "cool" throughout the 20th century? Want to see how frequently the word "knowledge" is used in conjunction with "expert" over the course of time? Now you’ll be able to.
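To make that concrete, here is a rough sketch of the kind of analysis a researcher might run. The settlement does not describe any particular interface to the research corpus, so everything here is an assumption for illustration: the tiny sample corpus, the function names, and the choice to count by decade and by sentence are all made up.

```python
import re
from collections import Counter

# Hypothetical stand-in for a scanned corpus: (publication year, text) pairs.
# A real research corpus would be millions of volumes, not four sentences.
SAMPLE_CORPUS = [
    (1925, "The evening was cool and the river was calm."),
    (1958, "That record is cool, man. Real cool."),
    (1958, "Expert knowledge is rare. Knowledge takes time."),
    (1994, "An expert shares knowledge freely. Cool idea."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_decade(corpus, word):
    """Return {decade: occurrences of `word` per 1,000 tokens}."""
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        decade = year // 10 * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(word)
    return {d: 1000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}


def cooccurrence_by_decade(corpus, word_a, word_b):
    """Return {decade: number of sentences containing both words}."""
    counts = Counter()
    for year, text in corpus:
        decade = year // 10 * 10
        for sentence in re.split(r"[.!?]+", text):
            tokens = set(tokenize(sentence))
            if word_a in tokens and word_b in tokens:
                counts[decade] += 1
    return dict(counts)


cool_trend = frequency_by_decade(SAMPLE_CORPUS, "cool")
expert_knowledge = cooccurrence_by_decade(SAMPLE_CORPUS, "knowledge", "expert")
```

Normalizing by the total number of words per decade matters, because far more books were published in some decades than in others. Raw counts would mostly track the size of the library, not the fortunes of the word.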

But, for all this joy, there are big, worrisome issues, mainly because this is a settlement between Google, authors and publishers. Can you think of people whose interests are not directly represented in this agreement, hmm? Readers, perhaps? Scholars? Educators? Libraries?

Among the objections I’ve heard raised, three strike me as especially trenchant:

First, this establishes a de facto monopoly in the United States on scanning and indexing books and on providing online access to them. Google is about to become our national library. There’s even a "most favored nation" clause in the settlement that says that if the publishers do a deal with another company (say, if Yahoo wants to scan books), Google has to be offered the deal on the same terms. That clause ought to go because competition would be better than monopoly. And, if Google is going to be granted this monopoly, we should at least have more guarantees that the service will be open to other scanned collections, and will support open standards across the board.

Second, the settlement should clearly maintain at least the old standards of Fair Use. We don’t want to end up with even less ability to reuse our culture than we had before. The existing settlement is a lost opportunity to clarify and expand Fair Use.

Third, institutions will be charged for accessing the new digital library. It would be good to have more clarity and more safeguards around how this monopoly is going to set the price.

There are plenty of other objections. For example, it’d be helpful if the agreement explicitly acknowledged the presence of Creative Commons licenses in order to encourage their use ... a clear case of interests not represented by any of those doing the deal. And, since the people who will make money from orphaned works are precisely not the people who created or published the works (by definition, those people cannot be found), I like the idea of shunting some of that money into a fund to help increase open access to all the works of our culture. The settlement does talk about donating unclaimed funds to non-profits "that advance literacy, freedom of expression, and/or education," but it makes no commitment to the amounts and does not directly name open access. Funding open access should be a requirement, not an afterthought.

The settlement is not what you would come up with if you began with a blank piece of paper and designed the optimal system for all the interested parties. But it is a big step forward ... especially if we can get some changes to address the needs of the class of people we call "readers."