Text-to-speech: Amazon caves

There have been a lot of interesting copyright controversies lately.  Given the economic realities of betting an entire business on a lawsuit with an uncertain outcome, though, the controversies almost never seem to reach a judicial determination of legality or illegality.

Unsurprisingly, it looks like the controversy over text-to-speech in the Kindle 2 is going to suffer the same fate.  The New York Times is reporting that Amazon is making changes to let authors decide whether they want to enable the text-to-speech feature on a title-by-title basis (h/t Slashdot):

Amazon maintains that the feature is legal and that it would in fact increase the market for audio books.

But it said, “We strongly believe many rights holders will be more comfortable with the text-to-speech feature if they are in the driver’s seat.”

Oh well.


Wil Wheaton is man enough to take on a disembodied electronic voice

First of all, Wil Wheaton is awesome.  I think I was one of the few people who didn’t greatly detest Wesley Crusher, so I had no great bias against him.  However, he won my heart as the Best Geek Ever in his Slashdot interview, when he related this tale:

Once, I was working on a movie in Kansas. We were driving from the set to the house where we were all staying, and it was close to a 40 minute drive. Now, 40 minutes in a city is nothing. But 40 minutes along a rural highway seems like an eternity. So we’re driving along, and I ask my friend if we’re there yet, and he says no, and I say, “Jesus. By the time we get there, the kid won’t even be dead anymore.” There is this pause in the car, and one of the other actors says, “Dude. Did you just quote your own movie?” I answered in the affirmative, and he says, “That was very cool.”

Anyway, Wil responded to Cory Doctorow’s comments about the Kindle:

But what if we’re all wrong? As an author, performer, and consumer of audiobooks, what does this mean for me?

To find out, I picked a short passage from Sunken Treasure and read it. Then, I took the identical passage, and let my computer read it. I recorded the whole thing and put together something I call “Wil Wheaton versus Text 2 Speech” so you can hear for yourself.

I haven’t downloaded the MP3, but from the comment thread, it sounds like people don’t exactly believe that the Authors Guild has much to worry about.

Authors Guild President on the “Kindle Swindle”

Continuing the back-and-forth over the new Kindle’s text-to-speech feature, Roy Blount Jr., president of the Authors Guild, wrote an op-ed in the New York Times.  Along with coming up with a snappy, derogatory catch phrase (is the “Kindle Swindle” an argument against unfairly exploiting authors, or the next dance craze?  you be the judge!), he tries to clarify some of the things that worry the Authors Guild about this technology:

True, you can already get software that will read aloud whatever is on your computer. But Kindle 2 is being sold specifically as a new, improved, multimedia version of books — every title is an e-book and an audio book rolled into one. And whereas e-books have yet to win mainstream enthusiasm, audio books are a billion-dollar market, and growing. Audio rights are not generally packaged with e-book rights. They are more valuable than e-book rights. Income from audio books helps not inconsiderably to keep authors, and publishers, afloat.

This is the real problem for the Authors Guild: text-to-speech may not be copyright infringement per se, but it may pose a threat to the market for an extremely important part of their portfolio nonetheless.  Being able to allege copyright infringement merely gives them a lever to try to protect the market for this property.

He continues:

You may be thinking that no automated read-aloud function can compete with the dulcet resonance of Jim Dale reading “Harry Potter” or of authors, ahem, reading themselves. But the voices of Kindle 2 are quite listenable. There’s even a male version and a female version. (A book by, say, Norman Mailer on Kindle 2 might do a brisk business among people wondering how his prose would sound in measured feminine tones.)

And that sort of technology is improving all the time. I.B.M. has patented a computerized voice that is said to be almost indistinguishable from human ones. This voice is programmed to include “ums,” “ers” and sighs, to cough for attention, even to “shhh” when interrupted. According to Andy Aaron, of I.B.M.’s Thomas J. Watson research group speech team: “These sounds can be incredibly subtle, even unnoticeable, but have a profound psychological effect. It can be extremely reassuring to have a more attentive-sounding voice.”

The Authors Guild might be overestimating this threat, though.  Sure, these voices get better all the time.  However, when interviewed about the new Kindle on the Daily Show, even Jeff Bezos admitted that the computerized voice is “a little freaky.”

Blount goes on to try to reassure people that the Authors Guild isn’t trying to go after blind people or parents reading to their children:

In fact, publishers, authors and American copyright laws have long provided for free audio availability to the blind and the guild is all for technologies that expand that availability. (The federation, though, points out that blind readers can’t independently use the Kindle 2’s visual, on-screen controls.) But that doesn’t mean Amazon should be able, without copyright-holders’ participation, to pass that service on to everyone.

Cory Doctorow isn’t convinced, though.  On Boing Boing, he writes:

Time and again, the Author’s Guild has shown itself to be the epitome of a venal special interest group, the kind of grasping, foolish posturers that make the public cynically assume that the profession it represents is a racket, not a trade. This is, after all, the same gang of weirdos who opposed the used book trade going online.

Doctorow posits that, even assuming that text-to-speech violates copyright, it would be hard to show that Amazon is liable, because it is simply making available the software and hardware capable of performing the infringement, much like Sony’s production of Betamax players.

If the Authors Guild cannot rely on a direct infringement theory, Doctorow may be right.  However, as discussed in my last post on the topic, given the fact that Amazon is transmitting the e-book to the Kindle and the possibility that the text-to-speech processing could be considered a rendering, the question of whether Amazon could be liable for infringing the right of public performance could be a difficult one.

Is Kindle’s text-to-speech a “public performance”?

Amazon recently released a new version of its Kindle e-book reader. Among the new features in the updated reader is the ability to perform a text-to-speech conversion that renders the book in audio form. The Authors Guild, however, was not amused:

Some publishers and agents expressed concern over a new, experimental feature that reads text aloud with a computer-generated voice.

“They don’t have the right to read a book out loud,” said Paul Aiken, executive director of the Authors Guild. “That’s an audio right, which is derivative under copyright law.”

Although no complaint appears to have been filed yet, the Authors Guild appears to be firing a warning shot across Amazon’s bow concerning this new feature.

David Post analyzed the issue a bit at the Volokh Conspiracy, and feels that Amazon has the upper hand here:

There’s no “audiobook” involved in the Kindle transaction. The copy that customers receive is just the (marked-up) text, in Kindle format – same as before. The sounds are generated on-the-fly when the user presses the right button — the sounds aren’t “fixed” anywhere, i.e. they’re not stored separately from the text itself. Therefore, no sound recording; therefore, no derivative work; therefore, no additional royalty revenue for the copyrightholder.

I think Post has a good point in stating that text-to-speech as implemented by the Kindle is probably not a derivative work.  Works covered by the copyright statute must be “fixed in a tangible medium of expression,” but the text-to-speech feature at best produces a transient audio representation of the full-text content.

However, I think that Post errs in assuming that just because there is no derivative work, there is no possible copyright infringement.  It’s possible that the text of the statute provides infringement theories that do not rely on the creation of a derivative work.
