In the late ’90s, I developed wrist problems. I had something called tenosynovitis, which is kind of like carpal tunnel syndrome except that there’s no surgery to fix it.
My salvation was speech-recognition software. It was crude in those days—“You. Had. To. Separate. Your. Words. Like. This.”—but it was enough to keep my writing career going.
Today, when people complain about the accuracy of their cellphones’ voice transcriptions, I smack my forehead. Do you have any idea what a miracle it is that our cellphones can come even close to understanding our speech?
Anyway, here’s where speech recognition stands today. There’s what we get on our phones, which is impressive but could stand improvement.
Then there’s the built-in speech recognition feature on Mac and Windows, which very few people seem to use (or even know about).
And then there’s the pinnacle: the sold-separately software from Nuance, like Dragon NaturallySpeaking for the PC, and Dragon Dictate for the Mac. (I was astounded to learn that only 5 percent of Nuance’s revenue comes from these consumer products; the rest comes from supplying speech-recognition services to companies like Apple, for the iPhone’s dictation feature.)
Dragon NaturallySpeaking, for Windows, is almost freakishly fast and accurate. It’s so good that in the latest versions, 11 and 12, you don’t even have to train it. (That’s when you read a few minutes’ worth of prepared text, so that the software can learn how you speak.) You open the box and start dictating.
The Mac version, which just reached version 4.0, is also freakishly fast and accurate; I can dictate many pages without spotting a single transcription mistake.
You can control your Mac, too: opening and closing windows, switching programs, operating menus, “pressing keys” by voice, and so on. I love creating “voice macros”: You set these up so that you say one thing, and it types another. When you answer a lot of repetitive email, voice macros are a godsend.
Unfortunately, the Mac version, Dragon Dictate, is not as mature or as polished as the Windows version.
Here’s an example: a feature called Full Text Control. It’ll take some explanation; stay with me here.
Suppose you’re typing (that is, speaking) along, and you want to change something you said two paragraphs earlier. In programs that offer Full Text Control, like Microsoft Word, you can say, “Select ‘four score and seven years ago.’ ” The Dragon software instantly highlights that phrase, several paragraphs back. That’s Full Text Control: The dictation software can “read” and jump around in your document just as easily as you can.
In a program that doesn’t offer Full Text Control, if you say, “Select ‘four score and seven years ago,’ ” the speech software visibly walks the insertion point back through all the text until it reaches that spot, as though you were pressing the left-arrow key over and over. It thinks of your text as a continuous river, and the only way it can get back upstream is to swim there.
If you now fix the error and say “Go back,” then the software walks the insertion point forward again to the point where you’d stopped. Unfortunately, if you’ve clicked elsewhere since you typed, forget it; you’ve interrupted the river, and the dictation software no longer has any idea where it is in your document.
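If it helps to see the difference spelled out, here is a toy Python sketch of the two behaviors. It is only an illustration of the "river" idea described above, not how Dragon actually works inside; the names `RiverDictation`, `click_elsewhere`, and `full_text_select` are made up for this example, and the toy ignores any edits you make between "Select" and "Go back."

```python
class RiverDictation:
    """Toy model (hypothetical, not Nuance's code) of dictation
    WITHOUT Full Text Control: the software cannot read the document,
    so it only remembers what it typed and where it thinks the cursor is."""

    def __init__(self):
        self.typed = ""        # everything dictated so far
        self.cursor = 0        # insertion point, as the software believes it
        self.resume_at = None  # where "Go back" should return to
        self.lost = False      # True once the user clicks somewhere else

    def _check(self):
        if self.lost:
            raise RuntimeError("position unknown: the user clicked elsewhere")

    def dictate(self, text):
        self._check()
        self.typed = self.typed[:self.cursor] + text + self.typed[self.cursor:]
        self.cursor += len(text)

    def select(self, phrase):
        """Walk left, one simulated arrow-key press per character.
        Returns the number of presses needed to reach the phrase."""
        self._check()
        start = self.typed.rfind(phrase, 0, self.cursor)
        if start == -1:
            raise ValueError("phrase not found in dictated text")
        self.resume_at = self.cursor
        presses = self.cursor - start
        self.cursor = start
        return presses

    def go_back(self):
        """Walk right again to the spot where dictation stopped."""
        self._check()
        if self.resume_at is None:
            return 0
        presses = self.resume_at - self.cursor
        self.cursor = self.resume_at
        self.resume_at = None
        return presses

    def click_elsewhere(self):
        # A mouse click moves the real cursor without telling the software.
        self.lost = True


def full_text_select(document, phrase):
    """WITH Full Text Control: read the document and jump straight
    to the phrase. Returns (start, end) offsets, or None if absent."""
    start = document.find(phrase)
    if start == -1:
        return None
    return (start, start + len(phrase))
```

In the river model, the cost of a selection grows with how far back the phrase is, and one stray click makes every later command fail. The Full Text Control version needs no memory of what was typed; it just looks at the document.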
In Windows, you get Full Text Control in all kinds of programs: Internet Explorer, Firefox, Outlook, Word, Notepad, WordPad, OpenOffice Writer, WordPerfect, and Excel.
On the Mac, the list is much shorter: Notepad, TextEdit, Microsoft Word 2011, and Pages 4.3. It’s not in any email programs, unless you happen to use Gmail on its website. Dragon Dictate 4 includes a plug-in that gives you Full Text Control there.
Transcription’s Holy Grail
There are many other examples where the Windows version offers more, or better, features than the Mac version. But here comes a surprise. At least for this moment in marketing time, the Mac version offers a huge new feature that the Windows version doesn’t have: It can transcribe the audio recordings of total strangers.
In other words, you can feed it an MP3 file of a speech, a college lecture, or even an interview, and Dragon Dictate will turn it into typed text.
For a lot of people, that’s the Holy Grail. Every time I review dictation software, about 50 people write to ask, “Does that mean it can make transcriptions of recorded interviews now?” Until now, the answer has always been no.
This feature, if it works, would be a big, big deal for TV editors and producers. College students. Reporters. Or anyone online trying to settle a point about what Politician X did or didn’t say.
If it works.
To set it up, you create a new “profile” (voice file), just as you would if you were setting up Dictate for use by a different family member.
Then you open the recorded audio file.
Dictate spends about a minute transcribing the first 60 seconds of the recording. It’ll be full of mistakes—because, of course, it doesn’t know this new speaker’s voice at all. And, of course, you can’t exactly ask the speaker to put on a headset and train the software, as you might yourself; it’s way too late for that.
So Dictate does something clever: It shows you the 60-second excerpt one phrase at a time and asks you to approve its accuracy. (A Play button lets you hear the original to help your judgment.)
Once you’ve corrected the 60 seconds, Dictate proceeds to transcribe the rest of the recording. In essence, you’ve done the standard training backward: Instead of learning to associate a new voice with a canned script of words, the software associates new words with an existing voice recording. Very clever.
I tried the recording-transcription feature on recordings of Steve Jobs’ famous commencement speech and President Obama’s inauguration speech. The good news: The transcription was extremely accurate.
The awful, absurd, heartbreaking news: The transcriptions have no punctuation, line breaks, sentence capitalization, or paragraph breaks. Each one comes out as one huge blob of undifferentiated text, like this:
“thank you I’m honored to be with you today for your commencement from one of the finest universities in the world truth be told I never graduated from college and this is the closest I’ve ever gotten to college graduation today I want to tell you three stories from my life that’s it no big deal just three stories the first story is about connecting the dots I dropped out of Reed College after the first six months but then stayed around as a drop in for another 18 months or so before I really quit so why’d I drop out it started before I was born my biological mother was a young unwed graduate student and she decided to put me up for adoption”
That format is useful if you want to search a long interview for a certain phrase. And cleaning it up—adding punctuation, capitals, and sentence breaks—is probably faster and easier than having to create the transcript yourself. But it’s still not quite there.
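For the searching case, a few lines of Python are enough. This is a minimal sketch, assuming you have saved the transcript as plain text; the helper names `normalize` and `find_phrase` are hypothetical, not part of Dragon Dictate. It ignores capitalization and punctuation in your query, since the transcript has almost none of either.

```python
import re


def normalize(text):
    """Lowercase the text, straighten curly apostrophes, and split it
    into bare words with all other punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower().replace("\u2019", "'"))


def find_phrase(transcript, query):
    """Return the index of the word where the query first begins,
    or -1 if the phrase never appears."""
    words = normalize(transcript)
    target = normalize(query)
    if not target:
        return -1
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i
    return -1
```

You would call it as `find_phrase(transcript_text, "Connecting the dots.")` and get back a word position you can use to jump to that part of the recording's text.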
And if you use this feature to transcribe an interview, only the primary person’s voice gets accurately transcribed. The software can learn only one voice per recording. So the interviewer’s voice will get some goofy transcriptions. Forget it for group discussions.
A few bugs
I had a bunch of problems getting going with Dictate 4.0. I spent the first day dismissing a lot of goofy, and incorrect, error messages (“You have two copies running,” for example).
Then I had trouble installing the web browser plug-in. And even once I did that, I had to answer a permission request every time I visited any website for the first time.
I finally contacted Nuance, which told me that there’s an undocumented way to turn that off. Email me if you need the instructions.
There are also all these annoying little limitations. For example, when you dictate into Microsoft Word, all the apostrophes and quotation marks appear as the ugly straight kind instead of curly, as they should be.
And in a web browser, incredibly, you can’t highlight the address bar by voice, to dictate or type a new web address. You have to use the mouse or the keyboard for that.
Make no mistake: Dragon Dictate 4.0 for the Mac ($180, with headset) is infinitely better than the speech recognition you get on your phone. It’s also much better than the built-in speech recognition software on the Mac or Windows (which offers no Full Text Control at all, no formatting commands like “Bold that,” no computer commands like “Open Photoshop,” and no way to create voice macros). If you can’t type, or don’t like to type, on your Mac, then Dragon Dictate is a great option.
But at the same time, there are all those bugs. Basically, Dragon Dictate 4.0 for the Mac gives you a Ferrari engine—but the chassis has a few loose bolts.