It all comes back to Star Trek. Whenever I have a complaint about the way technology works—or doesn’t—a little voice in the back of my head says, “They don’t worry about this on Star Trek.” One of the things they don’t, apparently, worry about aboard starships a few centuries hence is being understood by computers. Keyboards and mice, we are led to believe, are relics of the distant past, and voice recognition has been perfected. That’s a rosy and probably overly optimistic future, but one small aspect of the Star Trek computer interface is closer than many people realize. Have you ever noticed that when giving spoken commands to the onboard computer, starship crewmembers never worry about where the microphone is located? Somehow, the entire ship manages to listen to everything that’s spoken, and intelligently pick out particular voices—as well as determining what words should count as commands. While we’re not quite there yet, technology has taken a meaningful stride in that direction, thanks to devices called array microphones.
Over the past couple of decades, speech recognition software has improved dramatically. Where once you were restricted to a very limited vocabulary of commands—or dictation software that required you to pause after every word—now your computer, tablet, or phone can, with a fair degree of accuracy, recognize and even transcribe normal, continuous speech. But speech recognition is still a very imperfect undertaking for a variety of reasons. Not least of these is the way microphones work.
The Ears Have It
It’s tempting to think of microphones as being something like human ears, but ears have a couple of very important advantages over microphones. First, they generally come in directionally optimized pairs. And second, they make use of a sophisticated signal processor known as a brain. It’s actually the brain that does the bulk of what we consider listening—sorting out which sounds deserve to be focused on, distinguishing one speaker from another, and following a conversation even when the speaker is moving around. Conventional microphones lack any intelligence and so simply pick up whatever is around them—the sound of computer fans, telephone calls, nearby conversations, music, or traffic. They have no way to discriminate and give you just the particular sounds you want. In terms of speech recognition, this causes some serious problems for your computing device, which can’t figure out which sounds are meant to be understood as commands and which should be discarded as noise.
One way to “solve” these problems is to wear a headset microphone or a similar device that puts a highly directional pickup close to your mouth. At such a close range the gain (or input volume) doesn’t have to be very high, so extraneous noises are usually avoided. Headset mics (and even the latest crop of wireless earbuds) do provide pretty good results, but most people find it distracting to wear (and configure and charge) headgear just to talk to their gadgets.
Creating a Digital Ear
An array microphone does for microphones what your brain does for your ears. It gives them some intelligence. The general idea is that you take a set of microphones—which could be two, or eight, or thousands—and add a digital signal processor (DSP) chip with some sophisticated logic. The array microphone’s processor continuously figures out where the primary speaker is in the field of audio input it’s receiving, and selectively adjusts the output so that most of it is coming from the microphone that is getting the best signal. This amounts to “focusing” on a certain direction and distance so that the speaker’s voice, and little or no other noise, actually reaches the output. (Some array microphones are actually much more sophisticated than this, doing advanced noise cancellation and other tricks.)
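The simplest version of this “focusing” trick is a technique called delay-and-sum beamforming: because a voice reaches each microphone in the array at a slightly different time, shifting each channel to undo its arrival delay and then averaging makes the voice add up coherently while uncorrelated noise partially cancels. The sketch below illustrates the idea in NumPy under deliberately idealized assumptions (integer-sample delays that are already known, rather than estimated by the DSP as a real array microphone would do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 microphones in a line. The target "voice" reaches
# each mic with a known integer-sample delay, and each mic also picks up
# its own independent noise.
n_mics = 4
n_samples = 2000
delays = np.array([0, 3, 6, 9])          # arrival delay per mic, in samples
pad = delays.max()

voice = rng.standard_normal(n_samples)   # stand-in for the speech signal
mics = np.zeros((n_mics, n_samples + pad))
for m, d in enumerate(delays):
    mics[m, d:d + n_samples] += voice            # delayed copy of the voice
mics += 0.8 * rng.standard_normal(mics.shape)    # uncorrelated noise per mic

# Delay-and-sum: shift each channel back by its arrival delay, then average.
# The voice copies line up and add coherently; the noise does not.
aligned = np.stack([mics[m, d:d + n_samples] for m, d in enumerate(delays)])
beam = aligned.mean(axis=0)

def snr_db(estimate, reference):
    """Signal power relative to residual-error power, in decibels."""
    err = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

print(f"single mic SNR: {snr_db(mics[0, :n_samples], voice):5.2f} dB")
print(f"beamformed SNR: {snr_db(beam, voice):5.2f} dB")
```

Averaging N microphones this way improves the signal-to-noise ratio by roughly 10·log10(N) decibels—about 6 dB for four mics in this toy case. Real array microphones estimate the delays on the fly (that’s how they “find” the speaker) and layer adaptive noise cancellation on top, but the coherent-addition principle is the same.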
In principle, then, with an array microphone on the desk, a speaker can walk from side to side, or move backward or forward, and maintain the same level of accuracy in speech recognition as with a headset mic. But that’s only in principle. Just as ordinary microphones vary in quality and sophistication, so do array mics. Some work well only at close distances; some cancel out certain kinds of noise better than others. As with anything, you get what you pay for: there are array mics that cost less than US$25, and those that cost well over $3,000. But even the most expensive array mics are designed only to pick a single speaker out of a background of noise; the all-important Star Trek tricks, like telling one speaker from another and intelligently separating commands from conversation, are problems for software to solve, and we still have some distance to go there.
An Array of Uses
Array microphones are used in numerous applications besides computer speech recognition. They are sometimes used in recording audio for films and TV shows, where extraneous noises are a no-no. And they can be used to improve group conference calls, to amplify the voices of performers in a play, or to pick up the voices of lecturers (or students) in large classrooms. Some hearing aids use an array of miniature microphones to help wearers focus on a single speaker in a noisy environment. You may also find array microphones in cars, where they’re used for hands-free phones. Home automation enthusiasts have been known to use microphone arrays—sometimes separate arrays in each room—so they can speak a command from anywhere in the house (“Turn kitchen lights on,” “Activate force field”), although smart speakers (with or without their own array microphones) are more commonly used now than a centralized processing system. In any case, progress! A few centuries more, and we may have the other speech recognition issues ironed out for good. Then we can move on to that whole warp drive problem.
Note: This is an updated version of an article that originally appeared on Interesting Thing of the Day on June 13, 2003, and again in a slightly revised form on October 11, 2004.