Eye-to-Eye Video

October 16, 2018

Image credit: miguelb [CC BY 2.0], via Flickr

Our computers, smartphones, and tablets all have built-in, user-facing video cameras. Video chats and video conferences—whether one-to-one, one-to-many, or many-to-many—have become commonplace both in business and among the general public, and I think most of us would say that’s a big improvement over audio-only communication. There’s something about seeing another person’s face that makes communication much richer and more satisfying. But there’s one thing about the current state of the art in video communication that still bothers me greatly: the inability of the participants to make eye contact with each other. This was never a problem on Star Trek, which was of course the source of all my technological expectations.

Look at Me When You Say That

If you have ever participated in a video chat, you undoubtedly know what I mean. The camera that’s pointing at your face is positioned above (or, occasionally, below or to the side of) your display. This means the angle at which you’re viewing the screen is different from the angle at which the camera (and therefore the person on the other end) sees you—an effect known as parallax. Only if you were looking directly into the camera would the viewer have the impression you’re looking into their eyes. As a result, while you see your friend’s image on the screen, your friend appears to be looking down (or in some direction other than right at you), and you appear the same way on your friend’s screen. You could, of course, position the camera directly in front of your own screen, but then, the camera itself would block your view of the person on the other end.
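To put a rough number on that mismatch, here is a minimal sketch in Python. It assumes a simplified geometry in which the camera sits a fixed distance above the point on the screen you're looking at, and you sit directly in front of that point. The function name and the sample measurements (a 10 cm offset, a 50 cm viewing distance) are hypothetical values chosen for illustration, not figures from any particular device.

```python
import math


def gaze_offset_degrees(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle between your line of sight to the screen and the line to the camera.

    Assumes the viewer sits straight in front of the point being looked at,
    and the camera is displaced from that point by camera_offset_cm.
    """
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))


# Hypothetical setup: camera 10 cm above the on-screen face, viewer 50 cm away.
print(round(gaze_offset_degrees(10, 50), 1))  # 11.3
```

Under these assumptions, your eyes appear to be aimed about 11 degrees below the camera. Sitting farther back or shrinking the offset reduces the angle, but it reaches zero only when the camera and the on-screen face occupy the same spot.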

Eye contact is extremely important for meaningful communication, and after all, seeing the person you’re talking to is the whole point of using video instead of just audio. But if you can’t look that person in the eye, this eliminates much of the advantage of video over a regular phone call. Guides to effective business videoconferencing usually say you should look at the camera when speaking, to give the people on the other side the sense that you’re speaking directly to them. But this is unnatural, and prevents you from seeing their reactions as you speak. What we really need is exactly what they have on Federation starships: video displays that also somehow function as cameras, such that wherever you direct your gaze on the screen, that’s where your eyes will appear to be looking on the other end. Sure enough, engineers are trying to achieve this effect right now, working from several different angles (as it were).

It’s All Done with Mirrors

One fairly easy way to get eye-to-eye contact over a video link is to use technology borrowed from the television industry: the teleprompter. If you watch a news broadcast on TV, for example, you’ll notice that the announcer is looking directly at the camera. TV news anchors don’t memorize their reports in advance; they read them from a special video screen that appears to be directly in front of the camera. In reality, the screen (an ordinary flat panel, in most cases) is positioned face-up just below and in front of the camera lens, with its text displayed as a mirror image. Above this display, and thus in front of the lens, is a partially silvered (or two-way) mirror positioned at a 45° angle. The announcer sees the text reflected onto it from below, while the camera sees only the announcer.

Teleprompters are a simple and tested technology; they’ve been around for more than 60 years. (When similar designs are used for video communication, they’re sometimes referred to as video tunnels.) They do, however, have some problems. One issue is size: the equipment is by nature quite bulky, because it requires that angled mirror in front of the camera as well as special shielding to protect the camera from glare. Teleprompters also tend to be heavy, fragile, and expensive—all factors that make them unattractive for ordinary consumers.

I do own a device called ProPrompter Desktop, which cost “only” about $500 and can sit atop my computer (desktop or laptop) or tablet. It’s basically a miniature teleprompter, and you can opt to position the video in such a way that the other person’s image (rather than text you’re reading) appears on the mirror directly in front of the camera. It’s a bit clunky, but functional, and it’s useful when I’m giving remote video presentations to large groups or recording scripted videos.

With or without a teleprompter-like device, there’s yet another problem that comes into play when more than two people are involved in a video conversation. If I look directly into a camera, all the people who see me on the screen will perceive that I’m making eye contact, even if they’re in different locations. So the participants will not have the impression that my gaze shifts as I turn my attention from one person to another, nor can I tell who is looking at me (or my image) at any given time. A system called GAZE-2, developed at Queen’s University in Kingston, Ontario, attempted to address this by using multiple cameras in a video tunnel, along with an extra camera that detected where the user’s eyes were looking; software then switched to the camera nearest the user’s direction of gaze and rotated the image on the other end to match.
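The camera-switching idea can be sketched in a few lines of Python. This is only an illustration of picking the camera closest to a measured gaze direction, not GAZE-2's actual software. The function name, the representation of gaze as a single horizontal angle, and the three camera positions are all assumptions made for the example.

```python
from typing import Sequence


def nearest_camera(gaze_angle_deg: float, camera_angles_deg: Sequence[float]) -> int:
    """Return the index of the camera whose angle is closest to the gaze angle."""
    return min(
        range(len(camera_angles_deg)),
        key=lambda i: abs(camera_angles_deg[i] - gaze_angle_deg),
    )


# Hypothetical layout: three cameras at -20, 0, and +20 degrees from center.
cameras = [-20.0, 0.0, 20.0]

# The eye tracker (assumed) reports the user looking 14 degrees to the right.
print(nearest_camera(14.0, cameras))  # 2
```

A real system would also need to smooth the eye-tracker readings so the picture doesn't flicker between cameras whenever the user's gaze hovers near a boundary.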

Just Like Being There

Another proposed solution to the problem of gaze direction, developed by researchers at Keio University in Tokyo in 1996, was called MAJIC (multi-attendant joint interface for collaboration). This system replaced the two-way glass mirror of the teleprompter with a large, curved screen made of a thin, perforated material that provided a reflective surface on one side and was mostly transparent from the other. Cameras behind the screen recorded the participants in one location, while ordinary video projectors displayed the images of other participants (in one or more locations) on the front of the screen. What was unique about MAJIC was that behind each person’s image on the screen in each location was a separate camera that functioned as that person’s virtual eyes for that location (along with a speaker to reproduce the person’s voice). The result was that each person always appeared to be looking at whichever participant they were facing at the moment, and they could even tell when one participant was looking at another. An additional bonus: the life-size projections made it feel as though participants were really sitting across a table from each other. Unfortunately, as far as I can tell, that design was never commercialized, which is perhaps unsurprising given the bulk and cost of the equipment involved.

Ten years later, a much more compact variation on that theme appeared. In January 2006, Apple was awarded a patent for an eye-to-eye video system in which a large array of microscopic (and thus, effectively invisible) cameras is embedded in a monitor along with the display elements; software combines all these thousands or millions of images into a single picture. That should enable a similar effect to what MAJIC offered. Time will tell if, when, or in what form this technology becomes available to consumers.

A different and likely more promising approach, called gaze correction, is being studied by researchers at major companies such as HP, Microsoft, and AT&T, among others. It starts with one or two ordinary video cameras mounted near a conventional computer display. A special video processor digitally modifies the image of each person’s face in real time so that it appears that his or her eyes are looking straight at the camera, even though they’re not. Early demonstrations of these systems appear relatively convincing (maybe even a bit spooky), but they are not yet ready for commercial use. They also have not yet been adapted to work well with multiple participants in a single location, or to permit selective eye contact with just one of several remote participants.

It’s great that progress is being made, but given the massive amount of processing power in today’s computing devices, I’m surprised and disappointed to see that no software-based gaze-correction tools are publicly available. Frustratingly, such a tool was available for a while—a Windows app called CatchEye, which worked with Skype, Google Hangouts, Facebook Messenger, and other products. The app was pulled from the market in 2017 with no explanation. I’d like to think that means the developer was acquired by a big company like Apple or Microsoft that’s working hard to bring this capability to the masses, but that may be overly optimistic. If only the technology giants and I could…see eye to eye.

Note: This is an updated version of an article that originally appeared on Interesting Thing of the Day on July 23, 2004.