Be careful what you say around your Lay's. (Thinkstock)
That bag of chips you polished off this weekend holds more than empty calories.
It could very well be the object that allows someone to eavesdrop on your private conversations, according to new research from the Massachusetts Institute of Technology, Microsoft, and Adobe.
Engineers on the team developed an algorithm that can decipher speech from the teeny, tiny vibrations of common objects as recorded in video footage. During the study, the team was able to recover intelligible human speech by analyzing the slight vibrations of a bag of Utz Crab Chips, aluminum foil, a glass of water, and the leaves of a houseplant.
“Some research finds new ways to solve old problems, and some research finds new problems to solve,” computer science and electrical engineering graduate student Abe Davis, who is first author on the paper, told Yahoo Tech. “I see this work as falling more into that second category. Now that we can see how objects respond to sound outside of laboratory settings, it adds a new dimension to the way we image the world around us.”
Davis is the first to admit that this research could be used for espionage. But that doesn’t mean that just anybody with a camera phone can listen in on your conversations. The process of translating sound from video is complicated, and first requires that the frame rate of the video you’re recording — the number of frames you’re capturing per second — be higher than the frequency of the audio signal in question (ideally at least twice its highest frequency, per the standard sampling criterion). In this experiment, researchers used cameras that could capture between 2,000 and 6,000 frames per second. Though that’s well beyond the capability of most people’s phones, some commercial cameras can record up to 100,000 frames per second.
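To get a rough sense of why frame rate matters, here’s a back-of-the-envelope Python sketch (illustrative only, not from the paper): a vibration oscillating faster than half the frame rate “aliases,” showing up in the video at a misleadingly low frequency, which is why recovering speech took thousands of frames per second.

```python
# Hypothetical illustration: the frequency a vibration *appears* to
# have once it is sampled at a given frame rate. A sinusoid at
# tone_hz sampled at frame_rate_hz folds down to the nearest
# multiple of the frame rate.
def apparent_frequency(tone_hz, frame_rate_hz):
    """Frequency (Hz) a sampled sinusoid appears to have after aliasing."""
    return abs(tone_hz - frame_rate_hz * round(tone_hz / frame_rate_hz))

# A 440 Hz tone (concert A) filmed at ordinary video speed is garbled:
print(apparent_frequency(440, 60))    # 20  -- aliased beyond recognition
# The same tone filmed at a high-speed 6,000 fps survives intact:
print(apparent_frequency(440, 6000))  # 440
```

The numbers here are made up for illustration; the point is only that ordinary 60 fps video cannot directly represent audio-range frequencies.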
The video below gives a great idea of how this works.
In other trials, the team used ordinary digital cameras (like the kind you and I have on our phones) to infer specific details about a conversation, such as the gender of a speaker, how many people were speaking, and — in some cases — their identities. They did so by exploiting a quirk of most camera sensors known as the rolling shutter, which exposes each row of the image at a slightly different moment, allowing them to recover frequencies much higher than the standard 60 frames per second would otherwise permit.
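A quick bit of arithmetic (hypothetical numbers, not drawn from the paper) shows why a rolling shutter helps: if each row of the sensor is read out at a different instant, every row is effectively an extra audio sample, so the usable sample rate can far exceed the frame rate.

```python
# Back-of-the-envelope sketch: a rolling-shutter sensor that reads
# rows sequentially can, at best, deliver one sample per row rather
# than one sample per frame.
def effective_sample_rate(frames_per_second, rows_per_frame):
    """Upper bound on samples/sec a rolling shutter could provide."""
    return frames_per_second * rows_per_frame

# An illustrative 1080-row sensor at ordinary video speed:
print(effective_sample_rate(60, 1080))  # 64800 samples/s vs. 60 whole frames/s
```

In practice the gain is smaller, since rows overlap in exposure and there are gaps between frames, but it illustrates how an ordinary camera can pick up audio-range vibrations at all.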
The team developed two separate algorithms — one for each type of experiment — to automatically process the recorded video into its audio equivalent. The group, which includes MIT professors of computer science and engineering Frédo Durand and Bill Freeman, graduate student Neal Wadhwa, Michael Rubinstein of Microsoft Research, and Gautham Mysore of Adobe Research, will present the findings at this year’s SIGGRAPH, a computer graphics conference.
Does this mean we should worry that the minute movements of our snack food packaging will reveal our most private conversations?
“In its current state I really don’t think that our work is going to threaten most people’s privacy,” Davis said. “The high-speed version of the technique is a pretty expensive way to record sound, and it involves processing a ton of data. It’s possible to recover sound from video captured at regular frame rates, but the results we’ve managed to get from this kind of video so far are pretty murky.”
That being said, Davis does have one snippet of advice for the especially paranoid.
“I wouldn’t recommend wearing a tinfoil hat to avoid surveillance. Tinfoil makes an excellent visual microphone.”