Facebook (FB) users spend over 100 million hours a day gobbling up video on the social network. But despite all that content flowing through — and the technology and ingenuity powering it — Facebook still hasn’t figured out how to wrap its algorithmic prowess around video the way it already does with photos, using facial recognition, for instance, to identify you and your friends.
The reason: sheer complexity. A photo is a single static image, but a video is essentially a long sequence of images played in a particular order to show a narrative in motion: a Siamese kitten purring, or a professor interrupted in the middle of a BBC interview by his two young kids.
Using artificial intelligence to scan and analyze a video on the fly — “video understanding,” as it’s called — is a multi-year challenge that Facebook argues could transform the social network experience for the better.
“We think video understanding is going to be ridiculously impactful, because if you go back in time and you think about the News Feed — even before photos were that prevalent — it was mostly text, and so that was the content you needed to understand in order to rank [people’s feeds],” Joaquin Candela, Facebook’s Director of Applied Machine Learning, told Yahoo Finance.
“We’re at a point now where we’re pretty good at understanding photos, but now there’s video,” Candela added. “You even have live video, and the question becomes, well, how fast can you figure out what’s going on in this video?”
If anyone at the social network can tackle that challenge, it’s Candela, who leads Facebook’s Applied Machine Learning group (AML). The group’s mission? Take the heady ideas and theories generated by the neighboring Facebook Artificial Intelligence Research group (FAIR) and turn those ideas into reality.
Already, the FAIR and AML groups’ algorithms are capable of identifying certain elements in a video — objects like a house, a pizza box or a pet — but they remain light years away from fully deciphering and tracking the most important aspect: people’s behavior.
“The majority of the videos that come to Facebook are people-centric,” explained Manohar Paluri, computer vision lead at the AML group. “And if we don’t understand what people are doing, we will never understand what the video is about.”
Indeed, the context of a video is every bit as important as quickly figuring out who is in the video. Is this Facebook user attending a rally? Giving a speech? Playing squash?
Once its algorithms can do that, Facebook contends, there will be numerous practical applications for its users. Although Facebook does not disclose how much Live video users shoot on any given day, the social network says people are 10 times more likely to comment on Facebook Live videos than on regular videos.
But how much more likely would you be to check out a video if you received a notification because a friend of yours was being filmed? Not only that, but what if the notification told you exactly what your friend was doing in that moment — like, say, running on Zuma Beach in Malibu, Calif., or chowing down on sashimi at Nobu in New York City?
That kind of hypothetical is what Facebook hopes to offer its users in the next three to five years. And while the level of granularity may sound disconcerting to some — an algorithm smart enough to understand exactly what you’re doing — remember that just five years ago, people were up in arms over Facebook using facial recognition to identify people in photos. Now, many people take facial recognition within Facebook for granted.
For Facebook, the payoff for nailing a feature like video understanding is increased user engagement. Although the social network has grown into the third most-trafficked website in the world, it is always developing new ways to keep users inside Facebook or its stable of products and services, whether through subtle speed improvements or new features like Facebook Messenger Day, a Snapchat Stories-like offering.
As time marches on, people become accustomed to, even dependent upon, many of those technological improvements. Facebook is hoping the same will prove true for video understanding.