Google's generative AI can now analyze hours of video

Gemini, Google’s family of generative AI models, can now analyze longer documents, codebases, videos and audio recordings than before.

During a keynote at the Google I/O 2024 developer conference Tuesday, Google announced the private preview of a new version of Gemini 1.5 Pro, the company’s current flagship model, that can take in up to 2 million tokens. That’s double the previous maximum amount.

At 2 million tokens, the new version of Gemini 1.5 Pro supports the largest input of any commercially available model. The next-largest, Anthropic’s Claude 3, tops out at 1 million tokens.

In the AI field, “tokens” refer to subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.” Two million tokens is equivalent to around 1.4 million words, two hours of video or 22 hours of audio.
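As a rough back-of-the-envelope check, the equivalences above imply the following conversion rates. The numbers come straight from the article's own figures (2 million tokens ≈ 1.4 million words, 2 hours of video, 22 hours of audio), not from an actual tokenizer:

```python
# Rough token-budget arithmetic using the ratios quoted above.
# These are the article's figures, not exact tokenizer output.

CONTEXT_TOKENS = 2_000_000
WORDS_PER_TOKEN = 1_400_000 / 2_000_000  # ~0.7 words per token

def tokens_to_words(tokens: int) -> int:
    """Estimate word capacity from a token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def tokens_per_media_minute(total_tokens: int, hours: float) -> float:
    """Implied token cost per minute of media, if the media fills the context."""
    return total_tokens / (hours * 60)

print(tokens_to_words(CONTEXT_TOKENS))             # 1,400,000 words
print(tokens_per_media_minute(CONTEXT_TOKENS, 2))  # video: ~16,667 tokens/min
print(tokens_per_media_minute(CONTEXT_TOKENS, 22)) # audio: ~1,515 tokens/min
```

In other words, by these figures a minute of video costs roughly ten times as many tokens as a minute of audio.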

Beyond being able to analyze large files, models that can take in more tokens can sometimes achieve improved performance.

Unlike models with small context windows (the maximum amount of input a model can consider at once), models such as the 2-million-token-input Gemini 1.5 Pro won’t easily “forget” the content of long conversations and veer off topic. Large-context models can also, hypothetically at least, better grasp the flow of the data they take in and generate contextually richer responses.

Developers interested in trying Gemini 1.5 Pro with a 2-million-token context can add their names to the waitlist in Google AI Studio, Google’s generative AI dev tool. (Gemini 1.5 Pro with 1-million-token context launches in general availability across Google's developer services and surfaces in the next month.)

Beyond the larger context window, Google says that Gemini 1.5 Pro has been “enhanced” over the last few months through algorithmic improvements. It’s better at code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding, Google says. And in the Gemini API and AI Studio, 1.5 Pro can now reason across audio in addition to images and video — and be “steered” through a capability called system instructions.
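To make the system-instructions capability concrete, here is a hedged sketch of what a steered request body could look like. The field names (`system_instruction`, `contents`, `parts`) follow the shape of the public Gemini REST API, but treat the exact schema, endpoint and model name as assumptions to verify against the current API reference:

```python
import json

def build_steered_request(system_text: str, user_text: str) -> str:
    """Assemble a generateContent-style request body with a system instruction.

    The JSON shape here is an assumption based on the public Gemini REST API;
    check the live documentation before relying on it.
    """
    body = {
        "system_instruction": {"parts": [{"text": system_text}]},
        "contents": [{"role": "user", "parts": [{"text": user_text}]}],
    }
    return json.dumps(body)

payload = build_steered_request(
    "You are a terse release-notes writer. Answer in one sentence.",
    "Summarize the Gemini 1.5 Pro context-window change.",
)
```

A payload like this would be POSTed to a `models/gemini-1.5-pro:generateContent` endpoint with an API key; the AI Studio SDK exposes the same idea as a system-instruction setting when constructing the model.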

Gemini 1.5 Flash, a faster model

For less demanding applications, Google is launching Gemini 1.5 Flash in public preview, a “distilled” version of Gemini 1.5 Pro: a small, efficient model built for “narrow,” “high-frequency” generative AI workloads. Flash, which also has up to a 2-million-token context window, is multimodal like Gemini 1.5 Pro, meaning it can analyze audio, video and images as well as text (but it generates only text).

“Gemini Pro is for much more general or complex, often multi-step reasoning tasks,” Josh Woodward, VP of Google Labs, one of Google’s experimental AI divisions, said during a briefing with reporters. “[But] as a developer, you really want to use [Flash] if you care a lot about the speed of the model output.”

Woodward added that Flash is particularly well-suited for tasks such as summarization, chat apps, image and video captioning and data extraction from long documents and tables.

Flash appears to be Google’s answer to small, low-cost models served via APIs, like Anthropic’s Claude 3 Haiku. Along with Gemini 1.5 Pro, it’s now available in over 200 countries and territories, including the European Economic Area, the U.K. and Switzerland. (The 2-million-token context version is gated behind a waitlist, however.)


In another update aimed at cost-conscious devs, all Gemini models, not just Flash, will soon be able to take advantage of a feature called context caching. This lets devs store large amounts of information (say, a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply (from a per-usage standpoint) access.
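The savings are easiest to see with some illustrative arithmetic. The numbers below are hypothetical (they are not Google's pricing), and for simplicity the sketch ignores whatever per-access rate cached content is billed at, counting only input tokens:

```python
# Illustrative arithmetic (hypothetical numbers, not Google's pricing) for
# why context caching helps: without a cache, a large shared context must be
# resent, and billed as input, on every request; with a cache it is uploaded
# once and each request pays only for its own prompt.

def input_tokens_without_cache(context: int, prompt: int, requests: int) -> int:
    """Total input tokens sent if the shared context is resent every time."""
    return requests * (context + prompt)

def input_tokens_with_cache(context: int, prompt: int, requests: int) -> int:
    """Total input tokens sent if the shared context is uploaded once."""
    return context + requests * prompt

# A 500K-token knowledge base queried by 100 short (1K-token) prompts:
without = input_tokens_without_cache(500_000, 1_000, 100)   # 50,100,000
with_cache = input_tokens_with_cache(500_000, 1_000, 100)   # 600,000
```

Under these made-up numbers, caching cuts input-token volume by roughly 80x; the real saving depends on how cached access is actually priced.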

The complementary Batch API, available in public preview today in Vertex AI, Google's enterprise-focused generative AI development platform, offers a more cost-effective way to handle workloads such as classification and sentiment analysis, data extraction and description generation, allowing multiple prompts to be sent to Gemini models in a single request.

Another new feature arriving later in the month in preview in Vertex, controlled generation, could lead to further cost savings, Woodward suggests, by allowing users to define Gemini model outputs according to specific formats or schemas (e.g. JSON or XML).
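A hedged sketch of what controlled generation looks like in practice: the developer supplies a schema, the model is asked to emit JSON conforming to it, and the caller can still validate locally. The config keys below (`response_mime_type`, `response_schema`) mirror documented JSON-mode settings, but the Vertex feature described here may expose a different surface, so treat them as assumptions:

```python
import json

# Hypothetical schema for an invoice-extraction task.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total"],
}

# Assumed generation settings asking the model for schema-conforming JSON.
generation_config = {
    "response_mime_type": "application/json",
    "response_schema": invoice_schema,
}

def parse_controlled_output(raw: str) -> dict:
    """Parse a JSON response and enforce the schema's required fields locally."""
    data = json.loads(raw)
    missing = [k for k in invoice_schema["required"] if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data

# The shape a schema-constrained response would take:
sample = parse_controlled_output('{"vendor": "Acme", "total": 42.5}')
```

Constraining output to a schema saves money indirectly: responses need no retries or cleanup passes when they fail to parse downstream.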

“You’ll be able to send all of your files to the model once and not have to resend them over and over again,” Woodward said. “This should make the long context [in particular] way more useful — and also more affordable.”

Read more about Google I/O 2024 on TechCrunch