Google ups ante in GenAI with Gemini enhancements
Google has updated Gemini 1.5 Pro with a two-million-token context window and debuted a smaller, lightweight model optimised for high-frequency, specialised tasks
Google has fired the latest salvo in the race for artificial intelligence (AI) supremacy with significant enhancements to its Gemini model, including a groundbreaking two-million-token context window for Gemini 1.5 Pro.
Gemini 1.5 Pro, Google’s multimodal generative AI (GenAI) model, can analyse and classify video, audio, code and text. This enables applications like chatbots to handle complex scenarios involving various content types, such as processing motor claims with related video and textual evidence.
Launched earlier this year with a one-million-token context window, the model now boasts double the capacity. This allows it to process significantly more information, such as analysing 30,000 lines of code or uploading entire database tables and schemas for streamlined SQL analysis.
However, the new enhancement, currently available through a waitlist for developers, goes beyond simply handling large volumes of data.
“It’s about smarter, more comprehensive interactions with information,” said Stephanie Wong, Google’s head of technical marketing, in a LinkedIn post. “The coherence provides highly relevant answers across modalities that can refer back to earlier parts of the conversation.” Wong added that Google is aiming for an unlimited context window size in the future.
Next month, Google will also introduce context caching to Gemini 1.5 Pro. This will allow users to send large files and other parts of a prompt only once, making the expansive context window more useful and cost-effective.
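The mechanics of context caching can be illustrated with a toy, self-contained sketch. Note that the class and method names below are invented for illustration and are not Google's actual API: the idea is simply that the large, unchanging part of a prompt (an entire code base, say) is uploaded once and referenced by a small cache key in every follow-up request, so only the short question travels over the wire each time.

```python
import hashlib

class FakeModelService:
    """Stand-in for a hosted model endpoint that supports caching."""

    def __init__(self):
        self._cache = {}

    def cache_context(self, large_context: str) -> str:
        # The client keeps this short key instead of resending the text.
        key = hashlib.sha256(large_context.encode()).hexdigest()[:12]
        self._cache[key] = large_context
        return key

    def generate(self, cache_key: str, question: str) -> str:
        context = self._cache[cache_key]
        # A real service would run the model over context + question;
        # here we only show the cached context is still available.
        return f"[{len(context):,} cached chars] answer to: {question}"

svc = FakeModelService()
code_base = "SELECT * FROM claims;\n" * 1000   # stands in for a large upload
key = svc.cache_context(code_base)
reply = svc.generate(key, "Where is the schema for the claims table?")
print(reply)
```

The cost saving comes from the second call: it sends a 12-character key rather than the 22,000-character context.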
Gemini 1.5 Pro’s ability to handle larger context windows stems from Google’s Mixture-of-Experts (MoE) architecture, in which a routing network activates only a subset of specialised expert sub-networks for each input. This increases model capacity without a proportional increase in computation. In turn, the larger window reduces the need to fine-tune foundation models or to rely heavily on retrieval augmented generation (RAG) to ground model responses in external data.
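The MoE routing idea can be sketched in a few lines. This is purely an illustration of the general technique, not Gemini's actual architecture: a gating network scores all experts, but only the top-k are run, so computation per token stays roughly constant as experts are added.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SparseMoELayer:
    """Toy Mixture-of-Experts layer: a gate routes each input to its
    top-k experts, so only a fraction of total parameters runs per token."""

    def __init__(self, d_model=8, n_experts=4, top_k=2):
        self.top_k = top_k
        # Each expert is a simple linear map; the gate scores experts.
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((d_model, n_experts)) * 0.1

    def forward(self, x):
        scores = softmax(x @ self.gate)            # affinity per expert
        chosen = np.argsort(scores)[-self.top_k:]  # keep the top-k only
        weights = scores[chosen] / scores[chosen].sum()
        # Combine the chosen experts' outputs, weighted by the gate.
        out = sum(w * (x @ self.experts[i]) for w, i in zip(weights, chosen))
        return out, chosen

layer = SparseMoELayer()
x = rng.standard_normal(8)
y, used = layer.forward(x)
print(f"experts used: {sorted(used.tolist())} of {len(layer.experts)}")
```

Doubling the number of experts here doubles the parameter count but leaves the per-token work at two expert evaluations, which is the capacity-versus-computation trade-off described above.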
Despite this advancement, RAG still plays a crucial role in refining output accuracy and relevance for use cases such as coding.
“With RAG, you’ll be able to parse your private code base to get contextually relevant coding suggestions,” said Brad Calder, vice-president and general manager of Google Cloud Platform and technical infrastructure, during Google Cloud Next ’24 last month.
“It’s going to continue to be an important tool and mechanism for you to take your IP [intellectual property] and find information that’s closest to what you’re looking for,” he added.
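The retrieval step Calder describes can be sketched end to end with a toy pipeline. Bag-of-words similarity stands in here for the learned embeddings and vector database a production system would use; the document snippets are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # Splice the retrieved snippets into the prompt sent to the model.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

code_base = [
    "def parse_claim(doc): extracts fields from a motor claim form",
    "def transcode_video(path): converts uploaded footage to mp4",
    "def score_risk(claim): returns a fraud-risk score for a claim",
]
prompt = build_prompt("how do we parse a motor claim", code_base)
print(prompt)
```

Only the claim-related snippets make it into the prompt, which is how RAG grounds suggestions in a private code base without sending the whole repository on every request.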
For applications demanding low latency and cost efficiency, Google has introduced Gemini 1.5 Flash. This smaller, lightweight model is optimised for narrower or high-frequency tasks where rapid response times are critical.
Demis Hassabis, CEO of Google DeepMind, said in a blog post that Gemini 1.5 Flash “excels at summarisation, chat applications, image and video captioning, data extraction from long documents and tables, and more”.
“This is because it’s been trained by 1.5 Pro through a process called ‘distillation’, where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model,” he added.
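The distillation process Hassabis describes can be sketched with a toy teacher and student. Linear classifiers stand in for the real models, and the temperature and training loop are illustrative assumptions; the core idea shown is that the student learns from the teacher's softened output distribution rather than from hard labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, temp=1.0):
    z = z / temp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

d, n_classes, temp = 16, 4, 2.0
teacher_W = rng.standard_normal((d, n_classes))   # stands in for the large model
student_W = np.zeros((d, n_classes))              # stands in for the small model

X = rng.standard_normal((256, d))
# Temperature-softened teacher outputs: the "soft labels".
soft_targets = softmax(X @ teacher_W, temp)

lr = 0.5
for _ in range(300):
    probs = softmax(X @ student_W, temp)
    # Gradient of cross-entropy between soft targets and student output.
    grad = X.T @ (probs - soft_targets) / len(X)
    student_W -= lr * grad

agree = (softmax(X @ student_W).argmax(1)
         == softmax(X @ teacher_W).argmax(1)).mean()
print(f"student agrees with teacher on {agree:.0%} of inputs")
```

The soft targets carry more information than hard labels (how confident the teacher is, and which wrong answers it considers plausible), which is why a much smaller student can recover most of the teacher's behaviour.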
Read more about AI in APAC
- SAP’s chief artificial intelligence officer, Philipp Herzig, outlines the company’s approach towards AI and how it is making the technology more accessible to customers.
- The Australian government is experimenting with AI use cases in a safe environment while it figures out ways to harness the technology to benefit citizens and businesses.
- DBS Bank is building a strong data foundation and upskilling employees on data and artificial intelligence to realise its vision of becoming an AI-fuelled bank.
- Boomi CEO talks up the company’s efforts to build up an AI agent architecture, its upcoming AI capabilities, and its footprint in the Asia-Pacific region.