
Google claims AI advances with Gemini LLM

Code analysis, understanding large volumes of text and translating a language by learning from one read of a book are among the breakthroughs of Gemini 1.5

Google DeepMind CEO Demis Hassabis has unveiled the next version of Google’s Gemini large language model (LLM). The new version of the LLM, formerly known as Bard, is Google’s latest attempt to swing the spotlight of advanced artificial intelligence (AI) away from rival OpenAI’s ChatGPT to the new technology it has developed.

In a blog post discussing the new version, Gemini 1.5, Hassabis described “dramatically enhanced performance” and said it represents a step change in Google’s approach to developing AI. The Pro version, which is now available as a developer preview, is optimised for “long-context understanding”, according to Hassabis. His blog post featured a video showing how Gemini 1.5 coped with summarising a 402-page transcript of the Apollo 11 Moon landing mission.

Another video shows analysis of a 44-minute Buster Keaton movie, where Gemini 1.5 is asked to identify a scene where the main character picks up a piece of paper.

In a tweet posted on X, a Google engineer discussed how three JavaScript programs, totalling over 100,000 lines of code, were submitted as inputs to Gemini 1.5. “When we asked Gemini to find the top three examples within the codebase to help us learn a specific skill, it looked across hundreds of possible examples and came back with super-relevant options,” they said.

Using only a screenshot from one of the demos in the codebase, the test showed that Gemini was able to find the right demo – and then explain how to modify the code to achieve a specific change to the image. 

In another example, Gemini was used to locate a specific piece of animation and then explain which code controls it. The engineer said Gemini 1.5 was able to show exactly how to customise this code to make a specific adjustment to the animation.

When asked to change the text and style in a code example, the engineer claimed Gemini 1.5 was able to identify the exact lines of code to change and show the developers how to change them. It also explained what had been done and why.

In another tweet, Jeff Dean, chief scientist at Google DeepMind, discussed how Gemini 1.5 was able to take a language it had never seen before, Kalamang, spoken by people in western New Guinea, and learn how to translate it into English. The model learned from a 573-page book, A grammar of Kalamang by Eline Visser, and a bilingual word list. Citing a quantitative evaluation, he said Gemini 1.5 scored 4.36 out of 6, compared with 5.52 for a human learner of Kalamang.

Hassabis said Gemini 1.5 uses a new Mixture-of-Experts (MoE) architecture. Depending on the type of input given, he said, MoE models learn to selectively activate only the most relevant expert pathways in their neural network. “This specialisation massively enhances the model’s efficiency,” said Hassabis.
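
Google has not published the routing details of Gemini 1.5, so the following is only a generic illustration of the idea Hassabis describes, not the model's actual design. It is a minimal Python sketch of top-k gating, one common way MoE layers are built: a gate scores every expert for a given input, and only the highest-scoring few are run. The names moe_forward, make_expert and gate_weights, the toy "experts" that simply scale their input, and the choice of two active experts are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input value by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Run only the top_k experts the gate rates highest for input x.

    Assumption: a simple linear gate (one weight row per expert) followed
    by softmax, with the selected experts' weights renormalised.
    """
    # One score per expert: dot product of its gate row with the input.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep the indices of the top_k most relevant experts; skip the rest.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    # Weighted sum of the outputs of the chosen experts only.
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8
    experts = [make_expert(i + 1) for i in range(num_experts)]
    gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

    y, active = moe_forward([0.5, -1.0, 2.0, 0.1], gate_weights, experts, top_k=2)
    print(f"ran {len(active)} of {num_experts} experts: {active}")
```

In this sketch, six of the eight experts are never evaluated for a given input, which is the source of the efficiency gain: compute per input grows with the number of active experts rather than the total number in the network.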
