
Researchers say AI fails to describe complexities of Holocaust

Using AI in Holocaust education will require the responsible digitisation of historical assets, as well as collaboration between system providers and domain-specific experts, to ensure effective guardrails that protect against misuse

Existing artificial intelligence (AI) models available in the public domain fail to convey the “complexities and nuances of the past”, and merely offer oversimplified stories about the Holocaust, according to an international Holocaust research lab.

In November 2024, the University of Sussex launched the Landecker Digital Memory Lab, an initiative to “ensure a sustainable future for Holocaust memory and education in the digital age”.

According to a research-based policy briefing presented by the lab to the International Holocaust Remembrance Alliance (IHRA), Does AI have a place in the future of Holocaust memory?, the use of AI in Holocaust memory and education is problematic because mainstream models – including generative AI (GenAI) systems such as ChatGPT and Gemini – lack “good data” about the Holocaust and need the “right representation” from experts on the subject.

The lab’s principal investigator, Victoria Grace Richardson-Walden, has issued an urgent call to all stakeholders involved in Holocaust memory and education, as well as policymakers, to help solve the problem by digitising their data and human expertise, rather than simply bringing people to their sites and museums.

“Very few of them have a clear digitisation strategy,” she said of the Holocaust memory and education sector, which includes archives, museums, memorial sites and libraries all over the world. “They only digitise their material content or their testimonies for specific exhibitions.”

“That is a pressing issue for heritage in general,” said Richardson-Walden, referring to wars in Ukraine and the Middle East.

“All heritage, all these things are at material risk,” she said. “There has been instrumentalisation of history on all sides of the political spectrum for varying political aims. When that becomes very loud on social media, you lose nuance. That’s where the urgency is.”

Unreliable focus

Richardson-Walden highlighted that GenAI systems are not “knowledge machines”: they assign probabilistic numerical values to words and sequences of words, rather than values based on historical and cultural significance. As a result, lesser-known facts and stories are buried, because the systems tend to reproduce “canonical” outputs centred on the most famous stories.

“It gives you a headline answer and bullet points,” she said, describing a typical answer to an enquiry made to ChatGPT. “This idea of summarising really complex histories is problematic. You can’t summarise something that happened over six years in many, many countries, and affected a whole range of different people and perpetrators.”

The research doesn’t seek to provide answers to this complex issue. Instead, Richardson-Walden hopes to find alternatives in discussions with her informatics and engineering colleagues. “Cultural signifiers are difficult to code and then to build into training data,” she said.

Richardson-Walden also highlighted the need for “good data” in commercial GenAI models, especially in relation to sensitive historical subjects such as genocide, persecution, conflict or atrocities.

“Good data comes from the Holocaust organisation, but first they need to digitise it in a strategic way, and the metadata attached to it needs to be correct and standardised,” she said.

Another problem highlighted by the lab’s policy briefing is the self-censorship programmed into most commercial image GenAI models. Almost every time a system is prompted to produce images of the Holocaust, it refuses, and the user is instead presented with its content guidelines.


The briefing cited the example of Dall-E, OpenAI’s image generator. “All it can offer is to produce images of a wreath, elderly hands and a barbed wire fence, or an image that looks like a ghost in a library,” it said.

Richardson-Walden added: “You end up making the Holocaust invisible or abstracting to the point where it’s absurd. So, this idea of putting in censorship within your programming is a good thing as a moral approach that actually creates the opposite effect.”

She believes that, although these guardrails are better than producing false or distorted content, they also prevent people from learning the history and its lessons. The developers of these models should therefore find a “middle ground”: guardrails that prevent misinformation about the Holocaust, but do not end up withholding Holocaust information from future generations reliant on digital media.

“The way [middle ground] comes is through dialogue,” said Richardson-Walden. “There needs to be a space to bring more discussion with OpenAI, Meta, Google, sitting down with places like the UN, with us at the lab.” She added that Landecker offers free consultancy to tech companies engaging with Holocaust memory for the first time, to discuss possible approaches.

“As soon as they delve into it, [they] realise this is so complex and so political, and there’s this whole new area about ethics and digital they never thought about,” she said.

Landecker’s website notes that the most prominent example of Holocaust memory digitisation is an AI model known as Dimensions in Testimony, developed by the USC Shoah Foundation. It is an example of a domain-specific GenAI model, described as a small language model, which is “heavily supervised” and relies on “substantial human intervention”. Users and academics can interact with it by asking questions, to which the model responds with survivor testimonies and expert answers that have been fed into it.

However, other labs and memory centres may not have the same wherewithal and funding. The lab therefore recommends focusing on the mass digitisation of assets, which can then be used to responsibly inform commercial large language models.
