Teaching AI common sense: an old problem ChatGPT foregrounds

This is a guest blogpost by Jim Webber, Chief Scientist, Neo4j

Is the best way to end LLM hallucinations to teach the models some basic logic?

Great as they are, Large Language Models (LLMs) have eccentricities we are increasingly having to deal with, especially their hallucinations. That’s why a growing number of LLM and Generative AI practitioners are starting to pay close attention to the ideas of Yejin Choi.

Choi, a Professor of Computer Science and AI researcher at the University of Washington in Seattle, is well known for asking hard questions about why LLMs are sometimes so poor at drawing inferences that seem obvious to people.

Take the question, ‘If you travel west far enough from the West Coast, will you reach the East Coast?’ Sometimes, GPT-3 (OpenAI’s 2020 publicly released LLM) will say yes, arguing that the Earth is round and that if you travel in any direction for long enough, you return to where you started. On other days, the system will say no: you cannot reach the East Coast by going west.

That simple inconsistency shows that ChatGPT is not good at critical logical reasoning. Choi’s work has shown that if you tell ChatGPT it takes two hours for two shirts to dry in the sun and then ask how long five shirts would take, it will tell you the answer is five hours, which of course we know is not true. It doesn’t matter whether you put out two shirts or ten; they will still take two hours.

Humans use common sense to shortcut these issues. Computers, even with LLMs, don’t have this extra layer of logic. Choi’s solution is to teach the computer some logic: give it rules, along the lines of the questions Socrates asked, that help it spot consistencies and inconsistencies. (Socrates did this, by the way, because humans can be just as inconsistent as computers, so he wanted to give us tools for tightening up our thinking.)

Meet the LLM critic: a programme that sorts good knowledge from bad

We might ask: what is common sense? Choi jokes that it’s the ‘dark matter’ of computer language and ‘intelligence’. I think we could all agree it’s the basic level of practical knowledge and reasoning about everyday situations and events that people share. Her team’s example is, ‘It’s okay to keep the closet door open, but it’s not okay to keep the fridge door open, as the food inside might go bad.’

But how does something like ChatGPT acquire this insight? Choi’s suggestion is to create a helper for whatever LLM you want to use, which she calls a ‘critic’: a programme that helps the model sort good knowledge from bad by asking it questions and only adding, as rules, the answers that always check out.

The critic produces a knowledge graph, a data structure suited to modelling and querying data with complex relationships and interconnected entities. In Choi’s experimental work the resulting graph contains 6.5 million distilled pieces of symbolic knowledge, i.e. statements we know to be true, and in her tests it yields answers and insights from Generative AI at much higher levels of accuracy.
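To make the idea concrete, here is a minimal sketch (not Choi’s actual data model) of how such distilled commonsense statements might be stored as subject–relation–object triples and looked up; the relation names are invented purely for illustration.

```python
# Illustrative only: a toy triple store for commonsense facts.
from collections import defaultdict

class CommonsenseGraph:
    def __init__(self):
        # Index objects by (subject, relation) for quick lookup.
        self.edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def query(self, subject, relation):
        return self.edges.get((subject, relation), set())

kg = CommonsenseGraph()
kg.add("fridge door", "should_stay", "closed")
kg.add("fridge door", "because", "food inside might go bad")
kg.add("closet door", "may_stay", "open")

print(kg.query("fridge door", "should_stay"))  # {'closed'}
print(kg.query("fridge door", "because"))      # {'food inside might go bad'}
```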

This, her Symbolic Knowledge Distillation approach, extracts common sense from ChatGPT into a large knowledge graph that is then used to train a ‘student’ LLM, which is intended to supersede the original because it is both smaller and more accurate. In fact Choi’s ATOMIC-10x knowledge graph (a machine-authored knowledge base) beats a human-authored one on scale, accuracy, and diversity.
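The overall loop can be sketched roughly as follows; the helpers here (the teacher generator, the critic scorer, the student trainer, the threshold) are hypothetical stand-ins, not Choi’s published code.

```python
# Rough sketch of the distillation loop: teacher proposes statements,
# the critic filters them, and a smaller student model is trained on what survives.
def distil(teacher_generate, critic_score, train_student, prompts, threshold=0.9):
    accepted = []
    for prompt in prompts:
        for statement in teacher_generate(prompt):    # teacher LLM proposes commonsense statements
            if critic_score(statement) >= threshold:  # critic keeps only statements that check out
                accepted.append(statement)
    return train_student(accepted)                    # student LLM trained on the filtered knowledge
```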

Making serious use of LLMs

Choi’s dark matter metaphor is one that makes immense sense to me. Why? Because what she’s saying is that we work with words and sentences, which are the ‘visible’ matter, but the next step with AI (an old AI problem that ChatGPT has reinvigorated) is to weave in the unspoken rules of how the world works, which influence the way people use and interpret language. It’s this part of the universe we now have to teach our AI systems, or their help will prove to be very limited.

These issues aren’t limited to academia. They are practical questions that any CIO looking to make serious use of LLMs needs to answer. I’m confident that domain-specific, critic-style knowledge graphs will soon become part of your data pipeline. But in the interim, we already have approaches in which knowledge graphs and LLMs are used together to improve accuracy and reduce hallucinations.
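One simplified version of that pattern: retrieve trusted facts from a knowledge graph (here, Neo4j via its official Python driver) and put them in the prompt so the LLM answers against curated knowledge rather than guessing. The graph schema, the connection details, and call_llm() below are assumptions made for illustration, not a specific product recipe.

```python
# Illustrative sketch: ground an LLM answer in facts retrieved from a knowledge graph.
# Requires: pip install neo4j
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def fetch_facts(product_name):
    # Hypothetical schema: (:Product)-[:RELATES_TO]->(:Product)
    query = (
        "MATCH (p:Product {name: $name})-[r:RELATES_TO]->(other) "
        "RETURN p.name AS subject, type(r) AS relation, other.name AS object"
    )
    with driver.session() as session:
        return [f"{rec['subject']} {rec['relation']} {rec['object']}"
                for rec in session.run(query, name=product_name)]

def grounded_answer(question, product_name, call_llm):
    facts = "\n".join(fetch_facts(product_name))
    prompt = f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"
    return call_llm(prompt)  # call_llm is whatever LLM client you already use
```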

Knowledge graphs will be the tool you use to teach your enterprise AI why you really can reach the West Coast by flying east. More importantly, they will enable it to find many other useful things you don’t know yet.