According to Google’s research, LLMs abandon correct answers under pressure, threatening multi-turn AI systems


A new study by researchers at Google DeepMind and University College London reveals how large language models (LLMs) form, maintain and lose confidence in their answers. The findings reveal striking similarities between the cognitive biases of LLMs and humans, while also highlighting stark differences.

The study shows that LLMs can become overconfident in their own answers, yet quickly lose that confidence and change their minds when presented with a counterargument, even when the counterargument is wrong. Understanding the nuances of this behavior has direct consequences for LLM applications, especially for building conversational interfaces that span several turns.

Testing confidence in LLMs

An important factor in the safe deployment of LLMs is that their answers come with a reliable sense of confidence (the probability the model assigns to the answer token). While we know LLMs can produce these confidence scores, the extent to which they use them to guide adaptive behavior is poorly characterized. There is also empirical evidence that LLMs can be overconfident in their initial answer, yet highly sensitive to criticism and quick to become underconfident in that very same choice.
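To make this notion of confidence concrete, here is a minimal sketch (not from the paper) of how a confidence score can be read off the probability a model assigns to its answer token, using the open-source Hugging Face transformers library; the model name, prompt and option tokens are illustrative placeholders.

```python
# Sketch: confidence as the probability the model assigns to the answer token.
# "gpt2" is a placeholder; the study evaluated other, larger models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = ("Which city lies at a higher latitude? "
          "A) Paris  B) New York. Answer with A or B:")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

probs = torch.softmax(logits, dim=-1)
prob_a = probs[tokenizer.encode(" A")[0]].item()
prob_b = probs[tokenizer.encode(" B")[0]].item()

# Normalize over the two options to get confidence in the chosen answer.
confidence = max(prob_a, prob_b) / (prob_a + prob_b)
print(f"P(A)={prob_a:.3f}  P(B)={prob_b:.3f}  confidence={confidence:.3f}")
```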

To investigate this, the researchers developed a controlled experiment to test how LLMs update their confidence and decide whether to change their answer when presented with external advice. In the experiment, an “answering LLM” was first given a binary-choice question, such as identifying the correct latitude of a city from two options. After making its initial choice, the answering LLM received advice from a fictional “advice LLM.” This advice came with an explicit accuracy rating (e.g., “This advice LLM is 70% accurate”) and would either agree with, oppose, or remain neutral toward the answering LLM’s initial choice. Finally, the answering LLM was asked to make its final choice.
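To make the setup concrete, the rough sketch below shows how such a two-turn trial could be scripted. This is not the authors’ code: the `ask_llm()` helper, the prompt wording and the stance labels are illustrative assumptions.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to the answering LLM; returns 'A' or 'B'."""
    raise NotImplementedError


def run_trial(question, options, advice_stance, advice_accuracy=70, show_initial=True):
    # Turn 1: the answering LLM makes an initial binary choice ("A" or "B").
    turn1 = f"{question}\nA) {options[0]}\nB) {options[1]}\nAnswer with A or B."
    initial = ask_llm(turn1)

    # Advice from a fictional "advice LLM" with a stated accuracy. It can agree
    # with the initial choice, oppose it (recommend the other option), or be neutral.
    other = "B" if initial == "A" else "A"
    recommendation = {
        "agree": f"recommends option {initial}.",
        "oppose": f"recommends option {other}.",
        "neutral": "offers no recommendation.",
    }[advice_stance]
    advice = f"An advice LLM, which is {advice_accuracy}% accurate, {recommendation}"

    # Turn 2: the final decision. The initial answer is either shown or hidden,
    # which is the manipulation the researchers used to probe memory effects.
    reminder = f"Your previous answer was {initial}.\n" if show_initial else ""
    turn2 = f"{turn1}\n{reminder}{advice}\nGive your final answer, A or B."
    final = ask_llm(turn2)
    return initial, final
```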


Example of the confidence test in LLMs (source: arXiv)

A key part of the experiment was controlling whether the LLM’s own initial answer was visible to it during the second, final decision. In some cases it was shown, and in others it was hidden. This unique setup, impossible to replicate with human participants who cannot simply forget their previous choices, allowed the researchers to isolate how memory of a past decision influences current confidence.

A baseline condition, in which the initial answer was hidden and the advice was neutral, established how much an LLM’s answer would change simply due to random variance in the model’s processing. The analysis focused on how the LLM’s confidence in its original choice changed between the first and second turns, providing a clear picture of how the initial belief, or prior, influenced the model’s “change of mind.”
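As a rough illustration of this kind of analysis (again, not the paper’s code; the trial record fields are assumptions), the change-of-mind rate in each condition can be compared against the neutral, hidden-answer baseline:

```python
from statistics import mean

def change_of_mind_rate(trials):
    """trials: list of dicts with 'initial' and 'final' answers from run_trial()."""
    return mean(t["initial"] != t["final"] for t in trials)

# Compare each experimental condition against the hidden-answer, neutral-advice baseline.
# baseline = change_of_mind_rate(baseline_trials)
# delta = change_of_mind_rate(condition_trials) - baseline
```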

Overconfidence and underconfidence

The researchers first examined how the visibility of the LLM’s own answer influenced its tendency to change that answer. They observed that when the model could see its initial answer, it showed a reduced tendency to switch compared to when the answer was hidden. This finding points to a specific cognitive bias. As the paper notes, “This effect – the tendency to stick with one’s initial choice to a greater extent when that choice was visible (as opposed to hidden) during the deliberation of the final choice – is closely related to a phenomenon described in human decision-making research, choice-supportive bias.”

The study also confirmed that the models integrate external advice. When faced with opposing advice, the LLMs were more likely to change their minds, and less likely when the advice was supportive. “This finding demonstrates that the answering LLM appropriately integrates the direction of advice to modulate its change of mind rate,” the researchers write. However, they also found that the models are overly sensitive to contrary information and perform too large a confidence update as a result.

Sensitivity of LLMs to advice across different settings in the confidence test (source: arXiv)

Interestingly, this behavior runs counter to the confirmation bias often seen in humans, where people favor information that confirms their existing beliefs. The researchers found that LLMs “overweight opposing rather than supportive advice, both when the initial answer of the model was visible and when it was hidden from the model.” One possible explanation is that training techniques such as reinforcement learning from human feedback (RLHF) may encourage models to be overly deferential to user input, a phenomenon known as sycophancy (which remains a challenge for AI labs).

Impact on enterprise applications

This study confirms that AI systems are not the purely logical agents they are often perceived to be. They exhibit their own set of biases, some resembling human cognitive errors and others unique to themselves. For enterprise applications, this means that in an extended conversation between a human and an AI agent, the most recent information could have a disproportionate impact on the LLM’s reasoning (especially if it contradicts the model’s initial answer), potentially causing it to discard an initially correct answer.

Fortunately, as the study also shows, an LLM’s memory can be manipulated to mitigate these unwanted biases in ways that are not possible with humans. Developers building multi-turn conversational agents can implement strategies for managing the AI’s context. For example, a long conversation can be periodically summarized, with key facts and decisions presented neutrally and stripped of which agent made which choice. This summary can then be used to initiate a new, condensed conversation, providing the model with a clean slate to reason from and helping to avoid the biases that can creep in during extended dialogues.
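As a rough illustration of this strategy (not from the study; the `call_llm()` helper, turn threshold and prompt wording are placeholders), the sketch below periodically condenses a conversation into a neutral summary and restarts from a clean context:

```python
MAX_TURNS_BEFORE_RESET = 10  # illustrative threshold


def call_llm(prompt: str) -> str:
    """Placeholder for your model call (any chat/completion API)."""
    raise NotImplementedError


def summarize(history: list[dict]) -> str:
    """Ask the model for a neutral summary of key facts and decisions,
    stripped of which participant proposed which option."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    prompt = ("Summarize the key facts and decisions below neutrally, "
              "without attributing positions to any participant:\n" + transcript)
    return call_llm(prompt)


def manage_context(history: list[dict]) -> list[dict]:
    if len(history) <= MAX_TURNS_BEFORE_RESET:
        return history
    # Start a fresh, condensed conversation seeded only with the neutral summary,
    # giving the model a clean slate and discarding turn-by-turn attribution.
    return [{"role": "system", "content": summarize(history)}]
```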

As LLMs become increasingly integrated into enterprise workflows, understanding the nuances of their decision-making processes is no longer optional. Following foundational research like this enables developers to anticipate and correct for these inherent biases, leading to applications that are not only more capable but also more robust and reliable.


