Personalization features can make your LLM experience even more enjoyable. Credit: Massachusetts Institute of Technology News



Many modern large language models (LLMs) are designed to remember details of past conversations or store user profiles, allowing these models to personalize their responses.

However, researchers from MIT and Penn State University found that, in lengthy conversations, these personalization features often increase the likelihood that LLMs will become overly agreeable or begin to mirror an individual’s perspective.

This phenomenon, known as sycophancy, prevents the model from telling the user when they are wrong and can compromise the accuracy of the LLM’s responses. Additionally, LLMs that mirror someone’s political beliefs or worldview can promote misinformation and distort users’ perceptions of reality.

Unlike many past sycophancy studies, which assess one-off prompts out of context in laboratory settings, the MIT researchers collected two weeks of conversational data from people who interacted with real LLMs in their daily lives. They studied two settings: agreeableness in personal advice and the mirroring of users’ beliefs in political explanations.

In four of the five LLMs they investigated, interaction context increased agreeableness, with a condensed user profile in the model’s memory having the biggest impact. Mirroring behavior, on the other hand, only increased when the model could accurately infer the user’s beliefs from the conversation.

The researchers hope these results will inspire future work on developing more robust personalization methods for LLMs.

“From a user’s perspective, this study highlights how important it is to understand that these models are dynamic and that their behavior can change over time. If you start talking to the models for too long and delegating your thinking to them, you may find yourself in an echo chamber that you can’t escape from, and that’s a risk that users should definitely keep in mind,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of a paper on the study.

Jain is joined on the paper by Charlotte Park, an MIT graduate student in electrical engineering and computer science (EECS); Matt Viana, a graduate student at Penn State University; co-senior author Ashia Wilson, the Lister Brothers Career Development Professor in EECS and a principal investigator in the Laboratory for Information and Decision Systems (LIDS); and co-senior author Dana Calacci, an assistant professor at Penn State University. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.

Extended interactions

Based on their own experiences of sycophancy in LLMs, the researchers began thinking about the potential benefits and consequences of overly agreeable models. But when they searched the literature to expand their analysis, they found no studies that attempted to understand sycophantic behavior during long-term LLM interactions.

“We use these models through extended interactions, and these include a lot of context and memory. But our evaluation methods are lagging behind. We wanted to evaluate LLMs the way people actually use them and understand how they behave in the wild,” Calacci says.

To fill this gap, the researchers designed a user study that investigated two types of sycophancy: agreement sycophancy and perspective sycophancy.

Agreement sycophancy is the tendency of LLMs to over-agree, sometimes providing incorrect information or failing to tell users when they are wrong. Perspective sycophancy occurs when the model mirrors the user’s values and political views.

“We know a lot about the benefits of having social connections with people with similar or different perspectives, but we still don’t know the benefits or risks of long-term interactions with AI models with similar attributes,” Calacci adds.

The researchers built a user interface around LLMs and recruited 38 participants to interact with the chatbot over a two-week period. Each participant’s conversations took place in the same context window, and all interaction data were captured.

Over the two-week period, the researchers collected an average of 90 queries from each user.

They compared the behavior of five LLMs given this user context to the behavior of the same LLMs given no conversational data.

“We found that context actually fundamentally changes the way these models operate, and I would wager that this phenomenon will extend far beyond sycophancy. Sycophancy tended to increase, but not always. It really depends on the context itself,” says Wilson.

Context clues

For example, having an LLM distill information about a user into a dedicated profile maximized agreement sycophancy. This user-profile feature is increasingly included in modern models.

The researchers also found that random text from synthetic conversations increased the likelihood that some models would agree, even if that text contained no user-specific data. This suggests that the length of a conversation may influence agreement sycophancy more than its content, Jain adds.

For perspective sycophancy, however, content matters a great deal: conversational context increased perspective sycophancy only when it revealed information about the user’s political views.

To gain this insight, the researchers prompted the models to infer users’ beliefs and asked each individual whether the model’s inferences were correct. Users said the LLMs accurately captured their political views about half of the time.

“In hindsight, it’s easy to say that AI companies should do this kind of assessment. But it’s difficult and takes a lot of time and investment. Using humans in the assessment loop is expensive, but we’ve shown that it can uncover new insights,” Jain says.

Although the purpose of the study was not mitigation, the researchers made some recommendations.

For example, developers could design models that better identify relevant details in context and memory to reduce sycophancy. They could also build models that detect mirroring behavior and flag responses that show excessive agreement. Model developers could also give users the ability to adjust personalization during long conversations.

“There are many ways to personalize a model without making it overly sycophantic. There is a fine line between personalization and sycophancy, and distinguishing between them is an important area of future work,” says Jain.

“At the end of the day, we need a better way to understand what happens during long conversations with LLMs, the dynamics and complexities of that, and how things can get out of alignment during that long-term process,” adds Wilson.


