
The longer agents run, the more instructions and conversation history they forget, a memory problem that companies are still trying to solve.
Anthropic believes it has addressed this problem with its Claude Agent SDK, developing two techniques that allow agents to work across separate context windows.
“A central challenge with long-running agents is that they must work in separate sessions, and each new session begins with no memory of the previous one,” Anthropic wrote in a blog post. “Context windows are limited and most complex projects cannot be completed within one window, so agents need a way to bridge the gap between coding sessions.”
Anthropic's engineers proposed two additions to the Agent SDK: an initialization agent that sets up the environment, and a coding agent that makes incremental progress in each session and leaves artifacts for the next one.
Agent memory issues
Because agents are built on foundation models, they remain constrained by a limited (if continually expanding) context window. For long-running agents this creates even bigger problems, causing the agent to forget instructions or behave erratically while performing tasks. Enhancing agent memory is therefore essential for consistent, business-safe performance.
Several methods have emerged over the past year, all attempting to bridge the gap between the context window and the agent's memory. LangChain's LangMem SDK, Memobase, and OpenAI's Swarm are examples of companies' memory solutions. Research on agent memory has also exploded recently, with proposals such as the Memp framework and Google's nested learning paradigm offering new options for improving memory.
Many current memory frameworks are open source, making them adaptable to the variety of large language models (LLMs) that power agents. Anthropic's approach, by contrast, improves its own Claude Agent SDK.
Structure
Anthropic pointed out that while the Claude Agent SDK has context-management capabilities that “should allow agents to continue doing useful work for any length of time,” that alone is not enough. Even with a model like Opus 4.5 running the Claude Agent SDK, the company said in the blog post, “just seeing a high-level prompt like ‘Build a clone of claude.ai’ may not be enough to build a production-quality web app.”
According to Anthropic, the failures manifested in two patterns. In the first, the agent tries to do too much at once and the model loses context along the way; the agent then has to guess what happened and cannot pass clear instructions to the next session. The second failure occurs after some functionality has already been built: the agent sees that progress has been made and simply declares the job complete.
Anthropic's researchers detailed their solution: set up the initial environment to lay the foundation for the features, have each agent make incremental progress toward the goal, and leave a clean state at the end of each session.
This is where Anthropic's two-part solution comes in. The initialization agent sets up the environment and records what was done and which files were added. The coding agent then prompts the model to make incremental progress and leave structured updates behind.
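The two-part workflow can be sketched as a simple harness loop. This is a minimal illustration, not Anthropic's implementation: `call_agent` is a hypothetical stub standing in for a real model invocation (e.g. via the Claude Agent SDK), and the `PROGRESS.md` artifact name is an assumption used to show how one session's structured update becomes the next session's starting context.

```python
# Hypothetical sketch of the two-part harness: an initialization agent runs
# once to set up the workspace, then memoryless coding sessions each read the
# prior progress notes, do one increment, and append a structured update.
from pathlib import Path

PROGRESS_FILE = "PROGRESS.md"  # assumed artifact name, for illustration

def call_agent(prompt: str) -> str:
    """Stub for a real model call; returns a canned 'summary' here."""
    return f"[agent output for: {prompt[:40]}...]"

def initialize(workspace: Path, goal: str) -> None:
    """Initialization agent: set up the environment and record the goal."""
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / PROGRESS_FILE).write_text(f"# Goal\n{goal}\n\n# Sessions\n")
    call_agent(f"Set up project scaffolding for: {goal}")

def run_session(workspace: Path, session_num: int) -> None:
    """Coding agent: read prior notes, make one increment, leave an update."""
    notes = workspace / PROGRESS_FILE
    summary = call_agent("Continue the project. Prior notes:\n" + notes.read_text())
    # The appended section is the artifact the next session starts from.
    with notes.open("a") as f:
        f.write(f"\n## Session {session_num}\n{summary}\n")
```

The key design point is that no session relies on conversational memory: everything a future session needs must survive on disk as an artifact.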
“The inspiration for these practices came from seeing what talented software engineers do every day,” Anthropic wrote.
The researchers said they also gave the coding agent testing tools, improving its ability to identify and fix bugs that weren't obvious from the code alone.
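One way such a testing step could slot into the harness is to run the project's test suite between sessions and surface any failures in the next session's prompt. This is a sketch under assumptions, not the SDK's API: `run_test_suite` shells out to pytest as an example, and `next_prompt` is a hypothetical helper.

```python
# Hypothetical between-session testing step: run the suite, and if it fails,
# put the failure report in front of the next coding session.
import subprocess

def run_test_suite(workspace: str) -> tuple[bool, str]:
    """Run the project's tests (pytest here, as an example) and capture output."""
    proc = subprocess.run(
        ["pytest", "-q"], cwd=workspace, capture_output=True, text=True
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def next_prompt(base_prompt: str, passed: bool, report: str) -> str:
    """Build the next session's prompt, surfacing bugs the code alone hides."""
    if passed:
        return base_prompt
    return base_prompt + "\n\nFix these failing tests before continuing:\n" + report
```

Because each session starts from scratch, a failing test report written into the prompt is what keeps a regression from being silently declared "done."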
Future research
Anthropic said its approach is “one set of possible solutions” in the design of long-running agent harnesses. Still, this is just the beginning of what may become a broader research area for much of the AI field.
The company said its experiments in enhancing agents' long-term memory do not yet show whether a single general-purpose coding agent or a multi-agent structure works best across contexts.
The demo also focused on full-stack web-app development, so further experiments should test whether the results generalize to other tasks.
“Some or all of these lessons may be applicable to the types of long-running agentic tasks required in scientific research, financial modeling, etc.,” Anthropic said.
