AI agents help search for best results from large language models | Massachusetts Institute of Technology News



Whether you’re a scientist brainstorming research ideas or a CEO looking to automate HR or finance tasks, you’ll find that artificial intelligence tools are becoming the assistants you never knew you needed. In particular, many experts are leveraging the talents of semi-autonomous software systems called AI agents, which can call on AI at specific points to solve problems and complete tasks.

AI agents are particularly effective when built on large language models (LLMs), which are powerful, efficient, and adaptive. One way to program such an agent is to write out in code what you want the system to do (the “workflow”), including when to call an LLM. For example, a software company migrating an older codebase to a more modern programming language for better performance and security might use an LLM to convert the codebase one file at a time, with a harness that tests each file.
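As a concrete illustration, the file-by-file workflow described above can be sketched in plain Python. The helper functions `call_llm` and `run_tests` are stand-in stubs invented for this sketch, not any real API:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; here it just tags the input."""
    return f"# translated\n{prompt}"

def run_tests(translated: str) -> bool:
    """Stand-in for the per-file test harness."""
    return translated.startswith("# translated")

def translate_repo(files: dict[str, str]) -> dict[str, str]:
    """Translate a codebase one file at a time, testing each result."""
    translated = {}
    for name, source in files.items():
        out = call_llm(source)
        if not run_tests(out):
            raise RuntimeError(f"translation of {name} failed its tests")
        translated[name] = out
    return translated

repo = {"util.java": "int add(int a, int b) { return a + b; }"}
print(translate_repo(repo)["util.java"].splitlines()[0])  # → # translated
```

The loop is the whole “workflow”: the programmer, not the LLM, decides which step runs next.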

But what if the LLM makes a mistake? You may want your agent to backtrack and try again, incorporating the lessons learned from previous failures. Coding this behavior can take as much effort as implementing the original agent: if the agent spans thousands of lines of code, you may end up changing or adding thousands more just to support backtracking when an LLM call goes wrong.
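The hand-written retry logic described above might look like the following sketch, where feedback from a failed attempt is folded into the next prompt. All function names and behaviors here are illustrative stubs:

```python
def call_llm(prompt: str) -> str:
    # Stub: pretend the model only succeeds once it sees failure feedback.
    return "ok" if "failed" in prompt else "buggy"

def run_tests(output: str) -> bool:
    return output == "ok"

def translate_with_retries(source: str, max_tries: int = 3) -> str:
    """Retry an LLM step, carrying lessons from earlier failures."""
    feedback = ""
    for attempt in range(max_tries):
        out = call_llm(source + feedback)
        if run_tests(out):
            return out
        # Record why this attempt failed so the next prompt can avoid it.
        feedback += f"\n# attempt {attempt} failed tests"
    raise RuntimeError("all attempts failed")

print(translate_with_retries("int add(...)"))  # → ok
```

Even this toy version doubles the amount of control-flow code around a single LLM call, which is the overhead the article is pointing at.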

To save programmers time and effort, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Asari AI have developed a framework called “EnCompass.”

With EnCompass, you no longer have to make those changes yourself. When EnCompass runs the program, the agent automatically backtracks if an LLM call produces a mistake. EnCompass can also clone the program’s runtime state to run multiple trials in parallel and keep the best one. In general, EnCompass searches over the different paths the agent might take, depending on the different possible outputs of each LLM call, looking for the path on which the agent reaches the best solution.
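The “clone the runtime and run trials in parallel” idea can be mimicked with ordinary Python concurrency. The scoring function below is a toy stand-in for judging LLM outputs, not how EnCompass evaluates trials:

```python
from concurrent.futures import ThreadPoolExecutor

def trial(seed: int) -> tuple[int, str]:
    # Stand-in for one cloned run of the agent: returns (score, output).
    return (seed % 5, f"candidate-{seed}")

# Run eight trials in parallel and keep the highest-scoring output.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(trial, range(8)))

score, best = max(results)
print(best)  # → candidate-4
```

The difference is that EnCompass, per the article, does this forking at the runtime level, so the agent program itself needs no parallelism code.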

Instead, you simply annotate the locations where you want the program to be able to backtrack or replicate its runtime state, and record any information that may be useful in the strategy for searching the agent’s different execution paths (the “search strategy”). The search strategy is then specified separately: you can use one of the standard strategies EnCompass provides, or implement a custom one.
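What such an annotation might look like is sketched below. The decorator name `branch_point` and its behavior are invented for illustration; this is not EnCompass’s actual API:

```python
def branch_point(fn):
    """Toy marker: a real framework would register calls to fn as
    places where execution can fork or backtrack."""
    fn.is_branch_point = True
    return fn

@branch_point
def translate_step(source: str) -> str:
    # Stand-in for an LLM call whose output can vary between runs.
    return source.upper()

print(translate_step.is_branch_point, translate_step("x = 1"))
```

The point of the pattern is that the agent’s workflow code stays untouched; only lightweight markers are added, and the search machinery lives elsewhere.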

“By using EnCompass, we separated the search strategy from the underlying workflow of the AI agent,” says first author Zhening Li ’25, MEng ’25, an MIT electrical engineering and computer science (EECS) PhD student, CSAIL researcher, and research consultant for Asari AI. “Our framework lets programmers easily experiment with different search strategies to find the one that gives the AI agent the best performance.”

The researchers used EnCompass with agents implemented as Python programs that call LLMs, and demonstrated significant code savings: EnCompass reduced the coding effort of adding search to agents, such as those for code-repository translation and for discovering digital-grid transformation rules, by up to 80 percent. In the future, EnCompass could help agents tackle large-scale tasks such as managing large codebases, designing and running scientific experiments, and drafting blueprints for rockets and other hardware.

Branching out

When programming an agent, the developer marks certain operations, such as LLM calls, that may produce different results each time they run. These annotations are called “branch points.” If your agent program generates a single plot line for a story, adding branch points turns the story into a choose-your-own-adventure game: a branch point is a place where the plot splits into multiple possible future plot lines.

You can then specify the strategy EnCompass uses to navigate this story game and find the best ending, which may include spawning parallel threads of execution and backtracking to earlier branch points upon hitting a dead end.

Users can plug and play with several common search strategies provided by EnCompass, or define their own. For example, you can choose Monte Carlo tree search, which builds a search tree that balances exploration and exploitation, or beam search, which keeps only the best few outputs from each step. With EnCompass, you can easily try different approaches to find the strategy that maximizes the agent’s chances of completing a task.
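Beam search itself fits in a few lines. The sketch below is a generic version applied to a toy problem (building the highest-sum list by appending digits), not EnCompass code:

```python
def beam_search(start, expand, score, width=2, steps=3):
    """Keep only the `width` best partial solutions at every step."""
    beam = [start]
    for _ in range(steps):
        candidates = [c for state in beam for c in expand(state)]
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return max(beam, key=score)

best = beam_search(
    start=[],
    expand=lambda s: [s + [d] for d in (1, 5, 9)],  # three "LLM outputs"
    score=sum,
)
print(best)  # → [9, 9, 9]
```

In an agent setting, `expand` would correspond to sampling several outputs at a branch point, and `score` to whatever signal (tests passing, a model-based judge) ranks the partial solutions.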

EnCompass coding efficiency

So how code-efficient is EnCompass at adding search to agent programs? The researchers found that the framework significantly reduced the amount of code programmers needed to add, and helped them experiment with different strategies to find the best one.

For example, the researchers applied EnCompass to an agent that converts code repositories from Java, a language commonly used for apps and enterprise software, into Python. They found that implementing search with EnCompass, which mainly involves adding branch-point annotations and annotations recording the outcome of each step, required 348 fewer lines of code (about 82 percent less) than implementing it by hand. They also showed that EnCompass made it easy to try out different search strategies and identify the best one, a two-level beam search, which improved accuracy by 15 to 40 percent across five repositories with a search budget of 16 times the LLM calls made by the agent without search.

“As LLMs become an increasingly integral part of everyday software, it becomes more important to understand how to efficiently build software that leverages their strengths and works around their limitations,” says co-author Armando Solar-Lezama, an MIT professor in EECS and a CSAIL principal investigator. “EnCompass is an important step in that direction.”

The researchers note that EnCompass targets agents whose programs specify the steps of a high-level workflow. The current version of the framework is less applicable to agents that are controlled entirely by the LLM. “With those agents, instead of having a program that specifies the steps and using the LLM to execute them, the LLM itself decides everything,” Li says. “With no underlying program workflow, the search at inference time happens over whatever the LLM invents on the fly. In that case, there’s less need for a tool like EnCompass, which uses search and backtracking to change how a program runs.”

Li and his colleagues plan to extend EnCompass into a more general search framework for AI agents. They also plan to test the system on more complex tasks and refine it for real-world applications, including in enterprises. Additionally, they are evaluating how well EnCompass can help agents collaborate with humans on tasks such as brainstorming hardware designs and translating larger codebases. For now, EnCompass is a building block that lets humans more easily interact with AI agents and improve their performance.

“EnCompass arrives at a timely moment, as AI-driven agents and search-based techniques are beginning to reshape software engineering workflows,” says Yiming Yang, a professor at Carnegie Mellon University who was not involved in the study. “By clearly separating agent programming logic from inference-time search strategies, this framework provides a principled way to explore how structured search can enhance code generation, transformation, and analysis. This abstraction offers a solid foundation for a more systematic and reliable search-driven approach to software development.”

Li and Solar-Lezama co-authored the paper with two researchers at Asari AI, including senior author Stephan Zheng, the company’s founder and CEO. Their work was supported by Asari AI.

The team’s results were presented at the Neural Information Processing Systems Conference (NeurIPS) in December.
