Accelerating Scientific Discovery with AI | MIT News



Several researchers have taken a broad view of scientific progress over the past 50 years and come to the same troubling conclusion: Scientific productivity is declining. It takes more time, more funding, and larger teams to make discoveries that once came faster and more cheaply. Although various explanations have been offered for the slowdown, one is that, as research becomes more complex and specialized, scientists must spend more time reviewing publications, designing sophisticated experiments, and analyzing data.

Now, the philanthropically funded research lab FutureHouse is seeking to accelerate scientific research with an AI platform designed to automate many of the critical steps on the path toward scientific progress. The platform consists of a set of AI agents specialized for tasks including information retrieval, information synthesis, chemical synthesis design, and data analysis.

FutureHouse founders Sam Rodriques PhD ’19 and Andrew White believe that by giving every scientist access to their AI agents, they can break through the biggest bottlenecks in science and help solve some of humanity’s most pressing problems.

“Natural language is the real language of science,” Rodriques says. “Other people are building foundation models for biology, where machine-learning models speak the language of DNA or proteins, and that’s powerful. But discoveries aren’t represented in DNA or proteins. The only way we know how to represent discoveries, hypothesize, and reason is with natural language.”

Finding big problems

For his PhD research at MIT, Rodriques sought to understand the inner workings of the brain in the lab of Professor Ed Boyden.

“The entire idea behind FutureHouse was inspired by this impression I got during my PhD at MIT that even if we had all the information we needed to know about how the brain works, we wouldn’t know it, because nobody has time to read all the literature,” Rodriques explains. “Even if they could read it all, they wouldn’t be able to assemble it into a comprehensive theory. That was a foundational piece of the FutureHouse puzzle.”

Rodriques wrote about the need for new kinds of large research collaborations in the final chapter of his 2019 doctoral dissertation. After graduating, he ran a lab at the Francis Crick Institute in London, where he found himself gravitating toward broad problems in science that no single lab could take on.

“I was interested in how to automate or scale up science, and in what kinds of new organizational structures or technologies would unlock higher scientific productivity,” Rodriques says.

When ChatGPT 3.5 was released in November 2022, Rodriques saw a path toward more powerful models that could generate scientific insights on their own. Around that time, he also met Andrew White, a computational chemist at the University of Rochester who had been granted early access to GPT-4.

The founders set out to create distinct AI tools for tasks such as literature search, data analysis, and hypothesis generation. They started with data collection, eventually releasing PaperQA in September 2024, which Rodriques calls the best AI agent in the world for retrieving and summarizing information from the scientific literature. Around the same time, they released Has Anyone, a tool that lets scientists determine whether anyone has conducted a particular experiment or explored a particular hypothesis.

“We were asking, ‘What kinds of questions do we ask all the time as scientists?’” Rodriques recalls.

When FutureHouse officially launched its platform on May 1 of this year, it rebranded some of its tools. PaperQA is now Crow, and Has Anyone is now called Owl. Falcon is an agent that can compile and review more sources than Crow. Another new agent, Phoenix, can use specialized tools to help researchers plan chemistry experiments. And Finch is an agent designed to automate data-driven discovery in biology.

On May 20, the company demonstrated a multi-agent scientific discovery workflow that automated key steps of the scientific process and identified a new therapeutic candidate for dry age-related macular degeneration (dAMD), a leading cause of irreversible blindness worldwide. In June, FutureHouse released ether0, a 24B open-weights reasoning model for chemistry.

“These agents need to be thought of as part of a larger system,” Rodriques says. “Soon, the literature search agents will be integrated with the data analysis agent, the hypothesis generation agent, and an experiment planning agent, all engineered to work together seamlessly.”

Agents for everyone

Today, anyone can access FutureHouse’s agents at platform.futurehouse.org. The platform’s launch generated excitement in the industry, and stories have begun to come in about scientists using the agents to accelerate their research.

One of FutureHouse’s scientists used the agents to identify a gene that may be associated with polycystic ovary syndrome and to come up with a new treatment hypothesis for the disease. Another researcher, at the Lawrence Berkeley National Laboratory, used Crow to create an AI assistant that can search the PubMed research database for information related to Alzheimer’s disease.

Scientists at another research institution used the agents to conduct systematic reviews of genes relevant to Parkinson’s disease, finding that FutureHouse’s agents outperformed general-purpose agents.

Rodriques says that scientists who think of the agents less like Google Scholar and more like a smart assistant scientist get the most out of the platform.

“People who are looking for speculation tend to get more mileage out of ChatGPT o3 deep research, while people who are looking for truly faithful literature reviews tend to get more out of our agents,” Rodriques explains.

Rodriques also believes FutureHouse will soon reach a point where its agents can use the raw data from research papers to test the reproducibility of their results and verify their conclusions.

In the longer run, to keep scientific progress moving forward, Rodriques says FutureHouse is working on embedding tacit knowledge in its agents so they can perform more sophisticated analyses, while also giving the agents the ability to use computational tools to explore hypotheses.

“There have been so many advances around foundation models for science and around language models for proteins and DNA that we now need to give our agents access to those models and to all of the other tools people commonly use to do science,” Rodriques says. “Building the infrastructure that lets agents use more specialized tools for science is going to be critical.”


