Improving the ability to explain AI model predictions | Massachusetts Institute of Technology News



In high-stakes situations like medical diagnostics, users often want to know what caused a computer vision model to make a particular prediction so they can decide whether to trust its output.

Concept bottleneck models are one way to enable artificial intelligence systems to explain their decision-making. These methods force deep learning models to make predictions using a set of concepts that humans can understand. In new research, MIT computer scientists have developed a way to guide such models toward higher accuracy and clearer, more concise explanations.

The concepts used by the model are typically predefined by human experts. For example, a clinician can suggest using concepts such as “brown dot clusters” or “variegated pigmentation” to predict that a medical image shows melanoma.

However, predefined concepts may be irrelevant or lack sufficient detail for a particular task, reducing the model's accuracy. The new technique produces better explanations than standard concept bottleneck models by extracting concepts the model has already learned while being trained for that task and forcing the model to use them.

The approach leverages a pair of specialized machine learning models that automatically extract knowledge from a target model and translate it into plain-language concepts. Ultimately, the technique can transform pretrained computer vision models into models that use concepts to explain their inferences.

“In a sense, we want to be able to read the minds of these computer vision models. Concept bottleneck models are one way for users to tell what a model is thinking and why it has made certain predictions. Our method uses better concepts, which improves accuracy and ultimately improves accountability for black-box AI models,” says first author Antonio De Santis, a graduate student at Politecnico di Milano who completed this research while a visiting student at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

He is joined on the paper by Schrasing Tong SM ’20, PhD ’26; Marco Brambilla, professor of computer science and engineering at Politecnico di Milano; and senior author Lalana Kagal, a principal research scientist at CSAIL. The research will be presented at the International Conference on Learning Representations.

Build better bottlenecks

Concept bottleneck models (CBMs) are a popular approach to improving explainability in AI. These techniques add an intermediate step in which a computer vision model first predicts the concepts present in an image and then uses those concepts to make its final prediction.

This intermediate step, or “bottleneck,” helps users understand the model’s inferences.

For example, a model that identifies bird species might detect concepts such as “yellow feet” or “blue wings” before predicting that the bird is a swallow.
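The two-stage structure described above can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: the concept names, dimensions, and random stand-in weights are all hypothetical, and in a real CBM both weight matrices would be learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 512-d image features, 8 named concepts, 3 species.
CONCEPTS = ["yellow feet", "blue wings", "long beak", "red crest",
            "striped tail", "white belly", "curved claws", "short neck"]
N_FEATURES, N_CONCEPTS, N_CLASSES = 512, len(CONCEPTS), 3

# Stand-in weights; in practice these are trained.
W_concept = rng.normal(0, 0.1, (N_CONCEPTS, N_FEATURES))
W_label = rng.normal(0, 0.1, (N_CLASSES, N_CONCEPTS))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbm_predict(features):
    """Two-stage CBM: features -> concept scores -> class logits.

    The final prediction depends on the features only through the
    concept scores, which is what makes the bottleneck inspectable."""
    concept_scores = sigmoid(W_concept @ features)   # in [0, 1] per concept
    class_logits = W_label @ concept_scores          # uses concepts only
    return concept_scores, class_logits

features = rng.normal(size=N_FEATURES)
scores, logits = cbm_predict(features)
for name, s in zip(CONCEPTS, scores):
    if s > 0.5:
        print(f"present: {name} ({s:.2f})")
print("predicted class:", int(np.argmax(logits)))
```

Because the classifier sees only the concept scores, a user can audit a prediction by inspecting which concepts were marked present.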

However, these concepts are often generated in advance by humans or by large language models (LLMs), so they may not suit a particular task. Furthermore, even given a predefined set of concepts, models can sometimes exploit unintended information, a problem known as information leakage.

“These models are trained to maximize performance, so the models may be secretly using concepts that we are unaware of,” De Santis explains.

Researchers at MIT had a different idea. Because the model has been trained on vast amounts of data, it has likely learned the concepts needed to generate accurate predictions for the specific task at hand. They sought to build a CBM by extracting this existing knowledge and converting it into human-understandable text.

In the first step of their method, a specialized deep learning model called a sparse autoencoder picks out the most relevant features the target model has learned and disentangles them into a set of candidate concepts. A multimodal LLM then describes each concept in plain language.
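The core mechanic of a sparse autoencoder can be illustrated with a top-k variant: only a handful of dictionary units fire for any given input, so each unit tends to correspond to a distinct, describable pattern. This is a hedged sketch with random weights and made-up sizes, not the paper's architecture; real sparse autoencoders are trained to minimize reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 512-d backbone features, 4096 dictionary units,
# at most 16 active units per input (top-k sparsity).
D_IN, D_DICT, K = 512, 4096, 16
W_enc = rng.normal(0, 0.02, (D_DICT, D_IN))
W_dec = rng.normal(0, 0.02, (D_IN, D_DICT))

def encode_topk(features):
    """Keep only the K strongest activations; zero out the rest.

    Each surviving dictionary unit is a candidate concept that a
    multimodal LLM could later describe in plain language."""
    pre = np.maximum(W_enc @ features, 0.0)   # ReLU activations
    idx = np.argsort(pre)[-K:]                # indices of the K largest
    code = np.zeros_like(pre)
    code[idx] = pre[idx]
    return code

def decode(code):
    """Reconstruct the original features from the sparse code."""
    return W_dec @ code

features = rng.normal(size=D_IN)
code = encode_topk(features)
recon = decode(code)
print("active units:", int(np.count_nonzero(code)), "of", D_DICT)
```

The sparsity is the point: a dense 4096-d code would be as opaque as the original features, while a 16-unit code gives an LLM a short list of units to name.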

The multimodal LLM also annotates the images in the dataset, identifying which concepts are present and which are absent in each image. The researchers use this annotated dataset to train a concept bottleneck module to recognize the concepts.
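Training a concept recognizer from such binary present/absent annotations amounts to fitting one binary classifier per concept. The sketch below uses per-concept logistic regression on synthetic data as a stand-in for the LLM-annotated dataset; the dimensions and training loop are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the annotated dataset: 200 images with 64-d
# features, each labeled with which of 6 concepts are present (0/1).
N, D, C = 200, 64, 6
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(C, D))
Y = (X @ W_true.T > 0).astype(float)   # synthetic "LLM annotations"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One logistic regression per concept, trained jointly by gradient
# descent on the mean cross-entropy loss.
W = np.zeros((C, D))
lr = 0.5
for _ in range(300):
    P = sigmoid(X @ W.T)               # predicted concept probabilities
    W -= lr * (P - Y).T @ X / N        # gradient step

accuracy = ((sigmoid(X @ W.T) > 0.5) == Y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Once trained, this module maps raw image features to concept scores, which is exactly the interface the bottleneck needs.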

They incorporate this module into the target model, forcing it to make predictions using only the extracted set of learned concepts.

Keeping concepts under control

They overcame many challenges in developing this method, from ensuring that the LLM-annotated concepts were accurate to determining whether the sparse autoencoder had identified concepts that humans can understand.

To prevent the model from relying on unknown or unnecessary concepts, the researchers limit it to five concepts per prediction. This forces the model to select the most relevant concepts, making its explanations easier to understand.
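The five-concept cap can be implemented as a simple top-k mask over the concept scores. This is a hedged sketch of one plausible way to enforce the constraint, with illustrative concept names; the paper's actual mechanism may differ.

```python
import numpy as np

def top_k_concepts(scores, names, k=5):
    """Zero out all but the k highest-scoring concepts.

    Capping each prediction at k concepts keeps explanations short and
    stops the model from leaning on a long tail of weak signals."""
    keep = np.argsort(scores)[-k:]            # indices of the k largest
    masked = np.zeros_like(scores)
    masked[keep] = scores[keep]
    kept_names = [names[i] for i in sorted(keep, key=lambda i: -scores[i])]
    return masked, kept_names

# Illustrative concept names and scores.
names = ["yellow feet", "blue wings", "long beak", "red crest",
         "striped tail", "white belly", "curved claws", "short neck"]
scores = np.array([0.9, 0.1, 0.7, 0.8, 0.2, 0.95, 0.6, 0.3])
masked, kept = top_k_concepts(scores, names)
print(kept)   # the five most confident concepts, strongest first
```

The masked score vector, rather than the full one, is what the final classifier would consume, so every prediction comes with at most five supporting concepts.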

When they compared their approach to state-of-the-art CBMs on tasks such as predicting bird species and identifying skin lesions in medical images, their method achieved the highest accuracy while providing clearer explanations.

Their approach also generated concepts that were more applicable to the images in the dataset.

“Although we showed that extracting concepts from the original model can outperform other CBMs, there is still a trade-off between interpretability and accuracy that needs to be addressed. Uninterpretable black-box models still perform better than our model,” De Santis says.

In the future, the researchers would like to study potential solutions to the information leakage problem, perhaps by designing the concept bottleneck module to block unwanted information from passing through. They also plan to scale up the method by using a larger multimodal LLM to annotate a larger training dataset, which may improve performance.

“I’m excited about this research because it pushes interpretable AI in a very promising direction and creates a natural bridge to symbolic AI and knowledge graphs,” says Andreas Hotho, professor and head of the data science chair at the University of Würzburg, who was not involved in the study. “Deriving concept bottlenecks from the internal mechanisms of the model itself, rather than solely from human-defined concepts, provides a path to more faithful explanations of the model and opens up many opportunities for follow-up work with structured knowledge.”

This research was supported by the Progetto Rocca Doctoral Fellowship, the Italian Ministry of Universities and Research under the National Recovery and Resilience Plan, Thales Alenia Space and the European Union under the NextGenerationEU project.


