Guided learning allows ‘untrainable’ neural networks to realize their potential | Massachusetts Institute of Technology News



Networks long thought to be “untrainable” can be effectively trained with a little help. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown that short-term adjustments between neural networks, a method they call guidance, can dramatically improve the performance of architectures previously thought unsuitable for modern tasks.

Their findings suggest that many so-called “inefficient” networks may simply begin from poor starting points, and that brief guidance can move a network into a region of parameter space where learning is easier.

The team’s guidance technique works by encouraging the target network to match the internal representations of a guide network during training. Unlike traditional methods such as knowledge distillation, which focus on imitating a teacher’s outputs, guidance transfers structural knowledge directly from one network to another. Rather than simply copying the guide’s behavior, the target learns how the guide organizes information within each layer. Notably, even untrained guides carry architectural biases that can be transferred, while trained guides additionally convey learned patterns.
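The article does not specify which representational-similarity measure the guidance loss uses; one widely used choice in this literature is linear centered kernel alignment (CKA), which scores how similarly two layers organize information regardless of their widths. The NumPy sketch below is an illustration of that idea, not the authors’ exact implementation:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1) activations from one network's layer.
    Y: (n_samples, d2) activations from another network's layer.
    Returns a value in [0, 1]; 1 means the representations are identical
    up to rotation and isotropic scaling of the feature axes.
    """
    # Center each feature dimension across the sample batch
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den
```

Because the score is invariant to rotations of the feature axes, it compares how layers organize information rather than the raw numbers in their activations, which is what makes it usable between two differently sized networks.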

“We found these results to be quite surprising,” says CSAIL researcher Vighnesh Subramaniam ’23, MEng ’24, a doctoral student in the MIT Department of Electrical Engineering and Computer Science (EECS) and lead author of the paper presenting these results. “It’s impressive that we were able to use representational similarities to actually make a traditionally ‘scrappy’ network work.”

A guiding hand

The central question was whether guidance must continue throughout training, or whether its main effect is to provide a better initialization. To investigate, the researchers ran an experiment with deep fully connected networks (FCNs). Before training on a real task, the target network spent a few steps matching a guide network on random-noise inputs, like stretching before exercise. The results were striking: networks that would normally overfit quickly instead remained stable with low training loss, avoiding the classic performance degradation seen in a standard FCN. The brief alignment acted like a beneficial warm-up, showing that even a short practice session can yield lasting benefits without the need for ongoing guidance.
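The warm-up procedure described above can be sketched in miniature. Everything here is a hypothetical toy setup, not the paper’s configuration: a one-hidden-layer “target” with a deliberately oversized weight scale stands in for a poorly initialized network, and a few gradient steps pull its hidden activations toward those of a well-scaled, frozen “guide,” using only random noise as input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative sizes): one hidden layer each for a "target"
# and a frozen "guide" network.
d_in, d_hidden = 8, 16
W_target = rng.normal(scale=1.0, size=(d_in, d_hidden))                 # poor (oversized) init
W_guide = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_hidden))  # well-scaled init

def hidden(W, x):
    """Hidden-layer activations of a one-layer tanh network."""
    return np.tanh(x @ W)

# Measure the representation mismatch on a fixed evaluation batch
x_eval = rng.normal(size=(128, d_in))
err_before = np.mean((hidden(W_target, x_eval) - hidden(W_guide, x_eval)) ** 2)

# Guidance warm-up: gradient steps pulling the target's hidden activations
# toward the guide's, on random-noise inputs -- no task data, no labels.
# Afterward the guide is discarded and normal training would begin.
lr = 0.1
for step in range(1000):
    x = rng.normal(size=(32, d_in))
    h_t, h_g = hidden(W_target, x), hidden(W_guide, x)
    # Gradient of 0.5 * sum over units of mean((h_t - h_g)^2) w.r.t. W_target
    grad = x.T @ ((h_t - h_g) * (1.0 - h_t ** 2)) / x.shape[0]
    W_target -= lr * grad

err_after = np.mean((hidden(W_target, x_eval) - hidden(W_guide, x_eval)) ** 2)
```

After the warm-up the target’s representations sit closer to the guide’s, which is the sense in which a brief guided phase acts as a better initialization for subsequent training.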

The study also compared guidance with knowledge distillation, a common approach in which a student network imitates a teacher’s outputs. When the teacher network was untrained, distillation failed completely because its outputs carried no meaningful signal. Guidance, by contrast, leverages internal representations rather than final predictions, and so still produced strong improvements. This highlights an important insight: untrained networks already encode valuable architectural biases that can steer other networks toward effective learning.
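To make the contrast concrete, here is a minimal NumPy sketch of the two kinds of training signal: output imitation (distillation) versus representation matching (guidance). The specific loss functions are common textbook choices, not necessarily those used in the paper:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the standard max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the teacher's softened output distribution to the
    student's. This signal is informative only if the teacher's outputs are
    meaningful -- an untrained teacher produces near-arbitrary distributions."""
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

def guidance_loss(student_hidden, guide_hidden):
    """Match internal representations directly; even an untrained guide's
    hidden activations carry its architectural biases."""
    return float(np.mean((student_hidden - guide_hidden) ** 2))
```

The distillation objective only sees the teacher’s final predictions, while the guidance objective operates layer by layer on hidden activations, which is why the latter can still extract useful structure from an untrained guide.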

Beyond the experimental results, the discovery has far-reaching implications for how we understand neural network architectures. The researchers suggest that a network’s success or failure often depends more on its position in parameter space than on task-specific data. With a guide network, the contribution of architectural biases can be separated from that of learned knowledge, letting scientists identify which features of a network’s design support effective learning and which failures stem simply from poor initialization.

Guidance also opens new avenues for studying relationships between architectures. By measuring how easily one network can guide another, researchers can gauge how far apart two designs are functionally and revisit neural network optimization theory. Because the method relies on representational similarity, it can reveal previously hidden structure in network designs and help identify which components contribute most to learning and which do not.

Rescuing lost causes

Ultimately, this study shows that so-called “untrainable” networks are not inherently doomed. Guidance helps eliminate failure modes, avoid overfitting, and bring previously inefficient architectures up to modern performance standards. The CSAIL team plans to investigate which architectural elements contribute most to these improvements and how the insights might influence future network designs. By revealing hidden potential in even the most stubborn networks, guidance offers a powerful new tool for understanding, and perhaps shaping, the fundamentals of machine learning.

“It is generally believed that different neural network architectures have certain advantages and disadvantages,” says Leyla Isik, assistant professor of cognitive science at Johns Hopkins University, who was not involved in the study. “This exciting work shows that one type of network can inherit the advantages of another architecture without losing its original functionality. Remarkably, the authors show that this can be done using small, untrained ‘guide’ networks. The paper introduces a novel and concrete method for adding different inductive biases to neural networks, which is important for developing more efficient and human-aligned AI.”

Subramaniam co-authored the paper with CSAIL research scientist Brian Cheung; PhD student David Mayo ’18, MEng ’19; and researcher Colin Conwell. The senior authors are CSAIL principal investigator Boris Katz, MIT Professor of Brain and Cognitive Sciences Tomaso Poggio, and former CSAIL research scientist Andrei Barbu. Their research was supported in part by the Center for Brains, Minds, and Machines, the National Science Foundation, the MIT CSAIL Machine Learning Applications Initiative, the MIT-IBM Watson AI Lab, the U.S. Defense Advanced Research Projects Agency (DARPA), the U.S. Department of the Air Force Artificial Intelligence Accelerator, and the U.S. Air Force Office of Scientific Research.

Their research was recently presented at the Conference and Workshop on Neural Information Processing Systems (NeurIPS).


