Coordinating complicated interactive systems, whether it's the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers to tackle. Now, researchers at MIT have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep learning models.
They say the new method makes addressing these complex tasks so simple that it can be reduced to a drawing that fits on the back of a napkin.
The new approach is described in the journal Transactions on Machine Learning Research, in a paper by doctoral student Vincent Abbott and Professor Gioele Zardini of MIT's Laboratory for Information and Decision Systems (LIDS).
“We designed a new language to talk about these new systems,” Zardini says. This new diagram-based “language,” he explains, is heavily based on something called category theory.
It all has to do with designing the underlying architecture of computer algorithms, the programs that will actually end up sensing and controlling the various parts of the system being optimized. “The components are different parts of an algorithm, and they have to talk to each other, exchange information, but also account for energy usage, memory consumption, and so on,” he says. Such optimizations are notoriously difficult because a change in one part of the system can in turn cause changes in other parts, which can further affect still other parts.
The researchers decided to focus on the particular class of deep-learning algorithms, which are currently a hot topic of research. Deep learning is the basis of large artificial intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney. These models manipulate data through a long series of “deep” matrix multiplications interspersed with other operations. The numbers in the matrices are parameters, which are updated during long training runs, allowing the models to find complex patterns. Models can consist of billions of parameters, which makes the computations expensive, and hence makes improved resource usage and optimization invaluable.
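To make the “series of matrix multiplications interspersed with other operations” concrete, here is a minimal, illustrative sketch (not taken from the paper): a toy deep model whose layers are just weight matrices, with a simple nonlinearity between the multiplications. The layer sizes and function names are made up for illustration.

```python
import numpy as np

# Illustrative sketch only (not from the paper): a "deep" model as a stack of
# matrix multiplications, each followed by a simple nonlinearity (ReLU).
rng = np.random.default_rng(0)

def init_layers(sizes):
    # The entries of these matrices are the parameters updated during training.
    return [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    for W in layers:
        x = np.maximum(x @ W, 0.0)  # matrix multiply interspersed with another operation
    return x

layers = init_layers([512, 1024, 1024, 256])
y = forward(rng.standard_normal((8, 512)), layers)  # batch of 8 inputs
print(y.shape, sum(W.size for W in layers), "parameters")
```

Even this toy stack has nearly two million parameters; real models scale the same pattern up to billions, which is why resource usage dominates the cost.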
The diagrams can represent details of the parallelized operations that deep-learning models consist of, revealing the relationships between algorithms and the parallelized graphics processing unit (GPU) hardware they run on, supplied by companies such as NVIDIA. “I’m very excited about this,” Zardini says, because “we seem to have found a language that describes deep learning algorithms very well, explicitly representing all the important things, which are the operators you use.”
Much of the progress within deep learning has stemmed from optimizing resource efficiency. The latest DeepSeek model showed that a small team can compete with top models from OpenAI and other major labs by focusing on resource efficiency and the relationship between software and hardware. Typically, in deriving these optimizations, he says, “people need a lot of trial and error to discover new architectures.” For example, a widely used optimization called FlashAttention took more than four years to develop. But with the new framework they developed, “we can really approach this problem in a more formal way.” And all of this is represented visually in a precisely defined graphical language.
However, the methods used to find these improvements “are very limited,” he says. “I think this shows that there’s a major gap, in that we don’t have a formal, systematic way of relating an algorithm to its optimal execution, or even of really understanding how many resources it will take to run.” But now, with the new diagram-based method they devised, such a system exists.
Category theory, which underlies this approach, is a way of mathematically describing the different components of a system and how they interact, in a generalized, abstract manner. Different perspectives can be related: for example, mathematical formulas can be related to the algorithms that implement them and to the resources they use. Such descriptions of systems are based on what are called “monoidal string diagrams.” These visualizations let you directly play around and experiment with how the different parts connect and interact. What the researchers developed amounts to string diagrams “on steroids,” incorporating many more graphical conventions and many more properties.
“Category theory can be thought of as the mathematics of abstraction and composition,” Abbott says. “Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied.” Algebraic rules that are typically associated with functions can also be represented as diagrams, he says. “Then, a lot of the visual tricks we can do with diagrams can be related to algebraic tricks and functions. So, it creates this correspondence between these different systems.”
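As a loose analogy for this diagram-algebra correspondence (a toy illustration only, with made-up helper names `compose` and `parallel`, not notation from the paper): wiring two boxes in sequence in a string diagram corresponds to composing functions, while drawing boxes side by side corresponds to independent functions acting on independent inputs.

```python
# Toy analogy only: sequential wiring ~ function composition,
# side-by-side boxes ~ independent functions on independent wires.

def compose(f, g):
    # Sequential wiring: the output wire of f feeds the input wire of g.
    return lambda x: g(f(x))

def parallel(f, g):
    # Side-by-side boxes: f and g act on separate components of the input.
    return lambda pair: (f(pair[0]), g(pair[1]))

double = lambda x: 2 * x
inc = lambda x: x + 1

print(compose(double, inc)(10))         # 21: first double, then increment
print(parallel(double, inc)((10, 10)))  # (20, 11): independent wires
```

Rearranging such a diagram (regrouping boxes, sliding them past each other where the wiring allows) then corresponds to algebraic rewriting of the composed expression, which is the kind of manipulation the researchers formalize.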
As a result, he says, “this solves a very important problem, which is that we have these deep-learning algorithms, but they’re not clearly understood as mathematical models.” By representing them as diagrams, however, it becomes possible to approach them formally and systematically.
One thing this enables is a clear visual understanding of how the parallel processes of the real world can be represented by the parallel processing of multicore computer GPUs. “This way, a diagram can both represent a function and reveal how to best execute it on a GPU.”
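As a simplified sketch of what “revealing how to execute a function in parallel” can mean (NumPy on a CPU as a stand-in for GPU hardware; the blocking scheme here is an illustration, not the paper's method): a single matrix multiplication can be split into row blocks that are independent of one another, so each block could be assigned to a different group of GPU cores.

```python
import numpy as np

# Simplified sketch: a matrix multiply split into independent row blocks.
# On a GPU, these blocks would be processed concurrently rather than in a loop.
rng = np.random.default_rng(1)
A = rng.standard_normal((1024, 256))
B = rng.standard_normal((256, 128))

def blockwise_matmul(A, B, block_rows=256):
    blocks = []
    for start in range(0, A.shape[0], block_rows):
        # Each iteration touches a disjoint slice of A, so no block depends
        # on another block's result.
        blocks.append(A[start:start + block_rows] @ B)
    return np.vstack(blocks)

assert np.allclose(blockwise_matmul(A, B), A @ B)
```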
The “attention” algorithm is a key step in the serialized blocks that make up large language models such as ChatGPT; it is used by deep-learning algorithms that require general, contextual information. FlashAttention is an optimization that took years to develop, but resulted in a sixfold improvement in the speed of attention algorithms.
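For reference, here is the standard (“naive”) scaled dot-product attention computation in a few lines of NumPy. This is the textbook formula, not the FlashAttention optimization itself and not the paper's diagrammatic derivation; FlashAttention computes the same result while tiling and fusing the steps to reduce memory traffic.

```python
import numpy as np

# Standard scaled dot-product attention, for reference only.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted combination of values

rng = np.random.default_rng(2)
Q, K, V = (rng.standard_normal((16, 64)) for _ in range(3))
print(attention(Q, K, V).shape)  # (16, 64)
```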
Applying their method to the well-established FlashAttention algorithm, Zardini says that “here, we are able to derive it, literally, on a napkin.” He then adds, “OK, maybe it’s a large napkin.” But to drive home the point of how much their new approach can simplify dealing with these complex algorithms, they titled their formal research paper on the work “FlashAttention on a Napkin.”
This method, Abbott says, “allows for optimizations to be derived really quickly, in contrast to prevailing methods.” While they initially applied this approach to the existing FlashAttention algorithm, verifying its effectiveness, “we hope to now use this language to automate the detection of improvements,” says Zardini, who in addition to being a principal investigator in LIDS is the Rudge and Nancy Allen Assistant Professor of Civil and Environmental Engineering and an affiliate faculty member with the Institute for Data, Systems, and Society.
The plan is to ultimately develop the software to the point where “a user uploads their code, and the new algorithm automatically detects what can be improved and what can be optimized, and returns an optimized version of the algorithm to the user.”
In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage could enable systematic co-design of hardware and software. This line of work integrates with Zardini’s focus on categorical co-design, which uses the tools of category theory to simultaneously optimize the various components of engineered systems.
Abbott says, “I think this whole field of optimized deep learning is critically unaddressed, and that’s why these diagrams are so exciting. They open the doors to a systematic approach to this problem.”
“I was very impressed by the quality of this research. … The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step,” one reviewer commented, adding, “This paper is the first I have seen of such a notation being used to deeply analyze the performance of a deep-learning algorithm on real-world hardware. … The next step will be to see whether real-world performance gains can be achieved.”
“This is a beautifully executed piece of theoretical research, a quality rarely found in this type of paper,” says Petar Veličković, a lecturer at the University of Cambridge. “These researchers are clearly excellent communicators, and I can’t wait to see what they come up with next!”
The new diagram-based language, having been posted online, has already attracted significant attention and interest from software developers. A reviewer of Abbott’s earlier paper introducing the diagrams noted that “the proposed neural diagrams look great from an artistic standpoint (as far as I am able to judge this).” “It’s technical research, but it’s also flashy!” Zardini says.