
Across industries, rising computing costs are often cited as a barrier to AI adoption, but leading companies are finding that cost is no longer the real constraint. The tougher challenges, and the most important ones for many technology leaders, are latency, flexibility and capacity. At Wonder, for example, AI adds just a few cents to each order, so the food delivery and takeout company is focused less on cost and more on cloud capacity as demand surges. Recursion, for its part, balances small- and large-scale training and deployment across on-premises clusters and the cloud, an approach that has given the biotech company the flexibility to experiment rapidly. The two companies' real-world experiences highlight a broader industry trend: for enterprises operating AI at scale, economics is no longer the deciding factor. The discussion has shifted from how to pay for AI to how quickly it can be deployed and sustained. AI leaders from both companies recently sat down with VentureBeat CEO and editor-in-chief Matt Marshall as part of VB's traveling AI Impact series. Here's what they shared:
Wonder: Rethink what you assume about capacity
Wonder uses AI to power everything from recommendations to logistics, yet CTO James Chen says the extra cost of AI is, for now, only a few cents per order. Chen explained that the technology behind ordering a meal costs about 14 cents, with AI adding another 2 to 3 cents, although that AI share has "gone up very quickly" toward 5 to 8 cents. Still, it remains almost insignificant compared to total operating costs. Instead, the primary concern for the 100% cloud-native company is capacity as demand increases. Wonder was built on the assumption, which turned out to be false, that it had "unlimited capacity," letting it move "blazing fast" without having to worry about managing infrastructure, Chen said. But the company has grown considerably in recent years, and as a result, about six months ago, "we started slowly getting signals from our cloud providers saying, 'Maybe we should consider moving to Region 2.'" Their facilities were running out of CPU and data storage capacity as demand increased. It was "very shocking" to have to move to Plan B sooner than expected. "It's clear that spanning multiple regions is good practice, but we thought it was probably two years away," Chen said.
Hyper-customized models: economically unfeasible (yet)
Wonder has built a custom model to maximize conversion rates, Chen said. The goal is to surface new restaurants to as many relevant customers as possible. These are "individual scenarios" in which the model is trained over time to be "very efficient and very fast." For now, Chen says, large models are the best fit for Wonder's use cases. In the long term, though, the company wants to move to smaller models that are hyper-customized to individuals based on their purchase history and clickstream (via AI agents and concierges). "It would definitely be great to have these micro models, but the cost is prohibitive at the moment," Chen said. "If we tried to make one for each person, it wouldn't be economically viable."
Budgeting is an art, not a science
Wonder gives its developers and data scientists as much room as possible to experiment, while internal teams review usage costs to ensure that no one turns on a model and "runs up a huge amount of compute at a huge cost," Chen said. The company is exploring various ways to offload work to AI while operating within its margins. "But budgeting is very difficult because you don't know anything," he said. One challenge is the pace of development: once a new model is released, "we can't just sit around, right? We have to use it." Budgeting for the unknown economics of a token-based system is "definitely an art versus a science." A key element of the software development lifecycle, he explained, is preserving context when working with large models. Once you find something that works, you can add it to your company's "corpus of context" and send it along with every request; because that corpus is large, it costs money every time. "More than 50%, up to 80% of the cost is just resubmitting the same information to the same engine every time you make a request," Chen said. In theory, the more usage you have, the lower the unit cost should be. "I know I'm going to pay X cents in taxes on every transaction that happens, but I don't want that to limit all my other creative ideas."
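Chen's point about resubmitted context dominating spend can be made concrete with a little arithmetic. The sketch below uses entirely hypothetical token counts and per-token prices (not Wonder's actual figures) to show how a large standing context, resent with every request, can account for well over half the per-request cost:

```python
# Hypothetical illustration: a large shared "corpus of context" resent with
# every request dominates per-request cost. All numbers are made-up
# assumptions, not Wonder's actual token counts or prices.

def request_cost(context_tokens, prompt_tokens, output_tokens,
                 input_price_per_1k=0.003, output_price_per_1k=0.015):
    """Cost of one request when the shared context is resubmitted each time."""
    input_cost = (context_tokens + prompt_tokens) / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost

# A 40k-token standing context vs. a small per-request prompt and response.
total = request_cost(context_tokens=40_000, prompt_tokens=500, output_tokens=800)
context_only = request_cost(context_tokens=40_000, prompt_tokens=0, output_tokens=0)
share = context_only / total

print(f"per-request cost ${total:.4f}; context share of cost {share:.0%}")
```

Under these assumed numbers, the context alone is roughly 90% of each request's cost, in line with the 50-80%-plus range Chen describes; provider-side prompt caching, where available, is the usual way to claw that back.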
Recursion’s “vindication moment”
Recursion meets its wide range of computing needs through a hybrid infrastructure of on-premises clusters and cloud inference. When the company first set out to build its AI infrastructure, it had to adopt a custom setup because "cloud providers didn't have a lot of great offerings," explained CTO Ben Mabey. "The vindication moment was that we needed more compute, and we went to our cloud provider and they said, 'Maybe in a year or so.'" The company's first cluster, in 2017, included Nvidia gaming GPUs (the 1080 series, released in 2016); it has since added Nvidia H100s and A100s, and runs Kubernetes clusters both in the cloud and on-premises. On the question of longevity, Mabey said: "These gaming GPUs are actually still in use today, which is crazy, right? The myth is that GPUs only last three years, but that's simply not the case. The A100 is still at the top of the list and an industry workhorse."
Best use cases for on-premises vs. cloud, and the difference in cost
These days, Mabey's team trains a foundation model on Recursion's image repository, which comprises petabytes of data and over 200 photos. Large training jobs like this require "big clusters" with connected, multi-node setups. "If you need a fully connected network and access to large amounts of data in a highly parallel file system, use on-premises," he explained. Shorter workloads, meanwhile, run in the cloud. Recursion's other technique is to use preemptible GPUs and Google Tensor Processing Units (TPUs): instances the cloud provider can interrupt mid-task to serve a higher-priority job, in exchange for a lower price. "Some of the inference workloads where you upload biological data, like images or sequence data or DNA data, you don't care about speed," Mabey explained. "I can say, 'Give me this in an hour,' and I don't care if the job gets killed." From a cost perspective, moving large workloads on-premises is 10 times cheaper "to say the least," Mabey said; measured as total cost of ownership over five years, it is half the cost. On the other hand, if storage needs are small, the cloud can be "quite competitive" on cost. Ultimately, Mabey urged technology leaders to step back and decide how serious they really are about AI, because cost-effective approaches typically require multi-year buy-in. "From a psychological perspective, we have seen peers not invest in compute, and as a result they're always paying on demand," Mabey said. "Their teams don't want to run up the cloud bill, so they use significantly less compute. Innovation really is hampered by people not wanting to burn money."
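The preemptible-instance pattern Mabey describes only works if a job can survive being killed at any moment. A minimal sketch of that idea, with hypothetical function names and a simple JSON progress file (not Recursion's actual tooling), is to checkpoint after every batch so a preemption costs at most one batch of rework:

```python
# Hypothetical sketch: a batch inference job made safe for preemptible
# instances by persisting progress after every batch. Names and the JSON
# checkpoint format are illustrative assumptions, not Recursion's system.
import json
import os

def load_done(path):
    """Return the set of batch IDs already completed before any preemption."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def save_done(done, path):
    """Persist completed batch IDs so a restarted job can skip them."""
    with open(path, "w") as f:
        json.dump(sorted(done), f)

def run(batches, infer, path="progress.json"):
    """Process (batch_id, batch) pairs, skipping work finished earlier.

    If the instance is preempted mid-run, rerunning with the same path
    resumes from the last saved checkpoint, losing at most one batch.
    """
    done = load_done(path)
    for batch_id, batch in batches:
        if batch_id in done:
            continue  # completed before a previous preemption
        infer(batch)
        done.add(batch_id)
        save_done(done, path)  # checkpoint after every batch
    return done
```

For latency-insensitive workloads like the biological-data inference Mabey mentions, this trade — occasional repeated batches in exchange for steeply discounted compute — is usually an easy win.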
