Bringing AI to the Edge

This year, U.S. rail carrier Amtrak will be installing two novel inspection gateways from Duos Technologies along its busy Northeast Corridor. The barn-like Duos structures straddle railway tracks; as passenger trains speed through at up to 125 miles per hour, 97 cameras and dozens of LED lights arrayed around the sides, top, and bottom of the tracks will capture thousands of high-resolution images of the railcars. These images are aggregated and processed on site in real time to present a complete, 360-degree, highly detailed view of the train. Artificial intelligence (AI) algorithms running on Nvidia GPUs will analyze the images locally; if the model flags a potential structural or mechanical flaw, train personnel will be notified in less than a minute.

The Duos portal is one of many new examples of what is loosely categorized as edge AI, or the deployment and operation of AI models outside of massive cloud datacenters.

The precise definition of what constitutes an edge varies. “There’s a spectrum, from telecommunications points of presence in major cities to smartwatches, smart home devices, and Meta Ray-Bans,” said Shishir G. Patil, a Ph.D. student in computer science at the University of California, Berkeley. “They all come under this pretty broad category of edge devices.”

Operating AI models at the edge is challenging for a number of reasons. Typically, there is less computational capacity available, relative to the cloud. The power demands of AI models are much larger than those of traditional applications, which puts tremendous pressure on local hardware and forces mobile devices to exhaust their battery power faster. Yet the move to the edge also reduces latency and eliminates the risk of inconsistent or unreliable bandwidth because there is no round trip to distant cloud datacenters. Purpose-built edge AI processors, like the on-device versions from Qualcomm or those from Hailo Inc., allow for real-time intelligent decision making, and there are multiple privacy and security benefits to edge processing. These and other factors have everyone from academic computer scientists to technology giants racing to develop more efficient means of pulling AI out of the cloud and closer to users.

There is computational capacity available all the way from local devices to the cloud, explained distributed AI researcher Lauri Lovén at the University of Oulu in Finland. Exactly where an AI model operates along this edge-cloud continuum depends in part on the use case. An autonomous vehicle that has to make a rapid, real-time traffic decision is better off eliminating the cloud latency and generating that result onboard. Conversely, a consumer photo-editing application powered by AI does not necessarily need to run on the user’s personal device—a latency-induced lag resulting from spotty bandwidth would be perfectly tolerable.

Raghubir Singh, an assistant professor of computer science at the U.K.’s University of Bath, suggests the nature of the given problem may determine where, along the continuum, a task is computed. “There will be a tradeoff. Some problems you will be able to solve locally using edge AI, while others will need GPUs with a lot more processing power,” he said. “Think of it like a primary school. You have a classroom teacher who can answer most of your questions, but then maybe there’s one that needs to be solved by someone outside the classroom.” A more complex problem or task could be pushed to a more robust AI model in the cloud for resolution.
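Singh's "classroom teacher" analogy can be sketched as a simple escalation policy: try the small local model first, and send the task to a larger cloud model only when the local answer is not confident enough. This is an illustrative sketch, not a system described in the article; all function names, return values, and the threshold are hypothetical stand-ins.

```python
# Illustrative sketch of local-first escalation (all names hypothetical).

def run_local(task):
    # Stand-in for a small on-device model: returns (answer, confidence).
    return ("local answer", 0.62)

def run_cloud(task):
    # Stand-in for a remote call to a larger, more capable cloud model.
    return "cloud answer"

def solve(task, confidence_threshold=0.8):
    answer, confidence = run_local(task)
    if confidence >= confidence_threshold:
        return answer       # the "classroom teacher" handled it locally
    return run_cloud(task)  # escalate the harder question to the cloud
```

In practice the escalation signal need not be a confidence score; it could be task type, input size, or battery state, but the structure of the tradeoff is the same.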

Singh cites security and privacy as equally important variables. If a patient visits a local health center for a checkup, and machine learning algorithms can process data collected during the exam within that facility, then privacy concerns are minimized. “If that patient data is stored and assessed locally, and not going to the cloud, that sensitive information remains secure,” he explained.

Economics have become increasingly important as well. A personalized AI agent that understands your preferences might be too expensive to operate in the cloud, noted Berkeley’s Patil, because this would require reserving high-end cloud compute capacity just for the individual user, who would have to pay the compute costs to keep the model ready even when it is not in use. On the other hand, the costs of a general AI model maintained in the cloud for mass consumption can be distributed more efficiently across all its users. As a result, Patil and his colleagues are developing techniques to shrink, fine-tune, and personalize models so they can run on consumer hardware like smartphones and other edge devices.

The cost of operating the models in the cloud is also incentivizing AI leaders and cloud giants to push more tasks out to the edge, explained Patil. As advanced AI models grow larger, they consume more electricity and drive up the price of each inference. The most advanced large language models (LLMs) are one obvious example, according to Patil. “LLM inference today is super expensive,” he said. “It’s in the tens of cents per inference, if not higher, especially for the bigger models. Basically, the cloud providers are going to be bleeding money if they give away inferences for free.” But if more AI tools operate on edge devices like smartphones, the cloud provider will assume less of the electricity bill. Instead, the user will pay that cost by charging their phones more often.
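The scale of the problem Patil describes is easy to see with back-of-envelope arithmetic. The figures below are hypothetical placeholders chosen only to illustrate the shape of the calculation, not measurements from the article.

```python
# Back-of-envelope sketch of the provider-side cost of "free" inference.
# All figures are hypothetical placeholders.

def daily_provider_cost(cost_per_inference, inferences_per_user, users):
    # What a cloud provider absorbs per day if inference is given away free.
    return cost_per_inference * inferences_per_user * users

# e.g. $0.10 per inference, 20 inferences per user per day, 1M users:
cost = daily_provider_cost(0.10, 20, 1_000_000)  # roughly $2 million/day
```

Shifting those inferences onto users' phones moves that line item off the provider's electricity bill, at the price of faster battery drain on the device.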

The latest smartphones—considered the extreme edge—are now being optimized for edge AI. The recently released Pixel phones incorporate new Google Tensor G4 chips designed specifically for AI workloads. The models running on these phones are also shrinking. When Apple announced its Apple Intelligence features, the company noted that the language model running on the device would have roughly 3 billion parameters, compared to cloud-based models that run to hundreds of billions or even a trillion-plus parameters. Similarly, Meta’s Llama 3.2 release included lightweight models with 1 billion and 3 billion parameters designed to run on devices.
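The parameter counts above translate directly into memory footprints, which is why the on-device models must be so much smaller. The sketch below shows the arithmetic for the weights alone; the tiers and bit widths are illustrative, and real deployments add further overhead (activations, runtime, KV caches).

```python
# Rough sketch: memory needed just to hold a model's weights.

def weight_memory_gb(params_billions, bits_per_param):
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes

# A ~3B-parameter on-device model vs. a ~300B-parameter cloud model,
# both stored at 16 bits per weight:
on_device = weight_memory_gb(3, 16)    # ~6 GB
cloud = weight_memory_gb(300, 16)      # ~600 GB
# 4-bit quantization, common for on-device deployment, shrinks it further:
quantized = weight_memory_gb(3, 4)     # ~1.5 GB
```

A few gigabytes fits in a flagship phone's RAM; hundreds of gigabytes requires racks of cloud GPUs, which is the gap the lightweight Apple and Meta models are designed to close.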

These smaller, purpose-built models are far more suitable to the edge. An advanced generative text-to-video model like OpenAI’s Sora requires significant cloud compute capacity, but simpler AI-enhanced tasks like transforming text to speech could be done with lightweight, efficient models. “You don’t need a Lamborghini to cross the road,” said Patil.

Electricity consumption is not just an economics problem, but an environmental one. Suzan Bayhan, an associate professor in the Faculty of Electrical Engineering, Mathematics and Computer Science at the University of Twente in the Netherlands, points to the sustainability of AI-specific edge devices or edge accelerators—technologies optimized for edge AI processing. “Deploying smarter systems means you want to collect, process, and act on data, and you want this computation to be closer to the user,” she explained. “But if you are running this computation, especially some sort of AI model, on devices that were not actually optimized for these kinds of workloads, they will use a lot of energy.”

The creation of new and more efficient devices could alleviate that problem, she noted, yet this will drive up consumption of raw materials and could lead to a different kind of sustainability problem because of newly outdated, discarded devices. Plus, even though AI models are driving up electricity consumption in datacenters—Goldman Sachs estimates a 160% increase by 2030—the cloud still allows for the potential to control the problem. “When you have cloud or more centralized resources, then you are benefiting from the efficiency of sharing among different parties,” Bayhan explained, “but if we move to devices with edge accelerators, we are limiting this benefit of economies of scale and multi-tenant operation.”

Not all applications will move out to that extreme edge, and researchers are working on novel methods of determining where different tasks might be completed or even how to break down and spread out different tasks along the edge-cloud continuum. “We want to create a platform that allows you to distribute applications within this compute continuum and dynamically optimize for, say, latency, computational capacity, money, or resource usage,” said the University of Oulu’s Lovén. “Based on the situation, the platform would rebalance or readjust the distribution of components.”
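One way to picture the kind of dynamic placement Lovén describes is as a scoring problem: rate each tier of the continuum against an application's weighted priorities and place the component on the best-scoring tier. The tiers, metrics, and weights below are invented for illustration; a real platform would use live measurements and rebalance continuously.

```python
# Illustrative sketch of placement along the edge-cloud continuum.
# Tier characteristics are invented: (latency_ms, cost_per_hour, compute 0..1).
TIERS = {
    "device": (5, 0.00, 0.2),
    "edge":   (20, 0.05, 0.6),
    "cloud":  (80, 0.02, 1.0),
}

def place(weights):
    # weights: importance of (low latency, low cost, high compute capacity)
    w_lat, w_cost, w_comp = weights
    def score(tier):
        lat, cost, comp = TIERS[tier]
        # Lower score is better: penalize latency and cost, reward compute.
        return w_lat * lat + w_cost * cost * 100 - w_comp * comp * 100
    return min(TIERS, key=score)

place((1.0, 0.1, 0.1))   # latency-critical task lands on "device"
place((0.01, 0.1, 1.0))  # compute-hungry task lands in the "cloud"
```

Re-running the scoring as conditions change (battery level, network quality, spot prices) is what would "rebalance or readjust the distribution of components" in Lovén's terms.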

This decision of where the processing takes place could also be left to individuals, said Bayhan. Her research group is developing methods that would give control to the users, so they can decide where the computation is done based on their own particular needs or preferences. A user could prioritize sustainability, fast performance, or personal security and privacy. “For instance,” Bayhan explained, “if it is my health data, I may not want this computation to be done in the cloud, but closer to my home or on my personal device, for instance, my smartwatch.”

Although Singh notes there is some hype surrounding edge AI at the moment, he suspects the combination of proliferating smart systems and devices and continued advances in AI will lead to major edge AI contributions and breakthroughs in the next five to 10 years. “There are so many research avenues with edge AI,” said Singh. “It’s such an interesting field, and I hope this work will bring a lot of benefits and advantages and allow more people to utilize edge AI models.”

Further Reading

  • Meuser, T., Lovén, L., Bhuyan, M., Patil, S., Dustdar, S., Aral, A., et al.
    Revisiting Edge AI: Opportunities and Challenges, IEEE Internet Computing 28, No. 4 (2024): 49–59.
  • Singh, R. and Gill, S.S.
    Edge AI: A survey, Internet of Things and Cyber-Physical Systems 3 (2023): 71–92.
  • Gunter, T., Wang, Z., Wang, C., Pang, R., et al.
    Apple Intelligence Foundation Language Models, arXiv preprint arXiv:2407.21075 (2024).
  • Zhou, Z., Chen, X., Li, E., Zeng, L., Luo, K., and Zhang, J.
    Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing, Proceedings of the IEEE 107, No. 8 (2019): 1738–1762.
  • Arulraj, J., Chatterjee, A., Daglis, A., Dhekne, A., and Ramachandran, U.
    eCloud: A Vision for the Evolution of the Edge-Cloud Continuum, Computer 54, No. 5 (2021): 24–33.