The huge amount of energy required to train artificial intelligence (AI) models is becoming a concern.
To train GPT-3, the large language model (LLM) behind ChatGPT, for example, almost 1,300 megawatt-hours of energy was used, according to an estimate by researchers from Google and the University of California, Berkeley; that is roughly as much energy as 130 American homes use in a year.
Furthermore, an analysis by OpenAI suggests the amount of computing power needed to train AI models has been growing exponentially since 2012, doubling roughly every 3.4 months as the models become bigger and more sophisticated. Energy production capacity is not growing nearly as steeply, however, and ramping it up would likely add to global warming: electricity generation is the single biggest contributor to climate change, because coal, oil, and gas are still used far more widely than cleaner energy sources.
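To put that doubling rate in perspective, a quick back-of-the-envelope calculation (illustrative only, not part of OpenAI's analysis) shows what a 3.4-month doubling time implies for annual growth:

```python
# Rough arithmetic: a 3.4-month doubling time compounds into roughly an
# order-of-magnitude increase in training compute every year.
doubling_time_months = 3.4
growth_per_year = 2 ** (12 / doubling_time_months)
print(f"~{growth_per_year:.1f}x more training compute per year")  # about 11.5x
```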
“At this rate, we are running into a brick wall in terms of the ability to scale up machine learning networks,” said Menachem Stern, a theoretical physicist at the AMOLF research institute in the Netherlands.
Machine learning models such as LLMs typically are trained on vast datasets for weeks or even months using power-hungry graphics processing units (GPUs), the state-of-the-art hardware for the task. Originally developed for rendering graphics and popularized by computer chip company Nvidia, GPUs can perform many calculations at the same time through parallel processing. Training a machine learning model involves complex mathematical operations as millions of parameters are adjusted to capture patterns in the data, so GPUs can speed up the process significantly compared with conventional central processing units (CPUs), which largely process data sequentially.
In particular, Nvidia’s GPUs have become the go-to choice for AI training, since they are optimized for the task and their software makes them easy to use. The company has about 95% of the market for machine learning, according to a recent report by market intelligence company CB Insights. ChatGPT was trained using 10,000 Nvidia GPUs clustered together in a supercomputer, for example.
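As a rough illustration of why that parallelism matters for training, the sketch below (not from the article) times the same large matrix multiplication, the workhorse operation of neural-network training, on a CPU and then on a GPU. It assumes PyTorch is installed and a CUDA-capable GPU is available; the exact speedup will depend on the hardware.

```python
# Illustrative timing of one matrix multiplication on CPU vs. GPU.
import time
import torch

N = 4096
a = torch.randn(N, N)
b = torch.randn(N, N)

# CPU: the multiply is executed with limited parallelism.
start = time.time()
c_cpu = a @ b
cpu_time = time.time() - start

# GPU: thousands of cores work on the multiply in parallel.
a_gpu, b_gpu = a.cuda(), b.cuda()
torch.cuda.synchronize()          # wait for the copies to finish
start = time.time()
c_gpu = a_gpu @ b_gpu
torch.cuda.synchronize()          # wait for the kernel to finish
gpu_time = time.time() - start

print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
```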
However, lower-energy alternatives to GPUs are now being sought out to reduce the energy footprint of AI training. One of them involves creating a new type of machine called a neuromorphic computer, which mimics certain aspects of how the human brain works.
Like GPUs, the human brain can process multiple sources of information at the same time. However, it is far more energy-efficient, performing a billion-billion mathematical operations per second (an exaflop) on just 20 watts of power. By comparison, one of the world's most powerful supercomputers, used by the U.S. Department of Energy and containing over 37,000 GPUs, requires about 20 megawatts, roughly a million times more power, to achieve the same feat, as reported in the journal Science.
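The gap is straightforward to quantify from the figures quoted above. The sketch below takes the reported 20 watts for the brain and roughly 20 megawatts for a GPU-based exascale machine and compares the energy each would use to sustain an exaflop for one hour (illustrative arithmetic only):

```python
# Energy to sustain roughly one exaflop for one hour, using the power
# figures quoted above (20 W for the brain, ~20 MW for the supercomputer).
brain_power_w = 20
supercomputer_power_w = 20e6

hours = 1
brain_energy_kwh = brain_power_w * hours / 1000                   # 0.02 kWh
supercomputer_energy_kwh = supercomputer_power_w * hours / 1000   # 20,000 kWh

print(f"Brain: {brain_energy_kwh} kWh, supercomputer: {supercomputer_energy_kwh:,.0f} kWh")
print(f"Power ratio: {supercomputer_power_w / brain_power_w:,.0f}x")  # 1,000,000x
```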
The human brain uses several tactics to save power. Conventional computers represent information digitally with binary 0s and 1s, which consumes energy every time a value is flipped. The brain, by contrast, often uses analog signals; neurons, for example, transmit information over a continuous range of voltages, which consumes less energy. Furthermore, memory and computation take place in the same location in the brain, which saves energy compared with today's computers, where they occur in separate locations.
“With the information and the computation in the same place, there’s no need to shuttle information between them,” said Stern. “In many standard computers, this is what dominates energy consumption.”
In recent work, Stern and his colleagues at the University of Pennsylvania developed a prototype of a neuromorphic computer in the form of a circuit that sits on breadboards connected together with wires. Their current design is large, measuring about a meter by half a meter, and contains just 32 variable resistors, which are the learning elements. What distinguishes it from similar approaches is that learning happens within the system itself, whereas other designs typically offload training to a silicon-chip computer and only rely on neuromorphic hardware during use.
“Our neuromorphic computer can improve energy consumption during learning, not only during use,” said Stern.
At present, the power used by each learning element in their neuromorphic design is comparable to the amount consumed by each parameter of one of the most energy-efficient supercomputers, known as Henri. However, the system should show a clear advantage in energy efficiency as it is scaled up to include more resistors, and hence more computing power, said Samuel Dillavou, Stern's colleague at the University of Pennsylvania. GPUs expend energy on every operation, so performing more computations per second also drives up their power draw. The energy consumption of analog approaches like theirs, by contrast, depends mainly on how long the system is switched on: if it runs three times as fast, it also becomes three times more energy-efficient.
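The difference can be captured in a toy energy model (an illustration with arbitrary numbers, not the researchers' measurements): a digital processor pays a fixed energy cost per operation, so the total for a given workload does not change with speed, whereas an analog system pays for elapsed time, so finishing faster costs proportionally less energy.

```python
# Toy model of the two energy behaviors described above (arbitrary numbers).

def digital_energy_joules(num_ops, joules_per_op):
    # Digital hardware spends energy on every operation: for a fixed
    # workload, total energy is the same no matter how fast it runs.
    return num_ops * joules_per_op

def analog_energy_joules(power_watts, runtime_seconds):
    # An always-on analog system spends energy for as long as it runs:
    # completing the same workload faster uses proportionally less energy.
    return power_watts * runtime_seconds

workload_ops = 1e12
print(digital_energy_joules(workload_ops, 1e-11))   # 10 J, regardless of speed
print(analog_energy_joules(5.0, 300))               # 1500 J in 300 s
print(analog_energy_joules(5.0, 100))               # 500 J if run 3x faster
```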
Doing away with digitization could be a disadvantage of neuromorphic computing, though. Analog signals are much noisier than digital ones, which can make them ill-suited to applications requiring a high degree of precision. Stern does not think that is much of a concern for machine learning, however. Many tasks that algorithms are trained to perform, such as image recognition, have an accuracy level that is considered acceptable for realistic results, often between 70% and 90%.
However, programming neuromorphic computers is likely to be a challenge. With conventional computers, the hardware and software are separate components, but the two are intertwined in neuromorphic designs. Those designs can also take on very different physical forms, for example when they are incorporated into smart materials ranging from programmable clay to elastic substances.
“Every candidate for a neuromorphic computer requires thinking from scratch how you would implement learning in it, and that is a really hard problem,” said Stern. “The people who are going to program these machines would have to know much more about them than a person who’s writing computer programs (for conventional machines).”
Another emerging technology that could compete with GPUs is optical computing, which transmits information using light waves rather than electrons as in traditional computers. Using particles of light, called photons, also allows large amounts of data to be processed simultaneously, and it brings several advantages. Optical signals travel at close to the speed of light, faster than electrical ones, and can carry data over a wide range of frequencies, allowing for faster computation. And while electrons encounter resistance when moving through materials, which results in heat and energy loss, photons can move freely.
“Photonic circuit approaches are inherently very low-power,” said Steve Klinger, vice president of Product at Lightmatter, a computer hardware company in Mountain View, CA.
In theory, this means computers that rely solely on light would be more energy-efficient than conventional computers during use. However, since building them would require a complete overhaul of existing technology, approaches that integrate optical components into silicon chips are currently the more commercially viable option.
Klinger and his colleagues at Lightmatter, who are taking this hybrid approach, are developing two solutions that focus on using light for computation-heavy processing. During AI training, for example, a great deal of communication takes place between different processing elements, consuming much of the available bandwidth, the amount of data that can be transmitted in a given amount of time. With bandwidth as the bottleneck, many compute elements often sit idle waiting for data.
One of Lightmatter's products, called Passage, harnesses the properties of light to link up different processors so information can be sent between them more efficiently. It is expected to boost bandwidth by a factor of 10, with the goal of a 100-fold increase within five years. The company is also working on another light-based component, called Envise, designed to take over the mathematical operations, called matrix multiplications, that GPUs perform when a model is being trained. Using photonic circuits should significantly reduce the energy consumption of AI training.
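For a sense of the operation Envise targets, the sketch below (a generic example, not Lightmatter's interface) shows that the forward pass of a single dense neural-network layer reduces to one large matrix multiplication, the kind of computation a photonic accelerator would carry out in the optical domain:

```python
# A dense layer's forward pass is a matrix multiplication, the operation
# that dominates neural-network training and that photonic hardware targets.
import numpy as np

batch, d_in, d_out = 32, 1024, 4096
x = np.random.randn(batch, d_in)   # a batch of input activations
w = np.random.randn(d_in, d_out)   # the layer's trainable weights

y = x @ w                          # the matrix multiplication itself
print(y.shape)                     # (32, 4096)
```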
“You’re saving a whole lot of power just by making the available compute much more efficient, requiring fewer overall compute elements to achieve a certain level of performance,” said Klinger.
Lightmatter is currently looking to partner with silicon chip suppliers and foresees its products being used in datacenters to scale up the performance of AI training. One of the challenges the company faces is meeting the density and size requirements of datacenter chips, since the size of optical fibers limits how many can fit. Klinger says improvements are being made within the industry, such as developing new ways to attach fibers so that more can be packed in.
New computing approaches hold promise, but it will take time for them to be developed and adopted. Shaolei Ren, an associate professor of electrical and computer engineering at the University of California, Riverside, whose research focuses on making AI more sustainable, thinks current approaches can be made more energy-efficient in the meantime. Since energy use is tied to cost, there is an incentive for model developers to reduce energy consumption, and much research is being carried out in this area.
Instead of scaling up LLMs, for example, there is a growing trend toward smaller, fine-tuned models, which have been shown to outperform larger ones in certain cases. Microsoft, for instance, announced its Phi-3 family of small language models earlier this year, which outperform some bigger models on certain math, language, and coding benchmarks. Smaller models should yield energy savings during training, since less compute and data typically are needed. If you reduce a model's size by a factor of 10, energy consumption could drop by a factor of 100, said Ren.
“Choosing a smaller model is very energy-efficient and effective as well, if you focus on particular domains,” he added. “We’re seeing a lot of these specialized models now, more than before.”
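Ren's estimate follows from a simple scaling argument. The sketch below assumes (a common rule of thumb, not a figure from Ren) that training data is scaled roughly in proportion to model size, so that training compute, and hence energy, grows with the square of the model's size:

```python
# If training compute ~ parameters x training tokens, and tokens are scaled
# roughly in proportion to parameters, compute grows with the square of size.
def relative_training_compute(relative_model_size):
    relative_tokens = relative_model_size   # assumption: data scaled with model
    return relative_model_size * relative_tokens

print(relative_training_compute(1.0))   # baseline
print(relative_training_compute(0.1))   # 10x smaller model -> ~100x less compute
```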
Further Reading
Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., and Dean, J.
Carbon Emissions and Large Neural Network Training, arXiv, 2021, https://arxiv.org/abs/2104.10350

Analyzing Nvidia's growth strategy: How the chipmaker plans to usher in the next wave of AI, CB Insights, June 2024, https://www.cbinsights.com/research/nvidia-strategy-map-partnerships-investments-acquisitions/

Service, R.F.
World's fastest supercomputers are helping to sharpen climate forecasts and design new materials, Science, 17 November 2023, https://www.science.org/content/article/world-s-fastest-supercomputers-are-helping-sharpen-climate-forecasts-and-design-new

Kibebe, C.G., Liu, Y., and Tang, J.
Harnessing optical advantages in computing: a review of current and future trends, Frontiers in Physics, 15 March 2024, https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2024.1379051/full#h5

Beaty, S.
Tiny but mighty: The Phi-3 small language models with big potential, Microsoft, April 23, 2024, https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/