Smaller AI Models and Smarter Prompts Could Significantly Reduce Energy Use Without Losing Accuracy
A UNESCO report outlines how smaller AI models, efficient prompts, and compression techniques can cut energy use and costs significantly, making AI more sustainable and accessible without greatly affecting performance when applied correctly.

According to a new report, even small changes to the way large language models (LLMs) are built and used could yield significant energy savings. The UNESCO report illustrates how developers and users can reduce AI's power consumption while maintaining strong performance.
The report focuses on three major strategies: employing smaller models, keeping prompts and responses short, and using compression techniques to reduce model size.
Utilising small models: Smaller AI models, optimised for specialised purposes, can be just as accurate as large, general-purpose models while using significantly less power, sometimes up to 90% less. This is because they require fewer parameters and less memory and processing power to produce results. Smaller models are also faster, cheaper to run, and more practical in areas with limited internet or computational resources.
Shorter prompts and responses: Long, conversational prompts or highly detailed responses make the AI work harder, increasing energy use and costs. Keeping inputs and outputs simple can reduce energy consumption by more than 50%. However, prompts must remain clear and precise to avoid errors, so the goal is to cut unnecessary words while retaining essential information.
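As a rough illustration of the idea, here is a minimal sketch comparing a verbose prompt with a trimmed one. The prompts themselves and the word-count-as-token proxy are illustrative assumptions, not examples from the report; real token counts depend on the model's tokenizer.

```python
# Illustrative sketch: trimming filler from a prompt while keeping the
# essential instruction. Word count stands in as a crude proxy for tokens.

verbose_prompt = (
    "Hello! I hope you are doing well today. I was wondering, if it is "
    "not too much trouble, whether you could possibly summarise the "
    "following article for me in a few sentences. Thank you so much!"
)

concise_prompt = "Summarise the following article in three sentences."

def approx_tokens(text: str) -> int:
    """Rough proxy: one token per whitespace-separated word."""
    return len(text.split())

saving = 1 - approx_tokens(concise_prompt) / approx_tokens(verbose_prompt)
print(f"Approximate input reduction: {saving:.0%}")
```

Both prompts ask for the same summary, but the concise version sends the model far less text to process, which is where the energy saving comes from.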
Model compression: Model compression is a technique for reducing the size and improving the efficiency of AI models while maintaining high accuracy. It works by removing unnecessary components or simplifying how the model stores and processes data. One common technique is quantization, which reduces the precision of the values used in the model's calculations. For example, instead of using large, detailed numbers that require more memory and power, the model uses smaller, simpler numbers. This makes the model lighter, faster, and more economical to run.
Quantization and other compression techniques can reduce energy use by up to 44%, cut costs, and speed up operations. The key is finding the right balance: over-compressing a model can significantly reduce its accuracy.
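To make the quantization idea concrete, here is a minimal toy sketch of symmetric 8-bit quantization in pure Python. The weight values are made up for illustration, and real frameworks use per-channel scales and calibration data; this only shows the core trade of precision for size.

```python
# Toy symmetric int8 quantization: map 32-bit floats onto small integers
# plus one shared scale factor, then reconstruct approximate floats.

def quantize(weights, num_bits=8):
    """Map floats to signed integers in [-(2**(b-1)-1), 2**(b-1)-1]."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]

weights = [0.82, -0.44, 0.05, -1.27, 0.63]   # illustrative values
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)   # small integers instead of 32-bit floats
print(f"max reconstruction error: {max_err:.4f}")
```

Each integer needs only 8 bits instead of 32, so memory and arithmetic cost drop, at the price of the small reconstruction error printed at the end. Pushing `num_bits` lower shrinks the model further but grows that error, which is the over-compression risk the report warns about.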
Experts caution that while efficiency is essential, there is always a trade-off to consider. Too much compression or overly brief prompts can degrade performance. The most effective strategy is to match the model and method to the problem, starting with simpler solutions and moving to complex, general-purpose systems only when needed.
In artificial intelligence, more is not always better. Smarter design decisions, such as smaller models, efficient prompts, and compression, can lower costs, save energy, and make AI more approachable. When implemented correctly, these solutions maintain high performance while promoting sustainability and affordability for users and developers worldwide.
This article is based on information from Tech News World