Agentic AI’s reliance on large language models (LLMs) as its foundational intelligence is economically and environmentally unsustainable, according to new research. A team from chip maker Nvidia argues that smaller language models (SLMs) can often match or outperform their larger counterparts on many agent tasks—while being faster, cheaper, and less resource-intensive to run.
Nvidia’s team cites examples including Microsoft’s Phi-2, which it says rivals 30-billion-parameter models at reasoning and code generation while running 15 times faster, and the company’s own Nemotron-H models, which offer comparable accuracy to much larger systems using far less compute. The researchers argue that most AI agents perform repetitive, narrowly scoped tasks that can be served by fine-tuned SLMs, with larger models reserved for situations that require more complex reasoning, as the sketch below illustrates.
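To make that architecture concrete, here is a minimal, hypothetical sketch of the SLM-first routing pattern the paper advocates: routine agent steps go to a small, task-tuned model, and only tasks flagged as needing deeper reasoning escalate to a large model. The model names and the `call_model` helper are illustrative placeholders, not an API from the paper or from Nvidia.

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    requires_deep_reasoning: bool  # set upstream by a classifier or simple heuristics


SLM = "small-finetuned-model"   # hypothetical task-specific small model
LLM = "large-general-model"     # hypothetical large fallback model


def call_model(model: str, prompt: str) -> str:
    """Placeholder for whatever inference backend the agent actually uses."""
    return f"[{model}] response to: {prompt}"


def run_agent_step(task: Task) -> str:
    # Default to the cheap, fast SLM; escalate only when complex reasoning is needed.
    model = LLM if task.requires_deep_reasoning else SLM
    return call_model(model, task.prompt)


if __name__ == "__main__":
    print(run_agent_step(Task("Extract the invoice total from this text.", False)))
    print(run_agent_step(Task("Plan a multi-step refactor across three services.", True)))
```

In this setup, the expensive model is invoked only for the minority of steps that genuinely need it, which is the cost and energy argument the researchers make.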
The researchers said replacing LLMs with SLMs in agentic systems faces hurdles, including entrenched investments in large-model infrastructure, benchmark-driven performance culture, and limited public awareness of SLM capabilities. Surmounting these challenges, however, would offer significant benefits from a resource allocation perspective, according to the paper.
“As the AI community grapples with rising infrastructure costs and environmental concerns,” the researchers concluded, “adopting and normalizing the use of SLMs in agentic workflows can play a crucial role in promoting responsible and sustainable AI deployment.”
While Nvidia has been a major beneficiary of the LLM boom, the researchers suggest a transition to SLMs could broaden the AI market and embed agentic AI more deeply across industries and devices. The company is seeking community feedback and plans to publish selected responses.