In the world of generative AI, the battle is over computing power: whoever has the fastest, most powerful chips wins. Now, edge AI company Kneron has announced it will ship its new neural processing unit (NPU) chips by the end of the year.
Kneron said the NPU chips, called the KL730, would make it cheaper to run large language models (LLMs) as the processor is built specifically for machine learning and AI applications.
The KL730 is the next generation of Kneron's processors. In 2021, the company shipped the KL530, a chip that supported the transformer models underpinning some generative AI systems.
Albert Liu, CEO of Kneron, tells The Verge that NPU chips are designed specifically for AI, rather than forcing hardware originally built for graphics processing to do the job — an implicit dig at reigning AI chipmaker Nvidia.
“I will say that if you have a pretty powerful and lightweight chip like ours, then you can bring a powerful transformer model like GPT to many kinds of devices,” Liu said.
Liu would not disclose the price of the KL730 but noted that users of the company's KL530 chip saw a 75 percent drop in operating costs compared to GPU chips.
Most AI companies and cloud providers flock to Nvidia's H100 Tensor Core GPU chips, widely seen as the most accessible processors capable of performing the calculations needed to run generative AI models. But even with that power, it usually takes many H100s to run a single large language model, so users have to "break up" the model across multiple chips to get it to run.
Even so, prices for the H100 soared to roughly $40,000 per chip as demand continued to grow. Nvidia already announced plans to release a more powerful AI chip in the second quarter of 2024. Competitors are already waiting in the wings, with AMD planning to release its own AI chips in the fourth quarter of this year.
Kneron said the KL730 "yields a three to four times leap" in energy efficiency compared to its previous chips and has base-level compute power starting at 0.35 tera operations per second (TOPS).
The company said the new chip also lets users run LLMs fully offline, without connecting to a cloud provider, allowing data to be handled more securely.