China’s DeepSeek launches next-gen AI model. Here’s what makes it different




Chinese startup DeepSeek’s latest experimental model promises to increase efficiency and improve AI’s ability to handle a lot of information at a fraction of the cost, but questions remain over how effective and safe the architecture is.  

DeepSeek sent Silicon Valley into a frenzy when it launched its first model R1 out of nowhere last year, showing that it’s possible to train large language models (LLMs) quickly, on less powerful chips, using fewer resources.

The company released DeepSeek-V3.2-Exp on Monday, an experimental version of its current model DeepSeek-V3.1-Terminus, which builds further on its mission to increase efficiency in AI systems, according to a post on the AI forum Hugging Face.

“DeepSeek V3.2 continues the focus on efficiency, cost reduction, and open-source sharing,” Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. “The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared to the previous version.”

“It’s significant because it should make the model faster and more cost-effective to use without a noticeable drop in performance,” said Nick Patience, vice president and practice lead for AI at The Futurum Group. “This makes powerful AI more accessible to developers, researchers, and smaller companies, potentially leading to a wave of new and innovative applications.”

The pros and cons of sparse attention 

An AI model makes decisions based on its training data and new information, such as a prompt. Say an airline wants to find the best route from A to B: there are many options, but not all are feasible. By filtering out the less viable routes, you dramatically reduce the amount of time, fuel and, ultimately, money needed to make the journey. That is exactly what sparse attention does. It only factors in data that it thinks is important given the task at hand, as opposed to other models thus far, which have crunched all data in the model.

“So basically, you cut out things that you think are not important,” said Ekaterina Almasque, the cofounder and managing partner of new venture capital fund BlankPage Capital.
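
To make the idea concrete, here is a minimal sketch of one common form of sparse attention, top-k selection, written in plain Python. It is an illustration only and is not DeepSeek's DSA, whose selection mechanism is not described here. The function names and the `top_k` parameter are assumptions made for this example.

```python
import math


def _scores(query, keys):
    # Scaled dot product between the query and every key.
    scale = math.sqrt(len(query))
    return [sum(q * k for q, k in zip(query, key)) / scale for key in keys]


def _weighted_sum(scores, values, keep):
    # Softmax over the kept positions only, then mix their values.
    peak = max(scores[i] for i in keep)
    weights = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(weights.values())
    dim = len(values[0])
    return [
        sum(weights[i] / total * values[i][d] for i in keep)
        for d in range(dim)
    ]


def dense_attention(query, keys, values):
    # Conventional attention: every position contributes to the output.
    scores = _scores(query, keys)
    return _weighted_sum(scores, values, list(range(len(keys))))


def sparse_attention(query, keys, values, top_k):
    # Hypothetical top-k variant: keep only the top_k highest-scoring
    # positions and ignore the rest entirely.
    scores = _scores(query, keys)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_k])
    return _weighted_sum(scores, values, keep), keep


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

sparse_out, kept = sparse_attention(query, keys, values, top_k=1)
dense_out = dense_attention(query, keys, values)
```

With `top_k=1`, only the best-matching position is kept, so the output is that position's value alone. The dense version blends all three values, which costs more work but discards nothing. The trade-off the article describes lives in that selection step: whatever is left out of `kept` has no influence on the result.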

Sparse attention is a boon for efficiency and for scaling AI, given that fewer resources are needed, but one concern is that models could become less reliable because there is little oversight of how and why they discount information.

“The reality is, they [sparse attention models] have lost a lot of nuances,” said Almasque, who was an early supporter of Dataiku and Darktrace, and an investor in Graphcore. “And then the real question is, did they have the right mechanism to exclude not important data, or is there a mechanism excluding really important data, and then the outcome will be much less relevant?”

This could be particularly problematic for AI safety and inclusivity, the investor noted, adding that it may not be “the optimal one or the safest” AI model to use compared with competitors or traditional architectures. 

DeepSeek, however, says the experimental model works on par with its V3.1-Terminus. Despite speculation of a bubble forming, AI remains at the centre of geopolitical competition with the U.S. and China vying for the winning spot. Yakefu noted that DeepSeek’s models work “right out of the box” with Chinese-made AI chips, such as Ascend and Cambricon, meaning they can run locally on domestic hardware without any extra setup.


DeepSeek also shared the actual programming code and tools needed to use the experimental model, she said. “This means other people can learn from it and build their own improvements.”

But for Almasque, the very nature of this means the tech may not be defensible. “The approach is not super new,” she said, noting the industry has been “talking about sparse models since 2015” and that DeepSeek is not able to patent its technology due to being open source. DeepSeek’s competitive edge, therefore, must lie in how it decides what information to include, she added.

The company itself acknowledges V3.2-Exp is an “intermediate step toward our next-generation architecture,” per the Hugging Face post.

As Patience pointed out, “this is DeepSeek’s value prop all over: efficiency is becoming as important as raw power.”

“DeepSeek is playing the long game to keep the community invested in their progress,” Yakefu added. “People will always go for what is cheap, reliable, and effective.”
