DeepSeek might not be such good news for energy after all

Aatube@kbin.melroy.org · 2 days ago

DeepSeek might not be such good news for energy after all

JohnDClay@sh.itjust.works · 2 days ago

I think the original claims of energy efficiency came from mixing up the energy usage of their much smaller model with that of their big model.

peanuts4life@lemmy.blahaj.zone · 2 days ago

This article is comparing apples to oranges. The DeepSeek R1 model is a mixture-of-experts reasoning model with about 600 billion parameters, while the Meta model is a dense 70-billion-parameter model without reasoning, which performs much worse.

They should be comparing DeepSeek to reasoning models such as OpenAI’s o1. The two are comparable in results, but o1 costs significantly more to run. It’s impossible to know how much energy it uses because it’s a closed-source model and OpenAI doesn’t publish that information, but they charge a lot for it on their API.

TL;DR: It’s a bad-faith comparison. Like comparing a train to a car and complaining about how much more diesel the train used on a 3-mile trip between stations.

Aatube@kbin.melroy.org · 2 days ago

It’s more like comparing them while they use the same fuel (the article directly compares them in joules). Let’s say the train also uses gasoline. The car is far more “independent” and controllable, and it “doesn’t waste fuel driving to places you don’t want to go”, so it is seen as “better” and more appealing. But that wide appeal, and thus wide usage, creates far more demand for gasoline, dries up the planet, and clogs up the streets, wasting fuel idling at traffic stops.

peanuts4life@lemmy.blahaj.zone · 2 days ago

Yeah, I was thinking of diesel-powered trains.

Aatube@kbin.melroy.org · 2 days ago

The AI models use the same fuel for energy.

peanuts4life@lemmy.blahaj.zone · edit-2 2 days ago

Yes, sorry, where I live it’s pretty normal for cars to be diesel-powered. What I meant by my comparison was that a train, when measured uncritically, uses more energy to run than a car due to its size and behavior, but that when compared fairly, the train has obvious gains and tradeoffs.

DeepSeek, as a 600B model, is more efficient than the 400B Llama model (a fairer size comparison) because it’s a mixture-of-experts model with fewer active parameters per token. Even when run in the R1 reasoning configuration, it is probably still more efficient than a dense model of comparable intelligence.
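
To put rough numbers on the active-parameter point, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, and it assumes roughly 37B active parameters per token for DeepSeek R1 and 405B for the dense Llama model; those figures are not from this thread. It also ignores memory bandwidth, batching, and hardware utilization, so it is only a proxy for compute, not a measurement of energy.

```python
# Rough compute proxy for text generation. Not an energy measurement.

def flops_per_token(active_params: float) -> float:
    """Rule of thumb: about 2 FLOPs per active parameter per generated token."""
    return 2.0 * active_params


def generation_flops(active_params: float, tokens: int) -> float:
    """Approximate forward-pass FLOPs needed to generate `tokens` tokens."""
    return flops_per_token(active_params) * tokens


# Assumed figures (not stated in the thread):
MOE_ACTIVE_PARAMS = 37e9     # parameters active per token in the MoE model
DENSE_ACTIVE_PARAMS = 405e9  # a dense model uses every parameter per token

# How many times more tokens the MoE model could generate (for example as
# reasoning steps) before it matches the dense model's compute for one answer.
break_even_token_ratio = DENSE_ACTIVE_PARAMS / MOE_ACTIVE_PARAMS

if __name__ == "__main__":
    answer_tokens = 500  # hypothetical answer length
    dense = generation_flops(DENSE_ACTIVE_PARAMS, answer_tokens)
    moe = generation_flops(MOE_ACTIVE_PARAMS, answer_tokens)
    print(f"dense: {dense:.3e} FLOPs, MoE: {moe:.3e} FLOPs")
    print(f"break-even token ratio: {break_even_token_ratio:.1f}x")
```

Under these assumptions, the reasoning model only loses on compute once it generates about eleven times as many tokens as the dense model for the same answer, which is why "probably" is the right word above: it depends on how long the reasoning runs.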

archive.ph