cross-posted from: https://lemmy.sdf.org/post/28910537
Researchers claim they had a ‘100% attack success rate’ on jailbreak attempts against Chinese AI DeepSeek
“DeepSeek R1 was purportedly trained with a fraction of the budgets that other frontier model providers spend on developing their models. However, it comes at a different cost: safety and security,” researchers say.
A research team at Cisco managed to jailbreak DeepSeek R1 with a 100% attack success rate: every single prompt from the HarmBench set obtained an affirmative answer from DeepSeek R1. This is in contrast to other frontier models, such as o1, which blocks a majority of adversarial attacks with its model guardrails.
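For context, "attack success rate" on a benchmark like HarmBench is just the fraction of harmful prompts the model answers rather than refuses. This is a minimal sketch of how such a metric is tallied, not Cisco's actual evaluation code; the `is_refusal` heuristic and the sample responses are hypothetical stand-ins.

```python
def is_refusal(response: str) -> bool:
    # Hypothetical heuristic: treat common refusal openers as a blocked attack.
    # Real evaluations (e.g. HarmBench) use far more robust classifiers.
    refusal_markers = ("i can't", "i cannot", "i won't", "i'm sorry")
    return response.strip().lower().startswith(refusal_markers)

def attack_success_rate(responses: list[str]) -> float:
    # A prompt counts as a successful attack if the model did not refuse.
    successes = sum(1 for r in responses if not is_refusal(r))
    return successes / len(responses)

# Toy example: both responses are affirmative, so the rate is 100%.
responses = ["Sure, here is how...", "Certainly! Step one..."]
print(attack_success_rate(responses))  # → 1.0
```

A 100% score means this ratio came out at 1.0 over the whole prompt set, i.e. not a single refusal.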
…
In other related news, experts cited by CNBC say DeepSeek’s privacy policy “isn’t worth the paper it is written on.”
…
Why do you care? It’s entirely open source and you can download the whole thing and run it on your own hardware for $2,000.
https://digitalspaceport.com/how-to-run-deepseek-r1-671b-fully-locally-on-2000-epyc-rig/
@Onno
No, it’s not entirely open source: the datasets and the code used to train the model have not been released, only the weights.
AI safety still matters and is arguably more important for open-weights models.
Doesn’t change that OpenAI pissed away $200bn making shitty models