todayonchain.com

DeepSeek-R1 Hallucinates 4x More Than V3, Raising Red Flags for Crypto AI Agent Tokens

BeInCrypto
Vectara benchmarks show DeepSeek-R1 hallucinates significantly more than its predecessor, posing potential risks for autonomous crypto AI agents.

Summary

Research by Vectara using the HHEM 2.1 benchmark shows that the DeepSeek-R1 reasoning model has a 14.3% hallucination rate, nearly four times the 3.9% rate of DeepSeek-V3 (roughly 3.7x). R1's tendency to 'overhelp' and fabricate information creates significant operational risks for crypto AI agents, which rely on large language models for tasks like trading and on-chain execution. While developers explore mitigation strategies, the findings suggest that the trade-off between complex reasoning and factual accuracy could undermine the reliability of AI-driven financial projects.

(Source: BeInCrypto)