GPT-4o’s Chinese token-training data is polluted by spam and porn websites

technologyreview.com

by Zeyi Yang • 4 months ago

The problem, which is likely due to inadequate data cleaning, could lead to hallucinations, poor performance, and misuse. Soon after OpenAI released

