GPT-4o’s Chinese token-training data is polluted by spam and porn websites
technologyreview.com by Zeyi Yang • 4 months ago
The problem, which is likely due to inadequate data cleaning, could lead to hallucinations, poor performance, and misuse.
Soon after OpenAI released