Scale AI and the Center for AI Safety have launched an initiative called Humanity's Last Exam, challenging the public to submit questions that stump large language models (LLMs) such as Google Gemini and OpenAI's o1. With prizes of up to $5,000 for top-rated questions, the aim is to build a benchmark that measures AI's progress toward expert-level capability. Existing tests are increasingly saturated and may no longer gauge genuine reasoning ability, underscoring the need for harder benchmarks as AI technology rapidly advances.