The Center for AI Safety (CAIS) and Scale AI are inviting the public to contribute questions for what they are calling “the world’s most difficult artificial intelligence test.” The initiative responds to concerns that existing AI evaluations have grown too easy to meaningfully gauge how far the technology has advanced.
In a statement, the test’s organizers said current benchmarks no longer effectively measure how close AI systems are to expert-level proficiency. “Existing tests have become too easy, making it difficult to track the true progress and capabilities of AI,” they said.
A few years ago, AI systems often gave random or nonsensical answers to exam questions; today they perform far better. OpenAI’s latest model, OpenAI o1, recently swept many of the most popular reasoning benchmarks, according to Dan Hendrycks, executive director of CAIS.
Despite these advances, AI still struggles with complex research questions and other demanding intellectual tasks, as well as with planning and visual pattern-recognition puzzles, as noted in Stanford University’s AI Index Report from April.
The new test is designed to push the boundaries of what artificial intelligence can achieve and to provide a more accurate assessment of its capabilities. By crowdsourcing challenging questions from the public, the organizers hope to better understand the limits and potential of current AI systems.
As AI becomes woven into more aspects of society, the ability to measure its progress reliably grows increasingly important. The launch of this test is a step toward keeping evaluation in line with AI’s expanding role in technology and research.
CAIS and Scale AI’s initiative marks a significant effort to sharpen our understanding of AI’s capabilities and limitations, and to set new standards for evaluating artificial intelligence.