FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.
The researchers started with GSM8K's standardized set of more than 8,000 grade-school-level mathematics ... without changing the problem logic and dubbed it the GSM-Symbolic test. The first set saw ...