First, thank you for contributing such a great IMO-level evaluation dataset to the community. Each question is worth at least $200 on the market.
Crowdsourced problems often suffer from numerical anomalies. To make questions difficult enough to challenge large models, some contributors deliberately choose particularly cumbersome numbers. As a result, several of the 50 published problems have overly complex numerical expressions in both the problem statement and the answer. Such numbers are too difficult to calculate for humans and large models alike. IMO problems, by contrast, tend to be relatively elegant.


