AI Benchmarks...

We built an agent that helped us hack eight benchmarks. We achieved near-perfect scores on all of them without solving a single task.

https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/