2026-04-17_1776462172

AI Benchmarks...

We built an agent that helped us hack eight benchmarks. We achieved near-perfect scores on all of them without solving a single task.