GUIDE: How to Build AI Benchmarks That Evolve with Your Model