Cloud Predictions Under Fire as Leaderboards Expose Flawed Distribution Shift
What: A new IBM paper, "Beyond Static Leaderboards", argues that the way we rank AI agents is broken: a leaderboard collapses each agent into one aggregate score and sorts by it. The fix it proposes is predictive validity — the rank correlation between a benchmark's ranking and the ranking you'd see
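
To make the idea of a rank correlation between two rankings concrete, here is a minimal sketch in Python. It assumes Spearman's rho as the correlation measure, which is one common choice and not necessarily the statistic the paper uses (Kendall's tau is another). The function names `ranks` and `spearman` and the example scores are hypothetical, and the second list of scores simply stands in for whatever reference ranking the benchmark is compared against.

```python
from statistics import mean


def ranks(scores):
    """Return 1-based ranks (1 = highest score); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the run of agents tied with the one at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(benchmark_scores, reference_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(benchmark_scores) != len(reference_scores):
        raise ValueError("both score lists must cover the same agents")
    if len(benchmark_scores) < 2:
        raise ValueError("need at least two agents to correlate rankings")
    rx, ry = ranks(benchmark_scores), ranks(reference_scores)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a ranking with all agents tied has no defined correlation")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical aggregate scores for four agents (illustrative numbers only).
benchmark = [0.9, 0.8, 0.7, 0.6]   # leaderboard order: agent 0, 1, 2, 3
reference = [0.5, 0.7, 0.6, 0.4]   # reference order:   agent 1, 2, 0, 3
print(round(spearman(benchmark, reference), 3))  # → 0.4
```

A value near 1 means the leaderboard orders agents much as the reference does, a value near 0 means the leaderboard order says little about the reference order, and a negative value means the two orders tend to disagree.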





