New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 JavaScript engine. Mythos leads GPT-5.5 by a wide margi…