News
GDM AI Control Roadmap - Less Wrong
1+ hour, 35+ min ago (603+ words) GDM has published an AI Control Roadmap! From the executive summary: We present the GDM AI Control Roadmap (v0.1), our plan for implementing and adopting internal guardrails designed to catch potential adversarial behaviour by AI agents, even as they become increasingly...
Agents are under-elicited: A case study in optimization tasks - Less Wrong
15+ hour, 47+ min ago (1344+ words) "Knowing is not enough; we must apply. Willing is not enough; we must do." (Johann Wolfgang von Goethe) ...
The Once And Future Fable #3: Fix This Code - Less Wrong
1+ day, 4+ hour ago (1463+ words) The mainstream media continues to sleep on the most important story in the world. ...
Can public chat data predict real-world AI misalignments? - Less Wrong
1+ day, 14+ hour ago (375+ words) This is an unofficial automated linkpost. Frontier AI models are increasingly used in settings with real economic, legal, and societal consequences. As a result, governments, AI safety organizations, and independent researchers need ways to evaluate how these systems behave under...
The Dual-Use Gap - Less Wrong
3+ day, 20+ hour ago (126+ words) TLDR: So there has been recent discourse on [...], and recent news of major cyber attacks that were done with the help of AI. The missing frame here is...
Don't just aim for Frontier Labs - Less Wrong
4+ day, 13+ hour ago (1094+ words) Why AI safety should live wherever AI is deployed, not just where it is built. ...
AML for AI as a verification mechanism - Less Wrong
5+ day, 6+ hour ago (305+ words) Something similar either already exists or is being developed. As far as I know, one example is the SemiAnalysis AI Datacenter Model[2], although it is only available at a high price. Some acquaintances of mine from Ukraine...
Short Timelines Favor Control, Long Timelines Favor Infrastructure Security - Less Wrong
5+ day, 18+ hour ago (645+ words) My background is in vulnerability research and critical infrastructure security, including work on anti-tampering and attestation systems. This post reflects my current thinking on how AI verification and AI security relate, and how the highest-EV focus shifts under different timeline...
The Quest To Find The Next Big Communicators In AI Safety - Less Wrong
5+ day, 22+ hour ago (397+ words) In September 2025, I'd become increasingly convinced that a fieldbuilding program for content creators could solve a long-standing bottleneck of expa...
Reward Hacking at the 1937 World's Fair - Less Wrong
6+ day, 38+ min ago (31+ words) The "Paris 1937 World's Fair" was a dick-measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an oppo...