News

lesswrong.com
lesswrong.com > posts > hEtkyKGoPpFeWnKkX > gdm-ai-control-roadmap

GDM AI Control Roadmap - LessWrong

1+ hour, 35+ min ago  (603+ words) GDM has published an AI Control Roadmap! From the executive summary: We present the GDM AI Control Roadmap (v0.1), our plan for implementing and adopting internal guardrails designed to catch potential adversarial behaviour by AI agents, even as they become increasingly…

lesswrong.com > posts > cnHojP3CcAycR7D6F > agents-are-under-elicited-a-case-study-in-optimization-tasks-1

Agents are under-elicited: A case study in optimization tasks - LessWrong

15+ hour, 47+ min ago  (1344+ words) "Knowing is not enough; we must apply. Willing is not enough; we must do." (Johann Wolfgang von Goethe) ...

lesswrong.com > posts > HaHzwvhbWam4n8hJB > the-once-and-future-fable-3-fix-this-code

The Once And Future Fable #3: Fix This Code - LessWrong

1+ day, 4+ hour ago  (1463+ words) The mainstream media continues to sleep on the most important story in the world. ...

lesswrong.com > posts > TexabXFDJ8vzTBt2P > can-public-chat-data-predict-real-world-ai-misalignments

Can public chat data predict real-world AI misalignments? - LessWrong

1+ day, 14+ hour ago  (375+ words) This is an unofficial automated linkpost. Frontier AI models are increasingly used in settings with real economic, legal, and societal consequences. As a result, governments, AI safety organizations, and independent researchers need ways to evaluate how these systems behave under…

lesswrong.com > posts > CRfwCQcALQ2HpCkgj > the-dual-use-gap

The Dual-Use Gap - LessWrong

3+ day, 20+ hour ago  (126+ words) TLDR: So there has been recent discourse on […], and recent news of major cyber attacks that were done with the help of AI. The missing frame here is...

lesswrong.com > posts > eAMyxM28hNp4ewGdT > don-t-just-aim-for-frontier-labs

Don't just aim for Frontier Labs - LessWrong

4+ day, 13+ hour ago  (1094+ words) Why AI safety should live wherever AI is deployed, not just where it is built. ...

lesswrong.com > posts > AMTx3EBBgyGc32tKH > aml-for-ai-as-a-verification-mechanism

AML for AI as a verification mechanism - LessWrong

5+ day, 6+ hour ago  (305+ words) Something similar either already exists or is being developed. As far as I know, one example is the SemiAnalysis AI Datacenter Model[2], although it is only available for a large amount of money. Some acquaintances of mine from Ukraine…

lesswrong.com > posts > LCT7wK8q4QLBodQ4F > short-timelines-favor-control-long-timelines-favor

Short Timelines Favor Control, Long Timelines Favor Infrastructure Security - LessWrong

5+ day, 18+ hour ago  (645+ words) My background is in vulnerability research and critical infrastructure security, including work on anti-tampering and attestation systems. This post reflects my current thinking on how AI verification and AI security relate, and how the highest-EV focus shifts under different timeline…

lesswrong.com > posts > ALBdBRas7G3kdbPSq > the-quest-to-find-the-next-big-communicators-in-ai-safety

The Quest To Find The Next Big Communicators In AI Safety - LessWrong

5+ day, 22+ hour ago  (397+ words) In September 2025, I'd become increasingly convinced that a fieldbuilding program for content creators could solve a long-standing bottleneck of expa...

lesswrong.com > posts > TTHi7yNheaoepWKfR > reward-hacking-at-the-1937-world-s-fair

Reward Hacking at the 1937 World's Fair - LessWrong

6+ day, 38+ min ago  (31+ words) The "Paris 1937 World's Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an oppo...
