Automated Alignment Researchers: Using large language models to scale scalable oversight
Large language models' ever-accelerating rate of improvement raises two particularly important questions for alignment research.
New research highlights that top AI models continue to struggle with clinical reasoning despite growing medical applications, while Anthropic's Claude Mythos raises significant cybersecurity concerns prompting CISOs to overhaul security programs. Meanwhile, Anthropic is reportedly in talks to establish a hyperscale data center in Southeast Michigan, signaling continued aggressive infrastructure expansion in the AI sector.
In a recent interview, Marc Succi, MD, discussed findings from a new study examining the clinical reasoning capabilities of 21 large language models (LLMs),...
Researchers tested 21 frontier large language models on 29 stepwise MSD Manual clinical vignettes and found that, although many models performed well on...
Experts warn AI-driven cyber threats outpace defenses; current guidance may be insufficient.
Anthropic is the intended end user of a controversial hyperscale data center proposed in Lyon Township, a source close to the matter told Crain's.
Could Claude Mythos Preview, Anthropic's latest large language model, be leveraged for fully automated cyber attacks?
NVIDIA Released Audio Flamingo Next (AF-Next): A Super Powerful and Open Large Audio-Language Model (LALM)
Every time a user asks an AI assistant for a product recommendation, service comparison, or vendor shortlist, the underlying language model makes a choice...
LLMs forget everything between sessions. This post shows how to add episodic, semantic, and procedural memory to an AI agent using Spring AI and a single...
Discover the 5 essential developer tips from the Google Cloud AI Agent Bake-Off for building production-grade AI. Learn how to transition from basic prompts...
Weird AI use cases are unexpected, unconventional, or experimental applications of artificial intelligence. They often highlight emerging capabilities like...
Learn to find and exploit real-world agentic AI vulnerabilities through five progressive challenges in this free, open source game that over 10,000...
KnowBe4 has launched Agent Risk Manager, a security product for autonomous AI agents aimed at organisations using them in operational workflows.
There's a lot of hype about AI agents. If you are a lawyer in a big law firm, the story goes, you will soon have a team of AI agents that you can use to...
Growmark's new AI agent in the myFS Agronomy app delivers faster, data-driven insights to improve crop decisions and farm profitability.
Find out why Meta's proposed CEO AI agent signals a shift toward exec-level autonomous agents, exposing governance and compliance gaps that CIOs must...
Adoption of artificial intelligence (AI) agents amongst Asia-Pacific (APAC) consumer businesses is set to rise from 29% to 76% within two years.
Boycotts, sabotage and other types of civil disobedience have long served collective action against injustice.
AI-assisted targeting has featured prominently in recent conflicts, such as the Russia-Ukraine war, the Israel-Gaza war, and the U.S.-Israel-Iran war.
Physical attacks on Sam Altman signal dangerous new phase in AI development debate.
Steve Sosnik (The Compound and Friends) frames the pending OpenAI and SpaceX IPO wave as “an existential risk” to passive fund mechanics, arguing that...
Artificial intelligence is pervasive, controversial — and featured in 'The AI Doc,' in theaters now.
OpenAI CEO's home hit twice in week as AI safety concerns turn violent.
AI coding assistants are quickly becoming part of everyday development. Tools like Cursor, Claude Code, and GitHub Copilot can now do more than suggest code...
In addition to integrating its assortment with the chatbots, the bridal retailer will audit its inventory to make it more suitable for AI shopping...
A fast-growing AI-powered workplace search platform that competes with Microsoft Copilot is expanding in Silicon Valley through a sublease deal with a medical...
Microsoft is creating an OpenClaw-inspired AI agent for Microsoft 365 Copilot that emphasizes enterprise security and proactive assistance.
The company is also developing AI agents for various professional roles; it's expected to highlight the tools at its Build developer conference in June.
Tom Mikluch, Head of AI and Technology Strategy at Fiserv, shared how the company is embedding AI across its workforce, platforms, and products,...
Stop settling for Copilot. Learn how to integrate Claude's superior reasoning directly into Microsoft Word to transform your writing workflow and bypass AI...
Microsoft announced its image generation AI, 'MAI-Image-2-Efficient,' on April 14, 2026. MAI-Image-2-Efficient is touted as being cheaper and faster than...
In early April 2026, a model most of us had never heard of took the top spot on the Artificial Analysis Video Arena for both text-to-video and...
Blockchain.com has released a fully AI-generated commercial for the 2025 NFL season that will begin running Sunday during the Cowboys' 2025 home opener.
Google is testing "lazy loading," which may speed up page-loading times on Chromium-based browsers, loading elements of a page only when they need to be...
Vogue Business brings you a weekly update of the most interesting stories in the world of AI that you need on your radar. Stay tuned as we spotlight AI...
A previously unknown AI video model called 'Happy Horse 1.0' has suddenly risen to the top of the Artificial Analysis Video Arena leaderboard,...
Prime Video has picked up a second season of 'Young Sherlock,' starring Hero Fiennes Tiffin as the title character.
This video generation model promises to give you complete control over the video and to avoid the common pitfalls of AI video generation.