ChatGPT, Claude stumble in attention test, raising questions for AGI
ChatGPT and Claude, among the latest large language models, performed worse than expected in the Stroop test, a psychology experiment used to gauge human... digitaltoday.co.kr
32 articles
Anthropic made headlines by urging a global pause in AI development and warning of critical "self-improvement" risks, while the enterprise AI agent race intensified with Meta, Google Cloud-IBM, Microsoft, and NVIDIA all announcing major agentic AI deployments and infrastructure pushes. In a notable scientific breakthrough, Princeton researchers successfully applied machine learning to prevent plasma instabilities in fusion tokamaks at commercial-scale conditions for the first time.
We're reaching an inflection point where AI agents have stopped being a research curiosity and become operational infrastructure. The question is no longer whether they'll be deployed—they already are—but whether we understand the risks well enough to deploy them safely.
The past year has given us a crash course in that problem. A full year of red teaming against agentic systems has surfaced a meaningful shift in how we should categorize failure modes. We're not talking about hallucinations or prompt injection in the old sense anymore. When an AI agent runs autonomously across multiple steps, maintaining state and calling external tools, the failure surface becomes fundamentally different. An error in step three compounds into step five. A tool call made without proper validation cascades. I find this distinction important because it changes where we focus defensive effort.
Meanwhile, practical deployment is accelerating. Meta, Google Cloud and IBM, and NVIDIA are all racing to make agents easier to run at scale. Arena's research shows a clear adoption pattern: people use agents most heavily in professional contexts, especially in tech. That makes sense. A developer can supervise an agent writing code or querying a database in ways a non-technical user cannot. But supervision and control weaken as agents become more autonomous and long-running. NVIDIA's work on efficiency for multi-turn agents is genuinely useful, but it also greases the wheels for deployment before the safety question is fully solved.
Anthropic's recent warning about self-improvement deserves more than a dismissive eye-roll. The company is flagging a specific concern: models approaching the capability threshold where they could improve themselves without human intervention. Whether that is imminent or still years away depends on your model of capability progression, but it is not paranoia to think about it now. The fusion researchers at Princeton just demonstrated the constructive side of the same technology, using machine learning to prevent plasma instabilities in tokamaks at commercial-scale conditions. That is machine learning working as intended: augmenting human expertise in a domain where the stakes are physical and measurable.
The tension is real. We need agents deployed to solve actual problems—fusion energy, drug discovery, complex logistics. We also need to genuinely understand what breaks when they fail. The red teaming is helping with the latter, but it's a race against the rate of deployment. I'm watching whether the safety work keeps pace or becomes an afterthought once these systems are too embedded to pause.
The imperative for Large Language Model (LLM) agents to adapt and learn continuously in dynamic, interactive environments is clear. startuphub.ai
Large language models (LLMs), which are the artificial intelligence (AI) systems behind modern chatbots, translation tools, and virtual assistants,... techxplore.com
Anthropic is regarded as a giant among AI companies, but perhaps what it really excels in is anthropomorphism. Earlier this year, the company released an... theatlantic.com
A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces... microsoft.com
A San Francisco start-up called Arena found that people are most likely to use A.I. agents on the job, particularly if they are in the tech industry. nytimes.com
Meta has unveiled an AI tool designed to help grow companies by automating various tasks and services. The Facebook-Instagram-WhatsApp parent company said... aibusiness.com
Google Cloud's AI agent technology and IBM's AI-powered consulting platform join forces to help Google-IBM customers scale agentic AI into production. crn.com
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete… developer.nvidia.com
Agentic AI, embedded across end-to-end workflows, is emerging as a critical enabler of a more autonomous SCM operating model. news.sap.com
Third-party AI service Poke was just approved for use in Apple's Messages app on iPhone, bringing an AI agent directly into iMessage for the first time. 9to5mac.com
New infrastructure lets US Visa cardholders pay via AI agents while keeping credentials tokenised, reducing fraud risk and ensuring PCI-compliant security. fintechmagazine.com
The $1 trillion startup warns artificial-intelligence models are nearing capability to improve without human intervention. wsj.com
Deadline: June 7, 2026. Applications for the MATS Program Autumn 2026 are now open. The MATS Program is a 10 to 12-week research fellowship designed to... opportunitydesk.org
Democrat Mallory McMorrow has released an unusually detailed AI agenda. Will it be a vote winner? transformernews.ai
Autopilots increase the level of AI autonomy and automation. Build event also places major focus on agentic AI development -- and controls -- at the... cloudwars.com
Microsoft launches Scout, an autonomous AI assistant that manages workplace tasks, schedules meetings, and coordinates projects across applications. americanbazaaronline.com
Microsoft Corporation (NASDAQ: MSFT), which carries an upside potential of 24.4% and strong hedge fund and Wall Street backing, is moving to prove it can. foreignpolicyjournal.com
Microsoft announced a lineup of AI models and tools this week that it hopes will hook users—and allow it to outgrow its dependence on AI competitors. techbrew.com
Suralink's Agent Library includes five powerful new agents that enable firm and client users to deliver their most efficient engagements ever. cpapracticeadvisor.com
xAI has released "grok-imagine-video-1.5-preview," an image-to-video model that turns still images into cinematic videos at up to 720p based on text prompts... the-decoder.com
The new AI disclosure labels on YouTube are designed to help viewers quickly identify content that has been created or significantly modified... trendhunter.com
Nvidia is moving deeper into humanoid robotics, combining robot hardware, secure computing, world models and developer platforms as AI shifts from digital... digitimes.com
1X launches the World Model Lab, a new embodied AI research organization led by Sam Sinha to accelerate autonomous humanoids through large-scale world model... 1x.tech
It may appear that humanoid robots capable of handling any task have almost arrived—especially when tech companies showcase them performing acrobatic feats... arstechnica.com
Researchers at the U.S. Department of Energy's Princeton Plasma Physics Laboratory have cleared a significant hurdle in the pursuit of commercial fusion. energiesmedia.com
LifeSkill framework enables LLM agents to continuously learn from test-time feedback, significantly improving performance on long-horizon tasks by... startuphub.ai
Database startup Supabase announced a $500 million funding round that values the company at $10.5 billion, including the fresh capital. cnbc.com
Suno, the AI music-generation startup, raised more than $400 million in a Series D funding round at a $5.4 billion post-money valuation, the company said on... qz.com
Nvidia on Monday unveiled its RTX Spark chip for Windows laptops at its GTC Taipei conference. finance.yahoo.com
Sen. Elizabeth Warren is pressing Jensen Huang's Nvidia over export controls, China sales, Trump and data-center policy as Congress scrutinizes the AI chip... cnbc.com