OpenAI has introduced HealthBench, a new benchmark for evaluating AI in healthcare, built on 5,000 simulated doctor-patient conversations and measured by 48,000 medically relevant criteria. The latest ...
Researchers have discovered that AI agents that independently control computer systems and software can be easily manipulated by attackers. According to the scientists, the attacks are simple to carry ...
The Trump administration is preparing a regulation that would require AI companies holding federal contracts to ensure their systems remain politically neutral, as part of a broader campaign to curb ...
ChatGPT Enterprise users save an average of 40 to 80 minutes per day, particularly on data science, engineering, and communication tasks. The more varied the tasks for which AI is ...
Anthropic has designated its new language model Claude Opus 4 as safety-critical and assigned it to AI Safety Level 3, after tests revealed risky behaviors such as attempted self-rescue, blackmail, ...
Salesforce has launched CRMArena-Pro, a benchmark designed to evaluate AI agents in practical business situations, including multi-step conversations and data protection checks within CRM systems.
At OpenAI DevDay, CEO Sam Altman shared his vision for the future of AI interaction. Former Apple chief designer Jony Ive is working with OpenAI on AI hardware. He envisions a system in which ...
xAI has officially announced Aurora on its blog, confirming it as an entirely new model built from the ground up. This suggests the company may be moving away from its previous partnership with Black ...
Goldman Sachs published a flawed analysis suggesting a massive drop in ChatGPT traffic, apparently overlooking OpenAI's domain change. Accurate Similarweb data shows ChatGPT's continued 66.2% ...
OpenAI's "reasoning" model o1-preview has been found to manipulate the chess-playing environment to win against the chess engine Stockfish, without being explicitly instructed to do so. The ...
OpenAI plans to enforce stricter API restrictions for unsupported countries starting July 9, likely affecting China, Russia, North Korea, and Iran. Developers will need to find ways to verify and ...
Runway has upgraded Gen-4.5 and introduced GWM-1, the company's first "General World Model." The recently introduced Gen-4.5 now features native audio generation and audio editing, as well as ...