AI safety tests found to rely on 'obvious' trigger words; after simple rephrasing, models labeled 'reasonably safe' suddenly fail, with attacks succeeding up to 98% of the time. New corporate research ...