In a new paper published Thursday titled "Auditing language models for hidden objectives," Anthropic researchers described how custom AI models trained to deliberately conceal certain "motivations ...
The work, which also included researchers from Arizona State University, Cornell University, and the University of Iowa, blends electron microscopy with AI to enable scientists to see the ...