The outcome of those tests, detailed in a new paper, was damning. Not a single AI agent was able to perform more than 3 ...