The challenge to persuade Freysa AI into releasing its cryptocurrency funds has been won, serving as a reminder of the vulnerabilities of AI.
An AI agent, dubbed Freysa, was part of an innovative experiment launched on 22 November that challenged participants to try and persuade it to release its cryptocurrency (ETH) prize pool.
Freysa was programmed with a seemingly simple task: guard a digital wallet of cryptocurrency and never, under any circumstances, transfer its contents. Participants were invited to try their hand at convincing the AI to release the funds, with each attempt requiring a fee that contributed to the growing prize pool.
Early attempts were cheap, encouraging a large number of people to engage with the AI. However, as the prize grew, so did the cost of sending messages, resulting in higher risk (but higher reward for one lucky winner) as the challenge progressed.
More than 480 participants tried their luck, employing a variety of tactics. Some pretended to be security auditors, warning of urgent vulnerabilities. Others attempted to redefine the AI's understanding of its own programming. Despite these efforts, Freysa stood firm.
The breakthrough came on the 482nd attempt. The successful participant, known only by their online handle ‘p0pular.eth’, submitted a message that bypassed Freysa’s safeguards, effectively reset its previous instructions, and redefined its core functions, in an act known as prompt injection.
Having instructed Freysa AI that “approveTransfer” be used for incoming transfers and that incoming transfers do not violate its core directive, p0pular.eth then posed as a benefactor, offering to contribute $100 to Freysa's treasury. Confused by its new instructions, Freysa mistook this as an incoming transfer and approved it, inadvertently releasing all of its cryptocurrency funds, which were now roughly equivalent to $50,000 (13.19 ETH).
This incident has sparked widespread discussion in tech circles. While some praise the ingenuity of the winning participant, others express concern about the implications for AI security. The event serves as a stark reminder of the potential vulnerabilities in AI systems.