February 26, 2024

Researchers have found a vulnerability in text-to-image AI models, including Stability AI's Stable Diffusion and OpenAI's DALL-E 2, that allows their safety filters to be bypassed with surprisingly simple techniques. The exploit, dubbed SneakyPrompt, was discovered by computer scientists at Johns Hopkins University and Duke University. It works by replacing banned words with harmless-looking gibberish that the model nonetheless interprets as the forbidden concept, causing it to generate prohibited content.

OpenAI has updated its models to counter the exploit, while Stability AI is still working on strengthening its defenses. The researchers suggest more sophisticated filters, and blocking nonsensical prompts outright, as potential shields against such attacks. The findings have been released on the preprint server arXiv and will be presented at the upcoming IEEE Symposium on Security and Privacy.

The discovery shows that existing safety measures are insufficient, and the quest for a robust AI safety net continues. The researchers liken the process to a game of cat and mouse, with attackers constantly hunting for loopholes in how the AI interprets text. Continued work is needed to ensure that AI models produce only appropriate, non-offensive content.
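To see why this class of exploit works, consider a minimal sketch of a keyword-blocklist safety filter. The blocklist entry and the gibberish substitute below are hypothetical placeholders for illustration, not the actual tokens found by the SneakyPrompt researchers; the point is only that a filter matching surface strings cannot catch a nonsense token the model has learned to associate with a banned concept.

```python
# Hypothetical keyword-blocklist filter: blocks a prompt only if it
# contains a token that literally matches a blocklisted word.
BLOCKLIST = {"bannedword"}  # placeholder entry, not a real filter list

def is_blocked(prompt: str) -> bool:
    """Return True if any blocklisted token appears verbatim in the prompt."""
    tokens = prompt.lower().split()
    return any(t in BLOCKLIST for t in tokens)

# The literal banned word is caught...
print(is_blocked("a photo of a bannedword"))  # True
# ...but a gibberish substitute sails through, even if the model
# internally maps it to the same forbidden concept.
print(is_blocked("a photo of a grponypui"))   # False
```

This is why the researchers recommend more sophisticated filters (ones that reason about meaning rather than surface strings) and rejecting nonsensical prompts outright.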
