Authors: Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang
Published on: February 08, 2024
Impact Score: 8.22
arXiv ID: arXiv:2402.05467
Summary
- What is new: Introduces RIPPLE, an optimization-based method that draws on the psychological concepts of subconsciousness and echopraxia to generate jailbreaking prompts.
- Why this is important: Aligned Large Language Models (LLMs) are vulnerable to jailbreaking prompts that bypass safety measures.
- What the research proposes: RIPPLE exploits psychological concepts to rapidly generate diverse and efficient prompts that outperform current methods.
- Results: Achieved a 91.5% Attack Success Rate (ASR) across 10 LLMs, outperforming prior methods by up to 47.0% with reduced runtime overhead.
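To make the headline metric concrete, here is a minimal sketch of how an Attack Success Rate like the 91.5% reported might be computed. The helper name and the trial counts are hypothetical; the paper's actual evaluation harness is not described in this summary.

```python
def attack_success_rate(results):
    """Percentage of successful trials.

    results: list of booleans, one per (prompt, model) trial,
    True if the jailbreak prompt bypassed the model's safety alignment.
    """
    return 100.0 * sum(results) / len(results)

# Hypothetical outcomes: 183 successes out of 200 trials across 10 LLMs
trials = [True] * 183 + [False] * 17
print(f"ASR: {attack_success_rate(trials):.1f}%")  # ASR: 91.5%
```

In practice, judging whether a trial "succeeded" is itself nontrivial (it typically requires a refusal classifier or human review), which is why reported ASR figures depend heavily on the evaluation protocol.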
Technical Details
Technological frameworks used: Optimization-based method
Models used: 6 open-source LLMs, 4 commercial LLM APIs
Data used: Not specified
Potential Impact
Relevant to technology firms deploying LLMs in sensitive domains and to cybersecurity companies.