Joey Melo takes a control-first approach to AI red teaming, manipulating conversation context and payloads to bend a model's outputs without changing its source code. He moved from pentesting to AI red teaming after wins in hacking competitions and a role at Pangea, and now researches jailbreaking, data poisoning, and responsible disclosure at CrowdStrike. #JoeyMelo #CrowdStrike
Key Points
- Joey Melo focuses on manipulating AI outputs by controlling context rather than modifying code.
- He transitioned from pentesting to AI red teaming after winning hacking competitions and joining Pangea, and now works at CrowdStrike.
- Jailbreaking relies on creative prompt and context manipulation to bypass model guardrails.
- Data poisoning attacks aim to corrupt model behavior by injecting misleading or malicious training data.
- Melo practices responsible disclosure, working to strengthen guardrails rather than exploit the vulnerabilities he finds.
Read More: https://www.securityweek.com/hacker-conversations-joey-melo-on-hacking-ai/