PsyScam: A Benchmark for Psychological Techniques in Real-World Scams

This research paper presents PsyScam, the first benchmark designed to identify and analyze the psychological techniques used in real-world online scams, grounding scam report data in cognitive and persuasion theories. It introduces a human–LLM collaboration method for labeling scam reports and evaluates the benchmark through three tasks: psychological technique classification, scam message completion, and scam message augmentation.

Keypoints

  • PsyScam is a comprehensive dataset collected from six scam-reporting platforms that covers a wide range of scam types and includes human-verified annotations of nine psychological techniques (PTs) used by scammers.
  • The taxonomy of PTs is grounded in established psychological theories, including Cialdini’s principles of persuasion, Prospect Theory, and the Elaboration Likelihood Model.
  • A novel human–LLM collaborative annotation process efficiently extracts accurate PT labels from real-world scam reports, combining LLM candidate extraction with human verification.
  • PsyScam defines three downstream tasks: multi-label classification of the PTs present in a scam text, generation of scam message completions that reflect specified PTs, and augmentation of scam texts to incorporate new PTs.
  • Evaluation shows that fine-tuned encoder models such as RoBERTa outperform both traditional baselines and large language models (LLMs) at PT classification, though all models struggle with the complexity and multi-label nature of real scams.
  • LLMs show only moderate ability to generate scam content that reflects given psychological techniques, performing better at message augmentation than at completion.
  • PsyScam helps reveal the subtle role of psychological manipulation in scams and points towards better detection, synthetic data generation, and real-time prevention strategies based on PT analysis.

What is this about?
This paper introduces PsyScam, a new research resource that focuses on the psychological tricks scammers use in online scams. Instead of only looking at the technical details of scams, it studies how scammers influence victims’ minds through various psychological techniques by analyzing real scam reports from multiple sources. The research also explores methods to detect and generate scam messages based on these psychological techniques using advanced language models combined with human expertise.

What problem does it solve?
Existing scam detection efforts mainly focus on technical patterns or rely on synthetic, machine-generated examples, which don’t always capture how scammers manipulate people emotionally and mentally. Without understanding these psychological tricks, security systems can miss subtle cues scammers use to fool victims, or fail to recognize new or evolving scam tactics. PsyScam fills this gap by providing a detailed database of real scams labeled with the psychological methods they use, helping developers and analysts better identify and respond to these threats.

What’s the idea?
The researchers gathered thousands of real scam reports and categorized the psychological techniques scammers use into nine clear groups (like creating urgency or impersonating authority). To label the data faster and more accurately, they combined the strengths of large language models (LLMs) to suggest possible techniques and humans to verify those suggestions carefully. They then designed three tasks to test how well AI models can recognize these psychological tricks, generate scam messages using them, or rewrite scams to include new psychological tactics.

How does it work?
The team collected data from six scam-reporting platforms covering diverse scam types and cleaned it by removing duplicates and overly short reports. They built a taxonomy of nine psychological techniques grounded in established psychology theories. The annotation process first uses LLMs with specialized prompts to find candidate techniques within scam texts; humans then check and refine these labels to ensure accuracy. Using this annotated data, they ran experiments on three tasks: 1) PT classification (identifying all psychological techniques in a scam message), 2) scam completion (having AI complete a scam message so it reflects given techniques), and 3) scam augmentation (rewriting scams to add new techniques).
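The human–LLM annotation loop described above can be sketched roughly as follows. This is an illustrative approximation only: the taxonomy names, the keyword-based stand-in for the LLM, and the `human_verify` callback are all hypothetical, not the paper's actual prompts or labels.

```python
# Illustrative PT taxonomy (names are assumptions, not the paper's exact labels).
PT_TAXONOMY = [
    "urgency", "authority", "scarcity", "reciprocity", "liking",
    "commitment", "social_proof", "loss_framing", "phantom_riches",
]

def llm_propose_pts(report: str) -> list[str]:
    """Stand-in for the LLM call: naive keyword cues as a placeholder."""
    cues = {
        "urgency": ["act now", "immediately", "within 24 hours"],
        "authority": ["irs", "police", "bank official"],
        "scarcity": ["limited", "only a few left"],
    }
    text = report.lower()
    return [pt for pt, words in cues.items() if any(w in text for w in words)]

def annotate(report: str, human_verify) -> list[str]:
    """LLM suggests candidate PTs; a human verifier accepts or rejects each."""
    candidates = llm_propose_pts(report)
    return [pt for pt in candidates if human_verify(report, pt)]

report = "This is the IRS. Act now or face arrest within 24 hours."
labels = annotate(report, human_verify=lambda r, pt: True)  # auto-accept for demo
print(labels)  # ['urgency', 'authority']
```

The key design point the paper exploits is that the LLM does the expensive recall work (scanning free-text reports for candidates) while humans only do cheaper verification of a short candidate list.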

What did they find?
In testing, RoBERTa-based models were best at classifying psychological techniques, achieving strong recall and F1 scores, while large language models struggled more at classification when used without human help. For the generation tasks, models showed moderate success at incorporating psychological techniques but had low similarity scores against the original texts, likely because they used different phrasing. They were better at rewriting scams to add new techniques than at generating completions from scratch, and adding more psychological techniques made both generation tasks harder. Human review also showed that while the models do capture the techniques, the generated scam texts can sometimes sound unrealistic or unnatural.
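Because each scam message can carry several PTs at once, classifiers on this benchmark are compared with multi-label metrics. A minimal sketch of micro-averaged F1 over per-message label sets (the gold/predicted labels below are made-up examples, not data from the paper):

```python
def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and negatives over all messages."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious labels
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [{"urgency", "authority"}, {"scarcity"}]
pred = [{"urgency"}, {"scarcity", "liking"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Micro-averaging weights every label instance equally, which suits a dataset where some techniques (like urgency) appear far more often than others.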

Why is this important?
PsyScam shows that social engineering and psychology play a major role in scams, beyond purely technical exploitation. It underscores the importance of examining human behavior and persuasion techniques when trying to detect or prevent scams. The research also highlights the challenges AI faces in understanding subtle language cues, encouraging aspiring security analysts to combine technical skills with knowledge of psychology. PsyScam offers a valuable resource for practicing scam analysis and understanding what makes these messages convincing.

In short (summary)
PsyScam is a pioneering benchmark that brings together real-world scam data and psychological theory to better understand how scammers manipulate victims. By combining human insight with AI, it helps improve tools to detect and generate scam content based on emotional and cognitive tricks rather than just technical clues. This work advances cybersecurity by focusing on the human side of scams, offering learners and professionals new ways to identify, study, and combat online fraud more effectively in real life.

The content featured on this site is sourced from arXiv.org, a free distribution service and open-access archive hosting over 2.4 million scholarly articles across a wide range of disciplines. This collection specifically highlights articles focused on cybersecurity, particularly topics relevant to threat intelligence and Security Operations Center (SOC) work.

Please note that materials on arXiv are not peer-reviewed, and are shared as preprints by the authors to foster early dissemination and feedback within the academic and professional community. I recommend using arXiv papers as a starting point for exploration and research, not as definitive sources. Always evaluate findings critically, and whenever possible, cross-check with peer-reviewed publications or operational validation.

Read more: https://arxiv.org/html/2505.15017v1