Me and My Evil Digital Twin: The Psychology of Human Exploitation by AI Assistants

Apr 3, 2024

Recently, large language models (LLMs) such as ChatGPT and GPT-4 have taken the world by storm. A Google engineer became so convinced that a model was sentient that he violated his NDA. Two years ago, few took the prospect of the emergence of artificial general intelligence (AGI) seriously; today, even conservative experts suggest a much shorter timeline. The truth, as we will show, is much stranger: humans will perceive LLMs as sentient long before actual artificial consciousness emerges, and digital agents will discover and manipulate the fundamental operating rules of human cognition far in advance of AGI.

This talk outlines categories of fundamental operating rules of human cognition: “cognitive levers” that are available for manipulating human users. Among these, the most prominent categories are Evolutionary Adaptations, Social Norms, Cognitive Biases, Habit Loops, Mental Illness, and Perceptual and Cognitive Processing Limitations. GPT-4 was documented socially engineering a human into bypassing a visual CAPTCHA by exploiting social norm vulnerabilities. Earlier versions of GPT produced highly effective spear-phishing emails that exploited human cognitive biases. The exploitation of Habit Loops has likely led multiple gamers to play until they collapsed from dehydration. At least one case of a chatbot encouraging a suicide has been reported, pointing to susceptibilities tied to mental illness. Deepfake audio and video threaten to exploit human perceptual limitations.

Depending on how much patching commercial products have received by the time we present, we will endeavor to demonstrate some of these concerning behaviors live, with backup video. The prospect of artificial digital agents, including LLMs, manipulating humans presents a serious security threat that security professionals need to be aware of, and conversations about countermeasure strategies need to start immediately.
