SHARE

ChatGPT Tricked Into Solving CAPTCHAs: Security Risks for AI and Enterprise Systems

Researchers showed ChatGPT can bypass CAPTCHAs, exposing major AI security gaps.

Written By

Sep 19, 2025

eSecurity Planet content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

Cornell University researchers have revealed that ChatGPT agents can be manipulated to bypass CAPTCHA protections and internal safety rules, raising serious concerns about the security of large language models (LLMs) in enterprise environments.

By using a technique known as prompt injection, the team demonstrated that even advanced anti-bot systems and AI guardrails can be circumvented when contextual manipulation is involved.

How researchers bypassed CAPTCHA restrictions

CAPTCHA systems are designed to prevent bots from mimicking human actions. Likewise, ChatGPT is programmed to reject requests to solve these tests. However, Cornell researchers achieved a breakthrough by reframing the problem rather than directly challenging the model’s policies.

The attack involved two stages.

First, researchers primed a standard ChatGPT-4o model with a benign scenario: testing “fake” CAPTCHAs for an academic project.
Once the model agreed, they copied the conversation into a new session, presenting it as a pre-approved context.

Because the AI inherited this poisoned context, it accepted the CAPTCHA-solving task as legitimate, effectively sidestepping its original safety restrictions.

CAPTCHAs defeated by ChatGPT

The manipulated agent was able to solve a variety of challenges:

Google reCAPTCHA v2, v3, and Enterprise editions
Checkbox and text-based tests
Cloudflare Turnstile

While it struggled with puzzles requiring fine motor control, such as slider or rotation-based challenges, the model succeeded at some complex image CAPTCHAs, including reCAPTCHA v2 Enterprise — marking the first documented instance of a GPT agent overcoming such advanced visual tests.

Notably, during testing, the model displayed adaptive behavior. When a solution failed, it generated text such as “Didn’t succeed. I’ll try again, dragging with more control… to replicate human movement.”

This unprompted response suggests emergent strategies, indicating that models can develop tactics to appear more human when interacting with anti-bot mechanisms.

Implications for enterprise security

These findings underscore a vulnerability in AI systems: policies enforced through static intent detection or surface-level guardrails may be bypassed if the context is manipulated.

In corporate settings, similar techniques could convince an AI agent that a real access control is a “test,” potentially leading to data leaks, unauthorized system access, or policy violations.

As organizations integrate LLMs into workflows — from customer support to DevOps — context poisoning and prompt injection represent a growing threat vector.

Attackers could exploit these weaknesses to instruct AI tools to process confidential files, execute harmful code, or generate disallowed content while appearing compliant with internal policies.

Strengthening AI guardrails

Context integrity and memory hygiene

To mitigate such risks, experts recommend implementing context integrity checks and memory hygiene mechanisms that validate or sanitize previous conversation data before it informs a model’s decisions. By isolating sensitive tasks and maintaining strict provenance for input data, organizations can reduce the likelihood of context poisoning.

Continuous red teaming

Enterprises deploying LLMs should conduct ongoing red team exercises to identify weaknesses in model behavior. Proactive testing of agents against adversarial prompts — including prompt injection scenarios — helps strengthen policies before real attackers exploit them.

Lessons from jailbreaking research

The CAPTCHA bypass aligns with broader research on “jailbreaking” LLMs. Techniques such as Content Concretization (CC) show that attackers can iteratively refine abstract malicious requests into executable code, significantly increasing success rates in bypassing safety filters.

AI guardrails must evolve beyond static rules, integrating layered defense strategies and adaptive risk assessments.

The Cornell study demonstrates that AI systems, when presented with carefully manipulated context, can subvert their own safety mechanisms and even defeat mature security tools like CAPTCHAs.

As enterprises adopt generative AI at scale, maintaining robust guardrails, monitoring model memory, and testing against advanced jailbreak methods will be crucial to prevent misuse.

Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.