Your AI-Generated Password Could Be Cracked in Hours: Why ChatGPT and LLMs Make Terrible Random Number Generators

When users ask ChatGPT, Llama, or DeepSeek to generate a secure password, they expect something truly random — a string of characters that would take centuries to brute-force. But new research reveals a disturbing reality: passwords generated by large language models are far less random than they appear, and many can be cracked in under an hour using a standard GPU.
The findings, published by security researcher Alexey Antonov of Kaspersky, demonstrate a fundamental flaw in how AI models approach randomness. The passwords these models produce may look strong to the human eye, mixing uppercase and lowercase letters, numbers, and special characters. Yet they contain subtle statistical patterns that dramatically reduce the search space for attackers. The research raises pointed questions about the growing reliance on AI tools for security-sensitive tasks.
The Illusion of Randomness in Machine-Generated Passwords
Antonov’s study, detailed on Kaspersky’s Securelist blog, involved generating 1,000 passwords each from ChatGPT (OpenAI), Llama (Meta), and DeepSeek. The researcher then subjected these passwords to systematic analysis, looking for the kinds of statistical biases that would allow a password-cracking tool to prioritize certain character combinations over others.
The results were stark. Despite being asked to create random passwords, all three models exhibited pronounced biases. ChatGPT, for instance, showed a strong preference for certain characters. The letter ‘x’ appeared in 65% of all passwords generated, while ‘p’ showed up in 26%, and ‘l’ and ‘L’ in roughly 20% each. Truly random generation would distribute characters far more evenly. As Slashdot reported, these biases create exploitable patterns that reduce the effective entropy of each password significantly.
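A quick calculation shows how far 65% is from random. The sketch below is an illustration, not Antonov’s analysis. It assumes an alphabet of the 94 printable ASCII characters and a password length of 16, because the article does not state the alphabet or length the models used. Under those assumptions it computes how often any one character would appear at least once if every position were drawn uniformly and independently.

```python
import string

# Assumption: 94 printable ASCII symbols (letters, digits, punctuation).
# The research does not say which alphabet the models drew from.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def expected_presence(alphabet_size: int, length: int) -> float:
    """Probability that one specific character appears at least once in a
    password of `length` characters drawn uniformly and independently."""
    return 1 - (1 - 1 / alphabet_size) ** length


def observed_presence(passwords: list[str], char: str) -> float:
    """Fraction of passwords that contain `char` at least once."""
    return sum(char in pw for pw in passwords) / len(passwords)


# Assumed length of 16 characters (hypothetical; real lengths varied).
baseline = expected_presence(len(ALPHABET), 16)
print(f"uniform baseline for any one character: {baseline:.1%}")
# Compare with the reported 65% presence rate of 'x' in ChatGPT output.
print(f"reported 'x' rate vs baseline: {0.65 / baseline:.1f}x")
```

Under these assumptions the uniform baseline is roughly 16%. A single letter that appears in 65% of passwords therefore shows up about four times as often as an unbiased generator would produce.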
DeepSeek and Llama Fare Even Worse
While ChatGPT’s patterns were concerning, the other models tested showed even more pronounced weaknesses. DeepSeek demonstrated a notable tendency to generate passwords using dictionary words or common keyboard patterns, sometimes producing strings like “B@n@n@7” or “S1mP1662#” — passwords that superficially meet complexity requirements but are trivially guessable using dictionary-based attacks. Llama frequently relied on predictable structures, often placing uppercase letters at the beginning and digits at the end, mimicking the exact patterns that human users tend to follow and that password crackers are specifically designed to exploit.
Antonov found that 88% of passwords generated by DeepSeek and 87% of those from Llama could be cracked in under an hour using a modern GPU running the hashcat password recovery tool. ChatGPT fared considerably better, though its result was still alarming: 33% of its generated passwords fell within the same timeframe. The cracking methodology required no exotic hardware, only a standard graphics card and knowledge of the statistical biases present in LLM output.
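Comparing search spaces shows why structural bias matters so much. The following is a back-of-the-envelope sketch, not Antonov’s methodology. It assumes exhaustive search at a hypothetical fixed rate of 10 billion guesses per second. It also models a Llama-style structure as a capitalized word from a hypothetical 50,000-word dictionary followed by two digits. Real hashcat throughput varies by orders of magnitude with the hash algorithm protecting the password.

```python
import math

# Hypothetical guess rate for illustration only; real GPU throughput depends
# heavily on the hash algorithm (fast hashes vs. deliberately slow ones like bcrypt).
GUESSES_PER_SECOND = 1e10


def uniform_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a password whose characters are drawn uniformly
    and independently: log2(alphabet_size ** length)."""
    return length * math.log2(alphabet_size)


def expected_crack_seconds(bits: float, rate: float = GUESSES_PER_SECOND) -> float:
    """Average time for exhaustive search: on average half the keyspace is tried."""
    return 2 ** bits / 2 / rate


# 12 characters drawn uniformly from 94 printable ASCII symbols.
random_12 = uniform_bits(94, 12)

# Assumed human-style pattern: a capitalized word from a hypothetical
# 50,000-word dictionary, followed by two digits.
patterned = math.log2(50_000) + 2 * math.log2(10)

for label, bits in [("uniform 12 chars", random_12), ("Word + 2 digits", patterned)]:
    secs = expected_crack_seconds(bits)
    print(f"{label}: {bits:.1f} bits, ~{secs:.3g} s on average")
```

Every bit of entropy removed halves the attacker’s work. A password only needs to lose a few dozen bits to predictable structure before it drops from geological timescales into fractions of a second.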
Why LLMs Cannot Produce True Randomness
The root cause of this vulnerability lies in the fundamental architecture of large language models. LLMs are, at their core, next-token prediction engines. They are trained on vast corpora of text and learn to predict what character, word, or token is most likely to follow a given sequence. This means that even when instructed to be “random,” an LLM is still drawing on learned statistical distributions. It generates text that looks random to humans because it mimics patterns the model has associated with randomness — but it is not performing true random number generation.
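A toy model makes this concrete. The sketch below does not reproduce any specific LLM. It assigns hypothetical scores (logits) to six candidate next characters, converts them to probabilities with a softmax, and samples from the result, which is the basic mechanism of next-token sampling. The Shannon entropy of the resulting distribution falls short of the log2(6) bits that a uniform choice would give, and an attacker can exploit that shortfall.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens the skew."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy_bits(probs: list[float]) -> float:
    """Average information per draw, in bits; maximal when probs are uniform."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical candidates and scores for the next character of a "random" password.
chars = ["x", "p", "L", "9", "#", "q"]
logits = [3.0, 2.0, 1.5, 1.0, 0.5, 0.0]
probs = softmax(logits)

print(f"entropy: {shannon_entropy_bits(probs):.2f} bits "
      f"(uniform would be {math.log2(len(chars)):.2f})")

# Sample 12 characters; 'x' is drawn most often because it has the highest score.
# Seeded for reproducibility: the bias comes from `probs`, not from the PRNG.
rng = random.Random(0)
sample = "".join(rng.choices(chars, weights=probs, k=12))
print(sample)
```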
As Antonov explained in his research, “LLMs don’t actually generate anything random — they imitate patterns from their training data. This means the passwords they create follow predictable structures that attackers can model and exploit.” True cryptographic randomness requires hardware-based or cryptographically secure pseudorandom number generators (CSPRNGs), which operate on entirely different principles than neural network inference. The distinction is not academic — it is the difference between a password that withstands attack and one that crumbles under systematic analysis.
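For contrast, here is what CSPRNG-based generation looks like in Python. It uses the standard library’s `secrets` module, which draws from the operating system’s cryptographically secure source. This is a minimal sketch. The 94-character alphabet and the default length of 16 are assumptions, not recommendations from the research.

```python
import secrets
import string

# Assumed alphabet: 94 printable ASCII symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Draw each character independently and uniformly from the OS CSPRNG.

    No character, position, or structure is favored, so the entropy is
    length * log2(len(ALPHABET)) bits.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```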
A Growing Trend With Real-World Consequences
The timing of this research is particularly significant. Surveys consistently show that a growing number of users are turning to AI chatbots for everyday tasks that previously required specialized tools, and password generation is among them. A 2024 Bitwarden survey found that 25% of respondents worldwide had begun using AI tools to generate or manage passwords. This trend appears to have accelerated in 2025 as AI assistants have become more deeply integrated into browsers and operating systems.
The danger is compounded by user trust. When ChatGPT produces a password like “x#Lp9$mQz2!kW”, a typical user has no way to tell whether the string is truly random or merely appears to be. The password meets every conventional complexity requirement (length, mixed case, special characters, digits) and would score well on most password strength meters. Yet its underlying predictability means it offers far less protection than its appearance suggests. Security professionals have long warned that complexity rules alone are insufficient without genuine entropy, and the LLM password problem illustrates this principle vividly.
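The gap between complexity and entropy is easy to demonstrate. The sketch below implements a typical composition-rule check. The 12-character minimum and the four required character classes are assumed common rules, not the algorithm of any particular strength meter. A textbook weak password and the example above both pass, because the check only looks at which kinds of characters appear, not at how predictably they were chosen.

```python
import re


def meets_complexity_rules(pw: str, min_length: int = 12) -> bool:
    """Common composition rules: minimum length plus at least one uppercase
    letter, lowercase letter, digit, and non-alphanumeric symbol."""
    return (
        len(pw) >= min_length
        and re.search(r"[A-Z]", pw) is not None
        and re.search(r"[a-z]", pw) is not None
        and re.search(r"[0-9]", pw) is not None
        and re.search(r"[^A-Za-z0-9]", pw) is not None
    )


# Both pass the rules, although the first sits near the top of every cracking dictionary.
print(meets_complexity_rules("P@ssw0rd2024!"))
print(meets_complexity_rules("x#Lp9$mQz2!kW"))
```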
What Security Experts Recommend Instead
The consensus among researchers is clear: dedicated password managers remain the gold standard for generating secure passwords. Tools like Bitwarden, 1Password, and KeePass use cryptographically secure random number generators specifically designed to produce output with maximum entropy. These generators do not suffer from the statistical biases inherent in language model output because they are not prediction engines — they are purpose-built to produce unpredictable sequences.
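Naive code can introduce bias in one specific step that a purpose-built generator has to handle: mapping random bytes to characters. The sketch below shows rejection sampling, a standard technique for this step. It illustrates the principle and is not the actual code of Bitwarden, 1Password, or KeePass. Taking a random byte modulo 94 would make the first 68 characters slightly more likely, because 256 is not a multiple of 94. This sketch therefore discards bytes at or above 188 and redraws them.

```python
import os
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols


def unbiased_index(n: int) -> int:
    """Return a uniform integer in [0, n) from OS random bytes (1 <= n <= 256).

    Bytes are uniform over 0..255. Using byte % n directly would favor low
    indices, so bytes at or above the largest multiple of n that fits in 256
    are rejected and redrawn.
    """
    if not 1 <= n <= 256:
        raise ValueError("n must be between 1 and 256")
    limit = 256 - (256 % n)  # for n = 94 this is 188
    while True:
        b = os.urandom(1)[0]
        if b < limit:
            return b % n


def password_from_bytes(length: int) -> str:
    """Build a password one unbiased index at a time."""
    return "".join(ALPHABET[unbiased_index(len(ALPHABET))] for _ in range(length))


print(password_from_bytes(16))
```

In Python, `secrets.randbelow()` and `secrets.choice()` already handle this internally. The sketch only makes the step visible.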
Antonov’s recommendation was unambiguous: “Use a dedicated password manager to generate and store passwords. These tools use cryptographically secure generators, ensuring passwords are truly random and impossible to reproduce through pattern analysis.” The diceware method also produces passphrases whose randomness comes from a physical source rather than a model. It involves rolling physical dice to select words from a predetermined list, and the Electronic Frontier Foundation publishes widely used word lists for it. It does, however, require more manual effort than most users are willing to invest.
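Diceware’s strength is easy to quantify. Five dice select one of 6^5 = 7,776 words, so each word contributes log2(7776), or about 12.9 bits. The sketch below simulates the dice with `secrets.randbelow`, which substitutes for the physical dice in the method as described. The placeholder word list is hypothetical and stands in for a real 7,776-word list such as the EFF’s.

```python
import itertools
import math
import secrets

BITS_PER_WORD = math.log2(6 ** 5)  # 7,776 possible five-dice keys, about 12.9 bits


def roll_five_dice() -> str:
    """Simulated roll: five independent values 1-6 joined into a key like '41532'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(5))


def placeholder_wordlist() -> dict[str, str]:
    """Hypothetical stand-in for a real diceware list: maps every key
    '11111'..'66666' to a dummy word."""
    keys = ("".join(t) for t in itertools.product("123456", repeat=5))
    return {key: f"word{key}" for key in keys}


def diceware_passphrase(wordlist: dict[str, str], words: int = 6) -> str:
    """Pick `words` entries by dice roll and join them with spaces."""
    return " ".join(wordlist[roll_five_dice()] for _ in range(words))


passphrase = diceware_passphrase(placeholder_wordlist())
print(passphrase, f"({6 * BITS_PER_WORD:.1f} bits)")
```

Because every key maps to exactly one word, the passphrase’s entropy depends only on the dice and the word count, not on how memorable the words are.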
The Broader Implications for AI-Assisted Security
This research raises questions that extend well beyond password generation. If LLMs cannot produce reliable randomness for something as straightforward as a password, what does that mean for other security-adjacent tasks users are delegating to AI? Developers increasingly ask AI assistants to generate API keys, cryptographic salts, session tokens, and other security-critical random values. Each of these use cases carries the same fundamental risk: the model’s output may satisfy a visual inspection but fail under rigorous statistical analysis.
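The same standard-library tools cover the developer use cases listed above. This is a minimal Python sketch. The byte lengths are common choices assumed for illustration, and `make_credentials` is a hypothetical helper name.

```python
import os
import secrets


def make_credentials() -> dict[str, object]:
    """Generate security-critical random values from the OS CSPRNG
    instead of asking a language model to invent them."""
    return {
        # 32 random bytes, base64url-encoded without padding (43 characters)
        "api_key": secrets.token_urlsafe(32),
        # 16 random bytes rendered as 32 hexadecimal characters
        "session_token": secrets.token_hex(16),
        # 16 raw random bytes, suitable as a salt for a password hash
        "salt": os.urandom(16),
    }


creds = make_credentials()
print(creds["api_key"], creds["session_token"], creds["salt"].hex())
```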
The problem is particularly insidious because it is invisible to the end user. Unlike a coding error that produces a crash or a factual hallucination that can be verified, a weak random number looks identical to a strong one. Only systematic testing — generating thousands of samples and analyzing their statistical distribution — reveals the bias. This means that individual users have essentially no way to detect the problem on their own, making education and tool selection the primary lines of defense.
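The systematic testing described above can start with a chi-squared goodness-of-fit check. The sketch below pools character counts across many samples and compares them with a uniform distribution over an assumed 94-character alphabet. The threshold is a rough screen. It relies on the fact that, under the uniform hypothesis, the statistic has mean equal to its degrees of freedom and variance twice that. A proper p-value would come from the chi-squared distribution itself, for example via `scipy.stats.chisquare`.

```python
import math
import string
from collections import Counter

# Assumed alphabet: 94 printable ASCII symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def chi_squared_vs_uniform(passwords: list[str]) -> float:
    """Pearson chi-squared statistic of pooled character counts against a
    uniform distribution over ALPHABET (characters outside it are ignored)."""
    counts = Counter(ch for pw in passwords for ch in pw if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters from ALPHABET in the sample")
    expected = total / len(ALPHABET)
    return sum((counts[ch] - expected) ** 2 / expected for ch in ALPHABET)


def looks_biased(passwords: list[str], z_threshold: float = 5.0) -> bool:
    """Rough screen: flag samples whose statistic sits far above its
    expected value (df) in units of its standard deviation (sqrt(2 * df))."""
    df = len(ALPHABET) - 1
    z = (chi_squared_vs_uniform(passwords) - df) / math.sqrt(2 * df)
    return z > z_threshold
```

Pooled counts catch character preferences like ChatGPT’s fondness for ‘x’. A structural habit such as Llama’s uppercase-first, digits-last layout would need a per-position version of the same test.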
Industry Response and the Path Forward
As of this writing, none of OpenAI, Meta, or DeepSeek has publicly responded to Antonov’s specific findings. However, the research adds to a growing body of evidence that LLMs should not be treated as general-purpose replacements for specialized software tools, particularly in domains where mathematical guarantees matter. Cryptography, statistical analysis, and numerical computation all fall into this category.
Some researchers have suggested that AI companies could mitigate the issue by routing password generation requests to a proper CSPRNG rather than generating characters through the language model itself. This would be a relatively simple engineering fix — the chatbot would recognize the user’s intent and call an appropriate backend function rather than attempting to simulate randomness through token prediction. Until such measures are implemented, however, the responsibility falls on users and security professionals to understand the limitations of the tools they are using.
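A sketch of the routing idea follows, and every part of it is hypothetical. A production chatbot would rely on its own tool- or function-calling mechanism rather than a regular expression. The function `llm_generate` stands in for whatever produces the model’s normal reply. The point is architectural: once the system recognizes a password request, the characters come from a CSPRNG and never pass through token prediction.

```python
import re
import secrets
import string
from typing import Callable

ALPHABET = string.ascii_letters + string.digits + string.punctuation

# Hypothetical, deliberately crude intent detector for illustration only.
PASSWORD_REQUEST = re.compile(
    r"\b(generate|create|make|suggest)\b.*\bpassword\b", re.IGNORECASE
)


def csprng_password(length: int = 16) -> str:
    """Backend tool: uniform characters from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def answer(prompt: str, llm_generate: Callable[[str], str]) -> str:
    """Route password requests to the CSPRNG tool; everything else goes to the model."""
    if PASSWORD_REQUEST.search(prompt):
        return f"Generated with a CSPRNG: {csprng_password()}"
    return llm_generate(prompt)


print(answer("Can you generate a strong password?", lambda p: "(model reply)"))
print(answer("What is the capital of France?", lambda p: "(model reply)"))
```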
For now, the takeaway is straightforward: AI chatbots are remarkably capable at many tasks, but generating secure passwords is not among them. The appearance of randomness is not randomness, and in security, the distinction between the two can be the difference between a protected account and a compromised one. Anyone currently relying on ChatGPT, Llama, DeepSeek, or similar models for password generation should switch to a dedicated password manager immediately — and treat any previously generated AI passwords as potentially compromised.