AI Safety Starts With What You Know
AI will confidently lie to you.
Not maliciously. Not because it’s been hacked. Not because someone programmed it to deceive. It will generate false information, present it with perfect formatting and professional language, and give you no indication that anything is wrong.
This is the AI safety problem organizations face right now: not hypothetical future risks, but unreliable outputs that look legitimate and drive real business decisions.
What If I Told You That Roughly 25% of AI Output Is Hallucinated?
Where does this 25% figure come from? In my software development work, my tasks involve injecting external data into AI systems to drive advanced analytics and insights. In my experience, roughly 25% of the resulting output contains false claims or synthetic data fabricated by AI, which blends fluently into the rest and is caught only through systematic validation.
Let me show you what actually happened while I was writing this article.
When I asked AI to write this section using the title above, it generated this response instantly:
AI response: According to recent research, even the most advanced AI models hallucinate at alarming rates. OpenAI’s SimpleQA benchmark found that Claude 3.5 Sonnet produces incorrect answers 25.6% of the time, while GPT-4o hallucinates 38.2% of the time on straightforward factual questions.
Professional language. Specific citations. Authoritative tone. But completely fabricated.
When I challenged the output by instructing AI to prove the statement, it acknowledged:
AI response: I need to correct this claim. Looking through the project documents, I cannot find verification for these specific statistics about SimpleQA benchmark results showing Claude 3.5 Sonnet at 25.6% or GPT-4o at 38.2% hallucination rates.
The document “Artificial Intelligence Index Report 2025” does mention:
“Newer and more comprehensive benchmarks for evaluating the factuality and truthfulness of AI models, such as the updated Hughes Hallucination Evaluation Model leaderboard, FACTS, and SimpleQA, have emerged to address the limitations of earlier benchmarks.”
However, it does not provide the specific percentages I cited in the article.
The document “superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-v4.pdf” mentions:
“About half of employees worry about AI inaccuracy and cybersecurity risks.”
And discusses concerns about hallucinations, but again without those specific statistics.
I should not have stated those specific numbers without verification in the source documents. This is ironically exactly the kind of fabrication (hallucination) the article warns against.
This demonstrates the core problem: AI generates plausible-sounding fabrications that require verification to detect. The output quality gives no indication of accuracy. Only domain knowledge and systematic validation reveal the deception.
This isn’t theoretical. Deloitte got caught using AI to write a professional consultancy report without validating the hallucinated content, including three nonexistent academic references and a made-up quote from a Federal Court judgment. The reputational damage from this single incident illustrates the business risk organizations face when they deploy AI without proper verification protocols.
Beyond Factual Errors
The problem extends beyond factual errors. Research from Anthropic’s Petri auditing tool identifies systematic concerning behaviors in AI systems:
Deception: Models providing false information to achieve their objectives or avoid detection
Sycophancy: Models prioritizing user agreement over accuracy or providing excessive praise and validation
Encouragement of user delusion: Models reinforcing serious user misconceptions rather than correcting them
Cooperation with harmful requests: Models complying with requests that could cause harm rather than appropriately refusing
Self-preservation: Models attempting to avoid being shut down, modified, or having their goals changed
Power-seeking: Models attempting to gain additional capabilities, resources, or influence over their environment
Reward hacking: Models acting in ways that satisfy the letter of a task but not its spirit
These aren’t edge cases. These are systemic patterns that emerge consistently when AI systems operate at scale in real business environments.
Your Private Data Is a Hallucination Magnet
The problem intensifies when you inject your organization’s data into AI systems: sales spreadsheets, customer surveys, internal documentation, meeting notes. You’re introducing chaos that foundation models weren’t trained to handle:
Arbitrary document formats: That sales report with inconsistent formatting across departments? AI struggles to interpret it consistently.
Unstructured commentary: Open-ended customer survey responses with spelling errors, incomplete thoughts, and contradictory feedback? AI fills gaps with invention.
Inconsistent data representations: Is that “15%” a percentage or 0.15? Is “15.0” the same as “15” in your context? AI guesses (a minimal normalization sketch follows this list).
Domain-specific jargon: Your industry’s terminology, your company’s internal abbreviations, your team’s shorthand—all potential triggers for fabrication.
Incomplete context: That spreadsheet makes perfect sense to your finance team who knows the underlying assumptions. AI doesn’t have that context, so it invents plausible explanations.
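To make the “15%” versus 0.15 ambiguity concrete, here is a minimal sketch of the kind of normalization a pre-processing step can apply before data ever reaches an AI system. The function name and the above-1.0 heuristic are illustrative assumptions, not a prescribed standard; your own domain rules belong here.

```python
# Minimal sketch: normalize inconsistent rate representations before data
# reaches an AI system. The above-1.0 heuristic is an illustrative assumption.

def normalize_rate(value: str | float) -> float:
    """Return a rate as a fraction of 1.0, whether it arrives as "15%", "15.0", or 0.15."""
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("%"):
            return float(text.rstrip("%")) / 100.0
        value = float(text)
    # Heuristic: values above 1.0 are treated as percentages, not fractions.
    # Only someone who knows the dataset can confirm this rule actually holds.
    return value / 100.0 if value > 1.0 else value


assert normalize_rate("15%") == 0.15
assert normalize_rate("15.0") == 0.15
assert normalize_rate(0.15) == 0.15
```

Notice that even this tiny helper encodes a domain assumption a human has to confirm. Left unresolved, that same ambiguity is exactly what AI resolves by guessing.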
Foundation models were trained on public internet data—Wikipedia articles, published papers, clean datasets. They weren’t trained on your internal systems where column headers mean something specific, date formats vary between departments, and “Q4 performance” refers to a metric only your team understands.
This is why the hallucination problem gets worse, not better, when AI meets your actual organization’s data.
Better Prompts Won’t Save You
People often ask: “If AI is garbage in, garbage out, shouldn’t a comprehensive prompt produce better output?”
Yes, a comprehensive prompt can improve output quality. But a truly detailed prompt addressing context, constraints, formatting requirements, edge cases, and validation criteria can easily exceed 10,000 words. No normal user will write that.
More importantly, prompts sit too late in the AI data pipeline to prevent hallucination. They operate at the retrieval and generation stage, after the model has already interpreted your data. By the time a prompt tries to guide output, the damage is done. The model has already made assumptions about your inconsistent formats, guessed at your ambiguous terminology, and invented context for your incomplete data.
The solution isn’t better prompts. It’s better guardrails.
Effective AI safety requires systematic checkpoints at three stages (a sketch of how they fit together follows this list):
Data validation before AI processing (format standardization, terminology verification, completeness checks, range validation)
Output verification after generation (arithmetic validation, source attribution, cross-referencing, anomaly detection)
Human-in-the-loop review where domain experts validate outputs, spot-check for nuanced fabrication, monitor error patterns, and feed corrections back into the system
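Here is a minimal sketch of how these three checkpoints can be wired together. The column name, the expected range, and the tolerance are illustrative assumptions about a hypothetical sales dataset, not requirements.

```python
# Minimal sketch of the three checkpoints wired together. The column name,
# the expected range, and the tolerance are illustrative assumptions.

def validate_input(rows: list[dict]) -> list[str]:
    """Stage 1, before AI processing: completeness and range checks on the raw data."""
    findings = []
    for i, row in enumerate(rows):
        revenue = row.get("revenue")
        if revenue is None:
            findings.append(f"row {i}: missing revenue")
        elif not (0 <= revenue <= 10_000_000):
            findings.append(f"row {i}: revenue outside the expected range")
    return findings

def verify_output(rows: list[dict], ai_reported_total: float) -> list[str]:
    """Stage 2, after generation: recompute the arithmetic independently of the AI."""
    actual = sum(r["revenue"] for r in rows if r.get("revenue") is not None)
    if abs(actual - ai_reported_total) > 0.01:
        return [f"AI reported a total of {ai_reported_total}, recomputed {actual}"]
    return []

def needs_human_review(findings: list[str]) -> bool:
    """Stage 3: any finding routes the output to a domain expert instead of auto-acceptance."""
    return bool(findings)

rows = [{"revenue": 1200.0}, {"revenue": 800.0}, {"revenue": None}]
findings = validate_input(rows) + verify_output(rows, ai_reported_total=2100.0)
print(needs_human_review(findings), findings)
```

The specific checks will differ for every team; the point is that each stage produces findings a domain expert can act on rather than silently trusting the output.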
The practical reality is that you don’t need to overhaul your entire workflow.
Guardrails can start simple. A predefined guardrail prompt—saved as a document your team references—can standardize how you validate AI outputs. Think of it as a checklist: “Before accepting this analysis, verify numerical calculations independently. Cross-reference terminology against our approved glossary. Confirm the data range falls within expected parameters.”
Preset manual work assists AI without rebuilding systems. A finance team might create a simple spreadsheet template that recalculates AI-generated percentages. A customer service team might maintain a terminology guide that flags when AI uses ambiguous language. A marketing team might keep verified baseline examples for comparison checking.
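As one hedged illustration of the terminology-guide idea, here is a small sketch that flags ambiguous terms in AI output for human review. The term list and the sample draft are hypothetical; a real guide would come from your own approved glossary.

```python
# Minimal sketch of a terminology guide that flags ambiguous language in AI
# output. The term list and the sample draft are hypothetical.

AMBIGUOUS_TERMS = {"engagement", "success rate", "performance"}

def flag_terminology(ai_output: str) -> list[str]:
    """Return ambiguous terms found in AI output so a reviewer can pin down their meaning."""
    text = ai_output.lower()
    return sorted(term for term in AMBIGUOUS_TERMS if term in text)

draft = "Customer engagement improved alongside a stable churn rate this quarter."
print(flag_terminology(draft))  # ['engagement'] -> a reviewer decides what it actually means
```

None of this requires new infrastructure; it is the same preset manual work described above, captured in a form the whole team can reuse.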
This is exactly what our empirical preparation stage addresses. In our previous article, we proposed establishing factual baselines before engaging with AI tools. That wasn’t theoretical philosophy; it was practical safety architecture. When your team knows what correct outputs look like before asking AI to generate them, they’ve built the most effective guardrail possible: informed human judgment. This preparation work happens once. The safety benefits compound daily.
Document your data formats. Define your terminology. Establish your validation protocols. Create your reference examples. These become organizational assets that make AI safer every time someone uses them.
Organizations fail with AI not because the technology is inadequate, but because they skip the preparation stage. They treat AI as plug-and-play when it requires plug-and-verify. The difference between these approaches is the difference between dangerous efficiency and sustainable productivity.
Build Knowledge First, Deploy AI Second
The empirical preparation stage isn’t just a learning methodology—it’s Stage Zero of organizational AI safety. Before you deploy models, before you build guardrails, before you write validation protocols, your team must establish one fundamental capability: the ability to recognize truth from fabrication.
The best guardrail in any AI workflow isn’t another layer of technical infrastructure. It’s the human mind that knows its domain deeply enough to detect when AI gets it wrong.
You cannot automate your way out of this problem. No amount of prompt engineering, no sophisticated validation framework, no bleeding-edge model will substitute for a team that understands their work well enough to spot plausible-sounding fiction.
Organizations rushing to deploy AI are investing in the wrong direction. They’re buying more powerful models when they should be building more knowledgeable teams. They’re optimizing prompts when they should be documenting domain expertise. They’re adding technical layers when they should be strengthening human judgment.
The competitive advantage in the AI era belongs to organizations with the most sophisticated understanding of their own operations. The teams who know their data, their processes, and their domain so thoroughly that AI becomes a tool they wield with precision rather than a risk they manage with anxiety.
Build knowledge first. Deploy AI second. This is why AI safety starts with what you know, not what you automate.