
Commit 4e2dc9a
feat(content): add accurate hallucination blog
1 parent 2405346

3 files changed (+113, -9 lines)

src/components/blog/SinglePost.astro

Lines changed: 7 additions & 7 deletions
@@ -22,7 +22,7 @@ const { Content } = post;
 <section class="py-8 sm:py-16 lg:py-20 mx-auto">
 <article>
 <header class={post.image ? '' : ''}>
-<div class="flex justify-between flex-col sm:flex-row max-w-3xl mx-auto mt-0 mb-2 px-4 sm:px-6 sm:items-center">
+<div class="flex justify-between flex-col sm:flex-row max-w-5xl mx-auto mt-0 mb-2 px-4 sm:px-6 sm:items-center">
 <p>
 <Icon name="tabler:clock" class="w-4 h-4 inline-block -mt-0.5 dark:text-gray-400" />
 <time datetime={String(post.publishDate)} class="inline-block">{getFormattedDate(post.publishDate)}</time>
@@ -42,7 +42,7 @@ const { Content } = post;
 </div>
 {
 post.author && (
-<div class="flex justify-between flex-col sm:flex-row max-w-3xl mx-auto mt-0 mb-2 px-4 sm:px-6 sm:items-center h-fit group">
+<div class="flex justify-between flex-col sm:flex-row max-w-5xl mx-auto mt-0 mb-2 px-4 sm:px-6 sm:items-center h-fit group">
 <p class="flex items-center gap-1">
 <Icon
 name="tabler:user"
@@ -54,14 +54,14 @@ const { Content } = post;
 )
 }
 <h1
-class="px-4 sm:px-6 max-w-3xl mx-auto text-4xl md:text-5xl font-bold leading-tighter tracking-tighter font-heading"
+class="px-4 sm:px-6 max-w-5xl mx-auto text-4xl md:text-5xl font-bold leading-tighter tracking-tighter font-heading"
 >
 {post.title}
 </h1>
 
 {
 (post.displayExcerpt ?? true) && (
-<p class="max-w-3xl mx-auto mt-4 mb-8 px-4 sm:px-6 text-xl md:text-2xl text-muted dark:text-slate-400 text-justify">
+<p class="max-w-5xl mx-auto mt-4 mb-8 px-4 sm:px-6 text-xl md:text-2xl text-muted dark:text-slate-400 text-justify">
 {post.excerpt}
 </p>
 )
@@ -80,18 +80,18 @@ const { Content } = post;
 decoding="async"
 />
 ) : (
-<div class="max-w-3xl mx-auto px-4 sm:px-6 mt-2">
+<div class="max-w-5xl mx-auto px-4 sm:px-6 mt-2">
 <div class="border-t dark:border-slate-700" />
 </div>
 )
 }
 </header>
 <div
-class="mx-auto px-6 sm:px-6 max-w-3xl prose dark:prose-invert dark:prose-headings:text-slate-300 prose-md prose-headings:font-heading prose-headings:leading-tighter prose-headings:tracking-tighter prose-headings:font-bold prose-a:text-primary dark:prose-a:text-blue-400 prose-img:rounded-md prose-img:shadow-lg mt-8 prose-headings:scroll-mt-[80px]"
+class="mx-auto px-6 sm:px-6 max-w-5xl prose dark:prose-invert dark:prose-headings:text-slate-300 prose-md prose-headings:font-heading prose-headings:leading-tighter prose-headings:tracking-tighter prose-headings:font-bold prose-a:text-primary dark:prose-a:text-blue-400 prose-img:rounded-md prose-img:shadow-lg mt-8 prose-headings:scroll-mt-[80px]"
 >
 {Content ? <Content /> : <Fragment set:html={post.content || ''} />}
 </div>
-<div class="mx-auto px-6 sm:px-6 max-w-3xl mt-24 flex justify-between flex-col sm:flex-row">
+<div class="mx-auto px-6 sm:px-6 max-w-5xl mt-24 flex justify-between flex-col sm:flex-row">
 <PostTags tags={post.tags} class="mr-5 rtl:mr-0 rtl:ml-5" />
 <SocialShare url={url} text={post.title} class="mt-5 sm:mt-1 align-middle text-gray-500 dark:text-slate-600" />
 </div>

src/components/blog/ToBlogLink.astro

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@ import Button from '~/components/ui/Button.astro';
 const { textDirection } = I18N;
 ---
 
-<div class="mx-auto px-6 sm:px-6 max-w-3xl pt-8 md:pt-4 pb-12 md:pb-20">
-<Button variant="tertiary" class="px-3 md:px-3" href={getBlogPermalink()}>
+<div class="mx-auto px-6 sm:px-6 max-w-5xl pt-8 md:pt-4 pb-12 md:pb-20">
+<Button variant="tertiary" class="px-0" href={getBlogPermalink()}>
 {
 textDirection === 'rtl' ? (
 <Icon name="tabler:chevron-right" class="w-5 h-5 mr-1 -ml-1.5 rtl:-mr-1.5 rtl:ml-1" />
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
publishDate: 2025-01-07T08:45:00.000Z
author: Dens Sumesh
title: Accurate Hallucination Detection With NER
excerpt: >-
  Using an LLM-as-a-judge to detect hallucinations is slow and imprecise
  compared with simple NER. We share how we solved hallucination detection at
  Trieve.
image: >-
  https://cdn.trieve.ai/blog/accurate-hallucination-detection-with-ner/accurate-hallucination-detection-opengraph.webp
category: Tutorials
tags:
  - AI
  - hallucination-detection
displayImage: true
displayExcerpt: true
---

You can find all the code involved in our NER system, including benchmarks, at [github.com/devflowinc/trieve/tree/main/hallucination-detection](https://github.com/devflowinc/trieve/tree/main/hallucination-detection).

# How We Do It: Smart Use of NER

Our method zeroes in on the most common and most critical hallucinations: the ones that could mislead or confuse users. Based on our research, a large share of hallucinations fall into three categories:

1. **Proper nouns** (people, places, organizations)
2. **Numerical values** (dates, amounts, statistics)
3. **Made-up terminology**

Instead of throwing a large language model at the problem with an LLM-as-a-judge approach, we use Named Entity Recognition (NER) to find proper nouns and compare the entities in the generated completion with those in the retrieved reference text. For numbers and unknown words, we use similarly straightforward techniques to flag potential issues.
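
As a rough illustration of the proper-noun check, here is a minimal Python sketch. It is not the code from the repository linked above: `naive_entities` is a hypothetical stand-in that treats runs of capitalized words as entities, where a real system would call a trained NER model, and `unsupported_entities` (also hypothetical) flags any completion entity that never occurs in the reference text.

```python
import re
from typing import Callable

# Hypothetical stand-in for an NER model: runs of capitalized words such as
# "Jane Doe" or "Globex Corporation" are treated as named entities.
_CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*")


def naive_entities(text: str) -> set[str]:
    """Crude entity extraction; swap in a real NER model in practice."""
    entities = set()
    for match in _CAPITALIZED_RUN.finditer(text):
        words = match.group(0).split()
        preceding = text[: match.start()].rstrip()
        # Sentence-initial words are capitalized anyway, so drop the first word
        # of a run that opens a sentence (this can also clip a real name).
        if preceding == "" or preceding.endswith((".", "!", "?")):
            words = words[1:]
        if words:
            entities.add(" ".join(words))
    return entities


def unsupported_entities(
    completion: str,
    reference: str,
    extract: Callable[[str], set[str]] = naive_entities,
) -> set[str]:
    """Entities in the completion that never occur in the retrieved reference."""
    reference_lower = reference.lower()
    return {e for e in extract(completion) if e.lower() not in reference_lower}


completion = "The report was written by Jane Doe at Globex Corporation in 2021."
reference = "Jane Doe published the report while working at Initech."
print(unsupported_entities(completion, reference))  # {'Globex Corporation'}
```

Passing the extractor in as `extract` keeps the comparison independent of the model, so a domain-specific NER model can replace the heuristic without changing the check itself.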

Our approach only works in use cases where RAG is present, which is fine given that Trieve is a search and RAG API. And because RAG is the most common way to limit hallucinations, the approach also works for any team building solutions on top of other search engines.
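
The same compare-against-the-reference pattern can cover the other two categories, and it shows why the method needs RAG: without retrieved text there is nothing to check a number or a rare word against. The Python sketch below is one assumed interpretation of "similarly straightforward techniques", not Trieve's implementation. It treats `1,200` and `1200` as the same number, ignores units and ranges, and flags a word when it appears neither in the reference nor in `KNOWN_WORDS`, a tiny hypothetical vocabulary standing in for a real dictionary.

```python
import re

# Integers or decimals, optionally with thousands separators ("1,200", "3.5", "2021").
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")
_WORD = re.compile(r"[a-z]+")

# Hypothetical placeholder vocabulary; a real system would load a full
# dictionary plus domain-specific terminology.
KNOWN_WORDS = frozenset({"a", "an", "and", "grew", "in", "of", "protocol", "the", "to", "with"})


def numbers(text: str) -> set[str]:
    """Extract numbers, normalized by removing thousands separators."""
    return {m.group(0).replace(",", "") for m in _NUMBER.finditer(text)}


def unsupported_numbers(completion: str, reference: str) -> set[str]:
    """Numbers stated in the completion that never appear in the reference."""
    return numbers(completion) - numbers(reference)


def unknown_words(
    completion: str, reference: str, vocabulary: frozenset[str] = KNOWN_WORDS
) -> set[str]:
    """Words found neither in the reference text nor in the known vocabulary."""
    reference_words = set(_WORD.findall(reference.lower()))
    return {
        w
        for w in _WORD.findall(completion.lower())
        if w not in reference_words and w not in vocabulary
    }


reference = "In 2023 the company reported revenue of 1,200 million dollars."
completion = "Revenue grew to 1200 million in 2024 with the Zorblax protocol."
print(unsupported_numbers(completion, reference))  # {'2024'}
print(unknown_words(completion, reference))  # {'zorblax'}
```

Exact matching is deliberately strict: a completion that restates 1,200 million as "1.2 billion" would be flagged, which is the kind of case the planned handling of ranges, approximations, and units is meant to address.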

## Why This Is Important

- **Lightning fast**: Processes responses in 100-300 milliseconds.
- **Fully self-contained**: No need for external AI services.
- **Customizable**: Works with domain-specific NER models.
- **Minimal setup**: Can run on CPU nodes.

# Benchmark Results

## RAGTruth Dataset Performance

We achieved a 67% accuracy rate on the [RAGTruth dataset](https://github.com/ParticleMedia/RAGTruth), which provides a comprehensive benchmark for hallucination detection in RAG systems. We consider this a strong result given how lightweight our approach is compared with more complex solutions.

## Comparison with Vectara

When tested against [Vectara's examples](https://huggingface.co/datasets/vectara/hcm-examples-aug-2024), our system showed:

- 70% alignment with Vectara's model predictions
- Comparable performance on obvious hallucinations
- Strong detection of numerical inconsistencies
- High accuracy on entity-based hallucinations

This level of alignment is significant because we achieve it without the computational overhead of a full language model.
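
The two figures above measure different things. Accuracy on RAGTruth compares the detector's verdicts with human labels, while alignment with Vectara compares them with another model's predictions. The Python sketch below assumes "alignment" means the fraction of examples on which both detectors return the same verdict; the function names are hypothetical and the data is a toy example, not benchmark output.

```python
from collections.abc import Sequence


def accuracy(predictions: Sequence[bool], gold_labels: Sequence[bool]) -> float:
    """Fraction of examples where the detector's verdict matches the human label."""
    if not predictions or len(predictions) != len(gold_labels):
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(p == g for p, g in zip(predictions, gold_labels))
    return matches / len(predictions)


def agreement(ours: Sequence[bool], theirs: Sequence[bool]) -> float:
    """Fraction of examples where two detectors return the same verdict.

    No ground truth is involved, so high agreement does not by itself show
    that either detector is correct.
    """
    return accuracy(ours, theirs)


# Toy verdicts (True = "hallucination detected"); not benchmark results.
ours = [True, False, True, True, False]
other_model = [True, False, False, False, False]
gold = [True, True, True, True, False]
print(accuracy(ours, gold))  # 0.8
print(agreement(ours, other_model))  # 0.6
```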

# Why This Works

Our method focuses on the types of hallucinations that matter most: made-up entities, wrong numbers, and gibberish words. By sticking to these basics, we've built a system that:

- **Catches high-impact errors**: Flags invented organizations and incorrect statistics.
- **Runs lightning fast**: Adds minimal delay in real-time systems.
- **Fits anywhere**: Integrates easily into production pipelines with no special hardware needed.

# Why It Matters in the Real World

Speed and simplicity are the stars of this show. Our system processes responses in **100-300ms**, making it well suited for:

- Real-time applications (think chatbots and virtual assistants)
- High-volume systems where efficiency is key
- Low-resource setups, like edge devices or small servers

In short, this approach bridges the gap between effectiveness and practicality. You get solid hallucination detection without slowing everything down or breaking the bank.

# What's Next: Room to Grow

While we're thrilled with these results, we have plenty of ideas for the future:

1. **Smarter Entity Recognition**

   - Train models for industry-specific jargon and custom entity types.
   - Improve recognition for niche use cases.

2. **Better Number Handling**

   - Add context-aware analysis for ranges, approximations, and units.
   - Normalize and convert units for consistent comparisons.

3. **Expanded Word Validation**

   - Incorporate specialized vocabularies for different fields.
   - Make it multilingual and more context-aware.

4. **Hybrid Methods**

   - Optionally tap into language models for tricky edge cases.
   - Combine with semantic similarity scores or structural analysis for tougher challenges.

# The Takeaway

Our system shows that **you don't need heavyweight tools** to handle hallucination detection. By focusing on the most common issues, we've built a fast, reliable solution that's production-ready and easy to scale.

It's a practical tool for anyone looking to improve the trustworthiness of AI outputs, especially in environments where speed and resource efficiency are non-negotiable.

Check out our work, give it a try, and let us know what you think!
