Commit 52e4fbd

Update content for NVFP4, MobileLLM-R1, and DeepSeek pages to use HTML entities for apostrophes

- Replaced apostrophes with HTML entities in the NVFP4, MobileLLM-R1, and DeepSeek pages to ensure proper rendering in the browser.
- Enhanced the user experience by maintaining consistent formatting across the documentation.

1 parent 632eafb commit 52e4fbd
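
The replacement this commit makes by hand can be sketched as a small helper. This is an illustration only: `escapeJsxText` and `JSX_TEXT_ENTITIES` are hypothetical names, not code from this repository. The sketch also covers double quotes, since the diff below swaps `"` for `&quot;` as well as `'` for `&apos;`. A lint rule such as `react/no-unescaped-entities` is a likely motivation, but that is an assumption.

```typescript
// Hypothetical helper: escapes the two characters this commit replaces by hand.
const JSX_TEXT_ENTITIES: Record<string, string> = {
  "'": "&apos;",
  '"': "&quot;",
};

// Intended for literal JSX text only, not for attribute values or JS string expressions.
function escapeJsxText(text: string): string {
  return text.replace(/['"]/g, (ch) => JSX_TEXT_ENTITIES[ch] ?? ch);
}

console.log(escapeJsxText("NVIDIA's 4-Bit Revolution"));
console.log(escapeJsxText('The "secret sauce" that makes NVFP4 work'));
```

Strings inside JSX expressions, such as `{language === 'en' ? 'Copied!' : ...}`, are ordinary JavaScript and should be left alone, which is consistent with what the diff does.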

4 files changed, 56 additions and 44 deletions

app/blog/deepseek-sparse-attention/page.tsx

Lines changed: 36 additions & 24 deletions
@@ -205,30 +205,42 @@ export default function DeepSeekProject() {
 <div className="bg-white/5 backdrop-blur-xl border border-white/10 rounded-3xl shadow-2xl overflow-hidden">
   {/* Copy Article Button */}
   <div className="px-8 sm:px-12 pt-8 pb-4">
-    <button
-      onClick={handleCopyArticle}
-      className={`group flex items-center gap-2 px-4 py-2 rounded-lg font-medium transition-all duration-300 ${
-        copySuccess
-          ? 'bg-green-500/20 text-green-400 border border-green-500/30'
-          : 'bg-white/5 hover:bg-white/10 text-slate-300 hover:text-blue-400 border border-white/10 hover:border-blue-500/50'
-      }`}
-    >
-      {copySuccess ? (
-        <>
-          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
-            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
-          </svg>
-          {language === 'en' ? 'Copied!' : '已复制!'}
-        </>
-      ) : (
-        <>
-          <svg className="w-4 h-4 group-hover:scale-110 transition-transform" fill="none" stroke="currentColor" viewBox="0 0 24 24">
-            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 16H6a2 2 0 01-2-2V6a2 2 0 012-2h8a2 2 0 012 2v2m-6 12h8a2 2 0 002-2v-8a2 2 0 00-2-2h-8a2 2 0 00-2 2v8a2 2 0 002 2z" />
-          </svg>
-          {language === 'en' ? 'Copy Article' : '复制文章'}
-        </>
-      )}
-    </button>
+    <div className="relative inline-block">
+      <button
+        onClick={handleCopyArticle}
+        className={`group flex items-center gap-2 px-4 py-2 rounded-lg font-medium transition-all duration-300 ${
+          copySuccess
+            ? 'bg-green-500/20 text-green-400 border border-green-500/30'
+            : 'bg-white/5 hover:bg-white/10 text-slate-300 hover:text-blue-400 border border-white/10 hover:border-blue-500/50'
+        }`}
+      >
+        {copySuccess ? (
+          <>
+            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+            </svg>
+            {language === 'en' ? 'Copied!' : '已复制!'}
+          </>
+        ) : (
+          <>
+            <svg className="w-4 h-4 group-hover:scale-110 transition-transform" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 16H6a2 2 0 01-2-2V6a2 2 0 012-2h8a2 2 0 012 2v2m-6 12h8a2 2 0 002-2v-8a2 2 0 00-2-2h-8a2 2 0 00-2 2v8a2 2 0 002 2z" />
+            </svg>
+            {language === 'en' ? 'Copy Article' : '复制文章'}
+          </>
+        )}
+      </button>
+
+      {/* Tooltip */}
+      <div className="absolute bottom-full left-1/2 transform -translate-x-1/2 mb-2 px-3 py-2 bg-slate-800 text-white text-sm rounded-lg shadow-lg opacity-0 group-hover:opacity-100 transition-opacity duration-200 pointer-events-none whitespace-nowrap z-10 border border-slate-600">
+        {language === 'en'
+          ? 'Perfect for pasting into AI chatbots for self-studying! 🤖'
+          : '非常适合粘贴到AI聊天机器人进行自学!🤖'
+        }
+        {/* Tooltip arrow */}
+        <div className="absolute top-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-l-4 border-r-4 border-t-4 border-transparent border-t-slate-800"></div>
+      </div>
+    </div>
   </div>
 
   {/* Article Body */}
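
A note on the tooltip added in this hunk. Tailwind's `group-hover:` utilities respond to a hovered ancestor that carries the `group` class. Here `group` is on the `<button>`, while the tooltip is a sibling of the button inside the `relative inline-block` wrapper, so `group-hover:opacity-100` may never apply unless an ancestor outside this hunk also has `group`. The sketch below models that rule and one possible fix, which is putting `group` on the wrapper. The function names are hypothetical, and the model is a simplification of how the generated CSS selector behaves.

```typescript
// Assumption: adding `group` to the wrapper makes it the hover target for the tooltip.
function wrapperClassName(showTooltipOnHover: boolean): string {
  const base = "relative inline-block";
  return showTooltipOnHover ? `group ${base}` : base;
}

// A `group-hover:` utility can only apply when some ancestor's class list contains `group`.
function tooltipCanAppear(ancestorClassNames: string[]): boolean {
  return ancestorClassNames.some((cls) => cls.split(/\s+/).includes("group"));
}

// As committed: the wrapper is the tooltip's nearest ancestor and has no `group`.
console.log(tooltipCanAppear([wrapperClassName(false)]));
// With `group` added to the wrapper.
console.log(tooltipCanAppear([wrapperClassName(true)]));
```

With `group` on the wrapper, the existing `group-hover:scale-110` on the icon keeps working, since the wrapper is also an ancestor of the `<svg>`.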

app/blog/mobilellm-r1/page.tsx

Lines changed: 10 additions & 10 deletions
@@ -105,10 +105,10 @@ export default function MobileLLMR1Project() {
     TL;DR
   </h2>
   <p className="text-slate-300 leading-relaxed mb-4">
-    Meta's MobileLLM-R1 challenges two fundamental assumptions about reasoning in language models: (1) that reasoning only emerges in large models, and (2) that it requires massive datasets. They demonstrate that <strong className="text-blue-400">sub-billion parameter models can achieve strong reasoning</strong> with just 2T tokens of carefully curated data.
+    Meta&apos;s MobileLLM-R1 challenges two fundamental assumptions about reasoning in language models: (1) that reasoning only emerges in large models, and (2) that it requires massive datasets. They demonstrate that <strong className="text-blue-400">sub-billion parameter models can achieve strong reasoning</strong> with just 2T tokens of carefully curated data.
   </p>
   <p className="text-slate-300 leading-relaxed">
-    Their <strong className="text-purple-400">950M parameter model achieves an AIME score of 15.5</strong>, compared to just 0.6 for OLMo-2-1.48B and 0.3 for SmoILM-2-1.7B. Remarkably, despite being trained on only 11.7% of the tokens compared to Qwen3's 36T-token corpus, MobileLLM-R1-950M matches or surpasses Qwen3-0.6B across multiple reasoning benchmarks.
+    Their <strong className="text-purple-400">950M parameter model achieves an AIME score of 15.5</strong>, compared to just 0.6 for OLMo-2-1.48B and 0.3 for SmoILM-2-1.7B. Remarkably, despite being trained on only 11.7% of the tokens compared to Qwen3&apos;s 36T-token corpus, MobileLLM-R1-950M matches or surpasses Qwen3-0.6B across multiple reasoning benchmarks.
   </p>
 </div>
 </div>
@@ -253,7 +253,7 @@ export default function MobileLLMR1Project() {
 content={
   <div>
     <div className="font-bold text-purple-400 mb-2">🔄 Data-Model Co-Evolution</div>
-    <p className="mb-2">As the model's capacity changes during training, the data mixture adapts to match the model's current capabilities.</p>
+    <p className="mb-2">As the model&apos;s capacity changes during training, the data mixture adapts to match the model&apos;s current capabilities.</p>
 
     <div className="bg-slate-700/50 rounded p-2 mb-2">
       <div className="text-xs text-slate-300 font-semibold mb-1">Early Training</div>
@@ -289,8 +289,8 @@ export default function MobileLLMR1Project() {
   </div>
   <h3 className="text-xl font-bold text-white">Data-Model Co-Evolution</h3>
 </div>
-<p className="text-slate-300 mb-3">
-  Adaptive training strategy where the data mixture evolves alongside the model's growing capacity, ensuring optimal challenge levels throughout training.
+<p className="text-slate-300 mb-3">
+  Adaptive training strategy where the data mixture evolves alongside the model&apos;s growing capacity, ensuring optimal challenge levels throughout training.
 </p>
 <div className="bg-purple-500/10 border border-purple-500/30 rounded-lg p-3">
   <div className="text-purple-400 text-sm font-mono">Adaptive Curriculum = Optimal Learning</div>
@@ -745,7 +745,7 @@ export default function MobileLLMR1Project() {
 <div className="text-4xl mb-4"></div>
 <h3 className="text-xl font-bold text-white mb-2">Efficiency</h3>
 <div className="text-purple-400 text-3xl font-bold mb-2">11.7%</div>
-<p className="text-slate-300 text-sm mb-3">Of Qwen3's tokens</p>
+<p className="text-slate-300 text-sm mb-3">Of Qwen3&apos;s tokens</p>
 <div className="bg-purple-500/10 border border-purple-500/30 rounded-lg p-3">
   <div className="text-purple-400 text-sm">2T vs 36T tokens</div>
   <div className="text-purple-400 text-sm">Same or better performance</div>
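
A quick arithmetic check on the figures in this card, using only the numbers shown in the hunk above (11.7%, 2T, 36T). This is a sketch, and the constant and function names are illustrative.

```typescript
const QWEN3_TOKENS_T = 36;   // trillions of tokens, from the card
const CARD_TOKENS_T = 2;     // trillions of tokens, from the card
const CARD_FRACTION = 0.117; // 11.7%, from the card

function percentOf(part: number, whole: number): number {
  return (part / whole) * 100;
}

// What share of 36T is 2T?
const percentFromTokens = percentOf(CARD_TOKENS_T, QWEN3_TOKENS_T);
// How many tokens is 11.7% of 36T?
const tokensFromPercent = CARD_FRACTION * QWEN3_TOKENS_T;

console.log(percentFromTokens.toFixed(1), tokensFromPercent.toFixed(1));
```

The two figures do not line up: 2T of 36T is about 5.6%, while 11.7% of 36T is about 4.2T. One possible reading is that the card mixes the size of the curated data with the total number of tokens seen during training. That reading is an assumption and worth confirming against the paper before changing the copy.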
@@ -826,13 +826,13 @@ export default function MobileLLMR1Project() {
 content={
   <div>
     <div className="font-bold text-blue-400 mb-2">🔍 Leave-One-Out Analysis</div>
-    <p className="mb-2">Systematic evaluation of each dataset's contribution to reasoning capabilities.</p>
+    <p className="mb-2">Systematic evaluation of each dataset&apos;s contribution to reasoning capabilities.</p>
 
     <div className="bg-slate-700/50 rounded p-2 mb-2">
       <div className="text-xs text-slate-300 font-semibold mb-1">Methodology</div>
       <div className="text-xs text-slate-300">• Train models excluding one dataset at a time</div>
       <div className="text-xs text-slate-300">• Measure negative log-likelihood on capability-probing datasets</div>
-      <div className="text-xs text-slate-300">• Quantify each dataset's impact on reasoning</div>
+      <div className="text-xs text-slate-300">• Quantify each dataset&apos;s impact on reasoning</div>
     </div>
 
     <div className="bg-slate-700/50 rounded p-2 mb-2">
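
The methodology bullets in this hunk can be sketched as a loop: evaluate with all datasets, then with each one removed, and record the change in negative log-likelihood. Everything here is a toy under stated assumptions. `leaveOneOutImpact`, `toyEvaluate`, the dataset names, and the gain values are all made up for illustration, and `toyEvaluate` stands in for "train a model on these datasets and measure NLL", which the page does not specify in detail.

```typescript
type NllEvaluator = (trainingSets: string[]) => number;

// Impact of a dataset = NLL without it minus NLL with everything (positive means it helped).
function leaveOneOutImpact(datasets: string[], evaluate: NllEvaluator): Map<string, number> {
  const baseline = evaluate(datasets);
  const impact = new Map<string, number>();
  for (const held of datasets) {
    const rest = datasets.filter((d) => d !== held);
    impact.set(held, evaluate(rest) - baseline);
  }
  return impact;
}

// Toy stand-in for training plus evaluation: each dataset lowers NLL by a fixed, invented amount.
const toyGain: Record<string, number> = { web: 0.1, math: 0.3, code: 0.2 };
const toyEvaluate: NllEvaluator = (sets) =>
  sets.reduce((nll, name) => nll - (toyGain[name] ?? 0), 2.0);

const toyImpact = leaveOneOutImpact(Object.keys(toyGain), toyEvaluate);
console.log(toyImpact);
```

In a real run each call to the evaluator is a full training job, so the cost grows linearly with the number of datasets.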
@@ -859,8 +859,8 @@ export default function MobileLLMR1Project() {
   </div>
   <h3 className="text-xl font-bold text-white">Leave-One-Out Analysis</h3>
 </div>
-<p className="text-slate-300 mb-3">
-  Systematic evaluation of each dataset's contribution to reasoning capabilities by training models with and without specific data sources.
+<p className="text-slate-300 mb-3">
+  Systematic evaluation of each dataset&apos;s contribution to reasoning capabilities by training models with and without specific data sources.
 </p>
 <div className="bg-blue-500/10 border border-blue-500/30 rounded-lg p-3">
   <div className="text-blue-400 text-sm font-mono">Quantify Data Impact = Better Mixtures</div>

app/blog/nvfp4-4bit-training/page.tsx

Lines changed: 8 additions & 8 deletions
@@ -72,7 +72,7 @@ export default function NVFP4Project() {
 <div className="relative">
   <h1 className="text-4xl md:text-5xl lg:text-6xl font-medium mb-8 leading-tight">
     <span className="bg-gradient-to-r from-green-400 via-emerald-400 to-teal-400 bg-clip-text text-transparent">
-      NVIDIA's 4-Bit Revolution
+      NVIDIA&apos;s 4-Bit Revolution
     </span>
   </h1>
   <div className="text-lg md:text-xl text-slate-400 mb-4">
@@ -81,7 +81,7 @@ export default function NVFP4Project() {
 
   <div className="absolute inset-0 text-4xl md:text-5xl lg:text-6xl font-medium leading-tight blur-sm">
     <span className="bg-gradient-to-r from-green-400/20 via-emerald-400/20 to-teal-400/20 bg-clip-text text-transparent">
-      NVIDIA's 4-Bit Revolution
+      NVIDIA&apos;s 4-Bit Revolution
     </span>
   </div>
 </div>
@@ -105,7 +105,7 @@ export default function NVFP4Project() {
     TL;DR
   </h2>
   <p className="text-slate-300 leading-relaxed mb-4">
-    NVIDIA has figured out how to train massive LLMs using a new <strong className="text-green-400">4-bit number format called NVFP4</strong>, which is a huge deal for efficiency. Training in 4-bit is much faster and uses less memory than the current 8-bit standard (FP8), but it's very difficult to do without the model's performance collapsing.
+    NVIDIA has figured out how to train massive LLMs using a new <strong className="text-green-400">4-bit number format called NVFP4</strong>, which is a huge deal for efficiency. Training in 4-bit is much faster and uses less memory than the current 8-bit standard (FP8), but it&apos;s very difficult to do without the model&apos;s performance collapsing.
   </p>
   <p className="text-slate-300 leading-relaxed">
     Their solution combines four key techniques to train a <strong className="text-emerald-400">12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens</strong> with performance nearly identical to FP8 training. This marks the first successful demonstration of training billion-parameter language models with 4-bit precision over a multi-trillion-token horizon.
@@ -201,7 +201,7 @@ export default function NVFP4Project() {
     NVFP4 vs MXFP4
   </h2>
   <p className="text-slate-400 text-lg">
-    How NVIDIA's format improves on the standard
+    How NVIDIA&apos;s format improves on the standard
   </p>
 </div>
 
@@ -310,7 +310,7 @@ export default function NVFP4Project() {
     The 4 Key Techniques
   </h2>
   <p className="text-slate-400 text-lg">
-    The "secret sauce" that makes NVFP4 work
+    The &quot;secret sauce&quot; that makes NVFP4 work
   </p>
 </div>
 
@@ -399,7 +399,7 @@ export default function NVFP4Project() {
   <h3 className="text-xl font-bold text-white">Random Hadamard Transforms (RHT)</h3>
 </div>
 <p className="text-slate-300 mb-3">
-  Mathematical operation that "smears" extreme outlier values across all values, making distributions more uniform and easier to quantize.
+  Mathematical operation that &quot;smears&quot; extreme outlier values across all values, making distributions more uniform and easier to quantize.
 </p>
 <div className="bg-green-500/10 border border-green-500/30 rounded-lg p-3">
   <div className="text-green-400 text-sm font-mono">Outliers → Uniform Distribution</div>
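
The "smearing" described in this hunk can be seen in a small sketch. It uses the generic construction of random sign flips followed by a normalized Hadamard transform. The block size, where the transform is applied, and how signs are drawn in NVIDIA's recipe are not stated on the page, so those details are assumptions, and `hadamard` and `randomHadamard` are illustrative names.

```typescript
// Fast Walsh-Hadamard transform, normalized by 1/sqrt(n). Length must be a power of two.
function hadamard(v: number[]): number[] {
  const n = v.length;
  if (n === 0 || (n & (n - 1)) !== 0) throw new Error("length must be a power of two");
  const out = v.slice();
  for (let h = 1; h < n; h *= 2) {
    for (let i = 0; i < n; i += 2 * h) {
      for (let j = i; j < i + h; j++) {
        const a = out[j] ?? 0;
        const b = out[j + h] ?? 0;
        out[j] = a + b;
        out[j + h] = a - b;
      }
    }
  }
  const scale = 1 / Math.sqrt(n);
  return out.map((x) => x * scale);
}

// Multiply by a +1/-1 sign vector, then apply the Hadamard transform.
function randomHadamard(v: number[], signs: number[]): number[] {
  return hadamard(v.map((x, i) => x * (signs[i] ?? 1)));
}

// One large outlier among zeros; the signs are fixed here so the example is reproducible.
const spiky = [8, 0, 0, 0];
const smeared = randomHadamard(spiky, [1, -1, 1, -1]);
console.log(smeared);
```

The single outlier of magnitude 8 becomes four entries of magnitude 4: the total energy is unchanged, but the largest value a quantizer has to cover is halved. Because the normalized transform is orthogonal, applying `hadamard` again (and undoing the signs) recovers the input.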
@@ -491,7 +491,7 @@ export default function NVFP4Project() {
   <h3 className="text-xl font-bold text-white">Stochastic Rounding</h3>
 </div>
 <p className="text-slate-300 mb-3">
-  Probabilistic rounding instead of deterministic "round-to-nearest" eliminates systematic bias that accumulates in gradient calculations.
+  Probabilistic rounding instead of deterministic &quot;round-to-nearest&quot; eliminates systematic bias that accumulates in gradient calculations.
 </p>
 <div className="bg-green-500/10 border border-green-500/30 rounded-lg p-3">
   <div className="text-green-400 text-sm font-mono">Unbiased Gradients = Better Training</div>
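
A minimal sketch of the idea in this hunk, assuming a uniform grid with spacing `step`. Real FP4 grids are not uniform, so this shows the principle, not NVFP4's actual rounding; `stochasticRound` is an illustrative name, and the `rand` parameter exists only so the behaviour can be made reproducible.

```typescript
// Rounds x to one of its two neighbouring multiples of `step`.
// It rounds up with probability equal to x's fractional position between them,
// so the expected value of the result equals x.
function stochasticRound(
  x: number,
  step: number,
  rand: () => number = Math.random,
): number {
  const scaled = x / step;
  const lower = Math.floor(scaled);
  const frac = scaled - lower;
  return (rand() < frac ? lower + 1 : lower) * step;
}

// Round-to-nearest always maps 0.3 to 0, so the 0.3 is lost every time.
console.log(Math.round(0.3));
// Stochastic rounding returns 1 about 30% of the time and 0 otherwise.
console.log(stochasticRound(0.3, 1));
```

Averaged over many small gradient values, round-to-nearest drops the same fraction in the same direction each time, while the stochastic version errs in both directions and the errors tend to cancel.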
@@ -544,7 +544,7 @@ export default function NVFP4Project() {
     NVFP4 vs MXFP4
   </h3>
   <p className="text-slate-300 mb-4">
-    In direct comparison on an 8B model, MXFP4 needed <strong className="text-green-400">36% more training data</strong> (1.36T vs 1T tokens) to match NVFP4's performance. This proves NVFP4's superior design.
+    In direct comparison on an 8B model, MXFP4 needed <strong className="text-green-400">36% more training data</strong> (1.36T vs 1T tokens) to match NVFP4&apos;s performance. This proves NVFP4&apos;s superior design.
   </p>
   <div className="grid md:grid-cols-2 gap-4">
     <div className="bg-green-500/10 border border-green-500/30 rounded-lg p-4">

app/page.tsx

Lines changed: 2 additions & 2 deletions
@@ -483,7 +483,7 @@ export default function Home() {
     NVFP4 LLM Pretraining Research
   </h3>
   <p className="text-slate-300 text-sm mb-4">
-    Research NVIDIA's NVFP4 (4-bit floating point) training methodology - 2-3x performance boost with 50% memory reduction
+    Research NVIDIA&apos;s NVFP4 (4-bit floating point) training methodology - 2-3x performance boost with 50% memory reduction
   </p>
   <div className="space-y-2">
     <div className="flex items-center gap-2 text-xs text-slate-400">
@@ -540,7 +540,7 @@ export default function Home() {
     MobileLLM-R1 Sub-Billion Reasoning Research
   </h3>
   <p className="text-slate-300 text-sm mb-4">
-    Research Meta's MobileLLM-R1 - sub-billion parameter reasoning models with strong capabilities using only ~2T high-quality tokens
+    Research Meta&apos;s MobileLLM-R1 - sub-billion parameter reasoning models with strong capabilities using only ~2T high-quality tokens
   </p>
   <div className="space-y-2">
     <div className="flex items-center gap-2 text-xs text-slate-400">
