Commit de078d4

Add YouTube video embed to QeRL project page for enhanced content engagement and update NVFP4 hardware support description for clarity
1 parent b25f0c9

File tree (2 files changed: +13 −1 lines)

  • app/blog/qerl-quantization-reinforcement-learning
  • public/content/qerl-quantization-reinforcement-learning

app/blog/qerl-quantization-reinforcement-learning/page.tsx

Lines changed: 12 additions & 0 deletions
@@ -282,6 +282,18 @@ export default function QeRLProject() {

       {/* Article Body */}
       <div className="px-8 sm:px-12 pb-20">
+        <div className="mb-8">
+          <div className="relative" style={{ paddingTop: '56.25%' }}>
+            <iframe
+              className="absolute top-0 left-0 w-full h-full rounded-lg shadow-2xl"
+              src="https://www.youtube.com/embed/TVGkUzQTsUM"
+              title="YouTube video player"
+              frameBorder="0"
+              allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+              allowFullScreen
+            ></iframe>
+          </div>
+        </div>
         <div className="prose prose-lg prose-invert max-w-none">
           <MarkdownRenderer content={markdownContent} />
         </div>
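The added markup uses the classic "padded box" technique for a responsive 16:9 embed: percentage `padding-top` resolves against the element's width, so a wrapper with `paddingTop: '56.25%'` reserves height in a fixed 9:16 proportion to its width, and the absolutely positioned iframe fills that box. A minimal sketch of where the 56.25% comes from (the `aspectRatioPadding` helper is hypothetical, not part of the commit):

```typescript
// Percentage padding-top resolves against the element's WIDTH, so
// (height / width) * 100 yields a box with the desired aspect ratio
// at any screen size.
function aspectRatioPadding(width: number, height: number): string {
  return `${(height / width) * 100}%`;
}

console.log(aspectRatioPadding(16, 9)); // "56.25%" — the value in the diff
console.log(aspectRatioPadding(4, 3));  // "75%"
```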

public/content/qerl-quantization-reinforcement-learning/qerl-content.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ QeRL is built on three main pillars to be both efficient and effective.

 #### a) High-Performance Quantization (NVFP4 + Marlin Kernel)

-Instead of the slow NF4 format from QLoRA, QeRL uses **NVFP4**, a modern 4-bit floating-point format with hardware support on Blackwell (B200) GPUs. All experiments in the paper were conducted on H100 GPUs.
+Instead of the slow NF4 format from QLoRA, QeRL uses **NVFP4**, a modern 4-bit floating-point format with hardware support for Blackwell GPUs. All experiments in the paper were conducted on H100 GPUs.

 * **Speed:** Combined with optimized kernels like **Marlin**, NVFP4 allows for matrix multiplication to be performed directly on the 4-bit weights without slow de-quantization steps. The hardware support enables these operations to run efficiently, which is what makes the rollout phase **faster** than standard 16-bit training.
 * **Memory:** It still provides the massive memory savings of 4-bit quantization, reducing the model's memory footprint by about 75%.
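The ~75% figure in the Memory bullet falls straight out of the bit widths: 4-bit weights occupy a quarter of the space of 16-bit ones. A back-of-envelope sketch (the 7B parameter count and the `weightGiB` helper are illustrative assumptions, not from the commit; quantization scale factors and other overheads are ignored):

```typescript
// Footprint of the weight tensors alone, in GiB. Ignores activations,
// optimizer state, and per-group quantization scales.
function weightGiB(numParams: number, bitsPerWeight: number): number {
  return (numParams * bitsPerWeight) / 8 / 1024 ** 3;
}

const params = 7e9;                 // e.g. a 7B-parameter model (illustrative)
const fp16 = weightGiB(params, 16); // ≈ 13.0 GiB
const nvfp4 = weightGiB(params, 4); // ≈ 3.3 GiB
console.log(`savings: ${(1 - nvfp4 / fp16) * 100}%`); // savings: 75%
```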
