Commit de078d4

Add YouTube video embed to QeRL project page for enhanced content engagement and update NVFP4 hardware support description for clarity
1 parent b25f0c9

File tree (2 files changed: +13 −1 lines)

  • app/blog/qerl-quantization-reinforcement-learning
  • public/content/qerl-quantization-reinforcement-learning

app/blog/qerl-quantization-reinforcement-learning/page.tsx

Lines changed: 12 additions & 0 deletions
@@ -282,6 +282,18 @@ export default function QeRLProject() {

       {/* Article Body */}
       <div className="px-8 sm:px-12 pb-20">
+        <div className="mb-8">
+          <div className="relative" style={{ paddingTop: '56.25%' }}>
+            <iframe
+              className="absolute top-0 left-0 w-full h-full rounded-lg shadow-2xl"
+              src="https://www.youtube.com/embed/TVGkUzQTsUM"
+              title="YouTube video player"
+              frameBorder="0"
+              allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+              allowFullScreen
+            ></iframe>
+          </div>
+        </div>
         <div className="prose prose-lg prose-invert max-w-none">
           <MarkdownRenderer content={markdownContent} />
         </div>
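The added markup uses the classic "padded box" technique for a responsive 16:9 embed: percentage `padding-top` resolves against the element's width, so a wrapper with `paddingTop: '56.25%'` reserves height in a fixed 9:16 proportion to its width, and the absolutely positioned iframe fills that box. A minimal sketch of where the 56.25% comes from (the `aspectRatioPadding` helper is hypothetical, not part of the commit):

```typescript
// Percentage padding-top resolves against the element's WIDTH, so
// (height / width) * 100 yields a box with the desired aspect ratio
// at any screen size.
function aspectRatioPadding(width: number, height: number): string {
  return `${(height / width) * 100}%`;
}

console.log(aspectRatioPadding(16, 9)); // "56.25%" — the value in the diff
console.log(aspectRatioPadding(4, 3));  // "75%"
```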

public/content/qerl-quantization-reinforcement-learning/qerl-content.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ QeRL is built on three main pillars to be both efficient and effective.

 #### a) High-Performance Quantization (NVFP4 + Marlin Kernel)

-Instead of the slow NF4 format from QLoRA, QeRL uses **NVFP4**, a modern 4-bit floating-point format with hardware support on Blackwell (B200) GPUs. All experiments in the paper were conducted on H100 GPUs.
+Instead of the slow NF4 format from QLoRA, QeRL uses **NVFP4**, a modern 4-bit floating-point format with hardware support for Blackwell GPUs. All experiments in the paper were conducted on H100 GPUs.

 * **Speed:** Combined with optimized kernels like **Marlin**, NVFP4 allows for matrix multiplication to be performed directly on the 4-bit weights without slow de-quantization steps. The hardware support enables these operations to run efficiently, which is what makes the rollout phase **faster** than standard 16-bit training.
 * **Memory:** It still provides the massive memory savings of 4-bit quantization, reducing the model's memory footprint by about 75%.
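The ~75% figure in the Memory bullet falls straight out of the bit widths: 4-bit weights occupy a quarter of the space of 16-bit ones. A back-of-envelope sketch (the 7B parameter count and the `weightGiB` helper are illustrative assumptions, not from the commit; quantization scale factors and other overheads are ignored):

```typescript
// Footprint of the weight tensors alone, in GiB. Ignores activations,
// optimizer state, and per-group quantization scales.
function weightGiB(numParams: number, bitsPerWeight: number): number {
  return (numParams * bitsPerWeight) / 8 / 1024 ** 3;
}

const params = 7e9;                 // e.g. a 7B-parameter model (illustrative)
const fp16 = weightGiB(params, 16); // ≈ 13.0 GiB
const nvfp4 = weightGiB(params, 4); // ≈ 3.3 GiB
console.log(`savings: ${(1 - nvfp4 / fp16) * 100}%`); // savings: 75%
```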
