diff --git a/COURSE_STRUCTURE.md b/COURSE_STRUCTURE.md new file mode 100644 index 0000000..2a38e60 --- /dev/null +++ b/COURSE_STRUCTURE.md @@ -0,0 +1,253 @@ +# Course Structure Documentation + +## Overview + +The learning course has been reorganized to use **markdown files** stored in the `public/content/learn/` directory, following the same pattern as the blog posts. This makes it easy to manage content and add images. + +## ๐Ÿ“ File Structure + +``` +public/content/learn/ +โ”œโ”€โ”€ README.md # Documentation for content management +โ”œโ”€โ”€ math/ +โ”‚ โ”œโ”€โ”€ functions/ +โ”‚ โ”‚ โ”œโ”€โ”€ functions-content.md +โ”‚ โ”‚ โ””โ”€โ”€ [add your images here] +โ”‚ โ”œโ”€โ”€ derivatives/ +โ”‚ โ”‚ โ”œโ”€โ”€ derivatives-content.md +โ”‚ โ”‚ โ”œโ”€โ”€ derivative-graph.png +โ”‚ โ”‚ โ””โ”€โ”€ tangent-line.png +โ”‚ โ”œโ”€โ”€ vectors/ +โ”‚ โ”‚ โ”œโ”€โ”€ vectors-content.md +โ”‚ โ”‚ โ””โ”€โ”€ [images included] +โ”‚ โ”œโ”€โ”€ matrices/ +โ”‚ โ”‚ โ”œโ”€โ”€ matrices-content.md +โ”‚ โ”‚ โ””โ”€โ”€ [images included] +โ”‚ โ””โ”€โ”€ gradients/ +โ”‚ โ”œโ”€โ”€ gradients-content.md +โ”‚ โ””โ”€โ”€ [images included] +โ””โ”€โ”€ neural-networks/ + โ”œโ”€โ”€ introduction/ + โ”‚ โ”œโ”€โ”€ introduction-content.md + โ”‚ โ””โ”€โ”€ [add your images here] + โ”œโ”€โ”€ forward-propagation/ + โ”‚ โ”œโ”€โ”€ forward-propagation-content.md + โ”‚ โ””โ”€โ”€ [add your images here] + โ”œโ”€โ”€ backpropagation/ + โ”‚ โ”œโ”€โ”€ backpropagation-content.md + โ”‚ โ””โ”€โ”€ [add your images here] + โ””โ”€โ”€ training/ + โ”œโ”€โ”€ training-content.md + โ””โ”€โ”€ [add your images here] +``` + +## ๐ŸŽ“ Course Modules + +### Module 1: Mathematics Fundamentals + +1. **Functions** (`/learn/math/functions`) + - Linear functions + - Activation functions (Sigmoid, ReLU, Tanh) + - Loss functions + - Why non-linearity matters + +2. **Derivatives** (`/learn/math/derivatives`) + - What derivatives are + - Why they matter in AI + - Common derivative rules + - Practical examples with loss functions + +3. **Vectors** (`/learn/math/vectors`) + - What vectors are (magnitude and direction) + - Vector components and representation + - Vector operations (addition, scalar multiplication) + - Applications in machine learning + +4. **Matrices** (`/learn/math/matrices`) + - Matrix fundamentals + - Matrix operations (multiplication, transpose) + - Matrix transformations + - Role in neural networks + +5. **Gradients** (`/learn/math/gradients`) + - Understanding gradients + - Partial derivatives + - Gradient computation + - Gradient descent in optimization + +### Module 2: Neural Networks from Scratch + +1. **Introduction** (`/learn/neural-networks/introduction`) + - What neural networks are + - Basic architecture (input, hidden, output layers) + - How they learn + - Real-world applications + +2. **Forward Propagation** (`/learn/neural-networks/forward-propagation`) + - The forward pass process + - Weighted sums and activations + - Step-by-step numerical examples + - Matrix operations + +3. **Backpropagation** (`/learn/neural-networks/backpropagation`) + - The backpropagation algorithm + - Chain rule in action + - Gradient computation + - Common challenges (vanishing/exploding gradients) + +4. **Training & Optimization** (`/learn/neural-networks/training`) + - Gradient descent variants (SGD, mini-batch, batch) + - Advanced optimizers (Adam, RMSprop, Momentum) + - Hyperparameters and learning rate schedules + - Best practices and common pitfalls + +## ๐Ÿ› ๏ธ Technical Implementation + +### Components Created + +1. 
**LessonPage Component** (`components/lesson-page.tsx`) + - Reusable component that loads markdown content + - Handles frontmatter parsing + - Supports navigation between lessons + - Similar to blog post structure + +2. **Page Routes** (`app/learn/...`) + - Each lesson has a simple page component + - Uses `LessonPage` with configuration + - Clean and maintainable + +### How It Works + +1. **Markdown files** are stored in `public/content/learn/[category]/[lesson]/` +2. Each file has **frontmatter** with hero data (title, subtitle, tags) +3. **Images** are placed alongside the markdown files +4. **Page components** load the markdown using the `LessonPage` component +5. Images are referenced as `![alt](image.png)` and served from `/content/learn/...` + +### Example Markdown Frontmatter + +```markdown +--- +hero: + title: "Understanding Derivatives" + subtitle: "The Foundation of Neural Network Training" + tags: + - "๐Ÿ“ Mathematics" + - "โฑ๏ธ 10 min read" +--- + +# Your content here... + +![Derivative Graph](derivative-graph.png) +``` + +## ๐Ÿ“ Adding New Content + +### To Add a New Lesson: + +1. **Create folder structure:** + ```bash + mkdir -p public/content/learn/[category]/[lesson-name] + ``` + +2. **Create markdown file:** + ```bash + touch public/content/learn/[category]/[lesson-name]/[lesson-name]-content.md + ``` + +3. **Add frontmatter and content** to the markdown file + +4. **Add images** to the same folder + +5. **Create page component:** + ```tsx + // app/learn/[category]/[lesson-name]/page.tsx + import { LessonPage } from "@/components/lesson-page"; + + export default function YourLessonPage() { + return ( + + ); + } + ``` + +## ๐Ÿ–ผ๏ธ Adding Images + +### Placeholder Images Currently Referenced: + +**Math - Derivatives:** +- `derivative-graph.png` - Visual showing derivative as slope +- `tangent-line.png` - Tangent line illustration + +**Math - Functions:** +- `linear-function.png` - Linear function visualization +- `relu-function.png` - ReLU activation graph +- `function-composition.png` - Function composition diagram + +**Neural Networks - Introduction:** +- `neural-network-diagram.png` - Basic NN architecture +- `layer-types.png` - Input, hidden, output layers +- `training-process.png` - Training loop diagram +- `depth-vs-performance.png` - Network depth impact + +**Neural Networks - Forward Propagation:** +- `forward-prop-diagram.png` - Data flow diagram +- `forward-example.png` - Example calculation +- `activations-comparison.png` - Different activation functions +- `matrix-backprop.png` - Matrix operations + +**Neural Networks - Backpropagation:** +- `backprop-overview.png` - Algorithm overview +- `backprop-steps.png` - Step-by-step process +- `matrix-backprop.png` - Matrix form backprop + +**Neural Networks - Training:** +- `training-loop.png` - Training loop visualization +- `gradient-descent.png` - Gradient descent illustration +- `gd-variants.png` - GD variants comparison +- `optimizers-comparison.png` - Optimizer behaviors +- `lr-schedules.png` - Learning rate schedules +- `training-curves.png` - Loss/accuracy curves + +### To Add Your Images: + +1. Create your images (PNG or JPG recommended) +2. Place them in the appropriate lesson folder +3. They're already referenced in the markdown - just replace the placeholders! 
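
### Worked Example: A Complete Lesson Page

To make step 5 above concrete, here is a minimal sketch of a lesson page wired to the derivatives content. It assumes `contentPath` is the only required prop of `LessonPage` (the `prevLink`/`nextLink` props are optional and, per `components/lesson-page.tsx`, are auto-derived from the course structure when omitted); the `math/derivatives` value is an illustrative path matching the folder layout above:

```tsx
// app/learn/math/derivatives/page.tsx — illustrative sketch
import { LessonPage } from "@/components/lesson-page";

export default function DerivativesPage() {
  // contentPath resolves to public/content/learn/math/derivatives/derivatives-content.md
  return <LessonPage contentPath="math/derivatives" />;
}
```

The component fetches that markdown at runtime, parses the `hero` frontmatter, and renders prev/next navigation automatically.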
+ +## ๐ŸŽจ Design Features + +- **Beautiful gradient backgrounds** matching the site theme +- **Syntax highlighting** for code blocks +- **Responsive design** for mobile and desktop +- **Navigation** between lessons with prev/next buttons +- **Markdown rendering** with support for: + - Headings, paragraphs, lists + - Code blocks + - Images + - Tables + - Math formulas (using KaTeX in MarkdownRenderer) + +## ๐Ÿš€ Next Steps + +1. **Add your images** - Replace placeholder PNG files with actual visualizations +2. **Expand content** - Add more lessons or modules as needed +3. **Test on localhost** - Visit `/learn` to see the course +4. **Customize styling** - Adjust colors/gradients in the components if desired + +## ๐Ÿ“‹ Summary + +โœ… Course structure created with 9 lessons (5 math + 4 neural networks) +โœ… Markdown files in `public/content/learn/` +โœ… Reusable `LessonPage` component +โœ… Images ready for math lessons (vectors, matrices, gradients) +โœ… Navigation between lessons +โœ… Frontmatter support for hero sections +โœ… README documentation in content folder + +Your course is ready with comprehensive math fundamentals! ๐ŸŽ‰ + diff --git a/app/learn/activation-functions/relu/page.tsx b/app/learn/activation-functions/relu/page.tsx new file mode 100644 index 0000000..5daa632 --- /dev/null +++ b/app/learn/activation-functions/relu/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ReluPage() { + return ( + + ); +} + diff --git a/app/learn/activation-functions/sigmoid/page.tsx b/app/learn/activation-functions/sigmoid/page.tsx new file mode 100644 index 0000000..68e1726 --- /dev/null +++ b/app/learn/activation-functions/sigmoid/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function SigmoidPage() { + return ( + + ); +} + diff --git a/app/learn/activation-functions/silu/page.tsx b/app/learn/activation-functions/silu/page.tsx new file mode 100644 index 0000000..6d215c8 --- /dev/null +++ b/app/learn/activation-functions/silu/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function SiluPage() { + return ( + + ); +} + diff --git a/app/learn/activation-functions/softmax/page.tsx b/app/learn/activation-functions/softmax/page.tsx new file mode 100644 index 0000000..5f74f3c --- /dev/null +++ b/app/learn/activation-functions/softmax/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function SoftmaxPage() { + return ( + + ); +} + diff --git a/app/learn/activation-functions/swiglu/page.tsx b/app/learn/activation-functions/swiglu/page.tsx new file mode 100644 index 0000000..4e6656a --- /dev/null +++ b/app/learn/activation-functions/swiglu/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function SwigluPage() { + return ( + + ); +} + diff --git a/app/learn/activation-functions/tanh/page.tsx b/app/learn/activation-functions/tanh/page.tsx new file mode 100644 index 0000000..51fefa4 --- /dev/null +++ b/app/learn/activation-functions/tanh/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TanhPage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/applying-attention-weights/page.tsx b/app/learn/attention-mechanism/applying-attention-weights/page.tsx new file mode 100644 index 0000000..11e0f91 --- /dev/null +++ 
b/app/learn/attention-mechanism/applying-attention-weights/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ApplyingAttentionWeightsPage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/attention-in-code/page.tsx b/app/learn/attention-mechanism/attention-in-code/page.tsx new file mode 100644 index 0000000..0bb3f76 --- /dev/null +++ b/app/learn/attention-mechanism/attention-in-code/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function AttentionInCodePage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/calculating-attention-scores/page.tsx b/app/learn/attention-mechanism/calculating-attention-scores/page.tsx new file mode 100644 index 0000000..6058f59 --- /dev/null +++ b/app/learn/attention-mechanism/calculating-attention-scores/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function CalculatingAttentionScoresPage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/multi-head-attention/page.tsx b/app/learn/attention-mechanism/multi-head-attention/page.tsx new file mode 100644 index 0000000..2b3d895 --- /dev/null +++ b/app/learn/attention-mechanism/multi-head-attention/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MultiHeadAttentionPage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/self-attention-from-scratch/page.tsx b/app/learn/attention-mechanism/self-attention-from-scratch/page.tsx new file mode 100644 index 0000000..0d31494 --- /dev/null +++ b/app/learn/attention-mechanism/self-attention-from-scratch/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function SelfAttentionFromScratchPage() { + return ( + + ); +} + diff --git a/app/learn/attention-mechanism/what-is-attention/page.tsx b/app/learn/attention-mechanism/what-is-attention/page.tsx new file mode 100644 index 0000000..799ee0f --- /dev/null +++ b/app/learn/attention-mechanism/what-is-attention/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function WhatIsAttentionPage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/building-a-transformer-block/page.tsx b/app/learn/building-a-transformer/building-a-transformer-block/page.tsx new file mode 100644 index 0000000..b684901 --- /dev/null +++ b/app/learn/building-a-transformer/building-a-transformer-block/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function BuildingATransformerBlockPage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/full-transformer-in-code/page.tsx b/app/learn/building-a-transformer/full-transformer-in-code/page.tsx new file mode 100644 index 0000000..fc0a45a --- /dev/null +++ b/app/learn/building-a-transformer/full-transformer-in-code/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function FullTransformerInCodePage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/rope-positional-encoding/page.tsx b/app/learn/building-a-transformer/rope-positional-encoding/page.tsx new file mode 100644 index 0000000..0e02d0b --- /dev/null +++ b/app/learn/building-a-transformer/rope-positional-encoding/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export 
default function RopePositionalEncodingPage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/the-final-linear-layer/page.tsx b/app/learn/building-a-transformer/the-final-linear-layer/page.tsx new file mode 100644 index 0000000..3e6c49d --- /dev/null +++ b/app/learn/building-a-transformer/the-final-linear-layer/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheFinalLinearLayerPage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/training-a-transformer/page.tsx b/app/learn/building-a-transformer/training-a-transformer/page.tsx new file mode 100644 index 0000000..2fe2616 --- /dev/null +++ b/app/learn/building-a-transformer/training-a-transformer/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TrainingATransformerPage() { + return ( + + ); +} + diff --git a/app/learn/building-a-transformer/transformer-architecture/page.tsx b/app/learn/building-a-transformer/transformer-architecture/page.tsx new file mode 100644 index 0000000..741ece3 --- /dev/null +++ b/app/learn/building-a-transformer/transformer-architecture/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TransformerArchitecturePage() { + return ( + + ); +} + diff --git a/app/learn/math/derivatives/page.tsx b/app/learn/math/derivatives/page.tsx new file mode 100644 index 0000000..7097874 --- /dev/null +++ b/app/learn/math/derivatives/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function DerivativesPage() { + return ( + + ); +} + diff --git a/app/learn/math/functions/page.tsx b/app/learn/math/functions/page.tsx new file mode 100644 index 0000000..669be2b --- /dev/null +++ b/app/learn/math/functions/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function FunctionsPage() { + return ( + + ); +} + diff --git a/app/learn/math/gradients/page.tsx b/app/learn/math/gradients/page.tsx new file mode 100644 index 0000000..f1c3022 --- /dev/null +++ b/app/learn/math/gradients/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function GradientsPage() { + return ( + + ); +} + diff --git a/app/learn/math/matrices/page.tsx b/app/learn/math/matrices/page.tsx new file mode 100644 index 0000000..6d72757 --- /dev/null +++ b/app/learn/math/matrices/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MatricesPage() { + return ( + + ); +} + diff --git a/app/learn/math/vectors/page.tsx b/app/learn/math/vectors/page.tsx new file mode 100644 index 0000000..62bbdb2 --- /dev/null +++ b/app/learn/math/vectors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function VectorsPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/architecture-of-a-network/page.tsx b/app/learn/neural-networks/architecture-of-a-network/page.tsx new file mode 100644 index 0000000..53930b5 --- /dev/null +++ b/app/learn/neural-networks/architecture-of-a-network/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ArchitectureOfANetworkPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/backpropagation-in-action/page.tsx b/app/learn/neural-networks/backpropagation-in-action/page.tsx new file mode 100644 index 
0000000..c53ce8d --- /dev/null +++ b/app/learn/neural-networks/backpropagation-in-action/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function BackpropagationInActionPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/building-a-layer/page.tsx b/app/learn/neural-networks/building-a-layer/page.tsx new file mode 100644 index 0000000..c549ff2 --- /dev/null +++ b/app/learn/neural-networks/building-a-layer/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function BuildingALayerPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/calculating-gradients/page.tsx b/app/learn/neural-networks/calculating-gradients/page.tsx new file mode 100644 index 0000000..3ad5ff1 --- /dev/null +++ b/app/learn/neural-networks/calculating-gradients/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function CalculatingGradientsPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/implementing-a-network/page.tsx b/app/learn/neural-networks/implementing-a-network/page.tsx new file mode 100644 index 0000000..0fbb5fd --- /dev/null +++ b/app/learn/neural-networks/implementing-a-network/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ImplementingANetworkPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/implementing-backpropagation/page.tsx b/app/learn/neural-networks/implementing-backpropagation/page.tsx new file mode 100644 index 0000000..42f74f1 --- /dev/null +++ b/app/learn/neural-networks/implementing-backpropagation/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ImplementingBackpropagationPage() { + return ( + + ); +} + diff --git a/app/learn/neural-networks/the-chain-rule/page.tsx b/app/learn/neural-networks/the-chain-rule/page.tsx new file mode 100644 index 0000000..2631498 --- /dev/null +++ b/app/learn/neural-networks/the-chain-rule/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheChainRulePage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/building-a-neuron-in-python/page.tsx b/app/learn/neuron-from-scratch/building-a-neuron-in-python/page.tsx new file mode 100644 index 0000000..b7967c1 --- /dev/null +++ b/app/learn/neuron-from-scratch/building-a-neuron-in-python/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function BuildingANeuronInPythonPage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/making-a-prediction/page.tsx b/app/learn/neuron-from-scratch/making-a-prediction/page.tsx new file mode 100644 index 0000000..0b65430 --- /dev/null +++ b/app/learn/neuron-from-scratch/making-a-prediction/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MakingAPredictionPage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/the-activation-function/page.tsx b/app/learn/neuron-from-scratch/the-activation-function/page.tsx new file mode 100644 index 0000000..3fd78a2 --- /dev/null +++ b/app/learn/neuron-from-scratch/the-activation-function/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheActivationFunctionPage() { + return ( + + ); +} + diff --git 
a/app/learn/neuron-from-scratch/the-concept-of-learning/page.tsx b/app/learn/neuron-from-scratch/the-concept-of-learning/page.tsx new file mode 100644 index 0000000..2bede78 --- /dev/null +++ b/app/learn/neuron-from-scratch/the-concept-of-learning/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheConceptOfLearningPage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/the-concept-of-loss/page.tsx b/app/learn/neuron-from-scratch/the-concept-of-loss/page.tsx new file mode 100644 index 0000000..9cea839 --- /dev/null +++ b/app/learn/neuron-from-scratch/the-concept-of-loss/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheConceptOfLossPage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/the-linear-step/page.tsx b/app/learn/neuron-from-scratch/the-linear-step/page.tsx new file mode 100644 index 0000000..1a3c919 --- /dev/null +++ b/app/learn/neuron-from-scratch/the-linear-step/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheLinearStepPage() { + return ( + + ); +} + diff --git a/app/learn/neuron-from-scratch/what-is-a-neuron/page.tsx b/app/learn/neuron-from-scratch/what-is-a-neuron/page.tsx new file mode 100644 index 0000000..0d2adda --- /dev/null +++ b/app/learn/neuron-from-scratch/what-is-a-neuron/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function WhatIsANeuronPage() { + return ( + + ); +} + diff --git a/app/learn/page.tsx b/app/learn/page.tsx new file mode 100644 index 0000000..3962012 --- /dev/null +++ b/app/learn/page.tsx @@ -0,0 +1,1225 @@ +'use client'; + +import Link from "next/link"; +import { useLanguage } from "@/components/providers/language-provider"; + +export default function LearnPage() { + const { language } = useLanguage(); + + return ( +
+ {/* Hero Section */} +
+
+ +
+
+

+ + {language === 'en' ? 'Learn Everything You Need To Be An AI Researcher' : '从零开始学习AI'} + 

+

+ {language === 'en' + ? 'Master the fundamentals and publish your own papers' + : '掌握基础知识，构建你自己的神经网络'} +

+
+

+ {language === 'en' + ? 'Under active development, some parts are AI generated and not reviewed yet. In the end everything will be carefully reviewed and rewritten by humans to the highest quality.' + : '正在积极开发中，部分内容由AI生成尚未审核。最终所有内容都将由人工仔细审核和重写，确保最高质量'} +

+
+
+
+
+ + {/* Course Modules */} +
+
+
+ + {/* Math Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Mathematics Fundamentals' : '数学基础'} +

+

+ {language === 'en' ? 'Essential math concepts for AI' : 'AI必备的数学概念'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'Functions' : 'ๅ‡ฝๆ•ฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Linear, quadratic, and activation functions' + : '็บฟๆ€งใ€ไบŒๆฌกๅ’Œๆฟ€ๆดปๅ‡ฝๆ•ฐ'} +

+ + + +
+

+ 2.{language === 'en' ? 'Derivatives' : 'ๅฏผๆ•ฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding rates of change and gradients' + : '็†่งฃๅ˜ๅŒ–็އๅ’Œๆขฏๅบฆ'} +

+ + + +
+

+ 3.{language === 'en' ? 'Vectors' : 'ๅ‘้‡'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding magnitude, direction, and vector operations' + : '็†่งฃๅคงๅฐใ€ๆ–นๅ‘ๅ’Œๅ‘้‡่ฟ็ฎ—'} +

+ + + +
+

+ 4.{language === 'en' ? 'Matrices' : '็Ÿฉ้˜ต'} +

+ + + +
+

+ {language === 'en' + ? 'Matrix operations and transformations' + : '็Ÿฉ้˜ต่ฟ็ฎ—ๅ’Œๅ˜ๆข'} +

+ + + +
+

+ 5.{language === 'en' ? 'Gradients' : 'ๆขฏๅบฆ'} +

+ + + +
+

+ {language === 'en' + ? 'Partial derivatives and gradient descent' + : 'ๅๅฏผๆ•ฐๅ’Œๆขฏๅบฆไธ‹้™'} +

+ +
+
+ + {/* PyTorch Fundamentals Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'PyTorch Fundamentals' : 'PyTorch基础'} +

+

+ {language === 'en' ? 'Working with tensors and PyTorch basics' : '使用张量和PyTorch基础'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'Creating Tensors' : 'ๅˆ›ๅปบๅผ ้‡'} +

+ + + +
+

+ {language === 'en' + ? 'Building blocks of deep learning' + : 'ๆทฑๅบฆๅญฆไน ็š„ๅŸบๆœฌๆž„ๅปบๅ—'} +

+ + + +
+

+ 2.{language === 'en' ? 'Tensor Addition' : 'ๅผ ้‡ๅŠ ๆณ•'} +

+ + + +
+

+ {language === 'en' + ? 'Element-wise operations on tensors' + : 'ๅผ ้‡็š„้€ๅ…ƒ็ด ่ฟ็ฎ—'} +

+ + + +
+

+ 3.{language === 'en' ? 'Matrix Multiplication' : '็Ÿฉ้˜ตไน˜ๆณ•'} +

+ + + +
+

+ {language === 'en' + ? 'The core operation in neural networks' + : '็ฅž็ป็ฝ‘็ปœไธญ็š„ๆ ธๅฟƒ่ฟ็ฎ—'} +

+ + + +
+

+ 4.{language === 'en' ? 'Transposing Tensors' : 'ๅผ ้‡่ฝฌ็ฝฎ'} +

+ + + +
+

+ {language === 'en' + ? 'Flipping dimensions and axes' + : '็ฟป่ฝฌ็ปดๅบฆๅ’Œ่ฝด'} +

+ + + +
+

+ 5.{language === 'en' ? 'Reshaping Tensors' : 'ๅผ ้‡้‡ๅก‘'} +

+ + + +
+

+ {language === 'en' + ? 'Changing tensor dimensions' + : 'ๆ”นๅ˜ๅผ ้‡็ปดๅบฆ'} +

+ + + +
+

+ 6.{language === 'en' ? 'Indexing and Slicing' : '็ดขๅผ•ๅ’Œๅˆ‡็‰‡'} +

+ + + +
+

+ {language === 'en' + ? 'Accessing and extracting tensor elements' + : '่ฎฟ้—ฎๅ’Œๆๅ–ๅผ ้‡ๅ…ƒ็ด '} +

+ + + +
+

+ 7.{language === 'en' ? 'Concatenating Tensors' : 'ๅผ ้‡ๆ‹ผๆŽฅ'} +

+ + + +
+

+ {language === 'en' + ? 'Combining multiple tensors' + : '็ป„ๅˆๅคšไธชๅผ ้‡'} +

+ + + +
+

+ 8.{language === 'en' ? 'Creating Special Tensors' : 'ๅˆ›ๅปบ็‰นๆฎŠๅผ ้‡'} +

+ + + +
+

+ {language === 'en' + ? 'Zeros, ones, identity matrices and more' + : '้›ถๅผ ้‡ใ€ๅ•ไฝๅผ ้‡ใ€ๅ•ไฝ็Ÿฉ้˜ต็ญ‰'} +

+ +
+
+ + {/* Neuron From Scratch Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Neuron From Scratch' : '从零开始构建神经元'} +

+

+ {language === 'en' ? 'Understanding the fundamental unit of neural networks' : '理解神经网络的基本单元'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'What is a Neuron' : 'ไป€ไนˆๆ˜ฏ็ฅž็ปๅ…ƒ'} +

+ + + +
+

+ {language === 'en' + ? 'The basic building block of neural networks' + : '็ฅž็ป็ฝ‘็ปœ็š„ๅŸบๆœฌๆž„ๅปบๅ—'} +

+ + + +
+

+ 2.{language === 'en' ? 'The Linear Step' : '็บฟๆ€งๆญฅ้ชค'} +

+ + + +
+

+ {language === 'en' + ? 'Weighted sums and bias in neurons' + : '็ฅž็ปๅ…ƒไธญ็š„ๅŠ ๆƒๅ’Œๅ’Œๅ็ฝฎ'} +

+ + + +
+

+ 3.{language === 'en' ? 'The Activation Function' : 'ๆฟ€ๆดปๅ‡ฝๆ•ฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Introducing non-linearity to neurons' + : 'ไธบ็ฅž็ปๅ…ƒๅผ•ๅ…ฅ้ž็บฟๆ€ง'} +

+ + + +
+

+ 4.{language === 'en' ? 'Building a Neuron in Python' : '็”จPythonๆž„ๅปบ็ฅž็ปๅ…ƒ'} +

+ + + +
+

+ {language === 'en' + ? 'Implementing a single neuron from scratch' + : 'ไปŽ้›ถๅผ€ๅง‹ๅฎž็Žฐๅ•ไธช็ฅž็ปๅ…ƒ'} +

+ + + +
+

+ 5.{language === 'en' ? 'Making a Prediction' : '่ฟ›่กŒ้ข„ๆต‹'} +

+ + + +
+

+ {language === 'en' + ? 'How a neuron processes input to output' + : '็ฅž็ปๅ…ƒๅฆ‚ไฝ•ๅค„็†่พ“ๅ…ฅๅˆฐ่พ“ๅ‡บ'} +

+ + + +
+

+ 6.{language === 'en' ? 'The Concept of Loss' : 'ๆŸๅคฑๆฆ‚ๅฟต'} +

+ + + +
+

+ {language === 'en' + ? 'Measuring prediction error' + : 'ๆต‹้‡้ข„ๆต‹่ฏฏๅทฎ'} +

+ + + +
+

+ 7.{language === 'en' ? 'The Concept of Learning' : 'ๅญฆไน ๆฆ‚ๅฟต'} +

+ + + +
+

+ {language === 'en' + ? 'How neurons adjust their parameters' + : '็ฅž็ปๅ…ƒๅฆ‚ไฝ•่ฐƒๆ•ดๅ…ถๅ‚ๆ•ฐ'} +

+ +
+
+ + {/* Activation Functions Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Activation Functions' : '激活函数'} +

+

+ {language === 'en' ? 'Understanding different activation functions' : '理解不同的激活函数'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'ReLU' : 'ReLU'} +

+ + + +
+

+ {language === 'en' + ? 'Rectified Linear Unit - The most popular activation function' + : 'ไฟฎๆญฃ็บฟๆ€งๅ•ๅ…ƒ - ๆœ€ๆต่กŒ็š„ๆฟ€ๆดปๅ‡ฝๆ•ฐ'} +

+ + + +
+

+ 2.{language === 'en' ? 'Sigmoid' : 'Sigmoid'} +

+ + + +
+

+ {language === 'en' + ? 'The classic S-shaped activation function' + : '็ปๅ…ธ็š„Sๅฝขๆฟ€ๆดปๅ‡ฝๆ•ฐ'} +

+ + + +
+

+ 3.{language === 'en' ? 'Tanh' : 'Tanh'} +

+ + + +
+

+ {language === 'en' + ? 'Hyperbolic tangent - Zero-centered activation' + : 'ๅŒๆ›ฒๆญฃๅˆ‡ - ้›ถไธญๅฟƒๆฟ€ๆดป'} +

+ + + +
+

+ 4.{language === 'en' ? 'SiLU' : 'SiLU'} +

+ + + +
+

+ {language === 'en' + ? 'Sigmoid Linear Unit - The Swish activation' + : 'Sigmoid็บฟๆ€งๅ•ๅ…ƒ - Swishๆฟ€ๆดป'} +

+ + + +
+

+ 5.{language === 'en' ? 'SwiGLU' : 'SwiGLU'} +

+ + + +
+

+ {language === 'en' + ? 'Swish-Gated Linear Unit - Advanced activation' + : 'Swish้—จๆŽง็บฟๆ€งๅ•ๅ…ƒ - ้ซ˜็บงๆฟ€ๆดป'} +

+ + + +
+

+ 6.{language === 'en' ? 'Softmax' : 'Softmax'} +

+ + + +
+

+ {language === 'en' + ? 'Multi-class classification activation function' + : 'ๅคš็ฑปๅˆ†็ฑปๆฟ€ๆดปๅ‡ฝๆ•ฐ'} +

+ +
+
+ + {/* Neural Networks Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Neural Networks from Scratch' : '从零开始的神经网络'} +

+

+ {language === 'en' ? 'Build neural networks from the ground up' : '从头构建神经网络'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'Architecture of a Network' : '็ฝ‘็ปœๆžถๆž„'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding neural network structure and design' + : '็†่งฃ็ฅž็ป็ฝ‘็ปœ็ป“ๆž„ๅ’Œ่ฎพ่ฎก'} +

+ + + +
+

+ 2.{language === 'en' ? 'Building a Layer' : 'ๆž„ๅปบๅฑ‚'} +

+ + + +
+

+ {language === 'en' + ? 'Constructing individual network layers' + : 'ๆž„ๅปบๅ•ไธช็ฝ‘็ปœๅฑ‚'} +

+ + + +
+

+ 3.{language === 'en' ? 'Implementing a Network' : 'ๅฎž็Žฐ็ฝ‘็ปœ'} +

+ + + +
+

+ {language === 'en' + ? 'Putting together a complete neural network' + : '็ป„่ฃ…ๅฎŒๆ•ด็š„็ฅž็ป็ฝ‘็ปœ'} +

+ + + +
+

+ 4.{language === 'en' ? 'The Chain Rule' : '้“พๅผๆณ•ๅˆ™'} +

+ + + +
+

+ {language === 'en' + ? 'Mathematical foundation of backpropagation' + : 'ๅๅ‘ไผ ๆ’ญ็š„ๆ•ฐๅญฆๅŸบ็ก€'} +

+ + + +
+

+ 5.{language === 'en' ? 'Calculating Gradients' : '่ฎก็ฎ—ๆขฏๅบฆ'} +

+ + + +
+

+ {language === 'en' + ? 'Computing derivatives for network training' + : '่ฎก็ฎ—็ฝ‘็ปœ่ฎญ็ปƒ็š„ๅฏผๆ•ฐ'} +

+ + + +
+

+ 6.{language === 'en' ? 'Backpropagation in Action' : 'ๅๅ‘ไผ ๆ’ญๅฎžๆˆ˜'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding the backpropagation algorithm' + : '็†่งฃๅๅ‘ไผ ๆ’ญ็ฎ—ๆณ•'} +

+ + + +
+

+ 7.{language === 'en' ? 'Implementing Backpropagation' : 'ๅฎž็Žฐๅๅ‘ไผ ๆ’ญ'} +

+ + + +
+

+ {language === 'en' + ? 'Coding the backpropagation algorithm from scratch' + : 'ไปŽ้›ถๅผ€ๅง‹็ผ–ๅ†™ๅๅ‘ไผ ๆ’ญ็ฎ—ๆณ•'} +

+ +
+
+ + {/* Attention Mechanism Module */} +
+
+
+ + + + +
+
+

+ {language === 'en' ? 'Attention Mechanism' : '注意力机制'} +

+

+ {language === 'en' ? 'Understanding attention and self-attention' : '理解注意力和自注意力'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'What is Attention' : 'ไป€ไนˆๆ˜ฏๆณจๆ„ๅŠ›'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding the attention mechanism' + : '็†่งฃๆณจๆ„ๅŠ›ๆœบๅˆถ'} +

+ + + +
+

+ 2.{language === 'en' ? 'Self Attention from Scratch' : 'ไปŽ้›ถๅผ€ๅง‹่‡ชๆณจๆ„ๅŠ›'} +

+ + + +
+

+ {language === 'en' + ? 'Building self-attention from the ground up' + : 'ไปŽ้›ถๅผ€ๅง‹ๆž„ๅปบ่‡ชๆณจๆ„ๅŠ›'} +

+ + + +
+

+ 3.{language === 'en' ? 'Calculating Attention Scores' : '่ฎก็ฎ—ๆณจๆ„ๅŠ›ๅˆ†ๆ•ฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Computing query-key-value similarities' + : '่ฎก็ฎ—ๆŸฅ่ฏข-้”ฎ-ๅ€ผ็›ธไผผๅบฆ'} +

+ + + +
+

+ 4.{language === 'en' ? 'Applying Attention Weights' : 'ๅบ”็”จๆณจๆ„ๅŠ›ๆƒ้‡'} +

+ + + +
+

+ {language === 'en' + ? 'Using attention scores to weight values' + : 'ไฝฟ็”จๆณจๆ„ๅŠ›ๅˆ†ๆ•ฐๅŠ ๆƒๅ€ผ'} +

+ + + +
+

+ 5.{language === 'en' ? 'Multi Head Attention' : 'ๅคšๅคดๆณจๆ„ๅŠ›'} +

+ + + +
+

+ {language === 'en' + ? 'Parallel attention mechanisms' + : 'ๅนถ่กŒๆณจๆ„ๅŠ›ๆœบๅˆถ'} +

+ + + +
+

+ 6.{language === 'en' ? 'Attention in Code' : 'ๆณจๆ„ๅŠ›ไปฃ็ ๅฎž็Žฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Implementing attention mechanisms in Python' + : '็”จPythonๅฎž็Žฐๆณจๆ„ๅŠ›ๆœบๅˆถ'} +

+ +
+
+ + {/* Transformer Feedforward Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Transformer Feedforward' : 'Transformer前馈网络'} +

+

+ {language === 'en' ? 'Feedforward networks and Mixture of Experts' : '前馈网络和专家混合'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'The Feedforward Layer' : 'ๅ‰้ฆˆๅฑ‚'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding transformer feedforward networks' + : '็†่งฃTransformerๅ‰้ฆˆ็ฝ‘็ปœ'} +

+ + + +
+

+ 2.{language === 'en' ? 'What is Mixture of Experts' : 'ไป€ไนˆๆ˜ฏไธ“ๅฎถๆททๅˆ'} +

+ + + +
+

+ {language === 'en' + ? 'Introduction to MoE architecture' + : 'MoEๆžถๆž„ไป‹็ป'} +

+ + + +
+

+ 3.{language === 'en' ? 'The Expert' : 'ไธ“ๅฎถ'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding individual expert networks' + : '็†่งฃๅ•ไธชไธ“ๅฎถ็ฝ‘็ปœ'} +

+ + + +
+

+ 4.{language === 'en' ? 'The Gate' : '้—จๆŽง'} +

+ + + +
+

+ {language === 'en' + ? 'Routing and gating mechanisms in MoE' + : 'MoEไธญ็š„่ทฏ็”ฑๅ’Œ้—จๆŽงๆœบๅˆถ'} +

+ + + +
+

+ 5.{language === 'en' ? 'Combining Experts' : '็ป„ๅˆไธ“ๅฎถ'} +

+ + + +
+

+ {language === 'en' + ? 'Merging multiple expert outputs' + : 'ๅˆๅนถๅคšไธชไธ“ๅฎถ่พ“ๅ‡บ'} +

+ + + +
+

+ 6.{language === 'en' ? 'MoE in a Transformer' : 'Transformerไธญ็š„MoE'} +

+ + + +
+

+ {language === 'en' + ? 'Integrating mixture of experts in transformers' + : 'ๅœจTransformerไธญ้›†ๆˆไธ“ๅฎถๆททๅˆ'} +

+ + + +
+

+ 7.{language === 'en' ? 'MoE in Code' : 'MoEไปฃ็ ๅฎž็Žฐ'} +

+ + + +
+

+ {language === 'en' + ? 'Implementing mixture of experts in Python' + : '็”จPythonๅฎž็Žฐไธ“ๅฎถๆททๅˆ'} +

+ + + +
+

+ 8.{language === 'en' ? 'The DeepSeek MLP' : 'DeepSeek MLP'} +

+ + + +
+

+ {language === 'en' + ? 'DeepSeek\'s advanced MLP architecture' + : 'DeepSeek็š„้ซ˜็บงMLPๆžถๆž„'} +

+ +
+
+ + {/* Building a Transformer Module */} +
+
+
+ + + +
+
+

+ {language === 'en' ? 'Building a Transformer' : '构建Transformer'} +

+

+ {language === 'en' ? 'Complete transformer implementation from scratch' : '从零开始完整实现Transformer'} +

+
+
+ +
+ +
+

+ 1.{language === 'en' ? 'Transformer Architecture' : 'Transformerๆžถๆž„'} +

+ + + +
+

+ {language === 'en' + ? 'Understanding the complete transformer structure' + : '็†่งฃๅฎŒๆ•ด็š„Transformer็ป“ๆž„'} +

+ + + +
+

+ 2.{language === 'en' ? 'RoPE Positional Encoding' : 'RoPEไฝ็ฝฎ็ผ–็ '} +

+ + + +
+

+ {language === 'en' + ? 'Rotary position embeddings for transformers' + : 'Transformer็š„ๆ—‹่ฝฌไฝ็ฝฎๅตŒๅ…ฅ'} +

+ + + +
+

+ 3.{language === 'en' ? 'Building a Transformer Block' : 'ๆž„ๅปบTransformerๅ—'} +

+ + + +
+

+ {language === 'en' + ? 'Constructing individual transformer layers' + : 'ๆž„ๅปบๅ•ไธชTransformerๅฑ‚'} +

+ + + +
+

+ 4.{language === 'en' ? 'The Final Linear Layer' : 'ๆœ€็ปˆ็บฟๆ€งๅฑ‚'} +

+ + + +
+

+ {language === 'en' + ? 'Output projection and prediction head' + : '่พ“ๅ‡บๆŠ•ๅฝฑๅ’Œ้ข„ๆต‹ๅคด'} +

+ + + +
+

+ 5.{language === 'en' ? 'Full Transformer in Code' : 'ๅฎŒๆ•ดTransformerไปฃ็ '} +

+ + + +
+

+ {language === 'en' + ? 'Complete transformer implementation' + : 'ๅฎŒๆ•ด็š„Transformerๅฎž็Žฐ'} +

+ + + +
+

+ 6.{language === 'en' ? 'Training a Transformer' : '่ฎญ็ปƒTransformer'} +

+ + + +
+

+ {language === 'en' + ? 'Training process and optimization' + : '่ฎญ็ปƒ่ฟ‡็จ‹ๅ’Œไผ˜ๅŒ–'} +

+ +
+
+ +
+
+
+
+ ); +} + diff --git a/app/learn/tensors/concatenating-tensors/page.tsx b/app/learn/tensors/concatenating-tensors/page.tsx new file mode 100644 index 0000000..c3c78a4 --- /dev/null +++ b/app/learn/tensors/concatenating-tensors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ConcatenatingTensorsPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/creating-special-tensors/page.tsx b/app/learn/tensors/creating-special-tensors/page.tsx new file mode 100644 index 0000000..662ba9c --- /dev/null +++ b/app/learn/tensors/creating-special-tensors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function CreatingSpecialTensorsPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/creating-tensors/page.tsx b/app/learn/tensors/creating-tensors/page.tsx new file mode 100644 index 0000000..e4d6dd1 --- /dev/null +++ b/app/learn/tensors/creating-tensors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function CreatingTensorsPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/indexing-and-slicing/page.tsx b/app/learn/tensors/indexing-and-slicing/page.tsx new file mode 100644 index 0000000..52b61e6 --- /dev/null +++ b/app/learn/tensors/indexing-and-slicing/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function IndexingAndSlicingPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/matrix-multiplication/page.tsx b/app/learn/tensors/matrix-multiplication/page.tsx new file mode 100644 index 0000000..0c502ff --- /dev/null +++ b/app/learn/tensors/matrix-multiplication/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MatrixMultiplicationPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/reshaping-tensors/page.tsx b/app/learn/tensors/reshaping-tensors/page.tsx new file mode 100644 index 0000000..5a18358 --- /dev/null +++ b/app/learn/tensors/reshaping-tensors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function ReshapingTensorsPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/tensor-addition/page.tsx b/app/learn/tensors/tensor-addition/page.tsx new file mode 100644 index 0000000..c8c738f --- /dev/null +++ b/app/learn/tensors/tensor-addition/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TensorAdditionPage() { + return ( + + ); +} + diff --git a/app/learn/tensors/transposing-tensors/page.tsx b/app/learn/tensors/transposing-tensors/page.tsx new file mode 100644 index 0000000..959ce8d --- /dev/null +++ b/app/learn/tensors/transposing-tensors/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TransposingTensorsPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/combining-experts/page.tsx b/app/learn/transformer-feedforward/combining-experts/page.tsx new file mode 100644 index 0000000..34bc471 --- /dev/null +++ b/app/learn/transformer-feedforward/combining-experts/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function CombiningExpertsPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/moe-in-a-transformer/page.tsx b/app/learn/transformer-feedforward/moe-in-a-transformer/page.tsx new file mode 100644 index 
0000000..d4ead9d --- /dev/null +++ b/app/learn/transformer-feedforward/moe-in-a-transformer/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MoeInATransformerPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/moe-in-code/page.tsx b/app/learn/transformer-feedforward/moe-in-code/page.tsx new file mode 100644 index 0000000..876c91d --- /dev/null +++ b/app/learn/transformer-feedforward/moe-in-code/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function MoeInCodePage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/the-deepseek-mlp/page.tsx b/app/learn/transformer-feedforward/the-deepseek-mlp/page.tsx new file mode 100644 index 0000000..7a3ccee --- /dev/null +++ b/app/learn/transformer-feedforward/the-deepseek-mlp/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheDeepseekMlpPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/the-expert/page.tsx b/app/learn/transformer-feedforward/the-expert/page.tsx new file mode 100644 index 0000000..3046b82 --- /dev/null +++ b/app/learn/transformer-feedforward/the-expert/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheExpertPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/the-feedforward-layer/page.tsx b/app/learn/transformer-feedforward/the-feedforward-layer/page.tsx new file mode 100644 index 0000000..38bfa34 --- /dev/null +++ b/app/learn/transformer-feedforward/the-feedforward-layer/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheFeedforwardLayerPage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/the-gate/page.tsx b/app/learn/transformer-feedforward/the-gate/page.tsx new file mode 100644 index 0000000..c3ed35c --- /dev/null +++ b/app/learn/transformer-feedforward/the-gate/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function TheGatePage() { + return ( + + ); +} + diff --git a/app/learn/transformer-feedforward/what-is-mixture-of-experts/page.tsx b/app/learn/transformer-feedforward/what-is-mixture-of-experts/page.tsx new file mode 100644 index 0000000..9cb37a7 --- /dev/null +++ b/app/learn/transformer-feedforward/what-is-mixture-of-experts/page.tsx @@ -0,0 +1,12 @@ +import { LessonPage } from "@/components/lesson-page"; + +export default function WhatIsMixtureOfExpertsPage() { + return ( + + ); +} + diff --git a/app/page.tsx b/app/page.tsx index 7353cbd..d72bf40 100644 --- a/app/page.tsx +++ b/app/page.tsx @@ -73,14 +73,12 @@ export default function Home() { {language === 'en' ? ( <> Open - Superintelligence - Lab + Superintelligence ) : ( <> ๅผ€ๆ”พ - ่ถ…็บงๆ™บ่ƒฝ - ๅฎž้ชŒๅฎค + ่ถ…็บงๆ™บ่ƒฝ )} @@ -90,18 +88,29 @@ export default function Home() { {language === 'en' ? ( <> Open - Superintelligence - Lab + Superintelligence ) : ( <> ๅผ€ๆ”พ - ่ถ…็บงๆ™บ่ƒฝ - ๅฎž้ชŒๅฎค + ่ถ…็บงๆ™บ่ƒฝ )} + + {/* Subtitle */} +
+

+ The Most Difficult Project In Human History +

+ {/* Glow effect for subtitle */} +
+ + The Most Difficult Project In Human History + +
+
{/* Enhanced decorative elements */} @@ -174,16 +183,19 @@ export default function Home() {
{/* Road to AI Researcher Project */} -
+
Learning Path
- Coming Soon + New
-

+

Zero To AI Researcher - Full Course

@@ -191,12 +203,12 @@ export default function Home() {

Open Superintelligence Lab - - Coming Soon โ†’ + + Start Learning โ†’
-
+ {/* DeepSeek Sparse Attention Project */} (null); + + const modules = getCourseModules(); + + // Auto-scroll to active lesson on mount and pathname change + useEffect(() => { + // Only scroll if we're on a lesson page (pathname starts with /learn/) + if (!pathname?.startsWith('/learn/')) { + return; + } + + // Use a small delay to ensure the DOM is fully rendered + const timer = setTimeout(() => { + if (activeLinkRef.current) { + try { + activeLinkRef.current.scrollIntoView({ + behavior: 'smooth', + block: 'center', + inline: 'nearest' + }); + console.log('Scrolled to active lesson:', pathname); + } catch (error) { + console.error('Error scrolling to active lesson:', error); + } + } else { + console.log('Active link ref not found for:', pathname); + } + }, 100); + + return () => clearTimeout(timer); + }, [pathname]); + + const NavigationContent = () => ( + <> +
+

+ {language === 'en' ? 'Course Contents' : '课程目录'} +

+

+ {language === 'en' ? 'Navigate through the lessons' : '浏览课程内容'} +

+
+ + + +
+ + + + {language === 'en' ? 'Course Home' : '课程首页'} + 
+ + ); + + return ( + <> + {/* Mobile Toggle Button */} + + + {/* Mobile Overlay */} + {isOpen && ( +
setIsOpen(false)} + /> + )} + + {/* Mobile Sidebar */} + + + {/* Desktop Sidebar */} + + + ); +} + diff --git a/components/lesson-page.tsx b/components/lesson-page.tsx new file mode 100644 index 0000000..5e195d9 --- /dev/null +++ b/components/lesson-page.tsx @@ -0,0 +1,237 @@ +'use client'; + +import Link from "next/link"; +import { usePathname } from "next/navigation"; +import { useLanguage } from "@/components/providers/language-provider"; +import { MarkdownRenderer } from "@/components/markdown-renderer"; +import { CourseNavigation } from "@/components/course-navigation"; +import { useEffect, useState } from "react"; +import { getAdjacentLessons } from "@/lib/course-structure"; + +interface HeroData { + title: string; + subtitle: string; + tags: string[]; +} + +interface LessonPageProps { + contentPath: string; + prevLink?: { href: string; label: string }; + nextLink?: { href: string; label: string }; +} + +export function LessonPage({ contentPath, prevLink, nextLink }: LessonPageProps) { + const { language } = useLanguage(); + const pathname = usePathname(); + const [markdownContent, setMarkdownContent] = useState(''); + const [heroData, setHeroData] = useState(null); + const [isLoading, setIsLoading] = useState(true); + + // Auto-determine next/prev links from course structure if not provided + const adjacentLessons = getAdjacentLessons(pathname); + const effectivePrevLink = prevLink || (adjacentLessons.prev ? { + href: adjacentLessons.prev.href, + label: `โ† ${language === 'en' ? 'Previous' : 'ไธŠไธ€่ฏพ'}: ${language === 'en' ? adjacentLessons.prev.title : adjacentLessons.prev.titleZh}` + } : undefined); + + const effectiveNextLink = nextLink || (adjacentLessons.next ? { + href: adjacentLessons.next.href, + label: `${language === 'en' ? 'Next' : 'ไธ‹ไธ€่ฏพ'}: ${language === 'en' ? 
adjacentLessons.next.title : adjacentLessons.next.titleZh} โ†’` + } : undefined); + + useEffect(() => { + const fetchMarkdownContent = async () => { + try { + const response = await fetch(`/content/learn/${contentPath}/${contentPath.split('/').pop()}-content.md`); + const content = await response.text(); + + // Parse frontmatter + const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/); + if (frontmatterMatch) { + const frontmatterContent = frontmatterMatch[1]; + const markdownBody = frontmatterMatch[2]; + + // Default hero data + const heroData: HeroData = { + title: "", + subtitle: "", + tags: [] + }; + + // Extract values from frontmatter + const lines = frontmatterContent.split('\n'); + let currentKey = ''; + let currentArray: string[] = []; + + for (const line of lines) { + const trimmedLine = line.trim(); + if (trimmedLine.startsWith('hero:')) continue; + + if (trimmedLine.includes(':')) { + const [key, ...valueParts] = trimmedLine.split(':'); + const value = valueParts.join(':').trim().replace(/^["']|["']$/g, ''); + + switch (key.trim()) { + case 'title': + heroData.title = value; + break; + case 'subtitle': + heroData.subtitle = value; + break; + case 'tags': + currentKey = 'tags'; + currentArray = []; + break; + } + } else if (trimmedLine.startsWith('- ')) { + if (currentKey === 'tags') { + const tagValue = trimmedLine.substring(2).replace(/^["']|["']$/g, ''); + currentArray.push(tagValue); + } + } else if (trimmedLine === '' && currentArray.length > 0) { + if (currentKey === 'tags') { + heroData.tags = currentArray; + currentArray = []; + currentKey = ''; + } + } + } + + // Handle final array + if (currentArray.length > 0 && currentKey === 'tags') { + heroData.tags = currentArray; + } + + setHeroData(heroData); + setMarkdownContent(markdownBody); + } else { + setMarkdownContent(content); + } + } catch (error) { + console.error('Failed to fetch markdown content:', error); + setMarkdownContent('# Error loading content\n\nFailed to load the lesson content.'); + } finally { + setIsLoading(false); + } + }; + + fetchMarkdownContent(); + }, [contentPath]); + + if (isLoading) { + return ( +
+
+
+

Loading lesson...

+
+
+ ); + } + + return ( + <> + {/* Course Navigation Sidebar */} + + + {/* Main Content with Sidebar Offset */} +
+ {/* Hero Section */} +
+
+ +
+
+ {/* Back to Course */} + + + + + {language === 'en' ? 'Back to Course' : '่ฟ”ๅ›ž่ฏพ็จ‹'} + + +
+

+ + {heroData?.title || 'Lesson'} + +

+

+ {heroData?.subtitle || ''} +

+ + {/* Tags */} + {heroData?.tags && heroData.tags.length > 0 && ( +
+ {heroData.tags.map((tag, index) => ( + + {index > 0 && โ€ข} + {tag} + + ))} +
+ )} +
+
+
+
+ + {/* Main Content */} +
+
+
+
+
+ +
+
+ + {/* Navigation */} +
+ {effectivePrevLink ? ( + + + + + {effectivePrevLink.label} + + ) : ( +
+ )} + + {effectiveNextLink ? ( + + {effectiveNextLink.label} + + + + + ) : ( + + {language === 'en' ? 'Course Complete! ๐ŸŽ‰' : '่ฏพ็จ‹ๅฎŒๆˆ๏ผ๐ŸŽ‰'} + + + + + )} +
+
+
+
+
+ + ); +} + diff --git a/components/navigation.tsx b/components/navigation.tsx index 9e3312c..3962191 100644 --- a/components/navigation.tsx +++ b/components/navigation.tsx @@ -30,6 +30,12 @@ export function Navigation({ }: NavigationProps) {
+ + {language === 'en' ? 'Learn' : 'ๅญฆไน '} + ', lw=3, color='white')) + + # Bottom explanation + ax.text(7, 2.5, 'Weight gets closer to optimal value (1.0)', + fontsize=26, color='#3B82F6', ha='center', fontweight='bold') + ax.text(7, 1.8, 'Loss decreases from 2.5 โ†’ 0.05', + fontsize=26, color='#10B981', ha='center', fontweight='bold') + ax.text(7, 1, 'Learning = Automatic weight adjustment!', + fontsize=24, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'neuron-from-scratch/the-concept-of-learning/learning-process.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_prediction_flow(): + """Making a prediction flow diagram""" + fig, ax = plt.subplots(figsize=(14, 7)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 7) + ax.axis('off') + + # Title + ax.text(7, 6.5, 'Forward Pass: Making a Prediction', + fontsize=36, fontweight='bold', color='white', ha='center') + + steps = ['Input', 'Linear\n(wยทx+b)', 'Activation\n(ReLU)', 'Output'] + values = ['[1, 2]', '0.9', '0.9', '0.9'] + colors = ['#3B82F6', '#F59E0B', '#10B981', '#8B5CF6'] + + for i, (step, val, color) in enumerate(zip(steps, values, colors)): + x = 1 + i*3.5 + + # Box + box = patches.FancyBboxPatch((x, 3.5), 2, 1.5, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor=color, linewidth=3) + ax.add_patch(box) + + # Step name + ax.text(x+1, 5.3, step, fontsize=22, fontweight='bold', color='white', ha='center', va='top') + + # Value + ax.text(x+1, 4, val, fontsize=28, fontweight='bold', color='white', ha='center', va='center') + + # Arrow + if i < len(steps) - 1: + ax.annotate('', xy=(x+2.5, 4.25), xytext=(x+2.2, 4.25), + arrowprops=dict(arrowstyle='->', lw=4, color='white')) + + # Bottom note + ax.text(7, 2, 'Data flows forward through the network', + fontsize=26, color='#94A3B8', ha='center', style='italic') + ax.text(7, 1.3, 'Input โ†’ Transform โ†’ Activate โ†’ Prediction', + fontsize=24, color='#94A3B8', ha='center') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'neuron-from-scratch/making-a-prediction/prediction-flow.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_neuron_code_visual(): + """Building a neuron code visualization""" + fig, ax = plt.subplots(figsize=(14, 8)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 8) + ax.axis('off') + + # Title + ax.text(7, 7.5, 'Neuron Components in Code', + fontsize=36, fontweight='bold', color='white', ha='center') + + components = [ + ('nn.Linear()', 'Weights & Bias', '#3B82F6'), + ('nn.ReLU()', 'Activation', '#F59E0B'), + ('forward()', 'Computation', '#10B981'), + ('backward()', 'Learning', '#8B5CF6'), + ] + + y_start = 6 + for i, (code, desc, color) in enumerate(components): + y = y_start - i*1.4 + + # Code box + box = patches.FancyBboxPatch((2, y), 4, 0.9, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor=color, linewidth=2) + ax.add_patch(box) + ax.text(4, y+0.45, code, fontsize=26, fontweight='bold', color='white', ha='center', va='center', + family='monospace') + + # Description + ax.text(7, y+0.45, 'โ†’', fontsize=32, color='white', ha='center') + ax.text(9.5, y+0.45, desc, fontsize=24, color='white', ha='left') + + # Bottom note + ax.text(7, 0.8, 'PyTorch handles all the complexity!', + fontsize=26, color='#94A3B8', ha='center', fontweight='bold') + + fig.patch.set_facecolor('#1E293B') + 
ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'neuron-from-scratch/building-a-neuron-in-python/neuron-code.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +# ============================================================================ +# NEURAL NETWORKS IMAGES +# ============================================================================ + +def create_network_layers(): + """Network architecture layers visualization""" + fig, ax = plt.subplots(figsize=(14, 9)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 9) + ax.axis('off') + + # Title + ax.text(7, 8.5, 'Neural Network Architecture', + fontsize=38, fontweight='bold', color='white', ha='center') + + layers = [ + ('Input\nLayer', 784, '#3B82F6', 1.5), + ('Hidden\nLayer 1', 128, '#10B981', 4.5), + ('Hidden\nLayer 2', 64, '#F59E0B', 7.5), + ('Output\nLayer', 10, '#8B5CF6', 10.5), + ] + + for name, size, color, x in layers: + # Draw neurons + num_display = min(size, 8) + y_start = 6 - (num_display * 0.4) + + for i in range(num_display): + y = y_start + i*0.8 + circle = plt.Circle((x, y), 0.25, color=color, ec='white', linewidth=2) + ax.add_patch(circle) + + if i == num_display - 1 and size > num_display: + ax.text(x, y-0.6, '...', fontsize=24, color=color, ha='center') + + # Label + ax.text(x, 7.5, name, fontsize=22, color='white', ha='center', fontweight='bold') + ax.text(x, 2, f'{size}', fontsize=20, color='#94A3B8', ha='center') + + # Connections + if x < 10: + for i in range(min(3, num_display)): + for j in range(min(3, num_display)): + y1 = y_start + i*0.8 + y2 = y_start + j*0.8 + ax.plot([x+0.25, x+3-0.25], [y1, y2], 'white', alpha=0.2, linewidth=1) + + # Bottom note + ax.text(7, 1, 'Each layer transforms data: 784 โ†’ 128 โ†’ 64 โ†’ 10', + fontsize=24, color='#94A3B8', ha='center', fontweight='bold') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'neural-networks/architecture-of-a-network/network-layers.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_layer_structure(): + """Single layer structure""" + fig, ax = plt.subplots(figsize=(14, 7)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 7) + ax.axis('off') + + # Title + ax.text(7, 6.5, 'Layer = Multiple Neurons in Parallel', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Input + ax.text(2, 5.5, 'Input (3)', fontsize=24, color='white', ha='center') + for i in range(3): + circle = plt.Circle((2, 4.5-i*0.8), 0.3, color='#3B82F6', ec='white', linewidth=2) + ax.add_patch(circle) + + # Neurons in layer + ax.text(7, 5.5, 'Layer (4 neurons)', fontsize=24, color='white', ha='center') + for i in range(4): + circle = plt.Circle((7, 5-i), 0.35, color='#10B981', ec='white', linewidth=3) + ax.add_patch(circle) + + # Connections from all inputs + for j in range(3): + ax.plot([2.3, 6.65], [4.5-j*0.8, 5-i], 'white', alpha=0.3, linewidth=1.5) + + # Output + ax.text(12, 5.5, 'Output (4)', fontsize=24, color='white', ha='center') + for i in range(4): + circle = plt.Circle((12, 5-i), 0.3, color='#8B5CF6', ec='white', linewidth=2) + ax.add_patch(circle) + ax.plot([7.35, 11.7], [5-i, 5-i], 'white', alpha=0.4, linewidth=2) + + # Note + ax.text(7, 1.5, 'Each neuron receives ALL inputs', + fontsize=26, color='#94A3B8', ha='center', fontweight='bold') + ax.text(7, 0.8, 'nn.Linear(3, 4) creates this layer', + fontsize=24, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + 
ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'neural-networks/building-a-layer/layer-structure.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +# Create all neuron and network images +print("Creating neuron-from-scratch images...") +create_linear_step_visual() +create_activation_comparison() +create_loss_visual() +create_learning_process() +create_prediction_flow() +create_neuron_code_visual() + +print("Creating neural-networks images...") +create_network_layers() +create_layer_structure() + +print("Part 1 complete! Run generate_all_missing_images_part2.py for attention/transformer images...") + diff --git a/generate_all_missing_images_part2.py b/generate_all_missing_images_part2.py new file mode 100644 index 0000000..d7d5845 --- /dev/null +++ b/generate_all_missing_images_part2.py @@ -0,0 +1,543 @@ +import matplotlib.pyplot as plt +import matplotlib.patches as patches +import numpy as np + +# Set style +plt.rcParams['font.family'] = 'sans-serif' +plt.rcParams['font.sans-serif'] = ['Arial', 'Helvetica'] + +BASE_PATH = '/Users/vukrosic/AI Science Projects/open-superintelligence-lab-github-io/public/content/learn/' + +# ============================================================================ +# ATTENTION MECHANISM IMAGES +# ============================================================================ + +def create_attention_concept(): + """What is attention: concept visualization""" + fig, ax = plt.subplots(figsize=(14, 9)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 9) + ax.axis('off') + + # Title + ax.text(7, 8.5, 'Attention: Focus on Relevant Parts', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Sentence + sentence = "The cat sat on the mat" + words = sentence.split() + + # Word boxes with attention highlights + ax.text(7, 7.5, 'Query: "What did the cat do?"', fontsize=26, color='#F59E0B', ha='center', fontweight='bold') + + # Attention weights + attention = [0.1, 0.6, 0.2, 0.05, 0.02, 0.03] # "cat" and "sat" most important + + x_start = 2 + y_pos = 5.5 + + for i, (word, attn) in enumerate(zip(words, attention)): + alpha = 0.3 + attn * 0.7 # Scale alpha by attention + size = 1 + attn * 1.5 + color_intensity = int(255 * attn) + + # Box with size based on attention + box = patches.FancyBboxPatch((x_start + i*1.8, y_pos), 1.5, 0.8+attn, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#10B981' if attn > 0.3 else '#3B82F6', + linewidth=2+attn*4) + ax.add_patch(box) + ax.text(x_start + i*1.8 + 0.75, y_pos + 0.4 + attn/2, word, + fontsize=18+attn*20, fontweight='bold', color='white', ha='center', va='center') + + # Attention weight below + ax.text(x_start + i*1.8 + 0.75, y_pos - 0.4, f'{attn:.0%}', + fontsize=18, color='#94A3B8', ha='center') + + # Explanation + ax.text(7, 3.5, '"cat" (60%) and "sat" (20%) are most relevant', + fontsize=28, color='#10B981', ha='center', fontweight='bold') + ax.text(7, 2.8, 'Other words get less attention', + fontsize=24, color='#94A3B8', ha='center') + + ax.text(7, 1.5, 'Attention weights sum to 100%', + fontsize=24, color='#94A3B8', ha='center', style='italic') + ax.text(7, 0.8, 'Model learns which words to focus on!', + fontsize=22, color='#94A3B8', ha='center') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'attention-mechanism/what-is-attention/attention-concept.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_qkv_visual(): + 
"""Query, Key, Value visualization""" + fig, ax = plt.subplots(figsize=(14, 9)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 9) + ax.axis('off') + + # Title + ax.text(7, 8.5, 'Query, Key, Value Mechanism', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Input + input_box = patches.FancyBboxPatch((6, 7.5), 2, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#94A3B8', linewidth=2) + ax.add_patch(input_box) + ax.text(7, 7.8, 'Input', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + # Split to Q, K, V + components = [ + ('Query', 'What am I\nlooking for?', '#10B981', 2), + ('Key', 'What do I\ncontain?', '#F59E0B', 7), + ('Value', 'What info\ndo I have?', '#8B5CF6', 12), + ] + + for name, desc, color, x in components: + # Arrow from input + ax.annotate('', xy=(x+0.5, 6.2), xytext=(7, 7.3), + arrowprops=dict(arrowstyle='->', lw=3, color=color)) + + # Component box + box = patches.FancyBboxPatch((x, 4.8), 2, 1.2, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor=color, linewidth=3) + ax.add_patch(box) + ax.text(x+1, 5.8, name, fontsize=26, fontweight='bold', color='white', ha='center', va='center') + ax.text(x+1, 5.2, desc, fontsize=18, color='white', ha='center', va='center') + + # Attention computation + ax.text(7, 3.5, '1. Q ร— K โ†’ Scores', fontsize=24, color='#94A3B8', ha='center') + ax.text(7, 3, '2. Softmax โ†’ Weights', fontsize=24, color='#94A3B8', ha='center') + ax.text(7, 2.5, '3. Weights ร— V โ†’ Output', fontsize=24, color='#94A3B8', ha='center') + + # Output + output_box = patches.FancyBboxPatch((5.5, 1), 3, 0.8, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor='#3B82F6', linewidth=3) + ax.add_patch(output_box) + ax.text(7, 1.4, 'Attention Output', fontsize=26, fontweight='bold', color='white', ha='center', va='center') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'attention-mechanism/what-is-attention/qkv-mechanism.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_attention_scores_matrix(): + """Attention scores matrix visualization""" + fig, ax = plt.subplots(figsize=(12, 10)) + ax.set_xlim(0, 12) + ax.set_ylim(0, 10) + ax.axis('off') + + # Title + ax.text(6, 9.5, 'Attention Score Matrix', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Create attention matrix visualization + size = 5 + scores = np.random.rand(size, size) + scores = scores / scores.sum(axis=1, keepdims=True) # Normalize rows + + box_size = 1 + x_start = 2.5 + y_start = 7.5 + + # Row labels (Query positions) + for i in range(size): + ax.text(x_start - 0.7, y_start - i*1.1 + 0.5, f'Q{i}', + fontsize=20, color='#10B981', ha='center', fontweight='bold') + + # Column labels (Key positions) + for j in range(size): + ax.text(x_start + j*1.1 + 0.5, y_start + 0.7, f'K{j}', + fontsize=20, color='#F59E0B', ha='center', fontweight='bold') + + # Draw matrix + for i in range(size): + for j in range(size): + val = scores[i, j] + color_intensity = val + color = plt.cm.viridis(color_intensity) + + rect = patches.FancyBboxPatch((x_start + j*1.1, y_start - i*1.1), box_size, box_size, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor=color, linewidth=1) + ax.add_patch(rect) + ax.text(x_start + j*1.1 + 0.5, y_start - i*1.1 + 0.5, f'{val:.2f}', + fontsize=16, fontweight='bold', color='white', ha='center', va='center') + + # Note + ax.text(6, 1.5, 'Each row shows where one position attends', + fontsize=24, 
color='#94A3B8', ha='center', fontweight='bold') + ax.text(6, 0.9, 'Darker = Higher attention', + fontsize=22, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'attention-mechanism/calculating-attention-scores/attention-matrix.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_multi_head_visualization(): + """Multi-head attention visualization""" + fig, ax = plt.subplots(figsize=(14, 8)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 8) + ax.axis('off') + + # Title + ax.text(7, 7.5, 'Multi-Head Attention: 8 Heads in Parallel', + fontsize=34, fontweight='bold', color='white', ha='center') + + # Input + input_box = patches.FancyBboxPatch((6, 6.5), 2, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#3B82F6', linewidth=3) + ax.add_patch(input_box) + ax.text(7, 6.8, 'Input', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + # 8 heads + num_heads = 8 + colors = plt.cm.tab10(np.linspace(0, 1, num_heads)) + + y_start = 5 + for i in range(num_heads): + x = 1.5 + i*1.5 + + # Arrow from input + ax.plot([7, x+0.4], [6.4, y_start+0.6], 'white', alpha=0.3, linewidth=2) + + # Head box + box = patches.FancyBboxPatch((x, y_start - 0.3), 0.8, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor=colors[i], linewidth=2) + ax.add_patch(box) + ax.text(x+0.4, y_start, f'H{i+1}', fontsize=18, fontweight='bold', color='white', ha='center', va='center') + + # Arrow to concat + ax.plot([x+0.4, 7], [y_start-0.5, 3.2], 'white', alpha=0.3, linewidth=2) + + # Concatenate + concat_box = patches.FancyBboxPatch((5, 2.5), 4, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#F59E0B', linewidth=3) + ax.add_patch(concat_box) + ax.text(7, 2.8, 'Concatenate Heads', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + # Output projection + ax.annotate('', xy=(7, 1.5), xytext=(7, 2.3), + arrowprops=dict(arrowstyle='->', lw=4, color='white')) + + output_box = patches.FancyBboxPatch((6, 0.5), 2, 0.8, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor='#8B5CF6', linewidth=3) + ax.add_patch(output_box) + ax.text(7, 0.9, 'Output', fontsize=26, fontweight='bold', color='white', ha='center', va='center') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'attention-mechanism/multi-head-attention/multi-head-visual.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_self_attention_visual(): + """Self-attention concept""" + fig, ax = plt.subplots(figsize=(14, 8)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 8) + ax.axis('off') + + # Title + ax.text(7, 7.5, 'Self-Attention: Sequence Attends to Itself', + fontsize=34, fontweight='bold', color='white', ha='center') + + words = ['The', 'cat', 'sat'] + positions = [3, 7, 11] + + for i, (word, x) in enumerate(zip(words, positions)): + # Word box + box = patches.FancyBboxPatch((x-0.8, 5.5), 1.6, 0.8, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#3B82F6', linewidth=3) + ax.add_patch(box) + ax.text(x, 5.9, word, fontsize=28, fontweight='bold', color='white', ha='center', va='center') + + # Show attention connections + for j, (word2, x2) in enumerate(zip(words, positions)): + if i != j: + # Attention line + alpha = 0.5 if abs(i-j) == 1 else 0.2 + ax.plot([x, x2], [5.3, 5.3], 'c-', alpha=alpha, linewidth=2+alpha*4) + 
ax.plot([x, x2], [5.3, 5.3], 'co', markersize=8, alpha=alpha) + + # Explanation + ax.text(7, 3.8, 'Each word attends to ALL words (including itself)', + fontsize=26, color='#94A3B8', ha='center', fontweight='bold') + ax.text(7, 3.2, '"cat" learns from "The" and "sat" for context', + fontsize=24, color='#10B981', ha='center') + ax.text(7, 2.6, 'Q, K, V all come from the same sequence!', + fontsize=22, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'attention-mechanism/self-attention-from-scratch/self-attention-concept.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +# ============================================================================ +# TRANSFORMER IMAGES +# ============================================================================ + +def create_transformer_architecture_diagram(): + """Full transformer architecture""" + fig, ax = plt.subplots(figsize=(12, 14)) + ax.set_xlim(0, 12) + ax.set_ylim(0, 14) + ax.axis('off') + + # Title + ax.text(6, 13.5, 'Transformer Architecture', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Input + box = patches.FancyBboxPatch((4, 12.5), 4, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#3B82F6', linewidth=2) + ax.add_patch(box) + ax.text(6, 12.85, 'Input Tokens', fontsize=22, fontweight='bold', color='white', ha='center', va='center') + + # Embeddings + y = 11.5 + box = patches.FancyBboxPatch((4, y), 4, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#6366F1', linewidth=2) + ax.add_patch(box) + ax.text(6, y+0.35, 'Embeddings + Positions', fontsize=20, fontweight='bold', color='white', ha='center', va='center') + ax.plot([6, 6], [y+0.8, y+1.4], 'white', linewidth=3) + + # Transformer blocks (N times) + for block_idx in range(3): + y_block = 10 - block_idx*3 + + # Block container + block_box = patches.FancyBboxPatch((3, y_block-2.5), 6, 2.3, + boxstyle="round,pad=0.1", + edgecolor='cyan', facecolor='#1E293B', linewidth=2, linestyle='--') + ax.add_patch(block_box) + ax.text(9.2, y_block - 1.3, f'Block {block_idx+1}', fontsize=18, color='cyan', ha='left') + + # Multi-head attention + box1 = patches.FancyBboxPatch((4, y_block-0.7), 4, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#10B981', linewidth=2) + ax.add_patch(box1) + ax.text(6, y_block-0.4, 'Multi-Head Attention', fontsize=18, fontweight='bold', color='white', ha='center', va='center') + + # FFN + box2 = patches.FancyBboxPatch((4, y_block-1.9), 4, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#F59E0B', linewidth=2) + ax.add_patch(box2) + ax.text(6, y_block-1.6, 'Feed-Forward', fontsize=18, fontweight='bold', color='white', ha='center', va='center') + + # Arrows + ax.plot([6, 6], [y_block-0.1, y_block-1.3], 'white', linewidth=2) + + if block_idx < 2: + ax.plot([6, 6], [y_block-2.6, y_block-3.2], 'white', linewidth=2) + + # Output head + y_out = 1 + box = patches.FancyBboxPatch((4, y_out), 4, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#8B5CF6', linewidth=2) + ax.add_patch(box) + ax.text(6, y_out+0.35, 'Output Projection', fontsize=20, fontweight='bold', color='white', ha='center', va='center') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'building-a-transformer/transformer-architecture/transformer-diagram.png', + dpi=150, facecolor='#1E293B', 
bbox_inches='tight', pad_inches=0.3) + plt.close() + +def create_transformer_block_diagram(): + """Transformer block internal structure""" + fig, ax = plt.subplots(figsize=(14, 10)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 10) + ax.axis('off') + + # Title + ax.text(7, 9.5, 'Transformer Block Components', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Input + y = 8.5 + box = patches.FancyBboxPatch((5.5, y), 3, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#3B82F6', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.35, 'Input', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + # Attention sub-block + y = 7 + ax.text(3, y+1, '1. Attention Sub-block', fontsize=22, color='#10B981', ha='left', fontweight='bold') + + box = patches.FancyBboxPatch((4, y), 6, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#10B981', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.3, 'Multi-Head Attention', fontsize=20, fontweight='bold', color='white', ha='center', va='center') + + # Add & Norm + y = 6 + box = patches.FancyBboxPatch((4.5, y), 5, 0.5, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#6366F1', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.25, 'Add & Norm (Residual)', fontsize=18, color='white', ha='center', va='center') + + # FFN sub-block + y = 4.8 + ax.text(3, y+1, '2. FFN Sub-block', fontsize=22, color='#F59E0B', ha='left', fontweight='bold') + + box = patches.FancyBboxPatch((4, y), 6, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#F59E0B', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.3, 'Feed-Forward Network', fontsize=20, fontweight='bold', color='white', ha='center', va='center') + + # Add & Norm + y = 3.8 + box = patches.FancyBboxPatch((4.5, y), 5, 0.5, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#6366F1', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.25, 'Add & Norm (Residual)', fontsize=18, color='white', ha='center', va='center') + + # Output + y = 2.5 + box = patches.FancyBboxPatch((5.5, y), 3, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#8B5CF6', linewidth=2) + ax.add_patch(box) + ax.text(7, y+0.35, 'Output', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + # Note + ax.text(7, 1.2, 'Attention โ†’ Add&Norm โ†’ FFN โ†’ Add&Norm', + fontsize=24, color='#94A3B8', ha='center', fontweight='bold') + ax.text(7, 0.5, 'Residual connections help gradients flow!', + fontsize=22, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'building-a-transformer/building-a-transformer-block/block-diagram.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +# ============================================================================ +# MOE IMAGES +# ============================================================================ + +def create_moe_routing(): + """MoE routing visualization""" + fig, ax = plt.subplots(figsize=(14, 10)) + ax.set_xlim(0, 14) + ax.set_ylim(0, 10) + ax.axis('off') + + # Title + ax.text(7, 9.5, 'Mixture of Experts: Sparse Routing', + fontsize=36, fontweight='bold', color='white', ha='center') + + # Input token + token_box = patches.FancyBboxPatch((6, 8.5), 2, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#3B82F6', linewidth=3) + ax.add_patch(token_box) + ax.text(7, 8.85, 'Token', fontsize=24, fontweight='bold', color='white', 
ha='center', va='center') + + # Router + ax.annotate('', xy=(7, 7.5), xytext=(7, 8.3), + arrowprops=dict(arrowstyle='->', lw=3, color='white')) + + router_box = patches.FancyBboxPatch((5.5, 6.8), 3, 0.6, + boxstyle="round,pad=0.05", + edgecolor='white', facecolor='#F59E0B', linewidth=2) + ax.add_patch(router_box) + ax.text(7, 7.1, 'Router', fontsize=22, fontweight='bold', color='white', ha='center', va='center') + + # 8 Experts + num_experts = 8 + expert_colors = ['#10B981', '#EF4444', '#94A3B8', '#94A3B8', '#94A3B8', '#10B981', '#94A3B8', '#94A3B8'] + active = [True, False, False, False, False, True, False, False] + + y_experts = 5 + for i in range(num_experts): + x = 1.5 + i*1.5 + + # Expert box + box = patches.FancyBboxPatch((x, y_experts), 0.9, 0.7, + boxstyle="round,pad=0.05", + edgecolor='white' if active[i] else '#4B5563', + facecolor=expert_colors[i], + linewidth=3 if active[i] else 1, + alpha=1.0 if active[i] else 0.3) + ax.add_patch(box) + ax.text(x+0.45, y_experts+0.35, f'E{i}', fontsize=18, fontweight='bold', color='white', ha='center', va='center') + + # Connection from router + alpha = 1.0 if active[i] else 0.15 + linewidth = 3 if active[i] else 1 + ax.plot([7, x+0.45], [6.7, y_experts+0.8], color=expert_colors[i] if active[i] else '#4B5563', + alpha=alpha, linewidth=linewidth) + + # Output + ax.text(7, 3.5, 'Top-2 Experts Selected: E0 (60%) + E5 (40%)', + fontsize=26, color='#10B981', ha='center', fontweight='bold') + + output_box = patches.FancyBboxPatch((5, 2.3), 4, 0.8, + boxstyle="round,pad=0.1", + edgecolor='white', facecolor='#8B5CF6', linewidth=3) + ax.add_patch(output_box) + ax.text(7, 2.7, 'Combined Output', fontsize=24, fontweight='bold', color='white', ha='center', va='center') + + ax.text(7, 1, 'Only 2 of 8 experts activated (sparse!)', + fontsize=24, color='#94A3B8', ha='center', style='italic') + + fig.patch.set_facecolor('#1E293B') + ax.set_facecolor('#1E293B') + plt.tight_layout() + plt.savefig(BASE_PATH + 'transformer-feedforward/what-is-mixture-of-experts/moe-routing.png', + dpi=150, facecolor='#1E293B', bbox_inches='tight', pad_inches=0.3) + plt.close() + +# Create all images +print("Creating attention mechanism images...") +create_attention_concept() +create_qkv_visual() +create_attention_scores_matrix() +create_multi_head_visualization() +create_self_attention_visual() + +print("Creating transformer images...") +create_transformer_architecture_diagram() +create_transformer_block_diagram() + +print("Creating MoE images...") +create_moe_routing() + +print("\nโœ… All missing images created successfully!") + diff --git a/lib/course-structure.tsx b/lib/course-structure.tsx new file mode 100644 index 0000000..3ad030d --- /dev/null +++ b/lib/course-structure.tsx @@ -0,0 +1,391 @@ +export interface LessonItem { + title: string; + titleZh: string; + href: string; +} + +export interface ModuleData { + title: string; + titleZh: string; + icon: React.ReactNode; + lessons: LessonItem[]; +} + +export const getCourseModules = (): ModuleData[] => [ + { + title: "Mathematics Fundamentals", + titleZh: "ๆ•ฐๅญฆๅŸบ็ก€", + icon: ( + + + + ), + lessons: [ + { + title: "Functions", + titleZh: "ๅ‡ฝๆ•ฐ", + href: "/learn/math/functions" + }, + { + title: "Derivatives", + titleZh: "ๅฏผๆ•ฐ", + href: "/learn/math/derivatives" + }, + { + title: "Vectors", + titleZh: "ๅ‘้‡", + href: "/learn/math/vectors" + }, + { + title: "Matrices", + titleZh: "็Ÿฉ้˜ต", + href: "/learn/math/matrices" + }, + { + title: "Gradients", + titleZh: "ๆขฏๅบฆ", + href: "/learn/math/gradients" + } + ] + }, + { 
+ title: "PyTorch Fundamentals", + titleZh: "PyTorchๅŸบ็ก€", + icon: ( + + + + ), + lessons: [ + { + title: "Creating Tensors", + titleZh: "ๅˆ›ๅปบๅผ ้‡", + href: "/learn/tensors/creating-tensors" + }, + { + title: "Tensor Addition", + titleZh: "ๅผ ้‡ๅŠ ๆณ•", + href: "/learn/tensors/tensor-addition" + }, + { + title: "Matrix Multiplication", + titleZh: "็Ÿฉ้˜ตไน˜ๆณ•", + href: "/learn/tensors/matrix-multiplication" + }, + { + title: "Transposing Tensors", + titleZh: "ๅผ ้‡่ฝฌ็ฝฎ", + href: "/learn/tensors/transposing-tensors" + }, + { + title: "Reshaping Tensors", + titleZh: "ๅผ ้‡้‡ๅก‘", + href: "/learn/tensors/reshaping-tensors" + }, + { + title: "Indexing and Slicing", + titleZh: "็ดขๅผ•ๅ’Œๅˆ‡็‰‡", + href: "/learn/tensors/indexing-and-slicing" + }, + { + title: "Concatenating Tensors", + titleZh: "ๅผ ้‡ๆ‹ผๆŽฅ", + href: "/learn/tensors/concatenating-tensors" + }, + { + title: "Creating Special Tensors", + titleZh: "ๅˆ›ๅปบ็‰นๆฎŠๅผ ้‡", + href: "/learn/tensors/creating-special-tensors" + } + ] + }, + { + title: "Neuron From Scratch", + titleZh: "ไปŽ้›ถๅผ€ๅง‹ๆž„ๅปบ็ฅž็ปๅ…ƒ", + icon: ( + + + + ), + lessons: [ + { + title: "What is a Neuron", + titleZh: "ไป€ไนˆๆ˜ฏ็ฅž็ปๅ…ƒ", + href: "/learn/neuron-from-scratch/what-is-a-neuron" + }, + { + title: "The Linear Step", + titleZh: "็บฟๆ€งๆญฅ้ชค", + href: "/learn/neuron-from-scratch/the-linear-step" + }, + { + title: "The Activation Function", + titleZh: "ๆฟ€ๆดปๅ‡ฝๆ•ฐ", + href: "/learn/neuron-from-scratch/the-activation-function" + }, + { + title: "Building a Neuron in Python", + titleZh: "็”จPythonๆž„ๅปบ็ฅž็ปๅ…ƒ", + href: "/learn/neuron-from-scratch/building-a-neuron-in-python" + }, + { + title: "Making a Prediction", + titleZh: "่ฟ›่กŒ้ข„ๆต‹", + href: "/learn/neuron-from-scratch/making-a-prediction" + }, + { + title: "The Concept of Loss", + titleZh: "ๆŸๅคฑๆฆ‚ๅฟต", + href: "/learn/neuron-from-scratch/the-concept-of-loss" + }, + { + title: "The Concept of Learning", + titleZh: "ๅญฆไน ๆฆ‚ๅฟต", + href: "/learn/neuron-from-scratch/the-concept-of-learning" + } + ] + }, + { + title: "Activation Functions", + titleZh: "ๆฟ€ๆดปๅ‡ฝๆ•ฐ", + icon: ( + + + + ), + lessons: [ + { + title: "ReLU", + titleZh: "ReLU", + href: "/learn/activation-functions/relu" + }, + { + title: "Sigmoid", + titleZh: "Sigmoid", + href: "/learn/activation-functions/sigmoid" + }, + { + title: "Tanh", + titleZh: "Tanh", + href: "/learn/activation-functions/tanh" + }, + { + title: "SiLU", + titleZh: "SiLU", + href: "/learn/activation-functions/silu" + }, + { + title: "SwiGLU", + titleZh: "SwiGLU", + href: "/learn/activation-functions/swiglu" + }, + { + title: "Softmax", + titleZh: "Softmax", + href: "/learn/activation-functions/softmax" + } + ] + }, + { + title: "Neural Networks from Scratch", + titleZh: "ไปŽ้›ถๅผ€ๅง‹็š„็ฅž็ป็ฝ‘็ปœ", + icon: ( + + + + ), + lessons: [ + { + title: "Architecture of a Network", + titleZh: "็ฝ‘็ปœๆžถๆž„", + href: "/learn/neural-networks/architecture-of-a-network" + }, + { + title: "Building a Layer", + titleZh: "ๆž„ๅปบๅฑ‚", + href: "/learn/neural-networks/building-a-layer" + }, + { + title: "Implementing a Network", + titleZh: "ๅฎž็Žฐ็ฝ‘็ปœ", + href: "/learn/neural-networks/implementing-a-network" + }, + { + title: "The Chain Rule", + titleZh: "้“พๅผๆณ•ๅˆ™", + href: "/learn/neural-networks/the-chain-rule" + }, + { + title: "Calculating Gradients", + titleZh: "่ฎก็ฎ—ๆขฏๅบฆ", + href: "/learn/neural-networks/calculating-gradients" + }, + { + title: "Backpropagation in Action", + titleZh: "ๅๅ‘ไผ ๆ’ญๅฎžๆˆ˜", + href: 
"/learn/neural-networks/backpropagation-in-action" + }, + { + title: "Implementing Backpropagation", + titleZh: "ๅฎž็Žฐๅๅ‘ไผ ๆ’ญ", + href: "/learn/neural-networks/implementing-backpropagation" + } + ] + }, + { + title: "Attention Mechanism", + titleZh: "ๆณจๆ„ๅŠ›ๆœบๅˆถ", + icon: ( + + + + + ), + lessons: [ + { + title: "What is Attention", + titleZh: "ไป€ไนˆๆ˜ฏๆณจๆ„ๅŠ›", + href: "/learn/attention-mechanism/what-is-attention" + }, + { + title: "Self Attention from Scratch", + titleZh: "ไปŽ้›ถๅผ€ๅง‹่‡ชๆณจๆ„ๅŠ›", + href: "/learn/attention-mechanism/self-attention-from-scratch" + }, + { + title: "Calculating Attention Scores", + titleZh: "่ฎก็ฎ—ๆณจๆ„ๅŠ›ๅˆ†ๆ•ฐ", + href: "/learn/attention-mechanism/calculating-attention-scores" + }, + { + title: "Applying Attention Weights", + titleZh: "ๅบ”็”จๆณจๆ„ๅŠ›ๆƒ้‡", + href: "/learn/attention-mechanism/applying-attention-weights" + }, + { + title: "Multi Head Attention", + titleZh: "ๅคšๅคดๆณจๆ„ๅŠ›", + href: "/learn/attention-mechanism/multi-head-attention" + }, + { + title: "Attention in Code", + titleZh: "ๆณจๆ„ๅŠ›ไปฃ็ ๅฎž็Žฐ", + href: "/learn/attention-mechanism/attention-in-code" + } + ] + }, + { + title: "Transformer Feedforward", + titleZh: "Transformerๅ‰้ฆˆ็ฝ‘็ปœ", + icon: ( + + + + ), + lessons: [ + { + title: "The Feedforward Layer", + titleZh: "ๅ‰้ฆˆๅฑ‚", + href: "/learn/transformer-feedforward/the-feedforward-layer" + }, + { + title: "What is Mixture of Experts", + titleZh: "ไป€ไนˆๆ˜ฏไธ“ๅฎถๆททๅˆ", + href: "/learn/transformer-feedforward/what-is-mixture-of-experts" + }, + { + title: "The Expert", + titleZh: "ไธ“ๅฎถ", + href: "/learn/transformer-feedforward/the-expert" + }, + { + title: "The Gate", + titleZh: "้—จๆŽง", + href: "/learn/transformer-feedforward/the-gate" + }, + { + title: "Combining Experts", + titleZh: "็ป„ๅˆไธ“ๅฎถ", + href: "/learn/transformer-feedforward/combining-experts" + }, + { + title: "MoE in a Transformer", + titleZh: "Transformerไธญ็š„MoE", + href: "/learn/transformer-feedforward/moe-in-a-transformer" + }, + { + title: "MoE in Code", + titleZh: "MoEไปฃ็ ๅฎž็Žฐ", + href: "/learn/transformer-feedforward/moe-in-code" + }, + { + title: "The DeepSeek MLP", + titleZh: "DeepSeek MLP", + href: "/learn/transformer-feedforward/the-deepseek-mlp" + } + ] + }, + { + title: "Building a Transformer", + titleZh: "ๆž„ๅปบTransformer", + icon: ( + + + + ), + lessons: [ + { + title: "Transformer Architecture", + titleZh: "Transformerๆžถๆž„", + href: "/learn/building-a-transformer/transformer-architecture" + }, + { + title: "RoPE Positional Encoding", + titleZh: "RoPEไฝ็ฝฎ็ผ–็ ", + href: "/learn/building-a-transformer/rope-positional-encoding" + }, + { + title: "Building a Transformer Block", + titleZh: "ๆž„ๅปบTransformerๅ—", + href: "/learn/building-a-transformer/building-a-transformer-block" + }, + { + title: "The Final Linear Layer", + titleZh: "ๆœ€็ปˆ็บฟๆ€งๅฑ‚", + href: "/learn/building-a-transformer/the-final-linear-layer" + }, + { + title: "Full Transformer in Code", + titleZh: "ๅฎŒๆ•ดTransformerไปฃ็ ", + href: "/learn/building-a-transformer/full-transformer-in-code" + }, + { + title: "Training a Transformer", + titleZh: "่ฎญ็ปƒTransformer", + href: "/learn/building-a-transformer/training-a-transformer" + } + ] + } +]; + +// Get all lessons as a flat array +export const getAllLessons = (): LessonItem[] => { + const modules = getCourseModules(); + return modules.flatMap(module => module.lessons); +}; + +// Get next and previous lessons for a given href +export const getAdjacentLessons = (currentHref: string) => { + 
const allLessons = getAllLessons(); + const currentIndex = allLessons.findIndex(lesson => lesson.href === currentHref); + + if (currentIndex === -1) { + return { prev: null, next: null }; + } + + const prev = currentIndex > 0 ? allLessons[currentIndex - 1] : null; + const next = currentIndex < allLessons.length - 1 ? allLessons[currentIndex + 1] : null; + + return { prev, next }; +}; + diff --git a/public/content/learn/README.md b/public/content/learn/README.md new file mode 100644 index 0000000..b44ccd9 --- /dev/null +++ b/public/content/learn/README.md @@ -0,0 +1,93 @@ +# Course Content Structure + +This directory contains markdown files and images for the AI/ML course lessons. + +## Directory Structure + +``` +learn/ +โ”œโ”€โ”€ math/ +โ”‚ โ”œโ”€โ”€ derivatives/ +โ”‚ โ”‚ โ”œโ”€โ”€ derivatives-content.md +โ”‚ โ”‚ โ”œโ”€โ”€ derivative-graph.png (placeholder - add your image here) +โ”‚ โ”‚ โ””โ”€โ”€ tangent-line.png (placeholder - add your image here) +โ”‚ โ””โ”€โ”€ functions/ +โ”‚ โ”œโ”€โ”€ functions-content.md +โ”‚ โ”œโ”€โ”€ linear-function.png (add your image here) +โ”‚ โ”œโ”€โ”€ relu-function.png (add your image here) +โ”‚ โ””โ”€โ”€ function-composition.png (add your image here) +โ””โ”€โ”€ neural-networks/ + โ”œโ”€โ”€ introduction/ + โ”‚ โ”œโ”€โ”€ introduction-content.md + โ”‚ โ”œโ”€โ”€ neural-network-diagram.png (add your image here) + โ”‚ โ”œโ”€โ”€ layer-types.png (add your image here) + โ”‚ โ”œโ”€โ”€ training-process.png (add your image here) + โ”‚ โ””โ”€โ”€ depth-vs-performance.png (add your image here) + โ”œโ”€โ”€ forward-propagation/ + โ”‚ โ”œโ”€โ”€ forward-propagation-content.md + โ”‚ โ”œโ”€โ”€ forward-prop-diagram.png (add your image here) + โ”‚ โ”œโ”€โ”€ forward-example.png (add your image here) + โ”‚ โ”œโ”€โ”€ activations-comparison.png (add your image here) + โ”‚ โ””โ”€โ”€ matrix-backprop.png (add your image here) + โ”œโ”€โ”€ backpropagation/ + โ”‚ โ”œโ”€โ”€ backpropagation-content.md + โ”‚ โ”œโ”€โ”€ backprop-overview.png (add your image here) + โ”‚ โ”œโ”€โ”€ backprop-steps.png (add your image here) + โ”‚ โ””โ”€โ”€ matrix-backprop.png (add your image here) + โ””โ”€โ”€ training/ + โ”œโ”€โ”€ training-content.md + โ”œโ”€โ”€ training-loop.png (add your image here) + โ”œโ”€โ”€ gradient-descent.png (add your image here) + โ”œโ”€โ”€ gd-variants.png (add your image here) + โ”œโ”€โ”€ optimizers-comparison.png (add your image here) + โ”œโ”€โ”€ lr-schedules.png (add your image here) + โ””โ”€โ”€ training-curves.png (add your image here) +``` + +## How to Add Images + +1. Place your PNG/JPG images in the corresponding lesson folder +2. Reference them in the markdown using: + ```markdown + ![Alt Text](image-name.png) + ``` +3. The images will be served from `/content/learn/[lesson-path]/[image-name]` + +## Markdown Frontmatter Format + +Each lesson markdown file should start with frontmatter: + +```markdown +--- +hero: + title: "Lesson Title" + subtitle: "Lesson Subtitle" + tags: + - "๐Ÿ“ Category" + - "โฑ๏ธ Reading Time" +--- + +# Your content here... +``` + +## Adding New Lessons + +1. Create a new folder under the appropriate category +2. Add a `{folder-name}-content.md` file +3. Add your images +4. 
Create a page component in `app/learn/[category]/[lesson-name]/page.tsx`: + +```tsx +import { LessonPage } from "@/components/lesson-page"; + +export default function YourLessonPage() { + return ( + + ); +} +``` + diff --git a/public/content/learn/activation-functions/relu/relu-content.md b/public/content/learn/activation-functions/relu/relu-content.md new file mode 100644 index 0000000..be18afd --- /dev/null +++ b/public/content/learn/activation-functions/relu/relu-content.md @@ -0,0 +1,339 @@ +--- +hero: + title: "ReLU" + subtitle: "Rectified Linear Unit - The Most Popular Activation Function" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +ReLU is the **most widely used** activation function in deep learning. It's simple, fast, and works incredibly well! + +## The Formula + +**ReLU(x) = max(0, x)** + +That's it! If the input is negative, output 0. If positive, output the input unchanged. + +![ReLU Graph](/content/learn/activation-functions/relu/relu-graph.png) + +```yaml +Input < 0 โ†’ Output = 0 +Input โ‰ฅ 0 โ†’ Output = Input + +Examples: +ReLU(-5) = 0 +ReLU(-1) = 0 +ReLU(0) = 0 +ReLU(3) = 3 +ReLU(10) = 10 +``` + +## How It Works + +**Example:** + +```python +import torch +import torch.nn as nn + +# Create ReLU activation +relu = nn.ReLU() + +# Test with different values +x = torch.tensor([-3.0, -1.0, 0.0, 2.0, 5.0]) +output = relu(x) + +print(output) +# tensor([0., 0., 0., 2., 5.]) +``` + +**Manual calculation:** + +```yaml +Input: [-3.0, -1.0, 0.0, 2.0, 5.0] + โ†“ โ†“ โ†“ โ†“ โ†“ +ReLU: max(0,-3) max(0,-1) max(0,0) max(0,2) max(0,5) + โ†“ โ†“ โ†“ โ†“ โ†“ +Output: [0.0, 0.0, 0.0, 2.0, 5.0] +``` + +![ReLU Example](/content/learn/activation-functions/relu/relu-example.png) + +**The rule:** Negative numbers get "zeroed out", positive numbers pass through unchanged. + +## In Code (Simple Implementation) + +You can implement ReLU yourself: + +```python +import torch + +def relu(x): + """Simple ReLU implementation""" + return torch.maximum(torch.tensor(0.0), x) + +# Test it +x = torch.tensor([-2.0, 3.0, -1.0, 4.0]) +output = relu(x) +print(output) +# tensor([0., 3., 0., 4.]) +``` + +Or even simpler with element-wise operations: + +```python +def relu_simple(x): + """Even simpler ReLU""" + return x * (x > 0) # Multiply by boolean mask + +x = torch.tensor([-2.0, 3.0, -1.0, 4.0]) +output = relu_simple(x) +print(output) +# tensor([0., 3., 0., 4.]) +``` + +## Why ReLU is Amazing + +### 1. Simple and Fast + +```yaml +Computation: Just one comparison! + if x > 0: return x + else: return 0 + +No expensive operations: + โœ“ No exponentials (unlike sigmoid/tanh) + โœ“ No divisions + โœ“ Just comparison and selection +``` + +### 2. Solves Vanishing Gradient Problem + +For positive values, gradient is always 1: + +```python +import torch + +x = torch.tensor([5.0], requires_grad=True) +y = torch.relu(x) +y.backward() + +print(x.grad) # tensor([1.]) +# Gradient is 1 for positive inputs! +``` + +**Why this matters:** + +```yaml +Sigmoid/Tanh: gradients get very small (vanishing) +ReLU: gradient is 1 for positive inputs + +Result: Faster training, deeper networks possible! +``` + +### 3. Creates Sparsity + +ReLU zeros out negative values, creating sparse activations: + +![ReLU Network](/content/learn/activation-functions/relu/relu-network.png) + +```python +# Example: network layer output +layer_output = torch.tensor([-2.1, 3.5, -0.8, 1.2, -1.5]) +activated = torch.relu(layer_output) + +print(activated) +# tensor([0.0, 3.5, 0.0, 1.2, 0.0]) + +# 60% of activations are zero! 
+sparsity = (activated == 0).sum().item() / activated.numel() +print(f"Sparsity: {sparsity:.1%}") +# Output: Sparsity: 60.0% +``` + +**Benefits of sparsity:** + +```yaml +Sparse networks: + โœ“ More efficient (many zeros) + โœ“ Better generalization + โœ“ Easier to interpret + โœ“ Faster computation +``` + +## Using ReLU in PyTorch + +### Method 1: As a Layer + +```python +import torch.nn as nn + +# Create a neural network with ReLU +model = nn.Sequential( + nn.Linear(10, 20), + nn.ReLU(), # โ† ReLU activation + nn.Linear(20, 5), + nn.ReLU(), # โ† Another ReLU + nn.Linear(5, 1) +) +``` + +### Method 2: As a Function + +```python +import torch +import torch.nn.functional as F + +x = torch.randn(5, 10) + +# Apply ReLU directly +output = F.relu(x) + +# Same as +output = torch.relu(x) +``` + +### Method 3: Manual Implementation + +```python +# In your custom forward pass +def forward(self, x): + x = self.linear1(x) + x = torch.relu(x) # Apply ReLU + x = self.linear2(x) + return x +``` + +## Practical Example: Multi-Layer Network + +```python +import torch +import torch.nn as nn + +# 3-layer network with ReLU +class SimpleNet(nn.Module): + def __init__(self): + super().__init__() + self.fc1 = nn.Linear(784, 256) # Input layer + self.fc2 = nn.Linear(256, 128) # Hidden layer + self.fc3 = nn.Linear(128, 10) # Output layer + + def forward(self, x): + x = self.fc1(x) + x = torch.relu(x) # ReLU after layer 1 + + x = self.fc2(x) + x = torch.relu(x) # ReLU after layer 2 + + x = self.fc3(x) + # No ReLU on output layer! + return x + +# Test it +model = SimpleNet() +input_data = torch.randn(32, 784) # Batch of 32 +output = model(input_data) + +print(output.shape) # torch.Size([32, 10]) +``` + +## The Dying ReLU Problem + +**Issue:** Sometimes neurons can get "stuck" outputting only zeros. + +```python +# Neuron with large negative bias +weights = torch.randn(10) +bias = torch.tensor(-100.0) # Very negative! + +# Forward pass +x = torch.randn(10) +linear_output = x @ weights + bias +activated = torch.relu(linear_output) + +print(linear_output) # tensor(-98.5) - always negative! +print(activated) # tensor(0.) - always zero! +``` + +**Why this happens:** + +```yaml +1. Neuron produces negative output +2. ReLU makes it zero +3. Gradient for negative inputs is also zero +4. Neuron never updates โ†’ stuck at zero forever! + +Solution: Use variants like Leaky ReLU or careful initialization +``` + +## ReLU Variants + +### Leaky ReLU + +Allows small negative values: + +```python +import torch.nn as nn + +# Standard ReLU +relu = nn.ReLU() +print(relu(torch.tensor(-1.0))) # tensor(0.) 
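+# For x < 0 the gradient of standard ReLU is 0 as well, so a neuron stuck there
+# gets no learning signal (the "dying ReLU" problem above). Leaky ReLU keeps a
+# small slope for negative inputs so gradients can still flow.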
+ +# Leaky ReLU (small slope for negatives) +leaky_relu = nn.LeakyReLU(negative_slope=0.01) +print(leaky_relu(torch.tensor(-1.0))) # tensor(-0.0100) +``` + +**Formula:** + +```yaml +LeakyReLU(x) = max(0.01x, x) + +For x < 0: output = 0.01 * x (small negative) +For x โ‰ฅ 0: output = x (unchanged) +``` + +## Key Takeaways + +โœ“ **Simple formula:** max(0, x) + +โœ“ **Fast:** Just comparison, no complex math + +โœ“ **Solves vanishing gradients:** Gradient is 1 for positive values + +โœ“ **Creates sparsity:** Zeros out negative activations + +โœ“ **Most popular:** Default choice for hidden layers + +โœ“ **Watch out for:** Dying ReLU (neurons stuck at zero) + +**Quick Reference:** + +```python +# Using ReLU +import torch +import torch.nn as nn +import torch.nn.functional as F + +# Method 1: Module +relu_layer = nn.ReLU() +output = relu_layer(x) + +# Method 2: Functional +output = F.relu(x) + +# Method 3: Direct +output = torch.relu(x) + +# Method 4: Manual +output = torch.maximum(torch.tensor(0.0), x) +``` + +**When to use ReLU:** +- โœ“ Hidden layers in CNNs +- โœ“ Hidden layers in feedforward networks +- โœ“ Default activation for most architectures +- โœ— NOT for output layer (use softmax/sigmoid/linear instead) + +**Remember:** ReLU is simple but powerful. It's the workhorse of modern deep learning! ๐ŸŽ‰ diff --git a/public/content/learn/activation-functions/relu/relu-example.png b/public/content/learn/activation-functions/relu/relu-example.png new file mode 100644 index 0000000..188ac1b Binary files /dev/null and b/public/content/learn/activation-functions/relu/relu-example.png differ diff --git a/public/content/learn/activation-functions/relu/relu-graph.png b/public/content/learn/activation-functions/relu/relu-graph.png new file mode 100644 index 0000000..db0625a Binary files /dev/null and b/public/content/learn/activation-functions/relu/relu-graph.png differ diff --git a/public/content/learn/activation-functions/relu/relu-network.png b/public/content/learn/activation-functions/relu/relu-network.png new file mode 100644 index 0000000..d577e3c Binary files /dev/null and b/public/content/learn/activation-functions/relu/relu-network.png differ diff --git a/public/content/learn/activation-functions/sigmoid/sigmoid-classification.png b/public/content/learn/activation-functions/sigmoid/sigmoid-classification.png new file mode 100644 index 0000000..ef2d10b Binary files /dev/null and b/public/content/learn/activation-functions/sigmoid/sigmoid-classification.png differ diff --git a/public/content/learn/activation-functions/sigmoid/sigmoid-content.md b/public/content/learn/activation-functions/sigmoid/sigmoid-content.md new file mode 100644 index 0000000..9700aa0 --- /dev/null +++ b/public/content/learn/activation-functions/sigmoid/sigmoid-content.md @@ -0,0 +1,357 @@ +--- +hero: + title: "Sigmoid" + subtitle: "The Classic S-shaped Activation Function" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +Sigmoid is a smooth, S-shaped function that **squashes any input to a value between 0 and 1**. Perfect for probabilities! + +## The Formula + +**ฯƒ(x) = 1 / (1 + eโปหฃ)** + +The output is always between 0 and 1, making it ideal for binary classification! 
+ +![Sigmoid Graph](/content/learn/activation-functions/sigmoid/sigmoid-graph.png) + +```yaml +Input โ†’ -โˆž โ†’ Output โ†’ 0 +Input = 0 โ†’ Output = 0.5 +Input โ†’ +โˆž โ†’ Output โ†’ 1 + +Key property: Output is always in (0, 1) +``` + +## How It Works + +**Example:** + +```python +import torch +import torch.nn as nn + +# Create sigmoid activation +sigmoid = nn.Sigmoid() + +# Test with different values +x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0]) +output = sigmoid(x) + +print(output) +# tensor([0.0067, 0.2689, 0.5000, 0.7311, 0.9933]) +``` + +![Sigmoid Example](/content/learn/activation-functions/sigmoid/sigmoid-example.png) + +**Manual calculation (for x = 2):** + +```yaml +ฯƒ(2) = 1 / (1 + eโปยฒ) + = 1 / (1 + 0.1353) + = 1 / 1.1353 + = 0.881 + +Result: ~0.88 or 88% probability +``` + +## The S-Shape Explained + +```yaml +Large negative input (x = -10): + eโปโฝโปยนโฐโพ = eยนโฐ = 22026 (huge!) + ฯƒ(x) = 1 / (1 + 22026) โ‰ˆ 0.00005 + โ†’ Output near 0 + +Zero input (x = 0): + eโปโฐ = 1 + ฯƒ(x) = 1 / (1 + 1) = 0.5 + โ†’ Output exactly 0.5 + +Large positive input (x = 10): + eโปยนโฐ = 0.000045 (tiny!) + ฯƒ(x) = 1 / (1 + 0.000045) โ‰ˆ 0.99995 + โ†’ Output near 1 +``` + +## Binary Classification + +Sigmoid's killer application: **predicting probabilities for binary classification**! + +![Sigmoid Classification](/content/learn/activation-functions/sigmoid/sigmoid-classification.png) + +**Example:** + +```python +import torch +import torch.nn as nn + +# Binary classification model +class BinaryClassifier(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(10, 1) # 10 features โ†’ 1 output + self.sigmoid = nn.Sigmoid() + + def forward(self, x): + logits = self.linear(x) + probabilities = self.sigmoid(logits) + return probabilities + +# Test +model = BinaryClassifier() +x = torch.randn(5, 10) # 5 samples, 10 features each +probs = model(x) + +print(probs) +# tensor([[0.7234], +# [0.3421], +# [0.8956], +# [0.1234], +# [0.6543]], grad_fn=) + +# Convert to predictions +predictions = (probs > 0.5).float() +print(predictions) +# tensor([[1.], # Class 1 (prob > 0.5) +# [0.], # Class 0 (prob < 0.5) +# [1.], +# [0.], +# [1.]]) +``` + +**What happened:** + +```yaml +Model output (logit): 2.5 + โ†“ +Sigmoid: 1/(1 + eโปยฒยทโต) = 0.92 + โ†“ +0.92 > 0.5 โ†’ Predict Class 1! +``` + +## In Code (Simple Implementation) + +```python +import torch + +def sigmoid(x): + """Simple sigmoid implementation""" + return 1 / (1 + torch.exp(-x)) + +# Test it +x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0]) +output = sigmoid(x) +print(output) +# tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808]) +``` + +## Using Sigmoid in PyTorch + +### Method 1: As a Layer + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(10, 5), + nn.ReLU(), + nn.Linear(5, 1), + nn.Sigmoid() # โ† Output layer for binary classification +) +``` + +### Method 2: As a Function + +```python +import torch +import torch.nn.functional as F + +x = torch.randn(5, 1) +output = F.sigmoid(x) # or torch.sigmoid(x) +``` + +### Method 3: Combined with Loss (BCE) + +```python +import torch +import torch.nn as nn + +# Binary Cross Entropy already includes sigmoid! +criterion = nn.BCEWithLogitsLoss() # Sigmoid + BCE + +# Model outputs raw logits (no sigmoid) +logits = model(x) +loss = criterion(logits, targets) # Sigmoid applied internally! +``` + +## The Vanishing Gradient Problem + +Sigmoid's main weakness: **gradients vanish for large inputs**! 
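+
+Why this happens: the sigmoid derivative is **σ'(x) = σ(x) · (1 - σ(x))**, which peaks at just 0.25 (at x = 0) and shrinks toward zero as |x| grows. A quick sketch to see the numbers:
+
+```python
+import torch
+
+# Evaluate the sigmoid derivative at a few points
+x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0])
+s = torch.sigmoid(x)
+print(s * (1 - s))  # σ'(x) = σ(x)(1 - σ(x))
+# tensor([4.5396e-05, 1.0499e-01, 2.5000e-01, 1.0499e-01, 4.5396e-05])
+```
+
+The demo below shows the same effect through autograd: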
+ +```python +import torch + +# Large input +x = torch.tensor([10.0], requires_grad=True) +y = torch.sigmoid(x) +y.backward() + +print(f"Input: {x.item()}") +print(f"Output: {y.item():.6f}") +print(f"Gradient: {x.grad.item():.6f}") +# Gradient: 0.000045 โ† Very small! +``` + +**Why this is bad:** + +```yaml +Gradient too small โ†’ + Slow learning โ†’ + Deep networks struggle โ†’ + ReLU became more popular! + +This is why ReLU replaced sigmoid in hidden layers. +``` + +**When sigmoid gradients vanish:** + +```yaml +For x = -10 or x = 10: + Output is ~0 or ~1 (saturated) + Gradient โ‰ˆ 0 (flat region) + Learning stops! + +For x near 0: + Output around 0.5 (steep region) + Gradient maximum (~0.25) + Learning is good here +``` + +## Practical Examples + +### Example 1: Email Spam Detector + +```python +import torch +import torch.nn as nn + +class SpamDetector(nn.Module): + def __init__(self, num_features): + super().__init__() + self.fc1 = nn.Linear(num_features, 64) + self.fc2 = nn.Linear(64, 32) + self.fc3 = nn.Linear(32, 1) + self.sigmoid = nn.Sigmoid() + + def forward(self, x): + x = torch.relu(self.fc1(x)) + x = torch.relu(self.fc2(x)) + x = self.fc3(x) + probability = self.sigmoid(x) # Sigmoid at end! + return probability + +# Predict +email_features = torch.randn(1, 100) +spam_probability = model(email_features) + +if spam_probability > 0.5: + print(f"SPAM (confidence: {spam_probability.item():.2%})") +else: + print(f"NOT SPAM (confidence: {1-spam_probability.item():.2%})") +``` + +### Example 2: Medical Diagnosis + +```python +# Patient features โ†’ Disease probability +patient = torch.randn(1, 50) # 50 medical features +probability = model(patient) + +print(f"Disease probability: {probability.item():.1%}") +# Output: Disease probability: 23.4% + +if probability > 0.7: + print("High risk - recommend testing") +elif probability > 0.3: + print("Medium risk - monitor") +else: + print("Low risk") +``` + +## Sigmoid vs ReLU + +```yaml +Sigmoid: + โœ“ Outputs 0 to 1 (probabilities) + โœ“ Smooth, differentiable everywhere + โœ“ Great for binary classification OUTPUT + โœ— Vanishing gradients for large |x| + โœ— Slow computation (exponential) + โœ— NOT zero-centered + +ReLU: + โœ“ Fast (simple comparison) + โœ“ No vanishing gradient for x > 0 + โœ“ Creates sparsity + โœ— Outputs 0 to โˆž (not probabilities) + โœ— Dying ReLU problem + โœ— NOT smooth at x = 0 +``` + +**When to use each:** + +```yaml +Use Sigmoid for: + โœ“ Binary classification output layer + โœ“ When you need probabilities + โœ“ Gates in LSTM/GRU + +Use ReLU for: + โœ“ Hidden layers + โœ“ Convolutional layers + โœ“ Most modern architectures +``` + +## Key Takeaways + +โœ“ **S-shaped curve:** Smooth transition from 0 to 1 + +โœ“ **Formula:** ฯƒ(x) = 1 / (1 + eโปหฃ) + +โœ“ **Output range:** Always between 0 and 1 + +โœ“ **Perfect for probabilities:** Binary classification output + +โœ“ **Vanishing gradients:** Problem in deep networks + +โœ“ **Mostly for output:** ReLU used in hidden layers instead + +**Quick Reference:** + +```python +# Using sigmoid +import torch +import torch.nn as nn +import torch.nn.functional as F + +# Method 1: Module +sigmoid_layer = nn.Sigmoid() +output = sigmoid_layer(x) + +# Method 2: Functional +output = F.sigmoid(x) + +# Method 3: Direct +output = torch.sigmoid(x) + +# Method 4: Manual +output = 1 / (1 + torch.exp(-x)) + +# For binary classification with loss +criterion = nn.BCEWithLogitsLoss() # Includes sigmoid! +``` + +**Remember:** Sigmoid for the output, ReLU for the hidden layers! 
๐ŸŽ‰ diff --git a/public/content/learn/activation-functions/sigmoid/sigmoid-example.png b/public/content/learn/activation-functions/sigmoid/sigmoid-example.png new file mode 100644 index 0000000..d2e0dd4 Binary files /dev/null and b/public/content/learn/activation-functions/sigmoid/sigmoid-example.png differ diff --git a/public/content/learn/activation-functions/sigmoid/sigmoid-graph.png b/public/content/learn/activation-functions/sigmoid/sigmoid-graph.png new file mode 100644 index 0000000..e1c7411 Binary files /dev/null and b/public/content/learn/activation-functions/sigmoid/sigmoid-graph.png differ diff --git a/public/content/learn/activation-functions/silu/silu-content.md b/public/content/learn/activation-functions/silu/silu-content.md new file mode 100644 index 0000000..276bd25 --- /dev/null +++ b/public/content/learn/activation-functions/silu/silu-content.md @@ -0,0 +1,375 @@ +--- +hero: + title: "SiLU" + subtitle: "Sigmoid Linear Unit - The Swish Activation" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +SiLU (also called Swish) is a **smooth** alternative to ReLU. It's ReLU but with a smooth curve instead of a hard cutoff! + +## The Formula + +**SiLU(x) = x ยท ฯƒ(x) = x ยท sigmoid(x)** + +Simply multiply the input by its sigmoid! This creates a smooth, non-linear function. + +![SiLU Graph](/content/learn/activation-functions/silu/silu-graph.png) + +```yaml +For large negative x: + sigmoid(x) โ‰ˆ 0 + SiLU(x) = x ยท 0 โ‰ˆ 0 + +For x = 0: + sigmoid(0) = 0.5 + SiLU(0) = 0 ยท 0.5 = 0 + +For large positive x: + sigmoid(x) โ‰ˆ 1 + SiLU(x) = x ยท 1 โ‰ˆ x +``` + +## How It Works + +**Example:** + +```python +import torch +import torch.nn as nn + +# Create SiLU activation +silu = nn.SiLU() + +# Test with different values +x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0]) +output = silu(x) + +print(output) +# tensor([-0.2384, -0.2689, 0.0000, 0.7311, 1.7616]) +``` + +**Manual calculation (for x = 2):** + +```yaml +SiLU(2) = 2 ยท sigmoid(2) + = 2 ยท (1 / (1 + eโปยฒ)) + = 2 ยท 0.881 + = 1.762 + +Notice: Not just 2 (like ReLU), but close! +``` + +## The Smooth Advantage + +Unlike ReLU, SiLU is **smooth everywhere** and allows small negative values: + +![SiLU vs ReLU](/content/learn/activation-functions/silu/silu-vs-relu.png) + +**Example comparison:** + +```python +import torch + +x = torch.tensor([-2.0, -1.0, -0.5, 0.0, 1.0, 2.0]) + +# ReLU: hard cutoff +relu_out = torch.relu(x) +print("ReLU:", relu_out) +# tensor([0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 2.0000]) + +# SiLU: smooth transition +silu_out = torch.nn.functional.silu(x) +print("SiLU:", silu_out) +# tensor([-0.2384, -0.2689, -0.1887, 0.0000, 0.7311, 1.7616]) +``` + +**Key differences:** + +```yaml +ReLU: + x < 0 โ†’ Output = 0 (hard cutoff) + x > 0 โ†’ Output = x (straight line) + NOT smooth at x = 0 + +SiLU: + x < 0 โ†’ Small negative values (smooth) + x > 0 โ†’ Nearly linear (smooth) + Smooth everywhere! +``` + +## Why SiLU is Better Than ReLU + +### 1. Smooth Gradients + +```python +import torch + +x = torch.tensor([0.0], requires_grad=True) + +# ReLU gradient at x=0 is undefined (jump) +# SiLU gradient at x=0 is smooth (0.5) +y = torch.nn.functional.silu(x) +y.backward() + +print(x.grad) # tensor([0.5000]) +# Smooth gradient! +``` + +### 2. No Dying Neurons + +```python +# Neuron that would "die" with ReLU +x = torch.tensor([-5.0], requires_grad=True) + +# ReLU would output 0 with gradient 0 +relu_out = torch.relu(x) +print(relu_out) # tensor([0.]) โ† Dead! 
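+# ReLU's gradient for x < 0 is exactly 0, so this neuron would never update.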
+ +# SiLU allows gradient flow +silu_out = torch.nn.functional.silu(x) +print(silu_out) # tensor([-0.0337]) โ† Small but not zero! + +# Gradient still flows +silu_out.backward() +print(x.grad) # tensor([0.0030]) โ† Can still learn! +``` + +### 3. Better Performance + +Recent research shows SiLU **outperforms ReLU** in many tasks, especially in vision transformers and modern architectures! + +## In Code (Simple Implementation) + +```python +import torch + +def silu(x): + """Simple SiLU implementation""" + return x * torch.sigmoid(x) + +# Test it +x = torch.tensor([-1.0, 0.0, 1.0, 2.0]) +output = silu(x) +print(output) +# tensor([-0.2689, 0.0000, 0.7311, 1.7616]) + +# Verify against PyTorch +print(torch.nn.functional.silu(x)) +# tensor([-0.2689, 0.0000, 0.7311, 1.7616]) โ† Same! +``` + +## Using SiLU in PyTorch + +### Method 1: As a Layer + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(10, 20), + nn.SiLU(), # โ† SiLU activation + nn.Linear(20, 5), + nn.SiLU(), # โ† Another SiLU + nn.Linear(5, 1) +) +``` + +### Method 2: As a Function + +```python +import torch.nn.functional as F + +x = torch.randn(5, 10) +output = F.silu(x) +``` + +## Practical Example: Vision Transformer + +SiLU is used in many modern architectures like EfficientNet and Vision Transformers: + +```python +import torch +import torch.nn as nn + +class ModernBlock(nn.Module): + def __init__(self, dim): + super().__init__() + self.norm = nn.LayerNorm(dim) + self.fc1 = nn.Linear(dim, dim * 4) + self.fc2 = nn.Linear(dim * 4, dim) + self.silu = nn.SiLU() # โ† SiLU instead of ReLU! + + def forward(self, x): + residual = x + x = self.norm(x) + x = self.fc1(x) + x = self.silu(x) # Smooth activation + x = self.fc2(x) + return x + residual + +# Test +block = ModernBlock(dim=128) +x = torch.randn(32, 128) # Batch of 32 +output = block(x) +print(output.shape) # torch.Size([32, 128]) +``` + +## SiLU vs Other Activations + +```yaml +SiLU (Swish): + โœ“ Smooth everywhere (no hard cutoff) + โœ“ No dying neurons + โœ“ Better performance than ReLU + โœ“ Self-gated (uses its own sigmoid) + โœ— Slightly slower than ReLU + โœ— More computation (sigmoid) + +ReLU: + โœ“ Fastest (simple comparison) + โœ“ Simple to understand + โœ— Not smooth at x=0 + โœ— Dying neuron problem + โœ— Hard cutoff at zero + +Tanh: + โœ“ Zero-centered + โœ“ Smooth + โœ— Vanishing gradients + โœ— Slower than both +``` + +## Where SiLU is Used + +**Modern architectures using SiLU:** +- EfficientNet (image classification) +- Vision Transformers (ViT) +- Some language models +- Mobile-optimized networks + +**Example from research:** + +```yaml +Study: "Searching for Activation Functions" (Google Brain, 2017) +Finding: Swish/SiLU outperformed ReLU on ImageNet +Result: Adopted in many modern architectures + +Performance gain: ~0.6-0.9% accuracy improvement +``` + +## Practical Example: EfficientNet-style Block + +```python +import torch +import torch.nn as nn + +class MBConvBlock(nn.Module): + """Mobile Inverted Bottleneck with SiLU""" + def __init__(self, in_channels, out_channels, expand_ratio=4): + super().__init__() + hidden_dim = in_channels * expand_ratio + + self.expand_conv = nn.Conv2d(in_channels, hidden_dim, 1) + self.depthwise_conv = nn.Conv2d(hidden_dim, hidden_dim, 3, + padding=1, groups=hidden_dim) + self.project_conv = nn.Conv2d(hidden_dim, out_channels, 1) + self.silu = nn.SiLU() # โ† SiLU for smooth activation + + def forward(self, x): + # Expand + out = self.expand_conv(x) + out = self.silu(out) # SiLU + + # Depthwise + out = 
self.depthwise_conv(out) + out = self.silu(out) # SiLU + + # Project + out = self.project_conv(out) + return out + +# Test +block = MBConvBlock(32, 64) +x = torch.randn(1, 32, 56, 56) # Image: batch, channels, H, W +output = block(x) +print(output.shape) # torch.Size([1, 64, 56, 56]) +``` + +## The Self-Gating Mechanism + +SiLU is "self-gated" - it uses its own sigmoid as a gate: + +```python +import torch + +x = torch.tensor([2.0]) + +# SiLU gates itself +sigmoid_gate = torch.sigmoid(x) # 0.881 +output = x * sigmoid_gate # 2.0 * 0.881 = 1.762 + +print(f"Input: {x.item()}") +print(f"Gate: {sigmoid_gate.item():.3f}") +print(f"Output: {output.item():.3f}") + +# Input: 2.0 +# Gate: 0.881 +# Output: 1.762 +``` + +**What this means:** + +```yaml +The input controls its own "gate": + - Large positive x โ†’ gate โ‰ˆ 1 โ†’ mostly pass through + - Large negative x โ†’ gate โ‰ˆ 0 โ†’ mostly blocked + - Small x โ†’ partial gating (smooth) + +This self-regulation makes SiLU effective! +``` + +## Key Takeaways + +โœ“ **Formula:** x ยท sigmoid(x) + +โœ“ **Smooth:** No hard cutoff like ReLU + +โœ“ **Self-gated:** Uses its own sigmoid as a gate + +โœ“ **Better than ReLU:** Improved performance in many tasks + +โœ“ **No dying neurons:** Always has gradient flow + +โœ“ **Modern choice:** Used in EfficientNet, ViT, and more + +**Quick Reference:** + +```python +# Using SiLU +import torch +import torch.nn as nn +import torch.nn.functional as F + +# Method 1: Module +silu_layer = nn.SiLU() +output = silu_layer(x) + +# Method 2: Functional +output = F.silu(x) + +# Method 3: Manual +output = x * torch.sigmoid(x) + +# Also known as Swish +swish = nn.SiLU() # Same thing! +``` + +**When to use SiLU:** +- โœ“ Modern CNN architectures +- โœ“ Vision transformers +- โœ“ When you want better performance than ReLU +- โœ“ Mobile/efficient networks + +**Remember:** SiLU is the smooth, modern upgrade to ReLU! ๐ŸŽ‰ diff --git a/public/content/learn/activation-functions/silu/silu-graph.png b/public/content/learn/activation-functions/silu/silu-graph.png new file mode 100644 index 0000000..3cb874d Binary files /dev/null and b/public/content/learn/activation-functions/silu/silu-graph.png differ diff --git a/public/content/learn/activation-functions/silu/silu-vs-relu.png b/public/content/learn/activation-functions/silu/silu-vs-relu.png new file mode 100644 index 0000000..3c6ef33 Binary files /dev/null and b/public/content/learn/activation-functions/silu/silu-vs-relu.png differ diff --git a/public/content/learn/activation-functions/softmax/softmax-classification.png b/public/content/learn/activation-functions/softmax/softmax-classification.png new file mode 100644 index 0000000..57b6983 Binary files /dev/null and b/public/content/learn/activation-functions/softmax/softmax-classification.png differ diff --git a/public/content/learn/activation-functions/softmax/softmax-content.md b/public/content/learn/activation-functions/softmax/softmax-content.md new file mode 100644 index 0000000..32b1bd6 --- /dev/null +++ b/public/content/learn/activation-functions/softmax/softmax-content.md @@ -0,0 +1,411 @@ +--- +hero: + title: "Softmax" + subtitle: "Multi-class Classification Activation Function" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +Softmax converts raw model outputs (logits) into **probabilities that sum to 1**. Perfect for multi-class classification! + +## The Formula + +**Softmax(xแตข) = exp(xแตข) / ฮฃ exp(xโฑผ)** + +For each element: +1. Take exponential (e^x) +2. 
Divide by sum of all exponentials
+
+This ensures all outputs are positive and sum to exactly 1!
+
+## How It Works
+
+![Softmax Transformation](/content/learn/activation-functions/softmax/softmax-transformation.png)
+
+**Example:**
+
+```python
+import torch
+import torch.nn as nn
+
+# Raw model outputs (logits)
+logits = torch.tensor([2.0, 1.0, 0.1])
+
+# Apply softmax
+softmax = nn.Softmax(dim=0)
+probabilities = softmax(logits)
+
+print(probabilities)
+# tensor([0.6590, 0.2424, 0.0986])
+
+print(probabilities.sum())
+# tensor(1.0000)  ← Sums to 1!
+```
+
+**Manual calculation:**
+
+```yaml
+Step 1: Exponentiate each value
+  exp(2.0) = 7.389
+  exp(1.0) = 2.718
+  exp(0.1) = 1.105
+
+Step 2: Sum all exponentials
+  Sum = 7.389 + 2.718 + 1.105 = 11.212
+
+Step 3: Divide each by sum
+  7.389 / 11.212 = 0.659  (65.9%)
+  2.718 / 11.212 = 0.242  (24.2%)
+  1.105 / 11.212 = 0.099  (9.9%)
+
+Result: [0.659, 0.242, 0.099]
+Verification: 0.659 + 0.242 + 0.099 = 1.0 ✓
+```
+
+## Multi-Class Classification
+
+Softmax's main use: **predicting probabilities across multiple classes**!
+
+![Softmax Classification](/content/learn/activation-functions/softmax/softmax-classification.png)
+
+**Example:**
+
+```python
+import torch
+import torch.nn as nn
+
+# 10-class classification model
+class MultiClassifier(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.fc1 = nn.Linear(784, 128)  # Input layer
+        self.fc2 = nn.Linear(128, 64)   # Hidden layer
+        self.fc3 = nn.Linear(64, 10)    # Output: 10 classes
+        self.softmax = nn.Softmax(dim=1)
+
+    def forward(self, x):
+        x = torch.relu(self.fc1(x))
+        x = torch.relu(self.fc2(x))
+        logits = self.fc3(x)
+        probabilities = self.softmax(logits)  # ← Softmax!
+        return probabilities
+
+# Test
+model = MultiClassifier()
+batch = torch.randn(5, 784)  # 5 images
+probs = model(batch)
+
+print(probs.shape)  # torch.Size([5, 10])
+print(probs[0])     # First image probabilities
+# tensor([0.0823, 0.1245, 0.0567, 0.3421, 0.0912,
+#         0.0734, 0.1823, 0.0234, 0.0156, 0.0085])
+
+print(probs[0].sum())  # tensor(1.0000)  ← Sums to 1!
+
+# Get predictions
+predictions = torch.argmax(probs, dim=1)
+print(predictions)  # tensor([3, 7, 2, 3, 1])
+# Class indices with highest probability
+```
+
+## Why Exponential?
+
+The exponential makes softmax **sensitive to large values**:
+
+```python
+import torch
+
+# Small difference in logits
+logits1 = torch.tensor([1.0, 1.1, 1.2])
+probs1 = torch.softmax(logits1, dim=0)
+print(probs1)
+# tensor([0.3006, 0.3322, 0.3672])
+# Similar probabilities
+
+# Large difference in logits
+logits2 = torch.tensor([1.0, 2.0, 3.0])
+probs2 = torch.softmax(logits2, dim=0)
+print(probs2)
+# tensor([0.0900, 0.2447, 0.6652])
+# Clear winner!
+
+# Huge difference
+logits3 = torch.tensor([1.0, 5.0, 10.0])
+probs3 = torch.softmax(logits3, dim=0)
+print(probs3)
+# tensor([0.0000, 0.0067, 0.9933])
+# Dominant class!
+```
+
+**What happened:**
+
+```yaml
+exp() amplifies differences:
+
+Small logits [1.0, 1.1, 1.2]:
+  exp → [2.7, 3.0, 3.3]
+  Difference is small → similar probabilities
+
+Large logits [1.0, 5.0, 10.0]:
+  exp → [2.7, 148, 22026]
+  Difference is HUGE → one dominates
+```
+
+## In Code (Simple Implementation)
+
+```python
+import torch
+
+def softmax(x):
+    """Simple softmax implementation"""
+    exp_x = torch.exp(x)
+    return exp_x / exp_x.sum()
+
+# Test it
+logits = torch.tensor([2.0, 1.0, 0.5])
+output = softmax(logits)
+print(output)
+# tensor([0.6285, 0.2312, 0.1402])
+print(output.sum())
+# tensor(1.0000)  ← Sums to 1!
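+
+# A note on numerical stability (illustrative sketch): exp() overflows for very
+# large logits, so a common trick is to subtract the max first. The result is
+# identical because the exp(max) factor cancels in numerator and denominator.
+def softmax_stable(x):
+    exp_x = torch.exp(x - x.max())
+    return exp_x / exp_x.sum()
+
+print(softmax_stable(torch.tensor([1000.0, 1001.0, 1002.0])))
+# tensor([0.0900, 0.2447, 0.6652])  (the naive version above returns nan here)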
+``` + +## Using Softmax in PyTorch + +### Method 1: As a Layer + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(784, 128), + nn.ReLU(), + nn.Linear(128, 10), + nn.Softmax(dim=1) # โ† Softmax on output +) +``` + +### Method 2: As a Function + +```python +import torch.nn.functional as F + +logits = torch.randn(32, 10) # Batch of 32, 10 classes +probs = F.softmax(logits, dim=1) # Softmax across classes + +print(probs.shape) # torch.Size([32, 10]) +print(probs.sum(dim=1)) # All 1.0 +``` + +### Method 3: Combined with Loss (CrossEntropy) + +**Important:** PyTorch's `CrossEntropyLoss` includes softmax! + +```python +import torch +import torch.nn as nn + +# CrossEntropy already has softmax! +criterion = nn.CrossEntropyLoss() + +# Model outputs raw logits (NO softmax) +logits = model(x) +loss = criterion(logits, targets) # Softmax applied internally! + +# DON'T do this: +# probs = F.softmax(logits, dim=1) # โ† Wrong! +# loss = criterion(probs, targets) # โ† Applies softmax twice! +``` + +## Temperature Scaling + +You can control softmax "confidence" with temperature: + +```python +import torch + +logits = torch.tensor([2.0, 1.0, 0.5]) + +# Normal softmax (temperature = 1) +probs_normal = torch.softmax(logits, dim=0) +print(probs_normal) +# tensor([0.6364, 0.2341, 0.1295]) + +# Low temperature (sharper, more confident) +probs_sharp = torch.softmax(logits / 0.5, dim=0) +print(probs_sharp) +# tensor([0.8360, 0.1131, 0.0508]) + +# High temperature (softer, less confident) +probs_soft = torch.softmax(logits / 2.0, dim=0) +print(probs_soft) +# tensor([0.4750, 0.3107, 0.2143]) +``` + +**Effect of temperature:** + +```yaml +T < 1 (low): + - Sharper probabilities + - More confident predictions + - Winner takes more + +T > 1 (high): + - Softer probabilities + - Less confident predictions + - More uniform distribution + +T = 1: + - Standard softmax +``` + +## Practical Example: Image Classification + +```python +import torch +import torch.nn as nn + +class ImageClassifier(nn.Module): + def __init__(self, num_classes=1000): + super().__init__() + self.features = nn.Sequential( + nn.Conv2d(3, 64, 3), + nn.ReLU(), + nn.MaxPool2d(2), + # ... more layers ... + ) + self.classifier = nn.Sequential( + nn.Linear(512, 256), + nn.ReLU(), + nn.Linear(256, num_classes) + # NO softmax here if using CrossEntropyLoss! + ) + + def forward(self, x): + x = self.features(x) + x = x.view(x.size(0), -1) # Flatten + logits = self.classifier(x) + return logits # Return logits, not probabilities! + +# For inference, apply softmax manually +model = ImageClassifier() +image = torch.randn(1, 3, 224, 224) + +with torch.no_grad(): + logits = model(image) + probs = torch.softmax(logits, dim=1) + + # Get top-5 predictions + top5_probs, top5_indices = torch.topk(probs, 5, dim=1) + + print("Top 5 predictions:") + for i in range(5): + print(f"Class {top5_indices[0, i]}: {top5_probs[0, i]:.1%}") +``` + +## Softmax Across Different Dimensions + +```python +import torch + +# Batch of logits +logits = torch.tensor([[2.0, 1.0, 0.5], + [0.8, 2.1, 1.3]]) # 2 samples, 3 classes + +# Softmax across classes (dim=1) +probs = torch.softmax(logits, dim=1) +print(probs) +# tensor([[0.6364, 0.2341, 0.1295], +# [0.1899, 0.6841, 0.1260]]) + +print(probs.sum(dim=1)) # tensor([1., 1.]) +# Each row sums to 1! + +# Softmax across samples (dim=0) - unusual! 
+probs_dim0 = torch.softmax(logits, dim=0) +print(probs_dim0.sum(dim=0)) # tensor([1., 1., 1.]) +# Each column sums to 1 +``` + +**Rule:** Use `dim=1` for batch processing (softmax across classes for each sample)! + +## Common Mistakes + +### โŒ Mistake 1: Softmax Before CrossEntropyLoss + +```python +# WRONG - softmax applied twice! +logits = model(x) +probs = torch.softmax(logits, dim=1) +loss = nn.CrossEntropyLoss()(probs, targets) # โ† ERROR! + +# CORRECT - CrossEntropy includes softmax +logits = model(x) +loss = nn.CrossEntropyLoss()(logits, targets) # โ† Correct! +``` + +### โŒ Mistake 2: Wrong Dimension + +```python +# Logits shape: (batch_size, num_classes) +logits = torch.randn(32, 10) + +# WRONG - softmax across batch +probs = torch.softmax(logits, dim=0) # โ† Each class sums to 1 (weird!) + +# CORRECT - softmax across classes +probs = torch.softmax(logits, dim=1) # โ† Each sample sums to 1 +``` + +## Key Takeaways + +โœ“ **Converts to probabilities:** All outputs between 0 and 1 + +โœ“ **Sums to 1:** All probabilities add up to exactly 1 + +โœ“ **Multi-class:** For 3+ classes (cat, dog, bird, etc.) + +โœ“ **Amplifies differences:** exp() makes large logits dominate + +โœ“ **CrossEntropy includes it:** Don't apply softmax before loss! + +โœ“ **Use dim=1:** For batch processing (softmax per sample) + +**Quick Reference:** + +```python +# Using softmax +import torch +import torch.nn as nn +import torch.nn.functional as F + +# Method 1: Module +softmax_layer = nn.Softmax(dim=1) +probs = softmax_layer(logits) + +# Method 2: Functional (most common) +probs = F.softmax(logits, dim=1) + +# Method 3: Direct +probs = torch.softmax(logits, dim=1) + +# For training with CrossEntropyLoss +criterion = nn.CrossEntropyLoss() # Includes softmax! +loss = criterion(logits, targets) # Don't softmax first! + +# For inference +with torch.no_grad(): + logits = model(x) + probs = F.softmax(logits, dim=1) + prediction = torch.argmax(probs, dim=1) +``` + +**When to use Softmax:** +- โœ“ Multi-class classification output (3+ classes) +- โœ“ When you need probability distribution +- โœ“ Attention mechanisms +- โœ— Binary classification (use sigmoid instead) +- โœ— Regression (use linear output) + +**Remember:** Softmax for multi-class, Sigmoid for binary! 
๐ŸŽ‰ diff --git a/public/content/learn/activation-functions/softmax/softmax-transformation.png b/public/content/learn/activation-functions/softmax/softmax-transformation.png new file mode 100644 index 0000000..5dc59c5 Binary files /dev/null and b/public/content/learn/activation-functions/softmax/softmax-transformation.png differ diff --git a/public/content/learn/activation-functions/swiglu/glu-variants.png b/public/content/learn/activation-functions/swiglu/glu-variants.png new file mode 100644 index 0000000..bf7ed1f Binary files /dev/null and b/public/content/learn/activation-functions/swiglu/glu-variants.png differ diff --git a/public/content/learn/activation-functions/swiglu/swiglu-architecture.png b/public/content/learn/activation-functions/swiglu/swiglu-architecture.png new file mode 100644 index 0000000..5330534 Binary files /dev/null and b/public/content/learn/activation-functions/swiglu/swiglu-architecture.png differ diff --git a/public/content/learn/activation-functions/swiglu/swiglu-content.md b/public/content/learn/activation-functions/swiglu/swiglu-content.md new file mode 100644 index 0000000..120ee40 --- /dev/null +++ b/public/content/learn/activation-functions/swiglu/swiglu-content.md @@ -0,0 +1,315 @@ +--- +hero: + title: "SwiGLU" + subtitle: "Swish-Gated Linear Unit - Advanced Activation" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +SwiGLU is a **gated activation function** used in state-of-the-art language models like LLaMA and PaLM. It's more complex than ReLU but much more powerful! + +## The Concept: Gating + +**Gating = One path controls another path** + +Think of it like a smart light switch - one signal decides how much of another signal gets through! + +![SwiGLU Architecture](/content/learn/activation-functions/swiglu/swiglu-architecture.png) + +## The Formula + +**SwiGLU(x) = SiLU(Wโ‚(x)) โŠ™ V(x)** + +Where: +- `Wโ‚(x)` = first linear transformation +- `SiLU()` = activation (swish) +- `V(x)` = second linear transformation (gate) +- `โŠ™` = element-wise multiplication + +**In plain English:** +1. Split input into two paths +2. Apply SiLU to first path +3. Keep second path as-is +4. Multiply them together element-wise + +## How It Works + +**Example:** + +```python +import torch +import torch.nn as nn + +class SwiGLU(nn.Module): + def __init__(self, dim): + super().__init__() + self.W1 = nn.Linear(dim, dim) + self.V = nn.Linear(dim, dim) + self.silu = nn.SiLU() + + def forward(self, x): + # Path 1: Linear + SiLU + gate = self.silu(self.W1(x)) + + # Path 2: Linear only + value = self.V(x) + + # Multiply together + output = gate * value + return output + +# Test +swiglu = SwiGLU(dim=128) +x = torch.randn(32, 128) # Batch of 32 +output = swiglu(x) + +print(output.shape) # torch.Size([32, 128]) +``` + +**Manual calculation (simplified):** + +```yaml +Input x = [1.0, 2.0, 3.0] + +Path 1 (Gate): + W1(x) = [-0.5, 2.0, 1.0] + SiLU(W1(x)) = [-0.19, 1.76, 0.73] + +Path 2 (Value): + V(x) = [0.8, -1.2, 2.0] + +Element-wise multiply: + [-0.19 * 0.8, 1.76 * -1.2, 0.73 * 2.0] + = [-0.15, -2.11, 1.46] + +The gate controls how much of value passes through! +``` + +## Why SwiGLU is Powerful + +### 1. Gating Mechanism + +```python +# Gating allows selective information flow +gate = torch.tensor([0.1, 0.5, 0.9]) # Low, medium, high gates +value = torch.tensor([5.0, 5.0, 5.0]) # Same values + +output = gate * value +print(output) +# tensor([0.5, 2.5, 4.5]) + +# Gate controls how much gets through! +``` + +### 2. 
Double the Parameters (More Capacity) + +```yaml +Regular FFN: + Linear(dim, 4*dim) โ†’ ReLU โ†’ Linear(4*dim, dim) + Parameters: dim*4*dim + 4*dim*dim = 8*dimยฒ + +SwiGLU: + Two parallel linears + gating + Parameters: Slightly more (~1.5x FFN) + +But: Better performance despite similar size! +``` + +### 3. Smooth Activation (SiLU) + +Using SiLU instead of ReLU provides smooth gradients! + +## The GLU Family + +![GLU Variants](/content/learn/activation-functions/swiglu/glu-variants.png) + +All GLU variants follow the same pattern: + +```yaml +GLU: ฯƒ(W(x)) โŠ™ V(x) โ† Sigmoid gate +ReGLU: ReLU(W(x)) โŠ™ V(x) โ† ReLU gate +GEGLU: GELU(W(x)) โŠ™ V(x) โ† GELU gate +SwiGLU: SiLU(W(x)) โŠ™ V(x) โ† SiLU gate (best!) +``` + +**Performance ranking (empirical):** + +```yaml +Best: SwiGLU โ‰ˆ GEGLU +Good: ReGLU +Original: GLU +``` + +## Using SwiGLU in Transformers + +SwiGLU is used in the feedforward network (FFN) of transformers: + +```python +import torch +import torch.nn as nn + +class SwiGLUFFN(nn.Module): + """Feedforward network with SwiGLU""" + def __init__(self, dim, hidden_dim=None): + super().__init__() + if hidden_dim is None: + hidden_dim = int(dim * 8/3) # Adjusted for gating + + self.W1 = nn.Linear(dim, hidden_dim, bias=False) + self.V = nn.Linear(dim, hidden_dim, bias=False) + self.W2 = nn.Linear(hidden_dim, dim, bias=False) + self.silu = nn.SiLU() + + def forward(self, x): + # SwiGLU activation + gate = self.silu(self.W1(x)) + value = self.V(x) + hidden = gate * value + + # Project back + output = self.W2(hidden) + return output + +# Example usage in transformer block +class TransformerBlock(nn.Module): + def __init__(self, dim): + super().__init__() + self.attention = nn.MultiheadAttention(dim, num_heads=8) + self.ffn = SwiGLUFFN(dim) # โ† SwiGLU FFN + self.norm1 = nn.LayerNorm(dim) + self.norm2 = nn.LayerNorm(dim) + + def forward(self, x): + # Attention block + x = x + self.attention(self.norm1(x), self.norm1(x), self.norm1(x))[0] + + # FFN block with SwiGLU + x = x + self.ffn(self.norm2(x)) + return x +``` + +## Where SwiGLU is Used + +**Major models using SwiGLU:** +- **LLaMA** (Meta's language model) +- **PaLM** (Google's language model) +- **GPT-J** (EleutherAI) +- Many other modern LLMs + +**Why they chose SwiGLU:** + +```yaml +Research findings: + - Better performance than standard FFN + - Improved training stability + - Smoother optimization + - State-of-the-art results + +Trade-off: Slightly more parameters, but worth it! 
+``` + +## Practical Example: LLaMA-style FFN + +```python +import torch +import torch.nn as nn + +class LLaMAFFN(nn.Module): + """FFN from LLaMA (uses SwiGLU)""" + def __init__(self, dim=4096, hidden_dim=11008): + super().__init__() + self.gate_proj = nn.Linear(dim, hidden_dim, bias=False) # W1 + self.up_proj = nn.Linear(dim, hidden_dim, bias=False) # V + self.down_proj = nn.Linear(hidden_dim, dim, bias=False) # W2 + self.silu = nn.SiLU() + + def forward(self, x): + # SwiGLU + gate = self.silu(self.gate_proj(x)) + up = self.up_proj(x) + hidden = gate * up + + # Project back down + output = self.down_proj(hidden) + return output + +# Test +ffn = LLaMAFFN(dim=512, hidden_dim=1376) # Smaller for demo +x = torch.randn(2, 10, 512) # Batch=2, seq_len=10, dim=512 +output = ffn(x) + +print(output.shape) # torch.Size([2, 10, 512]) +``` + +## Implementation Tips + +### Efficient Implementation + +```python +import torch +import torch.nn as nn + +class EfficientSwiGLU(nn.Module): + """Efficient SwiGLU with combined projection""" + def __init__(self, dim, hidden_dim): + super().__init__() + # Combine W1 and V into single matrix for efficiency + self.combined = nn.Linear(dim, hidden_dim * 2, bias=False) + self.down = nn.Linear(hidden_dim, dim, bias=False) + self.silu = nn.SiLU() + + def forward(self, x): + # Single matrix multiply, then split + combined = self.combined(x) + gate, value = combined.chunk(2, dim=-1) + + # SwiGLU + hidden = self.silu(gate) * value + output = self.down(hidden) + return output +``` + +## Key Takeaways + +โœ“ **Gated activation:** One path controls another + +โœ“ **Formula:** SiLU(Wโ‚(x)) โŠ™ V(x) + +โœ“ **State-of-the-art:** Used in LLaMA, PaLM, and modern LLMs + +โœ“ **Better than FFN:** Outperforms standard ReLU-based networks + +โœ“ **Smooth:** Thanks to SiLU activation + +โœ“ **More parameters:** But worth it for performance + +**Quick Reference:** + +```python +# Basic SwiGLU implementation +class SwiGLU(nn.Module): + def __init__(self, dim, hidden_dim): + super().__init__() + self.W1 = nn.Linear(dim, hidden_dim) + self.V = nn.Linear(dim, hidden_dim) + self.W2 = nn.Linear(hidden_dim, dim) + + def forward(self, x): + gate = torch.nn.functional.silu(self.W1(x)) + value = self.V(x) + hidden = gate * value + return self.W2(hidden) + +# Usage +swiglu = SwiGLU(dim=512, hidden_dim=2048) +output = swiglu(input_tensor) +``` + +**When to use SwiGLU:** +- โœ“ Transformer feedforward networks +- โœ“ Large language models +- โœ“ When you want state-of-the-art performance +- โœ“ Modern architectures + +**Remember:** SwiGLU is the advanced gating mechanism powering modern LLMs! ๐ŸŽ‰ diff --git a/public/content/learn/activation-functions/tanh/tanh-content.md b/public/content/learn/activation-functions/tanh/tanh-content.md new file mode 100644 index 0000000..648d448 --- /dev/null +++ b/public/content/learn/activation-functions/tanh/tanh-content.md @@ -0,0 +1,323 @@ +--- +hero: + title: "Tanh" + subtitle: "Hyperbolic Tangent - Zero-centered Activation" + tags: + - "โšก Activation Functions" + - "โฑ๏ธ 10 min read" +--- + +Tanh (hyperbolic tangent) is like Sigmoid's **zero-centered cousin**. It squashes inputs to the range **[-1, 1]** instead of [0, 1]. 
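+
+A quick way to see the difference (a minimal sketch, assuming PyTorch as in the rest of this course):
+
+```python
+import torch
+
+x = torch.linspace(-5, 5, 5)   # [-5.0, -2.5, 0.0, 2.5, 5.0]
+
+print(torch.sigmoid(x))        # all outputs squeezed into (0, 1)
+print(torch.tanh(x))           # all outputs squeezed into (-1, 1), centered at 0
+```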
+ +## The Formula + +**tanh(x) = (eหฃ - eโปหฃ) / (eหฃ + eโปหฃ)** + +Or equivalently: **tanh(x) = 2ยทฯƒ(2x) - 1** (scaled and shifted sigmoid) + +![Tanh Graph](/content/learn/activation-functions/tanh/tanh-graph.png) + +```yaml +Input โ†’ -โˆž โ†’ Output โ†’ -1 +Input = 0 โ†’ Output = 0 +Input โ†’ +โˆž โ†’ Output โ†’ +1 + +Key property: Output is always in (-1, 1) +Zero-centered! (unlike sigmoid) +``` + +## How It Works + +**Example:** + +```python +import torch +import torch.nn as nn + +# Create tanh activation +tanh = nn.Tanh() + +# Test with different values +x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0]) +output = tanh(x) + +print(output) +# tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640]) +``` + +**Manual calculation:** + +```yaml +Input: [-2.0, -1.0, 0.0, 1.0, 2.0] + โ†“ โ†“ โ†“ โ†“ โ†“ +Tanh: -0.96 -0.76 0.00 0.76 0.96 + โ†“ โ†“ โ†“ โ†“ โ†“ +Range: All values between -1 and 1 +``` + +## The Zero-Centered Advantage + +**This is tanh's superpower:** outputs are centered around zero! + +![Tanh vs Sigmoid](/content/learn/activation-functions/tanh/tanh-vs-sigmoid.png) + +**Example:** + +```python +import torch + +x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0]) + +# Tanh: zero-centered +tanh_out = torch.tanh(x) +print(tanh_out.mean()) +# tensor(0.0000) โ† Mean is zero! + +# Sigmoid: NOT zero-centered +sigmoid_out = torch.sigmoid(x) +print(sigmoid_out.mean()) +# tensor(0.5000) โ† Mean is 0.5 +``` + +**Why zero-centered is better:** + +```yaml +Zero-centered (tanh): + โœ“ Gradients can be positive or negative + โœ“ Faster convergence + โœ“ More stable training + โœ“ Better for hidden layers + +Not zero-centered (sigmoid): + โœ— All gradients have same sign + โœ— Slower learning + โœ— Zig-zag optimization path +``` + +## In Code (Simple Implementation) + +```python +import torch + +def tanh_manual(x): + """Manual tanh implementation""" + exp_x = torch.exp(x) + exp_neg_x = torch.exp(-x) + return (exp_x - exp_neg_x) / (exp_x + exp_neg_x) + +# Test it +x = torch.tensor([-1.0, 0.0, 1.0]) +output = tanh_manual(x) +print(output) +# tensor([-0.7616, 0.0000, 0.7616]) + +# Verify against PyTorch +print(torch.tanh(x)) +# tensor([-0.7616, 0.0000, 0.7616]) โ† Same! +``` + +## Using Tanh in PyTorch + +### Method 1: As a Layer + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(10, 20), + nn.Tanh(), # โ† Tanh activation + nn.Linear(20, 5), + nn.Tanh(), # โ† Another tanh + nn.Linear(5, 1) +) +``` + +### Method 2: As a Function + +```python +import torch +import torch.nn.functional as F + +x = torch.randn(5, 10) +output = F.tanh(x) # or torch.tanh(x) +``` + +## Practical Example: RNN/LSTM + +Tanh is commonly used in recurrent neural networks: + +```python +import torch +import torch.nn as nn + +class SimpleRNN(nn.Module): + def __init__(self, input_size, hidden_size): + super().__init__() + self.hidden_size = hidden_size + self.i2h = nn.Linear(input_size, hidden_size) + self.h2h = nn.Linear(hidden_size, hidden_size) + + def forward(self, x, hidden): + # Combine input and hidden state + combined = self.i2h(x) + self.h2h(hidden) + + # Apply tanh + new_hidden = torch.tanh(combined) # โ† Tanh here! + return new_hidden + +# Initialize +rnn = SimpleRNN(input_size=10, hidden_size=20) +x = torch.randn(5, 10) # 5 samples +h = torch.zeros(5, 20) # Initial hidden state + +# Forward pass +new_h = rnn(x, h) +print(new_h.shape) # torch.Size([5, 20]) +print(new_h.min(), new_h.max()) +# All values between -1 and 1! 
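+
+# A small extension (a sketch, not part of the original example): unroll the
+# same cell over a few timesteps - tanh keeps the hidden state bounded in
+# (-1, 1) no matter how many steps we take.
+seq = torch.randn(3, 5, 10)   # 3 timesteps, batch of 5, input size 10
+h_t = torch.zeros(5, 20)
+for t in range(seq.size(0)):
+    h_t = rnn(seq[t], h_t)
+print(h_t.min(), h_t.max())   # still between -1 and 1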
+``` + +## Tanh vs Sigmoid vs ReLU + +```yaml +Tanh: + โœ“ Zero-centered (best for hidden layers) + โœ“ Output range: [-1, 1] + โœ“ Smooth gradient + โœ— Vanishing gradient problem + โœ— Slower than ReLU (exponentials) + +Sigmoid: + โœ“ Output range: [0, 1] (probabilities) + โœ“ Smooth gradient + โœ— NOT zero-centered + โœ— Vanishing gradient problem + โœ— Slower than ReLU + +ReLU: + โœ“ Fast (no exponentials) + โœ“ No vanishing gradient for x > 0 + โœ“ Creates sparsity + โœ— NOT smooth at zero + โœ— Dying ReLU problem + โœ— NOT zero-centered +``` + +**When to use each:** + +```yaml +Hidden layers: + Modern: ReLU (fastest, works well) + Classical: Tanh (zero-centered) + Rarely: Sigmoid (not zero-centered) + +Output layer: + Binary classification: Sigmoid + Multi-class: Softmax + Regression: None (linear) + +RNN/LSTM: + Gates: Sigmoid + State update: Tanh +``` + +## The Vanishing Gradient Problem + +Like sigmoid, tanh suffers from vanishing gradients: + +```python +import torch + +# Large input +x = torch.tensor([5.0], requires_grad=True) +y = torch.tanh(x) +y.backward() + +print(f"Output: {y.item():.6f}") # 0.999909 +print(f"Gradient: {x.grad.item():.6f}") # 0.000181 +# Gradient is tiny! +``` + +**Why this happens:** + +```yaml +For large |x|: + Output saturates (near -1 or +1) + Gradient becomes very small + Learning slows down + +This is why ReLU replaced tanh in most modern networks! +``` + +## Relationship to Sigmoid + +Tanh is actually just a rescaled sigmoid: + +```python +import torch + +x = torch.tensor([0.5, 1.0, 1.5]) + +# Tanh +tanh_output = torch.tanh(x) + +# Same as scaled sigmoid +sigmoid_output = 2 * torch.sigmoid(2*x) - 1 + +print(tanh_output) +# tensor([0.4621, 0.7616, 0.9051]) + +print(sigmoid_output) +# tensor([0.4621, 0.7616, 0.9051]) + +# They're the same! +``` + +**Mathematical relationship:** + +```yaml +tanh(x) = 2ยทsigmoid(2x) - 1 + +Proof: + sigmoid(x) gives [0, 1] + 2ยทsigmoid(2x) gives [0, 2] + 2ยทsigmoid(2x) - 1 gives [-1, 1] โ† tanh range! +``` + +## Key Takeaways + +โœ“ **S-shaped curve:** Like sigmoid but zero-centered + +โœ“ **Output range:** Always between -1 and 1 + +โœ“ **Zero-centered:** Better than sigmoid for hidden layers + +โœ“ **Formula:** (eหฃ - eโปหฃ) / (eหฃ + eโปหฃ) + +โœ“ **Common in RNNs:** Used in LSTM/GRU cells + +โœ“ **Vanishing gradients:** Mostly replaced by ReLU in modern networks + +**Quick Reference:** + +```python +# Using tanh +import torch +import torch.nn as nn +import torch.nn.functional as F + +# Method 1: Module +tanh_layer = nn.Tanh() +output = tanh_layer(x) + +# Method 2: Functional +output = F.tanh(x) + +# Method 3: Direct +output = torch.tanh(x) + +# Method 4: Manual +output = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x)) +``` + +**Remember:** Tanh is zero-centered sigmoid. Use it for RNN states, but ReLU is faster for feedforward! 
๐ŸŽ‰
diff --git a/public/content/learn/activation-functions/tanh/tanh-graph.png b/public/content/learn/activation-functions/tanh/tanh-graph.png
new file mode 100644
index 0000000..e50256c
Binary files /dev/null and b/public/content/learn/activation-functions/tanh/tanh-graph.png differ
diff --git a/public/content/learn/activation-functions/tanh/tanh-vs-sigmoid.png b/public/content/learn/activation-functions/tanh/tanh-vs-sigmoid.png
new file mode 100644
index 0000000..ca6dc6c
Binary files /dev/null and b/public/content/learn/activation-functions/tanh/tanh-vs-sigmoid.png differ
diff --git a/public/content/learn/attention-mechanism/applying-attention-weights/applying-attention-weights-content.md b/public/content/learn/attention-mechanism/applying-attention-weights/applying-attention-weights-content.md
new file mode 100644
index 0000000..04ecf2c
--- /dev/null
+++ b/public/content/learn/attention-mechanism/applying-attention-weights/applying-attention-weights-content.md
@@ -0,0 +1,93 @@
+---
+hero:
+  title: "Applying Attention Weights"
+  subtitle: "Combining Values with Attention"
+  tags:
+    - "๐ŸŽฏ Attention"
+    - "โฑ๏ธ 8 min read"
+---
+
+After calculating attention weights, we use them to create a **weighted combination of values**!
+
+## The Final Step
+
+**Output = Attention_Weights ร— Values**
+
+```python
+import torch
+
+# Attention weights (from softmax)
+attn_weights = torch.tensor([[0.5, 0.3, 0.2],   # Position 0 attends to...
+                             [0.1, 0.7, 0.2],   # Position 1 attends to...
+                             [0.4, 0.3, 0.3]])  # Position 2 attends to...
+
+# Values (what information each position has)
+V = torch.tensor([[1.0, 2.0],   # Position 0 value
+                  [3.0, 4.0],   # Position 1 value
+                  [5.0, 6.0]])  # Position 2 value
+
+# Apply attention
+output = attn_weights @ V
+
+print(output)
+# tensor([[2.4000, 3.4000],
+#         [3.2000, 4.2000],
+#         [2.8000, 3.8000]])
+```
+
+**Manual calculation for position 0:**
+
+```yaml
+Position 0 output:
+  = 0.5 ร— [1.0, 2.0] + 0.3 ร— [3.0, 4.0] + 0.2 ร— [5.0, 6.0]
+  = [0.5, 1.0] + [0.9, 1.2] + [1.0, 1.2]
+  = [2.4, 3.4]
+
+This is a weighted average!
+```
+
+## Complete Attention
+
+```python
+import torch
+import torch.nn.functional as F
+
+def attention(Q, K, V):
+    """Complete attention mechanism"""
+    # 1. Compute scores
+    d_k = Q.size(-1)
+    scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)
+
+    # 2. Softmax to get weights
+    attn_weights = F.softmax(scores, dim=-1)
+
+    # 3. Apply to values
+    output = attn_weights @ V
+
+    return output, attn_weights
+
+# Test
+Q = torch.randn(1, 5, 64)
+K = torch.randn(1, 5, 64)
+V = torch.randn(1, 5, 64)
+
+output, weights = attention(Q, K, V)
+print(output.shape)  # torch.Size([1, 5, 64])
+```
+
+## Key Takeaways
+
+โœ“ **Final step:** Multiply attention weights by values
+
+โœ“ **Weighted average:** Combines information by relevance
+
+โœ“ **Output:** Context-aware representation
+
+**Quick Reference:**
+
+```python
+# Attention output
+output = attention_weights @ V
+```
+
+**Remember:** Attention weights select which values to use!
๐ŸŽ‰ diff --git a/public/content/learn/attention-mechanism/attention-in-code/attention-in-code-content.md b/public/content/learn/attention-mechanism/attention-in-code/attention-in-code-content.md new file mode 100644 index 0000000..0d312ed --- /dev/null +++ b/public/content/learn/attention-mechanism/attention-in-code/attention-in-code-content.md @@ -0,0 +1,96 @@ +--- +hero: + title: "Attention in Code" + subtitle: "Complete Attention Implementation" + tags: + - "๐ŸŽฏ Attention" + - "โฑ๏ธ 10 min read" +--- + +Here's the complete, production-ready attention implementation! + +## Full Implementation + +```python +import torch +import torch.nn as nn +import torch.nn.functional as F + +class ScaledDotProductAttention(nn.Module): + def __init__(self, dropout=0.1): + super().__init__() + self.dropout = nn.Dropout(dropout) + + def forward(self, Q, K, V, mask=None): + # Q, K, V: (batch, heads, seq_len, head_dim) + + d_k = Q.size(-1) + + # Compute attention scores + scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5) + + # Apply mask if provided + if mask is not None: + scores = scores.masked_fill(mask == 0, float('-inf')) + + # Softmax + attn_weights = F.softmax(scores, dim=-1) + attn_weights = self.dropout(attn_weights) + + # Apply to values + output = attn_weights @ V + + return output, attn_weights + +# Use it +attention = ScaledDotProductAttention() +Q = torch.randn(2, 8, 10, 64) # batch=2, heads=8, seq=10, dim=64 +K = torch.randn(2, 8, 10, 64) +V = torch.randn(2, 8, 10, 64) + +output, weights = attention(Q, K, V) +print(output.shape) # torch.Size([2, 8, 10, 64]) +``` + +## With Masking + +```python +# Create causal mask (for autoregressive models) +def create_causal_mask(seq_len): + mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1) + return mask == 0 # True where we CAN attend + +mask = create_causal_mask(5) +print(mask) +# tensor([[ True, False, False, False, False], +# [ True, True, False, False, False], +# [ True, True, True, False, False], +# [ True, True, True, True, False], +# [ True, True, True, True, True]]) + +# Position 0 can only attend to position 0 +# Position 1 can attend to positions 0, 1 +# etc. +``` + +## PyTorch Implementation + +```python +# Using PyTorch's built-in +attention = nn.MultiheadAttention(embed_dim=512, num_heads=8) + +x = torch.randn(10, 32, 512) # (seq, batch, embed) +output, attn_weights = attention(x, x, x) + +print(output.shape) # torch.Size([10, 32, 512]) +``` + +## Key Takeaways + +โœ“ **Complete function:** Q, K, V โ†’ Output + +โœ“ **Masking:** Controls what can attend to what + +โœ“ **PyTorch built-in:** Use `nn.MultiheadAttention` + +**Remember:** Attention is just a few lines of code! 
๐ŸŽ‰ diff --git a/public/content/learn/attention-mechanism/calculating-attention-scores/attention-matrix.png b/public/content/learn/attention-mechanism/calculating-attention-scores/attention-matrix.png new file mode 100644 index 0000000..1b708b6 Binary files /dev/null and b/public/content/learn/attention-mechanism/calculating-attention-scores/attention-matrix.png differ diff --git a/public/content/learn/attention-mechanism/calculating-attention-scores/calculating-attention-scores-content.md b/public/content/learn/attention-mechanism/calculating-attention-scores/calculating-attention-scores-content.md new file mode 100644 index 0000000..11d565e --- /dev/null +++ b/public/content/learn/attention-mechanism/calculating-attention-scores/calculating-attention-scores-content.md @@ -0,0 +1,124 @@ +--- +hero: + title: "Calculating Attention Scores" + subtitle: "Computing Query-Key-Value Similarities" + tags: + - "๐ŸŽฏ Attention" + - "โฑ๏ธ 10 min read" +--- + +Attention scores measure **how much each position should attend to every other position**! + +![Attention Matrix](/content/learn/attention-mechanism/calculating-attention-scores/attention-matrix.png) + +## The Formula + +**Score = Q ร— Kแต€ / โˆšd** + +Where: +- Q = Query matrix +- K = Key matrix +- d = dimension size +- โˆšd = scaling factor + +```python +import torch +import torch.nn.functional as F + +# Query and Key +Q = torch.randn(1, 10, 64) # (batch, seq_len, dim) +K = torch.randn(1, 10, 64) + +# Compute scores +scores = Q @ K.transpose(-2, -1) # (1, 10, 10) +scores = scores / (64 ** 0.5) # Scale by โˆšd + +# Convert to probabilities +attn_weights = F.softmax(scores, dim=-1) + +print(attn_weights.shape) # torch.Size([1, 10, 10]) +print(attn_weights[0, 0].sum()) # tensor(1.0) โ† Sums to 1! +``` + +## Step-by-Step Example + +```python +import torch +import torch.nn.functional as F + +# Simple example: 3 positions, 4-dim embeddings +Q = torch.tensor([[1.0, 0.0, 1.0, 0.0], + [0.0, 1.0, 0.0, 1.0], + [1.0, 1.0, 0.0, 0.0]]) # (3, 4) + +K = torch.tensor([[1.0, 0.0, 1.0, 0.0], + [0.0, 1.0, 0.0, 1.0], + [0.5, 0.5, 0.5, 0.5]]) # (3, 4) + +# 1. Dot product +scores = Q @ K.T # (3, 3) +print("Raw scores:") +print(scores) + +# 2. Scale +d_k = 4 +scaled_scores = scores / (d_k ** 0.5) +print("\\nScaled scores:") +print(scaled_scores) + +# 3. Softmax +attn_weights = F.softmax(scaled_scores, dim=-1) +print("\\nAttention weights:") +print(attn_weights) +# Each row sums to 1! +``` + +## Why Scaling? + +```yaml +Without scaling (โˆšd): + Large dot products โ†’ large scores + Softmax saturates โ†’ gradients vanish + +With scaling: + Controlled scores + Stable softmax + Better gradients +``` + +## Attention Matrix + +```python +# The attention matrix shows who attends to whom +attn_matrix = torch.softmax(Q @ K.T / (d ** 0.5), dim=-1) + +print(attn_matrix) +# Pos 0 Pos 1 Pos 2 +# Pos 0 [[0.5, 0.2, 0.3], โ† Position 0 attends to all positions +# Pos 1 [0.1, 0.7, 0.2], โ† Position 1 mostly attends to itself +# Pos 2 [0.4, 0.3, 0.3]] โ† Position 2 attends evenly +``` + +## Key Takeaways + +โœ“ **Scores:** Measure similarity (dot product) + +โœ“ **Scaling:** Divide by โˆšd for stability + +โœ“ **Softmax:** Convert to probabilities + +โœ“ **Matrix:** Shows all attention connections + +**Quick Reference:** + +```python +# Compute attention scores +scores = Q @ K.transpose(-2, -1) +scores = scores / (d_k ** 0.5) +attn_weights = F.softmax(scores, dim=-1) + +# Apply to values +output = attn_weights @ V +``` + +**Remember:** Scores tell us where to pay attention! 
๐ŸŽ‰ diff --git a/public/content/learn/attention-mechanism/multi-head-attention/multi-head-attention-content.md b/public/content/learn/attention-mechanism/multi-head-attention/multi-head-attention-content.md new file mode 100644 index 0000000..ee3d9c9 --- /dev/null +++ b/public/content/learn/attention-mechanism/multi-head-attention/multi-head-attention-content.md @@ -0,0 +1,87 @@ +--- +hero: + title: "Multi-Head Attention" + subtitle: "Multiple Attention Mechanisms in Parallel" + tags: + - "๐ŸŽฏ Attention" + - "โฑ๏ธ 10 min read" +--- + +Multi-head attention runs **multiple attention mechanisms in parallel**, each focusing on different aspects! + +![Multi-Head Visual](/content/learn/attention-mechanism/multi-head-attention/multi-head-visual.png) + +## The Idea + +Instead of one attention: +- Run 8 (or more) attention heads in parallel +- Each head learns different patterns +- Concatenate and project outputs + +```python +import torch +import torch.nn as nn + +# Single-head attention +single_head = nn.MultiheadAttention(embed_dim=512, num_heads=1) + +# Multi-head attention (8 heads) +multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8) + +x = torch.randn(10, 32, 512) # (seq_len, batch, embed_dim) +output, attn_weights = multi_head(x, x, x) + +print(output.shape) # torch.Size([10, 32, 512]) +``` + +## Implementation + +```python +class MultiHeadAttention(nn.Module): + def __init__(self, embed_dim, num_heads): + super().__init__() + self.num_heads = num_heads + self.head_dim = embed_dim // num_heads + + self.q_linear = nn.Linear(embed_dim, embed_dim) + self.k_linear = nn.Linear(embed_dim, embed_dim) + self.v_linear = nn.Linear(embed_dim, embed_dim) + self.out_linear = nn.Linear(embed_dim, embed_dim) + + def forward(self, x): + batch_size, seq_len, embed_dim = x.size() + + # Project and split into heads + Q = self.q_linear(x).view(batch_size, seq_len, self.num_heads, self.head_dim) + K = self.k_linear(x).view(batch_size, seq_len, self.num_heads, self.head_dim) + V = self.v_linear(x).view(batch_size, seq_len, self.num_heads, self.head_dim) + + # Transpose for attention + Q = Q.transpose(1, 2) # (batch, heads, seq, head_dim) + K = K.transpose(1, 2) + V = V.transpose(1, 2) + + # Attention for each head + scores = Q @ K.transpose(-2, -1) / (self.head_dim ** 0.5) + attn = F.softmax(scores, dim=-1) + output = attn @ V + + # Concatenate heads + output = output.transpose(1, 2).contiguous() + output = output.view(batch_size, seq_len, embed_dim) + + # Final projection + output = self.out_linear(output) + + return output +``` + +## Key Takeaways + +โœ“ **Multiple heads:** Each learns different patterns + +โœ“ **Parallel:** All heads run simultaneously + +โœ“ **Standard:** 8 heads is common + +**Remember:** More heads = more ways to pay attention! 
๐ŸŽ‰ diff --git a/public/content/learn/attention-mechanism/multi-head-attention/multi-head-visual.png b/public/content/learn/attention-mechanism/multi-head-attention/multi-head-visual.png new file mode 100644 index 0000000..b0678a6 Binary files /dev/null and b/public/content/learn/attention-mechanism/multi-head-attention/multi-head-visual.png differ diff --git a/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-concept.png b/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-concept.png new file mode 100644 index 0000000..e76075a Binary files /dev/null and b/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-concept.png differ diff --git a/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-from-scratch-content.md b/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-from-scratch-content.md new file mode 100644 index 0000000..eecf67f --- /dev/null +++ b/public/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-from-scratch-content.md @@ -0,0 +1,99 @@ +--- +hero: + title: "Self Attention from Scratch" + subtitle: "Building Self-Attention from the Ground Up" + tags: + - "๐ŸŽฏ Attention" + - "โฑ๏ธ 10 min read" +--- + +Let's build self-attention from scratch - the core of transformers! + +![Self-Attention Concept](/content/learn/attention-mechanism/self-attention-from-scratch/self-attention-concept.png) + +## Complete Implementation + +```python +import torch +import torch.nn as nn +import torch.nn.functional as F + +class SelfAttention(nn.Module): + def __init__(self, embed_dim): + super().__init__() + self.embed_dim = embed_dim + + # Linear projections for Q, K, V + self.query = nn.Linear(embed_dim, embed_dim) + self.key = nn.Linear(embed_dim, embed_dim) + self.value = nn.Linear(embed_dim, embed_dim) + + def forward(self, x): + # x: (batch, seq_len, embed_dim) + + # Project to Q, K, V + Q = self.query(x) + K = self.key(x) + V = self.value(x) + + # Compute attention scores + scores = Q @ K.transpose(-2, -1) + scores = scores / (self.embed_dim ** 0.5) + + # Softmax + attn_weights = F.softmax(scores, dim=-1) + + # Apply to values + output = attn_weights @ V + + return output + +# Test +attention = SelfAttention(embed_dim=64) +x = torch.randn(2, 10, 64) # Batch=2, seq=10, dim=64 +output = attention(x) +print(output.shape) # torch.Size([2, 10, 64]) +``` + +## Step-by-Step Example + +```python +import torch +import torch.nn.functional as F + +# Input: 3 words, 4-dim embeddings +x = torch.tensor([[1.0, 0.0, 1.0, 0.0], + [0.0, 1.0, 0.0, 1.0], + [1.0, 1.0, 0.0, 0.0]]) + +# Create Q, K, V projections +W_q = torch.randn(4, 4) +W_k = torch.randn(4, 4) +W_v = torch.randn(4, 4) + +# Compute Q, K, V +Q = x @ W_q +K = x @ W_k +V = x @ W_v + +# Attention scores +scores = Q @ K.T / (4 ** 0.5) +attn_weights = F.softmax(scores, dim=-1) + +# Output +output = attn_weights @ V + +print(output.shape) # torch.Size([3, 4]) +``` + +## Key Takeaways + +โœ“ **Self-attention:** Sequence attends to itself + +โœ“ **Q, K, V:** All come from same input + +โœ“ **Complete implementation:** ~20 lines of code + +โœ“ **Foundation:** Core of transformers + +**Remember:** Self-attention is simpler than it looks! 
๐ŸŽ‰ diff --git a/public/content/learn/attention-mechanism/what-is-attention/attention-concept.png b/public/content/learn/attention-mechanism/what-is-attention/attention-concept.png new file mode 100644 index 0000000..914b992 Binary files /dev/null and b/public/content/learn/attention-mechanism/what-is-attention/attention-concept.png differ diff --git a/public/content/learn/attention-mechanism/what-is-attention/qkv-mechanism.png b/public/content/learn/attention-mechanism/what-is-attention/qkv-mechanism.png new file mode 100644 index 0000000..16998a7 Binary files /dev/null and b/public/content/learn/attention-mechanism/what-is-attention/qkv-mechanism.png differ diff --git a/public/content/learn/attention-mechanism/what-is-attention/what-is-attention-content.md b/public/content/learn/attention-mechanism/what-is-attention/what-is-attention-content.md new file mode 100644 index 0000000..aced9c9 --- /dev/null +++ b/public/content/learn/attention-mechanism/what-is-attention/what-is-attention-content.md @@ -0,0 +1,197 @@ +--- +hero: + title: "What is Attention" + subtitle: "Understanding the Attention Mechanism" + tags: + - "๐ŸŽฏ Attention" + - "โฑ๏ธ 10 min read" +--- + +Attention lets the model **focus on relevant parts** of the input, just like how you focus on important words when reading! + +![Attention Concept](/content/learn/attention-mechanism/what-is-attention/attention-concept.png) + +## The Core Idea + +**Attention = Weighted average based on relevance** + +Instead of treating all inputs equally, attention: +1. Calculates how relevant each input is +2. Weights inputs by relevance +3. Combines them into output + +```yaml +Without attention: + All words matter equally + "The cat sat on the mat" + โ†’ All words get same weight + +With attention: + Important words matter more + "The CAT sat on the mat" + โ†’ "cat" gets higher weight +``` + +## Simple Example + +```python +import torch +import torch.nn.functional as F + +# Input sequence (3 words, each 4-dim embedding) +sequence = torch.tensor([[0.1, 0.2, 0.3, 0.4], # word 1 + [0.5, 0.6, 0.7, 0.8], # word 2 + [0.9, 1.0, 1.1, 1.2]]) # word 3 + +# Attention scores (how important each word is) +attention_weights = torch.tensor([0.1, 0.3, 0.6]) # word 3 most important + +# Weighted average +output = torch.zeros(4) +for i, weight in enumerate(attention_weights): + output += weight * sequence[i] + +print(output) +# Mostly influenced by word 3 (weight 0.6) +``` + +## Query, Key, Value + +![QKV Mechanism](/content/learn/attention-mechanism/what-is-attention/qkv-mechanism.png) + +Attention uses three concepts: + +```yaml +Query (Q): "What am I looking for?" +Key (K): "What do I contain?" +Value (V): "What information do I have?" + +Process: +1. Compare Query with all Keys โ†’ scores +2. Convert scores to weights (softmax) +3. Weighted sum of Values +``` + +**Example:** + +```python +import torch +import torch.nn.functional as F + +# Query: what we're looking for +query = torch.tensor([1.0, 0.0, 1.0]) + +# Keys: what each position contains +keys = torch.tensor([[1.0, 0.0, 1.0], # Similar to query! + [0.0, 1.0, 0.0], # Different + [1.0, 0.0, 0.8]]) # Somewhat similar + +# Values: actual information +values = torch.tensor([[10.0, 20.0], + [30.0, 40.0], + [50.0, 60.0]]) + +# 1. Compute attention scores (dot product) +scores = keys @ query +print("Scores:", scores) +# tensor([2.0000, 0.0000, 1.8000]) + +# 2. Convert to probabilities +weights = F.softmax(scores, dim=0) +print("Weights:", weights) +# tensor([0.5308, 0.0874, 0.3818]) + +# 3. 
Weighted sum of values +output = torch.zeros(2) +for i, weight in enumerate(weights): + output += weight * values[i] + +print("Output:", output) +# Mostly from value 0 (weight 0.53) +``` + +## Why Attention is Powerful + +```yaml +Before attention (RNNs): + Process sequence left-to-right + Hard to remember distant info + Slow (sequential) + +With attention (Transformers): + Look at ALL positions at once + Direct connections everywhere + Fast (parallel) + +Result: Better at long sequences! +``` + +## Self-Attention + +**Self-attention: Sequence attends to itself** + +```python +# Sentence: "The cat sat" +# Each word attends to all words + +"The" attends to: The(0.3), cat(0.2), sat(0.5) +"cat" attends to: The(0.4), cat(0.4), sat(0.2) +"sat" attends to: The(0.1), cat(0.6), sat(0.3) + +# Each word builds context from others! +``` + +## Basic Implementation + +```python +import torch +import torch.nn as nn +import torch.nn.functional as F + +class SimpleAttention(nn.Module): + def __init__(self, embed_dim): + super().__init__() + self.query = nn.Linear(embed_dim, embed_dim) + self.key = nn.Linear(embed_dim, embed_dim) + self.value = nn.Linear(embed_dim, embed_dim) + + def forward(self, x): + # x shape: (batch, seq_len, embed_dim) + + # Compute Q, K, V + Q = self.query(x) + K = self.key(x) + V = self.value(x) + + # Attention scores + scores = Q @ K.transpose(-2, -1) + scores = scores / (Q.size(-1) ** 0.5) # Scale + + # Attention weights + attn_weights = F.softmax(scores, dim=-1) + + # Weighted values + output = attn_weights @ V + + return output + +# Test +attention = SimpleAttention(embed_dim=64) +x = torch.randn(1, 10, 64) # Batch=1, seq_len=10, dim=64 +output = attention(x) +print(output.shape) # torch.Size([1, 10, 64]) +``` + +## Key Takeaways + +โœ“ **Attention:** Weighted average by relevance + +โœ“ **Q, K, V:** Query, Key, Value mechanism + +โœ“ **Self-attention:** Sequence attends to itself + +โœ“ **Parallel:** Processes all positions at once + +โœ“ **Transformers:** Built entirely on attention + +**Remember:** Attention lets models focus on what matters! ๐ŸŽ‰ diff --git a/public/content/learn/building-a-transformer/building-a-transformer-block/block-diagram.png b/public/content/learn/building-a-transformer/building-a-transformer-block/block-diagram.png new file mode 100644 index 0000000..b2c13c5 Binary files /dev/null and b/public/content/learn/building-a-transformer/building-a-transformer-block/block-diagram.png differ diff --git a/public/content/learn/building-a-transformer/building-a-transformer-block/building-a-transformer-block-content.md b/public/content/learn/building-a-transformer/building-a-transformer-block/building-a-transformer-block-content.md new file mode 100644 index 0000000..cf2350b --- /dev/null +++ b/public/content/learn/building-a-transformer/building-a-transformer-block/building-a-transformer-block-content.md @@ -0,0 +1,142 @@ +--- +hero: + title: "Building a Transformer Block" + subtitle: "Creating the Core Transformer Component" + tags: + - "๐Ÿค– Transformers" + - "โฑ๏ธ 10 min read" +--- + +A transformer block is the **repeatable unit** that makes transformers work! + +![Block Diagram](/content/learn/building-a-transformer/building-a-transformer-block/block-diagram.png) + +## The Structure + +**Transformer Block = Attention + FFN + Normalization + Residuals** + +```python +import torch +import torch.nn as nn + +class TransformerBlock(nn.Module): + def __init__(self, d_model, n_heads, d_ff, dropout=0.1): + super().__init__() + + # 1. 
Multi-head attention + self.attention = nn.MultiheadAttention( + embed_dim=d_model, + num_heads=n_heads, + dropout=dropout, + batch_first=True + ) + + # 2. Feed-forward network + self.ffn = nn.Sequential( + nn.Linear(d_model, d_ff), + nn.ReLU(), + nn.Dropout(dropout), + nn.Linear(d_ff, d_model), + nn.Dropout(dropout) + ) + + # 3. Layer normalization + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + + # 4. Dropout + self.dropout = nn.Dropout(dropout) + + def forward(self, x, mask=None): + # Attention sub-block + attn_out, _ = self.attention(x, x, x, attn_mask=mask) + x = self.norm1(x + self.dropout(attn_out)) # Residual + Norm + + # FFN sub-block + ffn_out = self.ffn(x) + x = self.norm2(x + ffn_out) # Residual + Norm + + return x + +# Create and test +block = TransformerBlock(d_model=512, n_heads=8, d_ff=2048) +x = torch.randn(32, 10, 512) # (batch, seq, embed) +output = block(x) +print(output.shape) # torch.Size([32, 10, 512]) +``` + +## The Flow + +```yaml +Input + โ†“ +Multi-Head Attention + โ†“ +Add & Norm (residual connection) + โ†“ +Feed-Forward Network + โ†“ +Add & Norm (residual connection) + โ†“ +Output (same shape as input!) +``` + +## Residual Connections + +**Why residual connections matter:** + +```python +# Without residual +output = layer(x) + +# With residual +output = x + layer(x) # Add input back! + +# This helps gradients flow during backprop +``` + +## Stacking Blocks + +```python +class Transformer(nn.Module): + def __init__(self, vocab_size, d_model=512, n_heads=8, + n_layers=6, d_ff=2048): + super().__init__() + + self.embedding = nn.Embedding(vocab_size, d_model) + + # Stack N transformer blocks + self.blocks = nn.ModuleList([ + TransformerBlock(d_model, n_heads, d_ff) + for _ in range(n_layers) + ]) + + self.ln_f = nn.LayerNorm(d_model) + self.head = nn.Linear(d_model, vocab_size) + + def forward(self, x): + x = self.embedding(x) + + # Pass through all blocks + for block in self.blocks: + x = block(x) + + x = self.ln_f(x) + logits = self.head(x) + + return logits + +model = Transformer(vocab_size=50000, n_layers=12) +``` + +## Key Takeaways + +โœ“ **Core component:** Attention + FFN + Norm + Residuals + +โœ“ **Repeatable:** Stack many blocks + +โœ“ **Same shape:** Input and output dimensions match + +โœ“ **Self-contained:** Each block is independent + +**Remember:** Transformers are just stacked blocks! ๐ŸŽ‰ diff --git a/public/content/learn/building-a-transformer/full-transformer-in-code/full-transformer-in-code-content.md b/public/content/learn/building-a-transformer/full-transformer-in-code/full-transformer-in-code-content.md new file mode 100644 index 0000000..2f7e9b2 --- /dev/null +++ b/public/content/learn/building-a-transformer/full-transformer-in-code/full-transformer-in-code-content.md @@ -0,0 +1,143 @@ +--- +hero: + title: "Full Transformer in Code" + subtitle: "Complete Implementation from Scratch" + tags: + - "๐Ÿค– Transformers" + - "โฑ๏ธ 15 min read" +--- + +Let's build a complete, working transformer from scratch! 
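+
+Before diving into the full code, it helps to see the core computation every block below is built around: scaled dot-product attention as a standalone helper. This is a minimal sketch; the `MultiHeadAttention` class in the full implementation inlines the same math.
+
+```python
+import math
+import torch
+import torch.nn.functional as F
+
+def scaled_dot_product_attention(Q, K, V, mask=None):
+    # Scores measure how much each query position should look at each key position
+    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
+    if mask is not None:
+        scores = scores.masked_fill(mask == 0, float('-inf'))
+    weights = F.softmax(scores, dim=-1)   # each row sums to 1
+    return weights @ V                    # weighted mix of the values
+
+Q = K = V = torch.randn(2, 8, 10, 64)     # (batch, heads, seq, head_dim)
+print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([2, 8, 10, 64])
+```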
+ +## Complete Implementation + +```python +import torch +import torch.nn as nn +import torch.nn.functional as F +import math + +class MultiHeadAttention(nn.Module): + def __init__(self, d_model, n_heads): + super().__init__() + self.d_model = d_model + self.n_heads = n_heads + self.head_dim = d_model // n_heads + + self.q_linear = nn.Linear(d_model, d_model) + self.k_linear = nn.Linear(d_model, d_model) + self.v_linear = nn.Linear(d_model, d_model) + self.out_linear = nn.Linear(d_model, d_model) + + def forward(self, x, mask=None): + batch_size, seq_len, d_model = x.size() + + # Project and split into heads + Q = self.q_linear(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2) + K = self.k_linear(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2) + V = self.v_linear(x).view(batch_size, seq_len, self.n_heads, self.head_dim).transpose(1, 2) + + # Attention + scores = Q @ K.transpose(-2, -1) / math.sqrt(self.head_dim) + if mask is not None: + scores = scores.masked_fill(mask == 0, float('-inf')) + + attn = F.softmax(scores, dim=-1) + output = attn @ V + + # Concatenate heads + output = output.transpose(1, 2).contiguous().view(batch_size, seq_len, d_model) + output = self.out_linear(output) + + return output + +class FeedForward(nn.Module): + def __init__(self, d_model, d_ff, dropout=0.1): + super().__init__() + self.net = nn.Sequential( + nn.Linear(d_model, d_ff), + nn.ReLU(), + nn.Dropout(dropout), + nn.Linear(d_ff, d_model), + nn.Dropout(dropout) + ) + + def forward(self, x): + return self.net(x) + +class TransformerBlock(nn.Module): + def __init__(self, d_model, n_heads, d_ff, dropout=0.1): + super().__init__() + self.attention = MultiHeadAttention(d_model, n_heads) + self.ffn = FeedForward(d_model, d_ff, dropout) + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + self.dropout = nn.Dropout(dropout) + + def forward(self, x, mask=None): + # Attention + attn_out = self.attention(x, mask) + x = self.norm1(x + self.dropout(attn_out)) + + # FFN + ffn_out = self.ffn(x) + x = self.norm2(x + self.dropout(ffn_out)) + + return x + +class Transformer(nn.Module): + def __init__(self, vocab_size, d_model=512, n_heads=8, + n_layers=6, d_ff=2048, max_seq_len=512, dropout=0.1): + super().__init__() + + # Embeddings + self.token_emb = nn.Embedding(vocab_size, d_model) + self.pos_emb = nn.Embedding(max_seq_len, d_model) + self.dropout = nn.Dropout(dropout) + + # Transformer blocks + self.blocks = nn.ModuleList([ + TransformerBlock(d_model, n_heads, d_ff, dropout) + for _ in range(n_layers) + ]) + + # Output + self.ln_f = nn.LayerNorm(d_model) + self.head = nn.Linear(d_model, vocab_size) + + def forward(self, x): + batch, seq_len = x.size() + + # Embeddings + positions = torch.arange(seq_len, device=x.device).unsqueeze(0) + x = self.token_emb(x) + self.pos_emb(positions) + x = self.dropout(x) + + # Transformer blocks + for block in self.blocks: + x = block(x) + + # Output + x = self.ln_f(x) + logits = self.head(x) + + return logits + +# Create GPT-style model +model = Transformer(vocab_size=50000, n_layers=12, d_model=768) + +# Test +tokens = torch.randint(0, 50000, (2, 64)) +logits = model(tokens) +print(logits.shape) # torch.Size([2, 64, 50000]) +``` + +## Key Takeaways + +โœ“ **Complete:** All components together + +โœ“ **Production-ready:** Real implementation + +โœ“ **Flexible:** Easy to modify + +**Remember:** You just built a transformer! 
๐ŸŽ‰
diff --git a/public/content/learn/building-a-transformer/rope-positional-encoding/rope-positional-encoding-content.md b/public/content/learn/building-a-transformer/rope-positional-encoding/rope-positional-encoding-content.md
new file mode 100644
index 0000000..df6b22c
--- /dev/null
+++ b/public/content/learn/building-a-transformer/rope-positional-encoding/rope-positional-encoding-content.md
@@ -0,0 +1,84 @@
+---
+hero:
+  title: "RoPE Positional Encoding"
+  subtitle: "Rotary Position Embeddings"
+  tags:
+    - "๐Ÿค– Transformers"
+    - "โฑ๏ธ 10 min read"
+---
+
+RoPE (Rotary Position Embedding) is a modern way to encode position information in transformers!
+
+## The Problem
+
+Transformers don't know word order without position information!
+
+```yaml
+"Dog bites man" vs "Man bites dog"
+โ†’ Without positions, looks the same to transformer!
+
+Need to add position information!
+```
+
+## How RoPE Works
+
+```python
+import torch
+import torch.nn as nn
+
+class RotaryPositionalEmbedding(nn.Module):
+    def __init__(self, dim, max_seq_len=2048):
+        super().__init__()
+        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
+        self.register_buffer('inv_freq', inv_freq)
+
+    def forward(self, x):
+        seq_len = x.size(1)
+        t = torch.arange(seq_len, device=x.device).type_as(self.inv_freq)
+        freqs = torch.outer(t, self.inv_freq)
+        emb = torch.cat((freqs, freqs), dim=-1)
+
+        cos_emb = emb.cos()
+        sin_emb = emb.sin()
+
+        return cos_emb, sin_emb
+
+def apply_rope(x, cos, sin):
+    """Apply rotary embeddings (rotate-half form)"""
+    # Split the feature dim in half to match how cos/sin were built above
+    half = x.size(-1) // 2
+    x1, x2 = x[..., :half], x[..., half:]
+    # Rotate pairs: (x1, x2) -> (-x2, x1), then mix with cos/sin
+    rotated_half = torch.cat((-x2, x1), dim=-1)
+    return x * cos + rotated_half * sin
+
+# Use it
+rope = RotaryPositionalEmbedding(dim=64)
+x = torch.randn(1, 10, 64)
+cos, sin = rope(x)
+x_with_pos = apply_rope(x, cos, sin)
+```
+
+## Why RoPE is Better
+
+```yaml
+Old way (learned embeddings):
+  - Fixed max sequence length
+  - Doesn't generalize to longer sequences
+
+RoPE:
+  โœ“ Works for any sequence length
+  โœ“ Relative positions encoded
+  โœ“ Better extrapolation
+  โœ“ Used in LLaMA, GPT-NeoX
+```
+
+## Key Takeaways
+
+โœ“ **Rotary:** Encodes position via rotation
+
+โœ“ **Relative:** Captures relative positions
+
+โœ“ **Modern:** Used in latest LLMs
+
+**Remember:** RoPE is the modern way to handle positions! ๐ŸŽ‰
diff --git a/public/content/learn/building-a-transformer/the-final-linear-layer/the-final-linear-layer-content.md b/public/content/learn/building-a-transformer/the-final-linear-layer/the-final-linear-layer-content.md
new file mode 100644
index 0000000..3b9e08d
--- /dev/null
+++ b/public/content/learn/building-a-transformer/the-final-linear-layer/the-final-linear-layer-content.md
@@ -0,0 +1,61 @@
+---
+hero:
+  title: "The Final Linear Layer"
+  subtitle: "From Hidden States to Predictions"
+  tags:
+    - "๐Ÿค– Transformers"
+    - "โฑ๏ธ 8 min read"
+---
+
+The final linear layer projects transformer outputs to vocabulary logits for prediction!
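+
+One common refinement, used in GPT-2 and many other LLMs (though not required by the code below), is **weight tying**: the output projection reuses the token-embedding matrix instead of learning a separate `(vocab_size, d_model)` weight. A minimal sketch, with layer names chosen here just for illustration:
+
+```python
+import torch.nn as nn
+
+d_model, vocab_size = 768, 50000
+token_emb = nn.Embedding(vocab_size, d_model)           # (vocab_size, d_model)
+lm_head = nn.Linear(d_model, vocab_size, bias=False)    # weight: (vocab_size, d_model)
+
+# Tie the weights: both layers now share one matrix, saving ~38M parameters
+# at this size; in practice this often helps language-model quality too.
+lm_head.weight = token_emb.weight
+```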
+ +## Language Model Head + +```python +import torch +import torch.nn as nn + +class LMHead(nn.Module): + def __init__(self, d_model, vocab_size): + super().__init__() + self.ln = nn.LayerNorm(d_model) + self.linear = nn.Linear(d_model, vocab_size, bias=False) + + def forward(self, x): + x = self.ln(x) + logits = self.linear(x) + return logits + +# Use it +lm_head = LMHead(d_model=768, vocab_size=50000) +hidden_states = torch.randn(32, 128, 768) # (batch, seq, dim) +logits = lm_head(hidden_states) + +print(logits.shape) # torch.Size([32, 128, 50000]) +# For each position: 50000 logits (one per vocab token) +``` + +## Complete Forward Pass + +```python +# Input tokens โ†’ Embeddings โ†’ Transformer โ†’ LM Head โ†’ Logits + +input_ids = torch.randint(0, 50000, (1, 10)) +embeddings = embedding_layer(input_ids) +hidden_states = transformer_blocks(embeddings) +logits = lm_head(hidden_states) + +# Get next token prediction +next_token_logits = logits[:, -1, :] # Last position +next_token = torch.argmax(next_token_logits, dim=-1) +``` + +## Key Takeaways + +โœ“ **Final layer:** Hidden states โ†’ vocabulary logits + +โœ“ **Large:** Often biggest layer (vocab_size is huge) + +โœ“ **Shared weights:** Often tied with embedding matrix + +**Remember:** Final layer converts understanding to predictions! ๐ŸŽ‰ diff --git a/public/content/learn/building-a-transformer/training-a-transformer/training-a-transformer-content.md b/public/content/learn/building-a-transformer/training-a-transformer/training-a-transformer-content.md new file mode 100644 index 0000000..2eab24e --- /dev/null +++ b/public/content/learn/building-a-transformer/training-a-transformer/training-a-transformer-content.md @@ -0,0 +1,83 @@ +--- +hero: + title: "Training a Transformer" + subtitle: "How to Train Language Models" + tags: + - "๐Ÿค– Transformers" + - "โฑ๏ธ 10 min read" +--- + +Training transformers involves next-token prediction and lots of data! + +## The Training Objective + +**Goal: Predict the next token given previous tokens** + +```python +import torch +import torch.nn as nn + +# Training data +input_tokens = torch.tensor([[1, 2, 3, 4]]) # Input +target_tokens = torch.tensor([[2, 3, 4, 5]]) # Targets (shifted by 1) + +# Model forward +logits = model(input_tokens) # (1, 4, vocab_size) + +# Loss: Cross entropy +criterion = nn.CrossEntropyLoss() +loss = criterion( + logits.view(-1, vocab_size), # Flatten + target_tokens.view(-1) # Flatten +) +``` + +## Complete Training Loop + +```python +import torch +import torch.optim as optim + +def train_step(model, batch, optimizer, criterion): + # Get input and target (shifted) + input_ids = batch[:, :-1] + targets = batch[:, 1:] + + # Forward + logits = model(input_ids) + + # Loss + loss = criterion( + logits.reshape(-1, logits.size(-1)), + targets.reshape(-1) + ) + + # Backward + optimizer.zero_grad() + loss.backward() + + # Update + torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) + optimizer.step() + + return loss.item() + +# Training +model = Transformer(vocab_size=50000) +optimizer = optim.AdamW(model.parameters(), lr=3e-4) +criterion = nn.CrossEntropyLoss() + +for epoch in range(num_epochs): + for batch in dataloader: + loss = train_step(model, batch, optimizer, criterion) +``` + +## Key Takeaways + +โœ“ **Next-token prediction:** Core training task + +โœ“ **Shift targets:** Input[:-1] โ†’ Target[1:] + +โœ“ **Cross entropy:** Standard loss for LMs + +**Remember:** Training is just next-token prediction! 
๐ŸŽ‰ diff --git a/public/content/learn/building-a-transformer/transformer-architecture/transformer-architecture-content.md b/public/content/learn/building-a-transformer/transformer-architecture/transformer-architecture-content.md new file mode 100644 index 0000000..c907fdf --- /dev/null +++ b/public/content/learn/building-a-transformer/transformer-architecture/transformer-architecture-content.md @@ -0,0 +1,147 @@ +--- +hero: + title: "Transformer Architecture" + subtitle: "Understanding the Transformer Model" + tags: + - "๐Ÿค– Transformers" + - "โฑ๏ธ 12 min read" +--- + +The Transformer is the architecture behind GPT, BERT, and modern LLMs. It's built entirely on attention! + +![Transformer Diagram](/content/learn/building-a-transformer/transformer-architecture/transformer-diagram.png) + +## The Big Picture + +**Transformer = Encoder + Decoder (or just one)** + +```yaml +Input Text + โ†“ +Embedding + Positional Encoding + โ†“ +N ร— Transformer Blocks: + - Multi-Head Attention + - Feed-Forward Network + - Layer Normalization + - Residual Connections + โ†“ +Output Logits +``` + +## Basic Transformer Block + +```python +import torch +import torch.nn as nn + +class TransformerBlock(nn.Module): + def __init__(self, embed_dim, num_heads, ff_dim, dropout=0.1): + super().__init__() + + # Multi-head attention + self.attention = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout) + + # Feedforward network + self.ff = nn.Sequential( + nn.Linear(embed_dim, ff_dim), + nn.ReLU(), + nn.Dropout(dropout), + nn.Linear(ff_dim, embed_dim), + nn.Dropout(dropout) + ) + + # Layer normalization + self.norm1 = nn.LayerNorm(embed_dim) + self.norm2 = nn.LayerNorm(embed_dim) + + def forward(self, x): + # Attention block + attn_out, _ = self.attention(x, x, x) + x = self.norm1(x + attn_out) # Residual connection + + # Feedforward block + ff_out = self.ff(x) + x = self.norm2(x + ff_out) # Residual connection + + return x + +# Test +block = TransformerBlock(embed_dim=512, num_heads=8, ff_dim=2048) +x = torch.randn(10, 32, 512) # (seq, batch, embed) +output = block(x) +print(output.shape) # torch.Size([10, 32, 512]) +``` + +## Complete Transformer + +```python +class Transformer(nn.Module): + def __init__(self, vocab_size, embed_dim=512, num_heads=8, + num_layers=6, ff_dim=2048, max_seq_len=5000): + super().__init__() + + # Embeddings + self.token_embedding = nn.Embedding(vocab_size, embed_dim) + self.pos_embedding = nn.Embedding(max_seq_len, embed_dim) + + # Transformer blocks + self.blocks = nn.ModuleList([ + TransformerBlock(embed_dim, num_heads, ff_dim) + for _ in range(num_layers) + ]) + + # Output layer + self.ln_f = nn.LayerNorm(embed_dim) + self.head = nn.Linear(embed_dim, vocab_size, bias=False) + + def forward(self, x): + batch, seq_len = x.size() + + # Token + position embeddings + positions = torch.arange(seq_len, device=x.device).unsqueeze(0) + x = self.token_embedding(x) + self.pos_embedding(positions) + + # Apply transformer blocks + for block in self.blocks: + x = block(x.transpose(0, 1)).transpose(0, 1) + + # Output projection + x = self.ln_f(x) + logits = self.head(x) + + return logits + +# Create transformer +model = Transformer(vocab_size=50000, num_layers=12) +``` + +## Key Components + +```yaml +1. Embeddings: + - Token embeddings (vocabulary) + - Positional embeddings (position info) + +2. Transformer Blocks (repeated N times): + - Multi-head attention + - Feedforward network + - Layer normalization + - Residual connections + +3. 
Output: + - Final layer norm + - Linear projection to vocabulary +``` + +## Key Takeaways + +โœ“ **Self-attention based:** No recurrence, no convolution + +โœ“ **Parallel:** Processes entire sequence at once + +โœ“ **Scalable:** Stack more blocks for more capacity + +โœ“ **Powerful:** Powers GPT, BERT, LLaMA + +**Remember:** Transformers are just stacked attention blocks! ๐ŸŽ‰ diff --git a/public/content/learn/building-a-transformer/transformer-architecture/transformer-diagram.png b/public/content/learn/building-a-transformer/transformer-architecture/transformer-diagram.png new file mode 100644 index 0000000..7d52a53 Binary files /dev/null and b/public/content/learn/building-a-transformer/transformer-architecture/transformer-diagram.png differ diff --git a/public/content/learn/math/derivatives/derivative-graph.png b/public/content/learn/math/derivatives/derivative-graph.png new file mode 100644 index 0000000..e69de29 diff --git a/public/content/learn/math/derivatives/derivatives-content.md b/public/content/learn/math/derivatives/derivatives-content.md new file mode 100644 index 0000000..6e1b3eb --- /dev/null +++ b/public/content/learn/math/derivatives/derivatives-content.md @@ -0,0 +1,627 @@ +--- +hero: + title: "Understanding Derivatives" + subtitle: "The Foundation of Neural Network Training" + tags: + - "๐Ÿ“ Mathematics" + - "โฑ๏ธ 10 min read" +--- + +**[video coming soon]** + +## What are Derivatives? + +A **derivative** measures how a function changes as its input changes. + +### Intuitive Understanding + +Think of driving a car: + + + + + +- Your position is a function of time: `position(t)` + +- Your speed is the derivative of position: `speed = d(position)/dt` + +- Speed tells you how fast your position is changing + +If `x` goes from 3 to 4, does `f(x)`, that is `y`, change fast, eg. 6 to 40 or slower, eg. 6 to 7 + +**Derivative tells us the instantaneous rate of change of a function at any point.** + +### Mathematical Definition + +The derivative of `f(x)` at point `x` is: + +``` +f'(x) = lim[hโ†’0] (f(x+h) - f(x)) / h +``` + +### Visual Representation + + + +Here we have linearly growing function. + +Derivative is always 3 for any `x` value, which means that in the original function, growth of `y` at any point is 3x (if you increase `x` by 1, `y` will increase by 3, check it). + +![Linear Function Derivative](/content/learn/math/derivatives/linear-function-derivative.png) + +Here you can see that as `y` grows faster and faster in original function (square functions grow very fast). + +Derivative shows this accelerating growth, you can notice that derivative is increasing (linearly) - which means the growth is accelerating. + +![Quadratic Function Derivative](/content/learn/math/derivatives/quadratic-function-derivative.png) + +In previous example derivative was always 3, which meant that function is always consistantly growing by 3 times `x`. + +Here, on the other hand, the growth is growing. + +## Common Derivative Rules + +You will never calculate derivatives manually, but researcher needs to understand how it works. + +### 1. Power Rule + +If `f(x) = xโฟ`, then `f'(x) = nxโฟโปยน` + +So just put the exponent in front of the variable (or multiply with the number in front) and reduce exponent by 1. 
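
If you ever want to sanity-check a derivative rule, a quick numerical approximation works well. Below is a minimal sketch (the cubic and the sample points are just illustrative assumptions) that compares a finite-difference estimate against what the power rule predicts; the worked examples follow.

```python
# Numerical sanity check of the power rule: f(x) = x^3  =>  f'(x) = 3x^2
def f(x):
    return x**3

def numerical_derivative(f, x, h=1e-6):
    # Central difference: (f(x+h) - f(x-h)) / (2h) approximates f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [1.0, 2.0, 3.0]:
    approx = numerical_derivative(f, x)
    exact = 3 * x**2  # power rule result
    print(f"x={x}: numerical = {approx:.4f}, power rule = {exact:.4f}")
# Both columns should agree to several decimal places.
```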
+ +For `f(x) = xยณ`, derivative is `f'(x) = 3xยฒ` + +For `f(x) = 4xยณ`, derivative is `f'(x) = 4*3xยฒ = 12xยฒ` + +#### Step-by-Step Examples + +**Example 1:** `f(x) = xยฒ` + + + + + +- Using power rule: `f'(x) = 2x^(2-1) = 2xยน = 2x` + +- Verification: `f'(x) = 2x` + +**Example 2:** `f(x) = xยณ` + +- Using power rule: `f'(x) = 3x^(3-1) = 3xยฒ` + +- Verification: `f'(x) = 3xยฒ` + +**Example 3:** `f(x) = xโด` + +- Using power rule: `f'(x) = 4x^(4-1) = 4xยณ` + +- Verification: `f'(x) = 4xยณ` + +**Example 4:** `f(x) = โˆšx = x^(1/2)` + +- Using power rule: `f'(x) = (1/2)x^((1/2)-1) = (1/2)x^(-1/2) = 1/(2โˆšx)` + +- Verification: `f'(x) = 1/(2โˆšx)` + +**Example 5:** `f(x) = 1/x = x^(-1)` + +- Using power rule: `f'(x) = (-1)x^(-1-1) = (-1)x^(-2) = -1/xยฒ` + +- Verification: `f'(x) = -1/xยฒ` + + + +### 2. Constant Multiple Rule + +If `f(x) = cยทg(x)`, then `f'(x) = cยทg'(x)` + +#### Step-by-Step Examples + +**Example:** `f(x) = 5xยฒ` + +**Step 1:** Identify the constant and the function + +- Constant: `c = 5` + +- Function: `g(x) = xยฒ` + +**Step 2:** Find `g'(x)` + +- `g'(x) = 2x` (using power rule) + +**Step 3:** Apply constant multiple rule + +- `f'(x) = cยทg'(x) = 5ยท(2x) = 10x` - I showed this in the power rule as well. + +**Verification:** + +- `f(x) = 5xยฒ` + +- `f'(x) = 10x` โœ“ + +**Example:** `f(x) = -3xยณ` + +**Step 1:** Identify the constant and the function + +- Constant: `c = -3` + + + +- Function: `g(x) = xยณ` + +**Step 2:** Find `g'(x)` + +- `g'(x) = 3xยฒ` (using power rule) + +**Step 3:** Apply constant multiple rule + +- `f'(x) = cยทg'(x) = (-3)ยท(3xยฒ) = -9xยฒ` + +**Verification:** + +- `f(x) = -3xยณ` + +- `f'(x) = -9xยฒ` โœ“ + + + +### 3. Sum Rule + +If `f(x) = g(x) + h(x)`, then `f'(x) = g'(x) + h'(x)` + +#### Step-by-Step Examples + +**Example:** `f(x) = xยฒ + 3x` + +**Step 1:** Identify the functions + +- `g(x) = xยฒ` + +- `h(x) = 3x` + +**Step 2:** Find individual derivatives + +- `g'(x) = 2x` (power rule) + +- `h'(x) = 3` (constant multiple rule: 3ยท1 = 3) + +**Step 3:** Apply sum rule + +- `f'(x) = g'(x) + h'(x) = 2x + 3` + +**Verification:** + +- `f(x) = xยฒ + 3x` + +- `f'(x) = 2x + 3` โœ“ + +**Example:** `f(x) = xยณ + 2xยฒ + 5x + 1` + +**Step 1:** Identify the functions + +- `g(x) = xยณ` + +- `h(x) = 2xยฒ` + +- `i(x) = 5x` + +- `j(x) = 1` + +**Step 2:** Find individual derivatives + +- `g'(x) = 3xยฒ` (power rule) + +- `h'(x) = 4x` (constant multiple rule: 2ยท2x = 4x) + +- `i'(x) = 5` (constant multiple rule: 5ยท1 = 5) + +- `j'(x) = 0` (constant rule) + +**Step 3:** Apply sum rule + +- `f'(x) = g'(x) + h'(x) + i'(x) + j'(x) = 3xยฒ + 4x + 5 + 0 = 3xยฒ + 4x + 5` + +**Verification:** + +- `f(x) = xยณ + 2xยฒ + 5x + 1` + +- `f'(x) = 3xยฒ + 4x + 5` โœ“ + + + +### 4. 
Product Rule + +If `f(x) = g(x)ยทh(x)`, then `f'(x) = g'(x)ยทh(x) + g(x)ยทh'(x)` + +#### Step-by-Step Examples + +**Example:** `f(x) = xยฒ(x + 1)` + +**Step 1:** Identify the functions + +- `g(x) = xยฒ` + +- `h(x) = x + 1` + +**Step 2:** Find individual derivatives + +- `g'(x) = 2x` (power rule) + +- `h'(x) = 1` (sum rule: derivative of x is 1, derivative of 1 is 0) + +**Step 3:** Apply product rule + +- `f'(x) = g'(x)ยทh(x) + g(x)ยทh'(x)` + +- `f'(x) = (2x)ยท(x + 1) + (xยฒ)ยท(1)` + +- `f'(x) = 2x(x + 1) + xยฒ` + + + +- `f'(x) = 2xยฒ + 2x + xยฒ` + +- `f'(x) = 3xยฒ + 2x` + +**Verification by expanding first:** + +- `f(x) = xยฒ(x + 1) = xยณ + xยฒ` + +- `f'(x) = 3xยฒ + 2x` โœ“ + +**Example:** `f(x) = (2x + 3)(xยฒ - 1)` + +**Step 1:** Identify the functions + +- `g(x) = 2x + 3` + +- `h(x) = xยฒ - 1` + +**Step 2:** Find individual derivatives + +- `g'(x) = 2` (sum rule: derivative of 2x is 2, derivative of 3 is 0) + +- `h'(x) = 2x` (sum rule: derivative of xยฒ is 2x, derivative of -1 is 0) + +**Step 3:** Apply product rule + +- `f'(x) = g'(x)ยทh(x) + g(x)ยทh'(x)` + +- `f'(x) = (2)ยท(xยฒ - 1) + (2x + 3)ยท(2x)` + +- `f'(x) = 2(xยฒ - 1) + (2x + 3)(2x)` + +- `f'(x) = 2xยฒ - 2 + 4xยฒ + 6x` + +- `f'(x) = 6xยฒ + 6x - 2` + + + +### 5. Chain Rule + +If `f(x) = g(h(x))`, then `f'(x) = g'(h(x))ยทh'(x)` + +#### Step-by-Step Examples + +**Example:** `f(x) = (xยฒ + 1)ยณ` + +**Step 1:** Identify the inner and outer functions + +- Inner function: `h(x) = xยฒ + 1` + +- Outer function: `g(u) = uยณ` (where `u = h(x)`) + +**Step 2:** Find individual derivatives + +- `h'(x) = 2x` (sum rule: derivative of xยฒ is 2x, derivative of 1 is 0) + +- `g'(u) = 3uยฒ` (power rule) + +**Step 3:** Apply chain rule + +- `f'(x) = g'(h(x))ยทh'(x)` + +- `f'(x) = 3(h(x))ยฒยท(2x)` + +- `f'(x) = 3(xยฒ + 1)ยฒยท(2x)` + +- `f'(x) = 6x(xยฒ + 1)ยฒ` + +**Verification by expanding first:** + +- `f(x) = (xยฒ + 1)ยณ = (xยฒ + 1)(xยฒ + 1)(xยฒ + 1)` + +- Expanding: `f(x) = xโถ + 3xโด + 3xยฒ + 1` + +- `f'(x) = 6xโต + 12xยณ + 6x = 6x(xโด + 2xยฒ + 1) = 6x(xยฒ + 1)ยฒ` โœ“ + +**Example:** `f(x) = โˆš(xยฒ + 4)` + +**Step 1:** Identify the inner and outer functions + +- Inner function: `h(x) = xยฒ + 4` + +- Outer function: `g(u) = โˆšu = u^(1/2)` (where `u = h(x)`) + +**Step 2:** Find individual derivatives + +- `h'(x) = 2x` (sum rule: derivative of xยฒ is 2x, derivative of 4 is 0) + +- `g'(u) = (1/2)u^(-1/2) = 1/(2โˆšu)` (power rule) + +**Step 3:** Apply chain rule + + + + + +- `f'(x) = g'(h(x))ยทh'(x)` + +- `f'(x) = (1/(2โˆš(xยฒ + 4)))ยท(2x)` + +- `f'(x) = 2x/(2โˆš(xยฒ + 4))` + +- `f'(x) = x/โˆš(xยฒ + 4)` + +--- + +## Derivatives of Neural Network Functions + +### 1. Sigmoid Function + +![Sigmoid Formula](/content/learn/math/derivatives/sigmoid-formula.png) + +``` +f(x) = 1 / (1 + e^(-x)) +``` + +#### Step-by-Step Derivative Calculation + +To find the derivative of sigmoid, we'll use the quotient rule and chain rule. + +Usually you will ChatGPT sigmoid derivative, but let's see how it's derived. 
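
Before the algebra, here is a quick numerical check (a minimal sketch, with a few arbitrary sample points) that the two forms the derivation ends with, `e^(-x) / (1 + e^(-x))²` and `f(x)(1 - f(x))`, really are the same function:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Evaluate both forms of the sigmoid derivative at a few points.
for x in [-2.0, 0.0, 2.0]:
    form_a = math.exp(-x) / (1 + math.exp(-x))**2  # e^(-x) / (1 + e^(-x))^2
    form_b = sigmoid(x) * (1 - sigmoid(x))         # f(x)(1 - f(x))
    print(f"x={x}: {form_a:.6f} vs {form_b:.6f}")   # identical values
```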
+ +**Step 1:** Rewrite the function + +- `f(x) = 1 / (1 + e^(-x))` + +- Let `u = 1 + e^(-x)`, so `f(x) = 1/u` + +**Step 2:** Apply quotient rule + +- `f'(x) = (0ยทu - 1ยทu') / uยฒ = -u' / uยฒ` + +**Step 3:** Find `u'` using chain rule + +- `u = 1 + e^(-x)` + +- `u' = 0 + e^(-x) ยท (-1) = -e^(-x)` + +**Step 4:** Substitute back + +- `f'(x) = -(-e^(-x)) / (1 + e^(-x))ยฒ` + +- `f'(x) = e^(-x) / (1 + e^(-x))ยฒ` + +**Step 5:** Simplify + +- `f'(x) = e^(-x) / (1 + e^(-x))ยฒ` + +- `f'(x) = [e^(-x) / (1 + e^(-x))] ยท [1 / (1 + e^(-x))]` + +- `f'(x) = [1 / (1 + e^(-x))] ยท [e^(-x) / (1 + e^(-x))]` + +- `f'(x) = f(x) ยท [e^(-x) / (1 + e^(-x))]` + +**Step 6:** Further simplification + +- Notice that `e^(-x) / (1 + e^(-x)) = 1 - 1/(1 + e^(-x)) = 1 - f(x)` + +- Therefore: `f'(x) = f(x) ยท (1 - f(x))` + +**Final Result:** `f'(x) = f(x)(1 - f(x))` + +--- + +## Chain Rule + +Chain rule is how neural networks learn (backpropagation). + +### Mathematical Statement + +If `y = f(g(x))`, then `dy/dx = (dy/dg) ร— (dg/dx)` + +### Neural Network Application + +In neural networks, we often have functions like: `f(x) = activation(linear_transformation(x))` + +### Step-by-Step Chain Rule Example + +**Example:** Neural Network Layer with Sigmoid Activation + +**Given:** + +- Linear transformation: `z = 2x + 1` + +- Activation function: `ฯƒ(z) = 1/(1 + e^(-z))` + +- Composite function: `f(x) = ฯƒ(2x + 1)` + +**Step 1:** Identify inner and outer functions + +- Inner function: `h(x) = 2x + 1` + +- Outer function: `g(z) = ฯƒ(z) = 1/(1 + e^(-z))` + +**Step 2:** Find individual derivatives + +- `h'(x) = 2` (derivative of 2x + 1) + +- `g'(z) = ฯƒ(z)(1 - ฯƒ(z))` (sigmoid derivative) + +**Step 3:** Apply chain rule + +- `f'(x) = g'(h(x)) ยท h'(x)` + +- `f'(x) = ฯƒ(2x + 1)(1 - ฯƒ(2x + 1)) ยท 2` + +- `f'(x) = 2ฯƒ(2x + 1)(1 - ฯƒ(2x + 1))` + +**Step 4:** Calculate at specific point `(x = 1)` + +**Step 4a:** Calculate `h(1)` + +- `h(1) = 2(1) + 1 = 3` + +**Step 4b:** Calculate `ฯƒ(3)` + + + + + +- `ฯƒ(3) = 1/(1 + e^(-3)) = 1/(1 + 0.050) = 1/1.050 โ‰ˆ 0.953` + +**Step 4c:** Calculate `ฯƒ'(3)` + +- `ฯƒ'(3) = ฯƒ(3)(1 - ฯƒ(3)) = 0.953(1 - 0.953) = 0.953(0.047) โ‰ˆ 0.045` + +**Step 4d:** Apply chain rule + +- `f'(1) = ฯƒ'(3) ยท h'(1) = 0.045 ยท 2 = 0.090` + +**Final Answer:** `f'(1) โ‰ˆ 0.090` + +--- + +## Partial Derivatives + +When we have functions of multiple variables, we use **partial derivatives**. 
+ +### Definition + +For `f(x, y)`, the partial derivative with respect to `x` is: +``` +โˆ‚f/โˆ‚x = lim[hโ†’0] (f(x+h, y) - f(x, y)) / h +``` + +### Example: Linear Function + +`f(x, y) = 2x + 3y + 1` + +#### Step-by-Step Partial Derivative Calculation + +**Finding โˆ‚f/โˆ‚x (partial derivative with respect to x):** + +**Step 1:** Treat `y` as a constant + +- `f(x, y) = 2x + 3y + 1` + +- When taking โˆ‚f/โˆ‚x, we treat `y` as constant, so `3y + 1` is constant + +**Step 2:** Differentiate with respect to `x` + +- `โˆ‚f/โˆ‚x = โˆ‚/โˆ‚x(2x) + โˆ‚/โˆ‚x(3y) + โˆ‚/โˆ‚x(1)` + +- `โˆ‚f/โˆ‚x = 2 + 0 + 0 = 2` + +**Finding โˆ‚f/โˆ‚y (partial derivative with respect to y):** + +**Step 1:** Treat `x` as a constant + +- `f(x, y) = 2x + 3y + 1` + +- When taking โˆ‚f/โˆ‚y, we treat `x` as constant, so `2x + 1` is constant + +**Step 2:** Differentiate with respect to `y` + +- `โˆ‚f/โˆ‚y = โˆ‚/โˆ‚y(2x) + โˆ‚/โˆ‚y(3y) + โˆ‚/โˆ‚y(1)` + +- `โˆ‚f/โˆ‚y = 0 + 3 + 0 = 3` + +**Final Results:** + +- `โˆ‚f/โˆ‚x = 2` + +- `โˆ‚f/โˆ‚y = 3` + +#### Hand Calculation Examples + +**Example:** Find partial derivatives at `(x, y) = (1, 2)` + +- `โˆ‚f/โˆ‚x = 2` (constant, doesn't depend on x or y) + +- `โˆ‚f/โˆ‚y = 3` (constant, doesn't depend on x or y) + +**Example:** Find partial derivatives at `(x, y) = (5, -1)` + +- `โˆ‚f/โˆ‚x = 2` (still constant) + +- `โˆ‚f/โˆ‚y = 3` (still constant) + + + +### Example: Quadratic Function + +`f(x, y) = xยฒ + 2xy + yยฒ` + +#### Step-by-Step Partial Derivative Calculation + +**Finding โˆ‚f/โˆ‚x (partial derivative with respect to x):** + +**Step 1:** Treat `y` as a constant + +- `f(x, y) = xยฒ + 2xy + yยฒ` + +- When taking โˆ‚f/โˆ‚x, we treat `y` as constant + +**Step 2:** Differentiate with respect to `x` + +- `โˆ‚f/โˆ‚x = โˆ‚/โˆ‚x(xยฒ) + โˆ‚/โˆ‚x(2xy) + โˆ‚/โˆ‚x(yยฒ)` + +- `โˆ‚f/โˆ‚x = 2x + 2y + 0 = 2x + 2y` + +**Finding โˆ‚f/โˆ‚y (partial derivative with respect to y):** + +**Step 1:** Treat `x` as a constant + +- `f(x, y) = xยฒ + 2xy + yยฒ` + +- When taking โˆ‚f/โˆ‚y, we treat `x` as constant + +**Step 2:** Differentiate with respect to `y` + +- `โˆ‚f/โˆ‚y = โˆ‚/โˆ‚y(xยฒ) + โˆ‚/โˆ‚y(2xy) + โˆ‚/โˆ‚y(yยฒ)` + +- `โˆ‚f/โˆ‚y = 0 + 2x + 2y = 2x + 2y` + +**Final Results:** + +- `โˆ‚f/โˆ‚x = 2x + 2y` + +- `โˆ‚f/โˆ‚y = 2x + 2y` + +#### Hand Calculation Examples + +**Example:** Find partial derivatives at `(x, y) = (1, 2)` + +**Step 1:** Calculate โˆ‚f/โˆ‚x + +- `โˆ‚f/โˆ‚x = 2(1) + 2(2) = 2 + 4 = 6` + +**Step 2:** Calculate โˆ‚f/โˆ‚y + +- `โˆ‚f/โˆ‚y = 2(1) + 2(2) = 2 + 4 = 6` + +**Example:** Find partial derivatives at `(x, y) = (3, -1)` + +**Step 1:** Calculate โˆ‚f/โˆ‚x + +- `โˆ‚f/โˆ‚x = 2(3) + 2(-1) = 6 - 2 = 4` + +**Step 2:** Calculate โˆ‚f/โˆ‚y + + + + + +- `โˆ‚f/โˆ‚y = 2(3) + 2(-1) = 6 - 2 = 4` \ No newline at end of file diff --git a/public/content/learn/math/derivatives/linear-function-derivative.png b/public/content/learn/math/derivatives/linear-function-derivative.png new file mode 100644 index 0000000..313cdce Binary files /dev/null and b/public/content/learn/math/derivatives/linear-function-derivative.png differ diff --git a/public/content/learn/math/derivatives/quadratic-function-derivative.png b/public/content/learn/math/derivatives/quadratic-function-derivative.png new file mode 100644 index 0000000..4e76795 Binary files /dev/null and b/public/content/learn/math/derivatives/quadratic-function-derivative.png differ diff --git a/public/content/learn/math/derivatives/sigmoid-formula.png b/public/content/learn/math/derivatives/sigmoid-formula.png new file mode 100644 index 
0000000..7c44d2b Binary files /dev/null and b/public/content/learn/math/derivatives/sigmoid-formula.png differ diff --git a/public/content/learn/math/functions/cubic-quartic-functions.png b/public/content/learn/math/functions/cubic-quartic-functions.png new file mode 100644 index 0000000..84dd6e1 Binary files /dev/null and b/public/content/learn/math/functions/cubic-quartic-functions.png differ diff --git a/public/content/learn/math/functions/exponential-functions-log-scale.png b/public/content/learn/math/functions/exponential-functions-log-scale.png new file mode 100644 index 0000000..96000da Binary files /dev/null and b/public/content/learn/math/functions/exponential-functions-log-scale.png differ diff --git a/public/content/learn/math/functions/exponential-functions.png b/public/content/learn/math/functions/exponential-functions.png new file mode 100644 index 0000000..54257fa Binary files /dev/null and b/public/content/learn/math/functions/exponential-functions.png differ diff --git a/public/content/learn/math/functions/functions-content.md b/public/content/learn/math/functions/functions-content.md new file mode 100644 index 0000000..4faf670 --- /dev/null +++ b/public/content/learn/math/functions/functions-content.md @@ -0,0 +1,416 @@ +--- +hero: + title: "Mathematical Functions" + subtitle: "Building Blocks of Neural Networks" + tags: + - "๐Ÿ“ Mathematics" + - "โฑ๏ธ 12 min read" +--- + +Functions are the foundation of neural networks. + +## What is a Function? + +In simple terms, function is like a machine that takes something in and gives something back out. More formally, a **function** is a mathematical relationship that **maps inputs to outputs**. + + + +## Simple Examples + +### Example 1: Linear Function f(x) = 2x + 3 + +This is a function that takes any number x and returns 2x + 3. + +![Linear Function](/content/learn/math/functions/linear-function.png) + +Let's calculate f(x) for different values step by step: + +For x = 1: + + + + + +f(1) = 2(1) + 3 = 2 + 3 = 5 + + + +Don't confuse `f(1)` and `2(1)`. `f(1)` means passing 1 into function f, and `2(1)` mean `2*1`. + +For x = 0: + + + + + +f(0) = 2(0) + 3 = 0 + 3 = 3 + +For x = -1: + + + + + +f(-1) = 2(-1) + 3 = -2 + 3 = 1 + +Now image a function that takes in "Cat sat on a" and returns "mat" - that function would be a lot more difficult to create, but neural networks (LLMs) can learn it. + +### Example 2: Quadratic Function f(x) = xยฒ + 2x + 1 + +![Quadratic Function](/content/learn/math/functions/quadratic-function.png) + +Let's calculate f(x) for different values step by step: + +For x = 2: + + + + + +f(2) = (2)ยฒ + 2(2) + 1 = 4 + 4 + 1 = 9 + +For x = 0: + + + + + +f(0) = (0)ยฒ + 2(0) + 1 = 0 + 0 + 1 = 1 + +For x = -1: + + + + + +f(-1) = (-1)ยฒ + 2(-1) + 1 = 1 - 2 + 1 = 0 + +## Mathematical Definition of a Function + +A function **f: A โ†’ B** maps every element in set A to **exactly one** element in set B. + +Previous quadratic function will always give 9 if x=2 and nothing else. + +## Notation + + + + + +**f(x) = y** (read as "f of x equals y") + + + +**x** is the input (independent variable) + + + +**y** is the output (dependent variable) - it depends on x + +## Code Examples + +Our 2 functions coded in python, if you are unfamiliar with python you can skip the code, next module will focus on python. 
+ +```python +# Linear function: f(x) = 2x + 3 +def linear_function(x): + return 2 * x + 3 + +# Test the function +print(f"f(1) = {linear_function(1)}") # Output: f(1) = 5 +print(f"f(0) = {linear_function(0)}") # Output: f(0) = 3 +print(f"f(-1) = {linear_function(-1)}") # Output: f(-1) = 1 + +# Quadratic function: f(x) = xยฒ + 2x + 1 +def quadratic_function(x): + return x**2 + 2*x + 1 + +# Test the function +print(f"f(2) = {quadratic_function(2)}") # Output: f(2) = 9 +print(f"f(0) = {quadratic_function(0)}") # Output: f(0) = 1 +print(f"f(-1) = {quadratic_function(-1)}") # Output: f(-1) = 0 +``` + +## Types of Functions + +### 1. Linear Functions + +Linear functions have the form: **f(x) = mx + b** + +Where: + + + + + +**m** is the slope (how steep the line is) + + + +**b** is the y-intercept (where the line crosses the y-axis) + +Let's draw it + +![Linear Functions Comparison](/content/learn/math/functions/linear-functions-comparison.png) + +Blue line: 2x + 1 + + + + + +2 is the slope, meaning that if you move by 1 on x axis, y will go up by 2 + + + +y or f(x) - it's the same + + + +1 is the value on y coordinate where the blue line will cross it (y-intercept), at x=0 - see it for yourself, blue line should pass through x=0 and y=1 spot + +### 2. Polynomial Functions + +Functions with powers of x: **f(x) = aโ‚™xโฟ + aโ‚™โ‚‹โ‚xโฟโปยน + ... + aโ‚x + aโ‚€** + +**Hand Calculation Examples** + +**Example: f(x) = xยณ - 3xยฒ + 2x + 1** + +Let's calculate f(x) for different values step by step: + +For x = 1: + + + + + +f(1) = (1)ยณ - 3(1)ยฒ + 2(1) + 1 + + + +f(1) = 1 - 3(1) + 2 + 1 + + + +f(1) = 1 - 3 + 2 + 1 + + + +f(1) = 1 + +For x = 2: + + + + + +f(2) = (2)ยณ - 3(2)ยฒ + 2(2) + 1 + + + +f(2) = 8 - 3(4) + 4 + 1 + + + +f(2) = 8 - 12 + 4 + 1 + + + +f(2) = 1 + +For x = 0: + + + + + +f(0) = (0)ยณ - 3(0)ยฒ + 2(0) + 1 + + + +f(0) = 0 - 0 + 0 + 1 + + + +f(0) = 1 + +**Example: f(x) = xโด - 4xยฒ + 3** + +Let's calculate f(x) for different values step by step: + +For x = 1: + + + + + +f(1) = (1)โด - 4(1)ยฒ + 3 + + + +f(1) = 1 - 4(1) + 3 + + + +f(1) = 1 - 4 + 3 + + + +f(1) = 0 + +For x = 2: + + + + + +f(2) = (2)โด - 4(2)ยฒ + 3 + + + +f(2) = 16 - 4(4) + 3 + + + +f(2) = 16 - 16 + 3 + + + +f(2) = 3 + +For x = 0: + + + + + +f(0) = (0)โด - 4(0)ยฒ + 3 + + + +f(0) = 0 - 0 + 3 + + + +f(0) = 3 + +```python +# Polynomial function examples +def cubic_function(x): + return x**3 - 3*x**2 + 2*x + 1 + +def quartic_function(x): + return x**4 - 4*x**2 + 3 +``` + +![Cubic and Quartic Functions](/content/learn/math/functions/cubic-quartic-functions.png) + +Just look at it - it seems interesting, no need to master it yet. + +### 3. Exponential Functions + +Functions with constant base raised to variable power: **f(x) = aหฃ** + +```python +# Exponential function examples +def exponential_function(x): + return 2**x + +def exponential_e(x): + return np.exp(x) +``` + +![Exponential Functions](/content/learn/math/functions/exponential-functions.png) + +Careful! The y axis is exponential. + +If we make it linear, it looks like this: + +![Exponential Functions Linear Scale](/content/learn/math/functions/exponential-functions-log-scale.png) + + + + + +### 4. 
Trigonometric Functions + +Functions based on angles and periodic behavior + +```python +# Trigonometric function examples +def sine_function(x): + return np.sin(x) + +def cosine_function(x): + return np.cos(x) +``` + +![Trigonometric Functions](/content/learn/math/functions/trigonometric-functions.png) + +This is used in Rotory Positional Embeddings (RoPE) - LLM is using it to know the order of words (tokens) in the text. + + + + + + + +Functions are using in neural networks a lot: forward propagation, backward propagation, attention, activation functions, gradients, and many more. + +You don't need to learn them yet, just check them out. + +### 1. Sigmoid Function + +![Sigmoid Formula](/content/learn/math/functions/sigmoid-formula.png) + +**e** is a famous constant (Euler's number) used in math everywhere, it's value is approximately 2.718 + +**f(x) = 1 / (1 + e^(-x))** + +```python +def sigmoid(x): + return 1 / (1 + np.exp(-x)) + +def sigmoid_derivative(x): + s = sigmoid(x) + return s * (1 - s) +``` + +![Sigmoid Function and Derivative](/content/learn/math/functions/sigmoid-function-derivative.png) + +We will learn derivativers in the next lesson, but I included the images here - derivative tells you how fast the function is changing - you see that when sigmoid function is growing fastest (in the middle), the derivative value is spiking. + +Just look at the slope of the function, if it's big (changing fast), the derivative will be big. + +### 2. ReLU (Rectified Linear Unit) + +**f(x) = max(0, x)** + +```python +def relu(x): + return np.maximum(0, x) + +def relu_derivative(x): + return (x > 0).astype(float) +``` + +![ReLU Function and Derivative](/content/learn/math/functions/relu-function-derivative.png) + +### 3. Tanh Function + +![Tanh Formula](/content/learn/math/functions/tanh-formula.png) + +**f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))** + +```python +def tanh(x): + return np.tanh(x) + +def tanh_derivative(x): + return 1 - np.tanh(x)**2 +``` + +![Tanh Function and Derivative](/content/learn/math/functions/tanh-function-derivative.png) + +**Congratulations! 
You finished functions for neural networks lesson!** \ No newline at end of file diff --git a/public/content/learn/math/functions/linear-function.png b/public/content/learn/math/functions/linear-function.png new file mode 100644 index 0000000..5f3f96e Binary files /dev/null and b/public/content/learn/math/functions/linear-function.png differ diff --git a/public/content/learn/math/functions/linear-functions-comparison.png b/public/content/learn/math/functions/linear-functions-comparison.png new file mode 100644 index 0000000..ff85273 Binary files /dev/null and b/public/content/learn/math/functions/linear-functions-comparison.png differ diff --git a/public/content/learn/math/functions/quadratic-function.png b/public/content/learn/math/functions/quadratic-function.png new file mode 100644 index 0000000..044ca30 Binary files /dev/null and b/public/content/learn/math/functions/quadratic-function.png differ diff --git a/public/content/learn/math/functions/relu-function-derivative.png b/public/content/learn/math/functions/relu-function-derivative.png new file mode 100644 index 0000000..50aaf68 Binary files /dev/null and b/public/content/learn/math/functions/relu-function-derivative.png differ diff --git a/public/content/learn/math/functions/sigmoid-formula.png b/public/content/learn/math/functions/sigmoid-formula.png new file mode 100644 index 0000000..49e818c Binary files /dev/null and b/public/content/learn/math/functions/sigmoid-formula.png differ diff --git a/public/content/learn/math/functions/sigmoid-function-derivative.png b/public/content/learn/math/functions/sigmoid-function-derivative.png new file mode 100644 index 0000000..ef46024 Binary files /dev/null and b/public/content/learn/math/functions/sigmoid-function-derivative.png differ diff --git a/public/content/learn/math/functions/tanh-formula.png b/public/content/learn/math/functions/tanh-formula.png new file mode 100644 index 0000000..3deb79a Binary files /dev/null and b/public/content/learn/math/functions/tanh-formula.png differ diff --git a/public/content/learn/math/functions/tanh-function-derivative.png b/public/content/learn/math/functions/tanh-function-derivative.png new file mode 100644 index 0000000..93818db Binary files /dev/null and b/public/content/learn/math/functions/tanh-function-derivative.png differ diff --git a/public/content/learn/math/functions/trigonometric-functions.png b/public/content/learn/math/functions/trigonometric-functions.png new file mode 100644 index 0000000..d071d33 Binary files /dev/null and b/public/content/learn/math/functions/trigonometric-functions.png differ diff --git a/public/content/learn/math/gradients/derivatives-tangent-lines.png b/public/content/learn/math/gradients/derivatives-tangent-lines.png new file mode 100644 index 0000000..b314af1 Binary files /dev/null and b/public/content/learn/math/gradients/derivatives-tangent-lines.png differ diff --git a/public/content/learn/math/gradients/gradient-surface-plot.png b/public/content/learn/math/gradients/gradient-surface-plot.png new file mode 100644 index 0000000..172f20e Binary files /dev/null and b/public/content/learn/math/gradients/gradient-surface-plot.png differ diff --git a/public/content/learn/math/gradients/gradients-content.md b/public/content/learn/math/gradients/gradients-content.md new file mode 100644 index 0000000..f0efb7b --- /dev/null +++ b/public/content/learn/math/gradients/gradients-content.md @@ -0,0 +1,166 @@ +--- +hero: + title: "Gradients" + subtitle: "How Neural Networks Learn Through Gradient Descent" + tags: + - 
"๐Ÿ“ Mathematics" + - "โฑ๏ธ 14 min read" +--- + +**[video coming soon]** + +Welcome! This guide will walk you through the concept of gradients. We'll start with the familiar idea of a derivative and build up to understanding how gradients make neural networks learn. + +**Prerequisites:** Check out previous 3 lessons: Functions, Derivatives & Vectors + +--- + +## Step 1: From Line Slope (Derivative) To Surface Slope (Gradient) + +Let's start with what you know. For a simple function like `f(x) = xยฒ`, the derivative `f'(x) = 2x` gives you the slope of the curve at any point `x`. So for `x=3`, derivative is `2*3=6`. That means as you increase `x` but a tiny bit, `f(x) = xยฒ` will increase by 6. + +At `x=4`, derivative is `2*4=8`, so at that point `f(x) = xยฒ` is increasing by 8x. + + + + + +Notice that I say "if you increase x by a bit, `f(x) = xยฒ` will increase by 6" and I don't say "if you increase x by 1", because increasing x by 1 (from 3 to 4 in this case) is a lot and by that point derivative (rate of change) will go from 6 to 8. + +On this image you can see that the red slope at `x=3` is smaller than thes green slope at `x=4`. + +![Derivatives with Tangent Lines](/content/learn/math/gradients/derivatives-tangent-lines.png) + +In this case, if you increase `x=3` by 1, derivative will go from 6 to 8. So that's why we say "if you increase `x=3` by a tiny bit, `f(x) = xยฒ` will increase by 6". + +But what if our function has multiple inputs, like `f(x, y) = xยฒ + yยฒ`? + + + + + +This function doesn't describe a line; it describes a 3D surface, like a bowl landscape. If you're standing at any point `(x, y)` on this surface, what is "the" slope? + +![Gradient Surface Plot](/content/learn/math/gradients/gradient-surface-plot.png) + +There isn't just one. There's a slope if you take a step in the x-direction, a different slope if you step in the y-direction, and another for every other direction in between. + +To handle this, we use **partial derivatives**. + +- **Partial Derivative with respect to x (โˆ‚f/โˆ‚x):** This is the slope if you only move in the x-direction. You treat y as a constant. For `f(x, y) = xยฒ + yยฒ`, the partial derivative `โˆ‚f/โˆ‚x = 2x` - remember the rule for a constant that stands alone, constants become 0 in the derivative, and since we treat y as a constant, `+ yยฒ` will ecome `+ 0`. + +- **Partial Derivative with respect to y (โˆ‚f/โˆ‚y):** This is the slope if you only move in the y-direction. You treat x as a constant. For `f(x, y) = xยฒ + yยฒ`, the partial derivative `โˆ‚f/โˆ‚y = 2y`. + +Now we have two slopes, one for each axis. The **gradient** is simply a way to package all these partial derivatives together. + +**Definition:** The gradient is a vector that contains all the partial derivatives of a function. It's denoted by `โˆ‡f` (pronounced "nabla f" or "del f"). + +For our function `f(x, y)`, the gradient is: + +``` +โˆ‡f = [ โˆ‚f/โˆ‚x, โˆ‚f/โˆ‚y ] = [ 2x, 2y ] +``` + + + +## Step 2: What the Gradient Vector Tells Us + +So, the gradient is a vector (think of it as an arrow). What do the direction and length of this arrow mean? + +This is the most important intuition to grasp. + +### 1. The Direction of the Gradient + +The gradient vector at any point `(x, y)` points in the direction of the **steepest possible ascent**. + +Imagine you're standing on a mountainside. If you look around, there are many ways to take a step. One direction leads straight uphill, another leads straight downhill, and others traverse the mountain at a constant elevation. 
The gradient is an arrow painted on the ground at your feet that points directly up the steepest path from where you are. + +### 2. The Magnitude (Length) of the Gradient + +The length of the gradient vector tells you **how steep** that steepest path is. + + + + + +- A **long gradient vector** means the slope is very steep. A small step will result in a large change in elevation. + +- A **short gradient vector** means the slope is gentle. The terrain is nearly flat. + +- A **zero-length gradient vector** (i.e., [0, 0]) means you are at a flat spotโ€”either a peak, a valley bottom, or a flat plateau. + + + +## Step 3: A Concrete Example + +Let's go back to our bowl function, `f(x, y) = xยฒ + yยฒ`, and its gradient, `โˆ‡f = [2x, 2y]`. The minimum of this function is clearly at `(0, 0)`. + +Let's calculate the gradient at a specific point, say `(3, 1)`. + +``` +โˆ‡f(3, 1) = [ 2 3, 2 1 ] = [6, 2] +``` + +This vector `[6, 2]` is an arrow that points "6 units in the x-direction and 2 units in the y-direction." This is an arrow pointing up and to the right, away from the minimum at `(0, 0)`. This makes perfect sense! From the point `(3, 1)`, the steepest way up the bowl is away from the bottom. + +What about the point `(-2, -2)`? + +``` +โˆ‡f(-2, -2) = [ 2 -2, 2 -2 ] = [-4, -4] +``` + +This vector points down and to the left, again, away from the bottom of the bowl at `(0, 0)`. + + + +## Step 4: Visualizing the Gradient Field + +Let's visualize this. The image below shows a contour plot of our function `f(x, y) = xยฒ + yยฒ`. Think of this as a topographic map. The lines connect points of equal "elevation." The arrows represent the gradient vectors at various points. + +Notice two crucial properties in the visualization: + +- **Direction:** The arrows always point from a lower contour line to a higher one (from blue to yellow). They show the path of steepest ascent. + +- **Orthogonality:** The gradient vectors are always perpendicular to the contour lines. To go straight uphill, you must walk at a right angle to the path of "no elevation change." + +When you run this, you will see a visual representation of everything we've discussed. + + + +## Step 5: The "Why": Gradients and Machine Learning + +This is where gradients become incredibly powerful. In machine learning, we define a **loss function** (or **cost function**). This function measures how "wrong" our model's predictions are. The inputs to this function are the model's parameters (its weights and biases), and the output is a single number representing the total error. + +Our goal is to **find the set of parameters that minimizes the error**. + +This is the exact same problem as finding the lowest point in a valley! + +The algorithm used to do this is called **Gradient Descent**. Here's how it works: + + + + + +1. **Start Somewhere:** Initialize the model's parameters to random values. (This is like dropping a hiker at a random spot on the mountain). + +2. **Find the Way Down:** Calculate the gradient of the loss function at your current location. The gradient points straight uphill. + +3. **Take a Step Downhill:** To go downhill, simply move in the direction of the **negative gradient**. We update our parameters by taking a small step in that opposite direction. + +4. **Repeat:** Go back to step 2. Keep calculating the gradient and taking small steps downhill until you reach the bottom of the valley, where the gradient is zero. + +This is the core mechanic of how neural networks "learn." 
They are constantly calculating the gradient of their error and adjusting their internal parameters to move in the direction that reduces that error. + +## Key Takeaways + + + + + +- A **gradient** is a vector of partial derivatives that generalizes the concept of slope to functions with multiple inputs. + +- **Direction:** The gradient vector points in the direction of the steepest ascent. + +- **Magnitude:** Its length represents how steep that ascent is. + +- **Optimization:** The negative gradient points in the direction of steepest descent, which is the key to finding the minimum of a function using Gradient Descent. \ No newline at end of file diff --git a/public/content/learn/math/matrices/matrices-content.md b/public/content/learn/math/matrices/matrices-content.md new file mode 100644 index 0000000..ba1f675 --- /dev/null +++ b/public/content/learn/math/matrices/matrices-content.md @@ -0,0 +1,102 @@ +--- +hero: + title: "Matrices" + subtitle: "Operations and Transformations for Neural Networks" + tags: + - "๐Ÿ“ Mathematics" + - "โฑ๏ธ 12 min read" +--- + +**[video coming soon]** + +**Level:** Beginner โ†’ Intermediate. + +--- + +## 1. What is a matrix? + +A matrix is a rectangular array of numbers arranged in rows and columns. We write an `(m x n)` matrix as: + +![Matrix Notation](/content/learn/math/matrices/matrix-notation.png) + + + + + +`(m)` is the number of rows, `(n)` the number of columns. + +If `(m=n)` the matrix is **square**. + +**Why matrices?** They represent neural network weights, linear transformations, systems of linear equations, data tables, graphs, and more. + + + +## 2. Notation and basic examples + +**Entries:** `(A_ij)` is element in row `(i)`, column `(j)`. + +**Row vector:** 1ร—n, **column vector:** mร—1. + +### Example matrices + +We will use these 2 matrices below. + +![Matrix Example](/content/learn/math/matrices/matrix-example.png) + +## 3. Step-by-step matrix operations + +### 3.1 Addition and subtraction (elementwise) + +Only for matrices of the same size. Add corresponding elements. + +**Example:** `(A+B)` + +![Matrix Addition](/content/learn/math/matrices/matrix-addition.png) + +### 3.2 Scalar multiplication + +Multiply each element by the scalar. For `(2A)`: + +![Scalar Multiplication Matrix](/content/learn/math/matrices/scalar-multiplication-matrix.png) + +### 3.3 Matrix multiplication + +You do a dot product of a row of th first matrix with the column of the second matrix and write result at the position where that row and column intercept. + +If `(A)` is `(m x p)` and `(B)` is `(p x n)`, then `(AB)` is `(m x n)`. Multiply rows of `(A)` by columns of `(B)` and sum. + +**Example:** multiply the two 2ร—2 matrices above. + +![Matrix Multiplication Steps](/content/learn/math/matrices/matrix-multiplication-steps.png) + +**Important:** Matrix multiplication is generally **not commutative**: `(AB is not equal to BA)` in general. + +## 4. Key matrix transformations and properties + +### 4.1 Transpose + +![Matrix Transpose](/content/learn/math/matrices/matrix-transpose.png) + +### 4.2 Determinant (square matrices) + +![Matrix Determinant](/content/learn/math/matrices/matrix-determinant.png) + +### 4.3 Inverse (when it exists) + +![Matrix Inverse Formula](/content/learn/math/matrices/matrix-inverse-formula.png) + +### 4.4 Rank + +The **rank** is the dimension of the column space (or row space). If rank = n for an `(n x n)` matrix, it's **full rank** and **invertible**. 
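
To tie the operations above together in code, here is a minimal NumPy sketch. The two 2ร—2 matrices are my own small examples (the lesson's matrices live in the images), chosen only to show the calls:

```python
import numpy as np

# Two small example matrices (assumed values, just for illustration)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A + B)                     # elementwise addition
print(2 * A)                     # scalar multiplication
print(A @ B)                     # matrix multiplication (rows of A with columns of B)
print(A @ B - B @ A)             # generally non-zero: AB is not BA
print(A.T)                       # transpose
print(np.linalg.det(A))          # determinant (-2.0 here, so A is invertible)
print(np.linalg.inv(A))          # inverse (exists because det != 0)
print(np.linalg.matrix_rank(A))  # rank (2, so A is full rank)
```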
### 4.5 Special matrices (common types) + +![Special Matrices](/content/learn/math/matrices/special-matrices.png) + +## 5. Common pitfalls and tips + +- Remember matrix multiplication order matters. + +- Watch dimensions carefully (columns of the left matrix must equal rows of the right matrix). + +- Numerical stability: beware near-singular matrices (determinant ≈ 0). \ No newline at end of file diff --git a/public/content/learn/math/matrices/matrix-addition.png b/public/content/learn/math/matrices/matrix-addition.png new file mode 100644 index 0000000..e9c3d24 Binary files /dev/null and b/public/content/learn/math/matrices/matrix-addition.png differ diff --git a/public/content/learn/math/matrices/matrix-determinant.png b/public/content/learn/math/matrices/matrix-determinant.png new file mode 100644 index 0000000..6042242 Binary files /dev/null and b/public/content/learn/math/matrices/matrix-determinant.png differ diff --git a/public/content/learn/math/matrices/matrix-example.png b/public/content/learn/math/matrices/matrix-example.png new file mode 100644 index 0000000..0c8c266 Binary files /dev/null and b/public/content/learn/math/matrices/matrix-example.png differ diff --git a/public/content/learn/math/matrices/matrix-inverse-formula.png b/public/content/learn/math/matrices/matrix-inverse-formula.png new file mode 100644 index 0000000..02b037d Binary files /dev/null and b/public/content/learn/math/matrices/matrix-inverse-formula.png differ diff --git a/public/content/learn/math/matrices/matrix-multiplication-steps.png b/public/content/learn/math/matrices/matrix-multiplication-steps.png new file mode 100644 index 0000000..28dcfb8 Binary files /dev/null and b/public/content/learn/math/matrices/matrix-multiplication-steps.png differ diff --git a/public/content/learn/math/matrices/matrix-notation.png b/public/content/learn/math/matrices/matrix-notation.png new file mode 100644 index 0000000..be5e26d Binary files /dev/null and b/public/content/learn/math/matrices/matrix-notation.png differ diff --git a/public/content/learn/math/matrices/matrix-transpose.png b/public/content/learn/math/matrices/matrix-transpose.png new file mode 100644 index 0000000..e7cb7b9 Binary files /dev/null and b/public/content/learn/math/matrices/matrix-transpose.png differ diff --git a/public/content/learn/math/matrices/scalar-multiplication-matrix.png b/public/content/learn/math/matrices/scalar-multiplication-matrix.png new file mode 100644 index 0000000..d36a8ab Binary files /dev/null and b/public/content/learn/math/matrices/scalar-multiplication-matrix.png differ diff --git a/public/content/learn/math/matrices/special-matrices.png b/public/content/learn/math/matrices/special-matrices.png new file mode 100644 index 0000000..eefc9ca Binary files /dev/null and b/public/content/learn/math/matrices/special-matrices.png differ diff --git a/public/content/learn/math/vectors/scalar-multiplication.png b/public/content/learn/math/vectors/scalar-multiplication.png new file mode 100644 index 0000000..0600354 Binary files /dev/null and b/public/content/learn/math/vectors/scalar-multiplication.png differ diff --git a/public/content/learn/math/vectors/simple-vector.png b/public/content/learn/math/vectors/simple-vector.png new file mode 100644 index 0000000..9345ff1 Binary files /dev/null and b/public/content/learn/math/vectors/simple-vector.png differ diff --git a/public/content/learn/math/vectors/vector-addition.png b/public/content/learn/math/vectors/vector-addition.png new file mode 100644 index 0000000..df1fb8f Binary files /dev/null and
b/public/content/learn/math/vectors/vector-addition.png differ diff --git a/public/content/learn/math/vectors/vector-angle.png b/public/content/learn/math/vectors/vector-angle.png new file mode 100644 index 0000000..14a6fd3 Binary files /dev/null and b/public/content/learn/math/vectors/vector-angle.png differ diff --git a/public/content/learn/math/vectors/vectors-content.md b/public/content/learn/math/vectors/vectors-content.md new file mode 100644 index 0000000..4541bde --- /dev/null +++ b/public/content/learn/math/vectors/vectors-content.md @@ -0,0 +1,189 @@ +--- +hero: + title: "Vectors" + subtitle: "Magnitude, Direction, and Vector Operations" + tags: + - "๐Ÿ“ Mathematics" + - "โฑ๏ธ 15 min read" +--- + +**[video comingn soon]** + +Welcome! This guide will introduce you to vectors, which are fundamental objects in mathematics, physics, and computer science. We'll explore what they are and how to work with them, focusing on the concepts, not the code. + +--- + +## Step 1: What is a Vector? + +At its core, a **vector** is a mathematical object that has both **magnitude** (length or size) and **direction**. + + + + + +Think about the difference between "speed" and "velocity." + +- **Speed** is a single number (a scalar), like 50 km/h. It only tells you the magnitude. + +- **Velocity** is a vector, like 50 km/h north. It tells you both the magnitude (50 km/h) and the direction (north). + +We represent vectors as a list of numbers called **components**. For example, in a 2D plane, a vector `v` can be written as: + +``` +v = [x, y] +``` + +This notation means "start at the origin (0,0), move x units along the horizontal axis, and y units along the vertical axis." The arrow drawn from the origin to that point (x, y) is the vector. + +**Examples:** + +- `v = [3, 4]` represents an arrow pointing to the coordinate (3, 4). + +- `u = [-2, 1]` represents an arrow pointing to the coordinate (-2, 1). + +![Simple Vector](/content/learn/math/vectors/simple-vector.png) + + + +## Step 2: The Two Core Properties: Magnitude and Direction + +Every vector is defined by these two properties. + +### Magnitude (Length) + +The **magnitude** of a vector is its length. It's often written with double bars, like `||v||`. We can calculate it using the Pythagorean theorem. For a 2D vector `v = [x, y]`, the formula is: + +``` +||v|| = โˆš(xยฒ + yยฒ) +``` + +For a 3D vector `w = [x, y, z]`, it's a natural extension: `||w|| = โˆš(xยฒ + yยฒ + zยฒ)`. + +**Example:** +For `v = [3, 4]`: +``` +||v|| = โˆš(3ยฒ + 4ยฒ) = โˆš(9 + 16) = โˆš25 = 5 +``` +The length of the vector [3, 4] is 5 units. + +### Direction (Unit Vectors) + +How can we describe only the direction of a vector, ignoring its length? We use a **unit vector**. A unit vector is any vector that has a magnitude of exactly 1. + +To find the unit vector of any given vector, you simply divide the vector by its own magnitude. This scales the vector down to a length of 1 while preserving its direction. The unit vector is often denoted with a "hat," like `รป`. + +``` +รป = v / ||v|| +``` + +**Example:** +For `v = [3, 4]`, we know `||v|| = 5`. +The unit vector `รป` is: +``` +รป = [3, 4] / 5 = [3/5, 4/5] = [0.6, 0.8] +``` +This new vector [0.6, 0.8] points in the exact same direction as [3, 4], but its length is 1. + + + +## Step 3: Vector Arithmetic + +We can perform operations on vectors to combine or modify them. + +### Vector Addition + +**Geometrically**, adding two vectors `u + v` means placing the tail of vector `v` at the tip of vector `u`. 
The resulting vector, `w`, is the arrow drawn from the original starting point to the tip of the second vector. + +**Mathematically**, we just add the corresponding components: +If `u = [xโ‚, yโ‚]` and `v = [xโ‚‚, yโ‚‚]`, then: +``` +u + v = [xโ‚ + xโ‚‚, yโ‚ + yโ‚‚] +``` + +![Vector Addition](/content/learn/math/vectors/vector-addition.png) + +### Scalar Multiplication + +Multiplying a vector by a regular number (a **scalar**) changes its magnitude but not its direction (unless the scalar is negative, in which case the direction is reversed). + +If `k` is a scalar and `v = [x, y]`, then: +``` +k v = [kx, k*y] +``` + + +**Examples:** + +- `2 * v` doubles the vector's length. + +- `0.5 * v` halves the vector's length. + +- `-1 * v` flips the vector to point in the opposite direction. + +![Scalar Multiplication](/content/learn/math/vectors/scalar-multiplication.png) + + + + + + + +## Step 4: The Dot Product + +The **dot product** is a way of multiplying two vectors that results in a single number (a scalar). It is one of the most important vector operations. + +**Intuition:** The dot product tells you how much two vectors align or point in the same direction. + +- **Large positive dot product:** The vectors point in very similar directions. + +- **Dot product is zero:** The vectors are perpendicular (orthogonal) to each other. + +- **Large negative dot product:** The vectors point in generally opposite directions. + +**Calculation:** To calculate the dot product, you multiply the corresponding components and then add the results. +If `u = [xโ‚, yโ‚]` and `v = [xโ‚‚, yโ‚‚]`, the dot product `u ยท v` is: + +``` +u ยท v = (xโ‚ xโ‚‚) + (yโ‚ yโ‚‚) +``` + +### Geometric Meaning & Finding Angles +The dot product also has a powerful geometric definition: + +``` +u ยท v = ||u|| ||v|| cos(ฮธ) +``` + +where `ฮธ` (theta) is the angle between the two vectors. We can rearrange this formula to find the angle between any two vectors! + +``` +cos(ฮธ) = (u ยท v) / (||u|| * ||v||) +``` + +This is an incredibly useful property, allowing us to calculate angles in any number of dimensions. + +![Vector Angle](/content/learn/math/vectors/vector-angle.png) + +## Step 5: Neural Networks: + +Every input, hidden state, and output is a vector. + +- A single image, sound, or sentence is converted into a vector of numbers that captures its features. + +- Each neuron operates on these vectors โ€” combining them through dot products, matrix multiplications, and nonlinear activations to extract patterns. + +- When you train a neural network, you're really adjusting weight vectors so that the model transforms input vectors into desired output vectors. + + + +### ๐Ÿ’ฌ In Large Language Models (LLMs): + +LLMs represent words, sentences, and even abstract concepts as high-dimensional vectors (embeddings). + +- The vector for a word like "king" is close to "queen" in this space because their meanings are similar. + +- Attention mechanisms compute dot products between vectors to measure how related words are in context โ€” that's how the model "focuses" on relevant information. + +- The entire reasoning process of an LLM โ€” understanding, summarizing, generating โ€” happens through transformations of these vectors. 
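
As a small illustration of the dot-product idea above, here is a minimal sketch that computes magnitudes, the dot product, and the angle for the example vectors from this lesson; the tiny "embedding" vectors at the end are made-up numbers purely for illustration:

```python
import math

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return dot(u, v) / (magnitude(u) * magnitude(v))

v = [3, 4]   # example vector from this lesson
u = [-2, 1]  # example vector from this lesson

print(magnitude(v))                            # 5.0
print([x / magnitude(v) for x in v])           # unit vector [0.6, 0.8]
print(dot(u, v))                               # (-2)(3) + (1)(4) = -2
print(math.degrees(math.acos(cosine(u, v))))   # angle between u and v in degrees

# Toy "embeddings" (made-up numbers): similar directions give a cosine near 1
king, queen = [0.9, 0.8, 0.1], [0.85, 0.75, 0.2]
print(cosine(king, queen))                     # close to 1.0
```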
+ +**By understanding vectors, you understand how neural networks think, learn, and represent meaning.** \ No newline at end of file diff --git a/public/content/learn/neural-networks/architecture-of-a-network/architecture-of-a-network-content.md b/public/content/learn/neural-networks/architecture-of-a-network/architecture-of-a-network-content.md new file mode 100644 index 0000000..e8d76a0 --- /dev/null +++ b/public/content/learn/neural-networks/architecture-of-a-network/architecture-of-a-network-content.md @@ -0,0 +1,283 @@ +--- +hero: + title: "Architecture of a Network" + subtitle: "Understanding Neural Network Structure and Design" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 10 min read" +--- + +A neural network's **architecture** is its structure - how many layers, how many neurons, and how they connect! + +![Network Layers](/content/learn/neural-networks/architecture-of-a-network/network-layers.png) + +## Basic Architecture + +**Typical neural network has three parts:** + +1. **Input Layer:** Receives the data +2. **Hidden Layers:** Process and transform +3. **Output Layer:** Makes the prediction + +```yaml +Input Layer โ†’ Hidden Layer 1 โ†’ Hidden Layer 2 โ†’ Output Layer + (784) (128) (64) (10) +``` + +## Example Architecture + +```python +import torch +import torch.nn as nn + +class SimpleNet(nn.Module): + def __init__(self): + super().__init__() + # Input layer โ†’ Hidden layer 1 + self.fc1 = nn.Linear(784, 128) + + # Hidden layer 1 โ†’ Hidden layer 2 + self.fc2 = nn.Linear(128, 64) + + # Hidden layer 2 โ†’ Output layer + self.fc3 = nn.Linear(64, 10) + + def forward(self, x): + # Layer 1 + x = torch.relu(self.fc1(x)) + + # Layer 2 + x = torch.relu(self.fc2(x)) + + # Output layer (no activation for logits) + x = self.fc3(x) + + return x + +model = SimpleNet() +print(model) +``` + +**Architecture diagram:** + +```yaml +Input: 784 features (28ร—28 image flattened) + โ†“ +Linear(784 โ†’ 128) + ReLU + โ†“ +Linear(128 โ†’ 64) + ReLU + โ†“ +Linear(64 โ†’ 10) [logits for 10 classes] + โ†“ +Output: 10 class scores +``` + +## Layer Sizes + +**How to choose layer sizes:** + +```yaml +Input layer: + Size = number of features + Example: 28ร—28 image = 784 + +Hidden layers: + Start wide, gradually narrow + Common pattern: 512 โ†’ 256 โ†’ 128 + Or: Stay same size + +Output layer: + Size = number of outputs + Classification: number of classes + Regression: usually 1 +``` + +**Example patterns:** + +```python +# Pattern 1: Funnel (wide to narrow) +model = nn.Sequential( + nn.Linear(784, 512), + nn.ReLU(), + nn.Linear(512, 256), + nn.ReLU(), + nn.Linear(256, 10) +) + +# Pattern 2: Uniform (same size) +model = nn.Sequential( + nn.Linear(100, 100), + nn.ReLU(), + nn.Linear(100, 100), + nn.ReLU(), + nn.Linear(100, 1) +) + +# Pattern 3: Bottleneck (narrow middle) +model = nn.Sequential( + nn.Linear(784, 128), + nn.ReLU(), + nn.Linear(128, 32), # Bottleneck + nn.ReLU(), + nn.Linear(32, 128), + nn.ReLU(), + nn.Linear(128, 784) +) +``` + +## Depth vs Width + +**Depth = number of layers** +**Width = neurons per layer** + +```python +# Deep but narrow +deep_narrow = nn.Sequential( + nn.Linear(10, 20), + nn.ReLU(), + nn.Linear(20, 20), + nn.ReLU(), + nn.Linear(20, 20), + nn.ReLU(), + nn.Linear(20, 20), + nn.ReLU(), + nn.Linear(20, 1) +) # 5 layers, 20 neurons each + +# Shallow but wide +shallow_wide = nn.Sequential( + nn.Linear(10, 1000), + nn.ReLU(), + nn.Linear(1000, 1) +) # 2 layers, 1000 neurons +``` + +**Trade-offs:** + +```yaml +Deep networks: + โœ“ Learn hierarchical features + โœ“ More expressive 
+ โœ— Harder to train + โœ— Gradient problems + +Wide networks: + โœ“ More parameters per layer + โœ“ Easier to train + โœ— Less feature hierarchy + โœ— More memory +``` + +## Common Architectures + +### Fully Connected (Dense) + +```python +# Every neuron connects to every neuron in next layer +fc_net = nn.Sequential( + nn.Linear(784, 256), + nn.ReLU(), + nn.Linear(256, 128), + nn.ReLU(), + nn.Linear(128, 10) +) +``` + +### Convolutional (CNN) + +```python +# For images +cnn = nn.Sequential( + nn.Conv2d(3, 32, 3), + nn.ReLU(), + nn.MaxPool2d(2), + nn.Conv2d(32, 64, 3), + nn.ReLU(), + nn.Flatten(), + nn.Linear(64*6*6, 10) +) +``` + +## Counting Parameters + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(10, 20), # 10ร—20 + 20 = 220 params + nn.ReLU(), # 0 params + nn.Linear(20, 5) # 20ร—5 + 5 = 105 params +) + +# Count total parameters +total_params = sum(p.numel() for p in model.parameters()) +print(f"Total parameters: {total_params}") +# Output: 325 +``` + +## Practical Example: MNIST Classifier + +```python +import torch.nn as nn + +class MNISTNet(nn.Module): + def __init__(self): + super().__init__() + self.network = nn.Sequential( + # Input: 28ร—28 = 784 + nn.Linear(784, 128), + nn.ReLU(), + nn.Dropout(0.2), + + nn.Linear(128, 64), + nn.ReLU(), + nn.Dropout(0.2), + + # Output: 10 classes (digits 0-9) + nn.Linear(64, 10) + ) + + def forward(self, x): + # Flatten image + x = x.view(-1, 784) + # Forward pass + return self.network(x) + +model = MNISTNet() + +# Count parameters +params = sum(p.numel() for p in model.parameters()) +print(f"Parameters: {params:,}") +``` + +## Key Takeaways + +โœ“ **Three parts:** Input โ†’ Hidden โ†’ Output + +โœ“ **Layer sizes:** Input (features), Hidden (variable), Output (targets) + +โœ“ **Depth:** Number of layers + +โœ“ **Width:** Neurons per layer + +โœ“ **More layers:** More complex patterns + +โœ“ **Design choice:** Many valid architectures + +**Quick Reference:** + +```python +# Basic architecture template +model = nn.Sequential( + nn.Linear(input_size, hidden1_size), + nn.ReLU(), + nn.Linear(hidden1_size, hidden2_size), + nn.ReLU(), + nn.Linear(hidden2_size, output_size) +) + +# Count parameters +total = sum(p.numel() for p in model.parameters()) +``` + +**Remember:** Architecture is like a blueprint - it defines your network's structure! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/architecture-of-a-network/network-layers.png b/public/content/learn/neural-networks/architecture-of-a-network/network-layers.png new file mode 100644 index 0000000..b7b9395 Binary files /dev/null and b/public/content/learn/neural-networks/architecture-of-a-network/network-layers.png differ diff --git a/public/content/learn/neural-networks/backpropagation-in-action/backpropagation-in-action-content.md b/public/content/learn/neural-networks/backpropagation-in-action/backpropagation-in-action-content.md new file mode 100644 index 0000000..f72db77 --- /dev/null +++ b/public/content/learn/neural-networks/backpropagation-in-action/backpropagation-in-action-content.md @@ -0,0 +1,114 @@ +--- +hero: + title: "Backpropagation in Action" + subtitle: "Seeing Gradients Flow Through Networks" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 8 min read" +--- + +Let's see backpropagation in action with real examples! 
+ +## Watching Gradients + +```python +import torch +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(2, 3), + nn.ReLU(), + nn.Linear(3, 1) +) + +x = torch.tensor([[1.0, 2.0]]) +y_true = torch.tensor([[5.0]]) + +# Forward +y_pred = model(x) +loss = (y_pred - y_true) ** 2 + +# Backward +loss.backward() + +# See gradients +for name, param in model.named_parameters(): + print(f"{name}:") + print(f" Value: {param.data}") + print(f" Gradient: {param.grad}") + print() +``` + +## Gradient Flow Example + +```python +import torch + +# Three-step computation +x = torch.tensor([2.0], requires_grad=True) +y = x ** 2 # y = xยฒ +z = y + 3 # z = y + 3 +loss = z ** 2 # loss = zยฒ + +# Backward +loss.backward() + +print(f"x = {x.item()}") +print(f"y = {y.item()}") +print(f"z = {z.item()}") +print(f"loss = {loss.item()}") +print(f"\\ndloss/dx = {x.grad.item()}") + +# Manual chain rule: +# dloss/dx = dloss/dz ร— dz/dy ร— dy/dx +# = 2z ร— 1 ร— 2x +# = 2(7) ร— 1 ร— 2(2) +# = 14 ร— 4 = 56 โœ“ +``` + +## Training with Backprop + +```python +import torch +import torch.nn as nn +import torch.optim as optim + +model = nn.Linear(1, 1) +optimizer = optim.SGD(model.parameters(), lr=0.01) +criterion = nn.MSELoss() + +# Data: y = 2x +X = torch.tensor([[1.0], [2.0], [3.0], [4.0]]) +y = torch.tensor([[2.0], [4.0], [6.0], [8.0]]) + +# Train +for epoch in range(50): + # Forward + pred = model(X) + loss = criterion(pred, y) + + # Backward + optimizer.zero_grad() + loss.backward() + + # Update + optimizer.step() + + if epoch % 10 == 0: + print(f"Epoch {epoch}, Loss: {loss.item():.4f}") + +print(f"Learned weight: {model.weight.item():.2f}") # ~2.0 +print(f"Learned bias: {model.bias.item():.2f}") # ~0.0 +``` + +## Key Takeaways + +โœ“ **Backprop:** Computes gradients efficiently + +โœ“ **Chain rule:** Multiplies gradients backwards + +โœ“ **Automatic:** PyTorch handles it + +โœ“ **Essential:** Makes training possible + +**Remember:** Backprop = automatic gradient calculation through layers! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/backpropagation/backpropagation-content.md b/public/content/learn/neural-networks/backpropagation/backpropagation-content.md new file mode 100644 index 0000000..e2ece30 --- /dev/null +++ b/public/content/learn/neural-networks/backpropagation/backpropagation-content.md @@ -0,0 +1,379 @@ +--- +hero: + title: "Backpropagation" + subtitle: "The Algorithm That Enables Learning" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 18 min read" +--- + +# Backpropagation + +## What is Backpropagation? + +Backpropagation (short for "backward propagation of errors") is the algorithm used to **calculate gradients** of the loss function with respect to the weights. It works backward through the network, computing how much each weight contributed to the error. + +Think of it as **tracing blame backward** through the network! 
+ +![Backpropagation Overview](backprop-overview.png) + +## Why It Matters + +Without backpropagation: +- โŒ We couldn't efficiently train deep neural networks +- โŒ Would need to compute millions of partial derivatives manually +- โŒ Training would take forever + +With backpropagation: +- โœ… Efficiently computes all gradients in one backward pass +- โœ… Uses the chain rule to reuse computations +- โœ… Makes deep learning practical + +## The Core Idea + +The key insight is the **chain rule** from calculus: + +``` +If y = f(g(x)), then: +dy/dx = (dy/dg) ร— (dg/dx) +``` + +In a neural network with multiple layers, we chain these derivatives together: + +``` +โˆ‚L/โˆ‚wโฝยนโพ = (โˆ‚L/โˆ‚aโฝยณโพ) ร— (โˆ‚aโฝยณโพ/โˆ‚aโฝยฒโพ) ร— (โˆ‚aโฝยฒโพ/โˆ‚aโฝยนโพ) ร— (โˆ‚aโฝยนโพ/โˆ‚wโฝยนโพ) +``` + +## The Backpropagation Process + +### Step 1: Forward Pass +First, do a forward pass to get the prediction and cache all intermediate values: + +```python +# Forward pass (saving values for backprop) +z1 = W1 @ X + b1 +a1 = relu(z1) # Cache z1, a1 + +z2 = W2 @ a1 + b2 +a2 = sigmoid(z2) # Cache z2, a2 (prediction) + +# Compute loss +loss = (a2 - y)**2 # MSE loss +``` + +### Step 2: Output Layer Gradient +Calculate gradient at the output: + +```python +# For MSE loss: L = (ลท - y)ยฒ +dL_da2 = 2 * (a2 - y) + +# Gradient through sigmoid +da2_dz2 = a2 * (1 - a2) # sigmoid derivative + +# Combine using chain rule +dL_dz2 = dL_da2 * da2_dz2 +``` + +### Step 3: Propagate Backward +For each layer (from output to input): + +```python +# Gradients for layer 2 weights and bias +dL_dW2 = dL_dz2 @ a1.T +dL_db2 = dL_dz2 + +# Gradient flowing to previous layer +dL_da1 = W2.T @ dL_dz2 + +# Gradient through ReLU +da1_dz1 = (z1 > 0).astype(float) # ReLU derivative +dL_dz1 = dL_da1 * da1_dz1 + +# Gradients for layer 1 weights and bias +dL_dW1 = dL_dz1 @ X.T +dL_db1 = dL_dz1 +``` + +### Step 4: Update Weights +Use gradients to update parameters: + +```python +# Gradient descent +learning_rate = 0.01 + +W1 -= learning_rate * dL_dW1 +b1 -= learning_rate * dL_db1 +W2 -= learning_rate * dL_dW2 +b2 -= learning_rate * dL_db2 +``` + +![Backprop Steps](backprop-steps.png) + +## Detailed Example + +Let's work through a concrete example with numbers. + +### Setup +``` +Input: x = 2 +Target: y = 1 + +Network: +- Layer 1: 1 neuron, ReLU + W1 = 0.5, b1 = 0.1 +- Layer 2: 1 neuron, Sigmoid + W2 = 0.8, b2 = 0.2 + +Loss: MSE = (ลท - y)ยฒ +``` + +### Forward Pass +``` +Layer 1: +z1 = 0.5(2) + 0.1 = 1.1 +a1 = ReLU(1.1) = 1.1 + +Layer 2: +z2 = 0.8(1.1) + 0.2 = 1.08 +a2 = sigmoid(1.08) = 0.746 + +Loss: +L = (0.746 - 1)ยฒ = 0.0645 +``` + +### Backward Pass + +**Output Layer:** +``` +dL/da2 = 2(0.746 - 1) = -0.508 + +sigmoid'(z2) = a2(1 - a2) + = 0.746(1 - 0.746) = 0.189 + +dL/dz2 = -0.508 ร— 0.189 = -0.096 + +dL/dW2 = dL/dz2 ร— a1 = -0.096 ร— 1.1 = -0.106 +dL/db2 = dL/dz2 = -0.096 +``` + +**Hidden Layer:** +``` +dL/da1 = W2 ร— dL/dz2 + = 0.8 ร— (-0.096) = -0.077 + +ReLU'(z1) = 1 (since z1 = 1.1 > 0) + +dL/dz1 = -0.077 ร— 1 = -0.077 + +dL/dW1 = dL/dz1 ร— x = -0.077 ร— 2 = -0.154 +dL/db1 = dL/dz1 = -0.077 +``` + +### Update Weights (ฮฑ = 0.1) +``` +W1_new = 0.5 - 0.1(-0.154) = 0.515 +b1_new = 0.1 - 0.1(-0.077) = 0.108 +W2_new = 0.8 - 0.1(-0.106) = 0.811 +b2_new = 0.2 - 0.1(-0.096) = 0.210 +``` + +The weights moved in the direction to reduce the loss! 
โœ… + +## Activation Function Derivatives + +### ReLU +```python +def relu_derivative(z): + return (z > 0).astype(float) + +# Examples: +relu'(-1) = 0 +relu'(0) = 0 +relu'(1) = 1 +``` + +### Sigmoid +```python +def sigmoid_derivative(a): + # a is the sigmoid output + return a * (1 - a) + +# Examples: +# If sigmoid(z) = 0.7, then sigmoid'(z) = 0.7 ร— 0.3 = 0.21 +``` + +### Tanh +```python +def tanh_derivative(a): + # a is the tanh output + return 1 - a**2 + +# Examples: +# If tanh(z) = 0.5, then tanh'(z) = 1 - 0.25 = 0.75 +``` + +### Softmax (special case) +```python +# For softmax with cross-entropy loss, the gradient simplifies to: +dL/dz = a - y # where a is softmax output, y is one-hot label +``` + +## Loss Function Gradients + +### Mean Squared Error (MSE) +```python +# L = (ลท - y)ยฒ +dL/dลท = 2(ลท - y) +``` + +### Binary Cross-Entropy +```python +# L = -[y log(ลท) + (1-y)log(1-ลท)] +dL/dลท = -(y/ลท) + (1-y)/(1-ลท) + +# Simplified with sigmoid: dL/dz = ลท - y +``` + +### Categorical Cross-Entropy +```python +# L = -ฮฃ yแตข log(ลทแตข) +dL/dลทแตข = -yแตข/ลทแตข + +# Simplified with softmax: dL/dzแตข = ลทแตข - yแตข +``` + +## Matrix Form (Batch Processing) + +For a batch of examples: + +```python +# Forward pass +Z1 = X @ W1.T + b1 # (batch_size, hidden_dim) +A1 = relu(Z1) + +Z2 = A1 @ W2.T + b2 # (batch_size, output_dim) +A2 = sigmoid(Z2) + +# Loss (averaged over batch) +L = ((A2 - Y)**2).mean() + +# Backward pass +dL_dZ2 = (A2 - Y) / batch_size +dL_dW2 = dL_dZ2.T @ A1 +dL_db2 = dL_dZ2.sum(axis=0) + +dL_dA1 = dL_dZ2 @ W2 +dL_dZ1 = dL_dA1 * (Z1 > 0) +dL_dW1 = dL_dZ1.T @ X +dL_db1 = dL_dZ1.sum(axis=0) +``` + +![Matrix Backprop](matrix-backprop.png) + +## Common Challenges + +### 1. Vanishing Gradients + +**Problem:** Gradients become very small in deep networks + +``` +# With sigmoid, if all gradients are < 1: +grad = 0.25 ร— 0.25 ร— 0.25 ร— ... โ†’ โ‰ˆ 0 +``` + +**Solutions:** +- Use ReLU instead of sigmoid/tanh +- Batch normalization +- Residual connections (skip connections) +- Careful weight initialization + +### 2. Exploding Gradients + +**Problem:** Gradients become very large + +``` +# If weights are > 1: +grad = 2 ร— 2 ร— 2 ร— ... โ†’ โˆž +``` + +**Solutions:** +- Gradient clipping +- Smaller learning rate +- Better weight initialization + +### 3. Dead ReLU + +**Problem:** ReLU neurons output 0 for all inputs (gradient always 0) + +**Solutions:** +- Use Leaky ReLU or ELU +- Lower learning rate +- Better initialization + +## Computational Efficiency + +Why backpropagation is efficient: + +1. **Reuses Computations** + ``` + โˆ‚L/โˆ‚wโฝยนโพ needs โˆ‚L/โˆ‚aโฝยฒโพ + โˆ‚L/โˆ‚wโฝยฒโพ also needs โˆ‚L/โˆ‚aโฝยฒโพ + โ†’ Compute once, use twice! + ``` + +2. **One Backward Pass** + - Forward: O(n) operations + - Backward: O(n) operations + - Total: O(2n) โ‰ˆ O(n) + +3. **Automatic Differentiation** + - Modern frameworks (PyTorch, TensorFlow) do this automatically + - Just specify the loss, backprop is automatic! + +## PyTorch Example + +Here's how easy it is with PyTorch: + +```python +import torch +import torch.nn as nn + +# Define network +model = nn.Sequential( + nn.Linear(2, 3), + nn.ReLU(), + nn.Linear(3, 1), + nn.Sigmoid() +) + +# Forward pass +x = torch.tensor([[2.0, 3.0]]) +y = torch.tensor([[1.0]]) +y_pred = model(x) + +# Compute loss +loss = ((y_pred - y)**2).mean() + +# Backward pass (automatic!) 
+loss.backward() + +# Gradients are computed automatically +for name, param in model.named_parameters(): + print(f"{name}: {param.grad}") +``` + +## Key Takeaways + +โœ… Backpropagation efficiently computes gradients using the chain rule +โœ… It works backward from output to input layer +โœ… Each layer computes: gradients for weights + gradients for previous layer +โœ… Modern frameworks automate this process +โœ… Understanding it helps with debugging and designing better networks + +## What's Next? + +Now that we know how to compute gradients, we need to learn how to **use them effectively** to train neural networks. That's where **optimization algorithms** come in! + +Let's explore training and optimization next! ๐Ÿš€ + diff --git a/public/content/learn/neural-networks/building-a-layer/building-a-layer-content.md b/public/content/learn/neural-networks/building-a-layer/building-a-layer-content.md new file mode 100644 index 0000000..074071a --- /dev/null +++ b/public/content/learn/neural-networks/building-a-layer/building-a-layer-content.md @@ -0,0 +1,170 @@ +--- +hero: + title: "Building a Layer" + subtitle: "Creating Layers of Neurons" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 8 min read" +--- + +A layer is a collection of neurons that process inputs together. It's the fundamental unit of neural networks! + +![Layer Structure](/content/learn/neural-networks/building-a-layer/layer-structure.png) + +## What is a Layer? + +**Layer = Multiple neurons working in parallel** + +```python +import torch.nn as nn + +# Single neuron +neuron = nn.Linear(10, 1) # 10 inputs โ†’ 1 output + +# Layer of 5 neurons +layer = nn.Linear(10, 5) # 10 inputs โ†’ 5 outputs + +# Each output is from a different neuron! +``` + +## Creating a Layer + +```python +import torch +import torch.nn as nn + +# Create layer: 3 inputs โ†’ 4 outputs +layer = nn.Linear(in_features=3, out_features=4) + +# Test +x = torch.tensor([[1.0, 2.0, 3.0]]) # 1 sample, 3 features +output = layer(x) + +print(output.shape) # torch.Size([1, 4]) +print(output) +# tensor([[0.234, -1.123, 0.567, 2.134]], grad_fn=) +# 4 different outputs! 
+``` + +**What happened:** + +```yaml +4 neurons, each with: + - 3 weights (one per input) + - 1 bias + +Total parameters: 4ร—(3+1) = 16 parameters + +Each neuron computes: + neuron1: w1ยทx + b1 + neuron2: w2ยทx + b2 + neuron3: w3ยทx + b3 + neuron4: w4ยทx + b4 +``` + +## Layer with Activation + +```python +class LayerWithActivation(nn.Module): + def __init__(self, in_features, out_features): + super().__init__() + self.linear = nn.Linear(in_features, out_features) + self.activation = nn.ReLU() + + def forward(self, x): + return self.activation(self.linear(x)) + +# Use it +layer = LayerWithActivation(10, 20) +x = torch.randn(32, 10) # Batch of 32 +output = layer(x) + +print(output.shape) # torch.Size([32, 20]) +``` + +## Multiple Layers + +```python +# Stack layers together +model = nn.Sequential( + nn.Linear(784, 256), + nn.ReLU(), + + nn.Linear(256, 128), + nn.ReLU(), + + nn.Linear(128, 10) +) + +# Each layer transforms the data +x = torch.randn(1, 784) +print(x.shape) # torch.Size([1, 784]) + +x = model[0](x) # First linear +print(x.shape) # torch.Size([1, 256]) + +x = model[1](x) # ReLU +print(x.shape) # torch.Size([1, 256]) + +x = model[2](x) # Second linear +print(x.shape) # torch.Size([1, 128]) +``` + +## Custom Layer + +```python +class CustomLayer(nn.Module): + def __init__(self, in_dim, out_dim): + super().__init__() + self.linear = nn.Linear(in_dim, out_dim) + self.norm = nn.BatchNorm1d(out_dim) + self.activation = nn.ReLU() + self.dropout = nn.Dropout(0.2) + + def forward(self, x): + x = self.linear(x) + x = self.norm(x) + x = self.activation(x) + x = self.dropout(x) + return x + +# Use custom layer +layer = CustomLayer(100, 50) +x = torch.randn(32, 100) +output = layer(x) +print(output.shape) # torch.Size([32, 50]) +``` + +## Key Takeaways + +โœ“ **Layer = Multiple neurons:** Process inputs in parallel + +โœ“ **nn.Linear(in, out):** Creates a layer + +โœ“ **Add activation:** After linear transformation + +โœ“ **Stack layers:** Build deep networks + +โœ“ **Custom layers:** Combine multiple operations + +**Quick Reference:** + +```python +# Basic layer +layer = nn.Linear(input_dim, output_dim) + +# Layer with activation +layer = nn.Sequential( + nn.Linear(in_dim, out_dim), + nn.ReLU() +) + +# Multi-layer network +model = nn.Sequential( + nn.Linear(784, 128), + nn.ReLU(), + nn.Linear(128, 10) +) +``` + +**Remember:** Layers are just multiple neurons working together! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/building-a-layer/layer-structure.png b/public/content/learn/neural-networks/building-a-layer/layer-structure.png new file mode 100644 index 0000000..882eef3 Binary files /dev/null and b/public/content/learn/neural-networks/building-a-layer/layer-structure.png differ diff --git a/public/content/learn/neural-networks/calculating-gradients/calculating-gradients-content.md b/public/content/learn/neural-networks/calculating-gradients/calculating-gradients-content.md new file mode 100644 index 0000000..f9a1d82 --- /dev/null +++ b/public/content/learn/neural-networks/calculating-gradients/calculating-gradients-content.md @@ -0,0 +1,99 @@ +--- +hero: + title: "Calculating Gradients" + subtitle: "Understanding Gradient Computation" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 8 min read" +--- + +Gradients tell us **which direction** to adjust weights to reduce loss! + +## What is a Gradient? 
+ +**Gradient = Rate of change of loss with respect to a parameter** + +```python +import torch + +# Simple function: loss = wยฒ +w = torch.tensor([3.0], requires_grad=True) +loss = w ** 2 + +# Calculate gradient +loss.backward() + +print(f"Weight: {w.item()}") +print(f"Loss: {loss.item()}") +print(f"Gradient: {w.grad.item()}") + +# Gradient = 2w = 2ร—3 = 6 +# This tells us: increasing w increases loss +``` + +## Computing Gradients in PyTorch + +```python +import torch +import torch.nn as nn + +# Model +model = nn.Linear(3, 1) + +# Data +x = torch.tensor([[1.0, 2.0, 3.0]]) +y_true = torch.tensor([[5.0]]) + +# Forward pass +y_pred = model(x) +loss = (y_pred - y_true) ** 2 + +# Compute gradients +loss.backward() + +# Check gradients +print("Weight gradients:", model.weight.grad) +print("Bias gradient:", model.bias.grad) +``` + +## Gradient Descent Update + +```python +# Manual gradient descent +learning_rate = 0.01 + +with torch.no_grad(): + for param in model.parameters(): + # Update: param = param - lr * gradient + param -= learning_rate * param.grad + + # Reset gradient + param.grad.zero_() +``` + +## Key Takeaways + +โœ“ **Gradient:** Direction and magnitude of change + +โœ“ **`.backward()`:** Computes all gradients + +โœ“ **Automatic:** PyTorch calculates for you + +โœ“ **Update rule:** param -= lr * gradient + +**Quick Reference:** + +```python +# Compute gradients +loss.backward() + +# Access gradients +param.grad + +# Zero gradients +optimizer.zero_grad() +# or +param.grad.zero_() +``` + +**Remember:** Gradients point the way to better weights! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/forward-propagation/forward-propagation-content.md b/public/content/learn/neural-networks/forward-propagation/forward-propagation-content.md new file mode 100644 index 0000000..fdc24ff --- /dev/null +++ b/public/content/learn/neural-networks/forward-propagation/forward-propagation-content.md @@ -0,0 +1,303 @@ +--- +hero: + title: "Forward Propagation" + subtitle: "How Data Flows Through Neural Networks" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 13 min read" +--- + +# Forward Propagation + +## What is Forward Propagation? + +Forward propagation is the process of passing input data through the neural network to get an output (prediction). It's called **"forward"** because data moves in one direction: + +``` +Input Layer โ†’ Hidden Layers โ†’ Output Layer +``` + +This is how neural networks make predictions! + +![Forward Propagation Flow](forward-prop-diagram.png) + +## The Process Step by Step + +### Step 1: Input Layer +Receive the input features + +```python +# Example: Image of handwritten digit +x = [0.5, 0.8, 0.3, ...] # Pixel values +``` + +### Step 2: Weighted Sum +For each neuron in the next layer, calculate: + +``` +z = wโ‚xโ‚ + wโ‚‚xโ‚‚ + ... + wโ‚™xโ‚™ + b +``` + +Or in matrix form: +``` +Z = WX + b +``` + +Where: +- `W` = weight matrix +- `X` = input vector +- `b` = bias vector + +### Step 3: Activation Function +Apply non-linear activation: + +``` +a = ฯƒ(z) # e.g., ReLU(z) or sigmoid(z) +``` + +### Step 4: Repeat +Use the outputs as inputs for the next layer, repeat steps 2-3 until reaching the output layer. + +## Mathematical Representation + +For a network with L layers: + +``` +Layer 1: aโฝยนโพ = ฯƒ(Wโฝยนโพx + bโฝยนโพ) +Layer 2: aโฝยฒโพ = ฯƒ(Wโฝยฒโพaโฝยนโพ + bโฝยฒโพ) +... +Layer L: aโฝแดธโพ = ฯƒ(Wโฝแดธโพaโฝแดธโปยนโพ + bโฝแดธโพ) +``` + +The final output `aโฝแดธโพ` is our prediction! 
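+
+The same recurrence can also be written as a short loop. Here is a minimal sketch (a simplification: it applies ReLU at every layer, whereas the output layer usually gets its own activation, as discussed later; the shapes are arbitrary and chosen only for illustration):
+
+```python
+import numpy as np
+
+def relu(z):
+    return np.maximum(0, z)
+
+def forward(x, weights, biases):
+    """Compute a⁽ᴸ⁾ by repeatedly applying a⁽ˡ⁾ = σ(W⁽ˡ⁾a⁽ˡ⁻¹⁾ + b⁽ˡ⁾)."""
+    a = x
+    for W, b in zip(weights, biases):
+        z = W @ a + b   # weighted sum for this layer
+        a = relu(z)     # activation
+    return a            # final layer output = the prediction
+
+# Tiny example with random parameters: 4 inputs → 5 hidden → 2 outputs
+x = np.random.randn(4)
+weights = [np.random.randn(5, 4), np.random.randn(2, 5)]
+biases = [np.random.randn(5), np.random.randn(2)]
+print(forward(x, weights, biases))
+```
+
+Each pass through the loop matches one line of the equations above: a weighted sum followed by an activation.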
+ +## Simple Example: 2-Layer Network + +Let's walk through a tiny network: + +**Network Architecture:** +- Input: 2 features +- Hidden layer: 3 neurons (ReLU) +- Output: 1 neuron (Sigmoid) + +### Given: +``` +Input: x = [2, 3] + +Hidden layer weights: +Wโฝยนโพ = [[0.5, 0.3], + [0.2, 0.8], + [0.1, 0.6]] + +Hidden layer bias: bโฝยนโพ = [0.1, 0.2, 0.3] + +Output layer weights: Wโฝยฒโพ = [[0.4, 0.5, 0.6]] +Output layer bias: bโฝยฒโพ = [0.1] +``` + +### Step-by-Step Calculation: + +**Hidden Layer (Layer 1):** + +Neuron 1: +``` +zโ‚โฝยนโพ = 0.5(2) + 0.3(3) + 0.1 = 2.0 +aโ‚โฝยนโพ = ReLU(2.0) = 2.0 +``` + +Neuron 2: +``` +zโ‚‚โฝยนโพ = 0.2(2) + 0.8(3) + 0.2 = 3.0 +aโ‚‚โฝยนโพ = ReLU(3.0) = 3.0 +``` + +Neuron 3: +``` +zโ‚ƒโฝยนโพ = 0.1(2) + 0.6(3) + 0.3 = 2.3 +aโ‚ƒโฝยนโพ = ReLU(2.3) = 2.3 +``` + +Hidden layer output: `aโฝยนโพ = [2.0, 3.0, 2.3]` + +**Output Layer (Layer 2):** +``` +zโฝยฒโพ = 0.4(2.0) + 0.5(3.0) + 0.6(2.3) + 0.1 = 3.68 +aโฝยฒโพ = sigmoid(3.68) โ‰ˆ 0.975 +``` + +**Final Prediction: 0.975** (97.5% probability for class 1) + +![Example Network](forward-example.png) + +## Matrix Operations (Vectorized) + +For efficiency, we compute for all neurons at once: + +### Layer 1: +```python +import numpy as np + +# Input +X = np.array([2, 3]) + +# Layer 1 +W1 = np.array([[0.5, 0.3], + [0.2, 0.8], + [0.1, 0.6]]) +b1 = np.array([0.1, 0.2, 0.3]) + +Z1 = W1 @ X + b1 # Matrix multiplication +A1 = np.maximum(0, Z1) # ReLU + +# Layer 2 +W2 = np.array([[0.4, 0.5, 0.6]]) +b2 = np.array([0.1]) + +Z2 = W2 @ A1 + b2 +A2 = 1 / (1 + np.exp(-Z2)) # Sigmoid + +print(f"Prediction: {A2[0]:.3f}") +# Output: Prediction: 0.975 +``` + +## Batch Processing + +In practice, we process **multiple examples** simultaneously: + +```python +# Batch of 3 examples +X = np.array([[2, 3], + [1, 4], + [3, 2]]) # Shape: (3, 2) + +# Forward pass +Z1 = X @ W1.T + b1 # Broadcasting handles bias +A1 = np.maximum(0, Z1) + +Z2 = A1 @ W2.T + b2 +A2 = 1 / (1 + np.exp(-Z2)) + +print(A2.shape) # (3, 1) - predictions for 3 examples +``` + +## Activation Functions in Action + +Different activation functions transform data differently: + +### ReLU +```python +def relu(z): + return np.maximum(0, z) + +# Keeps positive values, zeros out negative +relu([-2, -1, 0, 1, 2]) # [0, 0, 0, 1, 2] +``` + +### Sigmoid +```python +def sigmoid(z): + return 1 / (1 + np.exp(-z)) + +# Squashes to (0, 1) +sigmoid([-2, 0, 2]) # [0.119, 0.5, 0.881] +``` + +### Tanh +```python +def tanh(z): + return np.tanh(z) + +# Squashes to (-1, 1) +tanh([-2, 0, 2]) # [-0.964, 0, 0.964] +``` + +![Activation Functions](activations-comparison.png) + +## Common Patterns + +### Classification (Softmax Output) +For multi-class classification, use softmax in the output layer: + +```python +def softmax(z): + exp_z = np.exp(z - np.max(z)) # Numerical stability + return exp_z / exp_z.sum() + +# Example: 3-class classification +logits = np.array([2.0, 1.0, 0.1]) +probs = softmax(logits) +# [0.659, 0.242, 0.099] - probabilities sum to 1 +``` + +### Regression (Linear Output) +For regression, no activation in output layer: + +```python +# Final layer for regression +output = W_last @ a_last + b_last +# No activation - can output any real number +``` + +## Key Properties + +### Deterministic +Same input + same weights = same output every time + +### Differentiable +We can compute gradients (needed for backpropagation) + +### Composable +Output of one layer is input to next - function composition + +### Efficient +Matrix operations are highly optimized (GPUs!) 
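+
+To see why the efficiency point matters in practice, here is a rough sketch (array sizes are arbitrary, picked only for this comparison) contrasting a plain Python loop with a single vectorized matrix multiplication for one layer's weighted sums:
+
+```python
+import numpy as np
+import time
+
+X = np.random.randn(1000, 512)   # batch of 1000 examples, 512 features
+W = np.random.randn(256, 512)    # 256 neurons in this layer
+b = np.random.randn(256)
+
+# Loop version: one neuron and one example at a time
+start = time.time()
+Z_loop = np.empty((1000, 256))
+for i in range(1000):
+    for j in range(256):
+        Z_loop[i, j] = W[j] @ X[i] + b[j]
+loop_time = time.time() - start
+
+# Vectorized version: one matrix multiplication for the whole batch
+start = time.time()
+Z_vec = X @ W.T + b
+vec_time = time.time() - start
+
+print(f"Loop:       {loop_time:.4f}s")
+print(f"Vectorized: {vec_time:.4f}s")
+print("Same result:", np.allclose(Z_loop, Z_vec))
+```
+
+On typical hardware the vectorized version is usually dramatically faster, which is why the implementations in this lesson all use matrix form.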
+ +## Debugging Forward Pass + +Common issues and solutions: + +### 1. Shape Mismatches +```python +# Check shapes at each layer +print(f"Input shape: {X.shape}") +print(f"W1 shape: {W1.shape}") +print(f"Z1 shape: {Z1.shape}") +``` + +### 2. Numerical Overflow +```python +# For sigmoid/softmax, use numerical stability tricks +# Bad: exp(x) / sum(exp(x)) +# Good: exp(x - max(x)) / sum(exp(x - max(x))) +``` + +### 3. Wrong Activation +```python +# Make sure you use the right activation for each layer +# Hidden: ReLU, Tanh +# Output (classification): Sigmoid (binary), Softmax (multi-class) +# Output (regression): None (linear) +``` + +## Implementation Tips + +โœ… Use vectorized operations (NumPy/PyTorch) +โœ… Process data in batches for efficiency +โœ… Cache intermediate values (needed for backprop) +โœ… Add assertions to check shapes +โœ… Normalize inputs for stable training + +## What We've Learned + +๐ŸŽฏ Forward propagation transforms inputs into predictions +๐ŸŽฏ It's a series of weighted sums + activations +๐ŸŽฏ Matrix operations make it efficient +๐ŸŽฏ Different activations serve different purposes +๐ŸŽฏ The process is deterministic and differentiable + +## Next Steps + +Forward propagation gets us predictions, but how does the network **learn**? That's where **backpropagation** comes in! It calculates how to adjust the weights to improve predictions. + +Let's dive into backpropagation next! ๐ŸŽ“ + diff --git a/public/content/learn/neural-networks/implementing-a-network/implementing-a-network-content.md b/public/content/learn/neural-networks/implementing-a-network/implementing-a-network-content.md new file mode 100644 index 0000000..51d57bb --- /dev/null +++ b/public/content/learn/neural-networks/implementing-a-network/implementing-a-network-content.md @@ -0,0 +1,215 @@ +--- +hero: + title: "Implementing a Network" + subtitle: "Building Complete Neural Networks in PyTorch" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 10 min read" +--- + +Let's build complete, working neural networks from scratch! + +## Simple Feedforward Network + +```python +import torch +import torch.nn as nn + +class FeedForwardNet(nn.Module): + def __init__(self, input_size, hidden_size, output_size): + super().__init__() + self.fc1 = nn.Linear(input_size, hidden_size) + self.fc2 = nn.Linear(hidden_size, output_size) + + def forward(self, x): + x = torch.relu(self.fc1(x)) + x = self.fc2(x) + return x + +# Create network +model = FeedForwardNet(input_size=784, hidden_size=128, output_size=10) + +# Test +x = torch.randn(32, 784) +output = model(x) +print(output.shape) # torch.Size([32, 10]) +``` + +## Complete Training Pipeline + +```python +import torch +import torch.nn as nn +import torch.optim as optim + +# 1. Define model +class Net(nn.Module): + def __init__(self): + super().__init__() + self.layers = nn.Sequential( + nn.Linear(10, 20), + nn.ReLU(), + nn.Linear(20, 1) + ) + + def forward(self, x): + return self.layers(x) + +# 2. Create model, loss, optimizer +model = Net() +criterion = nn.MSELoss() +optimizer = optim.Adam(model.parameters(), lr=0.001) + +# 3. Training loop +def train(model, X_train, y_train, epochs=100): + for epoch in range(epochs): + # Forward + predictions = model(X_train) + loss = criterion(predictions, y_train) + + # Backward + optimizer.zero_grad() + loss.backward() + optimizer.step() + + if epoch % 20 == 0: + print(f"Epoch {epoch}, Loss: {loss.item():.4f}") + + return model + +# 4. 
Train +X = torch.randn(100, 10) +y = torch.randn(100, 1) +trained_model = train(model, X, y) +``` + +## Multi-Layer Deep Network + +```python +class DeepNet(nn.Module): + def __init__(self): + super().__init__() + self.layer1 = nn.Linear(784, 512) + self.layer2 = nn.Linear(512, 256) + self.layer3 = nn.Linear(256, 128) + self.layer4 = nn.Linear(128, 10) + + self.dropout = nn.Dropout(0.2) + + def forward(self, x): + x = torch.relu(self.layer1(x)) + x = self.dropout(x) + + x = torch.relu(self.layer2(x)) + x = self.dropout(x) + + x = torch.relu(self.layer3(x)) + x = self.dropout(x) + + x = self.layer4(x) + return x + +model = DeepNet() +``` + +## Complete MNIST Example + +```python +import torch +import torch.nn as nn +import torch.optim as optim +from torch.utils.data import DataLoader, TensorDataset + +class MNISTNet(nn.Module): + def __init__(self): + super().__init__() + self.network = nn.Sequential( + nn.Linear(784, 128), + nn.ReLU(), + nn.Dropout(0.2), + nn.Linear(128, 64), + nn.ReLU(), + nn.Dropout(0.2), + nn.Linear(64, 10) + ) + + def forward(self, x): + x = x.view(-1, 784) # Flatten + return self.network(x) + +# Create model +model = MNISTNet() +criterion = nn.CrossEntropyLoss() +optimizer = optim.Adam(model.parameters(), lr=0.001) + +# Training function +def train_epoch(model, dataloader, criterion, optimizer): + model.train() + total_loss = 0 + + for batch_x, batch_y in dataloader: + # Forward + outputs = model(batch_x) + loss = criterion(outputs, batch_y) + + # Backward + optimizer.zero_grad() + loss.backward() + optimizer.step() + + total_loss += loss.item() + + return total_loss / len(dataloader) + +# Evaluation function +def evaluate(model, dataloader): + model.eval() + correct = 0 + total = 0 + + with torch.no_grad(): + for batch_x, batch_y in dataloader: + outputs = model(batch_x) + predictions = torch.argmax(outputs, dim=1) + correct += (predictions == batch_y).sum().item() + total += batch_y.size(0) + + return correct / total +``` + +## Key Takeaways + +โœ“ **Structure:** Define model as `nn.Module` + +โœ“ **Forward:** Implement `forward()` method + +โœ“ **Training:** Forward โ†’ loss โ†’ backward โ†’ update + +โœ“ **Complete pipeline:** Model + criterion + optimizer + +**Quick Reference:** + +```python +# Define +class MyNet(nn.Module): + def __init__(self): + super().__init__() + self.layers = nn.Sequential(...) + + def forward(self, x): + return self.layers(x) + +# Train +model = MyNet() +optimizer = optim.Adam(model.parameters()) +criterion = nn.CrossEntropyLoss() + +for epoch in range(epochs): + pred = model(x) + loss = criterion(pred, y) + optimizer.zero_grad() + loss.backward() + optimizer.step() +``` + +**Remember:** You can now build any neural network! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/implementing-backpropagation/implementing-backpropagation-content.md b/public/content/learn/neural-networks/implementing-backpropagation/implementing-backpropagation-content.md new file mode 100644 index 0000000..289bdee --- /dev/null +++ b/public/content/learn/neural-networks/implementing-backpropagation/implementing-backpropagation-content.md @@ -0,0 +1,97 @@ +--- +hero: + title: "Implementing Backpropagation" + subtitle: "Coding the Backward Pass" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 10 min read" +--- + +Backpropagation is how neural networks **learn**. It calculates gradients for all weights efficiently! + +## The Algorithm + +**Backpropagation:** +1. Forward pass: Compute predictions +2. Compute loss +3. 
Backward pass: Compute gradients (chain rule) +4. Update weights + +```python +import torch +import torch.nn as nn + +model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1)) +optimizer = torch.optim.SGD(model.parameters(), lr=0.01) +criterion = nn.MSELoss() + +# Training step +def train_step(x, y): + # 1. Forward pass + predictions = model(x) + + # 2. Compute loss + loss = criterion(predictions, y) + + # 3. Backward pass (backpropagation!) + optimizer.zero_grad() + loss.backward() + + # 4. Update weights + optimizer.step() + + return loss.item() + +# Train +x = torch.randn(32, 10) +y = torch.randn(32, 1) +loss = train_step(x, y) +print(f"Loss: {loss:.4f}") +``` + +## Manual Backpropagation + +```python +import torch + +# Simple network: y = w2 * relu(w1 * x) +x = torch.tensor([2.0], requires_grad=True) +w1 = torch.tensor([0.5], requires_grad=True) +w2 = torch.tensor([1.5], requires_grad=True) + +# Forward +z1 = w1 * x +a1 = torch.relu(z1) +y = w2 * a1 + +# Target +target = torch.tensor([3.0]) +loss = (y - target) ** 2 + +# Backward (automatic) +loss.backward() + +print(f"dL/dw1: {w1.grad.item()}") +print(f"dL/dw2: {a1.item()}") +``` + +## Key Takeaways + +โœ“ **Backprop:** Efficiently computes all gradients + +โœ“ **Chain rule:** Applied automatically by PyTorch + +โœ“ **Three steps:** forward โ†’ backward โ†’ update + +โœ“ **`.backward()`:** Does all the work! + +**Quick Reference:** + +```python +# Standard training step +optimizer.zero_grad() # Clear old gradients +loss.backward() # Compute gradients +optimizer.step() # Update weights +``` + +**Remember:** Backpropagation = automatic gradient calculation! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/introduction/introduction-content.md b/public/content/learn/neural-networks/introduction/introduction-content.md new file mode 100644 index 0000000..206e239 --- /dev/null +++ b/public/content/learn/neural-networks/introduction/introduction-content.md @@ -0,0 +1,207 @@ +--- +hero: + title: "Introduction to Neural Networks" + subtitle: "Building Intelligent Systems from Scratch" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 15 min read" +--- + +# Introduction to Neural Networks + +## What is a Neural Network? + +A neural network is a **computational model** inspired by the way biological neural networks in the human brain work. It consists of interconnected nodes (neurons) organized in layers that process information. + +Think of it as a **function approximator** that learns patterns from data! + +![Neural Network Architecture](neural-network-diagram.png) + +## The Biological Inspiration + +Just like neurons in your brain: +- Receive signals from multiple sources (dendrites) +- Process the information (cell body) +- Fire a signal if threshold is exceeded (axon) + +Artificial neurons work similarly: +- Receive weighted inputs from previous layer +- Sum them up and add bias +- Apply activation function +- Send output to next layer + +## Basic Architecture + +A typical neural network has **three types of layers**: + +### Input Layer +- Receives the raw data (features) +- One neuron per feature +- No computation happens here + +**Example:** For a 28x28 grayscale image: 784 input neurons (28 ร— 28) + +### Hidden Layer(s) +- Performs computations +- Extracts features from the data +- Can have multiple hidden layers (deep learning!) 
+ +**The more layers:** +- More complex patterns can be learned +- But also harder to train + +### Output Layer +- Produces the final prediction +- Number of neurons depends on the task: + - 1 neuron: binary classification or regression + - N neurons: N-class classification + +![Layer Types](layer-types.png) + +## How Does a Single Neuron Work? + +Each neuron performs a simple operation: + +``` +1. Weighted Sum: z = wโ‚xโ‚ + wโ‚‚xโ‚‚ + ... + wโ‚™xโ‚™ + b +2. Activation: a = ฯƒ(z) +3. Output: a becomes input to next layer +``` + +**Example calculation:** +``` +Inputs: x = [2, 3] +Weights: w = [0.5, 0.3] +Bias: b = 0.1 + +Step 1: z = 0.5(2) + 0.3(3) + 0.1 = 2.0 +Step 2: a = ReLU(2.0) = 2.0 +Step 3: Output = 2.0 +``` + +## The Learning Process + +Neural networks learn through **supervised learning**: + +### 1. Initialize +Start with random weights and biases + +### 2. Forward Pass +Pass data through the network to get predictions + +### 3. Calculate Loss +Measure how wrong the predictions are + +``` +Loss = (prediction - actual)ยฒ +``` + +### 4. Backward Pass (Backpropagation) +Calculate gradients: how much each weight contributed to the error + +### 5. Update Weights +Adjust weights in the direction that reduces loss + +``` +w_new = w_old - learning_rate ร— gradient +``` + +### 6. Repeat +Do this for many iterations (epochs) until the model performs well + +![Training Process](training-process.png) + +## Types of Neural Networks + +### Feedforward Neural Networks (FNN) +- Information flows in one direction: input โ†’ hidden โ†’ output +- Used for: tabular data, simple classification + +### Convolutional Neural Networks (CNN) +- Specialized for image data +- Uses filters to detect features +- Used for: computer vision, image classification + +### Recurrent Neural Networks (RNN) +- Has memory of previous inputs +- Used for: time series, text, speech + +### Transformers +- Attention mechanism +- Used for: language models (GPT, BERT), machine translation + +## Real-World Applications + +| Domain | Application | Network Type | +|--------|------------|--------------| +| ๐Ÿ–ผ๏ธ Computer Vision | Image classification, object detection | CNN | +| ๐Ÿ’ฌ NLP | Chatbots, translation, text generation | Transformer | +| ๐ŸŽต Audio | Speech recognition, music generation | RNN, Transformer | +| ๐ŸŽฎ Gaming | Game AI, reinforcement learning | Deep Q-Networks | +| ๐Ÿฅ Healthcare | Disease diagnosis, drug discovery | CNN, FNN | +| ๐Ÿ’ฐ Finance | Fraud detection, stock prediction | FNN, LSTM | + +## Why Neural Networks Work + +### Universal Approximation Theorem +With enough neurons and the right activation functions, a neural network can approximate **any continuous function**! + +### Feature Learning +Unlike traditional ML, neural networks **automatically learn** the important features from raw data. No manual feature engineering needed! + +### Scalability +Neural networks get better with: +- More data +- More compute +- Better architectures + +![Network Depth vs Performance](depth-vs-performance.png) + +## Key Components Summary + +| Component | Purpose | +|-----------|---------| +| **Weights (w)** | Parameters to learn, control signal strength | +| **Bias (b)** | Shifts the activation function | +| **Activation Function** | Introduces non-linearity | +| **Loss Function** | Measures prediction error | +| **Optimizer** | Updates weights to minimize loss | + +## Challenges and Solutions + +### 1. 
Overfitting +**Problem:** Model memorizes training data +**Solution:** Dropout, regularization, more data + +### 2. Vanishing Gradients +**Problem:** Gradients become too small in deep networks +**Solution:** ReLU activation, batch normalization, skip connections + +### 3. Slow Training +**Problem:** Takes too long to converge +**Solution:** Better optimizers (Adam), GPU acceleration, batch processing + +### 4. Need Lots of Data +**Problem:** Neural networks are data-hungry +**Solution:** Transfer learning, data augmentation, synthetic data + +## Getting Started Checklist + +Before building your first neural network, you should understand: + +- โœ… Linear algebra (matrices, vectors) +- โœ… Calculus (derivatives, chain rule) +- โœ… Probability basics +- โœ… Programming (Python recommended) +- โœ… Framework basics (PyTorch or TensorFlow) + +## What's Next? + +Now that you understand the basics, we'll dive deeper into: + +1. **Forward Propagation** - How data flows through the network +2. **Backpropagation** - How the network learns +3. **Training & Optimization** - How to train networks effectively + +Let's continue the journey! ๐Ÿš€ + diff --git a/public/content/learn/neural-networks/the-chain-rule/the-chain-rule-content.md b/public/content/learn/neural-networks/the-chain-rule/the-chain-rule-content.md new file mode 100644 index 0000000..a90a5bc --- /dev/null +++ b/public/content/learn/neural-networks/the-chain-rule/the-chain-rule-content.md @@ -0,0 +1,132 @@ +--- +hero: + title: "The Chain Rule" + subtitle: "The Math Behind Backpropagation" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 8 min read" +--- + +The chain rule is how we calculate gradients through multiple layers. It's the secret sauce of backpropagation! + +## The Basic Idea + +**Chain rule: Multiply gradients as you go backwards through layers** + +```yaml +If y = f(g(x)), then: +dy/dx = (dy/dg) ร— (dg/dx) + +In words: Multiply the gradients of each function +``` + +## Simple Example + +```python +import torch + +# y = (x + 2)ยฒ +x = torch.tensor([3.0], requires_grad=True) + +# Break it down: +# g = x + 2 +# y = gยฒ + +g = x + 2 +y = g ** 2 + +# Backward pass +y.backward() + +print(f"x = {x.item()}") +print(f"g = {g.item()}") +print(f"y = {y.item()}") +print(f"dy/dx = {x.grad.item()}") + +# Manual: +# dy/dg = 2g = 2ร—5 = 10 +# dg/dx = 1 +# dy/dx = 10ร—1 = 10 โœ“ +``` + +## In Neural Networks + +```python +import torch +import torch.nn as nn + +# Two-layer network +model = nn.Sequential( + nn.Linear(1, 1), # Layer 1 + nn.ReLU(), + nn.Linear(1, 1) # Layer 2 +) + +x = torch.tensor([[2.0]]) +y_true = torch.tensor([[10.0]]) + +# Forward +y_pred = model(x) +loss = (y_pred - y_true) ** 2 + +# Backward (chain rule applied automatically!) +loss.backward() + +# Gradients computed through both layers +for name, param in model.named_parameters(): + print(f"{name}: gradient = {param.grad}") +``` + +**What happens:** + +```yaml +Forward: + x โ†’ Layer1 โ†’ ReLU โ†’ Layer2 โ†’ prediction โ†’ loss + +Backward (chain rule): + dloss/dprediction โ†’ dLayer2 โ†’ dReLU โ†’ dLayer1 โ†’ dx + +Each gradient multiplies with the next! +``` + +## Why It Works + +```yaml +Loss depends on layer 2 output +Layer 2 output depends on ReLU output +ReLU output depends on layer 1 output +Layer 1 output depends on weights + +So: Loss depends on weights (through chain)! 
+ +Chain rule connects them: +dLoss/dWeight = dLoss/dOutput ร— dOutput/dWeight +``` + +## PyTorch Does It For You + +```python +import torch + +# Complex computation +x = torch.tensor([2.0], requires_grad=True) +y = ((x ** 2 + 3) * torch.sin(x)) ** 3 + +# PyTorch applies chain rule automatically! +y.backward() + +print(f"Gradient: {x.grad.item()}") +# Calculated using chain rule through all operations! +``` + +## Key Takeaways + +โœ“ **Chain rule:** Multiply gradients backwards + +โœ“ **Backpropagation:** Applies chain rule through network + +โœ“ **Automatic:** PyTorch does it for you + +โœ“ **Essential:** Makes training deep networks possible + +**Remember:** Chain rule lets us train deep networks by connecting all the gradients! ๐ŸŽ‰ diff --git a/public/content/learn/neural-networks/training/training-content.md b/public/content/learn/neural-networks/training/training-content.md new file mode 100644 index 0000000..5bb2d2a --- /dev/null +++ b/public/content/learn/neural-networks/training/training-content.md @@ -0,0 +1,581 @@ +--- +hero: + title: "Training & Optimization" + subtitle: "Making Neural Networks Learn Effectively" + tags: + - "๐Ÿง  Neural Networks" + - "โฑ๏ธ 16 min read" +--- + +# Training & Optimization + +## The Training Process + +Training a neural network is an iterative process of adjusting weights to minimize the loss function. The goal is to find the optimal set of parameters that make accurate predictions on both **training and unseen data**. + +![Training Loop](training-loop.png) + +## Gradient Descent: The Foundation + +Gradient descent is the fundamental optimization algorithm: + +``` +1. Start with random weights +2. Calculate loss on data +3. Compute gradients (how to adjust weights) +4. Update weights in opposite direction of gradient +5. Repeat until convergence +``` + +### Mathematical Formula + +``` +ฮธ_new = ฮธ_old - ฮฑ ยท โˆ‡L(ฮธ) +``` + +Where: +- `ฮธ` = parameters (weights and biases) +- `ฮฑ` = learning rate +- `โˆ‡L` = gradient of loss + +Think of it as **rolling down a hill** to find the lowest point (minimum loss)! + +![Gradient Descent](gradient-descent.png) + +## Variants of Gradient Descent + +### 1. Batch Gradient Descent + +Uses **entire dataset** for each update: + +```python +for epoch in range(num_epochs): + # Compute loss on ALL data + predictions = forward_pass(X_train, weights) + loss = compute_loss(predictions, y_train) + + # Compute gradients using all data + gradients = backward_pass(X_train, y_train, weights) + + # Single update per epoch + weights -= learning_rate * gradients +``` + +**Pros:** +- โœ… Stable updates +- โœ… Guaranteed convergence (for convex problems) + +**Cons:** +- โŒ Very slow for large datasets +- โŒ Requires entire dataset in memory +- โŒ Can get stuck in local minima + +### 2. Stochastic Gradient Descent (SGD) + +Updates weights after **each training example**: + +```python +for epoch in range(num_epochs): + # Shuffle data + indices = np.random.permutation(len(X_train)) + + for i in indices: + # Use single example + x, y = X_train[i], y_train[i] + + prediction = forward_pass(x, weights) + loss = compute_loss(prediction, y) + gradients = backward_pass(x, y, weights) + + # Update after each example + weights -= learning_rate * gradients +``` + +**Pros:** +- โœ… Much faster iterations +- โœ… Can escape local minima (noise helps!) +- โœ… Works with large datasets + +**Cons:** +- โŒ Noisy updates +- โŒ Can oscillate around minimum +- โŒ Harder to parallelize + +### 3. 
Mini-Batch Gradient Descent โญ (Most Popular) + +Best of both worlds! Uses **small batches** (32, 64, 128, 256): + +```python +batch_size = 64 + +for epoch in range(num_epochs): + # Shuffle data + indices = np.random.permutation(len(X_train)) + + for i in range(0, len(X_train), batch_size): + # Get batch + batch_indices = indices[i:i+batch_size] + X_batch = X_train[batch_indices] + y_batch = y_train[batch_indices] + + # Forward pass on batch + predictions = forward_pass(X_batch, weights) + loss = compute_loss(predictions, y_batch) + + # Backward pass on batch + gradients = backward_pass(X_batch, y_batch, weights) + + # Update weights + weights -= learning_rate * gradients +``` + +**Pros:** +- โœ… Good balance between speed and stability +- โœ… Efficient GPU utilization +- โœ… More stable than SGD +- โœ… Faster than batch GD + +**Cons:** +- โŒ One more hyperparameter (batch size) + +![GD Variants Comparison](gd-variants.png) + +## Advanced Optimizers + +### 1. Momentum ๐Ÿƒ + +Accumulates a **velocity** term to accelerate in consistent directions: + +```python +velocity = 0 +beta = 0.9 # momentum coefficient + +for epoch in range(num_epochs): + gradients = compute_gradients() + + # Update velocity + velocity = beta * velocity + (1 - beta) * gradients + + # Update weights using velocity + weights -= learning_rate * velocity +``` + +**Why it works:** +- Accelerates in valleys +- Dampens oscillations +- Helps escape plateaus + +**Analogy:** A ball rolling down a hill gains momentum! + +### 2. RMSprop + +Adapts learning rate **per parameter** based on recent gradients: + +```python +cache = 0 +beta = 0.9 + +for epoch in range(num_epochs): + gradients = compute_gradients() + + # Update cache (exponential moving average of squared gradients) + cache = beta * cache + (1 - beta) * gradients**2 + + # Update weights with adaptive learning rate + weights -= learning_rate * gradients / (np.sqrt(cache) + 1e-8) +``` + +**Why it works:** +- Different learning rates for each parameter +- Larger steps for parameters with small gradients +- Smaller steps for parameters with large gradients + +**Great for:** Recurrent neural networks + +### 3. Adam (Adaptive Moment Estimation) โญ (Most Popular) + +Combines **momentum** and **RMSprop**: + +```python +m = 0 # First moment (mean) +v = 0 # Second moment (variance) +beta1 = 0.9 +beta2 = 0.999 + +for epoch in range(num_epochs): + gradients = compute_gradients() + + # Update moments + m = beta1 * m + (1 - beta1) * gradients + v = beta2 * v + (1 - beta2) * gradients**2 + + # Bias correction + m_hat = m / (1 - beta1**epoch) + v_hat = v / (1 - beta2**epoch) + + # Update weights + weights -= learning_rate * m_hat / (np.sqrt(v_hat) + 1e-8) +``` + +**Why it works:** +- Combines best of momentum and RMSprop +- Adaptive learning rates +- Bias correction for early iterations +- Works well in practice + +**Default choice** for most deep learning tasks! + +![Optimizers Comparison](optimizers-comparison.png) + +## Learning Rate Strategies + +### 1. Fixed Learning Rate +```python +learning_rate = 0.001 # Constant throughout training +``` + +Simple but often suboptimal. + +### 2. Learning Rate Decay + +Gradually reduce learning rate: + +```python +# Step decay +initial_lr = 0.01 +drop_rate = 0.5 +epochs_drop = 10 + +lr = initial_lr * (drop_rate ** (epoch // epochs_drop)) + +# Exponential decay +lr = initial_lr * np.exp(-decay_rate * epoch) + +# 1/t decay +lr = initial_lr / (1 + decay_rate * epoch) +``` + +### 3. 
Learning Rate Scheduling + +```python +# Cosine annealing +import math + +def cosine_schedule(epoch, total_epochs, lr_max, lr_min=0): + return lr_min + 0.5 * (lr_max - lr_min) * ( + 1 + math.cos(math.pi * epoch / total_epochs) + ) +``` + +### 4. Warm-up + Decay + +```python +def lr_schedule(epoch, warmup_epochs=5, initial_lr=0.001): + if epoch < warmup_epochs: + # Linear warm-up + return initial_lr * (epoch / warmup_epochs) + else: + # Cosine decay + return cosine_schedule( + epoch - warmup_epochs, + total_epochs - warmup_epochs, + initial_lr + ) +``` + +![Learning Rate Schedules](lr-schedules.png) + +## Key Hyperparameters + +### 1. Learning Rate (ฮฑ) + +**Most important hyperparameter!** + +```python +# Too high: divergence +lr = 1.0 # Loss explodes โŒ + +# Too low: very slow training +lr = 0.00001 # Takes forever โŒ + +# Just right: fast and stable +lr = 0.001 # Good starting point โœ… +``` + +**Finding the right learning rate:** +- Start with 0.001 or 0.0001 +- Use learning rate finder +- Monitor training loss + +### 2. Batch Size + +```python +# Small batches (8-32) +# + More noise โ†’ can escape local minima +# - Slower, less stable + +# Medium batches (64-128) โญ +# + Good balance +# + Efficient GPU usage + +# Large batches (256-1024) +# + Faster training (fewer updates) +# + More stable +# - Can lead to poor generalization +# - Requires more memory +``` + +**Rule of thumb:** Start with 32 or 64 + +### 3. Number of Epochs + +```python +# Too few epochs +epochs = 5 # Underfitting โŒ + +# Too many epochs +epochs = 1000 # Overfitting โŒ + +# Use early stopping โœ… +best_loss = float('inf') +patience = 10 +counter = 0 + +for epoch in range(max_epochs): + val_loss = validate() + + if val_loss < best_loss: + best_loss = val_loss + counter = 0 + save_model() + else: + counter += 1 + + if counter >= patience: + print("Early stopping!") + break +``` + +### 4. Optimizer Parameters + +```python +# Adam parameters +optimizer = Adam( + learning_rate=0.001, # Step size + beta1=0.9, # Momentum decay (usually 0.9) + beta2=0.999, # RMSprop decay (usually 0.999) + epsilon=1e-8 # Numerical stability +) + +# SGD with momentum +optimizer = SGD( + learning_rate=0.01, + momentum=0.9 # Usually 0.9 or 0.95 +) +``` + +## Training Best Practices + +### 1. Data Preparation +```python +# Normalize inputs +X = (X - X.mean()) / X.std() + +# Or use min-max scaling +X = (X - X.min()) / (X.max() - X.min()) +``` + +### 2. Weight Initialization +```python +# Xavier/Glorot initialization (for sigmoid/tanh) +W = np.random.randn(n_in, n_out) * np.sqrt(1 / n_in) + +# He initialization (for ReLU) +W = np.random.randn(n_in, n_out) * np.sqrt(2 / n_in) +``` + +### 3. Regularization +```python +# L2 regularization (weight decay) +loss = mse_loss + lambda_reg * np.sum(weights**2) + +# Dropout (randomly zero out neurons) +if training: + mask = (np.random.rand(*activations.shape) > dropout_rate) + activations = activations * mask / (1 - dropout_rate) +``` + +### 4. Batch Normalization +```python +# Normalize activations in each layer +z_norm = (z - z.mean()) / np.sqrt(z.var() + epsilon) +z_scaled = gamma * z_norm + beta # Learnable parameters +``` + +### 5. 
Monitoring Training + +```python +history = { + 'train_loss': [], + 'val_loss': [], + 'train_acc': [], + 'val_acc': [] +} + +for epoch in range(num_epochs): + # Training + train_loss, train_acc = train_epoch() + history['train_loss'].append(train_loss) + history['train_acc'].append(train_acc) + + # Validation + val_loss, val_acc = validate() + history['val_loss'].append(val_loss) + history['val_acc'].append(val_acc) + + # Check for overfitting + if val_loss > train_loss * 1.2: + print("Warning: Possible overfitting!") +``` + +![Training Curves](training-curves.png) + +## Common Issues and Solutions + +### 1. Loss Not Decreasing +**Problem:** Loss stays constant or increases + +**Solutions:** +- โœ… Check learning rate (try 0.001, 0.0001) +- โœ… Verify data preprocessing +- โœ… Check for bugs in forward/backward pass +- โœ… Try different weight initialization + +### 2. Training Loss Decreases, Validation Loss Increases +**Problem:** Overfitting + +**Solutions:** +- โœ… Add regularization (L2, dropout) +- โœ… Reduce model complexity +- โœ… Get more training data +- โœ… Use data augmentation +- โœ… Early stopping + +### 3. Loss Explodes (NaN) +**Problem:** Numerical instability + +**Solutions:** +- โœ… Lower learning rate +- โœ… Use gradient clipping +- โœ… Check for division by zero +- โœ… Use batch normalization + +### 4. Training Too Slow +**Problem:** Takes forever to converge + +**Solutions:** +- โœ… Increase learning rate +- โœ… Use Adam instead of SGD +- โœ… Increase batch size +- โœ… Use GPU/TPU acceleration + +## Complete Training Example + +```python +import numpy as np + +# Hyperparameters +learning_rate = 0.001 +batch_size = 64 +num_epochs = 100 +patience = 10 + +# Initialize optimizer +m = v = 0 +beta1, beta2 = 0.9, 0.999 + +# Training loop +best_val_loss = float('inf') +patience_counter = 0 + +for epoch in range(num_epochs): + # Shuffle training data + indices = np.random.permutation(len(X_train)) + + epoch_loss = 0 + num_batches = 0 + + # Mini-batch training + for i in range(0, len(X_train), batch_size): + # Get batch + batch_idx = indices[i:i+batch_size] + X_batch = X_train[batch_idx] + y_batch = y_train[batch_idx] + + # Forward pass + y_pred = forward(X_batch, weights) + loss = compute_loss(y_pred, y_batch) + + # Backward pass + grads = backward(X_batch, y_batch, weights) + + # Adam optimizer + m = beta1 * m + (1 - beta1) * grads + v = beta2 * v + (1 - beta2) * grads**2 + m_hat = m / (1 - beta1**(epoch+1)) + v_hat = v / (1 - beta2**(epoch+1)) + + # Update weights + weights -= learning_rate * m_hat / (np.sqrt(v_hat) + 1e-8) + + epoch_loss += loss + num_batches += 1 + + # Validation + val_loss = validate(X_val, y_val, weights) + + # Early stopping + if val_loss < best_val_loss: + best_val_loss = val_loss + save_weights(weights) + patience_counter = 0 + else: + patience_counter += 1 + + if patience_counter >= patience: + print(f"Early stopping at epoch {epoch}") + break + + # Print progress + avg_train_loss = epoch_loss / num_batches + print(f"Epoch {epoch}: Train Loss = {avg_train_loss:.4f}, " + f"Val Loss = {val_loss:.4f}") +``` + +## Key Takeaways + +โœ… Gradient descent is the foundation of neural network training +โœ… Mini-batch GD provides the best balance of speed and stability +โœ… Adam is the go-to optimizer for most tasks +โœ… Learning rate is the most important hyperparameter +โœ… Monitor both training and validation metrics +โœ… Use regularization to prevent overfitting +โœ… Early stopping saves time and prevents overfitting + +## Congratulations! 
๐ŸŽ‰ + +You've completed the Neural Networks from Scratch course! You now understand: + +- The mathematical foundations (derivatives, functions) +- How neural networks process information (forward propagation) +- How they learn (backpropagation) +- How to train them effectively (optimization) + +**Next steps:** +- Implement a neural network from scratch in Python +- Try different architectures (CNN, RNN, Transformer) +- Work on real projects and datasets +- Explore advanced topics (attention mechanisms, GANs, etc.) + +Keep learning and building! ๐Ÿš€ + diff --git a/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/building-a-neuron-in-python-content.md b/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/building-a-neuron-in-python-content.md new file mode 100644 index 0000000..b6a5c21 --- /dev/null +++ b/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/building-a-neuron-in-python-content.md @@ -0,0 +1,312 @@ +--- +hero: + title: "Building a Neuron in Python" + subtitle: "Implementing a Neuron from Scratch" + tags: + - "๐Ÿง  Neuron" + - "โฑ๏ธ 10 min read" +--- + +Let's build a complete, working neuron from scratch using pure Python and PyTorch! + +![Neuron Code](/content/learn/neuron-from-scratch/building-a-neuron-in-python/neuron-code.png) + +## Simple Neuron Class + +**Example:** + +```python +import torch +import torch.nn as nn + +class Neuron(nn.Module): + def __init__(self, num_inputs): + super().__init__() + self.linear = nn.Linear(num_inputs, 1) + self.activation = nn.Sigmoid() + + def forward(self, x): + # Linear step + z = self.linear(x) + + # Activation + output = self.activation(z) + + return output + +# Create neuron with 3 inputs +neuron = Neuron(num_inputs=3) + +# Make prediction +x = torch.tensor([[1.0, 2.0, 3.0]]) +prediction = neuron(x) + +print(prediction) +# tensor([[0.6789]], grad_fn=) +``` + +## Complete Training Example + +```python +import torch +import torch.nn as nn +import torch.optim as optim + +# Create neuron +neuron = Neuron(num_inputs=2) + +# Training data (AND gate) +X = torch.tensor([[0.0, 0.0], + [0.0, 1.0], + [1.0, 0.0], + [1.0, 1.0]]) + +y = torch.tensor([[0.0], + [0.0], + [0.0], + [1.0]]) + +# Loss and optimizer +criterion = nn.BCELoss() +optimizer = optim.SGD(neuron.parameters(), lr=0.5) + +# Training loop +for epoch in range(1000): + # Forward pass + predictions = neuron(X) + + # Calculate loss + loss = criterion(predictions, y) + + # Backward pass + optimizer.zero_grad() + loss.backward() + + # Update weights + optimizer.step() + + if epoch % 200 == 0: + print(f"Epoch {epoch}, Loss: {loss.item():.4f}") + +# Test the trained neuron +print("\\nTrained neuron predictions:") +with torch.no_grad(): + for i, (input_vals, target_val) in enumerate(zip(X, y)): + pred = neuron(input_vals.unsqueeze(0)) + print(f"{input_vals.tolist()} โ†’ {pred.item():.3f} (target: {target_val.item()})") +``` + +## From Scratch (No nn.Linear) + +Build a neuron with just tensors: + +```python +import torch + +class ManualNeuron: + def __init__(self, num_inputs): + # Initialize weights and bias randomly + self.weights = torch.randn(num_inputs, requires_grad=True) + self.bias = torch.randn(1, requires_grad=True) + + def forward(self, x): + # Linear step: wยทx + b + z = torch.dot(self.weights, x) + self.bias + + # Activation: sigmoid + output = 1 / (1 + torch.exp(-z)) + + return output + + def parameters(self): + return [self.weights, self.bias] + +# Create and test +neuron = ManualNeuron(num_inputs=3) +x = 
torch.tensor([1.0, 2.0, 3.0]) +output = neuron.forward(x) + +print(output) +# tensor([0.7234], grad_fn=) +``` + +## Training From Scratch + +```python +import torch + +# Manual neuron (from above) +neuron = ManualNeuron(num_inputs=2) + +# Training data +X = torch.tensor([[1.0, 2.0], + [2.0, 3.0], + [3.0, 4.0]]) +y = torch.tensor([0.0, 0.0, 1.0]) + +learning_rate = 0.1 + +# Training loop +for epoch in range(100): + total_loss = 0 + + for i in range(len(X)): + # Forward pass + prediction = neuron.forward(X[i]) + + # Loss (MSE) + loss = (prediction - y[i]) ** 2 + total_loss += loss.item() + + # Backward pass + loss.backward() + + # Update weights manually + with torch.no_grad(): + for param in neuron.parameters(): + param -= learning_rate * param.grad + param.grad.zero_() + + if epoch % 20 == 0: + print(f"Epoch {epoch}, Loss: {total_loss:.4f}") + +# Test +print("\\nPredictions after training:") +for i in range(len(X)): + pred = neuron.forward(X[i]) + print(f"Input: {X[i].tolist()}, Prediction: {pred.item():.3f}, Target: {y[i].item()}") +``` + +## Complete Neuron with All Features + +```python +import torch +import torch.nn as nn + +class CompleteNeuron(nn.Module): + def __init__(self, num_inputs, activation='relu'): + super().__init__() + self.linear = nn.Linear(num_inputs, 1) + + # Choose activation + if activation == 'relu': + self.activation = nn.ReLU() + elif activation == 'sigmoid': + self.activation = nn.Sigmoid() + elif activation == 'tanh': + self.activation = nn.Tanh() + else: + self.activation = nn.Identity() # No activation + + def forward(self, x): + z = self.linear(x) + output = self.activation(z) + return output + + def get_weights(self): + return self.linear.weight.data + + def get_bias(self): + return self.linear.bias.data + +# Create neurons with different activations +relu_neuron = CompleteNeuron(3, activation='relu') +sigmoid_neuron = CompleteNeuron(3, activation='sigmoid') + +x = torch.tensor([[1.0, 2.0, 3.0]]) + +print("ReLU:", relu_neuron(x)) +print("Sigmoid:", sigmoid_neuron(x)) +``` + +## Real-World Application + +```python +import torch +import torch.nn as nn +import torch.optim as optim + +# House price predictor +class HousePriceNeuron(nn.Module): + def __init__(self): + super().__init__() + # 3 features: size, bedrooms, age + self.linear = nn.Linear(3, 1) + # No activation (regression) + + def forward(self, features): + price = self.linear(features) + return price + +# Training data +houses = torch.tensor([[1500.0, 3.0, 10.0], # [size, bedrooms, age] + [2000.0, 4.0, 5.0], + [1200.0, 2.0, 15.0], + [1800.0, 3.0, 8.0]]) + +prices = torch.tensor([[300000.0], # Actual prices + [450000.0], + [250000.0], + [380000.0]]) + +# Create and train +model = HousePriceNeuron() +criterion = nn.MSELoss() +optimizer = optim.SGD(model.parameters(), lr=0.0000001) + +# Train +for epoch in range(500): + predictions = model(houses) + loss = criterion(predictions, prices) + + optimizer.zero_grad() + loss.backward() + optimizer.step() + + if epoch % 100 == 0: + print(f"Epoch {epoch}, Loss: {loss.item():.2f}") + +# Predict new house +new_house = torch.tensor([[1600.0, 3.0, 12.0]]) +predicted_price = model(new_house) +print(f"\\nPredicted price: ${predicted_price.item():,.0f}") +``` + +## Key Takeaways + +โœ“ **Building blocks:** Linear layer + activation function + +โœ“ **From scratch:** Can build with just tensors + +โœ“ **PyTorch way:** Use `nn.Module` and `nn.Linear` + +โœ“ **Training:** Forward โ†’ loss โ†’ backward โ†’ update + +โœ“ **Flexible:** Choose different activations for 
different tasks
+
+**Quick Reference:**
+
+```python
+# Simple neuron
+class Neuron(nn.Module):
+    def __init__(self, num_inputs):
+        super().__init__()
+        self.linear = nn.Linear(num_inputs, 1)
+        self.activation = nn.ReLU()
+
+    def forward(self, x):
+        return self.activation(self.linear(x))
+
+# Training
+model = Neuron(num_inputs=5)
+optimizer = optim.SGD(model.parameters(), lr=0.01)
+
+for epoch in range(epochs):
+    pred = model(x)
+    loss = criterion(pred, y)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+```
+
+**Remember:** You just built a neuron from scratch! This is the foundation of all neural networks! 🎉
diff --git a/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/neuron-code.png b/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/neuron-code.png
new file mode 100644
index 0000000..e09a80b
Binary files /dev/null and b/public/content/learn/neuron-from-scratch/building-a-neuron-in-python/neuron-code.png differ
diff --git a/public/content/learn/neuron-from-scratch/making-a-prediction/making-a-prediction-content.md b/public/content/learn/neuron-from-scratch/making-a-prediction/making-a-prediction-content.md
new file mode 100644
index 0000000..5829a28
--- /dev/null
+++ b/public/content/learn/neuron-from-scratch/making-a-prediction/making-a-prediction-content.md
@@ -0,0 +1,220 @@
+---
+hero:
+  title: "Making a Prediction"
+  subtitle: "Using a Neuron for Forward Pass"
+  tags:
+    - "🧠 Neuron"
+    - "⏱️ 8 min read"
+---
+
+Now that we understand neurons, let's use one to **make predictions**! This is called the **forward pass**.
+
+![Prediction Flow](/content/learn/neuron-from-scratch/making-a-prediction/prediction-flow.png)
+
+## The Forward Pass
+
+**Forward pass = Input → Linear → Activation → Output**
+
+**Example:**
+
+```python
+import torch
+import torch.nn as nn
+
+# Create a trained neuron (pretend it's already trained)
+neuron = nn.Sequential(
+    nn.Linear(2, 1),
+    nn.Sigmoid()
+)
+
+# Set trained weights manually (normally learned from data)
+with torch.no_grad():
+    neuron[0].weight = nn.Parameter(torch.tensor([[0.5, 0.8]]))
+    neuron[0].bias = nn.Parameter(torch.tensor([-0.3]))
+
+# Make a prediction
+input_data = torch.tensor([[1.0, 2.0]])  # New data point
+prediction = neuron(input_data)
+
+print(prediction)
+# tensor([[0.8581]]) ← Prediction!
+```
+
+**Manual calculation:**
+
+```yaml
+Input:   [1.0, 2.0]
+Weights: [0.5, 0.8]
+Bias:    -0.3
+
+Step 1: Linear
+  z = (1.0×0.5) + (2.0×0.8) + (-0.3)
+    = 0.5 + 1.6 - 0.3
+    = 1.8
+
+Step 2: Activation (Sigmoid)
+  output = 1 / (1 + e⁻¹·⁸)
+         = 1 / (1 + 0.165)
+         = 0.858
+
+Prediction: 0.858 or 85.8% probability
+```
+
+## Batch Predictions
+
+Process multiple samples at once:
+
+```python
+import torch
+import torch.nn as nn
+
+neuron = nn.Sequential(
+    nn.Linear(3, 1),
+    nn.ReLU()
+)
+
+# Batch of 5 samples, 3 features each
+batch = torch.tensor([[1.0, 2.0, 3.0],
+                      [2.0, 3.0, 4.0],
+                      [0.5, 1.0, 1.5],
+                      [3.0, 2.0, 1.0],
+                      [1.5, 2.5, 3.5]])
+
+# Make predictions for all samples
+predictions = neuron(batch)
+
+print(predictions.shape)  # torch.Size([5, 1])
+print(predictions)
+# tensor([[...],
+#         [...],
+#         [...],
+#         [...],
+#         [...]])  ← 5 predictions!
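+
+# Added illustration (not in the original lesson): each row of `predictions`
+# is the output for the matching row of `batch`, so we can read them out one by one.
+for i, p in enumerate(predictions.squeeze(1)):
+    print(f"Sample {i}: {p.item():.3f}")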
+``` + +## Real-World Example: Binary Classification + +```python +import torch +import torch.nn as nn + +# Spam detector neuron +class SpamNeuron(nn.Module): + def __init__(self, num_features): + super().__init__() + self.linear = nn.Linear(num_features, 1) + self.sigmoid = nn.Sigmoid() + + def forward(self, email_features): + # Linear step + logit = self.linear(email_features) + + # Activation (probability) + probability = self.sigmoid(logit) + + return probability + +# Create and use +spam_detector = SpamNeuron(num_features=100) + +# New email features +email = torch.randn(1, 100) + +# Predict +spam_probability = spam_detector(email) +print(f"Spam probability: {spam_probability.item():.1%}") + +if spam_probability > 0.5: + print("Prediction: SPAM") +else: + print("Prediction: NOT SPAM") +``` + +## Step-by-Step Prediction + +```python +import torch + +# Input +x = torch.tensor([3.0, 2.0]) + +# Learned parameters +w = torch.tensor([0.4, 0.6]) +b = torch.tensor(0.2) + +# Step 1: Weighted sum +print("Inputs:", x) +print("Weights:", w) + +products = x * w +print("Products:", products) +# tensor([1.2, 1.2]) + +weighted_sum = products.sum() + b +print("Sum + bias:", weighted_sum) +# tensor(2.6) + +# Step 2: Activation +activated = torch.relu(weighted_sum) +print("After ReLU:", activated) +# tensor(2.6) + +# Final prediction +print(f"\\nPrediction: {activated.item()}") +``` + +**Output:** + +```yaml +Inputs: tensor([3., 2.]) +Weights: tensor([0.4, 0.6]) +Products: tensor([1.2, 1.2]) +Sum + bias: tensor(2.6) +After ReLU: tensor(2.6) + +Prediction: 2.6 +``` + +## Inference Mode + +When making predictions (not training), use `torch.no_grad()`: + +```python +import torch + +model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid()) + +# For prediction (inference) +with torch.no_grad(): + input_data = torch.randn(1, 10) + prediction = model(input_data) + print(prediction) + +# Why? Saves memory (doesn't track gradients) +``` + +## Key Takeaways + +โœ“ **Forward pass:** Input โ†’ Linear โ†’ Activation โ†’ Output + +โœ“ **Batch processing:** Handle multiple samples at once + +โœ“ **Inference mode:** Use `torch.no_grad()` when not training + +โœ“ **Prediction:** Just run the forward pass! + +**Quick Reference:** + +```python +# Single prediction +output = model(input_data) + +# Batch predictions +outputs = model(batch_data) + +# Inference mode (no gradients) +with torch.no_grad(): + prediction = model(new_data) +``` + +**Remember:** Making predictions is just running the forward pass! 
๐ŸŽ‰ diff --git a/public/content/learn/neuron-from-scratch/making-a-prediction/prediction-flow.png b/public/content/learn/neuron-from-scratch/making-a-prediction/prediction-flow.png new file mode 100644 index 0000000..a28e6d5 Binary files /dev/null and b/public/content/learn/neuron-from-scratch/making-a-prediction/prediction-flow.png differ diff --git a/public/content/learn/neuron-from-scratch/the-activation-function/activation-comparison.png b/public/content/learn/neuron-from-scratch/the-activation-function/activation-comparison.png new file mode 100644 index 0000000..7843c05 Binary files /dev/null and b/public/content/learn/neuron-from-scratch/the-activation-function/activation-comparison.png differ diff --git a/public/content/learn/neuron-from-scratch/the-activation-function/the-activation-function-content.md b/public/content/learn/neuron-from-scratch/the-activation-function/the-activation-function-content.md new file mode 100644 index 0000000..11fda60 --- /dev/null +++ b/public/content/learn/neuron-from-scratch/the-activation-function/the-activation-function-content.md @@ -0,0 +1,243 @@ +--- +hero: + title: "The Activation Function" + subtitle: "Adding Non-Linearity to Neurons" + tags: + - "๐Ÿง  Neuron" + - "โฑ๏ธ 8 min read" +--- + +The activation function is what makes neural networks **powerful**. Without it, you'd just have fancy linear regression! + +![Activation Comparison](/content/learn/neuron-from-scratch/the-activation-function/activation-comparison.png) + +## Why We Need Activation Functions + +**Without activation:** No matter how many layers, it's still just linear! + +```python +import torch +import torch.nn as nn + +# Network WITHOUT activation functions +model_linear = nn.Sequential( + nn.Linear(10, 20), + # No activation! + nn.Linear(20, 5), + # No activation! + nn.Linear(5, 1) +) + +# This is mathematically equivalent to: +model_simple = nn.Linear(10, 1) + +# Same power as single layer! +``` + +**With activation:** Non-linear transformations โ†’ complex patterns! + +```python +# Network WITH activation functions +model_nonlinear = nn.Sequential( + nn.Linear(10, 20), + nn.ReLU(), # โ† Non-linearity! + nn.Linear(20, 5), + nn.ReLU(), # โ† Non-linearity! + nn.Linear(5, 1) +) + +# This can learn complex patterns! +``` + +**The difference:** + +```yaml +Without activation: + Layer 1: y = W1x + b1 + Layer 2: z = W2y + b2 + = W2(W1x + b1) + b2 + = W2W1x + W2b1 + b2 + = W3x + b3 โ† Still just linear! + +With activation: + Layer 1: y = ReLU(W1x + b1) + Layer 2: z = ReLU(W2y + b2) + โ† Non-linear! Can learn curves, boundaries, etc. 
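+
+# Added note: the collapse above is explicit if you set W3 = W2·W1 and
+# b3 = W2·b1 + b2. The ReLU between the layers is exactly what prevents
+# this merge, which is why the non-linearity adds power.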
+``` + +## Common Activation Functions + +### ReLU (Most Popular) + +```python +import torch + +def relu(x): + return torch.maximum(torch.tensor(0.0), x) + +x = torch.tensor([-1.0, 0.0, 1.0, 2.0]) +print(relu(x)) +# tensor([0., 0., 1., 2.]) +``` + +```yaml +ReLU(x) = max(0, x) + +Properties: + โœ“ Fast (simple comparison) + โœ“ No vanishing gradient + โœ“ Creates sparsity + +Use: Hidden layers +``` + +### Sigmoid (For Probabilities) + +```python +def sigmoid(x): + return 1 / (1 + torch.exp(-x)) + +x = torch.tensor([-2.0, 0.0, 2.0]) +print(sigmoid(x)) +# tensor([0.1192, 0.5000, 0.8808]) +``` + +```yaml +ฯƒ(x) = 1 / (1 + eโปหฃ) + +Properties: + โœ“ Outputs [0, 1] + โœ“ Smooth + โœ— Vanishing gradients + +Use: Binary classification output +``` + +### Tanh (Zero-Centered) + +```python +x = torch.tensor([-1.0, 0.0, 1.0]) +print(torch.tanh(x)) +# tensor([-0.7616, 0.0000, 0.7616]) +``` + +```yaml +tanh(x) = (eหฃ - eโปหฃ) / (eหฃ + eโปหฃ) + +Properties: + โœ“ Outputs [-1, 1] + โœ“ Zero-centered + โœ— Vanishing gradients + +Use: RNN cells +``` + +## Where Activation Goes + +**After the linear step, before the next layer:** + +```python +import torch +import torch.nn as nn + +class SingleNeuron(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(3, 1) + self.activation = nn.ReLU() + + def forward(self, x): + # Step 1: Linear (weighted sum) + z = self.linear(x) + + # Step 2: Activation (non-linearity) + output = self.activation(z) + + return output + +# Test +neuron = SingleNeuron() +x = torch.tensor([[1.0, 2.0, 3.0]]) +output = neuron(x) +print(output) +``` + +## Practical Example + +```python +import torch +import torch.nn as nn + +# Temperature prediction neuron +# Inputs: [humidity, pressure, wind_speed] +weather = torch.tensor([[65.0, 1013.0, 10.0]]) + +# Create neuron +temp_neuron = nn.Sequential( + nn.Linear(3, 1), + nn.ReLU() # Activation ensures non-negative temperature +) + +prediction = temp_neuron(weather) +print(f"Predicted temperature: {prediction.item():.1f}ยฐF") +``` + +## Choosing the Right Activation + +```yaml +Hidden layers: + Default: ReLU + Modern: SiLU/GELU + Classical: Tanh + +Output layer (depends on task): + Binary classification: Sigmoid + Multi-class: Softmax + Regression: None (linear) +``` + +**Example network:** + +```python +import torch.nn as nn + +model = nn.Sequential( + nn.Linear(10, 20), + nn.ReLU(), # Hidden layer activation + nn.Linear(20, 10), + nn.ReLU(), # Hidden layer activation + nn.Linear(10, 1), + nn.Sigmoid() # Output activation for binary classification +) +``` + +## Key Takeaways + +โœ“ **Activation adds non-linearity:** Makes networks powerful + +โœ“ **Applied after linear step:** Linear โ†’ Activation โ†’ Next layer + +โœ“ **Different types:** ReLU, Sigmoid, Tanh, etc. + +โœ“ **Choose based on task:** Hidden vs output, type of problem + +โœ“ **Without activation:** Multiple layers = single layer (useless!) + +**Quick Reference:** + +```python +# After linear transformation +z = linear(x) + +# Apply activation +output = activation(z) + +# Common activations +torch.relu(z) # ReLU +torch.sigmoid(z) # Sigmoid +torch.tanh(z) # Tanh +F.silu(z) # SiLU +F.gelu(z) # GELU +``` + +**Remember:** Linear step computes, activation function decides! 
๐ŸŽ‰ diff --git a/public/content/learn/neuron-from-scratch/the-concept-of-learning/learning-process.png b/public/content/learn/neuron-from-scratch/the-concept-of-learning/learning-process.png new file mode 100644 index 0000000..f5e7623 Binary files /dev/null and b/public/content/learn/neuron-from-scratch/the-concept-of-learning/learning-process.png differ diff --git a/public/content/learn/neuron-from-scratch/the-concept-of-learning/the-concept-of-learning-content.md b/public/content/learn/neuron-from-scratch/the-concept-of-learning/the-concept-of-learning-content.md new file mode 100644 index 0000000..08a8604 --- /dev/null +++ b/public/content/learn/neuron-from-scratch/the-concept-of-learning/the-concept-of-learning-content.md @@ -0,0 +1,234 @@ +--- +hero: + title: "The Concept of Learning" + subtitle: "How Neurons Adjust Their Weights" + tags: + - "๐Ÿง  Neuron" + - "โฑ๏ธ 8 min read" +--- + +Learning is the process of **adjusting weights to reduce loss**. The neuron literally learns from mistakes! + +![Learning Process](/content/learn/neuron-from-scratch/the-concept-of-learning/learning-process.png) + +## What Does "Learning" Mean? + +**Learning = Automatically adjusting weights to make better predictions** + +```yaml +Before learning: + Weights: Random + Predictions: Bad + Loss: High + +After learning: + Weights: Optimized + Predictions: Good + Loss: Low +``` + +## The Learning Process + +**Step-by-step:** + +1. Make prediction (forward pass) +2. Calculate loss (how wrong?) +3. Calculate gradients (which direction to adjust?) +4. Update weights (move in right direction) +5. Repeat! + +**Example:** + +```python +import torch +import torch.nn as nn + +# Model +model = nn.Linear(1, 1) + +# Training data +x = torch.tensor([[1.0], [2.0], [3.0]]) +y = torch.tensor([[2.0], [4.0], [6.0]]) # y = 2x + +# Loss function +criterion = nn.MSELoss() + +# Optimizer (handles weight updates) +optimizer = torch.optim.SGD(model.parameters(), lr=0.01) + +# Training loop +for epoch in range(100): + # 1. Forward pass + predictions = model(x) + + # 2. Calculate loss + loss = criterion(predictions, y) + + # 3. Backward pass (calculate gradients) + optimizer.zero_grad() + loss.backward() + + # 4. Update weights + optimizer.step() + + if epoch % 20 == 0: + print(f"Epoch {epoch}, Loss: {loss.item():.4f}") + +# After training +print(f"Learned weight: {model.weight.item():.2f}") # Should be close to 2.0 +print(f"Learned bias: {model.bias.item():.2f}") # Should be close to 0.0 +``` + +## Gradient Descent + +**The algorithm that powers learning:** + +```yaml +Current weight: w = 0.5 +Loss: high + +Gradient: โˆ‚Loss/โˆ‚w = -2.3 + Negative gradient โ†’ loss decreases if we INCREASE w + +Update: + w_new = w - learning_rate ร— gradient + w_new = 0.5 - 0.01 ร— (-2.3) + w_new = 0.5 + 0.023 + w_new = 0.523 + +Result: Loss is now lower! +``` + +## Learning Rate + +**Learning rate controls how big each step is:** + +```python +# Too small: slow learning +optimizer = torch.optim.SGD(model.parameters(), lr=0.0001) +# Takes forever to learn! + +# Just right: good learning +optimizer = torch.optim.SGD(model.parameters(), lr=0.01) +# Learns efficiently + +# Too large: unstable learning +optimizer = torch.optim.SGD(model.parameters(), lr=10.0) +# Might overshoot and never converge! 
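+
+# Rough numeric sketch (added for illustration, not part of the original example):
+# each update moves the weight by lr * gradient, so for the same gradient
+# the three settings above take very different step sizes.
+grad = 2.0
+for lr in (0.0001, 0.01, 10.0):
+    print(f"lr={lr}: step = {lr * grad}")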
+``` + +**Effect of learning rate:** + +```yaml +lr = 0.001 (small): + Small weight updates + Slow but stable + Many epochs needed + +lr = 0.01 (medium): + Moderate updates + Good balance + Converges reasonably + +lr = 1.0 (large): + Large weight updates + Fast but unstable + Might oscillate or diverge +``` + +## Simple Learning Example + +```python +import torch + +# True relationship: y = 3x + 1 +x_train = torch.tensor([1.0, 2.0, 3.0, 4.0]) +y_train = torch.tensor([4.0, 7.0, 10.0, 13.0]) + +# Model (start with random weights) +w = torch.tensor([0.5], requires_grad=True) +b = torch.tensor([0.0], requires_grad=True) + +learning_rate = 0.01 + +# Train for 100 steps +for step in range(100): + # Prediction + y_pred = w * x_train + b + + # Loss + loss = ((y_pred - y_train) ** 2).mean() + + # Backpropagation + loss.backward() + + # Update weights + with torch.no_grad(): + w -= learning_rate * w.grad + b -= learning_rate * b.grad + + # Reset gradients + w.grad.zero_() + b.grad.zero_() + + if step % 20 == 0: + print(f"Step {step}: w={w.item():.2f}, b={b.item():.2f}, loss={loss.item():.4f}") + +print(f"\\nLearned: y = {w.item():.2f}x + {b.item():.2f}") +# Should be close to: y = 3x + 1 +``` + +## What the Neuron Learns + +```python +# Example: Learning to classify + +# Initially (random weights): +prediction = neuron([1.0, 2.0]) # 0.34 (wrong!) +actual = 1.0 +loss = high + +# After seeing examples: +# The neuron learns that: +# - Feature 1 with value > 0.5 โ†’ usually class 1 +# - Feature 2 with value > 1.0 โ†’ usually class 1 +# So it adjusts weights accordingly + +# Finally (trained weights): +prediction = neuron([1.0, 2.0]) # 0.98 (correct!) +actual = 1.0 +loss = low +``` + +## Key Takeaways + +โœ“ **Learning = Adjusting weights:** Based on errors + +โœ“ **Goal:** Minimize loss + +โœ“ **Gradient descent:** The learning algorithm + +โœ“ **Learning rate:** Controls step size + +โœ“ **Automatic:** PyTorch calculates gradients for you! + +**Quick Reference:** + +```python +# Training loop +for epoch in range(num_epochs): + # Forward pass + predictions = model(inputs) + + # Calculate loss + loss = criterion(predictions, targets) + + # Backward pass + optimizer.zero_grad() + loss.backward() + + # Update weights + optimizer.step() +``` + +**Remember:** Learning is just: predict โ†’ measure error โ†’ adjust โ†’ repeat! ๐ŸŽ‰ diff --git a/public/content/learn/neuron-from-scratch/the-concept-of-loss/loss-function.png b/public/content/learn/neuron-from-scratch/the-concept-of-loss/loss-function.png new file mode 100644 index 0000000..a76146d Binary files /dev/null and b/public/content/learn/neuron-from-scratch/the-concept-of-loss/loss-function.png differ diff --git a/public/content/learn/neuron-from-scratch/the-concept-of-loss/the-concept-of-loss-content.md b/public/content/learn/neuron-from-scratch/the-concept-of-loss/the-concept-of-loss-content.md new file mode 100644 index 0000000..fc8106b --- /dev/null +++ b/public/content/learn/neuron-from-scratch/the-concept-of-loss/the-concept-of-loss-content.md @@ -0,0 +1,229 @@ +--- +hero: + title: "The Concept of Loss" + subtitle: "Measuring How Wrong Your Model Is" + tags: + - "๐Ÿง  Neuron" + - "โฑ๏ธ 8 min read" +--- + +Loss tells you **how wrong** your model's predictions are. Lower loss = better model! + +![Loss Function](/content/learn/neuron-from-scratch/the-concept-of-loss/loss-function.png) + +## What is Loss? + +**Loss = Difference between prediction and actual answer** + +Think of it like a score in golf - **lower is better**! 
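+
+For a single prediction, the squared-error form used in the example below is:
+
+```yaml
+loss = (prediction - actual)²
+
+Squaring keeps the loss positive and punishes big mistakes
+much more than small ones.
+```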
+ +**Example:** + +```python +import torch + +# Actual answer (ground truth) +actual = torch.tensor([1.0]) + +# Model's prediction +prediction = torch.tensor([0.7]) + +# Loss: how far off? +loss = (prediction - actual) ** 2 # Squared difference +print(loss) +# tensor([0.0900]) + +# Closer prediction +better_prediction = torch.tensor([0.95]) +better_loss = (better_prediction - actual) ** 2 +print(better_loss) +# tensor([0.0025]) โ† Much lower! Better! +``` + +**Manual calculation:** + +```yaml +Actual: 1.0 +Prediction: 0.7 +Difference: 0.7 - 1.0 = -0.3 +Squared: (-0.3)ยฒ = 0.09 +Loss: 0.09 + +Better prediction: 0.95 +Difference: 0.95 - 1.0 = -0.05 +Squared: (-0.05)ยฒ = 0.0025 +Loss: 0.0025 โ† Much better! +``` + +## Common Loss Functions + +### Mean Squared Error (MSE) + +For regression (predicting numbers): + +```python +import torch +import torch.nn as nn + +# Multiple predictions +predictions = torch.tensor([2.5, 3.1, 4.8]) +actual = torch.tensor([2.0, 3.0, 5.0]) + +# MSE Loss +mse_loss = nn.MSELoss() +loss = mse_loss(predictions, actual) + +print(loss) +# tensor(0.1000) + +# Manual: ((2.5-2)ยฒ + (3.1-3)ยฒ + (4.8-5)ยฒ) / 3 +# = (0.25 + 0.01 + 0.04) / 3 +# = 0.1 +``` + +### Binary Cross Entropy (BCE) + +For binary classification (yes/no): + +```python +# Predictions (probabilities) +predictions = torch.tensor([0.9, 0.2, 0.7]) + +# Actual labels (0 or 1) +labels = torch.tensor([1.0, 0.0, 1.0]) + +# BCE Loss +bce_loss = nn.BCELoss() +loss = bce_loss(predictions, labels) + +print(loss) +# Low loss because predictions are close to labels! +``` + +### Cross Entropy Loss + +For multi-class classification: + +```python +# Raw logits (before softmax) +logits = torch.tensor([[2.0, 1.0, 0.1]]) + +# Actual class (class 0) +target = torch.tensor([0]) + +# Cross Entropy (includes softmax) +ce_loss = nn.CrossEntropyLoss() +loss = ce_loss(logits, target) + +print(loss) +# Lower loss because logits[0]=2.0 is highest! +``` + +## Why We Minimize Loss + +**Goal of training: Make loss as small as possible!** + +```yaml +High loss: + Model is very wrong + Predictions far from truth + Need to adjust weights + +Low loss: + Model is accurate + Predictions close to truth + Weights are good! + +Training: + Start: High loss (random weights) + Process: Adjust weights to reduce loss + End: Low loss (trained model) +``` + +## Practical Example + +```python +import torch +import torch.nn as nn + +# Simple model +model = nn.Sequential( + nn.Linear(2, 1), + nn.Sigmoid() +) + +# Data +inputs = torch.tensor([[1.0, 2.0]]) +target = torch.tensor([[1.0]]) # Actual answer + +# Forward pass +prediction = model(inputs) +print(f"Prediction: {prediction.item():.3f}") + +# Calculate loss +loss_fn = nn.BCELoss() +loss = loss_fn(prediction, target) +print(f"Loss: {loss.item():.3f}") + +# Interpretation +if loss < 0.1: + print("Great! Model is accurate") +elif loss < 0.5: + print("OK, but needs improvement") +else: + print("Bad! Model needs more training") +``` + +## Loss Guides Learning + +```python +# Loss tells us which direction to adjust weights + +# Current prediction vs target +prediction = 0.3 +target = 1.0 +loss = (prediction - target) ** 2 # 0.49 + +# If we increase weight: +# prediction becomes 0.6 +# loss becomes (0.6 - 1.0)ยฒ = 0.16 โ† Better! + +# If we decrease weight: +# prediction becomes 0.1 +# loss becomes (0.1 - 1.0)ยฒ = 0.81 โ† Worse! + +# So we should INCREASE the weight! 
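+
+# Added check (reusing `prediction` and `target` from above): verify the two
+# hypothetical adjustments numerically.
+loss_if_increased = (0.6 - target) ** 2   # ≈ 0.16 -> lower loss, better
+loss_if_decreased = (0.1 - target) ** 2   # ≈ 0.81 -> higher loss, worse
+print(loss_if_increased, loss_if_decreased)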
+```
+
+## Key Takeaways
+
+✓ **Loss = Error:** Measures how wrong predictions are
+
+✓ **Lower is better:** Training minimizes loss
+
+✓ **Different types:** MSE, BCE, CrossEntropy for different tasks
+
+✓ **Guides learning:** Loss tells us how to adjust weights
+
+✓ **Always positive:** Loss is never negative
+
+**Quick Reference:**
+
+```python
+# MSE (regression)
+loss = nn.MSELoss()(predictions, targets)
+
+# BCE (binary classification)
+loss = nn.BCELoss()(predictions, targets)
+
+# CrossEntropy (multi-class)
+loss = nn.CrossEntropyLoss()(logits, targets)
+
+# Training loop
+for epoch in range(100):
+    prediction = model(x)
+    loss = loss_fn(prediction, y)
+    # ... backprop and update ...
+```
+
+**Remember:** Loss is your compass - it guides the model to better predictions! 🎉
diff --git a/public/content/learn/neuron-from-scratch/the-linear-step/linear-step-visual.png b/public/content/learn/neuron-from-scratch/the-linear-step/linear-step-visual.png
new file mode 100644
index 0000000..1973635
Binary files /dev/null and b/public/content/learn/neuron-from-scratch/the-linear-step/linear-step-visual.png differ
diff --git a/public/content/learn/neuron-from-scratch/the-linear-step/the-linear-step-content.md b/public/content/learn/neuron-from-scratch/the-linear-step/the-linear-step-content.md
new file mode 100644
index 0000000..051bb5a
--- /dev/null
+++ b/public/content/learn/neuron-from-scratch/the-linear-step/the-linear-step-content.md
@@ -0,0 +1,307 @@
+---
+hero:
+  title: "The Linear Step"
+  subtitle: "Weighted Sum - The Core Computation"
+  tags:
+    - "🧠 Neuron"
+    - "⏱️ 8 min read"
+---
+
+The linear step is where the **magic begins** - it's how a neuron combines its inputs using weights!
+
+![Linear Step Visual](/content/learn/neuron-from-scratch/the-linear-step/linear-step-visual.png)
+
+## The Formula
+
+**z = w₁x₁ + w₂x₂ + w₃x₃ + ... + b**
+
+Or in vector form: **z = w · x + b**
+
+This is called the **weighted sum** or **linear combination**.
+
+## Breaking It Down
+
+**Example:**
+
+```python
+import torch
+
+# Inputs (features)
+x = torch.tensor([2.0, 3.0, 1.5])
+
+# Weights (learned parameters)
+w = torch.tensor([0.5, -0.3, 0.8])
+
+# Bias (learned parameter)
+b = torch.tensor(0.1)
+
+# Linear step: weighted sum
+z = torch.dot(w, x) + b
+# OR: z = (w * x).sum() + b
+
+print(z)
+# tensor(1.4000)
+```
+
+**Manual calculation:**
+
+```yaml
+Step 1: Multiply each input by its weight
+  2.0 × 0.5  = 1.0
+  3.0 × -0.3 = -0.9
+  1.5 × 0.8  = 1.2
+
+Step 2: Sum all products
+  1.0 + (-0.9) + 1.2 = 1.3
+
+Step 3: Add bias
+  1.3 + 0.1 = 1.4
+
+Result: z = 1.4
+```
+
+## Why "Linear"?
+
+It's called linear because the relationship between inputs and output is a **straight line**!
+
+```python
+# If you double an input, the contribution doubles
+x1 = torch.tensor([2.0])
+w1 = torch.tensor([0.5])
+
+contribution1 = w1 * x1
+print(contribution1)  # tensor([1.])
+
+# Double the input
+x2 = torch.tensor([4.0])
+contribution2 = w1 * x2
+print(contribution2)  # tensor([2.]) ← Exactly double!
+```
+
+**Linear properties:**
+
+```yaml
+f(x + y) = f(x) + f(y)   ← Additive
+f(2x) = 2·f(x)           ← Scalable
+
+This makes it predictable and stable!
+```
+
+## What Each Component Does
+
+### Weights: The Learnable Parameters
+
+Weights determine **which inputs matter**:
+
+```python
+# Positive weight → input increases output
+w_positive = 0.8
+x = 5.0
+contribution = w_positive * x  # 4.0 ← Boosts output!
+
+# Negative weight → input decreases output
+w_negative = -0.8
+contribution = w_negative * x  # -4.0 ← Reduces output!
+
+# Small weight → input barely matters
+w_small = 0.01
+contribution = w_small * x  # 0.05 ← Tiny effect
+
+# Large weight → input matters a lot
+w_large = 10.0
+contribution = w_large * x  # 50.0 ← Huge effect!
+```
+
+### Bias: The Threshold Adjuster
+
+Bias shifts the decision boundary:
+
+```python
+import torch
+
+x = torch.tensor([1.0, 1.0])
+w = torch.tensor([1.0, 1.0])
+
+# No bias
+z_no_bias = torch.dot(w, x)
+print(z_no_bias)  # tensor(2.)
+
+# Positive bias (easier to activate)
+b_positive = 5.0
+z_positive = torch.dot(w, x) + b_positive
+print(z_positive)  # tensor(7.) ← Higher!
+
+# Negative bias (harder to activate)
+b_negative = -5.0
+z_negative = torch.dot(w, x) + b_negative
+print(z_negative)  # tensor(-3.) ← Lower!
+```
+
+**What bias does:**
+
+```yaml
+Positive bias:
+  Makes neuron more likely to "fire"
+  Shifts decision boundary down
+
+Negative bias:
+  Makes neuron less likely to "fire"
+  Shifts decision boundary up
+
+No bias:
+  Decision boundary passes through origin
+```
+
+## Using nn.Linear in PyTorch
+
+PyTorch provides `nn.Linear` to do this automatically:
+
+```python
+import torch
+import torch.nn as nn
+
+# Create linear layer: 3 inputs → 1 output
+linear = nn.Linear(in_features=3, out_features=1)
+
+# Input batch: 5 samples, 3 features each
+x = torch.randn(5, 3)
+
+# Apply linear transformation
+z = linear(x)
+
+print(z.shape)  # torch.Size([5, 1])
+
+# What it does internally:
+# z = x @ linear.weight.T + linear.bias
+```
+
+## Multiple Outputs
+
+You can have multiple output neurons:
+
+```python
+import torch
+import torch.nn as nn
+
+# 3 inputs → 5 outputs (5 neurons)
+linear = nn.Linear(3, 5)
+
+x = torch.tensor([[1.0, 2.0, 3.0]])  # 1 sample
+
+z = linear(x)
+print(z)
+# tensor([[0.234, -1.123, 0.567, 2.134, -0.876]])
+# 5 different outputs (one per neuron)!
+
+# Each output has its own weights:
+print(linear.weight.shape)  # torch.Size([5, 3])
+# 5 neurons × 3 weights each
+
+print(linear.bias.shape)  # torch.Size([5])
+# 5 biases (one per neuron)
+```
+
+## Real-World Example
+
+```python
+import torch
+import torch.nn as nn
+
+# House price prediction
+# Inputs: [size_sqft, bedrooms, age_years]
+house_features = torch.tensor([[2000.0, 3.0, 10.0]])
+
+# Create linear layer
+price_neuron = nn.Linear(3, 1)
+
+# Manually set weights (usually learned from data)
+with torch.no_grad():
+    price_neuron.weight = nn.Parameter(torch.tensor([[200.0, 50000.0, -1000.0]]))
+    price_neuron.bias = nn.Parameter(torch.tensor([50000.0]))
+
+# Predict price
+predicted_price = price_neuron(house_features)
+print(predicted_price)
+# tensor([[590000.]]) ← $590,000 prediction
+
+# Manual calculation:
+# 2000×200 + 3×50000 + 10×(-1000) + 50000
+# = 400,000 + 150,000 - 10,000 + 50,000
+# = 590,000 (matches the printed prediction!)
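+
+# Added breakdown (illustrative): per-feature contributions from the weights
+# we set above (size, bedrooms, age), plus the bias.
+contributions = price_neuron.weight * house_features
+print(contributions)                            # roughly [400000., 150000., -10000.]
+print(contributions.sum() + price_neuron.bias)  # roughly 590000.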
+``` + +**What the weights learned:** + +```yaml +Weight for size: 200 โ†’ Each sq ft adds $200 +Weight for bedrooms: 50,000 โ†’ Each bedroom adds $50k +Weight for age: -1,000 โ†’ Each year reduces price by $1k +Bias: 50,000 โ†’ Base price of $50k +``` + +## Matrix Form + +For a batch, the linear step is matrix multiplication: + +```python +# Batch of 3 samples +X = torch.tensor([[1.0, 2.0], + [3.0, 4.0], + [5.0, 6.0]]) # Shape: (3, 2) + +# Weights for 1 output neuron +W = torch.tensor([[0.5], + [0.3]]) # Shape: (2, 1) + +b = torch.tensor([0.1]) + +# Linear step as matrix multiplication +Z = X @ W + b + +print(Z) +# tensor([[1.2000], +# [2.8000], +# [4.4000]]) +``` + +**Matrix form:** + +```yaml +Z = XW + b + +Where: + X: (batch_size, input_features) + W: (input_features, output_features) + b: (output_features,) + Z: (batch_size, output_features) +``` + +## Key Takeaways + +โœ“ **Linear step:** Weighted sum of inputs plus bias + +โœ“ **Formula:** z = ฮฃ(wแตขxแตข) + b + +โœ“ **Weights:** Determine importance of each input + +โœ“ **Bias:** Shifts the output + +โœ“ **PyTorch:** Use `nn.Linear(in, out)` + +โœ“ **Matrix form:** Efficient for batches + +**Quick Reference:** + +```python +# Manual linear step +z = (weights * inputs).sum() + bias + +# Using PyTorch +linear = nn.Linear(input_dim, output_dim) +z = linear(x) + +# What it does: +# z = x @ linear.weight.T + linear.bias +``` + +**Remember:** The linear step is just multiply โ†’ sum โ†’ add bias. Simple but powerful! ๐ŸŽ‰ diff --git a/public/content/learn/neuron-from-scratch/what-is-a-neuron/biological-vs-artificial.png b/public/content/learn/neuron-from-scratch/what-is-a-neuron/biological-vs-artificial.png new file mode 100644 index 0000000..ebd1b2c Binary files /dev/null and b/public/content/learn/neuron-from-scratch/what-is-a-neuron/biological-vs-artificial.png differ diff --git a/public/content/learn/neuron-from-scratch/what-is-a-neuron/neuron-parts.png b/public/content/learn/neuron-from-scratch/what-is-a-neuron/neuron-parts.png new file mode 100644 index 0000000..1ae788a Binary files /dev/null and b/public/content/learn/neuron-from-scratch/what-is-a-neuron/neuron-parts.png differ diff --git a/public/content/learn/neuron-from-scratch/what-is-a-neuron/simple-neuron.png b/public/content/learn/neuron-from-scratch/what-is-a-neuron/simple-neuron.png new file mode 100644 index 0000000..08ca705 Binary files /dev/null and b/public/content/learn/neuron-from-scratch/what-is-a-neuron/simple-neuron.png differ diff --git a/public/content/learn/neuron-from-scratch/what-is-a-neuron/what-is-a-neuron-content.md b/public/content/learn/neuron-from-scratch/what-is-a-neuron/what-is-a-neuron-content.md new file mode 100644 index 0000000..69f1e39 --- /dev/null +++ b/public/content/learn/neuron-from-scratch/what-is-a-neuron/what-is-a-neuron-content.md @@ -0,0 +1,267 @@ +--- +hero: + title: "What is a Neuron" + subtitle: "The Basic Building Block of Neural Networks" + tags: + - "๐Ÿง  Neuron" + - "โฑ๏ธ 8 min read" +--- + +A neuron is the **fundamental building block** of neural networks. Just like biological neurons in your brain, artificial neurons process inputs and produce outputs! 
+ +## Biological vs Artificial + +![Biological vs Artificial](/content/learn/neuron-from-scratch/what-is-a-neuron/biological-vs-artificial.png) + +**Biological neuron:** +- Receives signals through dendrites +- Processes in cell body +- Sends output through axon + +**Artificial neuron:** +- Receives numerical inputs +- Processes with math (multiply, sum, activate) +- Outputs a single number + +**Both:** Transform multiple inputs into one output! + +## The Five Parts of a Neuron + +![Neuron Parts](/content/learn/neuron-from-scratch/what-is-a-neuron/neuron-parts.png) + +### 1. **Inputs** (xโ‚, xโ‚‚, xโ‚ƒ, ...) + +The data fed into the neuron: + +```python +inputs = [2.0, 3.0, 1.0] +``` + +**Real examples:** +- Pixel values from an image +- Features of a house (size, bedrooms, age) +- Word embeddings + +### 2. **Weights** (wโ‚, wโ‚‚, wโ‚ƒ, ...) + +How important each input is: + +```python +weights = [0.5, -0.3, 0.8] +``` + +**What weights mean:** +- Positive weight โ†’ input increases output +- Negative weight โ†’ input decreases output +- Large |weight| โ†’ input is important +- Small weight โ†’ input matters less + +### 3. **Multiply** (inputs ร— weights) + +Each input gets multiplied by its weight: + +```python +products = [2.0 ร— 0.5, 3.0 ร— -0.3, 1.0 ร— 0.8] + = [1.0, -0.9, 0.8] +``` + +### 4. **Sum** (ฮฃ) + +Add all products together, plus a bias: + +```python +sum_total = 1.0 + (-0.9) + 0.8 + bias + = 0.9 + 0 # assuming bias = 0 + = 0.9 +``` + +### 5. **Activation Function** + +Apply non-linearity (like ReLU, sigmoid, etc.): + +```python +output = ReLU(0.9) = 0.9 # Positive, so unchanged +``` + +## The Complete Formula + +**Output = Activation(ฮฃ(weights ยท inputs) + bias)** + +Or in math notation: +**y = f(wโ‚xโ‚ + wโ‚‚xโ‚‚ + wโ‚ƒxโ‚ƒ + ... + b)** + +Where: +- `x` = inputs +- `w` = weights +- `b` = bias +- `f` = activation function + +## Simple Example + +![Simple Neuron](/content/learn/neuron-from-scratch/what-is-a-neuron/simple-neuron.png) + +**Example:** + +```python +import torch + +# Inputs +x = torch.tensor([2.0, 3.0, 1.0]) + +# Weights +w = torch.tensor([0.5, -0.3, 0.8]) + +# Bias +b = torch.tensor(0.0) + +# Step 1: Multiply +products = x * w +print(products) +# tensor([ 1.0000, -0.9000, 0.8000]) + +# Step 2: Sum +weighted_sum = products.sum() + b +print(weighted_sum) +# tensor(0.9000) + +# Step 3: Activation (ReLU) +output = torch.relu(weighted_sum) +print(output) +# tensor(0.9000) +``` + +**Manual calculation:** + +```yaml +Step 1: Multiply each input by its weight + 2 ร— 0.5 = 1.0 + 3 ร— -0.3 = -0.9 + 1 ร— 0.8 = 0.8 + +Step 2: Sum everything + bias + 1.0 + (-0.9) + 0.8 + 0 = 0.9 + +Step 3: Apply activation (ReLU) + ReLU(0.9) = max(0, 0.9) = 0.9 + +Final output: 0.9 +``` + +## Why Do We Need Neurons? + +### They Learn Patterns + +Neurons adjust their weights to recognize patterns: + +```python +# Neuron learning to detect "cat" in images +# After training: +weights = [0.8, # whiskers โ†’ high weight (important!) + 0.9, # pointy ears โ†’ high weight + 0.1, # background โ†’ low weight (not important) + -0.5] # dog features โ†’ negative (opposite!) + +# When it sees a cat image: +cat_features = [1.0, 1.0, 0.2, 0.0] # Has whiskers, ears +output = sum(cat_features * weights) + bias +# = 0.8 + 0.9 + 0.02 + 0 = 1.72 +# โ†’ High output = "Yes, cat!" 
+ +# When it sees a dog image: +dog_features = [0.0, 0.0, 0.3, 1.0] # No whiskers/ears, has dog features +output = sum(dog_features * weights) + bias +# = 0 + 0 + 0.03 + -0.5 = -0.47 +# โ†’ Low output = "No, not cat" +``` + +## Single Neuron Can Be Powerful + +Even one neuron can solve problems: + +**Example: AND gate** + +```python +import torch + +def and_gate(x1, x2): + """Neuron implementing AND logic""" + w1, w2 = 1.0, 1.0 + bias = -1.5 + + # Weighted sum + z = x1 * w1 + x2 * w2 + bias + + # Activation (step function) + output = 1.0 if z > 0 else 0.0 + return output + +# Truth table +print(and_gate(0, 0)) # 0 (False AND False = False) +print(and_gate(0, 1)) # 0 (False AND True = False) +print(and_gate(1, 0)) # 0 (True AND False = False) +print(and_gate(1, 1)) # 1 (True AND True = True) +``` + +**How it works:** + +```yaml +Inputs: (1, 1) + 1ร—1 + 1ร—1 + (-1.5) = 0.5 > 0 โ†’ Output 1 โœ“ + +Inputs: (0, 1) + 0ร—1 + 1ร—1 + (-1.5) = -0.5 < 0 โ†’ Output 0 โœ“ + +Inputs: (1, 0) + 1ร—1 + 0ร—1 + (-1.5) = -0.5 < 0 โ†’ Output 0 โœ“ + +Inputs: (0, 0) + 0ร—1 + 0ร—1 + (-1.5) = -1.5 < 0 โ†’ Output 0 โœ“ +``` + +## Many Neurons = Network + +```yaml +Single neuron: + Limited power + Can learn simple patterns + +Multiple neurons: + Combined power + Can learn complex patterns + Each neuron specializes in something + +Example: Image classification + Neuron 1: Detects edges + Neuron 2: Detects curves + Neuron 3: Detects textures + ... + Together: Recognize objects! +``` + +## Key Takeaways + +โœ“ **Neuron = Processor:** Takes inputs, produces output + +โœ“ **Three operations:** Multiply, Sum, Activate + +โœ“ **Weights are key:** They determine what the neuron learns + +โœ“ **Bias shifts:** Adjusts the threshold + +โœ“ **Activation adds non-linearity:** Makes networks powerful + +โœ“ **Building block:** Many neurons = neural network + +**The formula:** + +```yaml +Output = Activation(ฮฃ(weights ร— inputs) + bias) + +In code: + output = activation(torch.sum(weights * inputs) + bias) + +Or with linear layer: + output = activation(nn.Linear(inputs)) +``` + +**Remember:** A neuron is just multiply โ†’ sum โ†’ activate! Everything else builds on this! 
๐ŸŽ‰ diff --git a/public/content/learn/tensors/concatenating-tensors/concat-dim0.png b/public/content/learn/tensors/concatenating-tensors/concat-dim0.png new file mode 100644 index 0000000..bb5e622 Binary files /dev/null and b/public/content/learn/tensors/concatenating-tensors/concat-dim0.png differ diff --git a/public/content/learn/tensors/concatenating-tensors/concat-dim1.png b/public/content/learn/tensors/concatenating-tensors/concat-dim1.png new file mode 100644 index 0000000..0f6ef2d Binary files /dev/null and b/public/content/learn/tensors/concatenating-tensors/concat-dim1.png differ diff --git a/public/content/learn/tensors/concatenating-tensors/concat-rules.png b/public/content/learn/tensors/concatenating-tensors/concat-rules.png new file mode 100644 index 0000000..83e6e02 Binary files /dev/null and b/public/content/learn/tensors/concatenating-tensors/concat-rules.png differ diff --git a/public/content/learn/tensors/concatenating-tensors/concatenating-tensors-content.md b/public/content/learn/tensors/concatenating-tensors/concatenating-tensors-content.md new file mode 100644 index 0000000..29f768a --- /dev/null +++ b/public/content/learn/tensors/concatenating-tensors/concatenating-tensors-content.md @@ -0,0 +1,419 @@ +--- +hero: + title: "Concatenating Tensors" + subtitle: "Combining Multiple Tensors" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 9 min read" +--- + +Concatenation lets you **join multiple tensors together** along a specific dimension. Think of it like gluing pieces together! + +## The Basic Idea + +**Concatenation = Joining tensors end-to-end along one dimension** + +You can join tensors: +- **Vertically** (stack rows on top of each other) +- **Horizontally** (place side by side) +- **Along any dimension** + +## Concatenating Along Dimension 0 (Rows) + +Stack tensors **vertically** - adding more rows: + +![Concat Dimension 0](/content/learn/tensors/concatenating-tensors/concat-dim0.png) + +**Example:** + +```python +import torch + +A = torch.tensor([[1, 2, 3], + [4, 5, 6]]) # Shape: (2, 3) + +B = torch.tensor([[7, 8, 9], + [10, 11, 12]]) # Shape: (2, 3) + +# Concatenate along dimension 0 (rows) +result = torch.cat([A, B], dim=0) + +print(result) +# tensor([[ 1, 2, 3], +# [ 4, 5, 6], +# [ 7, 8, 9], +# [10, 11, 12]]) + +print(result.shape) # torch.Size([4, 3]) +``` + +**What happened:** + +```yaml +A: (2, 3) โ†’ 2 rows, 3 columns +B: (2, 3) โ†’ 2 rows, 3 columns + +Concatenate rows: 2 + 2 = 4 rows +Columns stay same: 3 columns + +Result: (4, 3) +``` + +**Visual breakdown:** + +```yaml +[[1, 2, 3], โ† From A + [4, 5, 6], โ† From A + [7, 8, 9], โ† From B + [10, 11, 12]] โ† From B +``` + +## Concatenating Along Dimension 1 (Columns) + +Join tensors **horizontally** - adding more columns: + +![Concat Dimension 1](/content/learn/tensors/concatenating-tensors/concat-dim1.png) + +**Example:** + +```python +import torch + +A = torch.tensor([[1, 2], + [3, 4]]) # Shape: (2, 2) + +B = torch.tensor([[5, 6, 7], + [8, 9, 10]]) # Shape: (2, 3) + +# Concatenate along dimension 1 (columns) +result = torch.cat([A, B], dim=1) + +print(result) +# tensor([[ 1, 2, 5, 6, 7], +# [ 3, 4, 8, 9, 10]]) + +print(result.shape) # torch.Size([2, 5]) +``` + +**What happened:** + +```yaml +A: (2, 2) โ†’ 2 rows, 2 columns +B: (2, 3) โ†’ 2 rows, 3 columns + +Rows stay same: 2 rows +Concatenate columns: 2 + 3 = 5 columns + +Result: (2, 5) +``` + +**Visual breakdown:** + +```yaml +[[1, 2, 5, 6, 7], + [3, 4, 8, 9, 10]] + โ†‘โ†‘โ†‘ โ†‘โ†‘โ†‘โ†‘โ†‘โ†‘โ†‘ + From A From B +``` + +## The Concatenation Rules + 
+![Concat Rules](/content/learn/tensors/concatenating-tensors/concat-rules.png) + +**Rule:** All dimensions EXCEPT the concatenation dimension must match! + +### โœ“ Valid Examples + +```python +# Concatenate dim=0: columns must match +A = torch.randn(2, 3) # (2, 3) +B = torch.randn(4, 3) # (4, 3) - same 3 columns โœ“ +result = torch.cat([A, B], dim=0) # (6, 3) + +# Concatenate dim=1: rows must match +C = torch.randn(5, 2) # (5, 2) +D = torch.randn(5, 7) # (5, 7) - same 5 rows โœ“ +result = torch.cat([C, D], dim=1) # (5, 9) +``` + +### โœ— Invalid Examples + +```python +# Different column counts - can't stack rows! +A = torch.randn(2, 3) +B = torch.randn(2, 4) # Different columns +# torch.cat([A, B], dim=0) # ERROR! 3 โ‰  4 + +# Different row counts - can't join columns! +C = torch.randn(3, 5) +D = torch.randn(2, 5) # Different rows +# torch.cat([C, D], dim=1) # ERROR! 3 โ‰  2 +``` + +**Quick check:** + +```yaml +Concatenating dim=0 (vertical): + โœ“ (2,3) + (4,3) โ†’ (6,3) โ† columns match (3) + โœ— (2,3) + (2,4) โ†’ ERROR โ† columns don't match + +Concatenating dim=1 (horizontal): + โœ“ (5,2) + (5,7) โ†’ (5,9) โ† rows match (5) + โœ— (3,5) + (2,5) โ†’ ERROR โ† rows don't match +``` + +## Stack: Creating a New Dimension + +`torch.stack()` is different - it **creates a new dimension**: + +![Stack Visual](/content/learn/tensors/concatenating-tensors/stack-visual.png) + +**Example:** + +```python +import torch + +A = torch.tensor([[1, 2], [3, 4]]) # (2, 2) +B = torch.tensor([[5, 6], [7, 8]]) # (2, 2) +C = torch.tensor([[9, 10], [11, 12]]) # (2, 2) + +# Stack creates NEW dimension +stacked = torch.stack([A, B, C], dim=0) + +print(stacked.shape) # torch.Size([3, 2, 2]) +# 3 matrices, each 2ร—2 + +print(stacked) +# tensor([[[ 1, 2], +# [ 3, 4]], +# +# [[ 5, 6], +# [ 7, 8]], +# +# [[ 9, 10], +# [11, 12]]]) +``` + +**Key difference:** + +```yaml +cat([A, B], dim=0): + (2, 3) + (2, 3) โ†’ (4, 3) โ† Adds to existing dimension + +stack([A, B], dim=0): + (2, 3) + (2, 3) โ†’ (2, 2, 3) โ† Creates NEW dimension +``` + +**For stack, all tensors must have EXACTLY the same shape!** + +## Multiple Tensors at Once + +You can concatenate more than 2 tensors: + +```python +import torch + +A = torch.ones(2, 3) +B = torch.ones(1, 3) * 2 +C = torch.ones(3, 3) * 3 + +# Concatenate all three +result = torch.cat([A, B, C], dim=0) + +print(result) +# tensor([[1., 1., 1.], +# [1., 1., 1.], +# [2., 2., 2.], +# [3., 3., 3.], +# [3., 3., 3.], +# [3., 3., 3.]]) + +print(result.shape) # torch.Size([6, 3]) +# 2 + 1 + 3 = 6 rows +``` + +**Breakdown:** + +```yaml +A: 2 rows +B: 1 row +C: 3 rows + +Total: 2 + 1 + 3 = 6 rows +``` + +## Practical Examples + +### Example 1: Combining Train and Test Data + +```python +import torch + +# Training data: 100 samples +train_data = torch.randn(100, 10) + +# Test data: 20 samples +test_data = torch.randn(20, 10) + +# Combine into full dataset +full_data = torch.cat([train_data, test_data], dim=0) + +print(full_data.shape) # torch.Size([120, 10]) +# 100 + 20 = 120 samples +``` + +### Example 2: Concatenating Features + +```python +import torch + +# Original features: 5 samples, 3 features each +original_features = torch.randn(5, 3) + +# New features: 5 samples, 2 new features +new_features = torch.randn(5, 2) + +# Combine features horizontally +combined = torch.cat([original_features, new_features], dim=1) + +print(combined.shape) # torch.Size([5, 5]) +# 5 samples, 3 + 2 = 5 features +``` + +### Example 3: Creating Batches with Stack + +```python +import torch + +# Three separate 
samples +sample1 = torch.randn(28, 28) +sample2 = torch.randn(28, 28) +sample3 = torch.randn(28, 28) + +# Stack into a batch +batch = torch.stack([sample1, sample2, sample3], dim=0) + +print(batch.shape) # torch.Size([3, 28, 28]) +# 3 samples in the batch +``` + +### Example 4: Building Sequences + +```python +import torch + +# Word embeddings for a sentence +# Each word is a 100-dim vector +word1 = torch.randn(100) +word2 = torch.randn(100) +word3 = torch.randn(100) +word4 = torch.randn(100) + +# Stack into sequence +sentence = torch.stack([word1, word2, word3, word4], dim=0) + +print(sentence.shape) # torch.Size([4, 100]) +# 4 words, 100-dim embedding each +``` + +## Cat vs Stack + +The key difference between `cat` and `stack`: + +```python +import torch + +A = torch.tensor([[1, 2], [3, 4]]) # (2, 2) +B = torch.tensor([[5, 6], [7, 8]]) # (2, 2) + +# CAT: Joins along existing dimension +cat_result = torch.cat([A, B], dim=0) +print(cat_result.shape) # torch.Size([4, 2]) + +# STACK: Creates new dimension +stack_result = torch.stack([A, B], dim=0) +print(stack_result.shape) # torch.Size([2, 2, 2]) +``` + +**When to use which:** + +```yaml +Use cat() when: + - Adding more samples to a batch + - Extending features + - Combining datasets + - Tensors can have different sizes in concat dimension + +Use stack() when: + - Creating a batch from individual samples + - All tensors have SAME shape + - Want to add a new dimension +``` + +## Common Gotchas + +### โŒ Gotcha 1: Shape Mismatch + +```python +A = torch.randn(2, 3) +B = torch.randn(2, 4) + +# This will ERROR! +# torch.cat([A, B], dim=0) # 3 โ‰  4 +``` + +### โŒ Gotcha 2: Wrong Dimension + +```python +A = torch.randn(2, 3) +B = torch.randn(2, 3) + +# This will ERROR! +# torch.cat([A, B], dim=2) # Only dims 0 and 1 exist! +``` + +### โŒ Gotcha 3: Forgetting List Brackets + +```python +A = torch.randn(2, 3) +B = torch.randn(2, 3) + +# This will ERROR! +# torch.cat(A, B, dim=0) # Missing [ ] + +# Correct: +torch.cat([A, B], dim=0) # โœ“ +``` + +## Key Takeaways + +โœ“ **cat() joins along existing dimension:** Extends that dimension + +โœ“ **stack() creates new dimension:** All tensors must have same shape + +โœ“ **Other dimensions must match:** Can't concatenate incompatible shapes + +โœ“ **dim=0 is vertical:** Stacks rows (more samples) + +โœ“ **dim=1 is horizontal:** Joins columns (more features) + +โœ“ **Use list brackets:** `torch.cat([A, B, C], dim=0)` + +**Quick Reference:** + +```python +# Concatenate (extends existing dimension) +torch.cat([A, B], dim=0) # Stack vertically (more rows) +torch.cat([A, B], dim=1) # Join horizontally (more columns) +torch.cat([A, B, C], dim=0) # Multiple tensors + +# Stack (creates new dimension) +torch.stack([A, B], dim=0) # New dimension at front +torch.stack([A, B], dim=1) # New dimension at position 1 + +# Split (opposite of concatenate) +torch.split(tensor, 2, dim=0) # Split into chunks of size 2 +torch.chunk(tensor, 3, dim=0) # Split into 3 chunks +``` + +**Remember:** `cat()` extends, `stack()` creates! 
๐ŸŽ‰ diff --git a/public/content/learn/tensors/concatenating-tensors/stack-visual.png b/public/content/learn/tensors/concatenating-tensors/stack-visual.png new file mode 100644 index 0000000..61c589e Binary files /dev/null and b/public/content/learn/tensors/concatenating-tensors/stack-visual.png differ diff --git a/public/content/learn/tensors/creating-special-tensors/arange-linspace.png b/public/content/learn/tensors/creating-special-tensors/arange-linspace.png new file mode 100644 index 0000000..2623e3a Binary files /dev/null and b/public/content/learn/tensors/creating-special-tensors/arange-linspace.png differ diff --git a/public/content/learn/tensors/creating-special-tensors/creating-special-tensors-content.md b/public/content/learn/tensors/creating-special-tensors/creating-special-tensors-content.md new file mode 100644 index 0000000..659453a --- /dev/null +++ b/public/content/learn/tensors/creating-special-tensors/creating-special-tensors-content.md @@ -0,0 +1,501 @@ +--- +hero: + title: "Creating Special Tensors" + subtitle: "Zeros, Ones, Identity Matrices and More" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 10 min read" +--- + +Instead of manually typing out every value, PyTorch provides quick ways to create common tensor patterns. These are incredibly useful! + +## Zeros and Ones + +The most basic special tensors: filled with all 0s or all 1s. + +![Zeros and Ones](/content/learn/tensors/creating-special-tensors/zeros-ones.png) + +### Creating Zeros + +**Example:** + +```python +import torch + +# Create 2ร—3 matrix of zeros +zeros = torch.zeros(2, 3) + +print(zeros) +# tensor([[0., 0., 0.], +# [0., 0., 0.]]) + +print(zeros.shape) # torch.Size([2, 3]) +``` + +**More examples:** + +```python +# 1D tensor of zeros +torch.zeros(5) +# tensor([0., 0., 0., 0., 0.]) + +# 3D tensor of zeros +torch.zeros(2, 3, 4) +# tensor([[[0., 0., 0., 0.], +# [0., 0., 0., 0.], +# [0., 0., 0., 0.]], +# [[0., 0., 0., 0.], +# [0., 0., 0., 0.], +# [0., 0., 0., 0.]]]) +``` + +### Creating Ones + +**Example:** + +```python +import torch + +# Create 2ร—3 matrix of ones +ones = torch.ones(2, 3) + +print(ones) +# tensor([[1., 1., 1.], +# [1., 1., 1.]]) + +print(ones.shape) # torch.Size([2, 3]) +``` + +**When to use:** + +```yaml +zeros(): + - Initialize weights to zero + - Create padding + - Initialize bias terms + +ones(): + - Create masks (all True) + - Initialize certain layers + - Multiply by constant values +``` + +## Identity Matrix + +An identity matrix has 1s on the diagonal, 0s everywhere else: + +![Identity Matrix](/content/learn/tensors/creating-special-tensors/identity-matrix.png) + +**Example:** + +```python +import torch + +# Create 4ร—4 identity matrix +identity = torch.eye(4) + +print(identity) +# tensor([[1., 0., 0., 0.], +# [0., 1., 0., 0.], +# [0., 0., 1., 0.], +# [0., 0., 0., 1.]]) +``` + +**Properties:** + +```yaml +torch.eye(n) creates: + - n ร— n square matrix + - 1s on diagonal (where row = column) + - 0s everywhere else + +Special property: + A @ eye(n) = A (multiplying by identity doesn't change A) +``` + +**More examples:** + +```python +# 3ร—3 identity +I = torch.eye(3) +print(I) +# tensor([[1., 0., 0.], +# [0., 1., 0.], +# [0., 0., 1.]]) + +# Test the property: A @ I = A +A = torch.randn(3, 3) +result = A @ I +print(torch.allclose(A, result)) # True! +``` + +## Random Tensors + +Random tensors are crucial for initializing neural network weights! 
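+
+One practical detail worth adding here (it is not covered in the examples below): random tensors change on every call, so seed the generator when you need reproducible values.
+
+```python
+import torch
+
+torch.manual_seed(42)      # fix the random state
+a = torch.rand(2, 3)
+
+torch.manual_seed(42)      # same seed -> same "random" values
+b = torch.rand(2, 3)
+
+print(torch.equal(a, b))   # True
+```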
+ +![Random Tensors](/content/learn/tensors/creating-special-tensors/random-tensors.png) + +### torch.rand() - Uniform Distribution + +Creates random values **uniformly distributed between 0 and 1**: + +```python +import torch + +# Random values in [0, 1) +random_uniform = torch.rand(2, 3) + +print(random_uniform) +# tensor([[0.2347, 0.8723, 0.4512], +# [0.6234, 0.1156, 0.9901]]) + +# All values are between 0 and 1 +``` + +**When to use:** + +```yaml +Good for: + - Dropout masks + - Random sampling [0, 1) + - Probabilities +``` + +### torch.randn() - Normal Distribution + +Creates random values from a **normal (Gaussian) distribution** with mean 0 and standard deviation 1: + +```python +import torch + +# Random values from normal distribution +random_normal = torch.randn(2, 3) + +print(random_normal) +# tensor([[-0.5234, 1.2301, -1.1142], +# [ 0.0832, -0.7329, 0.4501]]) + +# Values can be negative or positive +# Most values are close to 0 +``` + +**When to use:** + +```yaml +BEST for: + - Weight initialization (most common!) + - Adding noise to data + - Sampling from Gaussian +``` + +**This is the most common way to initialize neural network weights!** + +### torch.randint() - Random Integers + +Creates random **integers** in a specified range: + +```python +import torch + +# Random integers from 0 to 9 (10 excluded) +random_ints = torch.randint(0, 10, (2, 3)) + +print(random_ints) +# tensor([[3, 7, 1], +# [9, 2, 5]]) + +# All values are integers between 0 and 9 +``` + +**More examples:** + +```python +# Random integers from 1 to 6 (dice roll) +dice = torch.randint(1, 7, (10,)) +print(dice) +# tensor([4, 2, 6, 1, 3, 5, 2, 4, 6, 1]) + +# Random integers for class labels +labels = torch.randint(0, 5, (100,)) # 100 labels, classes 0-4 +``` + +## Range Tensors + +Create sequences of numbers automatically! + +![Arange and Linspace](/content/learn/tensors/creating-special-tensors/arange-linspace.png) + +### torch.arange() - Step by Fixed Amount + +Creates a sequence with a fixed step size (like Python's `range`): + +```python +import torch + +# From 0 to 10, step by 2 (10 not included!) +seq = torch.arange(0, 10, 2) + +print(seq) +# tensor([0, 2, 4, 6, 8]) +``` + +**More examples:** + +```python +# Default start is 0, default step is 1 +torch.arange(5) +# tensor([0, 1, 2, 3, 4]) + +# Specify start and end +torch.arange(3, 8) +# tensor([3, 4, 5, 6, 7]) + +# Use decimals +torch.arange(0, 1, 0.2) +# tensor([0.0000, 0.2000, 0.4000, 0.6000, 0.8000]) +``` + +**Pattern:** + +```yaml +torch.arange(start, end, step) + - Starts at 'start' + - Stops BEFORE 'end' + - Increments by 'step' +``` + +### torch.linspace() - N Evenly Spaced Values + +Creates N values evenly spaced between start and end: + +```python +import torch + +# 5 values evenly spaced from 0 to 1 +seq = torch.linspace(0, 1, 5) + +print(seq) +# tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]) +``` + +**More examples:** + +```python +# 10 points from -1 to 1 +torch.linspace(-1, 1, 10) +# tensor([-1.0000, -0.7778, -0.5556, -0.3333, -0.1111, +# 0.1111, 0.3333, 0.5556, 0.7778, 1.0000]) + +# Great for creating x-axis for plotting +x = torch.linspace(0, 10, 100) # 100 points from 0 to 10 +``` + +**Key difference:** + +```yaml +arange(0, 10, 2): + - You specify the STEP (2) + - Result: [0, 2, 4, 6, 8] + - End NOT included + +linspace(0, 10, 5): + - You specify the COUNT (5 values) + - Result: [0.0, 2.5, 5.0, 7.5, 10.0] + - End IS included! 
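+
+# Added arithmetic check: linspace spacing = (end - start) / (count - 1)
+#   (10 - 0) / (5 - 1) = 2.5  -> matches [0.0, 2.5, 5.0, 7.5, 10.0]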
+``` + +## Creating "Like" Tensors + +Create new tensors matching another tensor's shape: + +![Like Tensors](/content/learn/tensors/creating-special-tensors/like-tensors.png) + +**Example:** + +```python +import torch + +# Original tensor +x = torch.tensor([[1, 2, 3], + [4, 5, 6]]) + +# Create zeros with same shape +zeros = torch.zeros_like(x) +print(zeros) +# tensor([[0, 0, 0], +# [0, 0, 0]]) + +# Create ones with same shape +ones = torch.ones_like(x) +print(ones) +# tensor([[1, 1, 1], +# [1, 1, 1]]) + +# Create random with same shape +random = torch.randn_like(x.float()) # Must be float for randn +print(random.shape) # torch.Size([2, 3]) +``` + +**When to use:** + +```yaml +zeros_like(): + - Reset gradients + - Create zero-initialized tensors matching input + +ones_like(): + - Create masks + - Initialize to constant + +randn_like(): + - Add noise matching shape + - Initialize weights +``` + +## Practical Examples + +### Example 1: Weight Initialization + +```python +import torch + +# Input dimension: 784 (28ร—28 image flattened) +# Output dimension: 10 (10 classes) +input_dim = 784 +output_dim = 10 + +# Initialize weights with small random values +weights = torch.randn(input_dim, output_dim) * 0.01 + +# Initialize bias to zeros +bias = torch.zeros(output_dim) + +print(f"Weights shape: {weights.shape}") # (784, 10) +print(f"Bias shape: {bias.shape}") # (10,) +``` + +### Example 2: Creating a Mask + +```python +import torch + +# Data batch +data = torch.randn(5, 10) + +# Create mask: first 3 samples are valid, last 2 are padding +mask = torch.zeros(5, dtype=torch.bool) +mask[:3] = True + +print(mask) +# tensor([ True, True, True, False, False]) + +# Apply mask +valid_data = data[mask] +print(valid_data.shape) # torch.Size([3, 10]) +``` + +### Example 3: Creating Training Data + +```python +import torch + +batch_size = 32 +sequence_length = 50 +embedding_dim = 128 + +# Input sequences (random for demo) +inputs = torch.randn(batch_size, sequence_length, embedding_dim) + +# Labels (random class indices) +labels = torch.randint(0, 10, (batch_size,)) + +# Attention mask (all ones = all valid) +attention_mask = torch.ones(batch_size, sequence_length) + +print(f"Inputs: {inputs.shape}") # (32, 50, 128) +print(f"Labels: {labels.shape}") # (32,) +print(f"Mask: {attention_mask.shape}") # (32, 50) +``` + +## Full vs Empty + +Create tensors without initializing values (faster but contains garbage): + +```python +import torch + +# Create empty tensor (uninitialized - garbage values) +empty = torch.empty(2, 3) +print(empty) +# tensor([[3.6893e+19, 1.5414e-19, 3.0818e-41], +# [0.0000e+00, 0.0000e+00, 0.0000e+00]]) +# Random garbage values! + +# Create full tensor (fill with specific value) +sevens = torch.full((2, 3), 7) +print(sevens) +# tensor([[7, 7, 7], +# [7, 7, 7]]) +``` + +**When to use empty:** + +```yaml +torch.empty(): + - When you'll immediately overwrite all values + - Slightly faster than zeros/ones + - WARNING: Contains random garbage! + +torch.full(): + - Fill with any constant value + - Like ones() but more flexible +``` + +## Key Takeaways + +โœ“ **zeros() and ones():** All 0s or all 1s + +โœ“ **eye():** Identity matrix (diagonal 1s) + +โœ“ **rand():** Random [0, 1) uniform + +โœ“ **randn():** Random normal distribution (best for weights!) 
+ +โœ“ **randint():** Random integers + +โœ“ **arange():** Sequence with step (end excluded) + +โœ“ **linspace():** N evenly spaced values (end included) + +โœ“ **_like():** Match another tensor's shape + +**Quick Reference:** + +```python +# Zeros and ones +torch.zeros(3, 4) # 3ร—4 matrix of zeros +torch.ones(2, 5) # 2ร—5 matrix of ones + +# Identity +torch.eye(5) # 5ร—5 identity matrix + +# Random +torch.rand(3, 3) # Uniform [0, 1) +torch.randn(3, 3) # Normal (ฮผ=0, ฯƒ=1) +torch.randint(0, 10, (3, 3)) # Random integers [0, 10) + +# Sequences +torch.arange(0, 10, 2) # [0, 2, 4, 6, 8] +torch.linspace(0, 1, 5) # [0.00, 0.25, 0.50, 0.75, 1.00] + +# Like another tensor +x = torch.randn(2, 3) +torch.zeros_like(x) # Zeros with shape (2, 3) +torch.ones_like(x) # Ones with shape (2, 3) +torch.randn_like(x) # Random with shape (2, 3) + +# Fill with value +torch.full((2, 3), 7) # All 7s +``` + +**Remember:** Use `torch.randn()` for weight initialization - it's the standard! ๐ŸŽ‰ diff --git a/public/content/learn/tensors/creating-special-tensors/identity-matrix.png b/public/content/learn/tensors/creating-special-tensors/identity-matrix.png new file mode 100644 index 0000000..2629523 Binary files /dev/null and b/public/content/learn/tensors/creating-special-tensors/identity-matrix.png differ diff --git a/public/content/learn/tensors/creating-special-tensors/like-tensors.png b/public/content/learn/tensors/creating-special-tensors/like-tensors.png new file mode 100644 index 0000000..ad509b3 Binary files /dev/null and b/public/content/learn/tensors/creating-special-tensors/like-tensors.png differ diff --git a/public/content/learn/tensors/creating-special-tensors/random-tensors.png b/public/content/learn/tensors/creating-special-tensors/random-tensors.png new file mode 100644 index 0000000..6a87dc0 Binary files /dev/null and b/public/content/learn/tensors/creating-special-tensors/random-tensors.png differ diff --git a/public/content/learn/tensors/creating-special-tensors/zeros-ones.png b/public/content/learn/tensors/creating-special-tensors/zeros-ones.png new file mode 100644 index 0000000..fe84c6c Binary files /dev/null and b/public/content/learn/tensors/creating-special-tensors/zeros-ones.png differ diff --git a/public/content/learn/tensors/creating-tensors/3d-tensor.png b/public/content/learn/tensors/creating-tensors/3d-tensor.png new file mode 100644 index 0000000..e3c1d2d Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/3d-tensor.png differ diff --git a/public/content/learn/tensors/creating-tensors/creating-from-data.png b/public/content/learn/tensors/creating-tensors/creating-from-data.png new file mode 100644 index 0000000..2ea1afa Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/creating-from-data.png differ diff --git a/public/content/learn/tensors/creating-tensors/creating-tensors-content.md b/public/content/learn/tensors/creating-tensors/creating-tensors-content.md new file mode 100644 index 0000000..738133d --- /dev/null +++ b/public/content/learn/tensors/creating-tensors/creating-tensors-content.md @@ -0,0 +1,703 @@ +--- +hero: + title: "Creating Tensors" + subtitle: "Building Blocks of Deep Learning" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 15 min read" +--- + +Tensors are the fundamental data structure in deep learning. Everything you work with in neural networks - images, text, audio, weights, gradients - is represented as tensors. + +## What is a Tensor? + +A **tensor** is a multi-dimensional array of numbers. 
Think of it as a container that can hold data in different dimensions: + +- **0D Tensor (Scalar)**: A single number โ†’ `5` +- **1D Tensor (Vector)**: An array of numbers โ†’ `[1, 2, 3, 4]` +- **2D Tensor (Matrix)**: A table of numbers โ†’ `[[1, 2], [3, 4], [5, 6]]` +- **3D+ Tensor**: Multiple matrices stacked together โ†’ `[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]` + +Let me show you exactly what these look like: + +**0D Tensor (Scalar)** - Just a number, no brackets needed: +``` +5 +``` + +**1D Tensor (Vector)** - One set of brackets `[ ]`: +``` +[1, 2, 3, 4, 5] +``` + +**2D Tensor (Matrix)** - Two sets of brackets `[[ ]]`, one for each row: +``` +[[1, 2, 3], + [4, 5, 6], + [7, 8, 9]] +``` + +**3D Tensor** - Three sets of brackets `[[[ ]]]`, multiple matrices: +``` +[[[1, 2], [[[5, 6], + [3, 4]], [7, 8]]] +``` + +In PyTorch and other deep learning frameworks, tensors are similar to NumPy arrays but with superpowers - they can run on GPUs and automatically compute gradients! + +## The Bracket Rule: How to Count Dimensions + +**Simple Rule:** Count the number of opening brackets `[` at the start of your data! + +**Examples:** + +```python +# 0D Tensor (Scalar) - NO brackets +5 # 0 dimensions + +# 1D Tensor (Vector) - ONE opening bracket [ +[1, 2, 3] # 1 dimension + +# 2D Tensor (Matrix) - TWO opening brackets [[ +[[1, 2], # 2 dimensions + [3, 4]] + +# 3D Tensor - THREE opening brackets [[[ +[[[1, 2], # 3 dimensions + [3, 4]], + [[5, 6], + [7, 8]]] +``` + +**Pro Tip:** When you create a tensor, look at the left edge of your data. Count the `[` symbols stacked up - that's your number of dimensions! + +```python +import torch + +# Let's verify this rule +scalar = torch.tensor(5) # 0 brackets โ†’ ndim = 0 +print(scalar.ndim) # Output: 0 + +vector = torch.tensor([1, 2, 3]) # 1 bracket โ†’ ndim = 1 +print(vector.ndim) # Output: 1 + +matrix = torch.tensor([[1, 2], [3, 4]]) # 2 brackets โ†’ ndim = 2 +print(matrix.ndim) # Output: 2 + +tensor_3d = torch.tensor([[[1, 2]], [[3, 4]]]) # 3 brackets โ†’ ndim = 3 +print(tensor_3d.ndim) # Output: 3 +``` + +![Tensor Dimensions](/content/learn/tensors/creating-tensors/tensor-dimensions.png) + +## Understanding Tensor Dimensions + +### 0D Tensor (Scalar) + +A scalar is just a single number. + +![Scalar Tensor](/content/learn/tensors/creating-tensors/scalar-tensor.png) + +**Example:** + +```python +import torch + +# Creating a scalar tensor +scalar = torch.tensor(5) + +print(scalar) # Output: tensor(5) +print(scalar.shape) # Output: torch.Size([]) +print(scalar.ndim) # Output: 0 (zero dimensions) +``` + +**What happens here?** + +When you write `torch.tensor(5)`: +1. You pass the number `5` to PyTorch +2. PyTorch creates a tensor object that holds this single value +3. The shape is `[]` (empty brackets) because there are no dimensions +4. `ndim` is `0` because it's just a single number, not an array + +Think of it like putting a single marble in a special container - the marble is your number `5`, and the container is the tensor. + +**Real-world use:** Learning rate, loss value, accuracy score + +**More Examples:** + +```python +temperature = torch.tensor(36.5) # Body temperature +score = torch.tensor(95) # Test score + +print(temperature.ndim) # Output: 0 +print(score.ndim) # Output: 0 +``` + +### 1D Tensor (Vector) + +A vector is an array of numbers, like a list. 
+ +![Vector Tensor](/content/learn/tensors/creating-tensors/vector-tensor.png) + +**Example 1:** Simple vector + +```python +import torch + +# Creating a 1D tensor (vector) +vector = torch.tensor([1, 2, 3, 4, 5]) + +print(vector) # Output: tensor([1, 2, 3, 4, 5]) +print(vector.shape) # Output: torch.Size([5]) +print(vector.ndim) # Output: 1 +``` + +**What happens here?** + +When you write `torch.tensor([1, 2, 3, 4, 5])`: +1. You pass a **Python list** (notice the square brackets `[ ]`) to PyTorch +2. PyTorch sees the list has 5 numbers +3. It creates a 1D tensor with 5 elements in a row +4. The shape is `[5]` meaning "one dimension with 5 elements" +5. `ndim` is `1` because there's one dimension (length) + +**Visual breakdown of the brackets:** +```python +[1, 2, 3, 4, 5] +โ†‘ โ†‘ +One opening and one closing bracket = 1D tensor +``` + +**Think of it like:** A row of 5 boxes, each holding one number. + +**Example 2:** Accessing elements + +```python +vector = torch.tensor([10, 20, 30, 40, 50]) + +# Access individual elements (0-indexed) +print(vector[0]) # Output: tensor(10) +print(vector[2]) # Output: tensor(30) +print(vector[-1]) # Output: tensor(50) (last element) + +# Access a slice +print(vector[1:4]) # Output: tensor([20, 30, 40]) +``` + +**Real-world use:** Word embeddings, feature vectors, time series data + +### 2D Tensor (Matrix) + +A matrix is a table of numbers with rows and columns. + +![Matrix Tensor](/content/learn/tensors/creating-tensors/matrix-tensor.png) + +**Example 1:** Creating a matrix + +```python +import torch + +# Creating a 2D tensor (matrix) +matrix = torch.tensor([[1, 2, 3, 4], + [5, 6, 7, 8], + [9, 10, 11, 12]]) + +print(matrix) +# Output: +# tensor([[ 1, 2, 3, 4], +# [ 5, 6, 7, 8], +# [ 9, 10, 11, 12]]) + +print(matrix.shape) # Output: torch.Size([3, 4]) + # 3 rows, 4 columns +print(matrix.ndim) # Output: 2 +``` + +**What happens here?** + +When you write `torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])`: +1. You pass a **nested Python list** (list inside a list!) +2. The outer brackets `[ ]` represent the matrix itself +3. Each inner bracket `[ ]` represents one row +4. PyTorch counts: 3 inner lists = 3 rows, each has 4 numbers = 4 columns +5. The shape is `[3, 4]` meaning "3 rows, 4 columns" +6. `ndim` is `2` because there are two dimensions (rows and columns) + +**Visual breakdown of the brackets:** +```python +[[1, 2, 3, 4], โ† Row 0 (first row) + [5, 6, 7, 8], โ† Row 1 (second row) + [9, 10, 11, 12]] โ† Row 2 (third row) +โ†‘โ†‘ โ†‘ โ†‘ +โ”‚โ”‚ โ”‚ โ””โ”€ Inner closing bracket (end of row) +โ”‚โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€ Outer opening bracket (start of matrix) +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€Outer closing bracket (end of matrix) + +Two levels of brackets = 2D tensor +``` + +**Think of it like:** A table with 3 rows and 4 columns, like a spreadsheet. 
+ +**Remember:** Shape is always `[ROWS, COLUMNS]` + +**Example 2:** Accessing rows and columns + +```python +matrix = torch.tensor([[1, 2, 3], + [4, 5, 6], + [7, 8, 9]]) + +# Access a single element [row, column] +print(matrix[0, 0]) # Output: tensor(1) +print(matrix[1, 2]) # Output: tensor(6) +print(matrix[2, 1]) # Output: tensor(8) + +# Access entire row +print(matrix[0]) # Output: tensor([1, 2, 3]) +print(matrix[1]) # Output: tensor([4, 5, 6]) + +# Access entire column +print(matrix[:, 0]) # Output: tensor([1, 4, 7]) +print(matrix[:, 1]) # Output: tensor([2, 5, 8]) +``` + +**Real-world use:** Grayscale images, batch of word embeddings, weight matrices + +### 3D Tensor + +A 3D tensor is multiple matrices stacked together. Think of it as a cube of numbers. + +![3D Tensor](/content/learn/tensors/creating-tensors/3d-tensor.png) + +**Example 1:** Creating a 3D tensor + +```python +import torch + +# Creating a 3D tensor (2 matrices, each 3x4) +tensor_3d = torch.tensor([[[1, 2, 3, 4], + [5, 6, 7, 8], + [9, 10, 11, 12]], + + [[13, 14, 15, 16], + [17, 18, 19, 20], + [21, 22, 23, 24]]]) + +print(tensor_3d.shape) # Output: torch.Size([2, 3, 4]) + # 2 matrices, each with 3 rows and 4 columns +print(tensor_3d.ndim) # Output: 3 +``` + +**What happens here?** + +When you write `torch.tensor([[[...], [...]], [[...], [...]]])`: +1. You have **three levels of nested lists** (lists inside lists inside lists!) +2. The outermost brackets `[ ]` represent the whole 3D tensor +3. Each middle-level bracket `[ ]` represents one matrix +4. Each innermost bracket `[ ]` represents one row in a matrix +5. PyTorch counts: 2 middle lists = 2 matrices, each has 3 inner lists = 3 rows, each row has 4 numbers = 4 columns +6. The shape is `[2, 3, 4]` meaning "2 matrices, each 3 rows ร— 4 columns" +7. `ndim` is `3` because there are three dimensions + +**Visual breakdown of the brackets:** +```python +[ โ† Outermost opening (start of 3D tensor) + [ โ† First matrix opening + [1, 2, 3, 4], โ† Row 0 of matrix 0 + [5, 6, 7, 8], โ† Row 1 of matrix 0 + [9, 10, 11, 12] โ† Row 2 of matrix 0 + ], โ† First matrix closing + + [ โ† Second matrix opening + [13, 14, 15, 16], โ† Row 0 of matrix 1 + [17, 18, 19, 20], โ† Row 1 of matrix 1 + [21, 22, 23, 24] โ† Row 2 of matrix 1 + ] โ† Second matrix closing +] โ† Outermost closing (end of 3D tensor) + +Three levels of brackets = 3D tensor +``` + +**Think of it like:** A stack of 2 pages, where each page is a table (matrix) with 3 rows and 4 columns. + +**Understanding shape (2, 3, 4):** + +- **First dimension (2)**: Number of matrices (or "depth") +- **Second dimension (3)**: Number of rows in each matrix +- **Third dimension (4)**: Number of columns in each matrix + +```python +# Access the first matrix +print(tensor_3d[0]) +# Output: +# tensor([[ 1, 2, 3, 4], +# [ 5, 6, 7, 8], +# [ 9, 10, 11, 12]]) + +# Access the second matrix +print(tensor_3d[1]) +# Output: +# tensor([[13, 14, 15, 16], +# [17, 18, 19, 20], +# [21, 22, 23, 24]]) + +# Access specific element [matrix, row, column] +print(tensor_3d[0, 1, 2]) # Output: tensor(7) +print(tensor_3d[1, 2, 3]) # Output: tensor(24) +``` + +**Real-world use:** RGB images (height, width, 3 color channels), video frames, batch of images + +## Creating Tensors from Different Data Types + +PyTorch provides multiple ways to create tensors from existing data. 
+ +![Creating from Data](/content/learn/tensors/creating-tensors/creating-from-data.png) + +### From Python Lists + +**Example 1:** 1D tensor from list + +```python +import torch + +# Create from Python list +python_list = [1, 2, 3, 4, 5] +tensor = torch.tensor(python_list) + +print(tensor) # Output: tensor([1, 2, 3, 4, 5]) +print(type(tensor)) # Output: +``` + +**Example 2:** 2D tensor from nested lists + +```python +# Create 2D tensor from nested list +nested_list = [[1, 2, 3], + [4, 5, 6], + [7, 8, 9]] + +tensor_2d = torch.tensor(nested_list) + +print(tensor_2d) +# Output: +# tensor([[1, 2, 3], +# [4, 5, 6], +# [7, 8, 9]]) + +print(tensor_2d.shape) # Output: torch.Size([3, 3]) +``` + +**Example 3:** 3D tensor from deeply nested lists + +```python +# Create 3D tensor (2 matrices, each 2x3) +deep_list = [[[1, 2, 3], + [4, 5, 6]], + + [[7, 8, 9], + [10, 11, 12]]] + +tensor_3d = torch.tensor(deep_list) + +print(tensor_3d.shape) # Output: torch.Size([2, 2, 3]) +``` + +### From NumPy Arrays + +If you're working with NumPy arrays, you can easily convert them to tensors. + +**Example 1:** Converting NumPy array to tensor + +```python +import torch +import numpy as np + +# Create NumPy array +np_array = np.array([1, 2, 3, 4, 5]) + +# Convert to PyTorch tensor +tensor = torch.from_numpy(np_array) + +print(np_array) # Output: [1 2 3 4 5] +print(tensor) # Output: tensor([1, 2, 3, 4, 5]) +``` + +**Example 2:** 2D NumPy array to tensor + +```python +# Create 2D NumPy array +np_matrix = np.array([[1, 2, 3], + [4, 5, 6]]) + +# Convert to tensor +tensor_from_np = torch.from_numpy(np_matrix) + +print(tensor_from_np) +# Output: +# tensor([[1, 2, 3], +# [4, 5, 6]]) + +print(tensor_from_np.shape) # Output: torch.Size([2, 3]) +``` + +**Important Note:** `torch.from_numpy()` shares memory with the original NumPy array, so changes to one affect the other! + +```python +np_array = np.array([1, 2, 3]) +tensor = torch.from_numpy(np_array) + +# Modify NumPy array +np_array[0] = 999 + +print(np_array) # Output: [999 2 3] +print(tensor) # Output: tensor([999, 2, 3]) +# They share memory! +``` + +### From Other Tensors + +**Example:** Creating a new tensor with the same shape + +```python +# Create original tensor +x = torch.tensor([[1, 2], + [3, 4]]) + +# Create new tensor with same shape (but different values) +y = torch.tensor([[5, 6], + [7, 8]]) + +print(x.shape) # Output: torch.Size([2, 2]) +print(y.shape) # Output: torch.Size([2, 2]) +``` + +## Specifying Data Types + +Tensors can hold different types of numbers. Choosing the right data type is important for memory efficiency and computation speed. + +![Data Types](/content/learn/tensors/creating-tensors/data-types.png) + +### Common Data Types + +- `torch.int32` or `torch.int`: 32-bit integers (4 bytes per number) +- `torch.int64` or `torch.long`: 64-bit integers (8 bytes per number) +- `torch.float32` or `torch.float`: 32-bit floating point (4 bytes per number) **[Most Common]** +- `torch.float64` or `torch.double`: 64-bit floating point (8 bytes per number) +- `torch.bool`: Boolean values (True/False) + +**Example 1:** Creating tensors with specific data types + +```python +import torch + +# Integer tensor (int32) +int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32) +print(int_tensor) # Output: tensor([1, 2, 3], dtype=torch.int32) +print(int_tensor.dtype) # Output: torch.int32 + +# Float tensor (float32) - Most common for neural networks! 
+float_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32) +print(float_tensor) # Output: tensor([1., 2., 3.]) +print(float_tensor.dtype) # Output: torch.float32 + +# Boolean tensor +bool_tensor = torch.tensor([True, False, True], dtype=torch.bool) +print(bool_tensor) # Output: tensor([ True, False, True]) +print(bool_tensor.dtype) # Output: torch.bool +``` + +**Example 2:** Default data type behavior + +```python +# PyTorch infers the data type from your input + +# Integers โ†’ int64 (by default) +x = torch.tensor([1, 2, 3]) +print(x.dtype) # Output: torch.int64 + +# Floats โ†’ float32 (by default) +y = torch.tensor([1.0, 2.0, 3.0]) +print(y.dtype) # Output: torch.float32 + +# Mixed integers and floats โ†’ float32 +z = torch.tensor([1, 2.0, 3]) +print(z) # Output: tensor([1., 2., 3.]) +print(z.dtype) # Output: torch.float32 +``` + +**Example 3:** Converting between data types + +```python +# Create integer tensor +int_tensor = torch.tensor([1, 2, 3]) +print(int_tensor.dtype) # Output: torch.int64 + +# Convert to float +float_tensor = int_tensor.float() +print(float_tensor) # Output: tensor([1., 2., 3.]) +print(float_tensor.dtype) # Output: torch.float32 +``` + +**Example 4:** Why data type matters + +```python +# Memory usage comparison +large_int64 = torch.ones(1000000, dtype=torch.int64) +large_int32 = torch.ones(1000000, dtype=torch.int32) + +print(f"int64 tensor: {large_int64.element_size() * large_int64.nelement() / 1e6} MB") +# Output: 8.0 MB (8 bytes per element) + +print(f"int32 tensor: {large_int32.element_size() * large_int32.nelement() / 1e6} MB") +# Output: 4.0 MB (4 bytes per element) + +# int32 uses half the memory! +``` + +## Practical Examples + +### Example 1: Creating a Batch of Data + +In deep learning, we often process multiple examples at once (a "batch"). + +```python +import torch + +# Create 3 examples, each with 3 features +# Example: [height, weight, age] +batch = torch.tensor([[170, 65, 25], + [180, 80, 30], + [165, 55, 22]], + dtype=torch.float32) + +print("Batch shape:", batch.shape) +# Output: Batch shape: torch.Size([3, 3]) +# 3 people, 3 features each + +# Access all heights (first column) +all_heights = batch[:, 0] +print(f"Heights: {all_heights}") +# Output: Heights: tensor([170., 180., 165.]) + +print(f"Average height: {all_heights.mean():.1f}cm") +# Output: Average height: 171.7cm +``` + +### Example 2: Creating RGB Image Data + +A tiny 2x2 RGB color image (3 color channels). + +```python +import torch + +# Define a 2x2 RGB image +# Each pixel has [Red, Green, Blue] values +image_rgb = [ + [[255, 0, 0], [0, 255, 0]], # Red, Green pixels + [[0, 0, 255], [255, 255, 0]] # Blue, Yellow pixels +] + +rgb_tensor = torch.tensor(image_rgb, dtype=torch.float32) + +print("Shape:", rgb_tensor.shape) +# Output: Shape: torch.Size([2, 2, 3]) +# 2 height, 2 width, 3 color channels + +# Access the red channel of all pixels +red_channel = rgb_tensor[:, :, 0] +print(f"Red channel:\n{red_channel}") +# Output: +# tensor([[255., 0.], +# [ 0., 255.]]) +``` + +## Common Mistakes and How to Fix Them + +### Mistake 1: Shape Mismatch + +```python +# โŒ Wrong: Inconsistent row lengths +try: + wrong_tensor = torch.tensor([[1, 2, 3], + [4, 5]]) # Second row too short! 
+except: + print("Error: All rows must have the same length") + +# โœ… Correct: All rows same length +correct_tensor = torch.tensor([[1, 2, 3], + [4, 5, 6]]) +print(correct_tensor.shape) # Output: torch.Size([2, 3]) +``` + +### Mistake 2: Forgetting Dimension Order + +```python +# For images, be careful about dimension order! + +# โŒ Wrong order: (channels, height, width) +# This might cause errors in some operations +wrong_order = torch.rand(3, 224, 224) + +# โœ… PyTorch usually expects: (batch, channels, height, width) +correct_batch = torch.rand(1, 3, 224, 224) # 1 image, 3 channels, 224x224 + +# โœ… For a single image: (channels, height, width) +single_image = torch.rand(3, 224, 224) +``` + +## Quick Reference + +### Creating Tensors + +```python +# From list +torch.tensor([1, 2, 3]) + +# From NumPy +torch.from_numpy(np_array) + +# With specific dtype +torch.tensor([1, 2], dtype=torch.float32) +``` + +### Checking Tensor Properties + +```python +tensor = torch.tensor([[1, 2], [3, 4]]) + +tensor.shape # Shape: torch.Size([2, 2]) +tensor.size() # Same as .shape +tensor.ndim # Number of dimensions: 2 +tensor.dtype # Data type: torch.int64 +tensor.numel() # Total number of elements: 4 +``` + +### Data Type Conversion + +```python +tensor.float() # Convert to float32 +tensor.int() # Convert to int32 +tensor.long() # Convert to int64 +tensor.double() # Convert to float64 +tensor.bool() # Convert to boolean +``` + +## Why Tensors Matter for Neural Networks + +- **Images**: RGB images are 3D tensors (height ร— width ร— 3 channels) +- **Batches**: Neural networks process multiple examples at once (batch dimension) +- **Text**: Word embeddings are 2D tensors (sequence length ร— embedding dimension) +- **Weights**: Model parameters are tensors that get updated during training + +**Example: A batch of images** +```python +# Shape: (batch_size, channels, height, width) +batch_of_images = torch.rand(32, 3, 224, 224) +# 32 images, 3 color channels (RGB), 224ร—224 pixels + +print(f"Batch shape: {batch_of_images.shape}") +# Output: Batch shape: torch.Size([32, 3, 224, 224]) +``` + +**Congratulations! 
You now understand how to create and work with tensors!** ๐ŸŽ‰ diff --git a/public/content/learn/tensors/creating-tensors/data-types.png b/public/content/learn/tensors/creating-tensors/data-types.png new file mode 100644 index 0000000..2c3450e Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/data-types.png differ diff --git a/public/content/learn/tensors/creating-tensors/matrix-tensor.png b/public/content/learn/tensors/creating-tensors/matrix-tensor.png new file mode 100644 index 0000000..6a974d2 Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/matrix-tensor.png differ diff --git a/public/content/learn/tensors/creating-tensors/scalar-tensor.png b/public/content/learn/tensors/creating-tensors/scalar-tensor.png new file mode 100644 index 0000000..f7adbf4 Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/scalar-tensor.png differ diff --git a/public/content/learn/tensors/creating-tensors/tensor-dimensions.png b/public/content/learn/tensors/creating-tensors/tensor-dimensions.png new file mode 100644 index 0000000..0683845 Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/tensor-dimensions.png differ diff --git a/public/content/learn/tensors/creating-tensors/vector-tensor.png b/public/content/learn/tensors/creating-tensors/vector-tensor.png new file mode 100644 index 0000000..d6ab661 Binary files /dev/null and b/public/content/learn/tensors/creating-tensors/vector-tensor.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/basic-indexing.png b/public/content/learn/tensors/indexing-and-slicing/basic-indexing.png new file mode 100644 index 0000000..df956b1 Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/basic-indexing.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/indexing-and-slicing-content.md b/public/content/learn/tensors/indexing-and-slicing/indexing-and-slicing-content.md new file mode 100644 index 0000000..a0d970a --- /dev/null +++ b/public/content/learn/tensors/indexing-and-slicing/indexing-and-slicing-content.md @@ -0,0 +1,500 @@ +--- +hero: + title: "Indexing and Slicing" + subtitle: "Accessing and Extracting Tensor Elements" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 10 min read" +--- + +Indexing and slicing let you access and extract specific parts of tensors. Think of it like selecting specific pages from a book or specific rows from a spreadsheet! + +## The Basics: Indexing Starts at 0 + +**Important:** In Python and PyTorch, counting starts at **0**, not 1! + +![Basic Indexing](/content/learn/tensors/indexing-and-slicing/basic-indexing.png) + +**Example:** + +```python +import torch + +v = torch.tensor([10, 20, 30, 40, 50]) + +print(v[0]) # Output: tensor(10) โ† First element +print(v[2]) # Output: tensor(30) โ† Third element +print(v[4]) # Output: tensor(50) โ† Fifth element +``` + +**Manual breakdown:** + +```yaml +v = [10, 20, 30, 40, 50] + โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ + [0] [1] [2] [3] [4] + +v[0] โ†’ 10 +v[1] โ†’ 20 +v[2] โ†’ 30 +``` + +**Key rule:** First element is `[0]`, second is `[1]`, third is `[2]`, and so on! 
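
Because counting starts at 0, a tensor with 5 elements has valid indices 0 through 4. Here is a minimal sketch of what happens if you go one past the end (the exact wording of the error can vary between PyTorch versions):

```python
import torch

v = torch.tensor([10, 20, 30, 40, 50])

print(v[4])    # tensor(50) ← index 4 is the last valid index
# print(v[5])  # IndexError: index 5 is out of bounds for dimension 0 with size 5
```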
+ +## Negative Indexing + +You can count **backwards from the end** using negative indices: + +![Negative Indexing](/content/learn/tensors/indexing-and-slicing/negative-indexing.png) + +**Example:** + +```python +import torch + +v = torch.tensor([10, 20, 30, 40, 50]) + +print(v[-1]) # Output: tensor(50) โ† Last element +print(v[-2]) # Output: tensor(40) โ† Second from end +print(v[-5]) # Output: tensor(10) โ† Fifth from end (first!) +``` + +**How it works:** + +```yaml +Positive: [0] [1] [2] [3] [4] +Values: 10 20 30 40 50 +Negative: [-5] [-4] [-3] [-2] [-1] + +v[-1] = 50 (last) +v[-2] = 40 (second from last) +v[-3] = 30 (third from last) +``` + +**Useful trick:** `v[-1]` always gets the last element, no matter the size! + +## Matrix Indexing (2D) + +For matrices, use `[row, column]`: + +![Matrix Indexing](/content/learn/tensors/indexing-and-slicing/matrix-indexing.png) + +**Example:** + +```python +import torch + +A = torch.tensor([[10, 20, 30, 40], + [50, 60, 70, 80], + [90, 100, 110, 120]]) + +print(A[0, 0]) # Output: tensor(10) โ† Top-left +print(A[1, 2]) # Output: tensor(70) โ† Row 1, Col 2 +print(A[2, 3]) # Output: tensor(120) โ† Bottom-right +print(A[-1, -1]) # Output: tensor(120) โ† Also bottom-right! +``` + +**Manual breakdown:** + +```yaml + Col 0 Col 1 Col 2 Col 3 +Row 0: 10 20 30 40 +Row 1: 50 60 70 80 +Row 2: 90 100 110 120 + +A[1, 2] โ†’ Row 1, Column 2 โ†’ 70 +A[0, 3] โ†’ Row 0, Column 3 โ†’ 40 +``` + +**Pattern:** `[row, column]` always - row first, column second! + +## Slicing: Getting Multiple Elements + +Slicing uses the syntax `[start:end]` where **end is NOT included**! + +![Slicing Basics](/content/learn/tensors/indexing-and-slicing/slicing-basics.png) + +**Example:** + +```python +import torch + +v = torch.tensor([10, 20, 30, 40, 50, 60]) + +print(v[1:4]) # Output: tensor([20, 30, 40]) +print(v[0:3]) # Output: tensor([10, 20, 30]) +print(v[3:6]) # Output: tensor([40, 50, 60]) +``` + +**Manual breakdown:** + +```yaml +v = [10, 20, 30, 40, 50, 60] + [0] [1] [2] [3] [4] [5] + +v[1:4] gets indices: 1, 2, 3 (stops BEFORE 4) + โ†’ [20, 30, 40] + +v[0:3] gets indices: 0, 1, 2 + โ†’ [10, 20, 30] +``` + +**Critical:** `v[1:4]` gets elements at positions 1, 2, and 3. It does NOT include position 4! + +## Slicing Shortcuts + +You can omit start or end: + +```python +import torch + +v = torch.tensor([10, 20, 30, 40, 50, 60]) + +print(v[:3]) # Output: tensor([10, 20, 30]) โ† From start to 3 +print(v[3:]) # Output: tensor([40, 50, 60]) โ† From 3 to end +print(v[:]) # Output: tensor([10, 20, 30, 40, 50, 60]) โ† Everything! +``` + +**What they mean:** + +```yaml +v[:3] โ†’ v[0:3] โ†’ Start at 0, stop before 3 +v[3:] โ†’ v[3:6] โ†’ Start at 3, go to end +v[:] โ†’ v[0:6] โ†’ All elements (copy) +``` + +## Matrix Slicing + +Slicing works in 2D too! 
+
![Matrix Slicing](/content/learn/tensors/indexing-and-slicing/matrix-slicing.png)

**Example:**

```python
import torch

A = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16]])

# Get a sub-matrix
print(A[1:3, 1:3])
# Output:
# tensor([[ 6,  7],
#         [10, 11]])

# Get entire row 2
print(A[2, :])
# Output: tensor([9, 10, 11, 12])

# Get entire column 2
print(A[:, 2])
# Output: tensor([3, 7, 11, 15])
```

**Manual breakdown:**

```yaml
A[1:3, 1:3] means:
- Rows 1 to 3 (not including 3) → rows 1, 2
- Cols 1 to 3 (not including 3) → cols 1, 2

Result:
[[6, 7],
 [10, 11]]

A[2, :] means:
- Row 2
- All columns (:)
→ [9, 10, 11, 12]

A[:, 2] means:
- All rows (:)
- Column 2
→ [3, 7, 11, 15]
```

**Remember:** `:` means "all" (all rows or all columns)

## Step Slicing

Add a **step** to skip elements: `[start:end:step]`

![Step Slicing](/content/learn/tensors/indexing-and-slicing/step-slicing.png)

**Example:**

```python
import torch

v = torch.tensor([0, 10, 20, 30, 40, 50, 60, 70])

print(v[::2])    # Output: tensor([0, 20, 40, 60]) ← Every 2nd
print(v[1::2])   # Output: tensor([10, 30, 50, 70]) ← Start 1, every 2nd
print(v[::3])    # Output: tensor([0, 30, 60]) ← Every 3rd

# Unlike NumPy, PyTorch does NOT allow a negative step like v[::-1]
# Use torch.flip() to reverse instead:
print(torch.flip(v, dims=[0]))
# Output: tensor([70, 60, 50, 40, 30, 20, 10, 0]) ← Reversed!
```

**How it works:**

```yaml
v[::2]  → Start at 0, take every 2nd element
        → Indices: 0, 2, 4, 6
        → Values: [0, 20, 40, 60]

v[1::2] → Start at 1, take every 2nd element
        → Indices: 1, 3, 5, 7
        → Values: [10, 30, 50, 70]

torch.flip(v, dims=[0]) → Reverse along dimension 0
        → Values: [70, 60, 50, 40, 30, 20, 10, 0]
```

**Cool trick:** `torch.flip(v, dims=[0])` reverses any tensor! (PyTorch slicing rejects a negative step, so NumPy's `v[::-1]` shortcut raises an error here.)

## Multiple Elements at Once

You can use lists to select specific indices:

```python
import torch

v = torch.tensor([10, 20, 30, 40, 50])

# Select indices 0, 2, 4
indices = torch.tensor([0, 2, 4])
result = v[indices]

print(result)  # Output: tensor([10, 30, 50])
```

**For matrices:**

```python
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Get specific rows
rows = torch.tensor([0, 2])
result = A[rows]

print(result)
# Output:
# tensor([[1, 2, 3],
#         [7, 8, 9]])
```

## Practical Example: Batch Processing

```python
import torch

# Batch of 5 samples, each with 3 features
batch = torch.tensor([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0],
                      [7.0, 8.0, 9.0],
                      [10.0, 11.0, 12.0],
                      [13.0, 14.0, 15.0]])

# Get first 3 samples
first_three = batch[:3]
print(first_three)
# tensor([[ 1.,  2.,  3.],
#         [ 4.,  5.,  6.],
#         [ 7.,  8.,  9.]])

# Get last 2 samples
last_two = batch[-2:]
print(last_two)
# tensor([[10., 11., 12.],
#         [13., 14., 15.]])

# Get all samples, but only first 2 features
first_two_features = batch[:, :2]
print(first_two_features)
# tensor([[ 1.,  2.],
#         [ 4.,  5.],
#         [ 7.,  8.],
#         [10., 11.],
#         [13., 14.]])
```

**What happened:**

```yaml
batch[:3]    → First 3 rows (samples 0, 1, 2)
batch[-2:]   → Last 2 rows (samples 3, 4)
batch[:, :2] → All rows, first 2 columns (features 0, 1)
```

## Modifying with Indexing

You can change values using indexing:

```python
import torch

v = torch.tensor([10, 20, 30, 40, 50])

# Change single element
v[2] = 999
print(v)  # tensor([ 10,  20, 999,  40,  50])

# Change slice
v[0:2] = torch.tensor([100, 200])
print(v)  # tensor([100, 200, 999,  40,  50])

# Set all to same value
v[:] = 0
print(v)  # tensor([0, 0, 0, 0, 0])
```

## 3D
Indexing

For 3D tensors (like batches of images):

```python
import torch

# 2 batches, 3 rows, 4 columns
tensor_3d = torch.randn(2, 3, 4)

# Get first batch
first_batch = tensor_3d[0]  # Shape: (3, 4)

# Get element from second batch, row 1, col 2
element = tensor_3d[1, 1, 2]  # Single value

# Get all batches, row 0, all columns
slice_3d = tensor_3d[:, 0, :]  # Shape: (2, 4)
```

**Pattern:** `[batch, row, col]` for 3D tensors

## Common Patterns

### Get First/Last Row

```python
A = torch.randn(5, 3)

first_row = A[0]    # or A[0, :]
last_row = A[-1]    # or A[-1, :]
```

### Get First/Last Column

```python
A = torch.randn(5, 3)

first_col = A[:, 0]
last_col = A[:, -1]
```

### Get Main Diagonal

```python
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

diagonal = torch.diag(A)
print(diagonal)  # tensor([1, 5, 9])
```

### Skip Every Other Row

```python
A = torch.randn(10, 3)

every_other_row = A[::2]  # Rows 0, 2, 4, 6, 8
```

## Common Gotchas

### ❌ Gotcha 1: End Index Not Included

```python
v = torch.tensor([10, 20, 30, 40, 50])

# v[1:4] gets indices 1, 2, 3 (NOT 4!)
print(v[1:4])  # tensor([20, 30, 40])

# To include index 4, use v[1:5]
print(v[1:5])  # tensor([20, 30, 40, 50])
```

### ❌ Gotcha 2: Slicing Creates a View

```python
v = torch.tensor([1, 2, 3, 4, 5])
slice_v = v[1:4]

# Modifying slice also modifies original!
slice_v[0] = 999

print(v)        # tensor([  1, 999,   3,   4,   5])
print(slice_v)  # tensor([999,   3,   4])

# Use .clone() for a copy
slice_copy = v[1:4].clone()
slice_copy[0] = 100
print(v)  # tensor([  1, 999,   3,   4,   5]) ← Unchanged!
```

### ❌ Gotcha 3: Integer vs Slice

```python
A = torch.randn(3, 4)

# Integer index reduces dimensions
row = A[0]     # Shape: (4,) ← 1D tensor

# Slice keeps dimensions
row = A[0:1]   # Shape: (1, 4) ← Still 2D!
```

## Key Takeaways

✓ **Indexing starts at 0:** First element is `[0]`, not `[1]`

✓ **Negative indexing:** `-1` is last, `-2` is second from last

✓ **Slicing:** `[start:end]` - end is NOT included!

✓ **Colon means all:** `A[:, 2]` = all rows, column 2

✓ **Step:** `[::2]` = every 2nd element; use `torch.flip()` to reverse (negative steps aren't supported)

✓ **Views not copies:** Slicing creates views - use `.clone()` for copies

**Quick Reference:**

```python
# Basic indexing
v[0]       # First element
v[-1]      # Last element
A[1, 2]    # Row 1, column 2

# Slicing
v[1:4]     # Elements 1, 2, 3
v[:3]      # First 3 elements
v[3:]      # From index 3 to end
v[:]       # All elements

# 2D slicing
A[1:3, 2:4]   # Rows 1-2, columns 2-3
A[0, :]       # First row
A[:, 0]       # First column

# Step slicing
v[::2]                   # Every 2nd element
torch.flip(v, dims=[0])  # Reversed (no negative-step slicing in PyTorch)
```

**Congratulations!** You now know how to access any part of any tensor! This is essential for data processing and neural networks.
๐ŸŽ‰ diff --git a/public/content/learn/tensors/indexing-and-slicing/matrix-indexing.png b/public/content/learn/tensors/indexing-and-slicing/matrix-indexing.png new file mode 100644 index 0000000..1e1469c Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/matrix-indexing.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/matrix-slicing.png b/public/content/learn/tensors/indexing-and-slicing/matrix-slicing.png new file mode 100644 index 0000000..251ae83 Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/matrix-slicing.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/negative-indexing.png b/public/content/learn/tensors/indexing-and-slicing/negative-indexing.png new file mode 100644 index 0000000..315b5c8 Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/negative-indexing.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/slicing-basics.png b/public/content/learn/tensors/indexing-and-slicing/slicing-basics.png new file mode 100644 index 0000000..f025e12 Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/slicing-basics.png differ diff --git a/public/content/learn/tensors/indexing-and-slicing/step-slicing.png b/public/content/learn/tensors/indexing-and-slicing/step-slicing.png new file mode 100644 index 0000000..1cc240f Binary files /dev/null and b/public/content/learn/tensors/indexing-and-slicing/step-slicing.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/all-positions.png b/public/content/learn/tensors/matrix-multiplication/all-positions.png new file mode 100644 index 0000000..5286bea Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/all-positions.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/dot-product-steps.png b/public/content/learn/tensors/matrix-multiplication/dot-product-steps.png new file mode 100644 index 0000000..142e6ea Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/dot-product-steps.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/dot-product.png b/public/content/learn/tensors/matrix-multiplication/dot-product.png new file mode 100644 index 0000000..1bdac61 Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/dot-product.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/elementwise-vs-matmul.png b/public/content/learn/tensors/matrix-multiplication/elementwise-vs-matmul.png new file mode 100644 index 0000000..8aa78df Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/elementwise-vs-matmul.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/matrix-multiplication-content.md b/public/content/learn/tensors/matrix-multiplication/matrix-multiplication-content.md new file mode 100644 index 0000000..1fd711c --- /dev/null +++ b/public/content/learn/tensors/matrix-multiplication/matrix-multiplication-content.md @@ -0,0 +1,380 @@ +--- +hero: + title: "Matrix Multiplication" + subtitle: "The Core Operation in Neural Networks" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 10 min read" +--- + +Matrix multiplication is THE most important operation in deep learning. Unlike addition, it's **not element-wise** - it combines rows and columns in a special way. 
+ +## The Key Difference + +**Addition:** Add each position separately +**Multiplication:** Combine entire rows with entire columns + +Let's build up to matrix multiplication step by step! + +## Step 1: The Dot Product + +Before matrices, let's understand the **dot product** - multiplying two vectors: + +![Dot Product](/content/learn/tensors/matrix-multiplication/dot-product.png) + +**Example:** + +```python +import torch + +a = torch.tensor([2, 3, 4]) +b = torch.tensor([1, 2, 3]) + +# Dot product +result = torch.dot(a, b) + +print(result) # Output: tensor(20) +``` + +**Manual calculation:** + +```yaml +Step 1: Multiply corresponding elements +2 ร— 1 = 2 +3 ร— 2 = 6 +4 ร— 3 = 12 + +Step 2: Add them all up +2 + 6 + 12 = 20 + +Result: 20 +``` + +**Key insight:** Dot product = multiply pairs, then sum everything. + +![Dot Product Steps](/content/learn/tensors/matrix-multiplication/dot-product-steps.png) + +## Step 2: Matrix @ Matrix + +Matrix multiplication uses dot products repeatedly! The `@` operator means "matrix multiply": + +![Simple Matrix Multiplication](/content/learn/tensors/matrix-multiplication/simple-matmul.png) + +**Example:** + +```python +import torch + +A = torch.tensor([[1, 2], + [3, 4]]) + +B = torch.tensor([[5, 6], + [7, 8]]) + +result = A @ B # @ means matrix multiply + +print(result) +# Output: +# tensor([[19, 22], +# [43, 50]]) +``` + +**How does this work?** Each position in the result is a dot product! + +## Computing One Position: The Rule + +**To get result[row, col]:** +1. Take the **row** from matrix A +2. Take the **column** from matrix B +3. Compute their **dot product** + +![Step by Step](/content/learn/tensors/matrix-multiplication/step-by-step.png) + +**Manual calculation for position [0, 0]:** + +```yaml +Take row 0 from A: [1, 2] +Take column 0 from B: [5, 7] + +Dot product: +(1 ร— 5) + (2 ร— 7) = 5 + 14 = 19 + +Result[0, 0] = 19 +``` + +**Manual calculation for position [0, 1]:** + +```yaml +Take row 0 from A: [1, 2] +Take column 1 from B: [6, 8] + +Dot product: +(1 ร— 6) + (2 ร— 8) = 6 + 16 = 22 + +Result[0, 1] = 22 +``` + +**Manual calculation for position [1, 0]:** + +```yaml +Take row 1 from A: [3, 4] +Take column 0 from B: [5, 7] + +Dot product: +(3 ร— 5) + (4 ร— 7) = 15 + 28 = 43 + +Result[1, 0] = 43 +``` + +**Manual calculation for position [1, 1]:** + +```yaml +Take row 1 from A: [3, 4] +Take column 1 from B: [6, 8] + +Dot product: +(3 ร— 6) + (4 ร— 8) = 18 + 32 = 50 + +Result[1, 1] = 50 +``` + +**Complete result:** + +```yaml +[[19, 22], + [43, 50]] +``` + +![All Positions](/content/learn/tensors/matrix-multiplication/all-positions.png) + +## The Shape Rule + +**Not all matrices can be multiplied!** The shapes must be compatible: + +![Shape Rule](/content/learn/tensors/matrix-multiplication/shape-rule.png) + +**The rule:** `(m, n) @ (n, p) = (m, p)` + +The **inner dimensions must match**! + +### โœ“ Valid Examples + +```python +# Example 1 +A = torch.randn(3, 4) # 3 rows, 4 columns +B = torch.randn(4, 2) # 4 rows, 2 columns +result = A @ B # Works! โ†’ (3, 2) + +# Example 2 +A = torch.randn(5, 10) +B = torch.randn(10, 7) +result = A @ B # Works! โ†’ (5, 7) + +# Example 3 +A = torch.randn(2, 3) +B = torch.randn(3, 3) +result = A @ B # Works! โ†’ (2, 3) +``` + +**Why these work:** + +```yaml +Example 1: (3, 4) @ (4, 2) = (3, 2) โœ“ 4 = 4 +Example 2: (5, 10) @ (10, 7) = (5, 7) โœ“ 10 = 10 +Example 3: (2, 3) @ (3, 3) = (2, 3) โœ“ 3 = 3 +``` + +### โœ— Invalid Examples + +```python +# Example 1 - WILL ERROR! 
+A = torch.randn(3, 4) +B = torch.randn(5, 2) +# result = A @ B # Error! 4 โ‰  5 + +# Example 2 - WILL ERROR! +A = torch.randn(2, 7) +B = torch.randn(3, 5) +# result = A @ B # Error! 7 โ‰  3 +``` + +**Why these fail:** + +```yaml +Example 1: (3, 4) @ (5, 2) โœ— 4 โ‰  5 (can't match rows with columns) +Example 2: (2, 7) @ (3, 5) โœ— 7 โ‰  3 (dimensions incompatible) +``` + +## Vector @ Matrix + +A common pattern in neural networks is multiplying a vector by a matrix: + +![Vector @ Matrix](/content/learn/tensors/matrix-multiplication/vector-matrix.png) + +**Example:** + +```python +import torch + +# Input vector (like data going into a layer) +x = torch.tensor([1, 2, 3]) # Shape: (3,) + +# Weight matrix +W = torch.tensor([[4, 5], + [6, 7], + [8, 9]]) # Shape: (3, 2) + +result = x @ W + +print(result) # Output: tensor([40, 46]) +print(result.shape) # Shape: (2,) +``` + +**Manual calculation:** + +```yaml +Position [0]: +Take vector: [1, 2, 3] +Take column 0: [4, 6, 8] +Dot product: (1ร—4) + (2ร—6) + (3ร—8) = 4 + 12 + 24 = 40 + +Position [1]: +Take vector: [1, 2, 3] +Take column 1: [5, 7, 9] +Dot product: (1ร—5) + (2ร—7) + (3ร—9) = 5 + 14 + 27 = 46 + +Result: [40, 46] +``` + +**This is exactly what happens in a neural network layer!** + +## Practical Example: Neural Network Layer + +Here's a realistic example of matrix multiplication in action: + +```python +import torch + +# Batch of 2 samples, each with 3 features +inputs = torch.tensor([[1.0, 2.0, 3.0], + [4.0, 5.0, 6.0]]) # Shape: (2, 3) + +# Weight matrix: 3 inputs โ†’ 4 outputs +weights = torch.tensor([[0.1, 0.2, 0.3, 0.4], + [0.5, 0.6, 0.7, 0.8], + [0.9, 1.0, 1.1, 1.2]]) # Shape: (3, 4) + +# Forward pass +outputs = inputs @ weights # Shape: (2, 4) + +print(outputs) +# tensor([[3.2000, 3.8000, 4.4000, 5.0000], +# [7.7000, 9.2000, 10.7000, 12.2000]]) +``` + +**What happened:** + +```yaml +Shape: (2, 3) @ (3, 4) = (2, 4) + โ†“ โ†“ โ†“ + 2 samples โ†’ 4 outputs per sample + 3 features each +``` + +Each of the 2 input samples got transformed into 4 output values. This is how neural networks transform data! + +![Neural Network Layer](/content/learn/tensors/matrix-multiplication/neural-network.png) + +## Matrix @ Vector + +You can also multiply matrix @ vector (different from vector @ matrix): + +```python +import torch + +A = torch.tensor([[1, 2, 3], + [4, 5, 6]]) # Shape: (2, 3) + +v = torch.tensor([1, 2, 3]) # Shape: (3,) + +result = A @ v + +print(result) # Output: tensor([14, 32]) +print(result.shape) # Shape: (2,) +``` + +**Manual calculation:** + +```yaml +Row 0: [1, 2, 3] ยท [1, 2, 3] = 1 + 4 + 9 = 14 +Row 1: [4, 5, 6] ยท [1, 2, 3] = 4 + 10 + 18 = 32 + +Result: [14, 32] +``` + +## Common Mistakes + +### โŒ Mistake 1: Using * instead of @ + +```python +A = torch.tensor([[1, 2], [3, 4]]) +B = torch.tensor([[5, 6], [7, 8]]) + +wrong = A * B # Element-wise multiplication! โŒ +right = A @ B # Matrix multiplication! โœ“ + +print("Wrong:", wrong) +# tensor([[ 5, 12], +# [21, 32]]) + +print("Right:", right) +# tensor([[19, 22], +# [43, 50]]) +``` + +**Visual comparison:** + +![Element-wise vs Matrix Multiplication](/content/learn/tensors/matrix-multiplication/elementwise-vs-matmul.png) + +### โŒ Mistake 2: Wrong shape order + +```python +A = torch.randn(3, 4) +B = torch.randn(5, 3) + +# result = A @ B # Error! 4 โ‰  5 + +# Fix: Either change order or transpose +result = B @ A # Works! 
(5, 3) @ (3, 4) = (5, 4) +``` + +## Key Takeaways + +โœ“ **Dot product:** Multiply pairs, then sum + +โœ“ **Matrix multiply:** Each result position = dot product of row ร— column + +โœ“ **Shape rule:** `(m, n) @ (n, p) = (m, p)` - inner dimensions must match! + +โœ“ **Use @:** For matrix multiplication (not `*`) + +โœ“ **Common in ML:** Input @ Weights = Output + +**Quick Reference:** + +```python +# Dot product (1D ร— 1D) +torch.dot(torch.tensor([1, 2]), torch.tensor([3, 4])) # = 11 + +# Vector @ Matrix (transforms vector) +torch.tensor([1, 2]) @ torch.tensor([[1, 2], [3, 4]]) # = [7, 10] + +# Matrix @ Vector (applies to rows) +torch.tensor([[1, 2], [3, 4]]) @ torch.tensor([1, 2]) # = [5, 11] + +# Matrix @ Matrix (transforms matrix) +torch.tensor([[1, 2], [3, 4]]) @ torch.tensor([[5, 6], [7, 8]]) +# = [[19, 22], [43, 50]] +``` + +**Remember:** Every neural network layer uses matrix multiplication to transform data. You've just learned the most important operation in deep learning! ๐ŸŽ‰ diff --git a/public/content/learn/tensors/matrix-multiplication/neural-network.png b/public/content/learn/tensors/matrix-multiplication/neural-network.png new file mode 100644 index 0000000..fe59f7a Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/neural-network.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/shape-rule.png b/public/content/learn/tensors/matrix-multiplication/shape-rule.png new file mode 100644 index 0000000..2677781 Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/shape-rule.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/simple-matmul.png b/public/content/learn/tensors/matrix-multiplication/simple-matmul.png new file mode 100644 index 0000000..9ff39b0 Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/simple-matmul.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/step-by-step.png b/public/content/learn/tensors/matrix-multiplication/step-by-step.png new file mode 100644 index 0000000..02ffe77 Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/step-by-step.png differ diff --git a/public/content/learn/tensors/matrix-multiplication/vector-matrix.png b/public/content/learn/tensors/matrix-multiplication/vector-matrix.png new file mode 100644 index 0000000..8801543 Binary files /dev/null and b/public/content/learn/tensors/matrix-multiplication/vector-matrix.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/auto-dimension.png b/public/content/learn/tensors/reshaping-tensors/auto-dimension.png new file mode 100644 index 0000000..20e58e5 Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/auto-dimension.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/basic-reshape.png b/public/content/learn/tensors/reshaping-tensors/basic-reshape.png new file mode 100644 index 0000000..10dd8f9 Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/basic-reshape.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/batch-reshape.png b/public/content/learn/tensors/reshaping-tensors/batch-reshape.png new file mode 100644 index 0000000..023ef7f Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/batch-reshape.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/flatten-visual.png b/public/content/learn/tensors/reshaping-tensors/flatten-visual.png new file mode 100644 index 
0000000..8dbdb2c Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/flatten-visual.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/reshape-rules.png b/public/content/learn/tensors/reshaping-tensors/reshape-rules.png new file mode 100644 index 0000000..6d2710e Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/reshape-rules.png differ diff --git a/public/content/learn/tensors/reshaping-tensors/reshaping-tensors-content.md b/public/content/learn/tensors/reshaping-tensors/reshaping-tensors-content.md new file mode 100644 index 0000000..1ce5e23 --- /dev/null +++ b/public/content/learn/tensors/reshaping-tensors/reshaping-tensors-content.md @@ -0,0 +1,477 @@ +--- +hero: + title: "Reshaping Tensors" + subtitle: "Changing Tensor Dimensions" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 10 min read" +--- + +Reshaping lets you change how data is organized **without changing the actual values**. Same data, different shape! + +## The Basic Idea + +Reshaping reorganizes elements into a new structure. Think of it like rearranging books on shelves - same books, different arrangement! + +![Basic Reshape](/content/learn/tensors/reshaping-tensors/basic-reshape.png) + +**Example:** + +```python +import torch + +# 1D tensor with 6 elements +v = torch.tensor([1, 2, 3, 4, 5, 6]) +print(v.shape) # torch.Size([6]) + +# Reshape to 2D: 2 rows, 3 columns +matrix = v.reshape(2, 3) +print(matrix) +# tensor([[1, 2, 3], +# [4, 5, 6]]) +print(matrix.shape) # torch.Size([2, 3]) +``` + +**What happened:** + +```yaml +Original: [1, 2, 3, 4, 5, 6] โ†’ Shape: (6,) + +Reshaped: [[1, 2, 3], + [4, 5, 6]] โ†’ Shape: (2, 3) + +Same 6 elements, new organization! +``` + +## The Golden Rule + +**Total number of elements must stay the same!** + +```yaml +6 elements can become: +โœ“ (6,) - 1D with 6 elements +โœ“ (2, 3) - 2ร—3 = 6 elements +โœ“ (3, 2) - 3ร—2 = 6 elements +โœ“ (1, 6) - 1ร—6 = 6 elements +โœ— (2, 4) - 2ร—4 = 8 elements (ERROR!) +``` + +## Common Reshape Patterns + +### Pattern 1: 1D โ†’ 2D + +```python +import torch + +v = torch.tensor([1, 2, 3, 4, 5, 6]) + +# Make it 2ร—3 +matrix = v.reshape(2, 3) +print(matrix) +# tensor([[1, 2, 3], +# [4, 5, 6]]) + +# Make it 3ร—2 +matrix = v.reshape(3, 2) +print(matrix) +# tensor([[1, 2], +# [3, 4], +# [5, 6]]) +``` + +### Pattern 2: 2D โ†’ Different 2D + +```python +import torch + +A = torch.tensor([[1, 2, 3], + [4, 5, 6]]) # Shape: (2, 3) + +B = A.reshape(3, 2) +print(B) +# tensor([[1, 2], +# [3, 4], +# [5, 6]]) # Shape: (3, 2) +``` + +## Flattening: Any Dimension โ†’ 1D + +Flattening converts any tensor into a single row: + +![Flatten Visual](/content/learn/tensors/reshaping-tensors/flatten-visual.png) + +**Example:** + +```python +import torch + +matrix = torch.tensor([[1, 2, 3], + [4, 5, 6]]) + +# Method 1: flatten() +flat = matrix.flatten() +print(flat) # tensor([1, 2, 3, 4, 5, 6]) + +# Method 2: reshape(-1) +flat = matrix.reshape(-1) +print(flat) # tensor([1, 2, 3, 4, 5, 6]) + +# Method 3: view(-1) +flat = matrix.view(-1) +print(flat) # tensor([1, 2, 3, 4, 5, 6]) +``` + +**How it reads:** + +```yaml +Matrix: +[[1, 2, 3], + [4, 5, 6]] + +Flattens row by row: +Row 0: [1, 2, 3] +Row 1: [4, 5, 6] + +Result: [1, 2, 3, 4, 5, 6] +``` + +## Using -1: Automatic Dimension + +Use `-1` to let PyTorch figure out one dimension automatically! 
+ +![Auto Dimension](/content/learn/tensors/reshaping-tensors/auto-dimension.png) + +**Example:** + +```python +import torch + +t = torch.arange(12) # [0, 1, 2, ..., 11] - 12 elements + +# You specify columns, PyTorch figures out rows +print(t.reshape(-1, 3)) # (?, 3) โ†’ (4, 3) +# tensor([[ 0, 1, 2], +# [ 3, 4, 5], +# [ 6, 7, 8], +# [ 9, 10, 11]]) + +# You specify rows, PyTorch figures out columns +print(t.reshape(3, -1)) # (3, ?) โ†’ (3, 4) +# tensor([[ 0, 1, 2, 3], +# [ 4, 5, 6, 7], +# [ 8, 9, 10, 11]]) + +# Just -1 means flatten +print(t.reshape(-1)) # (12,) +``` + +**How it works:** + +```yaml +12 elements, reshape(-1, 3): +โ†’ 12 รท 3 = 4 rows +โ†’ Result: (4, 3) + +12 elements, reshape(2, -1): +โ†’ 12 รท 2 = 6 columns +โ†’ Result: (2, 6) +``` + +**Important:** Only ONE -1 allowed per reshape! + +## Squeeze & Unsqueeze + +These add or remove dimensions of size 1: + +![Squeeze Unsqueeze](/content/learn/tensors/reshaping-tensors/squeeze-unsqueeze.png) + +### Unsqueeze: Add a Dimension + +```python +import torch + +v = torch.tensor([1, 2, 3]) # Shape: (3,) + +# Add dimension at position 0 +v_unsqueezed = v.unsqueeze(0) +print(v_unsqueezed.shape) # torch.Size([1, 3]) +print(v_unsqueezed) +# tensor([[1, 2, 3]]) + +# Add dimension at position 1 +v_unsqueezed = v.unsqueeze(1) +print(v_unsqueezed.shape) # torch.Size([3, 1]) +print(v_unsqueezed) +# tensor([[1], +# [2], +# [3]]) +``` + +### Squeeze: Remove Dimensions of Size 1 + +```python +import torch + +t = torch.tensor([[[1, 2, 3]]]) # Shape: (1, 1, 3) + +# Remove all size-1 dimensions +squeezed = t.squeeze() +print(squeezed.shape) # torch.Size([3]) +print(squeezed) # tensor([1, 2, 3]) + +# Remove specific dimension +t2 = torch.randn(1, 5, 1, 3) # Shape: (1, 5, 1, 3) +squeezed = t2.squeeze(0) # Remove dimension 0 +print(squeezed.shape) # torch.Size([5, 1, 3]) +``` + +**When to use:** + +```yaml +Unsqueeze: When you need to match shapes for operations + (3,) + unsqueeze(1) โ†’ (3, 1) for broadcasting + +Squeeze: When you want to remove extra dimensions + (1, 5, 1) + squeeze() โ†’ (5,) cleaner shape +``` + +## Reshape vs View + +Both change shape, but there's a difference: + +```python +import torch + +t = torch.tensor([[1, 2], [3, 4]]) + +# reshape() - always works, may copy data +r = t.reshape(4) # Works! + +# view() - faster but requires contiguous memory +v = t.view(4) # Works if contiguous! +``` + +**Key difference:** + +```yaml +.reshape(): + - Always works + - May create a copy if needed + - Safer choice + +.view(): + - Faster (no copy) + - Only works on contiguous tensors + - May fail with error +``` + +**When to use which:** +- Use `.reshape()` by default (safer) +- Use `.view()` if you know tensor is contiguous and want speed + +## Practical Example: Batch Processing + +![Batch Reshape](/content/learn/tensors/reshaping-tensors/batch-reshape.png) + +```python +import torch + +# 3 images, each 2ร—2 pixels +images = torch.tensor([[[1, 2], [3, 4]], + [[5, 6], [7, 8]], + [[9, 10], [11, 12]]]) + +print(images.shape) # torch.Size([3, 2, 2]) + +# Flatten each image for neural network +batch = images.reshape(3, -1) +print(batch) +# tensor([[ 1, 2, 3, 4], +# [ 5, 6, 7, 8], +# [ 9, 10, 11, 12]]) + +print(batch.shape) # torch.Size([3, 4]) +# 3 samples, 4 features each - ready for neural network! 
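
# Reshaping only reorganizes the values, so we can always go back
# to the original image shape (a quick sanity check):
restored = batch.reshape(3, 2, 2)
print(torch.equal(restored, images))  # True - same data, original shape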
+``` + +**What happened:** + +```yaml +Original: (3, 2, 2) + - 3 images + - Each image is 2ร—2 + +Reshaped: (3, 4) + - 3 samples + - Each sample has 4 features (flattened image) +``` + +## Reshaping Rules + +![Reshape Rules](/content/learn/tensors/reshaping-tensors/reshape-rules.png) + +### โœ“ Valid Reshapes + +```python +# 12 elements can be reshaped many ways +t = torch.arange(12) # 12 elements + +t.reshape(3, 4) # โœ“ 3ร—4 = 12 +t.reshape(2, 6) # โœ“ 2ร—6 = 12 +t.reshape(1, 12) # โœ“ 1ร—12 = 12 +t.reshape(2, 2, 3) # โœ“ 2ร—2ร—3 = 12 +``` + +### โœ— Invalid Reshapes + +```python +t = torch.arange(12) # 12 elements + +# t.reshape(3, 5) # โœ— 3ร—5 = 15 โ‰  12 - ERROR! +# t.reshape(2, 7) # โœ— 2ร—7 = 14 โ‰  12 - ERROR! +``` + +## Real-World Examples + +### Example 1: Preparing Data for Linear Layer + +```python +import torch + +# Batch of 32 images, each 28ร—28 pixels +images = torch.randn(32, 28, 28) + +# Flatten for fully connected layer +flattened = images.reshape(32, -1) +print(flattened.shape) # torch.Size([32, 784]) +# 32 samples, 784 features (28ร—28) + +# Now ready for: output = linear_layer(flattened) +``` + +### Example 2: Converting Model Output + +```python +import torch + +# Model outputs 100 predictions, need 10ร—10 grid +predictions = torch.randn(100) + +# Reshape to grid +grid = predictions.reshape(10, 10) +print(grid.shape) # torch.Size([10, 10]) +``` + +### Example 3: Adding Batch Dimension + +```python +import torch + +# Single sample +sample = torch.randn(28, 28) +print(sample.shape) # torch.Size([28, 28]) + +# Add batch dimension for model +batched = sample.unsqueeze(0) +print(batched.shape) # torch.Size([1, 28, 28]) +# Now it looks like a batch of 1 sample +``` + +## Common Patterns + +### Pattern: Flatten Batch + +```python +batch = torch.randn(32, 3, 224, 224) # 32 images, 3 channels, 224ร—224 +flat = batch.reshape(32, -1) # (32, 150528) +``` + +### Pattern: Split into Batches + +```python +data = torch.arange(100) +batches = data.reshape(10, 10) # 10 batches of 10 samples +``` + +### Pattern: Match Dimensions for Broadcasting + +```python +a = torch.randn(5, 3) # (5, 3) +b = torch.randn(3) # (3,) + +# Add dimension to b for broadcasting +b = b.unsqueeze(0) # (1, 3) +result = a + b # Works! (5, 3) + (1, 3) +``` + +## Common Gotchas + +### โŒ Gotcha 1: Element Count Mismatch + +```python +t = torch.arange(12) # 12 elements + +# This will ERROR! +# t.reshape(3, 5) # 15 โ‰  12 +``` + +### โŒ Gotcha 2: Too Many -1 + +```python +t = torch.arange(12) + +# This will ERROR! +# t.reshape(-1, -1) # Can't infer both dimensions! +``` + +### โŒ Gotcha 3: View on Non-Contiguous Tensor + +```python +t = torch.randn(3, 4) +t_t = t.T # Transpose makes it non-contiguous + +# This might ERROR! +# v = t_t.view(12) + +# Use reshape instead: +r = t_t.reshape(12) # Works! 
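
# Or make a contiguous copy first - then view() works as well:
v = t_t.contiguous().view(12)  # Works!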
+``` + +## Key Takeaways + +โœ“ **Same data, new shape:** Reshaping reorganizes elements without changing values + +โœ“ **Element count must match:** Total elements before = total elements after + +โœ“ **Use -1 for auto:** Let PyTorch figure out one dimension + +โœ“ **Flatten with reshape(-1):** Any tensor โ†’ 1D + +โœ“ **Unsqueeze adds, squeeze removes:** Manage dimensions of size 1 + +โœ“ **reshape() is safer:** Use reshape() by default, view() for speed + +**Quick Reference:** + +```python +# Basic reshape +t.reshape(2, 3) # Specific shape +t.reshape(-1, 3) # Auto rows, 3 columns +t.reshape(-1) # Flatten to 1D + +# Flatten +t.flatten() # Always returns 1D +t.reshape(-1) # Also flattens +t.view(-1) # Flatten (if contiguous) + +# Add/remove dimensions +t.unsqueeze(0) # Add dimension at position 0 +t.unsqueeze(1) # Add dimension at position 1 +t.squeeze() # Remove all size-1 dimensions +t.squeeze(0) # Remove specific dimension + +# Alternative (view is faster but less safe) +t.view(2, 3) # Like reshape, but needs contiguous tensor +``` + +**Remember:** Reshaping doesn't change the data, only how it's organized! ๐ŸŽ‰ diff --git a/public/content/learn/tensors/reshaping-tensors/squeeze-unsqueeze.png b/public/content/learn/tensors/reshaping-tensors/squeeze-unsqueeze.png new file mode 100644 index 0000000..fd2652e Binary files /dev/null and b/public/content/learn/tensors/reshaping-tensors/squeeze-unsqueeze.png differ diff --git a/public/content/learn/tensors/tensor-addition/broadcasting-scalar-vector.png b/public/content/learn/tensors/tensor-addition/broadcasting-scalar-vector.png new file mode 100644 index 0000000..335ac14 Binary files /dev/null and b/public/content/learn/tensors/tensor-addition/broadcasting-scalar-vector.png differ diff --git a/public/content/learn/tensors/tensor-addition/matrix-addition.png b/public/content/learn/tensors/tensor-addition/matrix-addition.png new file mode 100644 index 0000000..2caabe8 Binary files /dev/null and b/public/content/learn/tensors/tensor-addition/matrix-addition.png differ diff --git a/public/content/learn/tensors/tensor-addition/scalar-addition.png b/public/content/learn/tensors/tensor-addition/scalar-addition.png new file mode 100644 index 0000000..9e26f20 Binary files /dev/null and b/public/content/learn/tensors/tensor-addition/scalar-addition.png differ diff --git a/public/content/learn/tensors/tensor-addition/step-by-step-addition.png b/public/content/learn/tensors/tensor-addition/step-by-step-addition.png new file mode 100644 index 0000000..37320f3 Binary files /dev/null and b/public/content/learn/tensors/tensor-addition/step-by-step-addition.png differ diff --git a/public/content/learn/tensors/tensor-addition/tensor-addition-content.md b/public/content/learn/tensors/tensor-addition/tensor-addition-content.md new file mode 100644 index 0000000..7432f00 --- /dev/null +++ b/public/content/learn/tensors/tensor-addition/tensor-addition-content.md @@ -0,0 +1,290 @@ +--- +hero: + title: "Tensor Addition" + subtitle: "Element-wise Operations on Tensors" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 8 min read" +--- + +Tensor addition is one of the most fundamental operations in deep learning. It's simple: **add corresponding elements together**. 
+ +## The Basic Rule + +**When you add two tensors, you add each position separately (element-wise).** + +Think of it like adding two shopping lists item by item: +- First item + First item +- Second item + Second item +- Third item + Third item + +## Scalar Addition + +Adding two single numbers: + +![Scalar Addition](/content/learn/tensors/tensor-addition/scalar-addition.png) + +**Example:** + +```python +import torch + +a = torch.tensor(5) +b = torch.tensor(3) +result = a + b + +print(result) # Output: tensor(8) +``` + +**Manual calculation:** +```yaml +5 + 3 = 8 +``` + +Simple! Just like regular math. + +## Vector Addition + +Adding arrays of numbers, **element by element**: + +![Vector Addition](/content/learn/tensors/tensor-addition/vector-addition.png) + +**Example:** + +```python +import torch + +a = torch.tensor([10, 20, 30]) +b = torch.tensor([5, 15, 25]) +result = a + b + +print(result) # Output: tensor([15, 35, 55]) +``` + +**Manual calculation:** +```yaml +Position 0: 10 + 5 = 15 +Position 1: 20 + 15 = 35 +Position 2: 30 + 25 = 55 + +Result: [15, 35, 55] +``` + +![Step by Step Addition](/content/learn/tensors/tensor-addition/step-by-step-addition.png) + +**Key insight:** Each position is independent. We add `[0]` with `[0]`, `[1]` with `[1]`, `[2]` with `[2]`. + +## Matrix Addition + +Same rule applies to matrices - add corresponding positions: + +![Matrix Addition](/content/learn/tensors/tensor-addition/matrix-addition.png) + +**Example:** + +```python +import torch + +a = torch.tensor([[10, 20, 30], + [15, 25, 35]]) + +b = torch.tensor([[5, 10, 15], + [8, 12, 18]]) + +result = a + b + +print(result) +# Output: +# tensor([[15, 30, 45], +# [23, 37, 53]]) +``` + +**Manual calculation:** +```yaml +Position [0, 0]: 10 + 5 = 15 +Position [0, 1]: 20 + 10 = 30 +Position [0, 2]: 30 + 15 = 45 +Position [1, 0]: 15 + 8 = 23 +Position [1, 1]: 25 + 12 = 37 +Position [1, 2]: 35 + 18 = 53 + +Result: +[[15, 30, 45], + [23, 37, 53]] +``` + +## Broadcasting: Adding a Scalar to a Vector + +What if you want to add a single number to every element in a vector? PyTorch automatically "broadcasts" the scalar: + +![Broadcasting](/content/learn/tensors/tensor-addition/broadcasting-scalar-vector.png) + +**Example:** + +```python +import torch + +vector = torch.tensor([10, 20, 30]) +scalar = 5 + +result = vector + scalar + +print(result) # Output: tensor([15, 25, 35]) +``` + +**What happens behind the scenes:** + +PyTorch automatically expands `5` to `[5, 5, 5]` and then adds: + +```yaml +[10, 20, 30] + 5 + โ†“ +[10, 20, 30] + [5, 5, 5] (automatic broadcast) + โ†“ +[15, 25, 35] +``` + +**Manual calculation:** +```yaml +10 + 5 = 15 +20 + 5 = 25 +30 + 5 = 35 + +Result: [15, 25, 35] +``` + +This works because adding the same number to every position makes sense! + +## Addition Rules + +### Quick Reminder: What is Shape? + +- **Shape** tells you the dimensions and size of your tensor +- Written as `(rows, columns)` for 2D, or `(size,)` for 1D + +**Examples:** +```yaml +5 โ†’ Shape: () (scalar - no dimensions) +[1, 2, 3] โ†’ Shape: (3,) (1D - 3 elements) +[[1, 2], โ†’ Shape: (3, 2) (2D - 3 rows, 2 columns) - last shape number is the inner most tensor dimension + [3, 4], + [5, 6]] +[[[...], โ†’ Shape: (2, 3, 5) (3D - 2 matrices, 3 rows, 5 columns) + [...], + [...]], + [[...], + [...], + [...]]] + +...and so on for higher dimensions +``` + +Now let's use this to understand addition rules! 
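If you'd like to verify these shapes yourself, here is a quick sketch - build each tensor and print its `.shape`:

```python
import torch

scalar = torch.tensor(5)
vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[1, 2], [3, 4], [5, 6]])

print(scalar.shape)  # torch.Size([])     - no dimensions
print(vector.shape)  # torch.Size([3])    - 3 elements
print(matrix.shape)  # torch.Size([3, 2]) - 3 rows, 2 columns
```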
+ +### โœ“ Rule 1: Same Shapes Work + +Tensors must have the **same shape** to be added: + +```python +a = torch.tensor([1, 2, 3]) # Shape: (3,) +b = torch.tensor([4, 5, 6]) # Shape: (3,) +result = a + b # Works! โœ“ + +print(result) # tensor([5, 7, 9]) +``` + +### โœ“ Rule 2: Broadcasting Works + +A scalar can be added to any tensor: + +```python +a = torch.tensor([1, 2, 3]) # Shape: (3,) +b = 10 # Scalar +result = a + b # Works! โœ“ + +print(result) # tensor([11, 12, 13]) +``` + +### โœ— Rule 3: Different Shapes Don't Work + +You **cannot** add tensors with different shapes: + +```python +a = torch.tensor([1, 2, 3]) # Shape: (3,) +b = torch.tensor([4, 5]) # Shape: (2,) + +# This will cause an ERROR! โœ— +# result = a + b +# RuntimeError: The size of tensor a (3) must match the size of tensor b (2) +``` + +**Why?** PyTorch doesn't know how to match up the elements: +- Should position `[0]` add to `[0]`? Yes. +- Should position `[1]` add to `[1]`? Yes. +- Should position `[2]` add to...? There's no `[2]` in the second tensor! โŒ + +## Real-World Example: Adjusting Image Brightness + +Imagine you have a small 2ร—2 grayscale image (values 0-255): + +```python +import torch + +# Original image (darker) +image = torch.tensor([[100, 150], + [120, 180]], dtype=torch.float32) + +# Make it brighter by adding 50 to all pixels +brightness_increase = 50 +brighter_image = image + brightness_increase + +print("Original image:") +print(image) +# tensor([[100., 150.], +# [120., 180.]]) + +print("\nBrighter image:") +print(brighter_image) +# tensor([[150., 200.], +# [170., 230.]]) +``` + +**Manual calculation:** +```yaml +Original: Add 50: Result: +[[100, 150] + 50 โ†’ [[150, 200] + [120, 180]] [170, 230]] + +Each pixel becomes 50 points brighter! +``` + +This is exactly how image editing software makes images brighter - it adds a value to every pixel. + +## Key Takeaways + +โœ“ **Element-wise:** Addition happens position by position + +โœ“ **Same shapes:** Tensors must have identical shapes (or use broadcasting) + +โœ“ **Broadcasting:** Scalars are automatically added to every element + +โœ“ **Independent:** Each position is added separately - no mixing between positions + +**Quick Reference:** + +```python +# Scalar + Scalar +torch.tensor(5) + torch.tensor(3) # = 8 + +# Vector + Vector (same size) +torch.tensor([1, 2]) + torch.tensor([3, 4]) # = [4, 6] + +# Vector + Scalar (broadcasting) +torch.tensor([1, 2, 3]) + 10 # = [11, 12, 13] + +# Matrix + Matrix (same shape) +torch.tensor([[1, 2], [3, 4]]) + torch.tensor([[5, 6], [7, 8]]) +# = [[6, 8], [10, 12]] +``` + +**Congratulations!** You now understand tensor addition. This same element-wise principle applies to subtraction, multiplication, and division too! 
๐ŸŽ‰ diff --git a/public/content/learn/tensors/tensor-addition/vector-addition.png b/public/content/learn/tensors/tensor-addition/vector-addition.png new file mode 100644 index 0000000..824aaa9 Binary files /dev/null and b/public/content/learn/tensors/tensor-addition/vector-addition.png differ diff --git a/public/content/learn/tensors/transposing-tensors/matrix-transpose.png b/public/content/learn/tensors/transposing-tensors/matrix-transpose.png new file mode 100644 index 0000000..047999e Binary files /dev/null and b/public/content/learn/tensors/transposing-tensors/matrix-transpose.png differ diff --git a/public/content/learn/tensors/transposing-tensors/square-transpose.png b/public/content/learn/tensors/transposing-tensors/square-transpose.png new file mode 100644 index 0000000..eeadb57 Binary files /dev/null and b/public/content/learn/tensors/transposing-tensors/square-transpose.png differ diff --git a/public/content/learn/tensors/transposing-tensors/transpose-detailed.png b/public/content/learn/tensors/transposing-tensors/transpose-detailed.png new file mode 100644 index 0000000..ea15a43 Binary files /dev/null and b/public/content/learn/tensors/transposing-tensors/transpose-detailed.png differ diff --git a/public/content/learn/tensors/transposing-tensors/transposing-tensors-content.md b/public/content/learn/tensors/transposing-tensors/transposing-tensors-content.md new file mode 100644 index 0000000..aad25f3 --- /dev/null +++ b/public/content/learn/tensors/transposing-tensors/transposing-tensors-content.md @@ -0,0 +1,393 @@ +--- +hero: + title: "Transposing Tensors" + subtitle: "Flipping Dimensions and Axes" + tags: + - "๐Ÿ”ข Tensors" + - "โฑ๏ธ 10 min read" +--- + +Transposing is like **flipping** a tensor - rows become columns, and columns become rows. It's simple but incredibly useful! + +## The Basic Idea + +**Transpose = Swap rows and columns** + +Think of it like rotating a table 90 degrees. The first row becomes the first column, the second row becomes the second column, and so on. + +## Vector Transpose + +When you transpose a vector, you change it from horizontal to vertical (or vice versa): + +![Vector Transpose](/content/learn/tensors/transposing-tensors/vector-transpose.png) + +**Example:** + +```python +import torch + +# Horizontal vector (row) +v = torch.tensor([1, 2, 3, 4]) +print(v.shape) # torch.Size([4]) + +# Transpose to vertical (column) +v_t = v.T +print(v_t) +# tensor([[1], +# [2], +# [3], +# [4]]) +print(v_t.shape) # torch.Size([4, 1]) +``` + +**Manual visualization:** + +```yaml +Before: [1, 2, 3, 4] โ†’ Shape: (4,) + +After: [[1], + [2], + [3], + [4]] โ†’ Shape: (4, 1) +``` + +## Matrix Transpose + +This is where transpose really shines! 
Rows become columns, columns become rows:

![Matrix Transpose](/content/learn/tensors/transposing-tensors/matrix-transpose.png)

**Example:**

```python
import torch

# Original matrix: 2 rows, 3 columns
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(A.shape)  # torch.Size([2, 3])

# Transpose: 3 rows, 2 columns
A_T = A.T

print(A_T)
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])

print(A_T.shape)  # torch.Size([3, 2])
```

**Manual calculation:**

```yaml
Original (2×3):
[[1, 2, 3],
 [4, 5, 6]]

Transpose (3×2):
[[1, 4],   ← First column becomes first row
 [2, 5],   ← Second column becomes second row
 [3, 6]]   ← Third column becomes third row
```

## How Elements Move

Here's exactly what happens to each element during transpose:

![Transpose Detailed](/content/learn/tensors/transposing-tensors/transpose-detailed.png)

**The pattern:** Position `[i, j]` → Position `[j, i]`

**Example tracking specific elements:**

```yaml
Original position → Transposed position

[0, 0]: value 1 → [0, 0]: value 1 (stays in place)
[0, 1]: value 2 → [1, 0]: value 2 (row 0, col 1 → row 1, col 0)
[0, 2]: value 3 → [2, 0]: value 3
[1, 0]: value 4 → [0, 1]: value 4
[1, 1]: value 5 → [1, 1]: value 5 (stays in place)
[1, 2]: value 6 → [2, 1]: value 6
```

**Key rule:** Just swap the two indices! `[i, j]` becomes `[j, i]`

## Square Matrix Transpose

Square matrices (same number of rows and columns) have a special property:

![Square Transpose](/content/learn/tensors/transposing-tensors/square-transpose.png)

**Example:**

```python
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(A.shape)  # torch.Size([3, 3])

A_T = A.T
print(A_T)
# tensor([[1, 4, 7],
#         [2, 5, 8],
#         [3, 6, 9]])

print(A_T.shape)  # torch.Size([3, 3])
```

**What happens:**

```yaml
Original:        Transposed:
[[1, 2, 3],      [[1, 4, 7],
 [4, 5, 6],  →    [2, 5, 8],
 [7, 8, 9]]       [3, 6, 9]]

Diagonal (1, 5, 9) stays in place!
Everything else flips across the diagonal.
```

**The diagonal stays put:** Elements where row = column don't move!

## Shape Changes

The shape always flips:

```python
# Examples of shape changes
original_shape = (2, 3)
transposed_shape = (3, 2)

original_shape = (5, 7)
transposed_shape = (7, 5)

original_shape = (4, 4)  # Square
transposed_shape = (4, 4)  # Still square!
```

**Quick reference:**

```yaml
(2, 3) → (3, 2)
(5, 1) → (1, 5)
(10, 20) → (20, 10)
(n, m) → (m, n)  ← General pattern
```

## Why Do We Transpose?

The most common reason: **making shapes compatible for matrix multiplication!**

![Why Transpose](/content/learn/tensors/transposing-tensors/why-transpose.png)

**Example:**

```python
import torch

A = torch.randn(2, 3)  # Shape: (2, 3)
B = torch.randn(4, 3)  # Shape: (4, 3) - same number of columns as A

# Without transpose - doesn't work
# result = A @ B  # Error! (2, 3) @ (4, 3) - inner dimensions 3 ≠ 4

# With transpose - works!
+result = A @ B.T # (2,3) @ (3,4) = (2,4) โœ“ + +print(result.shape) # torch.Size([2, 4]) +``` + +**Real example with actual values:** + +```python +import torch + +# Two data samples with 3 features each +X = torch.tensor([[1.0, 2.0, 3.0], + [4.0, 5.0, 6.0]]) # Shape: (2, 3) + +# Weight matrix: 3 inputs, 2 outputs (we want this orientation) +W = torch.tensor([[0.1, 0.2], + [0.3, 0.4], + [0.5, 0.6]]) # Shape: (3, 2) + +# This works! +output = X @ W # (2, 3) @ (3, 2) = (2, 2) +print(output) +# tensor([[2.2000, 2.8000], +# [4.9000, 6.4000]]) + +# But if W was stored transposed... +W_stored = W.T # Shape: (2, 3) + +# We need to transpose it back +output = X @ W_stored.T # (2, 3) @ (3, 2) = (2, 2) +print(output) # Same result! +``` + +## Practical Examples + +### Example 1: Computing Dot Products + +```python +import torch + +# Two vectors +a = torch.tensor([1, 2, 3]) +b = torch.tensor([4, 5, 6]) + +# Can't use @ directly on 1D tensors for matrix multiply +# But we can reshape and transpose! + +a_col = a.reshape(-1, 1) # Column vector (3, 1) +b_row = b.reshape(1, -1) # Row vector (1, 3) + +# Outer product +outer = a_col @ b_row # (3, 1) @ (1, 3) = (3, 3) +print(outer) +# tensor([[ 4, 5, 6], +# [ 8, 10, 12], +# [12, 15, 18]]) + +# Inner product (dot product) +inner = b_row @ a_col # (1, 3) @ (3, 1) = (1, 1) +print(inner) # tensor([[32]]) +``` + +### Example 2: Batch Matrix Transpose + +```python +import torch + +# Batch of 3 matrices, each 2ร—4 +batch = torch.randn(3, 2, 4) + +# Transpose last two dimensions +batch_T = batch.transpose(-2, -1) # Now (3, 4, 2) + +print(batch.shape) # torch.Size([3, 2, 4]) +print(batch_T.shape) # torch.Size([3, 4, 2]) + +# Each of the 3 matrices got transposed individually! +``` + +### Example 3: Neural Network Weights + +```python +import torch + +# In neural networks, weights are often stored transposed +# for computational efficiency + +batch_size = 32 +input_features = 10 +output_features = 5 + +# Input batch +X = torch.randn(batch_size, input_features) # (32, 10) + +# Weights stored as (input, output) for efficiency +W = torch.randn(input_features, output_features) # (10, 5) + +# Forward pass - works directly! +output = X @ W # (32, 10) @ (10, 5) = (32, 5) โœ“ + +# If weights were stored as (output, input) instead... +W_alt = torch.randn(output_features, input_features) # (5, 10) + +# Need to transpose +output = X @ W_alt.T # (32, 10) @ (10, 5) = (32, 5) โœ“ +``` + +## Common Gotchas + +### โŒ Gotcha 1: 1D Tensors Don't Change Much + +```python +v = torch.tensor([1, 2, 3]) +v_t = v.T + +print(torch.equal(v, v_t)) # True! +# 1D tensors look the same after transpose! +``` + +To actually change a 1D tensor, reshape it first: + +```python +v = torch.tensor([1, 2, 3]) +v_col = v.reshape(-1, 1) # Column vector + +print(v.shape) # torch.Size([3]) +print(v_col.shape) # torch.Size([3, 1]) +``` + +### โŒ Gotcha 2: Transpose Creates a View + +```python +A = torch.tensor([[1, 2], [3, 4]]) +A_T = A.T + +# Modifying A_T also modifies A! 
+A_T[0, 0] = 999 + +print(A) +# tensor([[999, 2], +# [ 3, 4]]) + +# Use .clone() if you want a copy +A_T_copy = A.T.clone() +A_T_copy[0, 0] = 42 +# A is unchanged +``` + +## Key Takeaways + +โœ“ **Transpose swaps rows and columns:** `[i, j]` โ†’ `[j, i]` + +โœ“ **Shape flips:** `(m, n)` โ†’ `(n, m)` + +โœ“ **Main use:** Making shapes compatible for matrix multiplication + +โœ“ **Diagonal stays:** In square matrices, diagonal elements don't move + +โœ“ **Use `.T`:** Simple and clean syntax in PyTorch + +**Quick Reference:** + +```python +# Basic transpose +A = torch.tensor([[1, 2, 3], [4, 5, 6]]) +A_T = A.T # Shape: (2,3) โ†’ (3,2) + +# For 3D+ tensors, specify dimensions +B = torch.randn(5, 10, 20) +B_T = B.transpose(1, 2) # Swap dimensions 1 and 2 +# Shape: (5, 10, 20) โ†’ (5, 20, 10) + +# Transpose last two dimensions (common in batch operations) +C = torch.randn(8, 4, 6) +C_T = C.transpose(-2, -1) # or C.transpose(1, 2) +# Shape: (8, 4, 6) โ†’ (8, 6, 4) +``` + +**Remember:** Transposing is just flipping! Rows โ†’ Columns, Columns โ†’ Rows. That's it! ๐ŸŽ‰ diff --git a/public/content/learn/tensors/transposing-tensors/vector-transpose.png b/public/content/learn/tensors/transposing-tensors/vector-transpose.png new file mode 100644 index 0000000..d079e05 Binary files /dev/null and b/public/content/learn/tensors/transposing-tensors/vector-transpose.png differ diff --git a/public/content/learn/tensors/transposing-tensors/why-transpose.png b/public/content/learn/tensors/transposing-tensors/why-transpose.png new file mode 100644 index 0000000..ad240c1 Binary files /dev/null and b/public/content/learn/tensors/transposing-tensors/why-transpose.png differ diff --git a/public/content/learn/transformer-feedforward/combining-experts/combining-experts-content.md b/public/content/learn/transformer-feedforward/combining-experts/combining-experts-content.md new file mode 100644 index 0000000..1e5e806 --- /dev/null +++ b/public/content/learn/transformer-feedforward/combining-experts/combining-experts-content.md @@ -0,0 +1,63 @@ +--- +hero: + title: "Combining Experts" + subtitle: "Weighted Combination of Expert Outputs" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 8 min read" +--- + +After routing, we combine expert outputs using router weights! + +## Combining Formula + +**Output = ฮฃ (router_weight_i ร— expert_i(x))** + +```python +import torch + +# Router selected experts 2 and 5 with weights +expert_indices = [2, 5] +expert_weights = [0.6, 0.4] + +# Expert outputs +expert_2_output = torch.tensor([1.0, 2.0, 3.0]) +expert_5_output = torch.tensor([4.0, 5.0, 6.0]) + +# Weighted combination +final_output = 0.6 * expert_2_output + 0.4 * expert_5_output +print(final_output) +# tensor([2.2000, 3.2000, 4.2000]) +``` + +## Complete MoE Forward + +```python +def moe_forward(x, experts, router): + # Get routing decisions + weights, indices = router(x, top_k=2) + + # Combine expert outputs + output = torch.zeros_like(x) + + for i in range(len(experts)): + # Mask for tokens using this expert + expert_mask = (indices == i).any(dim=-1) + + if expert_mask.any(): + expert_out = experts[i](x[expert_mask]) + expert_weight = weights[expert_mask][(indices[expert_mask] == i).any(dim=-1)] + output[expert_mask] += expert_weight.unsqueeze(-1) * expert_out + + return output +``` + +## Key Takeaways + +โœ“ **Weighted sum:** Combine based on router weights + +โœ“ **Sparse:** Only use selected experts + +โœ“ **Efficient:** Skip unused experts + +**Remember:** Combining is just weighted averaging! 
๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/moe-in-a-transformer/moe-in-a-transformer-content.md b/public/content/learn/transformer-feedforward/moe-in-a-transformer/moe-in-a-transformer-content.md new file mode 100644 index 0000000..c1209b7 --- /dev/null +++ b/public/content/learn/transformer-feedforward/moe-in-a-transformer/moe-in-a-transformer-content.md @@ -0,0 +1,63 @@ +--- +hero: + title: "MoE in a Transformer" + subtitle: "Integrating Mixture of Experts" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 10 min read" +--- + +MoE replaces the standard FFN in transformer blocks with a sparse expert layer! + +## MoE Transformer Block + +```python +import torch.nn as nn + +class MoETransformerBlock(nn.Module): + def __init__(self, d_model, n_heads, num_experts=8): + super().__init__() + + # Attention (same as standard) + self.attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True) + + # MoE instead of FFN + self.moe = MixtureOfExperts(d_model, num_experts) + + # Normalization + self.norm1 = nn.LayerNorm(d_model) + self.norm2 = nn.LayerNorm(d_model) + + def forward(self, x): + # Attention + attn_out, _ = self.attention(x, x, x) + x = self.norm1(x + attn_out) + + # MoE (replaces FFN) + moe_out = self.moe(x) + x = self.norm2(x + moe_out) + + return x +``` + +## Key Difference + +```yaml +Standard Transformer: + Attention โ†’ FFN โ†’ Output + +MoE Transformer: + Attention โ†’ MoE โ†’ Output + โ†‘ + (Sparse expert routing) +``` + +## Key Takeaways + +โœ“ **Drop-in replacement:** MoE replaces FFN + +โœ“ **Same interface:** Input/output shapes unchanged + +โœ“ **More capacity:** Many experts, sparse activation + +**Remember:** MoE makes transformers bigger without more compute! ๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/moe-in-code/moe-in-code-content.md b/public/content/learn/transformer-feedforward/moe-in-code/moe-in-code-content.md new file mode 100644 index 0000000..be9f0b0 --- /dev/null +++ b/public/content/learn/transformer-feedforward/moe-in-code/moe-in-code-content.md @@ -0,0 +1,88 @@ +--- +hero: + title: "MoE in Code" + subtitle: "Complete MoE Implementation" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 10 min read" +--- + +Complete, working Mixture of Experts implementation! 
+

## Full MoE Layer

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    def __init__(self, d_model, num_experts=8, top_k=2, d_ff=None):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k

        if d_ff is None:
            d_ff = 4 * d_model

        # Create experts
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model)
            )
            for _ in range(num_experts)
        ])

        # Router
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):
        batch_size, seq_len, d_model = x.size()
        x_flat = x.view(-1, d_model)

        # Route
        router_logits = self.router(x_flat)
        router_probs = F.softmax(router_logits, dim=-1)

        # Top-k
        top_k_probs, top_k_indices = torch.topk(router_probs, self.top_k, dim=-1)
        top_k_probs = top_k_probs / top_k_probs.sum(dim=-1, keepdim=True)

        # Apply experts
        output = torch.zeros_like(x_flat)

        for expert_idx in range(self.num_experts):
            for k in range(self.top_k):
                # Tokens that picked this expert in routing slot k
                token_mask = (top_k_indices[:, k] == expert_idx)

                if token_mask.any():
                    expert_input = x_flat[token_mask]
                    expert_output = self.experts[expert_idx](expert_input)

                    # Weight by the router probability for this slot
                    output[token_mask] += top_k_probs[token_mask, k].unsqueeze(-1) * expert_output

        output = output.view(batch_size, seq_len, d_model)
        return output

# Test
moe = MixtureOfExperts(d_model=512, num_experts=8, top_k=2)
x = torch.randn(2, 10, 512)
output = moe(x)
print(output.shape)  # torch.Size([2, 10, 512])
```

## Key Takeaways

✓ **Complete implementation:** Runnable end-to-end reference

✓ **Routing:** Each token to top-k experts

✓ **Efficient:** Sparse computation

**Remember:** MoE is routing + expert combination! 🎉 diff --git a/public/content/learn/transformer-feedforward/the-deepseek-mlp/the-deepseek-mlp-content.md b/public/content/learn/transformer-feedforward/the-deepseek-mlp/the-deepseek-mlp-content.md new file mode 100644 index 0000000..67686d0 --- /dev/null +++ b/public/content/learn/transformer-feedforward/the-deepseek-mlp/the-deepseek-mlp-content.md @@ -0,0 +1,83 @@ +--- +hero: + title: "The DeepSeek MLP" + subtitle: "DeepSeek's Efficient MoE Design" + tags: + - "🔀 MoE" + - "⏱️ 10 min read" +---

DeepSeek-MoE uses an efficient MLP design that reduces parameters while maintaining performance!

## DeepSeek MoE Architecture

**Key innovation: Shared expert + Routed experts**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSeekMoE(nn.Module):
    def __init__(self, d_model, num_experts=64, top_k=6):
        super().__init__()

        # Shared expert (always active)
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_model * 4),
            nn.SiLU(),
            nn.Linear(d_model * 4, d_model)
        )

        # Routed experts
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_model // 4),  # Smaller!
+ nn.SiLU(), + nn.Linear(d_model // 4, d_model) + ) + for _ in range(num_experts) + ]) + + # Router + self.router = nn.Linear(d_model, num_experts) + self.top_k = top_k + + def forward(self, x): + # Shared expert (all tokens) + shared_out = self.shared_expert(x) + + # Route to top-k experts + router_logits = self.router(x) + router_probs = F.softmax(router_logits, dim=-1) + top_k_probs, top_k_indices = torch.topk(router_probs, self.top_k, dim=-1) + + # Combine routed experts + routed_out = self.route_and_combine(x, top_k_probs, top_k_indices) + + # Final output + output = shared_out + routed_out + return output +``` + +## Why It's Efficient + +```yaml +Standard MoE: + 64 experts ร— (d_model โ†’ 4*d_model โ†’ d_model) + = 64 ร— 8dยฒ parameters + +DeepSeek MoE: + 1 shared ร— 8dยฒ parameters + + 64 routed ร— 0.5dยฒ parameters (smaller experts!) + = Much fewer parameters! +``` + +## Key Takeaways + +โœ“ **Shared expert:** Always active for all tokens + +โœ“ **Smaller routed experts:** More efficient + +โœ“ **Better performance:** Despite fewer parameters + +**Remember:** DeepSeek MoE is efficient MoE! ๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/the-expert/the-expert-content.md b/public/content/learn/transformer-feedforward/the-expert/the-expert-content.md new file mode 100644 index 0000000..7e1f7d7 --- /dev/null +++ b/public/content/learn/transformer-feedforward/the-expert/the-expert-content.md @@ -0,0 +1,77 @@ +--- +hero: + title: "The Expert" + subtitle: "Individual Expert Networks in MoE" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 8 min read" +--- + +An expert is a **specialized feedforward network** in the Mixture of Experts architecture! + +## Expert Structure + +```python +import torch +import torch.nn as nn + +class Expert(nn.Module): + def __init__(self, d_model, d_ff): + super().__init__() + self.net = nn.Sequential( + nn.Linear(d_model, d_ff), + nn.SiLU(), # Modern activation + nn.Linear(d_ff, d_model) + ) + + def forward(self, x): + return self.net(x) + +# Create expert +expert = Expert(d_model=512, d_ff=2048) +x = torch.randn(10, 512) +output = expert(x) +print(output.shape) # torch.Size([10, 512]) +``` + +## Multiple Experts + +```python +num_experts = 8 + +experts = nn.ModuleList([ + Expert(d_model=512, d_ff=2048) + for _ in range(num_experts) +]) + +# Each expert specializes in different patterns! +# Expert 0: Maybe handles technical text +# Expert 1: Maybe handles conversational text +# Expert 2: Maybe handles code +# etc. +``` + +## Expert Specialization + +```yaml +During training: + - Router learns which expert for which pattern + - Experts specialize automatically + - No manual assignment needed! + +Result: + - Expert 1: Good at math + - Expert 2: Good at grammar + - Expert 3: Good at facts + - etc. +``` + +## Key Takeaways + +โœ“ **Expert = FFN:** Same structure as standard feedforward + +โœ“ **Specialized:** Each learns different patterns + +โœ“ **Independent:** Trained separately via routing + +**Remember:** Experts are specialized sub-networks! 
๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/the-feedforward-layer/the-feedforward-layer-content.md b/public/content/learn/transformer-feedforward/the-feedforward-layer/the-feedforward-layer-content.md new file mode 100644 index 0000000..bc1d5c4 --- /dev/null +++ b/public/content/learn/transformer-feedforward/the-feedforward-layer/the-feedforward-layer-content.md @@ -0,0 +1,43 @@ +--- +hero: + title: "The Feedforward Layer" + subtitle: "FFN in Transformer Blocks" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 8 min read" +--- + +The feedforward network (FFN) in transformers processes each position independently! + +## Structure + +```python +import torch.nn as nn + +class FeedForward(nn.Module): + def __init__(self, d_model, d_ff, dropout=0.1): + super().__init__() + self.net = nn.Sequential( + nn.Linear(d_model, d_ff), + nn.ReLU(), + nn.Dropout(dropout), + nn.Linear(d_ff, d_model), + nn.Dropout(dropout) + ) + + def forward(self, x): + return self.net(x) + +# Typical: d_ff = 4 ร— d_model +ffn = FeedForward(d_model=512, d_ff=2048) +``` + +## Key Takeaways + +โœ“ **Two layers:** Expand then compress + +โœ“ **Position-wise:** Same FFN for each position + +โœ“ **Standard ratio:** d_ff = 4 ร— d_model + +**Remember:** FFN adds capacity after attention! ๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/the-gate/the-gate-content.md b/public/content/learn/transformer-feedforward/the-gate/the-gate-content.md new file mode 100644 index 0000000..dd611c1 --- /dev/null +++ b/public/content/learn/transformer-feedforward/the-gate/the-gate-content.md @@ -0,0 +1,56 @@ +--- +hero: + title: "The Gate" + subtitle: "Router Network in Mixture of Experts" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 8 min read" +--- + +The gate (router) decides **which experts each token should use**! + +## Router Implementation + +```python +import torch +import torch.nn as nn +import torch.nn.functional as F + +class Router(nn.Module): + def __init__(self, d_model, num_experts): + super().__init__() + self.gate = nn.Linear(d_model, num_experts) + + def forward(self, x, top_k=2): + # x: (batch, seq, d_model) + + # Compute routing scores + router_logits = self.gate(x) + router_probs = F.softmax(router_logits, dim=-1) + + # Select top-k experts + top_k_probs, top_k_indices = torch.topk(router_probs, top_k, dim=-1) + + # Normalize + top_k_probs = top_k_probs / top_k_probs.sum(dim=-1, keepdim=True) + + return top_k_probs, top_k_indices + +# Use it +router = Router(d_model=512, num_experts=8) +x = torch.randn(2, 10, 512) +probs, indices = router(x, top_k=2) + +print(probs.shape) # torch.Size([2, 10, 2]) +print(indices.shape) # torch.Size([2, 10, 2]) +``` + +## Key Takeaways + +โœ“ **Router:** Selects which experts to use + +โœ“ **Top-K:** Usually top-2 experts per token + +โœ“ **Learnable:** Router weights are trained + +**Remember:** The gate is the traffic controller! 
๐ŸŽ‰ diff --git a/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/moe-routing.png b/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/moe-routing.png new file mode 100644 index 0000000..efc3589 Binary files /dev/null and b/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/moe-routing.png differ diff --git a/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/what-is-mixture-of-experts-content.md b/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/what-is-mixture-of-experts-content.md new file mode 100644 index 0000000..92a71f2 --- /dev/null +++ b/public/content/learn/transformer-feedforward/what-is-mixture-of-experts/what-is-mixture-of-experts-content.md @@ -0,0 +1,107 @@ +--- +hero: + title: "What is Mixture of Experts" + subtitle: "Sparse Expert Models Explained" + tags: + - "๐Ÿ”€ MoE" + - "โฑ๏ธ 10 min read" +--- + +Mixture of Experts (MoE) uses **multiple specialized sub-networks (experts)** and routes inputs to the most relevant ones! + +![MoE Routing](/content/learn/transformer-feedforward/what-is-mixture-of-experts/moe-routing.png) + +## The Core Idea + +Instead of one big feedforward network: +- Have many smaller expert networks +- Route each token to top-K experts +- Combine expert outputs + +```yaml +Traditional FFN: + All tokens โ†’ Same FFN โ†’ Output + +MoE: + Token 1 โ†’ Expert 2 + Expert 5 โ†’ Output + Token 2 โ†’ Expert 1 + Expert 3 โ†’ Output + Token 3 โ†’ Expert 2 + Expert 7 โ†’ Output + +Each token uses different experts! +``` + +## Simple Example + +```python +import torch +import torch.nn as nn + +class SimpleMoE(nn.Module): + def __init__(self, d_model, num_experts=8): + super().__init__() + + # Multiple expert networks + self.experts = nn.ModuleList([ + nn.Sequential( + nn.Linear(d_model, d_model * 4), + nn.ReLU(), + nn.Linear(d_model * 4, d_model) + ) + for _ in range(num_experts) + ]) + + # Router (chooses which experts to use) + self.router = nn.Linear(d_model, num_experts) + + def forward(self, x): + # x: (batch, seq, d_model) + + # Router scores + router_logits = self.router(x) + router_probs = F.softmax(router_logits, dim=-1) + + # Get top-2 experts + top_k_probs, top_k_indices = torch.topk(router_probs, k=2, dim=-1) + + # Route to experts + output = torch.zeros_like(x) + for i in range(len(self.experts)): + # Find tokens routed to this expert + mask = (top_k_indices == i).any(dim=-1) + if mask.any(): + expert_output = self.experts[i](x[mask]) + output[mask] += expert_output * top_k_probs[mask, (top_k_indices[mask] == i).argmax(dim=-1)].unsqueeze(-1) + + return output +``` + +## Why MoE? + +```yaml +Benefits: + โœ“ Huge capacity (many parameters) + โœ“ Efficient (only use few experts per token) + โœ“ Specialization (experts learn different patterns) + +Trade-offs: + โœ— Complex training + โœ— Load balancing needed + โœ— More memory +``` + +## Used In + +- Switch Transformer +- DeepSeek-MoE +- Mixtral +- GPT-4 (rumored) + +## Key Takeaways + +โœ“ **Multiple experts:** Specialized sub-networks + +โœ“ **Sparse routing:** Each token uses few experts + +โœ“ **Scalable:** Add experts without much compute cost + +**Remember:** MoE = specialized experts for different patterns! ๐ŸŽ‰