1 change: 1 addition & 0 deletions .eleventyignore
@@ -1,3 +1,4 @@
./README.md
./_image_sources
./_drafts
./.github
15 changes: 15 additions & 0 deletions .github/linkchecker/Dockerfile
@@ -0,0 +1,15 @@
FROM ubuntu:22.04

Member:
Can we remove the copy of this linkchecker folder from the _drafts folder?

Contributor Author:
Done! Removed the outdated linkchecker copy from _drafts/linkchecker/ since the comprehensive implementation is now in .github/linkchecker/. Commit c0a7eed

RUN apt-get -y update && \
apt-get install -y ca-certificates linkchecker python3-pip curl --no-install-recommends \
&& apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN pip3 install --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org jinja2

WORKDIR /linkchecker
COPY filter_csv.py output_template.html linkchecker.conf run_linkcheck.sh ./

# Make script executable
RUN chmod +x run_linkcheck.sh

# Default command to run linkchecker
CMD ["linkchecker", "--config=linkchecker.conf"]
138 changes: 138 additions & 0 deletions .github/linkchecker/README.md
@@ -0,0 +1,138 @@
# OrionRobots Link Checker

This directory contains the link-checking functionality for the OrionRobots website. It is designed to detect broken links, with a particular focus on broken images and broken internal links.

## 🎯 Features

- **Image-focused checking**: Prioritizes broken image links that affect visual content
- **Categorized results**: Separates internal, external, image, and email links
- **HTML reports**: Generates detailed, styled reports with priority indicators
- **Docker integration**: Runs in isolated containers for consistency
- **CI/CD integration**: Automated nightly checks and PR-based checks

## πŸš€ Usage

### Local Usage

Run the link checker locally using the provided script:

```bash
./.github/scripts/local_linkcheck.sh
```

This will do the following (a command-level sketch follows the list):
1. Build the site
2. Start a local HTTP server
3. Run the link checker
4. Generate a report in `./linkchecker_reports/`
5. Clean up containers
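
For orientation, here is a minimal sketch of that flow using the docker compose services described in the next section. It is an illustration only; the real logic lives in `./.github/scripts/local_linkcheck.sh` and may differ in detail.

```bash
#!/usr/bin/env bash
# Sketch of the local link-check flow; the authoritative script is
# .github/scripts/local_linkcheck.sh.
set -euo pipefail

# Reports are written to a mounted volume, so create the directory first
mkdir -p ./linkchecker_reports

# Steps 1-2: build the site and start the local HTTP server (manual profile)
docker compose --profile manual up -d http_serve

# Step 3: run the link checker against the served site
docker compose --profile manual up broken_links

# Step 4: the report lands in ./linkchecker_reports/
ls ./linkchecker_reports/

# Step 5: clean up containers
docker compose down
```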

### Manual Docker Compose

You can also run individual services manually:

```bash
# Build and serve the site
docker compose --profile manual up -d http_serve

# Run link checker
docker compose --profile manual up broken_links

# View logs
docker compose logs broken_links

# Cleanup
docker compose down
```

### GitHub Actions Integration

#### Nightly Checks
- Runs every night at 2 AM UTC
- Checks the production site (https://orionrobots.co.uk)
- Creates warnings for broken links
- Uploads detailed reports as artifacts
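
To approximate the nightly production check locally, you can point the containerised checker at the live site. This is a sketch under assumptions: the authoritative steps live in the GitHub Actions workflow, and the image tag `orion-linkchecker` is made up for the example.

```bash
# Build the checker image from this directory (tag name is illustrative)
docker build -t orion-linkchecker .github/linkchecker

# Run it against production, mounting a host directory because the
# configuration writes its CSV report to /linkchecker_reports.
# Passing the URL on the command line overrides the image's default CMD.
mkdir -p ./linkchecker_reports
docker run --rm \
  -v "$(pwd)/linkchecker_reports:/linkchecker_reports" \
  orion-linkchecker \
  linkchecker --config=linkchecker.conf https://orionrobots.co.uk
```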

#### PR-based Checks
- Triggered when a PR is labeled with `link-check`
- Deploys a staging version of the PR
- Runs link checker on the staging deployment
- Comments results on the PR
- Automatically cleans up staging deployment

To run link checking on a PR:
1. Add the `link-check` label to the PR
2. The workflow will automatically deploy staging and run checks
3. Results will be commented on the PR
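
If you use the GitHub CLI, the label can be applied from the command line (assumes `gh` is installed and authenticated; replace `<pr-number>` with the PR number):

```bash
# Applying the label triggers the link-check workflow on the PR
gh pr edit <pr-number> --add-label link-check
```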

## πŸ“ Files

- `Dockerfile`: Container definition for the link checker
- `linkchecker.conf`: Configuration for linkchecker tool
- `filter_csv.py`: Python script to process and categorize results
- `output_template.html`: HTML template for generating reports
- `run_linkcheck.sh`: Main script that orchestrates the checking process

## πŸ“Š Report Categories

The generated reports categorize broken links by priority:

1. **πŸ–ΌοΈ Images** (High Priority): Broken image links that affect visual content
2. **🏠 Internal Links** (High Priority): Broken internal links under our control
3. **🌐 External Links** (Medium Priority): Broken external links (may be temporary)
4. **πŸ“§ Email Links** (Low Priority): Broken email links (complex to validate)

## βš™οΈ Configuration

The link checker configuration in `linkchecker.conf` includes:

- **Recursion**: Limited depth (`recursionlevel=2`) to keep runs fast
- **Output**: CSV format for easy processing
- **Filtering**: Ignores common social media sites that block crawlers
- **Anchor checking**: Validates internal page anchors
- **Warning handling**: Configurable warning levels

## πŸ”§ Customization

To modify the link checking behavior:

1. **Change checking depth**: Edit `recursionlevel` in `linkchecker.conf`
2. **Add ignored URLs**: Add patterns to the `ignore` section in `linkchecker.conf`
3. **Modify report styling**: Edit `output_template.html`
4. **Change categorization**: Modify `filter_csv.py`
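
After changing any of these files, rebuild the checker container so the changes are picked up. This assumes the `broken_links` service builds its image from this directory:

```bash
# Rebuild the checker image and rerun it against the locally served site
docker compose --profile manual build broken_links
docker compose --profile manual up broken_links
```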

## 🐳 Docker Integration

The link checker integrates with the existing Docker Compose setup:

- Uses the `http_serve` service as the target
- Depends on health checks to ensure site availability
- Outputs reports to a mounted volume for persistence
- Runs in the `manual` profile to avoid automatic execution

## πŸ“‹ Requirements

- Docker and Docker Compose
- Python 3 with Jinja2 (handled in container)
- linkchecker tool (handled in container)
- curl for health checks (handled in container)

## πŸ” Troubleshooting

### Site not available
If you get "Site not available" errors:
1. Ensure the site builds successfully first
2. Check that the HTTP server is running
3. Verify port 8082 is not in use
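
A few quick checks usually narrow this down (the port and service names follow the compose setup described above):

```bash
# Are the containers up, and is the HTTP server healthy?
docker compose ps

# Does the site respond locally?
curl -I http://localhost:8082/

# Is something else already bound to port 8082?
lsof -i :8082
```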

### Permission errors
If you get permission errors with volumes:
1. Check Docker permissions
2. Ensure the linkchecker_reports directory exists
3. Try running with sudo (not recommended for production)
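
In practice, creating the reports directory as your own user before the first run avoids most volume-permission surprises (a sketch; the path matches the local script's default):

```bash
# Create the reports directory before any container tries to write to it
mkdir -p ./linkchecker_reports

# If an earlier root-owned run left files behind, reclaim ownership
sudo chown -R "$(id -u):$(id -g)" ./linkchecker_reports
```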

### Missing dependencies
If linkchecker fails to run:
1. Check the Dockerfile builds successfully
2. Verify Python dependencies are installed
3. Check linkchecker configuration syntax
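
These can be checked directly against the image (the `orion-linkchecker` tag is illustrative, as above):

```bash
# 1. Confirm the image builds cleanly
docker build -t orion-linkchecker .github/linkchecker

# 2. Verify the Python dependency and the linkchecker tool inside the image
docker run --rm orion-linkchecker python3 -c "import jinja2; print(jinja2.__version__)"
docker run --rm orion-linkchecker linkchecker --version

# 3. Sanity-check the configuration against a single URL
#    (mount a reports directory because the config writes its CSV there)
mkdir -p ./linkchecker_reports
docker run --rm -v "$(pwd)/linkchecker_reports:/linkchecker_reports" \
  orion-linkchecker linkchecker --config=linkchecker.conf https://example.com
```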
80 changes: 80 additions & 0 deletions .github/linkchecker/filter_csv.py
@@ -0,0 +1,80 @@
# -*- coding: utf-8 -*-
import csv
import sys
import os
from urllib.parse import urlparse

from jinja2 import Environment, FileSystemLoader, select_autoescape


def is_image_url(url):
"""Check if URL points to an image file"""
image_extensions = {'.jpg', '.jpeg', '.png', '.gif', '.svg', '.webp', '.ico', '.bmp'}
parsed = urlparse(url)
path = parsed.path.lower()
return any(path.endswith(ext) for ext in image_extensions)


def categorize_link(item):
"""Categorize link by type"""
url = item['url']
if is_image_url(url):
return 'image'
elif url.startswith('mailto:'):
return 'email'
elif url.startswith('http'):
return 'external'
else:
return 'internal'


def output_file(items):
# Get the directory where this script is located
script_dir = os.path.dirname(os.path.abspath(__file__))
env = Environment(
loader=FileSystemLoader(script_dir),
autoescape=select_autoescape(['html', 'xml'])
)
template = env.get_template('output_template.html')

# Categorize items
categorized = {}
for item in items:
category = categorize_link(item)
if category not in categorized:
categorized[category] = []
categorized[category].append(item)

print(template.render(
categorized=categorized,
total_count=len(items),
image_count=len(categorized.get('image', [])),
internal_count=len(categorized.get('internal', [])),
external_count=len(categorized.get('external', [])),
email_count=len(categorized.get('email', []))
))


def main():
filename = sys.argv[1] if len(sys.argv) > 1 else '/linkchecker/output.csv'

if not os.path.exists(filename):
print(f"Error: CSV file {filename} not found")
sys.exit(1)

with open(filename, encoding='utf-8') as csv_file:
data = csv_file.readlines()
reader = csv.DictReader((row for row in data if not row.startswith('#')), delimiter=';')

# Filter out successful links and redirects
non_200 = (item for item in reader if 'OK' not in item['result'])
non_redirect = (item for item in non_200 if '307' not in item['result'] and '301' not in item['result'] and '302' not in item['result'])
non_ssl = (item for item in non_redirect if 'ssl' not in item['result'].lower())

total_list = sorted(list(non_ssl), key=lambda item: (categorize_link(item), item['parentname']))

output_file(total_list)


if __name__ == '__main__':
main()
44 changes: 44 additions & 0 deletions .github/linkchecker/linkchecker.conf
@@ -0,0 +1,44 @@
[checking]
# Check links with limited recursion for faster execution
recursionlevel=2
# Focus on internal links
allowedschemes=http,https,file
# Also check external links (needed to catch externally hosted images)
checkextern=1
# Limit the request rate to keep runs fast and polite
maxrequestspersecond=10
# Timeout for each request
timeout=10
# Hard time limit - 2 minutes maximum for PR checks
maxrunseconds=120
threads=4

[output]
# Output in CSV format for easier processing
log=csv
filename=/linkchecker_reports/output.csv
# Also output to console
verbose=1
warnings=1

[filtering]
# Ignore certain file types that might cause issues
ignorewarnings=url-whitespace,url-content-size-zero,url-content-too-large
# Skip external social media links that often block crawlers
ignore=
url:facebook\.com
url:twitter\.com
url:instagram\.com
url:linkedin\.com
url:youtube\.com
url:tiktok\.com

[AnchorCheck]
# Check for broken internal anchors
add=1

[authentication]
# No authentication required for most checks

[plugins]
# No additional plugins needed for basic checking