Security Features

Overview

eml2pdf includes security features to protect users from potentially malicious content embedded in email messages. These features are implemented in eml2pdf/security.py and are enabled by default during PDF generation.

HTML Sanitization

The primary security mechanism is the sanitize_html() function in eml2pdf/security.py, which removes or neutralizes potentially dangerous HTML elements and attributes before PDF generation.

Protected Against

1. Script Execution and Active Content

The sanitizer removes tags that can execute code or load external active content:

<script> - JavaScript execution
<iframe> - Embedded frames that can load external content
<object> - Embedded objects (Flash, Java applets, etc.)
<embed> - Alternative embedding mechanism
<video> - Video elements with potential codec vulnerabilities
<audio> - Audio elements with potential codec vulnerabilities

2. Data Exfiltration via Forms

Forms can be used to send data to external servers:

<form> - Form elements that could submit data

3. Privacy Leaks via Remote Resources

Remote resources can be used to track email opens and gather information:

Remote Images

Images with src attributes starting with http:// or https:// are removed
This prevents tracking pixels and web beacons
Only embedded images (using data URIs or CID references) are allowed

CSS URL References

Inline style attributes containing url() functions are cleared
Prevents loading of remote stylesheets, fonts, or background images

External Links

Links with href attributes starting with http:// or https:// are replaced with #
Prevents clickable links that could lead to malicious sites
Note: This doesn't remove the link text, just neutralizes the destination

4. Cross-Site Scripting (XSS) Vectors

Event handlers and data attributes can be exploited for XSS attacks:

Event Handlers

All attributes starting with on are removed (e.g., onclick, onload, onerror)
Prevents JavaScript execution via event handlers

Data Attributes

All attributes starting with data- are removed
Prevents storage of potentially malicious data for later exploitation

5. Metadata and Resource Loading

Additional tags that can affect page behavior or load external resources:

<meta> - Meta tags that could redirect or refresh
<link> - Linked resources like stylesheets, icons, or prefetch hints

Usage

Default (Secure) Mode

By default, all HTML content is sanitized before PDF generation:

from eml2pdf import libeml2pdf

# Sanitization is enabled by default (unsafe=False)
libeml2pdf.process_eml(eml_path, output_dir)

Command-line:

python -m eml2pdf input/ output/

Unsafe Mode

The --unsafe flag disables HTML sanitization. This mode should only be used when:

You completely trust the source of the EML files
You understand the security risks involved

Warning again: Unsafe mode may expose sensitive user information through tracking pixels, external resources, and other privacy-invasive techniques. Consider running eml2pdf airgapped if you have reason to.

python -m eml2pdf input/ output/ --unsafe

Technical Details

Sanitization Process

HTML content is parsed using BeautifulSoup.
Risky tags are found and completely removed with .decompose().
Attributes are selectively filtered or modified.
The sanitized HTML is converted back to a string.

Integration with PDF Generation

The sanitization occurs in the generate_pdf() function in eml2pdf/libeml2pdf.py:

def generate_pdf(html_content: str, outfile_path: Path, infile: Path,
                 debug_html: bool = False, page: str = 'a4',
                 unsafe: bool = False):
    """Convert HTML to PDF."""
    if not unsafe:
        html_content = security.sanitize_html(html_content)
    # ... PDF generation continues ...

This ensures that all HTML content is sanitized immediately before PDF rendering, regardless of the source.

Limitations

Above is a listing of what we do and you have access to the code. Judge for yourself and use on your discretion. Suggestions are more than welcome!

Recommendations

Keep Default Security: Always use the default secure mode unless you have a specific, trusted use case
Verify Sources: Only process EML files from trusted sources
Sandbox Processing: Consider running eml2pdf in a sandboxed environment when processing untrusted emails. With no Internet access.
Update Dependencies: Keep BeautifulSoup, WeasyPrint, and other dependencies up to date

Related Files

Security implementation: eml2pdf/security.py
Security integration: eml2pdf/libeml2pdf.py
CLI unsafe flag: eml2pdf/eml2pdf.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security

SECURITY.md

Security Features

Overview

HTML Sanitization

Protected Against

1. Script Execution and Active Content

2. Data Exfiltration via Forms

3. Privacy Leaks via Remote Resources

Remote Images

CSS URL References

External Links

4. Cross-Site Scripting (XSS) Vectors

Event Handlers

Data Attributes

5. Metadata and Resource Loading

Usage

Default (Secure) Mode

Unsafe Mode

Technical Details

Sanitization Process

Integration with PDF Generation

Limitations

Recommendations

Related Files

There aren’t any published security advisories

Security: plenaerts/eml2pdf

Security

SECURITY.md

Security Features

Overview

HTML Sanitization

Protected Against

1. Script Execution and Active Content

2. Data Exfiltration via Forms

3. Privacy Leaks via Remote Resources

Remote Images

CSS URL References

External Links

4. Cross-Site Scripting (XSS) Vectors

Event Handlers

Data Attributes

5. Metadata and Resource Loading

Usage

Default (Secure) Mode

Unsafe Mode

Technical Details

Sanitization Process

Integration with PDF Generation

Limitations

Recommendations

Related Files

There aren’t any published security advisories