Prevent URL in `Link` header from including invalid characters #1802

AhmarZaidi · 2025-01-14T07:30:47Z

Summary

This PR addresses an issue where non-ASCII characters in URL filenames caused HTTP headers to break reverse proxies and violate standards. The solution encodes only the filename part of URLs, ensuring compliance with ISO-8859-1 character requirements. This change maintains URL integrity while preventing potential issues with reverse proxies.

Example URL : https://testsite.com/wp-content/uploads/2025/01/חנות-scaled.avif
Corrected URL: https://testsite.com/wp-content/uploads/2025/01/%D7%97%D7%A0%D7%95%D7%AA-scaled.avif

Fixes #1775

Relevant technical choices

Implemented a solution to address the issue where non-ASCII characters in URLs, such as Hebrew characters in filenames, were causing HTTP headers to break reverse proxies and violate HTTP standards.
Updated the URL encoding logic to use a regular expression that matches any character not allowed by RFC 3986, ensuring that all non-ASCII and disallowed ASCII characters are percent-encoded.
Implemented preg_replace_callback() with a static closure to dynamically encode characters using rawurlencode(). This ensures that only characters outside the allowed set are encoded.
Add test cases to include international domain names, non-ASCII paths, and URLs with special characters.

codecov · 2025-01-14T07:38:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.97%. Comparing base (e793289) to head (c3c2512).
Report is 12 commits behind head on trunk.

Additional details and impacted files

@@            Coverage Diff             @@
##            trunk    #1802      +/-   ##
==========================================
+ Coverage   65.92%   65.97%   +0.04%     
==========================================
  Files          88       88              
  Lines        6885     6895      +10     
==========================================
+ Hits         4539     4549      +10     
  Misses       2346     2346

Flag	Coverage Δ
multisite	`65.97% <100.00%> (+0.04%)`	⬆️
single	`38.17% <0.00%> (-0.06%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

westonruter

Thanks for the PR! I left a couple alternative suggestions.

Please add some test coverage for the lines not covered be tests.

plugins/optimization-detective/class-od-link-collection.php

westonruter · 2025-01-14T17:46:08Z

The solution encodes only the filename part of URLs

What about when an internationalized domain name is used? Couldn't this also cause problems with encoding?

westonruter · 2025-01-21T04:32:15Z

Additionally, on multisite subdirectory installs, in theory the path before wp-content could also include non-ASCII chars:

https://testsite.com/חנות/wp-content/uploads/2025/01/example-scaled.avif

…in-link-header

westonruter · 2025-01-29T21:50:56Z

@AhmarZaidi Hey, are you intending to pick this up again?

AhmarZaidi · 2025-01-30T06:33:02Z

@westonruter Yes, I'll be working on the changes right away.

AhmarZaidi · 2025-01-30T08:11:22Z

Instead of parsing the URL and then re-constructing it, what if you just check if the href has any non-ASCII chars and then encode the entire URL

@westonruter I've implemented the changes: If the path contains any non-ascii characters then we encode the whole URL.

However we can use preg_replace_callback to encode only the non-ascii characters in the URL. Let me know if that approach will be better.

westonruter · 2025-01-30T18:15:25Z

What you did looks good. Could you add some test cases for international domain names and non-ASCII paths to ensure the expected output?

AhmarZaidi · 2025-01-31T13:51:24Z

@westonruter The old solution was encoding the :// part also, so I've update the code slightly.
Current solution: Encode complete url after :// part.

Potential Issue: This solution converts the slashes (/) to %2F so the file path would be incorrect.
For example https://xn--fsq.com/תמונה.jpg will be encoded to https://xn--fsq.com%2F%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg
Note: I've written the tests according to the above output.

If we use something like:

$decoded_url = urldecode( $link['href'] );
$encoded_url = preg_replace_callback(
        '/[^\x00-\x7F]/',
        fn( $matches ) => rawurlencode( $matches[0] ),
        $decoded_url
);

Then https://xn--fsq.com/תמונה.jpg will be encoded to https://xn--fsq.com/%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg

Please feel free to let me know if we should do any further changes to the approach.

westonruter · 2025-01-31T16:28:41Z

If we use something like:

That looks good!

AhmarZaidi · 2025-01-31T19:52:35Z

Implemented the changes.

westonruter · 2025-01-31T19:56:29Z

So this is no longer a draft and is ready for review, correct?

github-actions · 2025-01-31T20:03:29Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Unlinked Accounts

The following contributors have not linked their GitHub and WordPress.org accounts: @amitay-elementor.

Contributors, please read how to link your accounts to ensure your work is properly credited in WordPress releases.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Unlinked contributors: amitay-elementor.

Co-authored-by: AhmarZaidi <[email protected]>
Co-authored-by: westonruter <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

plugins/optimization-detective/class-od-link-collection.php

plugins/optimization-detective/tests/test-class-od-link-collection.php

westonruter · 2025-01-31T20:28:00Z

Relevant technical choices

Could you update the description based on the latest revisions?

plugins/optimization-detective/class-od-link-collection.php

…in-link-header

westonruter

Thank you!

@felixarntz Anything we missed here?

…in-link-header

felixarntz

Thank you for the PR @AhmarZaidi, LGTM!

Optimize URL encoding logic in get_response_header

700ec77

AhmarZaidi mentioned this pull request Jan 14, 2025

Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards. #1775

Closed

AhmarZaidi changed the title ~~Fix: Optimize URL encoding logic in get_response_header~~ Fix: Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards Jan 14, 2025

westonruter reviewed Jan 14, 2025

View reviewed changes

plugins/optimization-detective/class-od-link-collection.php Outdated Show resolved Hide resolved

plugins/optimization-detective/class-od-link-collection.php Outdated Show resolved Hide resolved

westonruter added this to the optimization-detective n.e.x.t milestone Jan 14, 2025

westonruter added [Type] Bug An existing feature is broken [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Jan 14, 2025

westonruter modified the milestones: optimization-detective 1.0.0-beta1, optimization-detective n.e.x.t Jan 23, 2025

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-…

cd4116d

…in-link-header

Upadate encoding logic to encode whole path

d1c5afd

Add tests and update url encode scheme separately

505c5bb

Encode only non ascii characters

50591ef

AhmarZaidi marked this pull request as ready for review January 31, 2025 20:03

AhmarZaidi requested a review from felixarntz as a code owner January 31, 2025 20:03

westonruter reviewed Jan 31, 2025

View reviewed changes

plugins/optimization-detective/class-od-link-collection.php Show resolved Hide resolved

westonruter requested changes Jan 31, 2025

View reviewed changes

AhmarZaidi and others added 3 commits February 3, 2025 12:30

Update non-ascii matching logic & tests

8389a40

Add test case for a responsive preload link without an href

ace40cb

Add failing test case for bare percent appearing in URL

b1ed4c1

westonruter reviewed Feb 3, 2025

View reviewed changes

plugins/optimization-detective/class-od-link-collection.php Outdated Show resolved Hide resolved

westonruter changed the title ~~Fix: Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards~~ Prevent URL in Link header from including invalid characters Feb 4, 2025

AhmarZaidi and others added 2 commits February 4, 2025 14:15

Remove percentage from regex

7e8867f

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-…

2d14c6e

…in-link-header

westonruter approved these changes Feb 4, 2025

View reviewed changes

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-…

c3c2512

…in-link-header

felixarntz approved these changes Feb 5, 2025

View reviewed changes

westonruter merged commit 40e8c31 into WordPress:trunk Feb 5, 2025
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent URL in `Link` header from including invalid characters #1802

Prevent URL in `Link` header from including invalid characters #1802

AhmarZaidi commented Jan 14, 2025 •

edited

Loading

codecov bot commented Jan 14, 2025 •

edited

Loading

westonruter left a comment

westonruter commented Jan 14, 2025

westonruter commented Jan 21, 2025

westonruter commented Jan 29, 2025

AhmarZaidi commented Jan 30, 2025

AhmarZaidi commented Jan 30, 2025 •

edited

Loading

westonruter commented Jan 30, 2025

AhmarZaidi commented Jan 31, 2025

westonruter commented Jan 31, 2025

AhmarZaidi commented Jan 31, 2025

westonruter commented Jan 31, 2025

github-actions bot commented Jan 31, 2025

westonruter commented Jan 31, 2025

westonruter left a comment

felixarntz left a comment

Prevent URL in Link header from including invalid characters #1802

Prevent URL in Link header from including invalid characters #1802

Conversation

AhmarZaidi commented Jan 14, 2025 • edited Loading

Summary

Relevant technical choices

codecov bot commented Jan 14, 2025 • edited Loading

Codecov Report

westonruter left a comment

Choose a reason for hiding this comment

westonruter commented Jan 14, 2025

westonruter commented Jan 21, 2025

westonruter commented Jan 29, 2025

AhmarZaidi commented Jan 30, 2025

AhmarZaidi commented Jan 30, 2025 • edited Loading

westonruter commented Jan 30, 2025

AhmarZaidi commented Jan 31, 2025

westonruter commented Jan 31, 2025

AhmarZaidi commented Jan 31, 2025

westonruter commented Jan 31, 2025

github-actions bot commented Jan 31, 2025

Unlinked Accounts

westonruter commented Jan 31, 2025

westonruter left a comment

Choose a reason for hiding this comment

felixarntz left a comment

Choose a reason for hiding this comment

Prevent URL in `Link` header from including invalid characters #1802

Prevent URL in `Link` header from including invalid characters #1802

AhmarZaidi commented Jan 14, 2025 •

edited

Loading

codecov bot commented Jan 14, 2025 •

edited

Loading

AhmarZaidi commented Jan 30, 2025 •

edited

Loading