-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add RFC 3986 and WHATWG compliant URL parsing support #14461
base: master
Are you sure you want to change the base?
Conversation
@@ -3715,7 +3715,6 @@ function uniqid(string $prefix = "", bool $more_entropy = false): string {} | |||
|
|||
/** | |||
* @return int|string|array<string, int|string>|null|false | |||
* @compile-time-eval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed for benchmarking purposes
ext/url/php_url.c
Outdated
|
||
static void cleanup_parser(void) | ||
{ | ||
if (++URL_G(urls) % 500 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach is copy-pasted from lexbor/lexbor#206
823e4c6
to
e65deb7
Compare
0ccffa0
to
04d4e6e
Compare
13e8566
to
8e07430
Compare
d894dad
to
8e21e67
Compare
ext/uri/php_uri.c
Outdated
RETURN_THROWS(); | ||
} | ||
|
||
zend_result result = this_internal_uri->handler->normalize_uri(this_uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nielsdos I would like to ask your opinion about a problem I recently realized.
Context: I added a normalize_uri
handler that is able to normalize the URI. It is used in multiple places (e.g. normalize()
, toNormalizedString
, equalsTo()
methods). However, this would possibly cause some unexpected behavior when a built-in URI is extended in userland: if a child class overrides the normalize()
method, they would legitimately assume that doing so will affect the rest of the relevant methods (toNormalizedString()
, equalsTo()
), but no, this won't be the case, since the internal handler is reused in fact... which PHP programmers have no control over. I can see two solutions so far:
- make the built-in URI implementations
final
: there's no more problem with method overrides, but using composition instead of inheritance requires more work (implementing all the methods of theUriInterface
), you can't just substitute an internal URI class with your own implementation etc. - the
normalize()
method itself should be called instead of thenormalize_uri
internal handler: this solves the issue, but it results in quite some performance overhead
Do you have any preference/insights about this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kocsismate Interesting.
Here's an idea:
We can know upfront if the "normalize" method is overridden when we create the internal uri object. We can do this by checking whether the entry for "normalize" in the function table has type ZEND_USER_FUNCTION
instead of ZEND_INTERNAL_FUNCTION
. In that case, we know we should call the user handler, otherwise we should call the internal handler.
Create a helper function invoke_normalize_uri
(name is just illustrative) that you use to invoke either the internal normalize handler or the user function. Then you can use invoke_normalize_uri
everywhere without worrying. To make the invocation fast you can have a zend_function*
field in the internal object that is NULL
when the user did not override it, and points to the user function if the user did override it. Then you can use the Zend APIs to call that user function if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nielsdos Thanks for the brilliant idea! :) That's really impressive. I'll have to think about in which cases a similar issue can happen, but if it's not too frequent, then your solution will perfectly mitigate it. I wouldn't like though if i.e. all the property reader functions would have to use this workaround, but hopefully it won't be the case.
d0b9a3d
to
b9293a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. Did not look into any details, yet.
@@ -141,6 +141,7 @@ static void user_shutdown_function_dtor(zval *zv); | |||
static void user_tick_function_dtor(user_tick_function_entry *tick_function_entry); | |||
|
|||
static const zend_module_dep standard_deps[] = { /* {{{ */ | |||
ZEND_MOD_REQUIRED("uri") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If uri
is required, it should not require the --enable-uri
flag to build it. It should always be built, like ext/random
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good catch!
@@ -32,13 +32,13 @@ class TestSoapClient extends SoapClient { | |||
|
|||
$client = new TestSoapClient(__DIR__.'/bug38067.wsdl', | |||
array('encoding' => 'ISO-8859-1')); | |||
$str = 'test: �'; | |||
$str = 'test: �'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintended change? Did you editor treat this file as UTF-8 instead of ISO-8859-1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's surely an editor issue.
@@ -604,7 +604,7 @@ public function __getLastRequestHeaders(): ?string {} | |||
public function __getLastResponseHeaders(): ?string {} | |||
|
|||
/** @tentative-return-type */ | |||
public function __doRequest(string $request, string $location, string $action, int $version, bool $oneWay = false): ?string {} | |||
public function __doRequest(string $request, string $location, string $action, int $version, bool $oneWay = false, ?string $uriParserClass = null): ?string {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the number of tests that you changed, is this a breaking change? If so, it needs to be noted in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's a good idea
@@ -495,6 +495,12 @@ PRIMARY MAINTAINER: Andrei Zmievski <[email protected]> (2002 - 2002) | |||
MAINTENANCE: Maintained | |||
STATUS: Working | |||
------------------------------------------------------------------------------- | |||
EXTENSION: uri | |||
PRIMARY MAINTAINER Máté Kocsis <[email protected]> (2024 - 2024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2025
[Enable URI support])]) | ||
|
||
if test "$PHP_URI" != "no"; then | ||
PHP_LEXBOR_CFLAGS="-I$abs_srcdir/ext/dom/lexbor -DLEXBOR_STATIC -I$abs_srcdir/ext/uri/uriparser/include" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively makes ext/dom a dependency of ext/uri. Should Lexbor be moved somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a WIP behavior from the early days of the PR. Once I manage to make the RFC approved, I'll create a lexbor "extension" (similarly to mysqlnd).
2nd take after the failed experiment with #11315
RFC: https://wiki.php.net/rfc/url_parsing_api