-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
Description
Bug report
Bug description:
Issue
The add_handler()
method of urllib.request.OpenerDirector
matches too broadly, and generates spurious handler entries as a result.
Specifically, any method name that starts with zero or one underscore, followed by error
, open
, request
, or response
, and that isn't in the small list of exclusions, will be treated as a handler method.
Minimal reproducible example
>>> import urllib.request
>>> class BogusHandler(urllib.request.BaseHandler):
... def _open(self):
... pass
... def open(self):
... pass
... def error(self):
... pass
...
>>> opdir = urllib.request.OpenerDirector()
>>> opdir.add_handler(BogusHandler())
>>> opdir.handle_open
{'': [<__main__.BogusHandler...>], 'ope': [<__main__.BogusHandler...>]}
>>> opdir.handle_error
{'erro': {'error': [<__main__.BogusHandler...>]}}
Cause
Lines 419 to 421 in 401fff7
i = meth.find("_") | |
protocol = meth[:i] | |
condition = meth[i+1:] |
In the normal case, this code splits a method name like http_open
on the first underscore character, giving protocol == 'http'
and condition == 'open'
.
If there is a leading underscore character, this produces a false match with an empty protocol. For example, a method called _open
will give protocol == ''
and condition == 'open'
.
If there is no underscore character, then meth.find('_') == -1
, and weird results happen. In this case, condition
contains the entire method name, and protocol
contains all but the last character. For example, a method called open
will give protocol == 'ope'
and condition == 'open'
.
The subsequent lines branch on the content of condition
, with only error*
, open
, request
, and response
treated as handler methods.
Impact
Low. It is probably impossible that a request would have the empty string as its protocol, and unlikely that it would be ope
, reques
, respons
, or erro
. If one of these did happen, the spurious match wouldn't be called: OpenerDirector
would go looking for (say) ope_open
, probably not find it, and raise an AttributeError
. (It's for this reason that I'm positive this is a bug, and not an undocumented feature where open
, etc. are meant to be registered as handlers.)
Spurious entries are added to the internal members handle_open
, handle_error
, process_response
, process_request
, and (especially) the catch-all handler
, using bisect
to perform sort-preserving insertions. This is unnecessary busywork, but I have no data on any actual performance impact.
Solution
It suffices to add a check like so, immediately after meth.find("_")
:
if i < 1:
continue
This catches both a leading underscore (i == 0
) and no underscore (i == -1
). I've forked the repo and made such a change myself, but submitting a pull request is a big step up from filing an issue, and it expects an issue to be filed first in any case, so… here it is! 😄
Related issues
I can only see one relevant existing issue, #61924. This was a withdrawn proposal to clean up the add_handler()
code. It was more far-reaching than the solution above, but (on my inspection at least) it would've fixed this bug. At the time, @bitdancer commented, "I'll try to keep this patch in mind if we have occasion to fix anything in that code".
The code to extract a specific type of error behaves similarly:
Lines 424 to 425 in 401fff7
j = condition.find("_") + i + 1 | |
kind = meth[j+1:] |
However, I'm not positive that it's an error for it to give (for example) kind == 'error'
when the method name is *_error
(or _error
or error
), so I haven't explored further.
CPython versions tested on:
3.12
Operating systems tested on:
No response