The modules `typing.io` and `typing.re` are missing in 3.10: they are listed only for 3.8 and nowhere else, even though they exist at least in 3.10, albeit created in an awkward way.
Activity
woodruffw commented on Apr 27, 2024
Thanks for the report @gmesch! Could you say a bit more about the "awkward way"? This may be something we are able to patch in the module collection script.
gmesch commented on Apr 27, 2024

I only read the implementation of 3.10, in file `python-3.10.14/lib/python3.10/typing.py` in the python install tree. There, the submodules `re` and `io` are created as classes inside the `typing.py` file, but are kept off the `__all__` collection of the `typing` module. This is referred to as "pseudo submodules" in the comments:

> `# The pseudo-submodules 're' and 'io' are part of the public`
> `# namespace, but excluded from __all__ because they might stomp on`
> `# legitimate imports of those modules.`

Here is the code that creates the `io` module, and inserts it into `sys.modules`:
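The actual snippet from `typing.py` is omitted above, but the trick can be reproduced in miniature. The following is a hedged sketch (not CPython's code; `fakepkg` is an invented name): it registers a plain class in `sys.modules` under a dotted name, which is roughly what 3.10's `typing.py` does for `io` and `re`.

```python
import inspect
import sys
import types

# A real parent module to hang the pseudo-submodule off of.
fakepkg = types.ModuleType("fakepkg")
sys.modules["fakepkg"] = fakepkg

class io:
    """Wrapper namespace posing as a submodule, like typing.io."""
    __all__ = ["IO"]

# Register the class under a dotted name; `import fakepkg.io` now
# resolves via sys.modules even though no such file exists on disk.
io.__name__ = "fakepkg.io"
sys.modules["fakepkg.io"] = io
fakepkg.io = io

import fakepkg.io                    # succeeds via the sys.modules entry
print(inspect.ismodule(fakepkg.io))  # False -- it's a class, not a module
```

This is why a collection script that filters on `inspect.ismodule` never sees `typing.io`: the import machinery happily serves the name out of `sys.modules`, but the object behind it fails the module type check.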
woodruffw commented on Apr 28, 2024
Thanks for digging into it!
Hmm, this is indeed pretty awkward -- our generation tooling uses `inspect.ismodule` to walk through all stdlib packages, and in this case `typing.io` is really a weird class namespace thing, not a module.

I'm not 100% sure what to do about that -- `io` is "behaving" like a module here, but it empirically is not one. This may just be something we have to document as an explicit limitation.

miketheman commented on Apr 28, 2024
I also poked at this a bit, read some docs, and the best idea I could come up with was to snapshot `sys.modules` before and after an import, and collect any additions to `sys.modules` after, but didn't actually write any code to do it -- so no clue if that would work either. 😀

Missing `typing.io` was originally noted in #7 (comment) and the documentation of policy is still outstanding, per #80.

woodruffw commented on Apr 28, 2024
Yeah, I should really write that policy 😅. I've got some time today, so I'll send a PR for it in a bit.
> I also poked at this a bit, read some docs, and the best idea I could come up with was to snapshot `sys.modules` before and after an import, and collect any additions to `sys.modules` after, but didn't actually write any code to do it - so no clue if that would work either. 😀

I think this would work, although it still wouldn't pass the `inspect.ismodule` test -- we'd need to loosen the check to "anything that might appear in `sys.modules` regardless of type", and I don't know enough about Python's module system to know whether this is sensible to do 🙂

woodruffw commented on Apr 29, 2024
@gmesch I'm thinking about ways to address this. One possibility is us adding a new API, something like `in_stdlib_namespace`, that would essentially boil down to a string prefix check on the input against the list of known stdlib modules. In other words:

This would make things like `typing.io` detectable, but with a number of caveats (no guarantee that it's actually a module, no guarantee that it actually exists, etc.). Would this kind of new API satisfy your use cases, or is it too generic?
gmesch commented on Apr 29, 2024

We use this in a tool that computes the dependencies of python programs on pip packages from the `import` statements in the python source files. Since every python executable depends on the python interpreter and with it the standard library, an `import` of anything in the standard library does not imply a dependency on a pip package. (And this in turn we use to keep the `deps` declarations in bazel `BUILD` files up to date.)

So if just matching the prefix would be correct, we don't need a new API for that: we can just check all prefixes of an imported module name, in addition to the full module name, using the current API.
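That all-prefixes check could be sketched as follows. The `in_stdlib` predicate here is a hypothetical stand-in for whatever lookup "the current API" provides, not the library's real interface:

```python
def any_prefix_in_stdlib(name, in_stdlib):
    """Check the full dotted module name and every prefix of it
    against a caller-supplied stdlib lookup predicate."""
    parts = name.split(".")
    return any(in_stdlib(".".join(parts[:i])) for i in range(1, len(parts) + 1))

# Toy lookup that only knows about `typing`:
lookup = {"typing"}.__contains__
print(any_prefix_in_stdlib("typing.io", lookup))     # True (prefix `typing` matches)
print(any_prefix_in_stdlib("requests.api", lookup))  # False
```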
However, I think that would be wrong, because I think that a pip package can supply its modules in a namespace package that shares a path prefix with modules in the standard library. I.e. I think it would be legitimate if e.g. a pip package `typing-foo` supplies code in the namespace `typing.foo`. If that's indeed true, then the prefix check of a module to determine inclusion in the standard library would be wrong.

gmesch commented on Apr 29, 2024
FWIW, the approach to import each file found in the standard library and capture the delta in the `sys.modules` map before and after import seems promising to me, and is closest to the semantics I am looking for in the use case described above.

I.e. I just want to know whether an `import` statement can be satisfied against only the python interpreter install tree as it comes from the python distribution, or whether additional pip packages are necessary for such `import` to work.
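The snapshot-and-diff idea discussed in this thread could be sketched like this (a hypothetical helper, untested against the real collection script):

```python
import importlib
import sys

def modules_registered_by(name):
    """Import `name` and return every key added to sys.modules as a
    side effect -- including pseudo-submodules like typing.io, which
    appear in sys.modules without being real module objects."""
    before = set(sys.modules)
    importlib.import_module(name)
    return set(sys.modules) - before
```

One caveat: the delta only contains names that were not already imported, so on a fresh 3.10 interpreter `modules_registered_by("typing")` would include `typing.io` and `typing.re`, but in a long-running process the delta may be empty. A collection script would likely need to run each probe in a clean subprocess.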
gmesch commented on Apr 29, 2024
Btw. I also took note of the hint in the documentation,

But I could not quite decide which of the two would be right, and I already detected this discrepancy:

I double-checked that the file `python-3.10.14/lib/python3.10/urllib/parse.py` does indeed exist in my python interpreter install tree. So it did not seem to be straightforward to just use `sys`.

woodruffw commented on Apr 29, 2024
Thanks for the responses @gmesch!
Yeah, this is unfortunately true:
This is arguably something that packages should never do, but Python is too dynamic to prevent it. Notably, this also means that any amount of `import` analysis is always imperfect, since a package can do this:

In other words, you can't (perfectly) infer that a package isn't loaded just because it wasn't imported by a non-stdlib `import` statement. Ideally Python would forbid this and it should never appear in real code anyway, but I have no evidence to substantiate an assertion that real code doesn't do this 😅

Yeah, `stdlib_module_names` in particular is restricted to top-level names:

> For packages, only the main package is listed: sub-packages and sub-modules are not listed. For example, the email package is listed, but the email.mime sub-package and the email.message sub-module are not listed.

So unfortunately you can't use it for specific sub-packages/modules 🙁 -- it's really just meant for a top-level check.
TL;DR: I'm not aware of a sound way to guarantee that `import foo` s.t. `foo in stdlib` ensures that only CPython source code is required. In practice, however, I think the namespace inclusion check is correct > 99.999% of the time. But this may not be sufficient for your use case 🙂
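As an illustration of the dynamic-import caveat above ("a package can do this" -- the original snippet is omitted): a package can load a third-party dependency in a way that defeats static scanning of `import` statements. The package name here is just an example.

```python
import importlib

# No statically visible `import requests` anywhere in the source, so
# a scanner that only reads import statements would conclude that
# requests is not a dependency -- yet at runtime it may be loaded.
name = "".join(["req", "uests"])
try:
    requests = importlib.import_module(name)
except ImportError:
    requests = None  # not installed; the point is the hidden lookup
```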