Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file_digest.file_digest signature is overly strict #12414

Closed
ncoghlan opened this issue Jul 23, 2024 · 4 comments · Fixed by #12686
Closed

file_digest.file_digest signature is overly strict #12414

ncoghlan opened this issue Jul 23, 2024 · 4 comments · Fixed by #12686
Labels
stubs: false positive Type checkers report false errors topic: io I/O related issues

Comments

@ncoghlan
Copy link
Contributor

Using hashlib.file_digest with a file object obtained via pathlib.Path.open failed to typecheck with a protocol compatibility error:

src/create_bundles.py:223: error: Argument 1 to "file_digest" has incompatible type "FileIO"; expected "_BytesIOLike | _FileDigestFileObj"  [arg-type]
src/create_bundles.py:223: note: Following member(s) of "FileIO" have conflicts:
src/create_bundles.py:223: note:     Expected:
src/create_bundles.py:223: note:         def readinto(self, bytearray, /) -> int
src/create_bundles.py:223: note:     Got:
src/create_bundles.py:223: note:         def readinto(self, Buffer, /) -> int | None

I'm reasonably sure this is a typeshed issue rather than a mypy issue.

My first thought was that this looks like an internal inconsistency in typeshed, as _FileDigestFileObj is declared specifically with bytearray rather than the more general WriteableBuffer (hence filing the issue here).

However, my second thought was that this signature declaration is presumably as it is because file_digest will specifically pass a bytearray instance to readinto, and bytearray is compatible with Buffer, so mypy shouldn't be complaining about that. That means the conflict is presumably on the return type rather than on parameter types.

The declared type signature nominally indicates that file_digest can't cope with readinto returning None, and mypy is correctly flagging that as inconsistent with the way typeshed declares the file and socket IO types (allowing them to return None here).

Searching the typeshed code suggests this particular inconsistency exists across several different IO consumer type declarations: https://github.com/search?q=repo%3Apython%2Ftypeshed%20readinto&type=code

@srittau srittau added stubs: false positive Type checkers report false errors topic: io I/O related issues labels Jul 23, 2024
@srittau
Copy link
Collaborator

srittau commented Jul 23, 2024

I/O types are a mess, which is why we're moving more into tight protocols for arguments as used in hashlib.

I think your analysis is correct. In fact, file_digest can't handle None return values:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/hashlib.py#L232-L236

On the other hand it seems that FileIO.readinto can't return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L657-L684

At the moment, FileIO inherits readinto from RawIOBase. I think we just need to override it in FileIO with a more precise return type.

srittau added a commit to srittau/typeshed that referenced this issue Jul 23, 2024
@AlexWaygood
Copy link
Member

On the other hand it seems that FileIO.readinto can't return None:

Doesn't it return None in this code path here? https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L678

@srittau
Copy link
Collaborator

srittau commented Jul 23, 2024

I blame my the heat and my tiredness for overlooking this quite obvious code path ... That said, I'd be interested in which situation FileIO works with non-blocking I/O. One case I can see is here:

https://github.com/python/cpython/blob/2c1b1e7a07eba0138b9858c6f2bea3cae9af0808/Python/fileutils.c#L1887

Another is probably if an opened file descriptor has the non-blocking flag set.

I see two possible solutions: Make FileIO.readinto() return int | MaybeNone as the situations where this happens are a bit esoteric. That said, I actually think that this uncovered a real bug in file_digest() as that doesn't handle the None case correctly when it should.

@srittau
Copy link
Collaborator

srittau commented Jul 23, 2024

In fact, file_digest is explicitly documented as taking a SocketIO object. And that can very much return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/socket.py#L713

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stubs: false positive Type checkers report false errors topic: io I/O related issues
Projects
None yet
3 participants