-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix media type inference for URLs with query parameters #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix media type inference for URLs with query parameters #3501
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we using the mimetypes stdlib module? mimetypes.guess_type() already parses URLs and the current implementation doesn't take into account case insensitivity, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Viicos Interestingly we already use that in DocumentUrl._infer_media_type, after checking a bunch of types ourselves :/
@fedexman Can you see if we can use mimetypes.guess_type() for all of these?
The method can be changed to just return str rather than XMediaType, as I don't think that type is used on any public fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Viicos Interestingly we already use that in DocumentUrl._infer_media_type, after checking a bunch of types ourselves :/
@fedexman Can you see if we can use mimetypes.guess_type() for all of these?
The method can be changed to just return str rather than XMediaType, as I don't think that type is used on any public fields.
|
This PR is stale, and will be closed in 3 days if no reply is received. |
|
I'll update this week 🙏🙏 |
|
I refactored using mimetypes. Some types are defined in the standard library, if not, they use some os files that is dependent of the machine. For all the types not in the standard library I added them manually to have reliable behavior. |
| # Register manually MIME types that are not in the standard library | ||
| # Document types | ||
| mimetypes.add_type('text/markdown', '.mdx') | ||
| mimetypes.add_type('text/x-asciidoc', '.asciidoc') | ||
|
|
||
| # Video types | ||
| mimetypes.add_type('video/3gpp', '.three_gp') | ||
| mimetypes.add_type('video/x-flv', '.flv') | ||
| mimetypes.add_type('video/x-matroska', '.mkv') | ||
| mimetypes.add_type('video/x-ms-wmv', '.wmv') | ||
|
|
||
| # Audio types | ||
| mimetypes.add_type('audio/flac', '.flac') | ||
| mimetypes.add_type('audio/mpeg', '.mp3') | ||
| mimetypes.add_type('audio/ogg', '.oga') | ||
| # override stdlib mimetypes that use x- prefix with standard types | ||
| mimetypes.add_type('audio/aac', '.aac') | ||
| mimetypes.add_type('audio/aiff', '.aiff') | ||
| mimetypes.add_type('audio/wav', '.wav') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will affect the global mimetypes db for the current interpreter. Let's instantiate an explicit MimeTypes object instead, and attach the additional types directly to it.
You can then use your instance's guess_type() directly instead of the module-level guess_type().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be created at module level, or in the class init 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the module level, it's fine to share the added mime types between each class and the instantiation can impact performance.
Fix media type inference for URLs with query parameters
When using presigned URLs (eg AWS S3) with
ImageUrl,AudioUrl, orVideoUrl, the media type inference failsto the agent we get this error,
the reason is that the
_infer_media_typefunction only check the end of the url withurl.endswith('.mkv')but do not parse the url.I propose to parse the url with