[utils] Shorten proposed file name on create if too long #29989
827c7c6 to aaa81b7
Apparently the suspicion that I voiced here is valid. For UNIX-like platforms, an over-long filename gives ENAMETOOLONG as expected. For Windows, there may be two cases:
Any test results from versions of Windows other than what's in the CI VMs, or from OSX, would be welcome. |
aaa81b7 to 4af5991
This is a promising approach; however, the current implementation is highly likely to clobber the video ID (which IMO is the more important part) in the default templates, as the video ID is put at the end. A similar shortening that shortens in the middle might be a more reliable approach. There's also the issue of temp files: as the temp files get increasing integer suffixes, the temp file prefixes will change as the suffixes grow. Without unit test coverage ensuring this behavior doesn't break assembling chunks, I'm concerned that this might cause a regression at some point. |
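To make the "shorten in the middle" suggestion concrete, here is a minimal sketch. It is not code from this PR: `shorten_middle()`, its parameters and the `...` marker are all hypothetical names chosen for illustration. It keeps the start and the end of a filename stem, so an ID placed at the end survives as long as it fits in the retained tail.

```python
def shorten_middle(stem, max_len, marker='...'):
    """Hypothetical helper: cut characters from the middle of `stem`.

    Keeps the beginning and the end of the name so that a trailing
    video ID is preserved when it fits in the retained tail.
    """
    if len(stem) <= max_len:
        return stem  # already short enough
    if max_len <= len(marker):
        return stem[:max_len]  # no room for a marker, plain truncation
    keep = max_len - len(marker)
    head = (keep + 1) // 2  # the head gets the extra character, if any
    tail = keep - head
    end = stem[len(stem) - tail:] if tail else ''
    return stem[:head] + marker + end


print(shorten_middle('A very long video title that goes on and on-dQw4w9WgXcQ', 30))
```

This counts characters, not bytes, and only protects the ID if it is shorter than roughly half of `max_len`; a real implementation would probably need to know which template field holds the ID.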
The aim is to avoid what people have often complained about, namely having a download fail just because the proposed filename is too long for the file system:
|
I'm still concerned that this will create more bugs than it fixes. Failing fast with a clear error message, or altering default templates to reduce the likelihood of exceeding the length limit, would both offer more predictable behavior without introducing additional edge cases that lack test coverage. With this change, the same URL on the same OS with the same template will result in different filenames if saved on different filesystems. |
On reflection (short-term memory < 3 weeks), the shortening doesn't occur in the middle, so I had a different shortening library in mind: more to consider. Again, many users don't consider it acceptable that a download should fail just because of filename length restrictions. Personally, I haven't suffered from this thanks to ext2/4, but I don't believe that it can be a bug if the download is successfully saved under some reasonable filename, actually the reverse. Plainly the likelihood of needing to shorten filenames can be reduced if extractors don't (Twitter, eg) generate pointlessly long titles or IDs. But I do agree with users who are disappointed when their download that appeared to work fails because of some filename issue. Already we have the
The issue of test coverage is addressable if further test cases are provided. |
I like the proposed solution of
It's worth noting that it might not be apparent right away that a name needs to be shortened: without knowing that "part 10000" will be the one to exceed the length limit, the download may fail deep into a download (likely many gigabytes in, unless chunks are very small). So it's important that any implementation be resilient to resuming and to other operations involving part/temp files, like merging and remuxing.
I'd like to propose another option: a "fallback short name". Perhaps
For the purposes of plumbing it might be handy to use a less friendly, safer, but still intelligible name (such as extractor_id) for operations up until the final rename according to the user's preferred templates.
But all of this aside, I think a friendly error (instead of some error code) that just says "filename too long, use -o" or something like that, before a download even starts, would be super, especially for those who are using this in scripts (like me). This could be predicted by using a longer-than-expected suffix to test. Something like attempting to create a file called
I'm of the opinion that software that clearly reports conditions (especially when it gives meaningful hints as to how to fix the issue) is preferable to software that tries to automatically avoid the error in a non-obvious way. I've spent way more time recovering from issues created by "helpful" software that silently did things differently than it normally would than I have trying to figure out a way to handle an edge case. |
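The pre-download check suggested above could look roughly like the following sketch. Everything here is an assumption made for illustration: `name_fits()` is a hypothetical helper, and the probe suffix `.part-Frag99999` is an invented stand-in for "a longer-than-expected suffix", not a name taken from the comment or from youtube-dl. It only recognises `errno.ENAMETOOLONG`, so it does not cover the Windows cases mentioned earlier in the thread.

```python
import errno
import os


def name_fits(directory, filename, probe_suffix='.part-Frag99999'):
    """Hypothetical pre-flight check: can `filename` plus a long
    temp-file suffix be created in `directory`?

    Returns False if the filesystem reports ENAMETOOLONG; any other
    error (permissions, an existing probe file, ...) is re-raised.
    """
    probe = os.path.join(directory, filename + probe_suffix)
    try:
        # O_EXCL avoids clobbering a real file that has the probe name
        fd = os.open(probe, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError as err:
        if err.errno == errno.ENAMETOOLONG:
            return False
        raise
    os.close(fd)
    os.remove(probe)  # leave no trace of the probe
    return True
```

A caller could then stop with a message such as "filename too long, use -o" before any data is fetched, rather than failing part-way through a download.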
Since the
One possible failure mode is when a directory containing an incomplete download is resumed in a context that changes the available filename length, eg after moving the directory to a shorter pathname, or upgrading the filesystem to support longer pathnames. The resume logic depends on the name of the
Fragment filename management may be more tricky if simultaneous fragment downloads are supported (yt-dlp?). |
dfa5e93 to a0159ad
So now we have a solution based on the |
If the actual filename is different from the one in the info-dict, postprocessors could fail
Actually, no. It appears like that when looking at youtube-dl/youtube_dl/YoutubeDL.py Line 721 in af9e725
So the except block in sanitize_open doesn't actually do anything. I haven't looked at the blame, but I assume it is a remnant from before output template was a thing |
Thanks for the tip. Looking again, it seems that A If the inserted and appended elements happen to cause the filename to exceed a length limit, the except block won't help, and the error will be re-raised. So this is a problem:
Gruesome, really. |
We could also hash the format id if it is too long. But there may be people who depend on the intermediate filenames in their scripts. I also thought about having the downloader return the final filename and then overwriting it in the infodict. But this would cause the |
While I wait for this to be merged, I added my own alias, which helped. Keep in mind I'm not a bash expert, and truncation is not included here, but this helped me.
|
Unfortunately I am really bad with bash code / shell scripts. Can we have some workaround like an additional command-line flag? Something like "--safe-download" or so, that will download no matter what. Right now I don't know how to get the file name if the name is too long; yt-dl just errors out on me and nothing gets downloaded as a result. |
Just put an output template specification in your config file or on the command-line, replacing the default
Or change 200 to a smaller number if that doesn't work for you. |
For CJK users, 200 is still too long. |
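The CJK point can be checked with ordinary Python `%` formatting, which is what output templates are based on: a precision such as `.200s` counts characters, while filesystems such as ext4 limit a filename to 255 bytes. The sketch below uses an illustrative template (an assumption, not the work-around quoted above) and a hypothetical `truncate_utf8()` helper to show the difference.

```python
def utf8_len(text):
    """Length of `text` in bytes once encoded as UTF-8."""
    return len(text.encode('utf-8'))


def truncate_utf8(text, max_bytes):
    """Hypothetical helper: cut `text` so that its UTF-8 encoding fits
    in `max_bytes`, dropping any character split by the cut."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    return encoded[:max_bytes].decode('utf-8', errors='ignore')


# Illustrative template only: title limited to 200 characters
template = '%(title).200s-%(id)s.%(ext)s'
latin = template % {'title': 'a' * 300, 'id': 'xyz', 'ext': 'mp4'}
cjk = template % {'title': '\u6f22' * 300, 'id': 'xyz', 'ext': 'mp4'}

print(len(latin), utf8_len(latin))  # same character count as cjk
print(len(cjk), utf8_len(cjk))      # three bytes per CJK character
print(utf8_len(truncate_utf8('\u6f22' * 300, 200)))
```

Both names have the same number of characters, but the CJK one needs about three times as many bytes, which is why a fixed character count in the template cannot be safe for every title.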
This is why a better solution is the approach attempted in the PR: ie, detecting what actual length is valid. |
I would love to use this, but the EXT filesystem says that the file name is too long. Otherwise it's a great hack, and I'd like to use it if it would shorten the name. Would you add an extra layer of regex, or how would you shorten the length to 255 characters? |
Just use the known work-around. |
Description of your pull request and other information
If the destination filename constructed from the output template is too long for the filesystem, the yt-dl download fails. The user can't predict how long extracted fields (eg `title`) may be, so this may be unexpected. The user then has to work out how to modify the output template (eg `%(title)s` -> `%(title).25s`) and run the command again.
Consequently there have been suggestions (of which #29912 is the most recent) that the destination filename should be shortened automatically. This doesn't seem unreasonable, since `sanitize_path()` already has the potential to edit the filename. PR #25475 failed because it was platform-specific.
This PR adds a function `reduce_filename()` to `utils.py` that shortens the filename component by a specified factor, and adds logic to `sanitize_open()` to detect `errno.ENAMETOOLONG` and iteratively call `reduce_filename()` until opening either succeeds, or fails for some other reason.
Resolves #30361, resolves #29912, resolves #29975, resolves #29627, resolves #28626, resolves #7765, resolves #5921, resolves #5908, probably others.
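The retry loop described above might be sketched as follows. This is an illustration written from the description only, not the code in the PR: the signature of `reduce_filename()`, the default factor of 0.75, and the wrapper `open_shortening()` with its `opener` parameter are all assumptions.

```python
import errno
import os


def reduce_filename(path, factor=0.75, min_len=1):
    """Assumed behaviour: shorten the stem of the last path component
    by `factor`, keeping the directory and the extension.
    Returns None when the stem cannot be shortened any further."""
    head, tail = os.path.split(path)
    stem, ext = os.path.splitext(tail)
    new_len = max(min_len, int(len(stem) * factor))
    if new_len >= len(stem):
        return None
    return os.path.join(head, stem[:new_len] + ext)


def open_shortening(path, mode='wb', opener=open):
    """Hypothetical wrapper: retry opening with ever shorter names
    while the failure is ENAMETOOLONG; re-raise anything else."""
    while True:
        try:
            return opener(path, mode), path
        except (IOError, OSError) as err:
            if err.errno != errno.ENAMETOOLONG:
                raise  # failed for some other reason
            shorter = reduce_filename(path)
            if shorter is None:
                raise  # nothing left to cut
            path = shorter
```

Returning the final path alongside the handle matters because, as discussed in the thread, later steps such as postprocessors may otherwise look for the original, unshortened name. Cutting the end of the stem is also what puts a trailing video ID at risk, which is the reviewer's first objection.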