Description
Should pathbuf.push()
on Windows convert legacy paths to UNC syntax when they exceed the MAX_PATH
length, or other limits?
Legacy MS-DOS (non-UNC) paths should not exceed MAX_PATH
length (~260 chars) and have other syntactic limitations. Windows has partial, opt-in support for longer legacy paths, but not all APIs and not all applications support long legacy paths.
Windows has extended-length paths that start with a \\?\
(UNC) prefix. They are Microsoft's preferred way of specifying long paths. They can be 32KB long, and don't have path handling quirks inherited from MS-DOS. However, the UNC paths are not properly supported by many Windows applications #42869.
This question is related to fs::canonicalize()
that currently returns UNC paths even when not necessary. fs::canonicalize()
would be more compatible with Windows apps if it returned legacy paths whenever possible (when they fit under MAX_PATH
and meet other restrictions like reserved names). However, the legacy paths have the length limit, so whether fs::canonicalize(short_path).push(very_long_path)
works as expected depends on which syntax fs::canonicalize
uses, or whether path()
will correct the syntax if necessary.
Currently push()
does not convert between legacy and UNC paths. If a push()
causes a legacy path to exceed the limit, the path will keep using the legacy syntax, and technically won't be valid in APIs/apps that have the MAX_PATH
limit. This is a simple, predictable implementation, but arguably makes it possible for push()
to create an invalid path, a syntax error.
Besides the length limit, there are issues with reserved file names and trailing whitespace. legacy_path.push("con.txt")
makes the whole path parse as a device name only, but unc_path.push("con.txt")
simply appends the con.txt
file name to the path as expected. Is this a bug in push()
? Should push
be a leaky abstraction that exposes quirks of how Windows parses legacy paths, or should push()
switch to the less common UNC path syntax when it's necessary to precisely and losslessly append the given path components?
If push()
converted paths to UNC whenever they exceed limits of legacy paths, then push()
would be more robust, and semantically closer to push()
on Unix that always appends components, rather than pushing characters that may cause the whole path to parse as something else, or get rejected entirely for not using a syntax appropriate for its length.