iCharlesHu commented on May 15, 2025
For this one we decided to just remove the default buffer size limit, but still allow you to set one if you choose to.

weissi commented on May 23, 2025
That doesn't sound like the right fix. If something blows past the 'maximum bytes collect' limit, you should throw an error (potentially giving the user what we have so far).
Just raising the buffer won't help; try `cat /dev/zero`.
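In code, the failure mode looks roughly like this (a sketch; the `run` spelling and the `.string(limit:)` option are assumptions based on identifiers used later in this thread):

```swift
import Subprocess

// `cat /dev/zero` produces output forever: collecting it without a limit
// grows the in-memory buffer until the host runs out of memory. No finite
// default, however large, makes this call safe; it should throw instead.
// (`run` and `.string(limit:)` spellings are assumptions from this thread.)
func collectForever() async throws -> String? {
    let result = try await run(
        .name("cat"),
        arguments: ["/dev/zero"],
        output: .string(limit: .max)   // "unlimited": never completes sanely
    )
    return result.standardOutput
}
```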
iCharlesHu commented on Jun 10, 2025
Update: we decided to throw an error instead of raising the limit. This is the same default behavior as `AsyncBufferSequence.LineSequence`.
iCharlesHu commented on Jul 3, 2025
Update: in addition to throwing an error, we now also require `bufferLength` to be set by the developer.

LiarPrincess commented on Jul 9, 2025
Things are a bit complicated here, and I think weissi's answer lacks some context. It may be because there was a behind-the-scenes conversation not posted on GitHub.
The biggest question is: what do users expect?
git status

I will use `git status` for all of the examples; think of it as a Schrödinger process that can print a little, but it can also print a lot.

git status without limit

Users expect to get the full output of a command, regardless of the size, because they explicitly asked for "without limit". While reading the output we have a potentially unbounded buffer - those are never comfortable, but there are some practical uses for "just give me everything, I don't want to play the limit guessing game".
This is similar to how `AsyncSequence` has an unbounded buffer by default - the stdlib just trusts that the users know what they are doing.

git status with 60kb limit

Most users expect that the subprocess reads the first 60kb and then throws an error if there is something more. Exactly when the error happens is an entirely different conversation.
There are alternatives to throwing an error.
This is the situation mentioned in the issue, but if you read it closely, it is about the deadlock. Regardless of whether an error is thrown or not, I think we can all agree that this should never deadlock.
I do not think that "throw or not throw" is the question here. I think it is about the default behavior:
- `git status` without limit - see above
- `git status` with 60kb limit - see above

We choose the default value for the argument, and the `throw` (or no `throw`) follows.

Questions
This whole thing creates a flowchart of questions that need to be answered. There may be some other questions here, but I'm not getting paid for this anyway.
1. Should the default value for `stdout/stderr` be discard or collect?

"Discard" is nice because it does "nothing" by default. If you want something to happen then you have to explicitly specify it by providing a value for the `standardOutput` argument. This is good for readability and code reviews.

OTOH, "collect" is a good option as this is what users expect. The big problem is that some people may not know that this is the default and end up with tons of memory consumed for something that they do not need.
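As a sketch of the readability argument (the `run`, `.discarded`, and `.string(limit:)` spellings are assumptions based on identifiers used elsewhere in this thread), both policies become visible at the call site once nothing is implicit:

```swift
import Subprocess

// "Discard": the call site says, visibly, that stdout is thrown away.
func sideEffectOnly() async throws {
    _ = try await run(
        .name("git"),
        arguments: ["status"],
        output: .discarded
    )
}

// "Collect": the reader (and the reviewer) sees the memory cost up front.
func needTheOutput() async throws -> String? {
    let result = try await run(
        .name("git"),
        arguments: ["status"],
        output: .string(limit: 64 * 1024)
    )
    return result.standardOutput
}
```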
2. If “collect” is the default: should the buffer be unbounded or bounded+throw?
Unbounded may consume tons of memory (possible halt/crash), but the same is true for the `AsyncSequence` default.

Bounded+throw is definitely a safer option, but most people do not expect `git status` (or `cat big_file.txt`) to throw. You need to read the documentation carefully and notice that this argument has a limit in its default value. "RTFM" exists because people are not good at reading documentation.

If we use CI/CD as an example: as our code base grows, so does the output of `swift build`, and at some point it may exhaust the limit, which will throw. Will people be surprised about this?

3. When we reach the limit, do we `throw` or `return` normally?

We need the bounded mode anyway (regardless of whether it is the default), so questions 3 and 4 are for this scenario.
An alternative to throwing the error would be returning the partial data collected so far. Throwing an error is the "more expected" outcome.
4. When do we `throw` the limit error?

Let's assume that we are at the exact moment that the limit was exhausted.

Option A: wait for the child process to finish and then `throw` the error. To achieve this we need to continue reading (and discarding) the output (both `stdout`/`stderr`) to prevent a deadlock on a filled pipe, then `throw` once we confirm the termination. We may end up in a situation where the child process exits successfully, and yet we have to `throw` the error because of the limit. This error has to contain the exit status of the process, etc.
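Option A implies an error shaped roughly like this (illustrative names only, not the library's actual API):

```swift
// Hypothetical shape of the Option A error. The child may have exited
// successfully, yet we still throw because the cap was hit, so the error
// must carry the exit status and, potentially, whatever output was
// collected before the cap.
struct OutputLimitExceeded: Error {
    let limit: Int              // the byte cap that was exceeded
    let exitCode: Int32         // how the child actually terminated
    let partialOutput: [UInt8]  // data collected before hitting the cap
}
```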
Option B: `throw` at the exact moment we reached the limit. What happens to the child process?
- We could kill it, but the swiftlang/swift-subprocess library has no way of knowing if this is safe. Things may go very wrong if we kill the process in the middle of a distributed database update.
- We could `throw` an error and let the process continue running. Slightly weird.

Trivia
If you want an infinite stream you can just use `yes` instead of `cat /dev/zero`.

iCharlesHu commented on Jul 9, 2025
The current default values for `stdin`, `stdout`, and `stderr` are `.none` (no input), `.string` (collect as String), and `.discarded`. I believe this is the most common use case. Instead of setting the output to `.string` by default, we propose removing this default setting and `.string` itself. This means the caller will have to explicitly choose an output type and buffer size. We chose this approach because it aligns with Swift's overall design principle: if we can't provide a reasonable default (and in this case, we truly can't), we leave it to the caller to decide. This way, you can pass `.string(limit: .max)` if you want the "git status without limit" behavior.

This is a good question, which is why we decided to not offer a default at all. You'll have to pick one according to your needs.
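So the "git status without limit" scenario from above stays expressible; it just has to be spelled out (a sketch: `.string(limit: .max)` is the spelling given above, the rest of the call shape is assumed):

```swift
import Subprocess

// "git status without limit", spelled out: the unbounded buffer is a
// visible, deliberate choice at the call site, not a hidden default.
// (`run`'s shape and `standardOutput` are assumptions.)
func fullGitStatus() async throws -> String? {
    let result = try await run(
        .name("git"),
        arguments: ["status"],
        output: .string(limit: .max)
    )
    return result.standardOutput
}
```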
I think what you’re describing is a choice between a "soft" and a "hard" limit. We opted for a hard limit here because, for many collected output types, it’s not possible to arbitrarily limit the output and expect it to still be convertible to the target output type. For instance, converting a truncated buffer to a
String
might not always work due to incorrect grapheme cluster breaks. In my opinion, if you want a soft limit, you should use the custom closurerun
. This way, you can decide how to process potentially partial data and determine what to do (send a signal, etc.) when the limit is reached.We decided on throwing the error at the exact moment the limit is reached since there's no point return partial data as discussed above. At which point we will attempt to gracefully terminate the child process by running
teardownSequence
.Hence we implemented graceful shutdown via teardown sequence. Any well written apps should respond gracefully to
SIGTERM
to finish up any pending writes and terminates itself gracefully to avoid data losses.
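For comparison, a rough sketch of the soft-limit route through the closure-based `run` mentioned above (the overload shape, `output: .sequence`, and the chunk handling are all assumptions; only the general approach is the point):

```swift
import Subprocess

// Soft limit, sketched: read stdout incrementally, keep the first 64kb,
// and decide for ourselves what to do once the cap is hit -- no throw
// required, and the partial data stays available.
func gitStatusSoftCapped() async throws -> [UInt8] {
    let cap = 64 * 1024
    let result = try await run(
        .name("git"),
        arguments: ["status"],
        output: .sequence                       // assumed: stream stdout to the body
    ) { execution -> [UInt8] in
        var collected: [UInt8] = []
        for try await chunk in execution.standardOutput {
            collected.append(contentsOf: chunk) // assumed byte-like chunks
            if collected.count >= cap { break } // cap hit: stop reading, keep partial data
        }
        return Array(collected.prefix(cap))
    }
    return result.value                         // assumed: the body's return value
}
```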