Background and motivation
The Roslyn and runtime teams are looking at improving the experience of writing formatted text, primarily around source generators. An important feature for Roslyn is a mechanism to get, without copying, a read-only view of text that no one else can write to. The lack of such an API has been blocking dotnet/roslyn#61326.
The Roslyn team has mentioned recently that they are okay with a non-optimal copy-based API for downlevel scenarios as long as there is a no-copy API on the horizon. This proposal provides such an API.
API Proposal
namespace System.Text;
public class StringBuilder
{
    public StringBuilder MoveChunksToStringBuilder();
}
This API will create a new StringBuilder instance initialized to the same state as the original instance, and then reset the fields of the original instance back to the same state as a freshly created instance.
This clearing behavior is in contrast to the behavior when Clear is called, where the internal array is not thrown away.
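To make the intended before/after state concrete, here is a minimal sketch that emulates the observable behavior using only APIs that exist today. MoveChunksCopyBased is a hypothetical helper name, not part of the proposal. Unlike the proposed method, it copies every character and compacts the chunks, and Clear does not release the original's buffer, so it illustrates the contract only, not the O(1) cost.

```csharp
using System;
using System.Text;

static class StringBuilderMoveSketch
{
    // Hypothetical, copy-based stand-in for the proposed MoveChunksToStringBuilder.
    // Assumption: only the observable result matters here (the returned builder holds
    // the text, the original is empty); the real proposal would hand over the chunks
    // without copying.
    public static StringBuilder MoveChunksCopyBased(StringBuilder source)
    {
        var moved = new StringBuilder(source.Length);
        foreach (ReadOnlyMemory<char> chunk in source.GetChunks())
        {
            moved.Append(chunk.Span); // copies the characters of each chunk
        }

        source.Clear(); // empties the original, but keeps its buffer
        return moved;
    }

    public static void Main()
    {
        var original = new StringBuilder();
        original.Append("class C").Append(' ').Append("{ }");

        StringBuilder moved = MoveChunksCopyBased(original);

        // Later writes to the original no longer affect the moved contents.
        original.Append("unrelated");

        Console.WriteLine(moved.ToString());    // prints "class C { }"
        Console.WriteLine(original.ToString()); // prints "unrelated"
    }
}
```

This is roughly what a copy-based downlevel fallback could look like; the proposed API would produce the same visible state while only reassigning fields.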
API Usage
Roslyn would use it with an API similar to the following:
public abstract class SourceText
{
    public static SourceText DrainFrom(StringBuilder stringBuilder)
    {
        return new StringBuilderText(stringBuilder.MoveChunksToStringBuilder());
    }
}
(See the StringBuilderText implementation, which cannot wrap user-provided StringBuilders until this StringBuilder API is added.)
New StringBuilder versus array
An allocation is needed somewhere so that chunk enumeration can start multiple times after the original StringBuilder is cleared.
Allocating a new StringBuilder is cheaper than allocating an array of chunks, both because the array would be variable-sized and because populating the array requires following the whole linked-list chain. In comparison, the implementation of MoveChunksToStringBuilder need only allocate a new StringBuilder and set a handful of fields directly in the old and new instances.
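For comparison, the array-based alternative can be sketched with the existing GetChunks API. SnapshotChunks is a hypothetical helper written for illustration only. It shows that the array has to be filled by walking every chunk, so its cost grows with the number of chunks. Note also that, without the proposed API, the captured memory still aliases the builder's buffers and is only valid while the builder is left unmodified.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ChunkArraySketch
{
    // Hypothetical helper: captures the builder's current chunks as an array.
    // The loop visits every chunk, and the result size is not known up front.
    public static ReadOnlyMemory<char>[] SnapshotChunks(StringBuilder builder)
    {
        var chunks = new List<ReadOnlyMemory<char>>();
        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            chunks.Add(chunk); // references the builder's buffer, no character copy
        }

        return chunks.ToArray();
    }

    public static void Main()
    {
        // A small initial capacity makes several chunks likely; the exact
        // chunk count is an implementation detail and is not relied on here.
        var builder = new StringBuilder(4);
        for (int i = 0; i < 10; i++)
        {
            builder.Append("line ").Append(i).Append('\n');
        }

        ReadOnlyMemory<char>[] chunks = SnapshotChunks(builder);

        int total = 0;
        foreach (ReadOnlyMemory<char> chunk in chunks)
        {
            total += chunk.Length;
        }

        Console.WriteLine(total == builder.Length); // prints "True"
    }
}
```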
Allocating a new StringBuilder is also a more composable operation, opening the door to any use case where you need a cheap way to be sure that no one else can modify the contents of the StringBuilder that you hold, even if your use case goes beyond chunk enumeration (possibly involving your own mutations).
Naming
MoveChunks is preferred over Move or Drain because it is clearer that the chunks are leaving the current instance and landing as-is in the new instance without being compacted into a single chunk. MoveChunks is preferred over DrainChunks because Move represents the O(1) operation on the chunks that is happening, whereas Drain sounds more involved. (And in prior art, Drain is in fact more costly than Move; ImmutableArray<T>.Builder.DrainToImmutable freshly allocates and copies elements when the presizing doesn't match.)
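The prior-art distinction between Move and Drain can be seen with ImmutableArray<T>.Builder. The sketch below assumes a target framework where DrainToImmutable is available (.NET 8 or later); the class name is made up for illustration.

```csharp
using System;
using System.Collections.Immutable;

static class DrainVersusMoveSketch
{
    public static void Main()
    {
        // Capacity 4, but only two elements are added, so the presizing does not match.
        ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>(4);
        builder.Add(1);
        builder.Add(2);

        // MoveToImmutable only hands over the internal array when Count == Capacity;
        // otherwise it throws instead of copying.
        bool moveThrew = false;
        try
        {
            builder.MoveToImmutable();
        }
        catch (InvalidOperationException)
        {
            moveThrew = true;
        }

        // DrainToImmutable succeeds in the same situation, at the cost of
        // allocating a right-sized array and copying the elements into it.
        ImmutableArray<int> drained = builder.DrainToImmutable();

        Console.WriteLine(moveThrew);      // prints "True"
        Console.WriteLine(drained.Length); // prints "2"
        Console.WriteLine(builder.Count);  // prints "0"
    }
}
```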
This proposal is updated from #97570 (comment).