Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do some extra work to avoid making a full size copy of the buffer when base64 encoding #1372

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lewing
Copy link
Member

@lewing lewing commented Mar 15, 2025

fixes: dotnet/runtime#113474 (kinda)

@lewing lewing force-pushed the overkill branch 2 times, most recently from 77ead98 to ebe1a19 Compare March 15, 2025 01:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the way XML results are written by implementing a custom ICryptoTransform to convert XML content into a Base64-encoded, single-line representation, reducing potential large memory allocations.

  • Introduces a new custom Base64 transformation for non-DEBUG NET builds
  • Adds conditional compilation to maintain legacy behavior in DEBUG or non-NET builds
  • Updates using statements and implements a new WriteOnSingleLine method using a CryptoStream
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:71

  • [nitpick] Add a clarifying comment explaining why the output length is doubled after the Base64 transformation to enhance code maintainability.
totalBytesWritten += base64BytesWritten * 2;

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new base64 transformation logic to write XML output compactly, aiming to avoid large allocations.

  • Adds a new conditional branch (#if !NET || DEBUG) with a custom ICryptoTransform (ToBase64CharTransform) for encoding.
  • Updates the WriteOnSingleLine method to utilize the new CryptoStream-based approach with the custom transform.

@lewing lewing force-pushed the overkill branch 2 times, most recently from 0f872f4 to 162ea03 Compare March 15, 2025 04:18
@lewing lewing requested a review from Copilot March 15, 2025 04:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces an alternative base64 encoding approach in the WasmXmlResultWriter to avoid large allocations during XML result writing, addressing issue dotnet/runtime#113474.

  • Added a custom ICryptoTransform implementation (ToBase64CharTransform) for efficient UTF-16 expansion of base64-encoded data.
  • Segregated the behavior by build flags (#if !NET || DEBUG vs. #else) to use simpler logic in debug builds and the optimized transform in release builds.

@lewing lewing requested a review from Copilot March 15, 2025 04:40

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new implementation of WriteOnSingleLine for non-Debug or NET builds in order to avoid large allocations during XML result writing by converting the XML output into a Base64 encoded UTF-16 stream.

  • Added a custom ICryptoTransform (ToBase64CharTransform) to stream Base64 encoded characters
  • Updated WriteOnSingleLine method to use the new transform via CryptoStream for efficient output conversion
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:51

  • Using Debug.Assert to validate that outputOffset is a multiple of 2 may not enforce this condition in release builds. Consider adding a runtime check or throwing an exception to ensure this invariant in production code.
Debug.Assert(outputOffset % 2 == 0, "$\{nameof(outputOffset)} must be a multiple 2");
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the XML result writer to use a custom Base64 transformation to reduce large allocations during XML output generation.

  • Introduces a new ICryptoTransform implementation (ToBase64CharTransform) for converting Base64 bytes to UTF-16 encoded characters.
  • Refactors the WriteOnSingleLine method to use CryptoStream and custom transformation logic under non-Debug / .NET builds.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new memory‐efficient implementation for writing XML results on a single line by using a custom Base64 transformation to avoid large allocations.

  • Conditionally compiles two different implementations of WriteOnSingleLine for DEBUG/.NET vs. production builds.
  • Adds a new helper class (ToBase64CharTransform) that performs in-place Base64-to-UTF16 conversion via crypto streams.
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:100

  • Although the buffer slice is sized to hold two bytes per character, BitConverter.TryWriteBytes could potentially fail. Consider adding an assertion or verifying its return value to ensure that each character conversion writes the expected data.
BitConverter.TryWriteBytes(outputChars.Slice(i * 2), (char)base64Written[i]);

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces an alternative XML result writing mechanism in release builds to avoid large memory allocations, addressing dotnet/runtime#113474.

  • Implements a custom base64 character transformation (ToBase64CharTransform)
  • Updates WriteOnSingleLine to use a CryptoStream with the custom transformation
  • Adds new using directives for System.Runtime.InteropServices, System.Text, and System.Security.Cryptography

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR aims to reduce large memory allocations by changing how XML result data is written on a single line, addressing dotnet/runtime#113474.

  • Introduces a conditional compilation path that uses a custom ICryptoTransform implementation in production mode.
  • Implements a Base64-to-UTF16 conversion mechanism via ToBase64CharTransform.
  • Updates the WriteOnSingleLine method to use the new transformation in production builds.
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:119

  • Similar to the TransformBlock method, this conversion relies on BitConverter.TryWriteBytes for UTF-16 expansion, which could behave inconsistently across platforms. Consider using an explicit encoding conversion method for clarity and reliability.
BitConverter.TryWriteBytes(outputBuffer.AsSpan(i * 2), (char)base64Buffer[i]);

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the WasmXmlResultWriter to avoid large allocations by introducing a custom Base64-to-UTF-16 conversion when not in DEBUG mode.

  • Adds a new implementation of WriteOnSingleLine using a custom ICryptoTransform (ToBase64CharTransform) with CryptoStream.
  • Introduces detailed Base64 transformation logic and expands the output buffer for UTF-16 encoding.
  • Updates using directives to support memory marshaling and cryptography.

@lewing lewing force-pushed the overkill branch 2 times, most recently from 5c8221f to 838110c Compare March 15, 2025 17:39
@lewing lewing requested a review from Copilot March 15, 2025 17:42
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the XML result writer for WebAssembly tests to avoid large memory allocations by utilizing a custom Base64-to-char stream transformation.

  • Introduces a new ICryptoTransform implementation (ToBase64CharTransform) to transform Base64 bytes into UTF-16 encoded characters.
  • Provides separate WriteOnSingleLine implementations for debug and production builds.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new implementation of WriteOnSingleLine for release builds by replacing the DEBUG branch with a custom Base64 transformation that avoids large allocations.

  • Introduces a new ICryptoTransform implementation (ToBase64CharTransform) for Base64 encoding with UTF-16 output
  • Refactors WriteOnSingleLine to use CryptoStream with the new transform in non-debug builds
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:90

  • [nitpick] Consider defining a named constant for 'OutputBlockSize / 2' to improve readability and clarify the purpose of this calculation.
int base64OutputStart = outputOffset + OutputBlockSize / 2;

@lewing lewing changed the title Do some silyness to avoid large allocations Do some extra work to avoid making a full size copy of the buffer when base64 encoding Mar 15, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a custom ICryptoTransform implementation to perform Base64 encoding without creating a full-size copy of the buffer. Key changes include:

  • Adding a new internal class (ToBase64CharTransform) that directly transforms data into UTF-16 encoded Base64.
  • Using conditional compilation to use a different WriteOnSingleLine implementation in release builds.
  • Refactoring the WriteOnSingleLine method to leverage CryptoStream and MemoryMarshal for improved efficiency.
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.XHarness.TestRunners.Xunit/WasmXmlResultWriter.cs:106

  • [nitpick] BitConverter.TryWriteBytes is used to write a single char into a fixed-size slice and is expected to succeed given the slice size; consider replacing it with a direct assignment (e.g. outputChars[i * 2] = (char)base64Written[i]) for clarity and potential performance benefits.
if (!BitConverter.TryWriteBytes(outputChars.Slice(i * 2), (char)base64Written[i]))

using var ms = new MemoryStream();
assembliesElement.Save(ms);
ms.TryGetBuffer(out var bytes);
var base64 = Convert.ToBase64String(bytes, Base64FormattingOptions.None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that base64 could contain \n in the middle right ? If so, you can insert some which would make the buffer in emscripten smaller. Because stdout emulator would build buffer until newline and only then write it into browser console.log.

@maraf noted that maybe we had some issues with message order WRT WASM EXIT ... so I don't know if multi-line is good idea or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could in theory write each block as we transform it rather than create a single buffer. We should really review the emscripten line buffering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this could be fetch('POST', xml) in browser and nodejs.
Remaining is v8 shell which doesn't have file write. So we may need to keep this contraption.

Span<char> charData = MemoryMarshal.Cast<byte, char>(bytes.AsSpan());

// Output the result and the the ascii length of the data
Console.WriteLine($"STARTRESULTXML {charData.Length} {charData} ENDRESULTXML");
Copy link
Member Author

@lewing lewing Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Console.WriteLine($"STARTRESULTXML {charData.Length} {charData} ENDRESULTXML");
Console.Write($"STARTRESULTXML {charData.Length} ");
Console.Write(charData);
Console.WriteLine(" ENDRESULTXML");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure with message "MONO_WASM: Out of memory" on string constructor
2 participants