[mono][tasks] Extract EmitWasmBundle into generic EmitBundle task and enable bundling in mono self-contained library #84191

Merged
merged 115 commits into from
Jun 20, 2023

mdh1418
Member

@mdh1418 mdh1418 commented Mar 31, 2023

Contributes to #79377

For Mono's self-contained shared|static library mode to be truly self-contained (i.e. not reliant on external resources, such as compiled managed assemblies, existing on disk), the resources must be bundled into the library, i.e. their data must be stored directly inside the shared library itself. Wasm and Wasi already have a form of bundling, implemented in #82250.

This PR extracts the bundling logic out of the wasm-specific files so that other mobile platforms can leverage it, and incorporates bundling into Mono's library mode, which is built through #81919 with runtime initialization through #82253 and #83050.

This PR does the following:

  • Extracts Wasm's bundling task into a more general task

  • Incorporates bundling into Mono's self-contained library mode

  • Expands bundling to handle various types of resources (assemblies + associated pdb, satellite assemblies, other data resources [i.e. runtimeconfig.bin/timezones])

    • Defines new MonoBundled*Resource structs, all aligned off of MonoBundledResource, to handle the various types
    • Utilizes a hash table to store pointers to MonoBundled*Resources
    • Bootstraps the hash table into the preexisting bundled resource retrieval paths
  • Modifies EmitBundle input parameters

    • List of files to bundle with possible RegisteredFile metadata designating what the resource should be registered under in the hashtable.
    • Registration function symbol to associate with logic that registers bundled resources via new mono_bundled_resources_add bundling api.
    • Optional file path at which to generate the logic that preallocates MonoBundled*Resources and defines the passed-in registration function
    • Directory to place generated files
  • Outputs generated bundled resource files from the EmitBundle task

    • Appends DataSymbol, DataLenSymbol, DataLenSymbolValue, RegisteredName, and DestinationFile metadata
  • Deprecates old bundling api in favor of new bundling api
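
To make the registration flow in the list above more concrete, here is a minimal sketch in C of a name-keyed registry that stores pointers to preallocated resource structs. Everything in it is an assumption made for illustration: the names `demo_resource`, `demo_resources_add`, and `demo_resources_get`, the struct layout, and the fixed-size open-addressing table are hypothetical and are not the runtime's actual `MonoBundledResource` layout or the `mono_bundled_resources_add` signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical resource kinds; the real runtime defines its own set. */
typedef enum {
    DEMO_RESOURCE_ASSEMBLY,
    DEMO_RESOURCE_SATELLITE_ASSEMBLY,
    DEMO_RESOURCE_DATA
} demo_resource_type;

/* Hypothetical resource record: the registered name plus the bundled bytes. */
typedef struct {
    demo_resource_type type;
    const char *name;       /* key the resource is registered under */
    const uint8_t *data;    /* bytes compiled into the library */
    uint32_t size;
} demo_resource;

#define DEMO_TABLE_SLOTS 64 /* power of two, so a mask can replace modulo */

/* The table stores pointers only; the records themselves are preallocated. */
static const demo_resource *demo_table[DEMO_TABLE_SLOTS];

/* 32-bit FNV-1a string hash, used only to pick a starting slot. */
static uint32_t demo_hash (const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Register a resource. Returns 0 on success, -1 if the name is
 * already registered or the table is full. */
int demo_resources_add (const demo_resource *resource)
{
    uint32_t start = demo_hash (resource->name) & (DEMO_TABLE_SLOTS - 1);
    for (uint32_t i = 0; i < DEMO_TABLE_SLOTS; i++) {
        uint32_t slot = (start + i) & (DEMO_TABLE_SLOTS - 1);
        if (!demo_table[slot]) {
            demo_table[slot] = resource;
            return 0;
        }
        if (!strcmp (demo_table[slot]->name, resource->name))
            return -1;
    }
    return -1;
}

/* Look a resource up by its registered name; NULL if it was never added. */
const demo_resource *demo_resources_get (const char *name)
{
    uint32_t start = demo_hash (name) & (DEMO_TABLE_SLOTS - 1);
    for (uint32_t i = 0; i < DEMO_TABLE_SLOTS; i++) {
        uint32_t slot = (start + i) & (DEMO_TABLE_SLOTS - 1);
        if (!demo_table[slot])
            return NULL;
        if (!strcmp (demo_table[slot]->name, name))
            return demo_table[slot];
    }
    return NULL;
}
```

In this sketch, a generated registration function would call `demo_resources_add` once per preallocated record, and the existing retrieval paths would consult `demo_resources_get` before falling back to disk.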


Example Files

snippet of source file containing byte array data + lengths
source file to register preallocated resource structs, including typedefs and resource symbols


Testing

Mono's self-contained library mode without bundling was tested in #83050 by building an Android functional test with changes that:

  • Remove the built-in runtime initialization logic from AndroidAppBuilder's monodroid.c
  • Enable Library mode through NativeLib=shared and ForceFullAOT=true (in addition to other tweaks to generate the app in library mode)
  • Add a custom managed side API that is called from native
  • Pass the path to relevant resources (i.e. managed assemblies, runtimeconfig.bin) on disk into the environment variable DOTNET_LIBRARY_ASSEMBLY_PATH
  • Validate that the mono runtime is initialized and that the managed side API has been properly invoked.

To validate bundling, DOTNET_LIBRARY_ASSEMBLY_PATH was left unset and BundlesAssemblies was passed to the LibraryBuilder. The app was able to bundle the assemblies and runtimeconfig.bin, initialize runtimeconfig, auto-initialize the Mono runtime, and invoke the managed-side API properly.

@ghost

ghost commented Mar 31, 2023

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: mdh1418
Assignees: -
Labels:

area-Infrastructure-mono

Milestone: -

@mdh1418 mdh1418 force-pushed the mono_library_mode_bundling branch from 92c21a6 to 6cf49fe Compare April 3, 2023 21:21
@mdh1418 mdh1418 changed the title Mono library mode bundling [mono][tasks] Extract EmitWasmBundle into generic EmitBundle task and enable bundling in mono self-contained library Apr 3, 2023
@mdh1418
Member Author

mdh1418 commented Jun 9, 2023

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mdh1418
Member Author

mdh1418 commented Jun 9, 2023

Looking at other places where MONO_WEBCIL_IN_WASM_EXTENSION is used to assist in assembly searching, there are:

#ifdef ENABLE_WEBCIL
else {
    /* /path/foo.dll -> /path/foo.webcil */
    size_t n = strlen (fullpath);
    if (n < strlen(".dll"))
        continue;
    n -= strlen(".dll");
    char *fullpath2 = g_malloc (n + strlen(".webcil") + 1);
    g_strlcpy (fullpath2, fullpath, n + 1);
    g_strlcpy (fullpath2 + n, ".webcil", strlen(".webcil") + 1);
    if (g_file_test (fullpath2, G_FILE_TEST_IS_REGULAR)) {
        MonoImageOpenStatus status;
        result = mono_assembly_request_open (fullpath2, &req, &status);
    }
    g_free (fullpath2);
    if (result)
        break;
    char *fullpath3 = g_malloc (n + strlen(MONO_WEBCIL_IN_WASM_EXTENSION) + 1);
    g_strlcpy (fullpath3, fullpath, n + 1);
    g_strlcpy (fullpath3 + n, MONO_WEBCIL_IN_WASM_EXTENSION, strlen(MONO_WEBCIL_IN_WASM_EXTENSION) + 1);
    if (g_file_test (fullpath3, G_FILE_TEST_IS_REGULAR)) {
        MonoImageOpenStatus status;
        result = mono_assembly_request_open (fullpath3, &req, &status);
    }
    g_free (fullpath3);
    if (result)
        break;
}
#endif
and
#ifdef ENABLE_WEBCIL
if (!corlib) {
    /* Maybe its in a bundle */
    char *corlib_name = g_strdup_printf ("%s.webcil", MONO_ASSEMBLY_CORLIB_NAME);
    corlib = mono_assembly_request_open (corlib_name, &req, &status);
    g_free (corlib_name);
}
if (!corlib) {
    /* Maybe its in a bundle */
    char *corlib_name = g_strdup_printf ("%s%s", MONO_ASSEMBLY_CORLIB_NAME, MONO_WEBCIL_IN_WASM_EXTENSION);
    corlib = mono_assembly_request_open (corlib_name, &req, &status);
    g_free (corlib_name);
}
#endif

The new bundling APIs rely on a hash table that aliases the known assembly extensions .dll, .wasm, and .webcil (if webcil is enabled) to .dll, which removes some confusion about which extension a bundled assembly resource has.

The latter specifically mentions bundles, so it seems "safe" to remove that block altogether, but I'm not sure whether the former is specific to bundling. It also relies on mono_assembly_request_open, which has fallbacks if the assembly is not found in a bundle. @lambdageek, do you know whether that is also a bundle-specific path? Should we be able to remove both blocks?
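
The extension aliasing described above can be sketched as a small key-normalization step applied before both insertion and lookup. This is an illustration only: `demo_normalize_assembly_name` and `demo_ends_with` are hypothetical helpers, the literal `.wasm` stands in for whatever MONO_WEBCIL_IN_WASM_EXTENSION expands to, and the runtime may implement the aliasing differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if `name` ends with `suffix`. */
static int demo_ends_with (const char *name, const char *suffix)
{
    size_t n = strlen (name), s = strlen (suffix);
    return n >= s && !strcmp (name + n - s, suffix);
}

/*
 * Write the lookup key for `name` into `out` (capacity `out_size`).
 * Names ending in .webcil or .wasm are rewritten to end in .dll;
 * any other name is copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
int demo_normalize_assembly_name (const char *name, char *out, size_t out_size)
{
    static const char *const aliases[] = { ".webcil", ".wasm" };
    size_t stem = strlen (name);
    const char *ext = "";

    for (size_t i = 0; i < sizeof (aliases) / sizeof (aliases[0]); i++) {
        if (demo_ends_with (name, aliases[i])) {
            stem -= strlen (aliases[i]);
            ext = ".dll";
            break;
        }
    }

    /* Copy the stem, then append the canonical extension (if any). */
    int written = snprintf (out, out_size, "%.*s%s", (int)stem, name, ext);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With a step like this in front of the table, a caller that asks for `foo.webcil` and a caller that asks for `foo.dll` reach the same entry, so the per-extension retry loops quoted above would not be needed on the bundle path.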

Member

@lateralusX lateralusX left a comment

LGTM! Great work!

Member

@pavelsavara pavelsavara left a comment

Just a couple of lazy questions; looks great!

I will follow-up with #87284

src/mono/mono/component/mini-wasm-debugger.c
if (!strcmp (bsymfile->aname, assembly_name))
return TRUE;
#ifdef ENABLE_WEBCIL
const char *p = strstr (assembly_name, ".webcil");
Member

Does (unbundled) webcil still work?

Member Author

@mdh1418 mdh1418 Jun 12, 2023

Which tests would validate the unbundled webcil scenario? I had trouble working out how to enable webcil when running the wasm/wasi tests.

This function, however, is only invoked in the two bundling-specific APIs right below.
The new bundling logic aliases all known assembly extensions (.dll, .wasm, and .webcil if enabled) to .dll. That eliminates the chunk of logic that searched for the different variations, so the checks for .webcil or .wasm are no longer needed in the bundling search paths.

Member

Webcil is currently useful for the browser. The debugger probably doesn't know about the format difference, and possibly not even about the name difference. I don't know if we have automated tests for the debugger on webcil.

@mdh1418
Member Author

mdh1418 commented Jun 12, 2023

System.Runtime.Loader.Tests on wasm exercises the satellite assembly bundling logic. Registration and lookup are working as intended, and all of those tests pass.

Member

@radical radical left a comment

review in progress..

namespace Microsoft.WebAssembly.Build.Tasks;

- public class EmitWasmBundleObjectFiles : EmitWasmBundleBase
+ public class EmitBundleObjectFiles : EmitBundleBase
Member

Use some namespace for this. This can be done in a follow-up PR, along with adding a common namespace for the other tasks in this assembly.

src/mono/wasi/build/WasiApp.Native.targets (Outdated)
Comment on lines 22 to 24
private Dictionary<string, string> resourceDataSymbolDictionary = new();

private Dictionary<string, string[]> resourcesForDataSymbolDictionary= new();
Member

Private instance fields should start with _ like _resourceDataSymbolDictionary.
And these two should be readonly.

Member Author

Prefixed with _.
What is the rationale for making them readonly? These variables are modified within the Execute() method. Is that considered the constructor for tasks? Readonly fields should only be modified within a constructor, as suggested here.

/// Could have RegisteredName, otherwise it would be the filename.
/// RegisteredName should be prefixed with namespace in form of unix like path. For example: "/usr/share/zoneinfo/"
[Required]
public ITaskItem[] FilesToBundle { get; set; } = default!;
Member

Suggested change
- public ITaskItem[] FilesToBundle { get; set; } = default!;
+ public ITaskItem[] FilesToBundle { get; set; } = Array.Empty<ITaskItem>();

Member Author

Changed. Out of curiosity, why is this preferred over the default keyword?

string? managedAssemblyCulture = null;

var resourceMetadataReader = PEReaderExtensions.GetMetadataReader(resourcePEReader);
if (resourceMetadataReader.IsAssembly)
Member

Does this return true for satellite assemblies also?

Member Author

Yes, I believe it does. I put in some logging and saw that this block was hit for satellite assemblies as well.

Member

@radical radical left a comment

review in progress..

bundledResource.SetMetadata("RegisteredName", registeredName);
}

string resourceDataSymbol = $"bundled_resource_{ToSafeSymbolName(TruncateEncodedHash(Utils.ComputeHashEx(resourcePath, Utils.HashAlgorithmType.SHA256, Utils.HashEncodingType.Base64Safe), MaxEncodedHashLength))}";
Member

@radical radical Jun 13, 2023

The output filename should be more descriptive though, like:

            string destinationFileName = $"{Path.GetFileName(resourcePath)}_{ToSafeSymbolName(TruncateEncodedHash(Utils.ComputeHashEx(resourcePath, Utils.HashAlgorithmType.SHA256, Utils.HashEncodingType.Base64Safe), MaxEncodedHashLength))}_bundle{GetOutputFileExtension()}";

EDIT: ...which would be useful when debugging a build. It might even be useful to emit these into a bundles subdirectory.

Member

@lateralusX lateralusX Jun 13, 2023

The problem here is that identical bundled assets are shared: we generate one bundled item rather than duplicates for identical content. That means multiple different resources can map to the same destination file, which is why we can't use the file name as part of the destination file name, only the hash. Inside the generated file (if it's a source file) there is a list of all resources that map to the same destination file.

Member

@lateralusX lateralusX Jun 13, 2023

It might be useful to add more logging around this to help debug the mapping from input asset to destination file, but we would need to keep the naming based on the unique content of the input asset in order to emit only one bundled resource for identical input assets. One example where inputs can be identical is time zone info: we register the different time zones under their unique resource names at runtime, but share one binary representation if several time zones end up with identical data.

Member Author

_resourcesForDataSymbolDictionary tracks which resource registered names have the same file contents and therefore have the same symbol names. I'll add logging around _resourcesForDataSymbolDictionary logic to note when multiple resources have been mapped to the same destination file.
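
The sharing discussed in this thread can be sketched as follows: if the data symbol is derived from a hash of the file contents, two inputs with identical bytes produce the same symbol, so only one copy needs to be emitted. The sketch is written in C for consistency with the runtime snippets earlier in the conversation, although the task itself is C#. It swaps in 64-bit FNV-1a for the SHA256 used by the task so that it stays dependency-free, and `demo_fnv1a64` and `demo_data_symbol` are hypothetical helpers, not part of the task.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a byte buffer (a stand-in for SHA256 in this sketch). */
static uint64_t demo_fnv1a64 (const uint8_t *data, size_t len)
{
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ull;
    }
    return h;
}

/*
 * Build "bundled_resource_<hex hash>" from the content alone, so the
 * symbol does not depend on the file name or registered name.
 * Returns 0 on success, -1 if `out` is too small.
 */
int demo_data_symbol (const uint8_t *data, size_t len, char *out, size_t out_size)
{
    int written = snprintf (out, out_size, "bundled_resource_%016llx",
                            (unsigned long long)demo_fnv1a64 (data, len));
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

A build step using this approach would keep a map from symbol to the list of registered names that share it, which is the role the comment above gives to _resourcesForDataSymbolDictionary.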

Member

@maraf maraf left a comment

Looks great!

@mdh1418
Member Author

mdh1418 commented Jun 13, 2023

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mdh1418
Member Author

mdh1418 commented Jun 20, 2023

Related failures:
runtime-wasm: #87423 known
runtime: #87505 fixed
Timeouts: #76454

@mdh1418 mdh1418 merged commit 615eb4a into dotnet:main Jun 20, 2023
@mdh1418 mdh1418 deleted the mono_library_mode_bundling branch June 20, 2023 19:42
@ghost ghost locked as resolved and limited conversation to collaborators Jul 21, 2023