Skip to content

Conversation

andreakarasho
Copy link
Contributor

@andreakarasho andreakarasho commented Sep 23, 2025

I'm not very satisfied with #1389 because as I wrote, other languages still leaking.
I saw the existence of the following instructions which seems to be not used (or at least I didn't find a way to trigger them).

/// Used exclusively for guest-code generation this indicates that
/// the standard memory deallocation function needs to be invoked with
/// the specified parameters.
///
/// This will pop a pointer from the stack and push nothing.
GuestDeallocate {
size: ArchitectureSize,
align: Alignment,
} : [1] => [0],
/// Used exclusively for guest-code generation this indicates that
/// a string is being deallocated. The ptr/length are on the stack and
/// are poppped off and used to deallocate the string.
GuestDeallocateString : [2] => [0],
/// Used exclusively for guest-code generation this indicates that
/// a list is being deallocated. The ptr/length are on the stack and
/// are poppped off and used to deallocate the list.
///
/// This variant also pops a block off the block stack to be used as the
/// body of the deallocation loop.
GuestDeallocateList {
element: &'a Type,
} : [2] => [0],
/// Used exclusively for guest-code generation this indicates that
/// a variant is being deallocated. The integer discriminant is popped
/// off the stack as well as `blocks` number of blocks popped from the
/// blocks stack. The variant is used to select, at runtime, which of
/// the blocks is executed to deallocate the variant.
GuestDeallocateVariant {
blocks: usize,
} : [1] => [0],

My proposal is quite simple: parse all sign parameters, push them to the stack and deallocate. This should solve all memory leaks for all languages (a revert of #1389 is necessary btw).

I'm not sure about that Deallocate::Lists parameter.
Also the generation fails when the sign is marked as indirect_params, not sure how to do procede yet.

Appreciate a feedback!

@jsturtevant @yowl @pavelsavara

@jsturtevant
Copy link
Collaborator

@alexcrichton might be the best to give some insight here as to whether this is possible or if there are other constraints

@andreakarasho
Copy link
Contributor Author

About indirect_params, the deallocation seems to happen here https://github.com/bytecodealliance/wit-bindgen/blob/main/crates/core/src/abi.rs#L1167

@alexcrichton
Copy link
Member

Sorry I haven't kept up-to-date on the other PR referenced here. Can you describe why you think other languages are already leaking and how this solves things? We have some tests for "shouldn't leak" in various places, although they're not comprehensive so it doesn't mean something didn't slip through the cracks. Given all the test failures here though I'm not sure that this PR as-is will work?

@andreakarasho
Copy link
Contributor Author

According to the wit-bindgen doc for the exported functions, the guest must free the memory

- **Arguments**: Your component receives ownership of all arguments passed to the exported function and is responsible for freeing them.

The PR aims to parse all function parameters and emit a deallocation instruction when possible.
I strongly think this should be the correct behaviour because will make the application intent clear.
My solution it's probably wrong and can be done in a better way (I'm not a Rust dev neither I have enough knownledge of the project to make advanced PRs). That's the reason why I left the PR in draft and asked for feedbacks. I surely need a way to setup tests correclty on my side because they were failing also on a clean branch.

Example

Giving the following wit file:

package tests:component@0.1.0;

world test {
    export accept-string: func(s: string, b: string);
}

C# was leaking

See:
Problem: #1377
Solution: #1389

C is leaking

Missing freecalls in __wasm_export_exports_test_accept_string function.

// Generated by `wit-bindgen` 0.46.0. DO NOT EDIT!
#include "test.h"
#include <stdlib.h>
#include <string.h>

// Exported Functions from `test`


// Canonical ABI intrinsics

__attribute__((__weak__, __export_name__("cabi_realloc")))
void *cabi_realloc(void *ptr, size_t old_size, size_t align, size_t new_size) {
  (void) old_size;
  if (new_size == 0) return (void*) align;
  void *ret = realloc(ptr, new_size);
  if (!ret) abort();
  return ret;
}

// Helper Functions

void test_string_set(test_string_t *ret, const char*s) {
  ret->ptr = (uint8_t*) s;
  ret->len = strlen(s);
}

void test_string_dup(test_string_t *ret, const char*s) {
  ret->len = strlen(s);
  ret->ptr = (uint8_t*) cabi_realloc(NULL, 0, 1, ret->len * 1);
  memcpy(ret->ptr, s, ret->len * 1);
}

void test_string_free(test_string_t *ret) {
  if (ret->len > 0) {
    free(ret->ptr);
  }
  ret->ptr = NULL;
  ret->len = 0;
}

// Component Adapters

__attribute__((__export_name__("accept-string")))
void __wasm_export_exports_test_accept_string(uint8_t * arg, size_t arg0, uint8_t * arg1, size_t arg2) {
  test_string_t arg3 = (test_string_t) { (uint8_t*)(arg), (arg0) };
  test_string_t arg4 = (test_string_t) { (uint8_t*)(arg1), (arg2) };
  exports_test_accept_string(&arg3, &arg4);
}

// Ensure that the *_component_type.o object is linked in

extern void __component_type_object_force_link_test(void);
__attribute__((used))
void __component_type_object_force_link_test_public_use_in_this_compilation_unit(void) {
  __component_type_object_force_link_test();
}

CPP is not leaking

If I read correctly the wit::string class disposes the pointer once a wit:string instance gets out of scope thanks to the .dctor

// Generated by `wit-bindgen` 0.46.0. DO NOT EDIT!

// Ensure that the *_component_type.o object is linked in
#ifdef __wasm32__
extern "C" void __component_type_object_force_link_test(void);
__attribute__((used))
void __component_type_object_force_link_test_public_use_in_this_compilation_unit(void) {
  __component_type_object_force_link_test();
}
#endif
#include "test_cpp.h"
#include <cstdlib> // realloc

extern "C" void *cabi_realloc(void *ptr, size_t old_size, size_t align, size_t new_size);

__attribute__((__weak__, __export_name__("cabi_realloc")))
void *cabi_realloc(void *ptr, size_t old_size, size_t align, size_t new_size) {
  (void) old_size;
  if (new_size == 0) return (void*) align;
  void *ret = realloc(ptr, new_size);
  if (!ret) abort();
  return ret;
}


extern "C" __attribute__((__export_name__("accept-string")))
void accept_string(uint8_t * arg0, size_t arg1, uint8_t * arg2, size_t arg3)
{
  auto len0 = arg1;

  auto len1 = arg3;

  exports::test::AcceptString(wit::string((char const*)(arg0), len0), wit::string((char const*)(arg2), len1));
}

// Component Adapters
/// A string in linear memory, freed unconditionally using free
///
/// A normal C++ string makes no guarantees about where the characters
/// are stored and how this is freed.
class string {
  uint8_t const *data_;
  size_t length;
  // C++ is horrible!
  //constexpr uint8_t const *const empty_ptr = (uint8_t const *)1;
  static uint8_t const* empty_ptr() { return (uint8_t const *)1; }

public:
  // this constructor is helpful for creating vector<string>
  string(string const &b) : string(string::from_view(b.get_view())) {}
  string(string &&b) : data_(b.data_), length(b.length) { b.data_ = nullptr; }
  string &operator=(string const &) = delete;
  string &operator=(string &&b) {
    if (data_ && data_!=empty_ptr()) {
      free(const_cast<uint8_t *>(data_));
    }
    data_ = b.data_;
    length = b.length;
    b.data_ = nullptr;
    return *this;
  }
  string(char const *d, size_t l) : data_((uint8_t const *)d), length(l) {}
  char const *data() const { return (char const *)data_; }
  size_t size() const { return length; }
  bool empty() const { return !length; }
  ~string() {
    if (data_ && data_!=empty_ptr()) {
      free(const_cast<uint8_t *>(data_));
    }
  }
  // leak the memory
  void leak() { data_ = nullptr; }
  // typically called by post
  static void drop_raw(void *ptr) { free(ptr); }
  std::string_view get_view() const {
    return std::string_view((const char *)data_, length);
  }
  std::string to_string() const {
    return std::string((const char *)data_, length);
  }
  static string from_view(std::string_view v) {
    if (!v.size()) return string((char const*)empty_ptr(), 0);
    char* addr = (char*)malloc(v.size());
    memcpy(addr, v.data(), v.size());
    return string(addr, v.size());
  }
  char* begin() {
    return (char*)data_;
  }
  char* end() {
    return (char*)data_ + length;
  }
  char const* begin() const {
    return (char const*)data_;
  }
  char const* end() const {
    return (char const*)data_ + length;
  }
};

Rust

I'm not sure what happens here and if the borrow checker is involved somehow

@alexcrichton
Copy link
Member

Aha ok thanks for the explanation! I think I see what's going on here.

The intent is that the pointer/length used to create a string when receiving it from the outside world in this case is an owned allocation. The guest language, in this case the bindings generator, is responsible for freeing it. This is why Rust/C++ do not leak is because they have codified that they take ownership of the allocation. Rust, for example, avoids copying the string as a result and uses the allocation in-place.

Your example with C does indeed leak but that's because it doesn't explicitly call a deallocation function. I realize that what I'm saying here is "your C code leaks because you didn't write it to not leak" which can feel a bit odd, but C isn't a great example of leaks/resource management since everything is manual anyway so the semantics of the bindings generator kind of don't matter. Effectively to write correct C you have to have all the rules in your head at all times.

For C# I think the correct behavior here will be to convert the pointer/length to a C#-native string and then deallocate the original allocation. That should prevent leaks and is what's required if C# can't use the pointer/length as-is at rest. Given the configurability of what can be done with the pointer allocation, I'd prefer to leave this up to individual bindings generators to decide what to do rather than codify that it's always free'd.

@tschneidereit
Copy link
Member

To give an example of what Alex is talking about, in the StarlingMonkey JS runtime we use the C bindings, but create JS strings directly from the pointer/length pair without copying. If the bindings deallocated unconditionally, we'd be forced to do an extra copy instead

@yowl
Copy link
Collaborator

yowl commented Sep 29, 2025

Makes sense, as in c# we are using .net strings we should deallocate the passed in ptr (from the ptr/length) . I don't believe we can use it in place as a .net string - it would need to be tracked by the GC. We could however have a new type of string, perhaps as a flag to the bindgen, that would be more optimial, using the ptr in place, but losing the .net string goodness. It would then be up to the consumer if they wanted to take the hit of the deallocation/allocation as a tax for getting a .net string.

Not suggesting we add that here, just a thought for the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants