Skip to content

Conversation

linh2931
Copy link
Member

@linh2931 linh2931 commented Mar 12, 2025

This PR implements the core functionalities of sync calls, in particular the call host function. The Unit tests show basic call flows are working:

  • Sync calls whose receivers are the same as the senders
  • Sync calls whose receivers are different from the senders
  • Nested sync calls
  • Recursive sync calls
  • Sequential sync calls

Notes for reviewing:

  • Please roughly follow the order of libraries/chain/webassembly/sync_call.cpp, libraries/chain/apply_context.cpp and libraries/chain/webassembly/runtimes/eos-vm.cpp to review
  • eosvm-oc not working yet. Make OC allow for nested Executor execution #1043 (sync call unit tests skip eosvm-oc)
  • Please double check if transaction_checktime_timer related changes are correct
  • Tests will be updated using return values and console output after sync call return value and call trace are implemented (next immediate tasks)
  • Reset of apply_context iterator caches is tracked by a separate issue for keeping this PR focused on main call flow SC: Invalidate iterator_cache in apply_context after a sync call #1252

Resolves #1214

@linh2931 linh2931 requested review from spoonincode and heifner March 12, 2025 19:42
@@ -26,17 +26,21 @@ struct platform_timer {
callback still execute if the timer expires and stop() is called nearly simultaneously.
However, set_expiration_callback() is synchronized with the callback.
*/
void set_expiration_callback(void(*func)(void*), void* user) {
void set_expiration_callback(void(*func)(void*), void* user, bool appending = false) {
Copy link
Member Author

@linh2931 linh2931 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the approach is deemed OK, I will replace this default parameter with regular parameter, and define an enum for the boolean, like {multi_callbacks_allowed, multi_callbacks_disallowed}.

@@ -91,6 +91,11 @@ namespace eosio { namespace chain {
my->apply( code_hash, vm_type, vm_version, context );
}

void wasm_interface::do_sync_call( const digest_type& code_hash, const uint8_t& vm_type, const uint8_t& vm_version, apply_context& context ) {
ilog("wasm_interface::do_sync_call");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a debug message left in

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_callback_variables_busy.store(false, std::memory_order_release);
}
}

std::atomic_bool _callback_variables_busy = false;
void(*_expiration_callback)(void*) = nullptr;
void* _expiration_callback_data;
std::vector<std::pair<void(*)(void*), void*>> _expiration_callbacks;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about the changes in this file some (or more generally just the semantics of timeout in the code). I am not 100% sure I completely grok the world yet to know for sure all the ramifications.

I think the code as is probably has a race. Notice how the callbacks are not called if the "spin lock" is currently held (it behaves like a "trylock"). This "lock" is held in set_expiration_callback(). So consider when a sync call completes the timed_run completes and then goes to remove its callback and during the removal of the callback the timer expires. This expiration will be ignored from EOS VM's perspective (all code sections remain executable). So then if we reenter the caller's JIT code that JIT code can no longer be timed out and can run forever.

I think for this particular example it can be resolved by ensuring we do a manual checktime() call after the callee completes but before we return control to the caller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for thinking through this!

I added a checktime() before returning to the caller. ab70d89

return;
}

_expiration_callbacks.push_back({func, user});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem kind of unfortunate we need to maintain a stack of callbacks when it seems like there will only effectively be 2 unique ones registered. We could remove the stack with something like a swap_expiration_callback() and place the onus on restoring the value on whoever did the swap (much like we have a requirement whoever registers a callback today clears it). Some other approaches I can think of maybe go against JITs current design (for example instead of wiring up the callback to a timed_run we somehow wire it up directly to the backend allocator). I guess we can keep this as is for now but it'd be nice to find something better.

void do_sync_call(apply_context& context) override {
backend_t bkend;
typename eos_vm_runtime<Impl>::context_t exec_ctx;
vm::wasm_allocator wasm_alloc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to not create and destroy this for every call; it's "expensive" from the standpoint of allocating and freeing pages each time.

Can leave like this for now but,

Have you checked that this is not leaking memory? wasm_allocator requires an explicit free() call to free its resources and I don't believe it's called anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasm_allocator requires an explicit free() call to free its resources

Odd that it mmaps in its constructor and doesn't have a destructor to munmap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I do suspect free() should just be moved to be the dtor. Especially since it looks like free() isn't being called anywhere. (I haven't investigated it much though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed to use a pre-created pool of allocators to avoid expensive allocation for each sync call.

48c3e59

@@ -621,6 +628,8 @@ class apply_context {
bool privileged = false;
bool context_free = false;

std::optional<sync_call_context> sync_call_ctx{}; // only one of act and sync_call_ctx can be present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having apply_context having members that can be there or not, I think it would be cleaner to have something like (also that would remove the need for a separate class sync_call_context):

struct context_base {
   ...  // all common members
};

struct apply_context : public context_base {
   uint32_t recurse_depth; ///< how deep inline actions can recurse
   uint32_t first_receiver_action_ordinal = 0;
   uint32_t action_ordinal                = 0;
};

struct call_context : public context_base {
   account_name          sender;
   account_name          receiver;
   std::span<const char> data; // includes function name, arguments, and other information
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The Implementation section of the design document talked about this too. This PR focuses on demonstrating the feasibility of sync calls. I will create a task to track it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


action_name apply_context::get_sync_call_sender() const {
// Current context's receiver is the sender of next sync call
return sync_call_ctx ? sync_call_ctx->receiver : get_receiver();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an assert? Is it valid to call outside of a sync call? Also I don't understand what the comment is trying to convey.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh this makes me realize -- is there a way for a sync call to see who its caller is? (and if not, is that something desirable?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have sender argument in the sync_call entry point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay good missed that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an assert? Is it valid to call outside of a sync call? Also I don't understand what the comment is trying to convey.

A sync call be be initiated by a sync call (using sync_call_ctx) or an action (using apply_context). It is valid for this function to be called outside of a sync call.

I clarified the comments and move the function to the private section.

c2f95c0

@@ -597,8 +601,11 @@ class apply_context {
bool is_privileged()const { return privileged; }
action_name get_receiver()const { return receiver; }
const action& get_action()const { return *act; }
const action* get_action_ptr()const { return act; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used to assert? If so maybe the method should be something more like assert_sync_call.

Copy link
Member Author

@linh2931 linh2931 Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the validations which disallow certain functions under sync calls are planned to be done by #1218. I updated its description.

validating_tester t;

if( t.get_config().wasm_runtime == wasm_interface::vm_type::eos_vm_oc ) {
// skip eos_vm_oc for now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OC changes have not been done.

typename eos_vm_runtime<Impl>::context_t exec_ctx;
vm::wasm_allocator wasm_alloc;

const std::optional<sync_call_context> sync_call_ctx = context.get_sync_call_ctx();
Copy link
Member

@heifner heifner Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems expensive to make this copy of the sync call data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} catch(eosio::vm::timeout_exception&) {
context.trx_context.checktime();
} catch(eosio::vm::wasm_memory_exception& e) {
FC_THROW_EXCEPTION(wasm_execution_error, "access violation: ${d}", ("d", e.detail()));
} catch(eosio::vm::exception& e) {
FC_THROW_EXCEPTION(wasm_execution_error, "eos-vm system failure: ${d}", ("d", e.detail()));
}

if (multi_expr_callbacks_allowed) {
context.trx_context.checktime(); // protect against the case where during the removal of the callback, the timer expires.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought it through entirely but my hunch is this check needs to be higher up (like maybe at the end of execute_sync_call() or such). Think about a case where the callee is being run by OC but the caller by JIT. We would need to add this checktime in both OC & JIT; or preferably just one common place higher up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to execute_sync_call for now and added a comment to the OC issue. We will find a common place when doing OC change.

fc22049

@@ -152,6 +152,7 @@ class eos_vm_instantiated_module : public wasm_instantiated_module_interface {
};

execute(context, bkend, exec_ctx, wasm_alloc, fn, true);
context.control.return_sync_call_wasm_allocator(); // return the wasm_allocator obtainded by get_sync_wasm_allocator back to the pool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be make_scoped so it still runs upon exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively maybe sync_call_wasm_alloc_index can just be reset to 0 upon on a new apply context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot just reset to 0 upon on a new apply context, as the apply context can make multiple sequential sync calls. I changed to use scoped exit mechanism.

06bdd12

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test to make 500 sync calls in a loop to verify wasm allocators are properly released after each call (we are not running out of them).

b1cbe35

@@ -1013,6 +1013,8 @@ struct controller_impl {
thread_local static platform_timer timer; // a copy for main thread and each read-only thread
#if defined(EOSIO_EOS_VM_RUNTIME_ENABLED) || defined(EOSIO_EOS_VM_JIT_RUNTIME_ENABLED)
thread_local static vm::wasm_allocator wasm_alloc; // a copy for main thread and each read-only thread
thread_local static std::vector<vm::wasm_allocator> sync_call_wasm_alloc; // used for sync call. expensive to create one for each call
uint32_t sync_call_wasm_alloc_index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really okay not being thread_local? Wouldn't there be a problem for sync calls in parallel run read only transactions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your good eyes! I intended it to be thread_local but did not do that. Will correct it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a thread_local below in

thread_local uint32_t sync_call_wasm_alloc_index = 0;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 06bdd12

@@ -1301,6 +1303,9 @@ struct controller_impl {
if( shutdown ) shutdown();
} );

sync_call_wasm_alloc.resize(16); // Will change to use max_sync_call_depth of global property object in future version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this would need to be resized on each read only thread too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to do more of that work on #1257 -- it would have to resize upon the parameters changed etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, on #1257, I will do the resize changes using the new parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the initialization of sync_call_wasm_alloc on read threads at start up 06bdd12. Will update it in #1257 with actual sync_call_depth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a task for resizing the vector as it requires some experiment. #1269

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also dependent on if we allow privileged sync calls, since a privileged sync call might, for example, reduce the sync call depth beyond the current depth

@spoonincode
Copy link
Member

It seems like a sync call can not be privileged. I can't tell if that's intentional or not (it's not mentioned in design doc), and I don't know what to immediately think about it (i.e. does it prevent useful use cases of sync calls?)

@linh2931
Copy link
Member Author

It seems like a sync call can not be privileged. I can't tell if that's intentional or not (it's not mentioned in design doc), and I don't know what to immediately think about it (i.e. does it prevent useful use cases of sync calls?)

@arhag, any comments?

@spoonincode
Copy link
Member

@linh2931 which PR do you want to plumb in privileged support?

@linh2931 linh2931 mentioned this pull request Mar 26, 2025
@linh2931
Copy link
Member Author

@linh2931 which PR do you want to plumb in privileged support?

I plan to do it in the restructured sync_call_context class and need to consider changing max_sync_call_depth while in a sync call (probably not allow). I just created an issue to track the task
#1279

std::optional<sync_call_context> sync_call_ctx{}; // only one of act and sync_call_ctx can be present

// Returns the sender of any sync call initiated by this apply_context or its sync_call_ctx
action_name get_sync_call_sender() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange that this is action_name for a name that isn't an action. But I see it's the same for get_sender()

Copy link
Member Author

@linh2931 linh2931 Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1273, get_sync_call_sender() is replaced with the virtual function get_sender(). Maybe the return type action_name should be changed to account_name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah and it's still action_name there too. Maybe could change it in that PR since so much else is being changed.

@linh2931 linh2931 merged commit e52828d into sync_call Mar 27, 2025
36 checks passed
@linh2931 linh2931 deleted the call_implementation branch March 27, 2025 11:52
@ericpassmore
Copy link
Contributor

Note:start
category: Other
component: SyncCalls
summary: Initial commit of core functionality for sync calls.
Note:end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants