-
Notifications
You must be signed in to change notification settings - Fork 15
SC: Implement the core functionalities of sync calls #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -26,17 +26,21 @@ struct platform_timer { | |||
callback still execute if the timer expires and stop() is called nearly simultaneously. | |||
However, set_expiration_callback() is synchronized with the callback. | |||
*/ | |||
void set_expiration_callback(void(*func)(void*), void* user) { | |||
void set_expiration_callback(void(*func)(void*), void* user, bool appending = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the approach is deemed OK, I will replace this default parameter with regular parameter, and define an enum for the boolean, like {multi_callbacks_allowed, multi_callbacks_disallowed}
.
…nc_call entry point
…th sync_call_context to better reflect what they are and make sync_call_context fit into apply_context better
libraries/chain/wasm_interface.cpp
Outdated
@@ -91,6 +91,11 @@ namespace eosio { namespace chain { | |||
my->apply( code_hash, vm_type, vm_version, context ); | |||
} | |||
|
|||
void wasm_interface::do_sync_call( const digest_type& code_hash, const uint8_t& vm_type, const uint8_t& vm_version, apply_context& context ) { | |||
ilog("wasm_interface::do_sync_call"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a debug message left in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_callback_variables_busy.store(false, std::memory_order_release); | ||
} | ||
} | ||
|
||
std::atomic_bool _callback_variables_busy = false; | ||
void(*_expiration_callback)(void*) = nullptr; | ||
void* _expiration_callback_data; | ||
std::vector<std::pair<void(*)(void*), void*>> _expiration_callbacks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about the changes in this file some (or more generally just the semantics of timeout in the code). I am not 100% sure I completely grok the world yet to know for sure all the ramifications.
I think the code as is probably has a race. Notice how the callbacks are not called if the "spin lock" is currently held (it behaves like a "trylock"). This "lock" is held in set_expiration_callback(). So consider when a sync call completes the timed_run completes and then goes to remove its callback and during the removal of the callback the timer expires. This expiration will be ignored from EOS VM's perspective (all code sections remain executable). So then if we reenter the caller's JIT code that JIT code can no longer be timed out and can run forever.
I think for this particular example it can be resolved by ensuring we do a manual checktime()
call after the callee completes but before we return control to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking through this!
I added a checktime()
before returning to the caller. ab70d89
return; | ||
} | ||
|
||
_expiration_callbacks.push_back({func, user}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem kind of unfortunate we need to maintain a stack of callbacks when it seems like there will only effectively be 2 unique ones registered. We could remove the stack with something like a swap_expiration_callback()
and place the onus on restoring the value on whoever did the swap (much like we have a requirement whoever registers a callback today clears it). Some other approaches I can think of maybe go against JITs current design (for example instead of wiring up the callback to a timed_run we somehow wire it up directly to the backend allocator). I guess we can keep this as is for now but it'd be nice to find something better.
void do_sync_call(apply_context& context) override { | ||
backend_t bkend; | ||
typename eos_vm_runtime<Impl>::context_t exec_ctx; | ||
vm::wasm_allocator wasm_alloc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be preferable to not create and destroy this for every call; it's "expensive" from the standpoint of allocating and freeing pages each time.
Can leave like this for now but,
Have you checked that this is not leaking memory? wasm_allocator
requires an explicit free()
call to free its resources and I don't believe it's called anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasm_allocator
requires an explicitfree()
call to free its resources
Odd that it mmap
s in its constructor and doesn't have a destructor to munmap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I do suspect free() should just be moved to be the dtor. Especially since it looks like free() isn't being called anywhere. (I haven't investigated it much though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I changed to use a pre-created pool of allocators to avoid expensive allocation for each sync call.
@@ -621,6 +628,8 @@ class apply_context { | |||
bool privileged = false; | |||
bool context_free = false; | |||
|
|||
std::optional<sync_call_context> sync_call_ctx{}; // only one of act and sync_call_ctx can be present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having apply_context
having members that can be there or not, I think it would be cleaner to have something like (also that would remove the need for a separate class sync_call_context
):
struct context_base {
... // all common members
};
struct apply_context : public context_base {
uint32_t recurse_depth; ///< how deep inline actions can recurse
uint32_t first_receiver_action_ordinal = 0;
uint32_t action_ordinal = 0;
};
struct call_context : public context_base {
account_name sender;
account_name receiver;
std::span<const char> data; // includes function name, arguments, and other information
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. The Implementation section of the design document talked about this too. This PR focuses on demonstrating the feasibility of sync calls. I will create a task to track it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
action_name apply_context::get_sync_call_sender() const { | ||
// Current context's receiver is the sender of next sync call | ||
return sync_call_ctx ? sync_call_ctx->receiver : get_receiver(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an assert? Is it valid to call outside of a sync call? Also I don't understand what the comment is trying to convey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh this makes me realize -- is there a way for a sync call to see who its caller is? (and if not, is that something desirable?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have sender
argument in the sync_call
entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah okay good missed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an assert? Is it valid to call outside of a sync call? Also I don't understand what the comment is trying to convey.
A sync call be be initiated by a sync call (using sync_call_ctx) or an action (using apply_context). It is valid for this function to be called outside of a sync call.
I clarified the comments and move the function to the private section.
@@ -597,8 +601,11 @@ class apply_context { | |||
bool is_privileged()const { return privileged; } | |||
action_name get_receiver()const { return receiver; } | |||
const action& get_action()const { return *act; } | |||
const action* get_action_ptr()const { return act; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only used to assert? If so maybe the method should be something more like assert_sync_call
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the validations which disallow certain functions under sync calls are planned to be done by #1218. I updated its description.
validating_tester t; | ||
|
||
if( t.get_config().wasm_runtime == wasm_interface::vm_type::eos_vm_oc ) { | ||
// skip eos_vm_oc for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OC changes have not been done.
typename eos_vm_runtime<Impl>::context_t exec_ctx; | ||
vm::wasm_allocator wasm_alloc; | ||
|
||
const std::optional<sync_call_context> sync_call_ctx = context.get_sync_call_ctx(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems expensive to make this copy of the sync call data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} catch(eosio::vm::timeout_exception&) { | ||
context.trx_context.checktime(); | ||
} catch(eosio::vm::wasm_memory_exception& e) { | ||
FC_THROW_EXCEPTION(wasm_execution_error, "access violation: ${d}", ("d", e.detail())); | ||
} catch(eosio::vm::exception& e) { | ||
FC_THROW_EXCEPTION(wasm_execution_error, "eos-vm system failure: ${d}", ("d", e.detail())); | ||
} | ||
|
||
if (multi_expr_callbacks_allowed) { | ||
context.trx_context.checktime(); // protect against the case where during the removal of the callback, the timer expires. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought it through entirely but my hunch is this check needs to be higher up (like maybe at the end of execute_sync_call()
or such). Think about a case where the callee is being run by OC but the caller by JIT. We would need to add this checktime in both OC & JIT; or preferably just one common place higher up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to execute_sync_call for now and added a comment to the OC issue. We will find a common place when doing OC change.
@@ -152,6 +152,7 @@ class eos_vm_instantiated_module : public wasm_instantiated_module_interface { | |||
}; | |||
|
|||
execute(context, bkend, exec_ctx, wasm_alloc, fn, true); | |||
context.control.return_sync_call_wasm_allocator(); // return the wasm_allocator obtainded by get_sync_wasm_allocator back to the pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be make_scoped
so it still runs upon exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively maybe sync_call_wasm_alloc_index
can just be reset to 0 upon on a new apply context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot just reset to 0 upon on a new apply context, as the apply context can make multiple sequential sync calls. I changed to use scoped exit mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test to make 500 sync calls in a loop to verify wasm allocators are properly released after each call (we are not running out of them).
libraries/chain/controller.cpp
Outdated
@@ -1013,6 +1013,8 @@ struct controller_impl { | |||
thread_local static platform_timer timer; // a copy for main thread and each read-only thread | |||
#if defined(EOSIO_EOS_VM_RUNTIME_ENABLED) || defined(EOSIO_EOS_VM_JIT_RUNTIME_ENABLED) | |||
thread_local static vm::wasm_allocator wasm_alloc; // a copy for main thread and each read-only thread | |||
thread_local static std::vector<vm::wasm_allocator> sync_call_wasm_alloc; // used for sync call. expensive to create one for each call | |||
uint32_t sync_call_wasm_alloc_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really okay not being thread_local? Wouldn't there be a problem for sync calls in parallel run read only transactions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your good eyes! I intended it to be thread_local but did not do that. Will correct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a thread_local below in
spring/libraries/chain/controller.cpp
Line 5073 in fc22049
thread_local uint32_t sync_call_wasm_alloc_index = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 06bdd12
@@ -1301,6 +1303,9 @@ struct controller_impl { | |||
if( shutdown ) shutdown(); | |||
} ); | |||
|
|||
sync_call_wasm_alloc.resize(16); // Will change to use max_sync_call_depth of global property object in future version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this would need to be resized on each read only thread too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to do more of that work on #1257 -- it would have to resize upon the parameters changed etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, on #1257, I will do the resize changes using the new parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a task for resizing the vector as it requires some experiment. #1269
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably also dependent on if we allow privileged sync calls, since a privileged sync call might, for example, reduce the sync call depth beyond the current depth
…read_local definition for sync_call_wasm_alloc_index
…llocators in all read-only threads and the main thread
It seems like a sync call can not be privileged. I can't tell if that's intentional or not (it's not mentioned in design doc), and I don't know what to immediately think about it (i.e. does it prevent useful use cases of sync calls?) |
@arhag, any comments? |
@linh2931 which PR do you want to plumb in privileged support? |
std::optional<sync_call_context> sync_call_ctx{}; // only one of act and sync_call_ctx can be present | ||
|
||
// Returns the sender of any sync call initiated by this apply_context or its sync_call_ctx | ||
action_name get_sync_call_sender() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange that this is action_name
for a name that isn't an action. But I see it's the same for get_sender()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #1273, get_sync_call_sender()
is replaced with the virtual function get_sender()
. Maybe the return type action_name
should be changed to account_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and it's still action_name
there too. Maybe could change it in that PR since so much else is being changed.
Note:start |
This PR implements the core functionalities of sync calls, in particular the
call
host function. The Unit tests show basic call flows are working:Notes for reviewing:
libraries/chain/webassembly/sync_call.cpp
,libraries/chain/apply_context.cpp
andlibraries/chain/webassembly/runtimes/eos-vm.cpp
to revieweosvm-oc
not working yet. Make OC allow for nested Executor execution #1043 (sync call unit tests skip eosvm-oc)transaction_checktime_timer
related changes are correctiterator_cache
inapply_context
after a sync call #1252Resolves #1214