Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stalker: Fix recompile error in Stalker when processing the same real_address block with different insn multiple times. #995

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ZSA233
Copy link

@ZSA233 ZSA233 commented Feb 20, 2025

Problem Resolution:

  1. Due to incorrect snapshot comparison by memcpy in gum_exec_ctx_obtain_block_for, the instruction changes in the same real_address could not trigger a recompile, resulting in erroneous reuse of the first block when block->storage_block branch.
  2. When performing recompile with scratch writes, if the original block cannot accommodate the space required by the instructions generated by the new transform(), the generated js_callback was not destroyed, leading to an issue.

Reproduction Scenario:

func alloc_mem_to_exec(void * start_addr, byte[] insnBytes) {
    void *mem = mmap(start_addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);

    memcpy(mem, insnBytes, len(insnBytes));

    // flush cpu cached
    __asm__ volatile(
        "dsb sy\n"
        "isb sy\n"
    );

    // call func
    func_t func = reinterpret_cast<func_t>(mem);
    func();

    munmap(mem, page_size);

}

func main(){
    void *start_addr = nullptr; // make sure using the same area
    byte[] nopRet = [
            0x1F, 0x20, 0x03, 0xD5, // nop
            0xD6, 0x5F, 0x03, 0xC0, // ret
    ];
    byte[] ret = [
            0xD6, 0x5F, 0x03, 0xC0, // ret
    ];
    alloc_mem_to_exec(start_addr, nopRet);      // first block
    alloc_mem_to_exec(start_addr, ret);         // [√] recompile
    alloc_mem_to_exec(start_addr, nopRet);      // [x] no recompile
    alloc_mem_to_exec(start_addr, ret);         // [x] no recompile
}

Test

gumstalker-arm64 has been tested, and gumstalker-arm was fixed based on it but not tested.

@oleavr oleavr force-pushed the fix/stalker-wrong-recompile branch from 7787a26 to bf204fe Compare March 12, 2025 08:12
Copy link
Member

@oleavr oleavr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Sorry it took me so long to get to this.

@@ -2586,11 +2588,12 @@ gum_exec_ctx_obtain_block_for (GumExecCtx * ctx,
{
const gint trust_threshold = ctx->stalker->trust_threshold;
gboolean still_up_to_date;
GumExecBlock *active_block = gum_exec_block_get_active_block(block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's dealing with a symptom instead of the root cause -- isn't the root cause that ctx->mappings should have been updated so it has the new block (storage_block)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's dealing with a symptom instead of the root cause -- isn't the root cause that ctx->mappings should have been updated so it has the new block (storage_block)?

Since my understanding of this project may not be very comprehensive, my previous commits made only minor modifications based on the original implementation approach — the fix works without updating ctx->mappings while maintaining the original design.

However, after hearing your explanation, I re-evaluated the code, and you are right. If we consider a more thorough modification, we could indeed try discarding the original block and updating ctx->mappings with the newly recompiled new_block, while also freeing the original block.

By doing so, ctx->scratch_slab and block->storage_block can be deprecated, making the recompilation logic clearer and more concise. Additionally, the following two cases in the original gum_exec_ctx_recompile_block implementation can be unified into a single approach (no need to use scratch to calculate and reuse block, generates a new block in any case):

  1. When the size of the recompiled scratch_block fits within the space of the original block, the space in scratch_slab is used as a draft to overwrite the original block.
  2. When the original block space is insufficient (the worst-case scenario requiring two compilations), a b jump is written into block->code_start to redirect execution to storage_block->code_start.

@@ -3109,6 +3118,7 @@ gum_stalker_iterator_put_callout (GumStalkerIterator * self,
GDestroyNotify data_destroy)
{
GumExecBlock * block = self->exec_block;
GumExecBlock * active_block = gum_exec_block_get_active_block(block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems like a symptom and not the correct fix. It seems the code generation is happening on the wrong block. Maybe it's just another symptom of failing to update the mappings?

@ZSA233 ZSA233 requested a review from oleavr March 12, 2025 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants