rename short identifiers #21

Closed
3052 opened this issue Nov 5, 2023 · 10 comments · Fixed by #100
Labels
enhancement, mangle

Comments

@3052

3052 commented Nov 5, 2023

is it possible to correct short variable names? for example with JADX, it has this option:

 --deobf                             - activate deobfuscation

which will turn short variables such as a into a1234 or something, for easier searching

3052 changed the title from "fix short variable names" to "rename short identifiers" on Nov 5, 2023
@3052
Author

3052 commented Nov 5, 2023

found another tool that works:

synchrony deobfuscate --rename obf.js

https://github.com/relative/synchrony

@j4k0xb
Owner

j4k0xb commented Nov 5, 2023

there's currently the --mangle option for renaming but that does the opposite... turning _0x2d22bf into a
do you want every variable/param to have a unique name?

j4k0xb added the enhancement label on Nov 5, 2023
@3052
Author

3052 commented Nov 5, 2023

if the variable already has a "normal" name such as hello, we can probably just leave it as is. but if the name has been "minified" such as a, then it should be made longer. ideally variables should be different even when names could be reused, such as two variables in different scopes. this would prevent confusion thinking two variables are the same because of the identifier.
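A rough sketch of that rule, just to make it concrete. The two-character threshold in looksMinified and the a1/a2 naming scheme are assumptions for illustration, not what webcrack, JADX, or synchrony actually do, and scopes are simplified to plain arrays of declared names:

```javascript
// Hypothetical heuristic: treat 1-2 character identifiers as "minified".
// The threshold is an assumption; common short names like `i` would also match.
function looksMinified(name) {
  return name.length <= 2;
}

// Build a rename plan for a list of scopes, where each scope is an array of
// the identifier names declared in it. Every minified name gets a replacement
// that is unique across ALL scopes, even when the short name is reused.
function planRenames(scopes) {
  const taken = new Set(scopes.flat()); // never collide with existing names
  let counter = 0;
  return scopes.map((names) => {
    const renames = new Map();
    for (const name of names) {
      if (!looksMinified(name)) continue; // leave "normal" names as they are
      let candidate;
      do {
        counter += 1;
        candidate = `${name}${counter}`;
      } while (taken.has(candidate));
      taken.add(candidate);
      renames.set(name, candidate);
    }
    return renames;
  });
}

const plan = planRenames([
  ["a", "hello"], // outer scope
  ["a", "b"], // inner scope that reuses `a`
]);
console.log(plan.map((renames) => Object.fromEntries(renames)));
```

The two `a` variables end up with different names, so a text search for one never lands on the other.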

@0xdevalias

0xdevalias commented Nov 13, 2023

When I was exploring this concept in my own deobfuscation PoC project, I was looking at making the variable names unique and having them carry some semantic information about their source/scope.

Eg. if it was an arg to a function, it might be arg_1. Or potentially if the function is foo, it might end up as foo_arg_1
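As a minimal sketch of that naming scheme (the helper names contextualName and nameParams are made up for this example, and are not the functions from my PoC):

```javascript
// Hypothetical helper: build a name from the kind of binding, its 1-based
// position, and (when known) the name of the enclosing function.
function contextualName(kind, index, ownerName) {
  const base = `${kind}_${index}`;
  return ownerName ? `${ownerName}_${base}` : base;
}

// Name every parameter of a function after its position.
function nameParams(params, functionName) {
  return params.map((_, i) => contextualName("arg", i + 1, functionName));
}

const withOwner = nameParams(["e", "t"], "foo");
const withoutOwner = nameParams(["e"]);
console.log(withOwner, withoutOwner);
```

Here the parameters of foo become foo_arg_1 and foo_arg_2, while a parameter of an anonymous function falls back to arg_1.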

It looks like most of the PoC code I was playing with was local/in a pretty messy/hacky state, but I did find a link in it to an online REPL I was playing around with some of it in. Not sure how outdated that code is, but it might be useful:

There were a number of different AST parsers I was playing around with, but I think that this babel code may have been the latest (not sure which one):

Within those files, I believe the functions getNameFromPath, getPrefix (and older commented out functions getTypePrefix, getPrefix


Edit: Came across this in another issue here:

I published my decompiler that I used in the above example. I think it might be a good reference for adding this feature.
https://github.com/e9x/krunker-decompiler

Originally posted by @e9x in #10 (comment)

And looking at it, its libRenameVars code seems to take a vaguely similar approach to what I was doing in my original PoC that I described above:

  • https://github.com/e9x/krunker-decompiler/blob/master/src/libRenameVars.ts
    • getVarPrefix will set a prefix based on the type (eg. func, arg, Class, imported, var)
    • getName generates a new variable name that does not conflict with existing names or reserved keywords
    • generateName generates a new name for a variable considering its scope, type, and the context in which it is used (e.g., whether it's a class, a function variable, etc.).
      It employs various AST manipulations to ensure the generated name is appropriate and does not conflict with existing names.
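To illustrate the "does not conflict with existing names or reserved keywords" part, here is a small sketch. It is not the actual libRenameVars code: safeName is a hypothetical helper, the reserved word list is a deliberately tiny subset, and the underscore prefix and numeric suffix are just one possible convention:

```javascript
// A small subset of JavaScript reserved words, for illustration only.
const RESERVED = new Set(["class", "function", "var", "let", "const", "return", "new", "delete", "default"]);

// Hypothetical helper: return a name based on `desired` that is neither a
// reserved word nor already present in `used`, and record it as used.
function safeName(desired, used) {
  const base = RESERVED.has(desired) ? `_${desired}` : desired;
  let candidate = base;
  for (let n = 2; used.has(candidate); n += 1) {
    candidate = `${base}${n}`; // append a counter until the name is free
  }
  used.add(candidate);
  return candidate;
}

const used = new Set(["func"]);
const first = safeName("class", used);
const second = safeName("func", used);
const third = safeName("func", used);
console.log(first, second, third);
```

With "func" already taken, the three calls produce _class, func2, and func3.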

A more generalised summary/overview (via ChatGPT):

Certainly, the code implements a sophisticated algorithm for renaming variables in a JavaScript program, adhering to several high-level rules and strategies:

  1. Type-Specific Prefixing:

    • The getVarPrefix function assigns specific prefixes to variable names based on their type (e.g., "func" for function names, "arg" for parameters). This approach helps in identifying the role of a variable just by its name.
  2. Avoiding Reserved Keywords:

    • The script includes a comprehensive list of reserved JavaScript keywords. If a variable's name matches a reserved keyword, it is prefixed with an underscore to prevent syntax errors.
  3. Unique Naming with Context Consideration:

    • The generateName function ensures that each variable gets a unique name that doesn't conflict with other variables in its scope. It also considers the context in which a variable is used. For example, if a variable is part of a class, it may receive a name that reflects this context, using pascalCase or camelCase as appropriate.
  4. Handling Special Cases:

    • The script contains logic to handle special cases, such as variables that are function expressions (isFuncVar) or class instances (isClass). This affects the naming convention applied to these variables.
  5. Randomness with Mersenne Twister:

    • A Mersenne Twister is used to generate random elements for variable names, ensuring that the names are not only unique within the scope of the program but also less predictable.
  6. AST-Based Renaming:

    • The script analyzes the Abstract Syntax Tree (AST) of the program to understand the structure and scope of variables. This analysis guides the renaming process, ensuring that the new names are consistent with the variable's usage and position in the code.
  7. Scope Analysis with ESLint Scope:

    • By leveraging eslint-scope, the script can accurately determine the scope of each variable. This is crucial in avoiding name collisions and ensuring that the renaming respects lexical scoping rules in JavaScript.
  8. Consideration for Exported and Assigned Variables:

    • The script pays special attention to variables that are exported or assigned in specific ways (e.g., through Object.defineProperty). It ensures that these variables receive names that are appropriate for their roles.

In summary, the script uses a combination of type-based naming conventions, context consideration, randomness, AST analysis, and scope analysis to systematically rename variables in a JavaScript program. This approach aims to enhance readability, avoid conflicts, and maintain the logical structure of the program.

@0xdevalias

0xdevalias commented Nov 13, 2023

And for an even cooler/more extreme version of improving variable naming; I just came across this blog post / project from @jehna that makes use of webcrack + ChatGPT for variable renaming:

  • https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification.html
    • Using LLMs to reverse JavaScript variable name minification
      This blog introduces a novel way to reverse minified Javascript using large language models (LLMs) like ChatGPT and llama2 while keeping the code semantically intact. The code is open source and available at Github project Humanify.

  • https://github.com/jehna/humanify
    • Un-minify Javascript code using ChatGPT

    • This tool uses large language models (like ChatGPT & llama2) and other tools to un-minify Javascript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel on AST level to ensure code stays 1-1 equivalent.

@0xdevalias

I came across another tool today that seemed to have a start on implementing some 'smart rename' features:

Digging through the code led me to this:

There's also an issue there that seems to be exploring how to improve 'unmangling variable names' as well:

Which I wrote the following extra thoughts on:

I just finished up writing some thoughts/references for variable renaming on the webcrack repo, that could also be a useful idea for here. (see quotes below)

When I was exploring PoC ideas for my own project previously, I was looking to generate a file similar to the 'module map' that this project is using; but instead of just for the names of modules, I wanted to be able to use it to provide a 'variable name map'. Though because the specific variables used in webpack/etc can change between builds, my thought was that first 'normalising' them to a 'known format' based on their context would make sense to do first.

That could then be later enhanced/expanded by being able to pre-process these 'variable name mappings' for various open source projects in a way that could then be applied 'automagically' without the end user needing to first create them.

It could also be enhanced by similar techniques such as what the humanify project does, by using LLMs/similar to generate suggested variable name mappings based on the code.

My personal ideal end goal for a feature like that would then allow me to use it within an IDE-like environment, where I can rename variables 'as I explore', knowing that the mappings/etc will be kept up to date.
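The 'normalise first, then apply a variable name map' idea above could look roughly like this. Everything here is an assumption for illustration: the shape of the binding object, the key format, and the function name "render" are invented, and no such mapping file format exists in webcrack or wakaru as far as I know:

```javascript
// Step 1: derive a stable, build-independent name from a binding's context,
// so the same variable gets the same key even if the minifier renames it.
// The `binding` shape ({ moduleId, functionName, kind, index }) is hypothetical.
function normalisedName(binding) {
  return `m${binding.moduleId}_${binding.functionName}_${binding.kind}_${binding.index}`;
}

// Step 2: look the normalised name up in a maintained 'variable name map',
// falling back to the normalised name when there is no entry yet.
function resolveName(binding, nameMap) {
  const key = normalisedName(binding);
  return nameMap.get(key) ?? key;
}

const nameMap = new Map([["m63390_render_arg_1", "props"]]);
const known = resolveName({ moduleId: "63390", functionName: "render", kind: "arg", index: 1 }, nameMap);
const unknown = resolveName({ moduleId: "63390", functionName: "render", kind: "arg", index: 2 }, nameMap);
console.log(known, unknown);
```

The first binding resolves to props via the map, while the second keeps its normalised name m63390_render_arg_2 until someone adds a mapping for it.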

Originally posted by @0xdevalias in pionxzh/wakaru#34 (comment)

@0xdevalias

0xdevalias commented Nov 22, 2023

Another link from my reference notes that I forgot to include earlier; my thoughts on how to rename otherwise unknown variables are based on similar concepts that are used in reverse engineering tools such as IDA:

  • https://hex-rays.com/blog/igors-tip-of-the-week-34-dummy-names/
    • In IDA’s disassembly, you may have often observed names that may look strange and cryptic on first sight: sub_73906D75, loc_40721B, off_40A27C and more. In IDA’s terminology, they’re called dummy names. They are used when a name is required by the assembly syntax but there is nothing suitable available

    • https://www.hex-rays.com/products/ida/support/idadoc/609.shtml
      • IDA Help: Names Representation

      • Dummy names are automatically generated by IDA. They are used to denote subroutines, program locations and data. Dummy names have various prefixes depending on the item type and value
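As a sketch of how such dummy names are put together (prefix from the item type, suffix from the address), using the three prefixes from the examples above. The kind keys "subroutine", "location", and "offset" are labels I chose for this example, not IDA terminology taken from its API:

```javascript
// Prefixes taken from the sub_/loc_/off_ examples quoted above; the keys
// used to select them are assumptions made for this sketch.
const DUMMY_PREFIXES = { subroutine: "sub", location: "loc", offset: "off" };

// Build an IDA-style dummy name: <prefix>_<ADDRESS IN UPPERCASE HEX>.
function dummyName(kind, address) {
  const prefix = DUMMY_PREFIXES[kind];
  if (!prefix) throw new Error(`unknown item kind: ${kind}`);
  return `${prefix}_${address.toString(16).toUpperCase()}`;
}

const subName = dummyName("subroutine", 0x73906d75);
const locName = dummyName("location", 0x40721b);
console.log(subName, locName);
```

The same pattern would carry over to JavaScript by swapping the address for something stable like a source position or a per-scope counter.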


And a few more I was looking at recently as well (which are basically a form of smart-rename):

  • https://binary.ninja/2023/09/15/3.5-expanded-universe.html#automatic-variable-naming
    • Automatic Variable Naming
      One easy way to improve decompilation output is to come up with better default names for variables. There’s a lot of possible defaults you could choose and a number of different strategies are seen throughout different reverse engineering tools. Prior to 3.5, Binary Ninja left variables named based on their origin. Stack variables were var_OFFSET, register-based variables were reg_COUNTER, and global data variables were (data_). While this scheme isn’t changing, we’re being much more intelligent about situations where additional information is available.

      For example, if a variable is passed to a function and a variable name is available, we can now make a much better guess for the variable name. This is most obvious in binaries with type libraries.

    • This isn’t the only style of default names. Binary Ninja also will name loop counters with simpler names like i, or j, k, etc (in the case of nested loops)

  • Variable name propagation Vector35/binaryninja-api#2558

Originally posted by @0xdevalias in pionxzh/wakaru#34 (comment)

@0xdevalias

0xdevalias commented Dec 20, 2023

Tangentially related to this issue, and in line with how wakaru implements 'smart-rename's (Ref) for certain things; I wonder if a similar concept could apply in webcrack.

Based on how all of the functions containing JSX seem to be named a variation of Component, I suspect there may already be some code doing this. (eg. Ref: 1, 2)


Regardless, the specific case I wanted to suggest here was when a React component sets the Component.displayName, and leveraging that to 'smart-rename' the component identifier itself.

Unminifying this source file (Ref), in 63390.js, there are some React components that set the .displayName

// 63390.js, lines 191-194
var _Component67 = forwardRef(function (e, t) {
  return <div ref={t} className={_Z("relative flex h-full w-full overflow-hidden", e.className)}>{e.children}</div>;
});
_Component67.displayName = "CarouselContainer";

Contrasting this against wakaru's output (Ref):


Source (unpacked)

// module-63390.js, lines 279-283
var er = (0, d.forwardRef)(function (e, t) {
  return (0,
  o.jsx)("div", { ref: t, className: (0, l.Z)("relative flex h-full w-full overflow-hidden", e.className), children: e.children });
});
er.displayName = "CarouselContainer";

Transformed (unminified)

// module-63390.js, lines 309-320
const CarouselContainer = forwardRef((props, ref) => (
  <div
    ref={ref}
    className={Z$0(
      "relative flex h-full w-full overflow-hidden",
      props.className
    )}
  >
    {props.children}
  </div>
));
CarouselContainer.displayName = "CarouselContainer";
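To make the suggestion concrete, here is a deliberately naive sketch of the displayName-based rename. It works on plain text with regular expressions purely for illustration, and collectDisplayNames / applyRenames are hypothetical names; it is not how wakaru does it, and a real implementation would walk the AST and rename the binding through scope analysis so that shadowed variables, strings, and property names are not touched:

```javascript
// Collect `<identifier>.displayName = "<Name>"` assignments from the source.
function collectDisplayNames(source) {
  const pattern = /\b([A-Za-z_$][\w$]*)\.displayName\s*=\s*(["'])([A-Za-z_$][\w$]*)\2/g;
  const renames = new Map();
  for (const match of source.matchAll(pattern)) {
    const [, identifier, , displayName] = match;
    if (identifier !== displayName) renames.set(identifier, displayName);
  }
  return renames;
}

// Naive whole-word replacement: ignores scoping and string contents, and
// skips identifiers that directly follow a `.` (property accesses).
function applyRenames(source, renames) {
  let output = source;
  for (const [from, to] of renames) {
    const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const wholeWord = new RegExp(`(?<![\\w$.])${escaped}(?![\\w$])`, "g");
    output = output.replace(wholeWord, () => to);
  }
  return output;
}

const input = [
  "var er = forwardRef(function (e, t) { return null; });",
  'er.displayName = "CarouselContainer";',
].join("\n");
const renamed = applyRenames(input, collectDisplayNames(input));
console.log(renamed);
```

On this input, both occurrences of er become CarouselContainer, which matches the shape of the transformed output shown above.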

@0xdevalias

0xdevalias commented Feb 29, 2024

copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions
worth looking into how they've done it

Originally posted by @j4k0xb in jehna/humanify#8 (comment)


Release detailed here:

Couldn't see any overly relevant commits in that range, but did find the following when searching the issues manually:

Which led me to this label:

And these issues, which sound like there are 'rename providers' used by the feature:

More docs about rename providers here:

Based on the above, and the release notes explicitly mentioning copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):

⇒ file GitHub.copilot-1.168.741.vsix

GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate

Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately.

Originally posted by @0xdevalias in jehna/humanify#8 (comment)

@0xdevalias

0xdevalias commented Mar 12, 2024

Continued context from above, it seems that this is implemented via a VSCode proposed API NewSymbolNamesProvider:


It's less about "reverse engineering GitHub copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented, and what mechanism 'integrates' it into VSCode".

The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.

If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.

Originally posted by @0xdevalias in jehna/humanify#8 (comment)


Thank you for taking interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while the vscode shows them in the rename widget.

Originally posted by @ulugbekna in jehna/humanify#8 (comment)
