rename short identifiers #21

3052 · 2023-11-05T02:57:04Z

Is it possible to fix short variable names? For example, JADX has this option:

 --deobf                             - activate deobfuscation

which turns short variable names such as a into something like a1234, for easier searching.


3052 · 2023-11-05T05:40:41Z

found another tool that works:

synchrony deobfuscate --rename obf.js

https://github.com/relative/synchrony

j4k0xb · 2023-11-05T23:06:00Z

There's currently the --mangle option for renaming, but that does the opposite: it turns _0x2d22bf into a.
Do you want every variable/param to have a unique name?

3052 · 2023-11-05T23:46:06Z

If the variable already has a "normal" name such as hello, we can probably just leave it as is. But if the name has been minified, such as a, then it should be made longer. Ideally, variables should get distinct names even where a name could be reused, such as two variables in different scopes. This would prevent the confusion of thinking two variables are the same just because they share an identifier.
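A minimal sketch of the behaviour requested above, assuming the bindings have already been collected by some scope analysis. The `Binding` shape, the `renameShortIdentifiers` helper, and the length threshold are all hypothetical: readable names are kept, and every minified binding gets its own longer name, even when the same short name is reused in different scopes.

```typescript
// Hypothetical binding record, as a scope analyser might produce it.
interface Binding {
  name: string;    // current identifier, e.g. "a"
  scopeId: number; // scope that declares it (bindings in different scopes may share a name)
}

// Assumption: identifiers of 2 characters or fewer count as "minified".
const MINIFIED_MAX_LENGTH = 2;

function isMinified(name: string): boolean {
  return name.length <= MINIFIED_MAX_LENGTH;
}

// Returns one name per binding, in input order. Readable names are kept;
// every minified binding gets its own name, even if the same short name
// appears in several scopes.
function renameShortIdentifiers(bindings: Binding[]): string[] {
  // Names we keep are reserved up front so generated names never clash with them.
  const taken = new Set(bindings.filter((b) => !isMinified(b.name)).map((b) => b.name));
  let counter = 0;
  return bindings.map((b) => {
    if (!isMinified(b.name)) return b.name;
    let candidate = "";
    do {
      counter++;
      candidate = `${b.name}_${counter}`;
    } while (taken.has(candidate));
    taken.add(candidate);
    return candidate;
  });
}

// Two unrelated `a` bindings end up with different names:
// returns ["a_1", "hello", "a_2"]
renameShortIdentifiers([
  { name: "a", scopeId: 0 },
  { name: "hello", scopeId: 0 },
  { name: "a", scopeId: 1 },
]);
```

A length threshold is a crude test for "minified": it misses names like _0x2d22bf and flags legitimate loop counters like i, so a real implementation would likely combine several heuristics and also check against globals that are not in the binding list.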

0xdevalias · 2023-11-13T00:04:31Z

When I was exploring this concept in my own deobfuscation PoC project, I was trying to make the variable names unique and have them carry some semantic information about their source/scope.

E.g., if it was an arg to a function, it might be arg_1; or, if the function is foo, it might end up as foo_arg_1.
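A small sketch of that naming scheme, assuming the parameter's position and the names of its enclosing functions are already known from the AST. `paramName` is a hypothetical helper, and uniqueness checks are left out:

```typescript
// Builds a descriptive parameter name from its 0-based position and the
// names of the enclosing functions (outermost first), if any are known.
function paramName(index: number, enclosingFunctions: string[] = []): string {
  return [...enclosingFunctions, `arg_${index + 1}`].join("_");
}

paramName(0);                   // "arg_1"
paramName(0, ["foo"]);          // "foo_arg_1"
paramName(1, ["foo", "inner"]); // "foo_inner_arg_2"
```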

It looks like most of the PoC code I was playing with was local/in a pretty messy/hacky state, but I did find a link in it to an online REPL I was playing around with some of it in. Not sure how outdated that code is, but it might be useful:

https://replit.com/@0xdevalias/Rewriting-JavaScript-Variables-via-AST-Examples

There were a number of different AST parsers I was playing around with, but I think that this babel code may have been the latest (not sure which one):

Within those files, I believe the functions getNameFromPath, getPrefix (and older commented out functions getTypePrefix, getPrefix

Edit: Came across this in another issue here:

I published my decompiler that I used in the above example. I think it might be a good reference for adding this feature.
https://github.com/e9x/krunker-decompiler

Originally posted by @e9x in #10 (comment)

And looking at its libRenameVars code, it seems to take a vaguely similar approach to the one I described above for my original PoC:

https://github.com/e9x/krunker-decompiler/blob/master/src/libRenameVars.ts
- getVarPrefix will set a prefix based on the type (e.g., func, arg, Class, imported, var)
- getName generates a new variable name that does not conflict with existing names or reserved keywords
- generateName generates a new name for a variable considering its scope, type, and the context in which it is used (e.g., whether it's a class, a function variable, etc.).
  It employs various AST manipulations to ensure the generated name is appropriate and does not conflict with existing names.
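The "prefix by type, then make it unique" idea can be sketched as follows. This is not the krunker-decompiler code: the `VarKind` union, the prefix table, and the `NameGenerator` class are assumptions modelled on the description above.

```typescript
type VarKind = "function" | "param" | "class" | "import" | "variable";

// Prefix per binding kind, mirroring the func/arg/Class/imported/var idea.
const PREFIXES: Record<VarKind, string> = {
  function: "func",
  param: "arg",
  class: "Class",
  import: "imported",
  variable: "var",
};

class NameGenerator {
  private used = new Set<string>();
  private counters = new Map<VarKind, number>();

  // Names already present in the program, which must never be reused.
  constructor(existingNames: string[] = []) {
    existingNames.forEach((n) => this.used.add(n));
  }

  // Returns a fresh name such as "func1" or "arg2" that is not already used.
  next(kind: VarKind): string {
    const prefix = PREFIXES[kind];
    let n = this.counters.get(kind) ?? 0;
    let name = "";
    do {
      n++;
      name = `${prefix}${n}`;
    } while (this.used.has(name));
    this.counters.set(kind, n);
    this.used.add(name);
    return name;
  }
}
```

For example, `new NameGenerator(["func1"]).next("function")` skips the existing `func1` and returns `"func2"`.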

A more generalised summary/overview (via ChatGPT):

Certainly, the code implements a sophisticated algorithm for renaming variables in a JavaScript program, adhering to several high-level rules and strategies:

Type-Specific Prefixing:

The getVarPrefix function assigns specific prefixes to variable names based on their type (e.g., "func" for function names, "arg" for parameters). This approach helps in identifying the role of a variable just by its name.

Avoiding Reserved Keywords:

The script includes a comprehensive list of reserved JavaScript keywords. If a variable's name matches a reserved keyword, it is prefixed with an underscore to prevent syntax errors.
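A minimal sketch of that rule, assuming it runs on a candidate name just before the rename is applied. The word list is abbreviated, `escapeReserved` is a hypothetical name, and the underscored result would still need the usual collision check:

```typescript
// Abbreviated set of ECMAScript reserved words (not exhaustive).
const RESERVED_WORDS = new Set([
  "break", "case", "catch", "class", "const", "continue", "debugger", "default",
  "delete", "do", "else", "enum", "export", "extends", "false", "finally", "for",
  "function", "if", "import", "in", "instanceof", "let", "new", "null", "return",
  "static", "super", "switch", "this", "throw", "true", "try", "typeof", "var",
  "void", "while", "with", "yield", "await",
]);

// Prefixes the candidate with "_" when it collides with a reserved word.
function escapeReserved(name: string): string {
  return RESERVED_WORDS.has(name) ? `_${name}` : name;
}

escapeReserved("class"); // "_class"
escapeReserved("hello"); // "hello"
```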

Unique Naming with Context Consideration:

The generateName function ensures that each variable gets a unique name that doesn't conflict with other variables in its scope. It also considers the context in which a variable is used. For example, if a variable is part of a class, it may receive a name that reflects this context, using PascalCase or camelCase as appropriate.

Handling Special Cases:

The script contains logic to handle special cases, such as variables that are function expressions (isFuncVar) or class instances (isClass). This affects the naming convention applied to these variables.

Randomness with Mersenne Twister:

A Mersenne Twister is used to generate random elements for variable names, ensuring that the names are not only unique within the scope of the program but also less predictable.
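For reference, a seeded Mersenne Twister (the standard 32-bit MT19937) fits in a few dozen lines. The `randomSuffix` helper is a hypothetical example of how such a generator could feed name generation; it is not taken from the krunker-decompiler source. Seeding with a fixed value makes the "random" parts of names reproducible between runs (MT19937 is not cryptographically secure, so this is about variety, not secrecy).

```typescript
// MT19937: the standard 32-bit Mersenne Twister.
class MersenneTwister {
  private readonly mt = new Uint32Array(624);
  private index = 624;

  constructor(seed: number) {
    this.mt[0] = seed >>> 0;
    for (let i = 1; i < 624; i++) {
      const prev = this.mt[i - 1] ^ (this.mt[i - 1] >>> 30);
      // Math.imul keeps the multiplication within 32 bits.
      this.mt[i] = (Math.imul(1812433253, prev) + i) >>> 0;
    }
  }

  // Regenerates all 624 words of state.
  private twist(): void {
    for (let i = 0; i < 624; i++) {
      const y = (this.mt[i] & 0x80000000) | (this.mt[(i + 1) % 624] & 0x7fffffff);
      let next = this.mt[(i + 397) % 624] ^ (y >>> 1);
      if (y & 1) next ^= 0x9908b0df;
      this.mt[i] = next >>> 0;
    }
    this.index = 0;
  }

  // Returns the next tempered 32-bit unsigned integer.
  nextUint32(): number {
    if (this.index >= 624) this.twist();
    let y = this.mt[this.index++];
    y ^= y >>> 11;
    y ^= (y << 7) & 0x9d2c5680;
    y ^= (y << 15) & 0xefc60000;
    y ^= y >>> 18;
    return y >>> 0;
  }
}

// Hypothetical helper: a short base-36 suffix for a generated name.
function randomSuffix(rng: MersenneTwister, length = 4): string {
  let out = "";
  for (let i = 0; i < length; i++) out += (rng.nextUint32() % 36).toString(36);
  return out;
}
```

With the conventional default seed 5489, the first output is 3499211612, which is a quick way to check an implementation against other MT19937 libraries.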

AST-Based Renaming:

The script analyzes the Abstract Syntax Tree (AST) of the program to understand the structure and scope of variables. This analysis guides the renaming process, ensuring that the new names are consistent with the variable's usage and position in the code.

Scope Analysis with ESLint Scope:

By leveraging eslint-scope, the script can accurately determine the scope of each variable. This is crucial in avoiding name collisions and ensuring that the renaming respects lexical scoping rules in JavaScript.

Consideration for Exported and Assigned Variables:

The script pays special attention to variables that are exported or assigned in specific ways (e.g., through Object.defineProperty). It ensures that these variables receive names that are appropriate for their roles.

In summary, the script uses a combination of type-based naming conventions, context consideration, randomness, AST analysis, and scope analysis to systematically rename variables in a JavaScript program. This approach aims to enhance readability, avoid conflicts, and maintain the logical structure of the program.

0xdevalias · 2023-11-13T01:26:03Z

And for an even cooler/more extreme version of improving variable naming; I just came across this blog post / project from @jehna that makes use of webcrack + ChatGPT for variable renaming:

https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification.html
- Using LLMs to reverse JavaScript variable name minification
  This blog introduces a novel way to reverse minified JavaScript using large language models (LLMs) like ChatGPT and llama2 while keeping the code semantically intact. The code is open source and available at Github project Humanify.
https://github.com/jehna/humanify
- Un-minify Javascript code using ChatGPT
- This tool uses large language models (like ChatGPT & llama2) and other tools to un-minify JavaScript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel on AST level to ensure code stays 1-1 equivalent.
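Since the LLM only supplies hints, each suggested name has to be checked before the AST-level rename is applied. A minimal sketch of such a check, assuming suggestions arrive as an `original → suggestion` map; the map shape, the ASCII-only identifier regex, and `acceptSuggestions` are simplifications for illustration, not humanify's actual code:

```typescript
// ASCII-only approximation of a JS identifier; real identifiers may also
// contain Unicode ID_Start/ID_Continue characters.
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Abbreviated reserved-word list, for illustration only.
const RESERVED = new Set(["class", "function", "return", "var", "let", "const", "new", "this"]);

// Filters LLM suggestions down to ones that are safe to hand to an
// AST-level rename: syntactically valid, not reserved, and not colliding
// with an existing name or a previously accepted suggestion.
function acceptSuggestions(
  suggestions: Map<string, string>,
  existingNames: string[],
): Map<string, string> {
  const accepted = new Map<string, string>();
  const taken = new Set(existingNames);
  suggestions.forEach((suggested, original) => {
    const name = suggested.trim();
    if (!IDENTIFIER.test(name) || RESERVED.has(name) || taken.has(name)) return;
    accepted.set(original, name);
    taken.add(name);
  });
  return accepted;
}
```

Rejected suggestions simply keep their original names here; a real tool might instead retry the LLM or fall back to a generated name.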

0xdevalias · 2023-11-13T02:51:25Z

I came across another tool today that seemed to have a start on implementing some 'smart rename' features:

https://github.com/pionxzh/wakaru
- https://wakaru.vercel.app/

Digging through the code led me to this:

https://github.com/pionxzh/wakaru/tree/main/packages/unminify#smart-rename
- Rename minified identifiers with heuristic rules.
https://github.com/pionxzh/wakaru/blob/main/packages/unminify/src/transformations/smart-rename.ts
- handleDestructuringRename, handleFunctionParamsRename, handlePropertyRename, handleReactRename, getElementName
https://github.com/pionxzh/wakaru/blob/main/packages/unminify/src/utils/identifier.ts#L28-L75
- generateName, getUniqueName
https://github.com/pionxzh/wakaru/blob/master/packages/unminify/src/transformations/__tests__/smart-rename.spec.ts

There's also an issue there that seems to be exploring how to improve 'unmangling variable names' as well:

support un-mangle identifiers pionxzh/wakaru#34

Which I wrote the following extra thoughts on:

I just finished up writing some thoughts/references for variable renaming on the webcrack repo, that could also be a useful idea for here. (see quotes below)

When I was exploring PoC ideas for my own project previously, I was looking to generate a file similar to the 'module map' that this project is using; but instead of just for the names of modules, I wanted to be able to use it to provide a 'variable name map'. But because the specific variable names used in webpack/etc. can change between builds, my thought was that it would make sense to first 'normalise' them to a 'known format' based on their context.

That could then be later enhanced/expanded by being able to pre-process these 'variable name mappings' for various open source projects in a way that could then be applied 'automagically' without the end user needing to first create them.

It could also be enhanced by similar techniques such as what the humanify project does, by using LLMs/similar to generate suggested variable name mappings based on the code.

My personal ideal end goal for a feature like that would then allow me to use it within an IDE-like environment, where I can rename variables 'as I explore', knowing that the mappings/etc will be kept up to date.

Originally posted by @0xdevalias in pionxzh/wakaru#34 (comment)
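One way to picture the "normalise first, then map" idea: derive a key from a binding's context rather than from its minified name, then look that key up in a pre-built variable name map. Everything here (`BindingContext`, the key format, the map contents) is hypothetical, and webpack module ids can themselves change between builds, so a real key would probably need something more stable than the id used in this example:

```typescript
// Hypothetical description of where a binding lives; a real tool would
// derive this from the AST (module, enclosing functions, position).
interface BindingContext {
  moduleId: string;       // e.g. webpack module id "63390"
  path: string[];         // enclosing named functions, outermost first
  kind: "param" | "var";
  index: number;          // parameter index or declaration order
}

// The key deliberately excludes the minified name itself, which changes per build.
function contextKey(ctx: BindingContext): string {
  return [ctx.moduleId, ...ctx.path, `${ctx.kind}${ctx.index}`].join("/");
}

// A hand-written (or LLM-suggested) variable name map.
const nameMap = new Map<string, string>([
  ["63390/CarouselContainer/param0", "props"],
  ["63390/CarouselContainer/param1", "ref"],
]);

// Returns the mapped name, or the fallback when no mapping exists.
function lookupName(ctx: BindingContext, fallback: string): string {
  return nameMap.get(contextKey(ctx)) ?? fallback;
}

lookupName({ moduleId: "63390", path: ["CarouselContainer"], kind: "param", index: 1 }, "t"); // "ref"
```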

0xdevalias · 2023-11-22T07:47:05Z

Another link from my reference notes that I forgot to include earlier; my thoughts on how to rename otherwise unknown variables are based on similar concepts that are used in reverse engineering tools such as IDA:

https://hex-rays.com/blog/igors-tip-of-the-week-34-dummy-names/
- In IDA’s disassembly, you may have often observed names that may look strange and cryptic on first sight: sub_73906D75, loc_40721B, off_40A27C and more. In IDA’s terminology, they’re called dummy names. They are used when a name is required by the assembly syntax but there is nothing suitable available
- https://www.hex-rays.com/products/ida/support/idadoc/609.shtml
  - IDA Help: Names Representation
  - Dummy names are automatically generated by IDA. They are used to denote subroutines, program locations and data. Dummy names have various prefixes depending on the item type and value
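JavaScript has no addresses, but a node's character offset in the source could play the same role. A hypothetical sketch: the `sub_` prefix and the uppercase-hex format follow the IDA examples above, while `arg_` and `var_` (and `dummyName` itself) are assumptions. Offset-based names are stable for a given file, but they shift whenever code above them changes.

```typescript
type DummyKind = "function" | "param" | "variable";

// "sub" mirrors IDA; "arg" and "var" are assumed extras for JS bindings.
const DUMMY_PREFIX: Record<DummyKind, string> = {
  function: "sub",
  param: "arg",
  variable: "var",
};

// Formats an offset as uppercase hex, in the style of IDA's sub_73906D75.
function dummyName(kind: DummyKind, offset: number): string {
  return `${DUMMY_PREFIX[kind]}_${offset.toString(16).toUpperCase()}`;
}

dummyName("function", 0x73906d75); // "sub_73906D75"
dummyName("variable", 255);        // "var_FF"
```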

And a few more I was looking at recently as well (which are basically a form of smart-rename):

https://binary.ninja/2023/09/15/3.5-expanded-universe.html#automatic-variable-naming

Automatic Variable Naming
One easy way to improve decompilation output is to come up with better default names for variables. There’s a lot of possible defaults you could choose and a number of different strategies are seen throughout different reverse engineering tools. Prior to 3.5, Binary Ninja left variables named based on their origin. Stack variables were var_OFFSET, register-based variables were reg_COUNTER, and global data variables were (data_). While this scheme isn’t changing, we’re being much more intelligent about situations where additional information is available.

For example, if a variable is passed to a function and a variable name is available, we can now make a much better guess for the variable name. This is most obvious in binaries with type libraries.

This isn’t the only style of default names. Binary Ninja also will name loop counters with simpler names like i, or j, k, etc (in the case of nested loops)

Variable name propagation Vector35/binaryninja-api#2558

Originally posted by @0xdevalias in pionxzh/wakaru#34 (comment)
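Both Binary Ninja heuristics mentioned above are simple to express. A sketch, assuming the loop nesting depth and the callee's parameter names are already known (for example, from a type library or a previously renamed function); `loopCounterName` and `nameFromCallee` are hypothetical helpers:

```typescript
const LOOP_COUNTERS = ["i", "j", "k"];

// Depth 0 -> "i", 1 -> "j", 2 -> "k"; deeper loops fall back to "i3", "i4", ...
function loopCounterName(depth: number): string {
  return depth < LOOP_COUNTERS.length ? LOOP_COUNTERS[depth] : `i${depth}`;
}

// Name propagation: if a variable is passed as argument `argIndex` to a
// function whose parameter names are known, reuse that parameter's name.
function nameFromCallee(
  calleeParams: readonly string[] | undefined,
  argIndex: number,
  fallback: string,
): string {
  return calleeParams?.[argIndex] ?? fallback;
}

loopCounterName(1);                            // "j"
nameFromCallee(["url", "options"], 1, "v1");   // "options"
```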

0xdevalias · 2023-12-20T01:47:14Z

Tangentially related to this issue, and in line with how wakaru implements 'smart-rename's (Ref) for certain things; I wonder if a similar concept could apply in webcrack.

Based on how all of the functions containing JSX seem to be named a variation of Component, I suspect there may already be some code doing this. (eg. Ref: 1, 2)

Regardless, the specific case I wanted to suggest here was when a React component sets the Component.displayName, and leveraging that to 'smart-rename' the component identifier itself.

Unminifying this source file (Ref), in 63390.js, there are some React components that set the .displayName

// 63390.js, lines 191-194
var _Component67 = forwardRef(function (e, t) {
  return <div ref={t} className={_Z("relative flex h-full w-full overflow-hidden", e.className)}>{e.children}</div>;
});
_Component67.displayName = "CarouselContainer";

Contrasting this against wakaru's output (Ref):


Source (unpacked)

// module-63390.js, lines 279-283
var er = (0, d.forwardRef)(function (e, t) {
  return (0,
  o.jsx)("div", { ref: t, className: (0, l.Z)("relative flex h-full w-full overflow-hidden", e.className), children: e.children });
});
er.displayName = "CarouselContainer";

Transformed (unminified)

// module-63390.js, lines 309-320
const CarouselContainer = forwardRef((props, ref) => (
  <div
    ref={ref}
    className={Z$0(
      "relative flex h-full w-full overflow-hidden",
      props.className
    )}
  >
    {props.children}
  </div>
));
CarouselContainer.displayName = "CarouselContainer";
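A minimal sketch of collecting displayName-based rename candidates. It uses a regular expression purely for illustration, which is an assumption rather than how wakaru or webcrack work; a real implementation would walk the AST and apply each rename with a scope-aware API (for example Babel's `scope.rename`) so that every reference is updated. `findDisplayNameRenames` is a hypothetical helper.

```typescript
// Collects `identifier -> displayName` pairs from assignments such as
//   _Component67.displayName = "CarouselContainer";
// Member expressions like `a.b.displayName = ...` are skipped, as are
// display names that are not valid identifiers.
function findDisplayNameRenames(source: string): Map<string, string> {
  // Created per call so the /g regex's lastIndex state is never shared.
  const pattern = /(?:^|[^\w$.])([A-Za-z_$][\w$]*)\.displayName\s*=\s*["']([A-Za-z_$][\w$]*)["']/gm;
  const renames = new Map<string, string>();
  const claimed = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const identifier = match[1];
    const displayName = match[2];
    // Skip no-op renames and display names already claimed by another component.
    if (identifier === displayName || claimed.has(displayName)) continue;
    renames.set(identifier, displayName);
    claimed.add(displayName);
  }
  return renames;
}
```

Run against the 63390.js snippet above, this would propose renaming `_Component67` to `CarouselContainer`.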

0xdevalias · 2024-02-29T03:23:54Z

copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions
worth looking into how they've done it

Originally posted by @j4k0xb in jehna/humanify#8 (comment)

Release detailed here:

https://github.com/microsoft/vscode/releases/tag/1.87.0

microsoft/vscode@1.87.0...1.86.2

I couldn't see any particularly relevant commits in that range, but did find the following by searching the issues manually:

Rename suggestions UX overhaul microsoft/vscode#206335

Rename suggestions not work ! microsoft/vscode#206498

Which led me to this label:

https://github.com/microsoft/vscode/issues?q=is%3Aopen+label%3Arename+sort%3Aupdated-desc

And these issues, which sound like there are 'rename providers' used by the feature:

No way to ensure an extension is prioritized as a rename provider in a language microsoft/vscode#115354

Rename providers of several extensions conflict and there is no way to know WHAT it being renamed microsoft/vscode#183075

More docs about rename providers here:

https://code.visualstudio.com/docs/editor/editingevolved#_rename-symbol

https://code.visualstudio.com/api/references/vscode-api#:~:text=registerRenameProvider(

https://code.visualstudio.com/api/references/vscode-api#RenameProvider

RenameProvider
The rename provider interface defines the contract between extensions and the rename-feature.

prepareRename
Optional function for resolving and validating a position before running rename. The result can be a range or a range and a placeholder text. The placeholder text should be the identifier of the symbol which is being renamed - when omitted the text in the returned range is used.

provideRenameEdits
Provide an edit that describes changes that have to be made to one or many resources to rename a symbol to a different name.

https://code.visualstudio.com/api/language-extensions/programmatic-language-features

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_prepareRename

Prepare Rename Request (↩️)
The prepare rename request is sent from the client to the server to setup and test the validity of a rename operation at a given location.

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_rename

Rename Request
The rename request is sent from the client to the server to ask the server to compute a workspace change so that the client can perform a workspace-wide rename of a symbol.

https://vshaxe.github.io/vscode-extern/vscode/RenameProvider.html

https://github.com/vshaxe/vscode-extern/blob/master/src/vscode/RenameProvider.hx

Based on the above, and the release notes explicitly mentioning Copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):

https://marketplace.visualstudio.com/items?itemName=GitHub.copilot
Downloading that gives GitHub.copilot-1.168.741.vsix, which seems to just be a .zip file:
⇒ file GitHub.copilot-1.168.741.vsix

GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate
Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately.

Originally posted by @0xdevalias in jehna/humanify#8 (comment)
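To make the LSP side of this concrete: the response to a `textDocument/rename` request is a `WorkspaceEdit`. The sketch below builds one from a list of occurrence ranges, assuming those ranges have already been found by scope analysis. The types are a trimmed-down copy of the LSP 3.17 shapes (`Position`, `Range`, `TextEdit`, `WorkspaceEdit.changes`), and `buildRenameEdit` is hypothetical:

```typescript
// Trimmed-down LSP 3.17 types (only the fields used here).
interface Position { line: number; character: number } // both zero-based
interface Range { start: Position; end: Position }
interface TextEdit { range: Range; newText: string }
interface WorkspaceEdit { changes?: { [uri: string]: TextEdit[] } }

// Builds the response for a rename request: one TextEdit per occurrence
// of the symbol, grouped by document URI.
function buildRenameEdit(
  occurrences: { uri: string; range: Range }[],
  newName: string,
): WorkspaceEdit {
  const changes: { [uri: string]: TextEdit[] } = {};
  for (const { uri, range } of occurrences) {
    if (!changes[uri]) changes[uri] = [];
    changes[uri].push({ range, newText: newName });
  }
  return { changes };
}
```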

0xdevalias · 2024-03-12T07:19:11Z

Continued context from above, it seems that this is implemented via a VSCode proposed API NewSymbolNamesProvider:

https://github.com/search?q=repo%3Amicrosoft%2Fvscode%20NewSymbolNamesProvider&type=code

It's less about "reverse engineering GitHub Copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented, and what mechanism 'integrates' it into VSCode".

The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.

If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.

Originally posted by @0xdevalias in jehna/humanify#8 (comment)

Thank you for taking an interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while VS Code shows them in the rename widget.

Originally posted by @ulugbekna in jehna/humanify#8 (comment)

3052 changed the title ~~fix short variable names~~ rename short identifiers Nov 5, 2023

j4k0xb added the enhancement New feature or request label Nov 5, 2023

j4k0xb mentioned this issue Nov 13, 2023

Awesome project - looking for help? #3

Open

0xdevalias mentioned this issue Nov 13, 2023

support un-mangle identifiers pionxzh/wakaru#34

Open

0xdevalias mentioned this issue Dec 12, 2023

Unsafe identifier renames jehna/humanify#8

Open

j4k0xb mentioned this issue Aug 2, 2024

feat: configurable smart rename #100

Merged

j4k0xb closed this as completed in #100 Aug 2, 2024

j4k0xb added the mangle label Aug 16, 2024

0xdevalias mentioned this issue Sep 13, 2024

More deterministic renames across different versions of the same code jehna/humanify#97

Open
