|
| 1 | +Ghidra Decompiler - CLI guide |
| 2 | +############################# |
| 3 | + |
| 4 | +`Ghidra <https://ghidra-sre.org/>`_ has a decompiler that, unlike the rest of the |
| 5 | +program (written in Java), is written in C++. This caught my attention, so I |
| 6 | +started to hack on it. Unfortunately, there isn't much written about using the |
| 7 | +decompiler standalone, in the terminal, without the Ghidra GUI. This |
| 8 | +article tries to fill that void. |
| 9 | + |
| 10 | +Building The Decompiler |
| 11 | +*********************** |
| 12 | + |
| 13 | +Fetch and unzip the Ghidra package from `the GitHub releases page |
| 14 | +<https://github.com/NationalSecurityAgency/ghidra/releases>`_: |
| 15 | + |
| 16 | +.. code:: |
| 17 | +
|
| 18 | + $ unzip ghidra_11.1.2_PUBLIC_20240709.zip |
| 19 | +
|
| 20 | +`cd` into the decompiler directory and build it: |
| 21 | + |
| 22 | +.. code:: |
| 23 | +
|
| 24 | + $ cd ghidra_11.1.2_PUBLIC/Ghidra/Features/Decompiler/src/decompile/cpp |
| 25 | + $ make decomp_opt -j $(nproc --all) |
| 26 | +
|
| 27 | +You should end up with an executable called `decomp_opt`. |
| 28 | + |
| 29 | +Running the Decompiler |
| 30 | +********************** |
| 31 | + |
| 32 | +While inside the directory, point the `SLEIGHHOME` environment variable at the Ghidra |
| 33 | +root so the decompiler can find the processor specifications, then run the executable. |
| 34 | + |
| 35 | +.. code:: |
| 36 | +
|
| 37 | + $ export SLEIGHHOME=/home/shreeyash/ghidra_11.1.2_PUBLIC |
| 38 | + $ ./decomp_opt |
| 39 | + [decomp]> |
| 40 | +
|
| 41 | +The decompiler is now running and waiting for commands. |
| 42 | + |
| 43 | +.. note:: |
| 44 | + |
| 45 | + Remember to always export the environment variable before running decomp_opt. |
| 46 | + You could consider tossing the two commands into a script, making life easier |
| 47 | + for you. |
| 48 | + |
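The script suggested in the note could look something like the sketch below. It is only an illustration: `run_decomp` and `GHIDRA_ROOT` are hypothetical names that are not part of Ghidra, and the default path assumes the archive was unzipped into your home directory.

```shell
# Hypothetical helper: sets SLEIGHHOME for a single run of decomp_opt.
# GHIDRA_ROOT is an assumed variable; the default below is only a guess
# at where the release archive was unpacked.
run_decomp() {
    ghidra_root="${GHIDRA_ROOT:-$HOME/ghidra_11.1.2_PUBLIC}"
    # SLEIGHHOME is set only in the environment of this one command.
    SLEIGHHOME="$ghidra_root" \
        "$ghidra_root/Ghidra/Features/Decompiler/src/decompile/cpp/decomp_opt" "$@"
}
```

After sourcing the file that holds this function, running `run_decomp` should drop you at the `[decomp]>` prompt from any directory.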
| 49 | +Decompile and view an ELF executable |
| 50 | +************************************ |
| 51 | + |
| 52 | +Let's start with a trivial C++ program with some control flow, compile it into an |
| 53 | +executable (ELF), and decompile it. |
| 54 | + |
| 55 | +Here's the program; save and compile it: |
| 56 | + |
| 57 | +.. code:: |
| 58 | +
|
| 59 | + $ cat a.cpp |
| 60 | + #include <iostream> |
| 61 | + #define THRESHOLD 20 |
| 62 | + int foo() { |
| 63 | + return 10; |
| 64 | + } |
| 65 | + int main() { |
| 66 | + int b = foo(); |
| 67 | + std::cout << "The threshold is " << THRESHOLD << '\n'; |
| 68 | + std::cout << "You returned " << b << '\n'; |
| 69 | + if (b < THRESHOLD) { |
| 70 | + std::cout << "get in\n"; |
| 71 | + } else { |
| 72 | + std::cout << "get out!\n"; |
| 73 | + } |
| 74 | + } |
| 75 | + $ g++ -no-pie a.cpp -o a |
| 76 | + $ ./a |
| 77 | + The threshold is 20 |
| 78 | + You returned 10 |
| 79 | + get in |
| 80 | +
|
| 81 | +The executable is ready; what's left is decompilation. |
| 82 | + |
| 83 | +Let's start the decompiler, and load our file: |
| 84 | + |
| 85 | +.. code:: |
| 86 | +
|
| 87 | + $ ./decomp_opt |
| 88 | + [decomp]> load file a |
| 89 | + a successfully loaded: Intel/AMD 64-bit x86 |
| 90 | +
|
| 91 | +
|
| 92 | +We've loaded our executable in the decompiler. C++ is an abstract language with |
| 93 | +constructs that mean nothing to a CPU. These include, but are not |
| 94 | +limited to, functions, structs, and loops. To implement these, the |
| 95 | +compiler has to translate the abstractions into a concrete implementation, which |
| 96 | +manifests itself as control flow instructions like branch, compare, |
| 97 | +and jump. If we peek into an executable, we'll notice that what we called functions |
| 98 | +are now 'addresses', i.e. numbers that represent locations in memory. |
| 99 | +Functions are run by jumping (i.e. setting the program counter) to an address. |
| 100 | +So, to decompile a function we had in the source, we have to |
| 101 | +find the address at which it resides. `a.cpp` has two functions: |
| 102 | +`main` and `foo`. To find the address where a function resides in the |
| 103 | +executable, we can use `objdump`: |
| 104 | + |
| 105 | +.. code:: |
| 106 | +
|
| 107 | + $ objdump -C -D a |
| 108 | + ... |
| 109 | + 00000000004011c5 <main>: |
| 110 | + 4011c5: f3 0f 1e fa endbr64 |
| 111 | + 4011c9: 55 push %rbp |
| 112 | + 4011ca: 48 89 e5 mov %rsp,%rbp |
| 113 | + 4011cd: 48 83 ec 10 sub $0x10,%rsp |
| 114 | + 4011d1: e8 e0 ff ff ff call 4011b6 <_Z3foov> |
| 115 | + 4011d6: 89 45 fc mov %eax,-0x4(%rbp) |
| 116 | + 4011d9: 48 8d 05 24 0e 00 00 lea 0xe24(%rip),%rax # 402004 <_IO_stdin_used+0x4> |
| 117 | + 4011e0: 48 89 c6 mov %rax,%rsi |
| 118 | + 4011e3: 48 8d 05 96 2e 00 00 lea 0x2e96(%rip),%rax # 404080 <_ZSt4cout@GLIBCXX_3.4> |
| 119 | + 4011ea: 48 89 c7 mov %rax,%rdi |
| 120 | + 4011ed: e8 9e fe ff ff call 401090 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> |
| 121 | + 4011f2: 48 89 c2 mov %rax,%rdx |
| 122 | + 4011f5: 8b 45 fc mov -0x4(%rbp),%eax |
| 123 | + ... |
| 124 | +
|
| 125 | +Searching the output for 'main' reveals its label at address `0x4011c5`. |
| 126 | + |
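As an alternative to scanning the disassembly by eye, the symbol table can be queried directly. The sketch below assumes binutils' `nm` is installed and that its `-C` output has the usual three columns (address, type, name); `sym_addr` is a hypothetical helper, not an existing tool.

```shell
# Hypothetical helper: reads `nm -C` style output on stdin and prints the
# address of the symbol whose name matches $1 exactly, as a hex literal.
sym_addr() {
    awk -v name="$1" '$3 == name {
        addr = $1
        sub(/^0+/, "", addr)   # drop the zero padding nm adds
        print "0x" addr
    }'
}
```

For example, `nm -C a | sym_addr main` should print the address of `main`, and `nm -C a | sym_addr 'foo()'` the address of `foo`. The printed value can then be passed to `load addr`.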
| 127 | +.. code:: |
| 128 | +
|
| 129 | + [decomp]> load addr 0x4011c5 main |
| 130 | + Function main: 0x004011c5 |
| 131 | +
|
| 132 | +`load addr` takes an address and an optional 'label'. The label is simply a |
| 133 | +name that we assign to that address. In this case it was 'main', but it could have been |
| 134 | +anything, for what it's worth. |
| 135 | + |
| 136 | +.. code:: |
| 137 | +
|
| 138 | + [decomp]> decompile |
| 139 | + Decompiling main |
| 140 | + Decompilation complete |
| 141 | + [decomp]> print C |
| 142 | + |
| 143 | + xunknown8 main(void) |
| 144 | +
|
| 145 | + { |
| 146 | + int4 iVar1; |
| 147 | + xunknown8 xVar2; |
| 148 | + |
| 149 | + iVar1 = func_0x004011b6(); |
| 150 | + xVar2 = func_0x00401090(0x404080,0x402004); |
| 151 | + xVar2 = func_0x004010c0(xVar2,0x14); |
| 152 | + func_0x004010a0(xVar2,10); |
| 153 | + xVar2 = func_0x00401090(0x404080,0x402016); |
| 154 | + xVar2 = func_0x004010c0(xVar2,iVar1); |
| 155 | + func_0x004010a0(xVar2,10); |
| 156 | + if (iVar1 < 0x14) { |
| 157 | + func_0x00401090(0x404080,0x402024); |
| 158 | + } |
| 159 | + else { |
| 160 | + func_0x00401090(0x404080,0x40202c); |
| 161 | + } |
| 162 | + return 0; |
| 163 | + } |
| 164 | + [decomp]> |
| 165 | +
|
| 166 | +Just like that, we've decompiled our program. Notice how the names are garbled. |
| 167 | +This is because names (of variables and functions) are not really necessary to |
| 168 | +execute a program. |
| 169 | + |
| 170 | +Let's analyze the decompiled output. The latter part of each function name is |
| 171 | +its address. This means we can look the functions up in the `objdump` output. Moreover, |
| 172 | +if the commands that got us the decompilation of `main` were repeated |
| 173 | +for all the functions present in the output, the resulting decompilation |
| 174 | +of `main` would replace every address with the label we assigned to it. Looking |
| 175 | +up `func_0x004011b6` in the `objdump` output, we find it to be `foo`: |
| 176 | + |
| 177 | +.. code:: |
| 178 | +
|
| 179 | + ... |
| 180 | + 00000000004011b6 <foo()>: |
| 181 | + 4011b6: f3 0f 1e fa endbr64 |
| 182 | + 4011ba: 55 push %rbp |
| 183 | + 4011bb: 48 89 e5 mov %rsp,%rbp |
| 184 | + 4011be: b8 0a 00 00 00 mov $0xa,%eax |
| 185 | + ... |
| 186 | +
|
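Repeating the same console commands for every function gets tedious, so they can be generated instead. The sketch below only prints the command sequence used earlier in this article. `decomp_script` is a hypothetical helper, and piping its output into `decomp_opt` assumes that the console reads commands from standard input and accepts a `quit` command, which I have not verified for every build.

```shell
# Hypothetical helper: prints the console commands that load a binary,
# label one address and decompile it.
#   $1 = executable, $2 = address, $3 = label
decomp_script() {
    printf 'load file %s\nload addr %s %s\ndecompile\nprint C\nquit\n' "$1" "$2" "$3"
}
```

With `SLEIGHHOME` exported, something like `decomp_script a 0x4011b6 foo | ./decomp_opt` should then print the decompilation of `foo` without any interactive typing.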
| 187 | +`func_0x00401090` is not defined in the executable; however, the calls to this |
| 188 | +function are shown in the `objdump` output thus: |
| 189 | + |
| 190 | +.. code:: |
| 191 | +
|
| 192 | + 4011ed: e8 9e fe ff ff call 401090 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@plt> |
| 193 | +
|
| 194 | +It's quite obvious from the hint that `func_0x00401090` is the operator `<<` |
| 195 | +overloaded to accept a `std::basic_ostream` object and a `const char *`. The |
| 196 | +`@plt` at the end indicates that this function is called through the `.plt` |
| 197 | +section of the executable. The `.plt`, which stands for Procedure Linkage Table, |
| 198 | +is a redirection table for external functions that live in shared |
| 199 | +objects. So, `func_0x00401090` is the `operator<<` found in `libstdc++.so`, which |
| 200 | +the program is linked against. It takes two arguments, both addresses of |
| 201 | +objects. A search reveals that the first argument is the object `std::cout`, |
| 202 | +whose definition resides in an external library (`libstdc++.so`), and |
| 203 | +the other argument is a string literal that can be found in the `.rodata` |
| 204 | +section of the executable. |
| 205 | + |
| 206 | +.. code:: |
| 207 | +
|
| 208 | + $ objdump -s -j .rodata a |
| 209 | + Contents of section .rodata: |
| 210 | + 402000 01000200 54686520 74687265 73686f6c ....The threshol |
| 211 | + 402010 64206973 2000596f 75207265 7475726e d is .You return |
| 212 | + 402020 65642000 67657420 696e0a00 67657420 ed .get in..get |
| 213 | + 402030 6f757421 0a00 out!.. |
| 214 | +
|
| 215 | +Indeed, the string `"The threshold is "` is present at address `0x402004`. |
| 216 | + |
| 217 | +Likewise, the remaining functions up to `func_0x004010a0` are overloads of |
| 218 | +`operator<<` that handle different types of data. What remains is the control |
| 219 | +flow. It checks whether `iVar1`, which is `b` in the original source, is less than |
| 220 | +`0x14` (`THRESHOLD`) and calls the familiar `func_0x00401090` (i.e. |
| 221 | +`operator<<`) with the matching string. |
| 222 | + |
| 223 | +Conclusion |
| 224 | +********** |
| 225 | + |
| 226 | +Our work was made much easier by the fact that the executable was not |
| 227 | +'stripped'. Stripping is a process that gets rid of all the symbols that are |
| 228 | +not absolutely necessary for execution (which greatly reduces executable size). In |
| 229 | +the real world, especially if we are dealing with proprietary software, |
| 230 | +executables might be stripped. Unstripped executables allow us to move |
| 231 | +faster by simply searching for symbols, like we did to find `main`. Stripped |
| 232 | +executables require us to trace, find, and deduce what we need. In a later |
| 233 | +article, I may demo decompilation of stripped executables. |