llvm-objdump gives wrong line numbers #23717

Open
stevenwdv opened this issue Feb 21, 2025 · 4 comments

@stevenwdv

llvm-objdump gives wrong line info for a simple WebAssembly file. I'm not 100% sure whether this is a bug in the compiler/linker or in llvm-objdump, but I suspect the latter, since Chrome DevTools with the DWARF extension does give the right line numbers. I also don't know whether this is a bug in Emscripten or LLVM; if it's the latter, I can report it over there if necessary.

Steps to reproduce

  • Create a simple main.cpp:
int main() { return 42; }
  • Now compile with debug symbols:
em++ -g main.cpp
Verbose output
 "/home/swdv/emsdk/upstream/bin/clang++" -target wasm64-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/home/swdv/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -g3 -DNO_USE_MYFUN -v -c main.cpp -o /tmp/emscripten_temp_pe2lfvyf/main_0.o
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm64-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin
 (in-process)
 "/home/swdv/emsdk/upstream/bin/clang-21" -cc1 -triple wasm64-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name main.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/swdv/Downloads/plainwasmtest -v -fcoverage-compilation-dir=/home/swdv/Downloads/plainwasmtest -resource-dir /home/swdv/emsdk/upstream/lib/clang/21 -D EMSCRIPTEN -D NO_USE_MYFUN -isysroot /home/swdv/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/lib/clang/21/include -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_pe2lfvyf/main_0.o -x c++ main.cpp
clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1"
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/compat
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
 /home/swdv/emsdk/upstream/lib/clang/21/include
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
 /home/swdv/emsdk/upstream/bin/clang --version
 /home/swdv/emsdk/upstream/bin/wasm-ld -o hello.wasm /tmp/emscripten_temp_pe2lfvyf/main_0.o -L/home/swdv/emsdk/upstream/emscripten/cache/sysroot/lib/wasm64-emscripten -L/home/swdv/emsdk/upstream/emscripten/src/lib -lGL-getprocaddr -lal -lhtml5 -lstubs-debug -lnoexit -lc-debug -ldlmalloc-debug -lcompiler_rt -lc++-noexcept -lc++abi-debug-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -mwasm64 /tmp/tmp5u5b29eklibemscripten_js_symbols.so --export=emscripten_stack_get_end --export=emscripten_stack_get_free --export=emscripten_stack_get_base --export=emscripten_stack_get_current --export=emscripten_stack_init --export=_emscripten_stack_alloc --export=__wasm_call_ctors --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-if-defined=fflush --export-table -z stack-size=65536 --no-growable-memory --initial-heap=16777216 --no-entry --stack-first --table-base=1
 /home/swdv/emsdk/upstream/bin/llvm-objcopy hello.wasm hello.wasm --remove-section=producers
 /home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/src/compiler.mjs /tmp/tmp3fupbzr6.json
 /home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/tools/preprocessor.mjs /tmp/emscripten_temp_pe2lfvyf/settings.js shell.html
  • Now disassemble the main function:
~/emsdk/upstream/bin/llvm-objdump --disassemble-symbols=__original_main --line-numbers a.out.wasm
  • Observe how the line numbers and file are completely incorrect, mentioning fflush.c instead of our main.cpp:
a.out.wasm:	file format wasm

Disassembly of section CODE:

0000017c <__original_main>:
        .local i32, i32, i32, i32, i32, i32, i32
; __original_main():
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:17
     180: 23 80 80 80 80 00    	global.get	0
     186: 21 00        	local.set	0
     188: 41 10        	i32.const	16
     18a: 21 01        	local.set	1
     18c: 20 00        	local.get	0
     18e: 20 01        	local.get	1
     190: 6b           	i32.sub 
     191: 21 02        	local.set	2
     193: 41 00        	i32.const	0
     195: 21 03        	local.set	3
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:18
     197: 20 02        	local.get	2
     199: 20 03        	local.get	3
     19b: 36 02 0c     	i32.store	12
     19e: 41 8d 21     	i32.const	4237
     1a1: 21 04        	local.set	4
     1a3: 41 15        	i32.const	21
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:15
     1a5: 21 05        	local.set	5
     1a7: 20 04        	local.get	4
     1a9: 20 05        	local.get	5
     1ab: 36 02 00     	i32.store	0
     1ae: 41 2a        	i32.const	42
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:20
     1b0: 21 06        	local.set	6
     1b2: 20 06        	local.get	6
     1b4: 0f           	return
     1b5: 0b           	end
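
A quick way to check output like the above mechanically is to pull out the `; path:line` annotations and flag any that do not come from the expected source file. This is only a sketch: the function names are made up for illustration, and the regular expression assumes the annotation format shown in this report rather than any documented llvm-objdump output contract.

```python
import re

# Matches annotation lines such as "; /some/path/fflush.c:17".
# Assumption: annotations look like the ones in this report.
LINE_NOTE = re.compile(r"^;\s+(?P<path>\S+):(?P<line>\d+)\s*$")


def source_annotations(objdump_text):
    """Return (path, line) pairs for every source annotation in the text."""
    notes = []
    for raw in objdump_text.splitlines():
        match = LINE_NOTE.match(raw)
        if match:
            notes.append((match.group("path"), int(match.group("line"))))
    return notes


def unexpected_sources(objdump_text, expected_basename):
    """Return annotations whose file is not the expected source file."""
    return [
        (path, line)
        for path, line in source_annotations(objdump_text)
        if path.rsplit("/", 1)[-1] != expected_basename
    ]
```

Run against the disassembly above with `expected_basename="main.cpp"`, every annotation would be reported, since they all point at fflush.c.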

Version of emscripten/emsdk

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.3 (a9651ff57165f5710bb09a5fe52590fd6ddb72df)
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin
@kripken
Member

kripken commented Feb 21, 2025

This does look wrong. The object file looks fine, but the output of wasm-ld switches from the original filename to fflush.c for some reason. The line numbers change as well, not just the filename - very strange.

The problem may be in llvm-objdump, as llvm-dwarfdump gives proper output. There are multiple debug_line sections, and objdump seems to fixate on the very last, which has fflush.c (I suppose after wasm-ld we have several files together, which is why the problem starts after link). Possibly it is not computing the offsets right, or assuming a single section.

This is an LLVM issue, so it should be filed in the LLVM repo. But @dschuff is probably the right person to ask first.

@dschuff
Member

dschuff commented Feb 28, 2025

So this problem has to do with the way LLVM handles symbols for linked wasm files and debug info. Specifically, symbol addresses in DWARF are always encoded as offsets in the code section, whereas for linked files, LLVM uses the offset in the file as the address for a function (this is to match how engines print code addresses in backtraces).
See some changes (and llvm/llvm-project#76198) I made to implement this about a year ago in LLVM.
So if you use e.g. llvm-objdump to print symbol addresses, they will match what browser backtraces show, but not match what you see if you use llvm-dwarfdump to look at the debug info, and llvm-symbolizer will not get the right answer. I think the same mechanism in LLVM that causes the latter problem is what is happening when llvm-objdump is looking up line information from the debug info during disassembly (despite the fact that it's correctly finding the right code address when you ask it to disassemble a symbol by name).
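
To make the two address spaces concrete, here is a minimal Python sketch that converts a file offset into a code-section-relative offset. It assumes the DWARF addresses are measured from the start of the code section's contents (right after the section id and size bytes), which is one reading of the explanation above and not something confirmed in this thread. The function names are hypothetical and are not part of LLVM or Emscripten.

```python
WASM_MAGIC = b"\0asm"
CODE_SECTION_ID = 10  # section id of the code section in the wasm binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def code_section_payload_offset(data):
    """Return the file offset where the code section's contents begin."""
    if data[:4] != WASM_MAGIC:
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_uleb128(data, pos + 1)
        if section_id == CODE_SECTION_ID:
            return payload_start
        pos = payload_start + size  # jump to the next section header
    raise ValueError("no code section found")


def file_offset_to_section_offset(data, file_offset):
    """Translate a file offset into an offset relative to the code section.

    Assumption: this is the address space the DWARF line table uses.
    """
    return file_offset - code_section_payload_offset(data)
```

A tool that looks up line info with the file offset directly, without a translation like this, would land on whatever line table entry happens to cover that larger number, which would be consistent with the fflush.c output in the report.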

So this is an unfortunate mismatch and not everything works right, as you have seen. Emscripten has a tool emsymbolizer that knows a bunch of ways emscripten can store name/address information (e.g. DWARF, source maps, name sections) and can symbolize addresses. It papers over this problem using the --adjust-vma flag of llvm-symbolizer, but it currently only supports the use case of looking up a name or line from an address one at a time.
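
As a rough illustration of the workaround, the sketch below builds an llvm-symbolizer command line that passes `--adjust-vma` together with several file offsets at once. The helper name is hypothetical, and the choice of adjustment value (the offset of the code section within the file) is an assumption based on the description above, not a statement of what emsymbolizer actually does.

```python
def symbolizer_command(wasm_path, code_section_offset, file_offsets,
                       symbolizer="llvm-symbolizer"):
    """Build an argv list for llvm-symbolizer with an address adjustment.

    Assumption: code_section_offset is the right value for --adjust-vma.
    The command is only constructed here, not executed.
    """
    cmd = [
        symbolizer,
        f"--obj={wasm_path}",
        f"--adjust-vma={code_section_offset}",
    ]
    # Addresses are passed as positional arguments, one per lookup.
    cmd.extend(hex(offset) for offset in file_offsets)
    return cmd
```

The resulting list can be handed to `subprocess.run` once the adjustment value has been verified for the module in question.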

We might be able to improve this situation. Adjusting how symbols are represented in LLVM is tricky, since they are used in various places in assembly, linking, etc. Ideally we also wouldn't need a bunch of special hacks in the tools such as llvm-objdump (although I wouldn't necessarily be above some kind of special case if it wasn't too horrible). We might also be able to do something on the emscripten side beyond emsymbolizer. We had previously imagined a tool that could take e.g. a stack trace, symbolize it by whatever means and re-print it. Maybe that idea could be generalized.

@stevenwdv
Author

stevenwdv commented Mar 1, 2025

@dschuff I see. It sounds to me like getSymbolAddress serves two different purposes: providing an address that the linker and disassembler interpret, and providing an address formatted for presentation to the user. I don't know the details of where it's called, though.

> We might also be able to do something on the emscripten side beyond emsymbolizer. We had previously imagined a tool that could take e.g. a stack trace, symbolize it by whatever means and re-print it. Maybe that idea could be generalized.

Maybe, but I think llvm-objdump also shouldn't give wrong output. Should I file the issue in llvm/llvm-project instead?

@dschuff
Member

dschuff commented Mar 1, 2025

> Maybe, but I think llvm-objdump also shouldn't give wrong output. Should I file the issue in llvm/llvm-project instead?

Yeah agreed that this is a bug and that llvm-objdump and llvm-symbolizer should give the correct info without having to adjust the addresses; I just don't know how hard that will be yet, and wanted to mention alternatives.

And yes, llvm is probably the proper place for it.
