Failure to read memory in ARM vmcore captured by dump-capture kernel #461
Comments
Addendum: skipping that LOAD segment helped with the call stack of the crashing thread, but I still see the same issue with other threads.
Hi @alecrivers, I've definitely encountered a similar situation in #217, which turned out to be a bug. It'll be interesting to see if we can reproduce this in an arm32 VM; if so, that'll make it much easier to get to the root cause. As a data point, what kernel version are you using here? Finally, regarding libkdumpfile: nice catch on the misspelled macro. But to double-check: when you say you tested it with libkdumpfile, does that mean you set `DRGN_USE_LIBKDUMPFILE_FOR_ELF`?
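For reference, a minimal sketch of what testing that path might look like. This assumes `DRGN_USE_LIBKDUMPFILE_FOR_ELF` (the environment variable named later in this thread) is consulted by drgn when the core is opened, and that drgn was built with libkdumpfile support:

```python
import os

# The variable must be set in the environment before the core dump is
# opened; "1" asks drgn to route ELF vmcores through libkdumpfile
# (assumption: drgn was built with libkdumpfile support).
os.environ["DRGN_USE_LIBKDUMPFILE_FOR_ELF"] = "1"
print(os.environ["DRGN_USE_LIBKDUMPFILE_FOR_ELF"])
```

With the variable exported, the vmcore can then be opened as usual (e.g. via the drgn CLI) and the stack trace re-checked.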
Alec Rivers noticed in #461 that `WITH_LIBKDUMPFILE` is misspelled as `WITH_KDUMPFILE` here. The whole ifdef block isn't actually needed, so remove it.
Fixes: 4e330bb ("cli: indicate if drgn was compiled with libkdumpfile")
Signed-off-by: Omar Sandoval <[email protected]>
This does sound like a bug in how the ARM vmcore's ELF headers are generated. Like @brenns10, I'd be curious to hear the results of testing with `DRGN_USE_LIBKDUMPFILE_FOR_ELF`. P.S. I just removed the incorrect `WITH_KDUMPFILE` check.
Sorry, I confused that with your kcore performance improvements :)
Thanks for the replies, both. Things are a bit on fire, but I will report back with the `DRGN_USE_LIBKDUMPFILE_FOR_ELF` results when I get the time.
Hello,

First off, great project; thanks for it!

I've been debugging a nasty kernel oops, capturing vmcore files using a kexec'ed dump-capture kernel on the affected device. (I don't bother using makedumpfile to compress the cores.) I found that trying to get stack traces, e.g. `prog.crashed_thread().stack_trace()`, typically only showed the first stack frame and then an empty frame at a meaningless address. The `crash` utility, meanwhile, was able to get a full stack trace, but I wanted drgn's ability to report local variables and structures.

Doing a bunch of debugging, I found that drgn's unwinder was doing the right thing in terms of looking in the right place for the next frame's FP. However, when it went to read that memory, it was getting the wrong value. I could check this by doing `prog.read(<virtual memory address of the next FP>)`, which gave a different answer than asking `crash` to read the same address. Digging further, I found that the physical memory address translation was wrong. But I was surprised to find that `prog.read(follow_phys(prog["init_mm"].address_of_(), <address>), 4, True)` gave the correct answer.

Looking deeper, I found that
`follow_phys()` and `crash` were both referring to the page table to get their lookup data, whereas `prog.read()` was using the PT_LOAD data from the core dump. `readelf` showed the core's LOAD segments; the virtual memory address in question was inside the last range.
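To illustrate the kind of lookup involved, here is a small, self-contained sketch (not drgn's actual code) that parses the program headers of a 32-bit little-endian ELF core with Python's `struct` module and reports which LOAD segment's virtual address range covers a given address:

```python
import struct

PT_LOAD = 1

def load_segments(data):
    """Return (p_vaddr, p_paddr, p_memsz) for each PT_LOAD program header
    in a 32-bit little-endian ELF image given as bytes."""
    assert data[:4] == b"\x7fELF" and data[4] == 1  # ELFCLASS32
    e_phoff, = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    segs = []
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
            "<8I", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segs.append((p_vaddr, p_paddr, p_memsz))
    return segs

def segment_covering(segs, vaddr):
    """Return the first LOAD segment whose virtual range contains vaddr,
    or None if no segment covers it."""
    for v, p, sz in segs:
        if v <= vaddr < v + sz:
            return (v, p, sz)
    return None
```

If the last segment returned here has a bogus virtual-to-physical mapping, reads routed through it would return wrong values, matching the symptom described above.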
I found that if I ignored that last memory section (simply by skipping it in `drgn_program_set_core_dump_fd_internal()`), everything started working. Huzzah!

For now, because I'm on a tight deadline, I don't have time to investigate why this last section may be incorrect. But I do note that the last segment is 256MB in size, which is exactly the same size as the amount of memory reserved by `crashkernel` for the dump-capture kernel. (I could, but haven't, tried changing that size to see if the segment changes too, and I know next to nothing about core dumps, so I can't say off the bat whether it obviously is or isn't related.)

Incidentally, while trying various things before this workaround, I also tried using libkdumpfile to read the core, but ran into the issue that `drgn/libdrgn/python/main.c` (line 5 in 970b9a0) checks `WITH_KDUMPFILE`, while I think the correct line would be `WITH_LIBKDUMPFILE`. However, changing that and using libkdumpfile didn't resolve my problem.

Thanks again for a great project.
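As a rough sanity check for the crashkernel-size observation above, one could flag LOAD segments whose memory size exactly equals the `crashkernel` reservation. This is a hypothetical heuristic for investigation, not something drgn does:

```python
# Hypothetical heuristic: a LOAD segment whose memory size exactly matches
# the crashkernel= reservation (e.g. 256MB) may describe the dump-capture
# kernel's reserved region rather than the crashed kernel's memory.
def suspect_segments(segments, crashkernel_size):
    """segments: iterable of (vaddr, paddr, memsz) tuples."""
    return [s for s in segments if s[2] == crashkernel_size]

segs = [(0xC0000000, 0x10000000, 0x1000),
        (0xD0000000, 0x30000000, 256 * 1024 * 1024)]
# Flags only the second, 256MB segment.
print(suspect_segments(segs, 256 * 1024 * 1024))
```

An exact size match is of course only circumstantial; confirming it would require changing the `crashkernel=` size and re-capturing, as suggested above.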