System interrupts consume 100% CPU #6
I ran it on bare-metal Windows 10 22H2 64-bit. |
Hi,
BugChecker is not so well tested on bare metal; however, I guess that this
problem is linked to this other issue: #2, since you said that
the problem happens before loading the main sys driver.
Moreover, since you said that you found a reference to the
HalpInterruptSendIpi function, my guess is that the OS sends a "state
change" to the kernel debugger, and the debugger doesn't respond
accordingly (because the BugChecker main sys driver is not loaded yet).
So, to sum up, the problem is in the KDCOM project: if you look at the
KdSendPacket function there, you'll see that it is almost empty (it simply
forwards the event to the main BugChecker driver, if it is loaded and
attached). So a possible approach to get a better understanding of what's
going on in your test case is to log the arguments passed to KdSendPacket
in the KDCOM dll (I bet it's a StateChange64 event).
--Vito
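For anyone trying the logging suggested above, here is a minimal offline sketch in Python of how the logged values could be classified once they are dumped. It is not BugChecker or KDCOM code. The packet type value (7 for StateChange64), the `NewState` codes and the layout of the first 12 header bytes are assumptions taken from the publicly available `windbgkd.h` definitions, and `describe_packet` is a hypothetical helper name.

```python
import struct

# Assumed values from the public windbgkd.h header; verify against your copy.
PACKET_TYPE_KD_STATE_CHANGE64 = 7
NEW_STATE_NAMES = {
    0x3030: "DbgKdExceptionStateChange",
    0x3031: "DbgKdLoadSymbolsStateChange",
    0x3032: "DbgKdCommandStringStateChange",
}


def describe_packet(packet_type, header_bytes):
    """Return a one-line description of one logged KdSendPacket call.

    packet_type  -- the PacketType argument as logged
    header_bytes -- the raw bytes of the MessageHeader buffer as logged
    """
    if packet_type != PACKET_TYPE_KD_STATE_CHANGE64:
        return f"packet type {packet_type} (not StateChange64)"
    if len(header_bytes) < 12:
        return "StateChange64 (header too short to decode)"
    # Assumed start of the header: ULONG NewState, USHORT ProcessorLevel,
    # USHORT Processor, ULONG NumberProcessors (little-endian, 12 bytes).
    new_state, _level, processor, num_procs = struct.unpack_from(
        "<IHHI", header_bytes
    )
    name = NEW_STATE_NAMES.get(new_state, f"unknown state 0x{new_state:X}")
    return f"StateChange64: {name}, processor {processor} of {num_procs}"
```

If the log shows the same exception state change repeating without pause, that would be consistent with the guess above that the OS keeps reporting a state change that nobody answers.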
|
Oh, that's sad. Local debugging is a very important advantage, it would be nice to have it.
If this is correct, loading the driver ASAP should solve the problem, right?
From your intuition/approximation: how hard is it to implement a full-fledged KdSendPacket function? Or at least something that is more or less stable on bare metal? |
Yes, if you can, try to load the driver ASAP after system restart.
In order to get the framebuffer details, since the auto detect feature
fails, you can get the address from the Device Manager; set width and
height to your screen dimensions and set stride to 0. Don't forget to
disable the display drivers before doing this test (more details in the
main README.md).
For the KDCOM KdSendPacket fix, I'll have some free time next month.
|
Hi Vito, I managed to start the driver before the storm but it didn't help: interrupts started consuming CPU time ~1-2 minutes after system start. I made one more dump and will examine it, but it feels like there is a more fundamental issue here. -- |
Hi Michael, thank you for your feedback.
A question: after starting the driver, did you manage to enter the
BugChecker UI successfully (and to resume system execution afterwards)?
I'm trying to understand if this interrupt problem is triggered by some
external event that happens exactly 2-3 mins after boot, regardless of your
interaction with BugChecker.
|
No, I didn't even try to break in. Frame buffer autodetect doesn't work, though I see that NativeUtil detects memory resources correctly (at least they match Device Manager's info). And I am not sure what the resolution is when the driver is disabled. So, no break-in attempt so far. It looks more like an external event triggers the slowdown. |
Can you try to break into BC (PrintScr key), then to exit from the UI (F5
key) and then to wait until the interrupt problem happens?
In order to configure the framebuffer address, you should try all the
addresses returned by NativeUtil.
Thanks, --Vito
|
Hi Vito, Got a lot of progress with BugChecker. What I did:
So, no more interrupt storm! During the test I waited a bit and stopped and started the OS a few times, and everything worked fine. It looks like the stop-continue workaround prevents the storm. After that I tried framebuffer autodetection and it worked with no error messages, though the detected parameters look incorrect. BugChecker detected a resolution of 3840*2400 with a stride of 15360. This is my working resolution but it's definitely not what I see in VGA mode. Note that I run BugChecker on a laptop with two graphics cards: built-in Intel and NVidia. For some reason NativeUtil only detected the Intel card before today, so I only tried 640*480 with the 2 memory buffers belonging to the Intel card. The UI did not appear. I believe (though I'm not sure) that the laptop uses the Intel card to output to the built-in display. -- |
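A quick sanity check on the detected numbers, as a small Python sketch. It assumes a 32-bit-per-pixel framebuffer with no row padding, which is an assumption and not something the autodetection output states; `expected_stride` and `frame_size_bytes` are hypothetical helper names.

```python
def expected_stride(width_px, bits_per_pixel=32):
    """Bytes per scanline, assuming tightly packed rows (no padding)."""
    return width_px * (bits_per_pixel // 8)


def frame_size_bytes(stride, height_px):
    """Total bytes covered by one full frame with the given stride."""
    return stride * height_px


stride = expected_stride(3840)           # 3840 pixels * 4 bytes = 15360
size = frame_size_bytes(stride, 2400)    # 15360 * 2400 = 36864000 bytes
print(stride, size)
```

Under that assumption, a stride of 15360 is exactly what a 3840-pixel-wide, 32 bpp line needs, so the detected width and stride are at least consistent with each other.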
Perfect! Thank you Michael!
This is exactly the same problem that occurs in issue #2, i.e. an INT3 is
triggered roughly 2-3 minutes after boot. INT3s require special
handling by the kernel debugger, or, once you hit them, the system gets
stuck in an infinite StateChange64 loop (which manifests itself as our IPI
interrupt storm). I'll fix the bug as soon as I have some free time.
Furthermore the "stop-continue" workaround is not necessary: just start the
BugChecker driver with "some" framebuffer info, correct or incorrect
doesn't matter.
For the framebuffer, I seem to recall that 640*480 is too low to render the
BC UI... can you try with 800*600? Even if it is not the correct
resolution, we will have confirmation that the problem is that 640*480 is
too low. If the framebuffer address is correct, in the worst case garbage
will appear on the screen...
thanks, --Vito
|
Awesome! I played with the framebuffer trying different resolutions and strides. Noticed that:
Anyway, it looks like Intel is the right lead, but I need to find the right resolution and, most importantly, the right stride. Do you know how I can do that? |
It looks like one of the problems is that EnumDisplaySettingsA returns incorrect results when the video drivers are unloaded. On my bare-metal Windows it does not return the actual resolution but the one that I set (4K), even when the drivers are disabled. This is also what I see when opening display settings. |
So you are unable to determine your screen resolution when the display
drivers are disabled, even with Windows' display properties, right?
PS: for the stride, try with 0
|
I tried stride 0 too, no UI, just some garbage.
Yes. With the drivers disabled, I tried calling EnumDisplaySettingsW with iModeNum 0, 1, ... instead of ENUM_CURRENT_SETTINGS. It looks like there is only one mode, "0", and it reports an unbelievably high resolution of 4K. |
I have a feeling that calling EnumDisplaySettingsExW with the EDS_RAWMODE flag can solve the issue. I will try it later today. |
You are about to hear the dumbest root cause ever. You have been warned. The 4K resolution returned by EnumDisplaySettings was correct. The stride detected by BugChecker was correct too. What made the screen look weird and trashed the video buffer was screen scaling set to 300%. Setting it back to 100% fixed the issue; BugChecker's UI is now visible. When the graphics drivers are disabled, Windows uses the built-in Microsoft driver, BasicDisplay.sys. Surprisingly, it supports pretty high resolutions, like 4K. Windows also enables 300% scaling, which I didn't notice. 4K scaled by a factor of 3 makes the screen look very weird, like some basic 800*600 VGA. And because everything on the screen gets bigger, it's hard to notice that scaling is turned on. I didn't even think that it could be on by default. I guess this notice should be added to the manual, in bold red font. By the way, BasicDisplay.sys is pretty small, ~150 functions, with symbols available. I'm not sure it can be used by BugChecker, since it seems to export only a DX interface, but maybe its frame buffer detection code can be a source of inspiration. |
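To illustrate why the scaled desktop looked like a low-resolution mode, here is a small Python sketch of the arithmetic. It uses a simplified model in which the apparent desktop size is the physical pixel size divided by the scale factor; that model and the helper name `logical_size` are assumptions for illustration, not something taken from BugChecker or from Windows documentation.

```python
def logical_size(width_px, height_px, scale_percent):
    """Apparent desktop size for a given physical size and scale factor.

    Simplified model: each logical pixel covers (scale_percent / 100)
    physical pixels in both directions.
    """
    return (width_px * 100 // scale_percent, height_px * 100 // scale_percent)


# Physical framebuffer reported in this thread: 3840*2400.
print(logical_size(3840, 2400, 300))  # apparent size at 300% scaling
print(logical_size(3840, 2400, 100))  # apparent size at 100% scaling
```

In this model the 300% setting yields an apparent 1280*800 desktop, which is easy to mistake for a basic VGA-class mode, while the framebuffer that BugChecker writes to is still 3840*2400.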
Ah ok, thank you Michael.
At the time I did extensive research on the possibility of hooking into
WDDK in order to get the characteristics of the video framebuffer, but the
current approach of BugChecker, which is doing a memory scan in an attempt
to guess the beginning of the framebuffer (more precisely of the first and
second video line), should be the only one possible, unfortunately.
Thank you, --Vito
|
Hi,
System interrupts consume 100% CPU when booting the OS from the BugChecker boot entry with DSE disabled. It happens soon after system start. The OS becomes very slow and freezes when I try to load BugChecker.sys from SymLoader. Also, it fails to detect the frame buffer location, but I guess that's a different issue... So let's start with the interrupt consumption. I dumped memory using livekd, checked the !dpcs command (no DPCs), and ran !process 0 1f to see whether any threads were serving ISRs (I only found one nt!HalpInterruptSendIpi call).
What could be the problem here? Feel free to ask me to run commands on the dump.