System interrupts consume 100% CPU #6

Open
zwclose opened this issue Jun 25, 2023 · 17 comments

@zwclose

zwclose commented Jun 25, 2023

Hi,

System interrupts consume 100% CPU when booting the OS from the BugChecker boot entry with DSE disabled. This happens soon after system start. The OS becomes very slow and freezes when I try to load BugChecker.sys from SymLoader. It also fails to detect the frame buffer location, but I guess that's a different issue, so let's start with the interrupt consumption. I dumped memory using livekd, checked the !dpcs command (no DPCs), and ran !process 0 1f to see whether any threads were servicing ISRs (I only found one nt!HalpInterruptSendIpi call).

What could be the problem here? Feel free to ask me to run commands on the dump.

@zwclose
Author

zwclose commented Jun 26, 2023

I ran it on bare metal Windows 10 22H2, 64-bit.

@vitoplantamura
Owner

vitoplantamura commented Jun 26, 2023 via email

@zwclose
Author

zwclose commented Jun 26, 2023

> BugChecker is not so well tested on bare metal

Oh, that's sad. Local debugging is a very important advantage, it would be nice to have it.

> because the BugChecker main sys driver is not loaded yet

If this is correct, loading the driver ASAP should solve the problem, right?

> So a possible approach to get a better understanding of what's
> going on in your test case is to log the arguments passed to KdSendPacket
> in the KDCOM dll (I bet it's a StateChange64 event).

From your intuition/approximation: how hard is it to implement a full-fledged KdSendPacket function? Or at least something that is more or less stable on bare metal?

@vitoplantamura
Owner

vitoplantamura commented Jun 27, 2023 via email

@zwclose
Author

zwclose commented Jun 27, 2023

Hi Vito,

I managed to start the driver before the storm but it didn't help: interrupts started consuming CPU time ~1-2 minutes after system start. I made one more dump and will examine it, but it feels like there is a more fundamental issue here.

--
Michael.

@vitoplantamura
Owner

vitoplantamura commented Jun 27, 2023 via email

@zwclose
Author

zwclose commented Jun 28, 2023

No, I didn't even try to break in. Frame buffer autodetection doesn't work, though I see that NativeUtil detects memory resources correctly (at least they match Device Manager's info). And I am not sure what the resolution is when the driver is disabled. So, no break-in attempt so far.

Looks more like an external event that triggers the slowdown.

@vitoplantamura
Owner

vitoplantamura commented Jun 28, 2023 via email

@zwclose
Author

zwclose commented Jun 28, 2023

Hi Vito,

Made a lot of progress with BugChecker. What I did:

  1. Disabled video drivers, set framebuffer parameters to: 640 * 480, address from resources.txt, stride = 0
  2. Rebooted from BC menu
  3. Immediately after start ran SymLoader and started the driver
  4. Broke in. The OS stopped but no BC UI appeared
  5. Pressed F5. The OS unfroze and continued working

So, no more interrupt storm! During the test I waited a bit and stopped and started the OS a few times, and everything worked fine. It looks like the stop-continue workaround prevents the storm.

After that I tried framebuffer autodetection and it worked with no error messages, though the detected parameters look incorrect. BugChecker detected a resolution of 3840*2400 with a stride of 15360. This is my working resolution but it's definitely not what I see in VGA mode. Note that I run BugChecker on a laptop with two graphics cards: built-in Intel and NVidia. For some reason NativeUtil only detected the Intel card before today, so I only tried 640*480 with the 2 memory buffers belonging to the Intel card. The UI did not appear. I believe (though I'm not sure) that the laptop uses the Intel card to output to the built-in display.
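
As a quick sanity check on the detected values: for a tightly packed framebuffer, the stride (bytes per scanline) is the width multiplied by the bytes per pixel. The sketch below assumes 32 bits per pixel and no padding between rows; both are assumptions, not something reported in this thread, and `packed_stride` is a hypothetical helper name rather than BugChecker code.

```python
def packed_stride(width_px: int, bits_per_pixel: int = 32) -> int:
    """Bytes per scanline for a framebuffer with no padding between rows."""
    if bits_per_pixel % 8 != 0:
        raise ValueError("bits_per_pixel must be a whole number of bytes")
    return width_px * (bits_per_pixel // 8)


# Assumption: 32 bpp, rows stored back to back.
print(packed_stride(3840))  # 15360
print(packed_stride(640))   # 2560
```

Under those assumptions, a stride of 15360 is at least self-consistent with a 3840-pixel-wide mode.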

--
Michael

@vitoplantamura
Owner

vitoplantamura commented Jun 28, 2023 via email

@zwclose
Author

zwclose commented Jun 29, 2023

> just start the BugChecker driver with "some" framebuffer info, correct or incorrect doesn't matter.

Awesome!

I played with the framebuffer trying different resolutions and strides. Noticed that:

  1. NativeUtil sometimes detects NVidia, sometimes it doesn't
  2. Built-in Intel with a resolution of 800*600 and higher, and different strides, prints garbage on the screen. Usually F5 works, but sometimes the system halts.

Anyway, it looks like Intel is the right lead, but I need to find the right resolution and, most importantly, the right stride. Do you know how I can do that?
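
For context on why a wrong stride shows up as garbage: in a linear framebuffer, the byte offset of pixel (x, y) is `y * stride + x * bytes_per_pixel`, so any error in the stride accumulates row by row. This is a minimal sketch assuming a linear 32 bpp layout; `pixel_offset` and `row_drift` are hypothetical helpers, not part of BugChecker.

```python
def pixel_offset(x: int, y: int, stride: int, bytes_per_pixel: int = 4) -> int:
    """Byte offset of pixel (x, y) from the start of a linear framebuffer."""
    return y * stride + x * bytes_per_pixel


def row_drift(row: int, true_stride: int, guessed_stride: int) -> int:
    """How many bytes the start of `row` is misplaced when the stride is wrong."""
    return pixel_offset(0, row, true_stride) - pixel_offset(0, row, guessed_stride)


# Assumption: the real mode is 3840 px wide at 4 bytes per pixel,
# while the driver was told 800 px at 4 bytes per pixel.
true_stride = 3840 * 4
guessed_stride = 800 * 4
for row in (0, 1, 2, 100):
    print(row, row_drift(row, true_stride, guessed_stride))
```

The drift grows linearly with the row number, which is why a mismatched stride produces sheared or scrambled output instead of a merely shifted image.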

@zwclose
Author

zwclose commented Jun 29, 2023

Looks like one of the problems is that EnumDisplaySettingsA returns incorrect results when the video drivers are unloaded. On my bare metal Windows it does not return the actual resolution but the one that I set (4K), even when the drivers are disabled. This is also what I see when opening display settings.

@vitoplantamura
Owner

vitoplantamura commented Jun 29, 2023 via email

@zwclose
Author

zwclose commented Jun 30, 2023

I tried stride 0 too, no UI, just some garbage.

> So you are unable to determine your screen resolution when the display drivers are disabled, even with Windows' display properties, right?

Yes. With the drivers disabled, I tried calling EnumDisplaySettingsW with iModeNum 0, 1, ..., instead of ENUM_CURRENT_SETTINGS. It looks like there is only one mode, "0", and it reports an unbelievably high resolution of 4K.

@zwclose
Author

zwclose commented Jun 30, 2023

I have a feeling that calling EnumDisplaySettingsExW with the EDS_RAWMODE flag can solve the issue. I will try it later today.

@zwclose
Author

zwclose commented Jul 1, 2023

You are about to hear the dumbest root cause ever. You have been warned.

The 4K resolution returned by EnumDisplaySettings was correct. The stride detected by BugChecker was correct too. What made the screen look weird and the video buffer look trashed was screen scaling set to 300%. Setting it back to 100% fixed the issue; BugChecker's UI is now visible.

When the graphics drivers are disabled, Windows uses Microsoft's built-in driver, BasicDisplay.sys. Surprisingly, it supports pretty high resolutions, like 4K. Windows also enables 300% scaling, which I didn't notice. 4K scaled up 3 times makes the screen look very weird, like some basic 800*600 VGA. And because everything on the screen gets bigger, it's hard to notice that scaling is turned on. I didn't even think it could be on by default. I guess this note should be added to the manual. Colored red, bold font.
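
The arithmetic behind that impression, as a rough sketch: at 300% scaling the desktop is laid out as if the screen had one third as many pixels in each direction, even though the framebuffer itself stays at the full resolution. `effective_resolution` is a hypothetical helper, and rounding down is an assumption.

```python
def effective_resolution(width_px: int, height_px: int, scale_percent: int) -> tuple[int, int]:
    """Apparent desktop size in scaled units for a given display scaling factor."""
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    # Integer math: multiply first, then divide, rounding down.
    return (width_px * 100 // scale_percent, height_px * 100 // scale_percent)


print(effective_resolution(3840, 2400, 300))  # (1280, 800)
print(effective_resolution(3840, 2400, 100))  # (3840, 2400)
```

So a 3840*2400 framebuffer at 300% looks roughly like a 1280*800 desktop, which is easy to mistake for a low-resolution VGA mode.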

By the way, BasicDisplay.sys is pretty small: ~150 functions, with symbols available. I'm not sure it can be used by BugChecker, since it seems to export only a DX interface, but maybe its frame buffer detection code can be a source of inspiration.

@vitoplantamura
Owner

vitoplantamura commented Jul 2, 2023 via email
