
nvme0: controller is down (additional log, >1 year worth) now runs hot #6661

Open
graemev opened this issue Feb 10, 2025 · 4 comments


graemev commented Feb 10, 2025

Describe the bug

I raised this via Debian's reportbug(1), and it was returned with "Closing as this is not a Debian system but running on a derivative."

This is simply an attempt to provide more logs for what is likely a three-year-old problem.

Two points of interest:
1. It ran without issues for over a year (same hardware, limited use, apt-get update on most uses).
2. A reboot without a power cycle produces different errors (dates noted in the attached syslog).

I expect this will be merged with an existing bug (just to add the logs).

FYI: some reading I did around this suggests that Microsoft only used the deepest NVMe power-saving mode in Windows 10 while the system was suspended. It appears the Pi allows this mode during normal running (i.e. it permits a larger latency than it can actually tolerate while running), so I'm guessing the hardware gets little testing in these modes.
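To illustrate the latency point above, here is a simplified Python model (not the kernel's actual code) of how APST, the NVMe autonomous power-state transition feature, limits which non-operational power states a drive may enter. In this sketch a state is eligible only if its entry plus exit latency fits within the limit set by nvme_core.default_ps_max_latency_us (which defaults to 100000 µs), and a limit of 0 disables APST entirely, which is what the kernel's suggested workaround relies on. The exact comparison in the driver has varied between kernel versions, and the power-state table below is made up for illustration; a real drive reports its own latencies in its Identify Controller data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    """One entry of a hypothetical NVMe power-state table."""
    index: int
    non_operational: bool
    entry_latency_us: int
    exit_latency_us: int


def apst_eligible_states(states, max_latency_us):
    """Return the indices of states this simplified APST model would allow.

    max_latency_us plays the role of nvme_core.default_ps_max_latency_us.
    A value of 0 means APST is disabled, so no state is eligible.
    Assumption: eligibility compares entry + exit latency against the limit.
    """
    if max_latency_us == 0:
        return []
    return [
        s.index
        for s in states
        if s.non_operational
        and s.entry_latency_us + s.exit_latency_us <= max_latency_us
    ]


# Made-up table for illustration only; real drives report their own values.
EXAMPLE_TABLE = [
    PowerState(0, False, 0, 0),
    PowerState(1, False, 0, 0),
    PowerState(2, False, 0, 0),
    PowerState(3, True, 2000, 2000),
    PowerState(4, True, 10000, 40000),
]

if __name__ == "__main__":
    # Compare the default limit, a tighter limit, and the "disable APST" value.
    for limit in (100_000, 5_000, 0):
        print(limit, apst_eligible_states(EXAMPLE_TABLE, limit))
```

With this made-up table, the default limit admits both states 3 and 4, a 5000 µs limit admits only state 3, and 0 admits none.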

bug-report-sysinfo.txt

reportbug-linux-image-6.6.62+rpt-rpi-2712-20250208114647-yanddw3j.txt

Since I've moved so much into attached text files (thank goodness), I'll just add the key log lines:

Jan 28 14:50:33 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 28 14:50:33 argon kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jan 28 14:50:33 argon kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jan 28 14:50:33 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)

I'm sure you've seen this a lot already; of interest are the dates and the behaviour after a reboot vs. a power cycle.
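Since the dates are the interesting part, here is a hedged sketch for pulling the "controller is down" events out of a syslog file and grouping them by day. It assumes the traditional syslog timestamp format shown in the excerpt above (no year, e.g. "Jan 28 14:50:33"); the function and pattern names are hypothetical. A CSTS value of 0xffffffff generally means the register read returned all ones, i.e. the controller had stopped responding on the PCIe link, rather than reporting a meaningful status.

```python
import re
from collections import defaultdict

# Matches lines like:
# Jan 28 14:50:33 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
DOWN_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: nvme (?P<dev>nvme\d+): controller is down; "
    r"will reset: CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def controller_down_events(lines):
    """Group 'controller is down' events by (month, day); other lines are ignored."""
    events = defaultdict(list)
    for line in lines:
        m = DOWN_RE.match(line)
        if not m:
            continue
        csts = int(m["csts"], 16)
        events[(m["month"], int(m["day"]))].append({
            "time": m["time"],
            "device": m["dev"],
            # An all-ones read usually means the device no longer responded.
            "all_ones": csts == 0xFFFFFFFF,
            "pci_status": int(m["pci"], 16),
        })
    return dict(events)


if __name__ == "__main__":
    sample = [
        "Jan 28 14:50:33 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10",
        "Jan 28 14:50:33 argon kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?",
    ]
    print(controller_down_events(sample))
```

Feeding it the lines of the full attached syslog would give a per-day list of events to compare against the reboot and power-cycle dates.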

pelwell (Contributor) commented Feb 10, 2025

When the template says "Describe the bug", you are meant to describe the bug. Much TL, so DR. Have you heard of pastebin et al?

graemev (Author) commented Feb 10, 2025

I was pointed here as a location to file a reportbug(1) report. The support lines in the Raspberry Pi variant of Debian say:

HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

This is clearly not correct, as Debian rejects these with:

"Closing as this is not a Debian system but running on a derivative."

So the support URL, and probably the mail address in reportbug(1), should be updated to point at the correct support channel.

FYI, I am not raising this because I expect help; I'm trying to get the log added to the appropriate bug. My first guess would be that the rpi maintainer will say it's not Pi-specific and pass it upstream... but I could be wrong; there may well be a unique Pi defect he's aware of.

(And yes, I agree it looks a mess: not a very good interface for submitting a bug report, more like some kind of "online help & support page".) ...even "attach a file" would be more usable.

pelwell (Contributor) commented Feb 10, 2025

> I'm trying to get the log added to the appropriate bug

You've reached the horse's mouth.

> the rpi maintainer

That would be me, or one of a very small set of colleagues who will already have seen your report.

> even "attach a file" would be more usable

You must have missed the part directly below the initial "Describe the bug" section where there's a paperclip icon and the words "Paste, drop or click to add files".

graemev (Author) commented Feb 10, 2025

Ahh, thanks... and yes, I did "miss the paperclip". Feel free to delete the above; I'll submit a more readable version (the reportbug(1) output) via the paperclip.

FYI: the box labelled "System*" says:

> Copy and paste the results of the raspinfo command

The output of raspinfo(1) was too big to cut and paste into that box.

In my defence, I'd spent a while collecting data for this to submit to Debian. When they punted it back, I had trouble finding a better location for the report. Then, when I found somewhere (by asking directions), I found a GUI with the "damn fixed text boxes" one finds on marketing-type sites (and they usually lack "paperclips"). After 20+ failed submissions, I was just hacking large sections out of the report to try to make it fit.

STOP PRESS
It seems this site allows me to update previous entries, so I've done that :-)
