Re: power-off delay/hang due to commit 6d25be57 (mainline)

From: Stephen Berman
Date: Mon Jun 15 2020 - 03:58:18 EST


On Sun, 14 Jun 2020 19:10:05 +0200 Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:

> On 2020-06-14 14:12:18 [+0200], Stephen Berman wrote:
[...]
>> What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both
>> before and after doing that I get an error trying to open the file:
>>
>> root [ ~ ]# cat /proc/sysrq-trigger
>> cat: /proc/sysrq-trigger: Input/output error
>
> echo "t > /proc/sysrq-trigger"
>
> not cat.

Ok, sorry, I had misunderstood, but now I've looked at the
documentation. I had in fact already done `echo t >
/proc/sysrq-trigger' in an xterm (as root) and there was no output.
Later, after booting kernel 5.1.0 because of the message flooding with
5.6.4, I did `echo t > /proc/sysrq-trigger' in a virtual tty (also as
root) and the only output was: `[ <timestamp>] sysrq: Show State'. Is
this expected? (In /proc/sys/kernel/sysrq there is `1'.) I couldn't
try it in a 5.6.4 virtual tty because of the message flooding (see
below).

[...]
>> I applied this patch to 5.6.4 and recompiled, and on the next boot with
>> that kernel, the kernel buffer (and kernel and system logs) began to get
>> flooded with these messages:
>>
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987]
>> acpi_os_execute_deferred(843) Start ffff8fb82c7b6500
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101)
>> Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104]
>> acpi_os_execute_deferred(845) End ffff8fb82c7b6500
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105]
>> acpi_os_execute_deferred(843) Start ffff8fb82b844800
>> 000000002ba560ea(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124]
>> acpi_os_execute_deferred(845) End ffff8fb82b844800
>> 000000002ba560ea(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288]
>> acpi_os_execute_deferred(843) Start ffff8fb82c7b6540
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101)
>> Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
>>
>> and so on without stopping. I could start X and avoid seeing the
>> messages, but was afraid the logs would fill up the root partition if I
>> let it keep going, so I rebooted with another kernel.
>>
>> Was this message flood because I booted with "ignore_loglevel
>> initcall_debug"? In the logs there are also lots of messages like this:
>
> Is there a acpi_os_execute_* flood? The *few* at what appears to system
> startup and might be normal. If there appear *many* more and are
> constantly printing (check with dmesg) then we might be to something.

The kernel log shows 305 of these messages in the 4 minutes and 17
seconds between the start of klogd and when I rebooted.

[...]
> I attached the updated acpi patch. It limits the prints to the
> kacpi_notify_wq queue which appears to stuck at shutdown.

I applied this patch and rebuilt the kernel. After booting 5.6.4 this
morning there was again message flooding and this time I was unable to
log in because of it, so had to do a hard reboot and booted with 5.1.0.
The kernel log shows 207 acpi_os_execute_deferred_notify messages in 3
minutes and 6 seconds; here are the first:

Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943848] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943851] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943865] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944106] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944115] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944406] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944408] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944417] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)

> The interesting part is to see if there is a acpi_os_execute() adding a
> specific event multiple times which does not complete. Maybe at runtime,
> maybe at shutdown time. If that is the case then ignoring this specific
> event might fix the shutdown problem. With all this information so far,
> I don't see a relation with this problem and the commitâ

In those 3 minutes and 8 seconds there were 5 "Adding
acpi_ev_asynch_enable_gpe" messages at the beginning, then 185 "Adding
acpi_ev_notify_dispatch" messages, which kept coming until I rebooted.

Steve Berman