Re: [linux-pm] intermittent suspend problem again
From: Ferenc Wagner
Date: Wed Nov 11 2009 - 08:29:56 EST
"Rafael J. Wysocki" <rjw@xxxxxxx> writes:
> On Wednesday 11 November 2009, Ferenc Wagner wrote:
>
>> "Rafael J. Wysocki" <rjw@xxxxxxx> writes:
>>
>>> On Thursday 29 October 2009, Ferenc Wagner wrote:
>>>
>>>> "Rafael J. Wysocki" <rjw@xxxxxxx> writes:
>>>>
>>>>> On Wednesday 28 October 2009, Ferenc Wagner wrote:
>>>>>
>>>>>> 2.6.32-rc5 feels particularly bad, with frequent failures to switch
>>>>>> off the machine after "S|" or freezes after "Snapshotting system".
>>>>>> The former does not cause much trouble in itself, as the machine can
>>>>>> be switched off and resumed all right, but the latter is nasty.
>>>>>> Suspend to RAM works all the time. The issue is not reproducible,
>>>>>> unfortunately, and the kernel change happened almost together with a
>>>>>> BIOS upgrade. Yesterday I switched back to 2.6.31 to see whether it
>>>>>> still works stably with the new BIOS. I'll report back my findings in
>>>>>> a couple of days.
>>>>>
>>>>> OK, thanks.
>>>>>
>>>>> Still, I'm really afraid we won't be able to debug it any further without a
>>>>> reproducible test case.
>>>>
>>>> Can't you perhaps suggest a way forward there? Or some tricks to create a
>>>> reproducible test case here?
>>>
>>> Well, you can test if the problem is reproducible in the "shutdown" mode of
>>> hibernation.
>>
>> Well, both failure modes happen with "shutdown" mode as well (the S|
>> freeze with yesterday's git, too), but still not reproducibly. When
>> s2disk is stuck in "Snapshotting system", the system is not completely
>> dead, it echoes line feeds and Ctrl-C at least (as added to #14504).
>>
>> I wonder what you would do if the issue were reproducible... Is that totally
>> inapplicable if the problem happens with 10% probability only? Slow,
>> sure, but until I manage to set up an automated testing bench...
>
> I would try to identify the commit that made the problem appear using git
> bisection. However, this is really difficult with problems that are not
> reliably reproducible.

Indeed. I'm thinking about setting up a script that does nothing but
hibernate the laptop in a loop, and getting my router to provide a
constant stream of WOL packets to restart it. If it always freezes in
bounded time, that will make bisecting possible, if slow.

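A minimal sketch of such a loop, under assumptions that are not from the
thread: the counter file and the function names are hypothetical, s2disk
is assumed to be on PATH and run as root, and each cycle is treated as an
independent trial with the roughly 10% failure rate mentioned above. The
first function estimates how many clean cycles are needed before a kernel
can be called good during bisection.

```python
import math
import os
import subprocess


def cycles_needed(p_fail, confidence=0.95):
    """Clean cycles required before a kernel can be called 'good'.

    Assumes each hibernation is an independent trial that fails with
    probability p_fail; solves (1 - p_fail)**n <= 1 - confidence for n.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def record_cycle(counter_path, n):
    """Persist the cycle number before hibernating, so that after a freeze
    and a hard power-off the file still shows how far the run got."""
    with open(counter_path, "w") as f:
        f.write("%d\n" % n)
        f.flush()
        os.fsync(f.fileno())


def run_cycles(hibernate, counter_path, limit):
    """Call hibernate() `limit` times and return the number of cycles done.

    A freeze never returns, so in that case the counter file is the only
    evidence left behind.
    """
    done = 0
    for n in range(1, limit + 1):
        record_cycle(counter_path, n)
        hibernate()
        done = n
    return done


def s2disk():
    # Assumption: the user-space suspend tool is installed as "s2disk".
    subprocess.check_call(["s2disk"])
```

On the laptop this would be driven as
run_cycles(s2disk, "/var/tmp/hibernate-cycles", cycles_needed(0.10)),
which comes to 29 clean cycles per bisection step at 95% confidence.
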
> Failing that, I would add some instrumentation to the code to identify the
> exact place where it hangs.

I managed to achieve this with my STR problem, see
http://bugs.freedesktop.org/show_bug.cgi?id=22126#c17, but maybe that

    status = acpi_evaluate_object(NULL, METHOD_NAME__PTS, &arg_list, NULL);

wasn't deep enough, as it got no followup. How deep should one go to be
useful?

I can probably do so again, if slower; but this case may also be easier
if I can depend on working console output. Which are the interesting
parts for instrumentation? Can those parts produce console output to
VGA or netconsole? Wouldn't switching on ACPI debugging before invoking
s2disk be useful? Which parts of it (to avoid it spitting out MBs of
useless characters)?

> BTW, did you carry out the /sys/power/pm_test "core" test on the box?

I'm not clear on how to do that with user space suspend. Simply set it
to "core" before invoking s2disk? I already did the test for STR (see
http://bugs.freedesktop.org/show_bug.cgi?id=22126#c3), but will redo it
with the current kernel tonight.

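For the in-kernel hibernation path, the pm_test procedure amounts to two
sysfs writes, sketched below. The function name and the sysfs_root
parameter are hypothetical, there only so the sketch can be tried against
a scratch directory. Whether s2disk honours pm_test in the same way is the
open question here, and the sketch does not answer it.

```python
import os

# Test levels accepted by /sys/power/pm_test, shallowest first.
PM_TEST_LEVELS = ("none", "freezer", "devices", "platform", "processors", "core")


def run_pm_test(level, state="disk", sysfs_root="/sys/power"):
    """Select a pm_test level, then start a transition to `state`.

    With a level other than "none" the kernel is expected to stop at that
    stage, wait briefly and roll back instead of really suspending.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError("unknown pm_test level: %r" % (level,))
    # Choose how deep the test transition should go.
    with open(os.path.join(sysfs_root, "pm_test"), "w") as f:
        f.write(level)
    # Trigger the transition itself ("disk" for hibernation, "mem" for STR).
    with open(os.path.join(sysfs_root, "state"), "w") as f:
        f.write(state)
```

Run as root, run_pm_test("core") would exercise the deepest test level;
writing "none" to pm_test afterwards restores normal behaviour.
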
--
Thanks,
Feri.