Re: [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

From: Srivatsa S. Bhat
Date: Mon Oct 10 2011 - 13:32:09 EST


On 10/10/2011 10:23 PM, Borislav Petkov wrote:
> On Mon, Oct 10, 2011 at 11:32:40AM -0400, Srivatsa S. Bhat wrote:
>>> This seems like entirely the wrong way to go about solving this problem.
>>>
>>> The kernel shouldn't be responsible for making hotplug stress tests
>>> exclusive with system sleep. Whoever is running those tests should be
>>> smart enough to realize what's wrong if system sleep interferes with a
>>> test.
>
> Yes, agreed. And more: I'm still trying to understand why a test case
> like that is relevant and needs to be fixed at all. Let me re-formulate
> the question: what real world scenario(s) does the case of hibernating
> _while_ off- and onlining cores cover? Or are you simply doing kernel
> resiliency testing and thought that offlining cores while hibernating
> might make sense?
>

Actually, my whole intention in coming up with this test case was to
test the stability and correct operation of the entire suspend/resume call
path. Since I found that CPU hotplug is used in that call path, I
thought of giving it a whirl to find out whether there were any cases that
lead to freezing failures and the like. And I did uncover a couple of
cases, one after the other.

I do agree that offlining and onlining CPUs while suspending might
not seem all that useful or even wise, but like I said, the test was
designed to bring out such problematic race conditions.

So, in the interest of making the important components involved in
the suspend/resume call path (namely CPU hotplug) more robust and stable,
I think it makes sense to fix any issue we hit (at least when we
actually hit it in practice, which proves that the scenario is no longer
hypothetical).

For that, we can either go with the simple one-line fix that I posted
earlier (which has got another motivation now, thanks to Borislav) or
with this elaborate solution, whichever seems better/worthwhile.

If it is still strongly felt that this "bug" is not worth fixing with such
mutual exclusion schemes, it will still get solved anyway by applying that
one-line patch.

> IOW, I still fail to see a strong reason for this needing fixing.
>
> Thanks.
>


--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab