Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep
From: Andy Lutomirski
Date: Mon May 23 2016 - 21:23:57 EST
On Sun, Feb 28, 2016 at 9:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>>>
>>> The exception handlers which use the IST stacks don't necessarily
>>> set irq count. Maybe they should.
>>
>> Hmm. I think they should. Since they clearly must not schedule, as
>> they use a percpu stack.
>>
>> Which exceptions use IST?
>>
>> [ grep grep ]
>>
>> Looks like stack, doublefault, nmi, debug and mce. And yes, I really
>> think they should all raise the irq count if they don't already.
>> Rather than add random arch-specific "let's check that we're on the
>> right stack" code to the might-sleep stuff, just use the one we have.
>>
>
> Resurrecting an old thread:
>
> The outcome of this discussion was that ist_enter now raises
> HARDIRQ_COUNT. I think this is causing a problem. If a user program
> enables TF, it generates a bunch of debug exceptions. The handlers
> raise the IRQ count and do stuff, and apparently some of that stuff
> can raise a softirq. (I have no idea where the softirq is being
> raised.) The softirq code notices that we're in_interrupt and doesn't
> wake ksoftirqd because it thinks we're about to exit the interrupt and
> process the softirq. But we don't, which causes occasional warnings
> and confuses things (and me!).
>
> So how do we fix it? If we stop raising HARDIRQ_COUNT (and apply
> $SUBJECT?), then raise_softirq will wake ksoftirqd and life is good.
> But this seems a bit silly, since, if we entered the ist exception
> handler from a context with irqs on and softirqs enabled, we *could*
> plausibly handle the softirq right away -- we're on an essentially
> empty stack. (Of course, it's a *small* stack, since it could be the
> IST stack.)
>
> Or we could just let ksoftirqd do its thing and stop raising
> HARDIRQ_COUNT. We could add a new preempt count field just for IST
> (yuck). We could try to hijack a different preempt count field
> (NMI?). But I kind of like the idea of just reinstating the original
> patch of explicitly checking that we're on a safe stack in schedule
> and __might_sleep, since that is the actual condition we care about.
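
To make the failure mode above concrete, here is a toy userspace model of
it. This is only a sketch, not kernel code: every toy_* name is made up for
illustration, a plain counter stands in for the HARDIRQ bits of the preempt
count, and "waking ksoftirqd" is just a flag. It assumes, as described
above, that the IST exit path drops the count without processing pending
softirqs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; a counter stands in for the HARDIRQ bits. */
struct toy_cpu {
	unsigned int hardirq_count;
	bool softirq_pending;
	bool ksoftirqd_woken;
};

static bool toy_in_interrupt(const struct toy_cpu *cpu)
{
	return cpu->hardirq_count > 0;
}

/* Mark a softirq pending; only wake the thread when not in interrupt. */
static void toy_raise_softirq(struct toy_cpu *cpu)
{
	cpu->softirq_pending = true;
	if (!toy_in_interrupt(cpu))
		cpu->ksoftirqd_woken = true;
}

/* Ordinary interrupt exit: drop the count, then run pending softirqs. */
static void toy_irq_exit(struct toy_cpu *cpu)
{
	assert(cpu->hardirq_count > 0);
	cpu->hardirq_count--;
	if (!toy_in_interrupt(cpu))
		cpu->softirq_pending = false;
}

/* IST-style exit (assumed): drop the count, do not touch softirqs. */
static void toy_ist_exit(struct toy_cpu *cpu)
{
	assert(cpu->hardirq_count > 0);
	cpu->hardirq_count--;
}

/* A softirq is stranded if it is pending and nobody was asked to run it. */
static bool toy_stranded(const struct toy_cpu *cpu)
{
	return cpu->softirq_pending && !cpu->ksoftirqd_woken;
}

/* Model a debug exception whose handler raises a softirq. */
static bool toy_debug_exception(bool raise_hardirq_count)
{
	struct toy_cpu cpu = { 0, false, false };

	if (raise_hardirq_count)
		cpu.hardirq_count++;
	toy_raise_softirq(&cpu);
	if (raise_hardirq_count)
		toy_ist_exit(&cpu);
	return toy_stranded(&cpu);
}

/* Model an ordinary hardirq whose handler raises a softirq. */
static bool toy_hardirq(void)
{
	struct toy_cpu cpu = { 0, false, false };

	cpu.hardirq_count++;
	toy_raise_softirq(&cpu);
	toy_irq_exit(&cpu);
	return toy_stranded(&cpu);
}
```

In this model, toy_debug_exception(true) leaves the softirq stranded, while
toy_debug_exception(false) and toy_hardirq() do not.
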
Ping? I can still trigger this fairly easily on 4.6.
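
For anyone reading this cold: the "are we on a safe stack" condition is, at
its core, a range check on the stack pointer. The sketch below is a
userspace illustration of that idea only, not the code from $SUBJECT. The
toy_* names are hypothetical, and it assumes the task stack is one
contiguous range described by a base address and a size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a stack: lowest address plus size in bytes. */
struct toy_stack {
	uintptr_t base;
	size_t size;
};

/* True if sp lies within [base, base + size). */
static bool toy_on_stack(const struct toy_stack *stack, uintptr_t sp)
{
	return sp >= stack->base && sp - stack->base < stack->size;
}

/*
 * Scheduling is only safe on the task's own stack; an IST or other
 * per-CPU stack cannot be left behind by a sleeping task.
 */
static bool toy_may_schedule(const struct toy_stack *task_stack, uintptr_t sp)
{
	return toy_on_stack(task_stack, sp);
}
```

The point of checking the stack directly is that it tests the condition we
actually care about, instead of inferring it from a preempt count field.
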
--Andy