Re: frequent lockups in 3.18rc4

From: Andy Lutomirski
Date: Thu Nov 20 2014 - 18:08:30 EST

On Thu, Nov 20, 2014 at 3:05 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
> On Thu, Nov 20, 2014 at 11:42:42PM +0100, Thomas Gleixner wrote:
>> On Thu, 20 Nov 2014, Tejun Heo wrote:
>> > On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
>> > > It's completely undocumented behaviour, whether it has been that way
>> > > for ever or not. And I agree with Fredric, that it is insane. Actually
>> > > it's beyond insane, really.
>> >
>> > This is exactly the same for any address in the vmalloc space.
>> I know, but I really was not aware of the fact that dynamically
>> allocated percpu stuff is vmalloc based and therefore exposed to the
>> same issues.
>> The normal vmalloc space simply does not have the problems which are
>> generated by percpu allocations which have no documented access
>> restrictions.
>> You created a special case and that special case is clever but not
>> very well thought out considering the use cases of percpu variables
>> and the completely undocumented limitations you introduced silently.
>> Just admit it and don't try to educate me about trivial vmalloc
>> properties.
> Why are you always so overly dramatic? How is this productive? Sure,
> this could have been better but I missed it at the beginning and this
> is the first time I hear about this issue. Shit happens and we fix
> them.
>> > That isn't enough tho. What if the percpu allocated pointer gets
>> > passed to another CPU without task switching? You'd at least need to
>> > send IPIs to all CPUs so that all the active PGDs get updated
>> > synchronously.
>> You obviously did not even take the time to carefully read what I
>> wrote:
>> "Now after that increment the allocation side needs to wait for a
>> scheduling cycle on all cpus (we have mechanisms for that)"
>> That's exactly stating what you claim to be 'not enough'.
> Missed that. Sorry.
>> > For the time being, we can make percpu accessors complain when
>> > called from nmi handlers so that the problematic ones can be easily
>> > identified.
>> You should have done that in the very first place instead of letting
>> other people run into issues which you should have thought of from the
>> very beginning.
> Sure, it would have been better if I noticed that from the get-go, but
> I couldn't think of the NMI case that time and neither did anybody who
> reviewed the code. It'd be awesome if we could have avoided it but it
> didn't go that way, so let's fix it. Can we please stay technical?
> So, for now, all we need is adding nmi check in percpu accessors,
> right?

What's the issue with nmi? Page faults are supposed to nest correctly
inside nmi, right?
