Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

From: Peter Zijlstra
Date: Thu Apr 24 2014 - 15:09:18 EST


On Thu, Apr 24, 2014 at 05:41:20PM +0000, Luck, Tony wrote:
> >> The BIOS always sends CPU hot-addition events before memory
> >> hot-addition events, so it's hard to change the order.
> >> And we couldn't completely solve this performance penalty because the
> >> affected code tries to allocate memory for all possible
> >> CPUs instead of onlined CPUs.
> >
> > So the BIOS is fucked, news at 11, one would have hoped Intel would have
> > _some_ say in it, but alas. So how about instead you force memory online
> > when you online the first CPU, screw whatever the BIOS does or does not?
>
> Certainly an interesting implementation choice by the BIOS. The only logical
> order to use to bring components of a modern cpu online is:
>
> 1) Memory - so we have a place to allocate structures needed for following steps
> 2) Cores - so we have a place to direct interrupts from next step
> 3) I/O
>
> We should log a bug against the BIOS ... but systems are already shipping so we will
> have to deal with this.

Can someone clue me in on which systems these are, so I can try to stay
the hell away from them?

> Either we use your existing patch - and systems with silly BIOS will work, but with a
> small NUMA penalty for objects allocated remotely

Depending on how this all is constructed, I can imagine a worst case
where we bring up a medium to large system (8+ nodes, not fully
connected, etc.) and only the first node has memory online after
boot. The CPU bringup could be concurrent or fast enough that no
other memory is online yet.

This would result in all cpus having their memory on the first node
(including per-cpu chunks I would imagine), that's entirely retarded.

We should really refuse to bring up those CPUs, and run in reduced
capacity, on such demented systems.
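
A minimal user-space sketch of the worst case above, not kernel code.
All names (struct topo, alloc_node, count_remote) and the 8-node size
are hypothetical, and the lowest-numbered-node fallback is a
simplification of the kernel's distance-ordered zonelist walk.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical system size */

/* Hypothetical model: which nodes currently have memory online. */
struct topo {
	bool mem_online[MAX_NODES];
};

/*
 * Pick the node that backs an allocation requested for node 'want'.
 * If 'want' has no memory online, fall back to the lowest-numbered
 * node that does (simplified fallback, see the assumptions above).
 * Returns -1 if no node has memory online.
 */
static int alloc_node(const struct topo *t, int want)
{
	assert(want >= 0 && want < MAX_NODES);

	if (t->mem_online[want])
		return want;

	for (int n = 0; n < MAX_NODES; n++)
		if (t->mem_online[n])
			return n;

	return -1;
}

/* Count the nodes whose allocations would not be satisfied locally. */
static int count_remote(const struct topo *t)
{
	int remote = 0;

	for (int n = 0; n < MAX_NODES; n++)
		if (alloc_node(t, n) != n)
			remote++;

	return remote;
}
```

With only node 0 online, every other node in this model is backed by
node 0, which is the "all CPUs have their memory on the first node"
situation.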

> or ... we implement some crazy queuing scheme ... where we delay bringing cores
> online for a while to see whether more things like memory and I/O start showing
> up too. We can't wait forever - people sometimes do configure systems with
> memory-less nodes.

Is there no distinction between the cases? I've really no idea how the
BIOS communicates this (and honestly no real desire to know), but it
would be best if we can kludge around this in the arch code and keep it
out of core code.

Did I already say that memory-less nodes are stupid? ;-)

> I think your existing solution is the better choice ... the penalties probably aren't
> all that big ... so extensive workarounds for BIOS bugs seem like the wrong direction.

Why can't we have the architecture code generate a memory-add event on
the first CPU-up for a node that has no memory online yet?
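
A rough user-space sketch of that idea, not a patch. Every name here
(struct node_state, memory_add_event, cpu_up_event, the mem_present
flag) is hypothetical. In particular it assumes the arch code can tell
a node whose memory simply has not been onlined yet from a genuinely
memory-less node, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical system size */

struct node_state {
	bool mem_present;	/* assumption: arch code knows the node has memory */
	bool mem_online;	/* memory-add event already processed */
	int  cpus_online;	/* CPUs brought up on this node so far */
	int  forced_mem_adds;	/* memory-add events synthesized on CPU-up */
};

static struct node_state nodes[MAX_NODES];

/* Stand-in for whatever the arch code does to online a node's memory. */
static void memory_add_event(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	nodes[nid].mem_online = true;
}

/*
 * CPU-up handling: on the first CPU of a node that has memory which is
 * not online yet, synthesize the memory-add event before the CPU comes
 * up. Memory-less nodes (mem_present == false) are left alone.
 */
static void cpu_up_event(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	struct node_state *n = &nodes[nid];

	if (n->cpus_online == 0 && n->mem_present && !n->mem_online) {
		memory_add_event(nid);
		n->forced_mem_adds++;
	}

	n->cpus_online++;
}
```

In this model the forced event fires at most once per node, and only
for the first CPU, so it would stay in the arch hotplug path and out
of core code.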