Re: [PATCH] drivers: base: update cpu offline info when do hotplug

From: Dan Streetman
Date: Mon Oct 27 2014 - 12:29:14 EST


On Sun, Oct 26, 2014 at 10:26 PM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Oct 26, 2014 at 07:17:14PM -0700, Neil Zhang wrote:
>>
>>
>> > -----Original Message-----
>> > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
>> > Sent: 2014-10-27 9:59
>> > To: Neil Zhang
>> > Cc: Dan Streetman; linux-kernel@xxxxxxxxxxxxxxx
>> > Subject: Re: [PATCH] drivers: base: update cpu offline info when do hotplug
>> >
>> > On Sun, Oct 26, 2014 at 06:43:11PM -0700, Neil Zhang wrote:
>> > > Greg,
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: ddstreet@xxxxxxxxx [mailto:ddstreet@xxxxxxxxx] On Behalf Of
>> > > > Dan Streetman
>> > > > Sent: 2014-10-21 1:03
>> > > > To: Neil Zhang
>> > > > Cc: Greg KH; linux-kernel@xxxxxxxxxxxxxxx
>> > > > Subject: Re: [PATCH] drivers: base: update cpu offline info when do
>> > > > hotplug
>> > > >
>> > > > On Mon, Oct 20, 2014 at 3:40 AM, Neil Zhang <zhangwm@xxxxxxxxxxx> wrote:
>> > > > > Greg,
>> > > > >
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
>> > > > > Sent: 2014-10-20 14:48
>> > > > > To: Neil Zhang
>> > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx
>> > > > > Subject: Re: [PATCH] drivers: base: update cpu offline info when
>> > > > > do hotplug
>> > > > >
>> > > > > On Sun, Oct 19, 2014 at 11:39:23PM -0700, Neil Zhang wrote:
>> > > > >>> How much noise is this going to cause on a big/little system
>> > > > >>> that constantly hot unplug/plugs processors all of the time?
>> > > > >>
>> > > > >> Can you explain more what kind of noise will be introduced on a
>> > > > >> big/little system?
>> > > > >
>> > > > > Have you tested this on such a machine?
>> > > > >
>> > > > > I don't have that kind of machine on hand.
>> > > > > Does anyone have such a machine to verify it?
>> > > > > Thanks!
>> > > >
>> > > > I tested this on a ppc PowerVM system, using dlpar operations to
>> > > > remove/add cpus.
>> > > >
>> > > > Without this patch the cpu online nodes get out of sync with the
>> > > > main online node (and the actual state of the cpus), because they
>> > > > aren't updated as the cpus are brought up/down:
>> > > >
>> > > > [root@br10p02 cpu]$ pwd
>> > > > /sys/devices/system/cpu
>> > > > [root@br10p02 cpu]$ cat online
>> > > > 0-39
>> > > > [root@br10p02 cpu]$ for n in {0..47} ; do test $( cat cpu$n/online ) -eq 1 && echo -n "$n " ; done ; echo ""
>> > > > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
>> > > >
>> > > >
>> > > > While with the patch, the cpu online nodes are kept up to date as
>> > > > the cpus are brought up/down:
>> > > >
>> > > > [root@br10p02 cpu]$ pwd
>> > > > /sys/devices/system/cpu
>> > > > [root@br10p02 cpu]$ cat online
>> > > > 0-39
>> > > > [root@br10p02 cpu]$ for n in {0..47} ; do test $( cat cpu$n/online ) -eq 1 && echo -n "$n " ; done ; echo ""
>> > > > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
>> > > >
>> > > >
>> > > > Feel free to add
>> > > >
>> > > > Tested-by: Dan Streetman <ddstreet@xxxxxxxx>
>> > > >
>> > >
>> > > It's a real bug in the kernel.
>> >
>> > As this has been this way for many years, I tend to think it's not all that
>> > important...
>>
>> Actually this bug was introduced by the following patch.
>>
>> commit 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c
>> Author: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> Date: Fri May 3 00:25:49 2013 +0200
>>
>> Driver core: Use generic offline/online for CPU offline/online
>>
>> So seems not that long :)
>
> Ok, over a year.
>
> Any reason why this information wasn't in this patch? Also, why not cc:
> the authors of that patch as well? Surely they would want to know about
> this, right?

To add a bit more info to this: the PPC (on PowerVM) path for cpu
hotplug, dlpar_offline_cpu() in arch/powerpc/platforms/pseries/dlpar.c,
does require this patch, because it only takes a cpu offline during
hot remove. The x86/acpi code appears to be different:
acpi_processor_remove() in drivers/acpi/acpi_processor.c fully
unregisters a cpu during hot remove, so I believe the entire cpuN
directory would be removed. I don't have a hw-hotpluggable x86 system,
though, and it doesn't look like qemu really supports cpu hot remove
yet, so I can't test that. I don't know how other archs handle cpu
hotplug.

Also, the ppc pseries dlpar code may be changed in the future to
unregister the cpu instead of only setting it offline; I've cc'ed
Nathan, who would probably be doing that.

But regardless, after commit 0902a90, when a cpu is brought up or
taken down by anything other than the generic offline/online code, the
cpu's ->offline state does need to be updated. If that isn't done by a
hotplug listener like the one in this patch, then possibly in
set_cpu_online() in kernel/cpu.c...
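
For anyone who wants to check another arch for the same mismatch, here
is a rough sketch of the comparison I did by hand in the test quoted
above. It assumes the usual sysfs layout under /sys/devices/system/cpu
and that each cpuN/online node reports the inverse of the device's
->offline flag. The function names expand_cpulist and find_stale_cpus
are made up for this example; none of this comes from the patch itself.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch): report cpus whose
# cpuN/online node still reads 1 although the global "online" mask
# no longer contains them.

# expand_cpulist "0-3,8" prints "0 1 2 3 8"
expand_cpulist() {
    out=""
    for range in $(echo "$1" | tr ',' ' '); do
        lo=${range%-*}      # "0-3" -> 0, a single "8" stays 8
        hi=${range#*-}      # "0-3" -> 3, a single "8" stays 8
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    echo "${out# }"
}

# find_stale_cpus DIR prints the stale cpu numbers, in glob order
# (cpu10 sorts before cpu2), or an empty line if there are none.
find_stale_cpus() {
    dir=$1
    mask=" $(expand_cpulist "$(cat "$dir/online")") "
    stale=""
    for node in "$dir"/cpu[0-9]*; do
        # Some cpus (often cpu0) have no online node at all; skip them.
        [ -r "$node/online" ] || continue
        n=${node##*/cpu}
        if [ "$(cat "$node/online")" = "1" ]; then
            case "$mask" in
                *" $n "*) ;;                # also in the global mask
                *) stale="$stale $n" ;;     # node says online, mask disagrees
            esac
        fi
    done
    echo "${stale# }"
}
```

Calling find_stale_cpus /sys/devices/system/cpu on the unpatched
PowerVM system from the test above would be expected to print 40
through 47; an empty line means the per-cpu nodes agree with the
global mask. It only checks one direction (node says online, mask says
not), which is the case this patch is about.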



>
> greg k-h