Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs
From: Rafael J. Wysocki
Date: Wed Oct 29 2014 - 17:26:16 EST
On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
> On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
> <rafael.j.wysocki@xxxxxxxxx> wrote:
> > On 10/27/2014 3:59 AM, Neil Zhang wrote:
> >>
> >> The current per-cpu offline info won't be updated when we use
> >> any other method besides sysfs to call cpu_up/down.
> >> Thus the cpu/online can't reflect the real online status.
> >>
> >> This patch is going to fix the issue introduced by commit
> >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> >> Use generic offline/online for CPU offline/online)
> >>
> >> CC: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >> Tested-by: Dan Streetman <ddstreet@xxxxxxxx>
> >> Signed-off-by: Neil Zhang <zhangwm@xxxxxxxxxxx>
> >
> >
> > Oh dear, no.
> >
> > Please first tell me what exactly the problem you're seeing is.
>
> For some background, here is my last comment on the first email thread on this:
> https://lkml.org/lkml/2014/10/27/595
>
> I didn't create this patch, but the problem essentially is that before
> your commit the individual cpu online nodes
> (/sys/devices/system/cpu/cpuN/online) stayed in sync during
> cpu_down/up, because they used the cpu_online_mask; while after the
> commit, they are tracked by the cpu's generic dev->offline flag, which
> isn't updated during cpu_down/up.
Which is not triggered from sysfs.
> So now, any place in the kernel
> that brings a cpu up or down must also update the cpu->dev->offline
> flag.
Not any place. In particular, system suspend-resume doesn't need to
do that, because it takes CPUs offline and then brings them back
online.
If there's a place in the kernel where CPUs are taken offline and
left in that state, then it needs to be updated.
> My interest in the patch was coincidental because I was seeing
> the same problem when using dlpar operations to hotplug cpus, which
> uses the arch/powerpc/platform/pseries/dlpar.c code; that code brings
> a cpu offline when it's hot-removed (and the cpu online when it's
> hot-added), but it hasn't been changed to also update the cpu's
> dev->offline flag.
It should be modified to do that.
> As I said in the previous email to the first thread, the ppc dlpar
> operation might be changed in the future to fully unregister a cpu
> when it's hot-removed, which would remove the entire sysfs cpuN
> directory. Alternately and/or until then, it could be updated to
> simply update the cpu'd dev->offline flag (that's what I originally
> did for my own testing). However, without a central place to update
> the cpu's dev->offline field, like this, or possibly in
> set_cpu_online(), or elsewhere during cpu_down/up, each place in the
> kernel that calls cpu_down() or cpu_up() also needs to update the
> dev->offline flag. It's possible that the ppc dlpar code is the only
> place in the kernel that has this problem; I haven't searched.
It is quite likely to be the only place like that.
While I'm not familiar with the code in question, the most straightforward
way to fix the problem would be to replace cpu_down() in there with
device_offline(get_cpu_device(cpu)), but that needs to be called under
device_hotplug_lock.
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/