On Tue, 24 Jan 2017, Yasuaki Ishimatsu wrote:
rapl_cpu_prepare() must be called after the logical package ID of the CPU
has been set by topology_update_package_map().
But when onlining a hot-added CPU, rapl_cpu_prepare() is called before
the logical package ID of the hot-added CPU is set. So cpu_to_rapl_pmu()
in rapl_cpu_prepare() finds the rapl_pmu of the wrong logical package and
rapl_cpu_prepare() initializes the wrong rapl_pmu.
After that, the logical package ID of the hot-added CPU is set by
topology_update_package_map(), but rapl_cpu_prepare() never
initializes the PMU for that logical package.
So when rapl_cpu_online() is called, cpu_to_rapl_pmu() returns NULL and
the following NULL pointer dereference occurs:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
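
To make the ordering problem concrete, here is a minimal user-space sketch of
it. This is not kernel code: the function names are borrowed from the
description above, but the bodies, the array sizes, the hotplug_cpu() and
reset_model() helpers, and the assumption that a hot-added CPU starts out
with a stale logical package ID of 0 are all made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PKGS 4
#define NR_CPUS  8

/* Toy stand-in for the per-package PMU; not the kernel structure. */
struct rapl_pmu { int pkg; };

static struct rapl_pmu *pmus[MAX_PKGS];   /* one slot per logical package */
static int cpu_logical_pkg[NR_CPUS];      /* assumed stale default: 0 */

static struct rapl_pmu *cpu_to_rapl_pmu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return pmus[cpu_logical_pkg[cpu]];
}

/* Allocates the PMU for whatever package the CPU maps to right now. */
static void rapl_cpu_prepare(int cpu)
{
	int pkg = cpu_logical_pkg[cpu];

	if (pmus[pkg])
		return;
	pmus[pkg] = calloc(1, sizeof(*pmus[pkg]));
	if (pmus[pkg])
		pmus[pkg]->pkg = pkg;
}

static void topology_update_package_map(int cpu, int pkg)
{
	assert(cpu >= 0 && cpu < NR_CPUS && pkg >= 0 && pkg < MAX_PKGS);
	cpu_logical_pkg[cpu] = pkg;
}

/*
 * Hypothetical helper: brings up one CPU in either order and returns 1 if
 * the later online step would find a PMU, 0 if the lookup yields NULL.
 */
static int hotplug_cpu(int cpu, int real_pkg, int prepare_first)
{
	if (prepare_first) {
		rapl_cpu_prepare(cpu);                      /* sees stale package */
		topology_update_package_map(cpu, real_pkg);
	} else {
		topology_update_package_map(cpu, real_pkg);
		rapl_cpu_prepare(cpu);                      /* sees real package */
	}
	return cpu_to_rapl_pmu(cpu) != NULL;
}

/* Hypothetical helper: drops all PMUs and resets the package map. */
static void reset_model(void)
{
	for (int i = 0; i < MAX_PKGS; i++) {
		free(pmus[i]);
		pmus[i] = NULL;
	}
	for (int i = 0; i < NR_CPUS; i++)
		cpu_logical_pkg[i] = 0;
}
```

With a boot CPU already up on package 0, calling hotplug_cpu(4, 1, 1) models
the failing order: prepare finds the existing package 0 PMU and returns, the
map then moves CPU 4 to package 1, and the lookup for CPU 4 comes back NULL.
Calling hotplug_cpu(4, 1, 0) on a fresh model updates the map first, so the
package 1 PMU gets allocated and the lookup succeeds.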
The patch renames rapl_cpu_prepare() to rapl_cpu_starting() and changes
the position of cpuhp_state so that rapl_cpu_starting() is called
Does not work. You cannot call that callback in the starting context. It
does allocations. This needs to be fixed in a different way. I'll have a look