Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

From: Charles (Chas) Williams
Date: Tue Nov 08 2016 - 09:21:12 EST


On 11/07/2016 03:20 PM, Thomas Gleixner wrote:
On Mon, 7 Nov 2016, Charles (Chas) Williams wrote:
On 11/07/2016 11:19 AM, Thomas Gleixner wrote:
On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:
I don't know why the CPU's phys_proc_id is 2.

max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?

I have discovered that that is not the problem. smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >>
boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x00000001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.

Nice detective work! So the issue is that the package mapping code honours
boot_cpu_data.x86_coreid_bit, while generic_identify does
not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative
fix below. I still need to gow through that maze and figure out what could
go wrong with that :(

I don't think that will fix the issue. My deubugging leads me to believe
that boot_cpu_data.x86_coreid_bits is probably 0:

[ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
[ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1
[ 0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers. For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:

[ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0
...
[ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2

I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's. The boot
cpu could be setup during smp_init_package_map().

topology_update_package_map() is also poorly named it can create the
assignment, but can't update it (since it doesn't know the previous
mapping).


8<------------------------
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str

static void generic_identify(struct cpuinfo_x86 *c)
{
+ unsigned int pkg;
+
c->extended_cpuid_level = 0;

if (!have_cpuid_p())
@@ -929,7 +931,8 @@ static void generic_identify(struct cpui
c->apicid = c->initial_apicid;
# endif
#endif
- c->phys_proc_id = c->initial_apicid;
+ pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits;
+ c->phys_proc_id = pkg;
}

get_model_name(c); /* Default name */