Re: 2.6.32-rc8: amd64_edac slub error

From: Borislav Petkov
Date: Mon Nov 30 2009 - 15:36:00 EST


Hi Randy,

On Mon, Nov 30, 2009 at 09:28:19AM -0800, Randy Dunlap wrote:
> Loading amd64_edac_mod on an amd64 system without the expected hardware support
> causes memory usage error(s).

Well, this is new!

> Is this already fixed/patched? Do you need more info?

Nope :(.

I've tried to reproduce it here by selecting CONFIG_SLUB, but with no
success. Please send me your config.

Also, it would be very helpful if you could enable CONFIG_EDAC_DEBUG and
run it again.

From looking at the error trace, though, it looks like we're
not allocating enough memory for the struct msr things in
amd64_nb_mce_bank_enabled_on_node(). This is just a hunch though and you
could give the following debug patch a try:

---
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index a38831c..139bc14 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2739,8 +2739,10 @@ static void get_cpus_on_this_dct_cpumask(cpumask_t *mask, int nid)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		if (amd_get_nb_id(cpu) == nid)
+		if (amd_get_nb_id(cpu) == nid) {
+			pr_err("%s: nid: %d, cpu: %d\n", __func__, nid, cpu);
 			cpumask_set_cpu(cpu, mask);
+		}
 }
 
 /* check MCG_CTL on all the cpus on this node */
@@ -2755,6 +2757,8 @@ static bool amd64_nb_mce_bank_enabled_on_node(int nid)

 	get_cpus_on_this_dct_cpumask(&mask, nid);
 
+	pr_err("%s: weight: %d\n", __func__, cpumask_weight(&mask));
+
 	msrs = kzalloc(sizeof(struct msr) * cpumask_weight(&mask), GFP_KERNEL);
 	if (!msrs) {
 		amd64_printk(KERN_WARNING, "%s: error allocating msrs\n",

--
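
To make the hunch above a bit more concrete, here is a minimal user-space
sketch of one way a buffer sized by the weight of a CPU mask could end up too
small. It is only an illustration under assumptions, not the driver code and
not a confirmed diagnosis: the mask is modelled as a plain 64-bit word, and
mask_weight(), packed_slot(), offset_slot() and indexing_fits() are
hypothetical helper names invented for this example. The idea is that an array
with "weight" entries is only safe if each CPU is mapped to its rank within
the mask; mapping a CPU to its distance from the first CPU in the mask can
run past the end when the mask has holes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in the mask, analogous in spirit to cpumask_weight(). */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Packed indexing: slot = number of mask members with a lower CPU number. */
static unsigned int packed_slot(uint64_t mask, unsigned int cpu)
{
	assert(cpu < 64 && (mask & (UINT64_C(1) << cpu)));
	return mask_weight(mask & ((UINT64_C(1) << cpu) - 1));
}

/* Offset indexing: slot = distance from the first CPU in the mask. */
static unsigned int offset_slot(uint64_t mask, unsigned int cpu)
{
	unsigned int first = 0;

	assert(mask != 0);
	while (!(mask & (UINT64_C(1) << first)))
		first++;
	return cpu - first;
}

/*
 * Return 1 if every CPU in the mask maps to a slot inside an array of
 * mask_weight(mask) entries, 0 if any slot would overrun that array.
 */
static int indexing_fits(uint64_t mask,
			 unsigned int (*slot)(uint64_t, unsigned int))
{
	unsigned int weight = mask_weight(mask);
	unsigned int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		if (slot(mask, cpu) >= weight)
			return 0;
	}
	return 1;
}
```

With a contiguous mask such as 0x0F (CPUs 0-3) both schemes stay inside a
four-entry array. With a mask such as 0x55 (CPUs 0, 2, 4, 6) the weight is
still four, but offset indexing asks for slot 6, which is the kind of overrun
a slab debugger would flag. The debug output requested above (nid, cpu and
weight) is what would show whether the masks on the affected box look like
this at all.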

PS. I'm travelling till the end of the week and won't have constant
access to mail but I'll do my best to fix this, sorry.

Thanks.

--
Regards/Gruss,
Boris.