Re: threshold_init_device/kobject_uevent_env oops

From: Greg KH
Date: Sat Jan 26 2008 - 02:26:36 EST


On Fri, Jan 25, 2008 at 11:08:53PM -0800, Yinghai Lu wrote:
> On Jan 25, 2008 10:14 PM, Greg KH <gregkh@xxxxxxx> wrote:
> >
> > On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> > > On Jan 25, 2008 2:50 PM, Greg KH <gregkh@xxxxxxx> wrote:
> > > > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > > > >
> > > > > > * Greg KH <gregkh@xxxxxxx> wrote:
> > > > > >
> > > > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > > > current linus tree + x86.git
> > > > > > > >
> > > > > > > > got
> > > > > > > >
> > > > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > > > >
> > > > > > > Does this happen on just Linus's tree?
> > > > > > >
> > > > > > > Can you send me a .config file for this?
> > > > > > >
> > > > > > > What is threshold_init()? Is it something new in the x86.git tree?
> > > > > >
> > > > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > > > Linus' latest have touched:
> > > > > >
> > > > > > arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > > > >
> > > > > Ok, those are pretty much just search/and/replace type changes, but I
> > > > > have been running x86-64 boxes with these changes in place.
> > > >
> > > > Oh wait, I do see a change. We are now (finally) emitting a kobject
> > > > uevent for these devices, which somehow the code can't handle properly.
> > > >
> > > > Let me go poke this some more, unfortunatly I don't have any AMD 64
> > > > boxes here anymore, only Intel based processors, so I can't run this
> > > > module...
> > >
> > > it only happens with AMD Quad Core CPU or Fam 10h.
> > >
> > > works well with AMD opteron Rev E, and Rev F.
> >
> > So this only dies on a multi-core system? Or does 2 processor boxes
> > work, but not 4?
>
> 2 sockets x quad core will fail (fam 10h)
> 2 sockets x dual core works....( rev E, and rev F opteron)
>
> there are some changs between opteron and fam10h. fam10h may have
> more local vectors for MCE...
> or more banks and blocks...
>
> will look at AMD64 Bios and kernel porting guide for Fam 10h again..
>
> wonder if your code uncover some bugs ...

No, the logic in this function is just crazy. It's recursive, but we
can circumvent the creation for the kobject and whole creation of the
threshold_block if some conditions are met. That's why we see the
allocate_threshold_blocks so many times in the callstack, yet only a few
kobjects created.

Then we blow up in kobject_uevent_env() on the first debug printk.
Which means that we are just passing in garbage.

Let me know if the patch below fixes this for you, I think it should, as
there is a code path where b is NULL and then we call kobject_uevent.

Man, this is one time that comments in code would have been very nice to
have, and why forward goto's into major code blocks are just evil...

thanks,

greg k-h

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
index 7535887..8a7f204 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -450,7 +450,8 @@ recurse:
if (err)
goto out_free;

- kobject_uevent(&b->kobj, KOBJ_ADD);
+ if (b && &b->kobj)
+ kobject_uevent(&b->kobj, KOBJ_ADD);

return err;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/