Re: oops in tracepoint_update_probe_range()

From: Mathieu Desnoyers
Date: Thu Mar 19 2009 - 09:28:23 EST


* Ingo Molnar (mingo@xxxxxxx) wrote:
>
> * Lai Jiangshan <laijs@xxxxxxxxxxxxxx> wrote:
>
> > Ingo Molnar wrote:
> > > * Jaswinder Singh Rajput <jaswinder@xxxxxxxxxx> wrote:
> > >
> > >> Good: f4c3c4cdb1de232
> > >> Bad : 1e08816af0bc345
> > >>
> > >> Config:
> > >> http://userweb.kernel.org/~jaswinder/oops_20090318/config-hpdv5-tip-bad-20090318
> > >>
> > >> oops:
> > >> http://userweb.kernel.org/~jaswinder/oops_20090318/oops_page1.jpg
> > >> http://userweb.kernel.org/~jaswinder/oops_20090318/oops_page2.jpg
> > >> http://userweb.kernel.org/~jaswinder/oops_20090318/oops_page3.jpg
> > >> http://userweb.kernel.org/~jaswinder/oops_20090318/oops_page4.jpg
> > >>
> > >> <freeze>
> > >
> > > Steve, Frederic - the crashes above are in:
> > >
> > > tracepoint_update_probe_range()
> > >
> > > in a modular kernel apparently.
> > >
> > >
> >

Jaswinder: maybe you have old modules in your /lib/modules/`uname -r`
directory that carry the correct version string but were built against
the wrong module.h header?

It does not matter whether the core kernel __tracepoints section is there
or not; what really matters here is whether the problematic module has a
correct module header.

Also, I wonder which module is being loaded when your crash occurs. Does
this specific module contain tracepoints or not? Maybe we could add a
little printk() to show the current module being loaded before the crash?

As Lai said, the loop should be skipped because begin and end will both
be the same value when there are no tracepoints in a given module. So
adding an extra length check will not help here.

The "return if NULL" test probably works just because the specific
module header mismatch we have here happens to set this field to NULL
for some reason.

Mathieu

> > I looked at the jpg files; this oops occurred while a new module was
> > being loaded.
> >
> > tracepoint_module_notify() was added by Mathieu Desnoyers at my
> > suggestion.
> >
> > tracepoint_update_probe_range() and tracepoint_module_notify()
> > cannot trigger this oops if the arguments are correct.
> >
> > If @begin is NULL, @end is NULL too; this is ensured by kernel/module.c.
> >
> > load_module(...):
> > 	mod->tracepoints = section_objs(hdr, sechdrs, secstrings,
> > 					"__tracepoints",
> > 					sizeof(*mod->tracepoints),
> > 					&mod->num_tracepoints);
> >
> > static void *section_objs(...)
> > {
> > 	unsigned int sec = find_sec(hdr, sechdrs, secstrings, name);
> >
> > 	/* Section 0 has sh_addr 0 and sh_size 0. */
> > 	*num = sechdrs[sec].sh_size / object_size;
> > 	return (void *)sechdrs[sec].sh_addr;
> > }
> >
> > If the module has no "__tracepoints" section, find_sec() returns 0.
> > So I think sechdrs[0].sh_size is corrupted.
> >
> > Does the following fix resolve the oops for you?
> > ---
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 7fa134e..2ee47ff 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -1950,6 +1950,7 @@ static noinline struct module *load_module(void __user *umod,
> >  	sechdrs = (void *)hdr + hdr->e_shoff;
> >  	secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
> >  	sechdrs[0].sh_addr = 0;
> > +	sechdrs[0].sh_size = 0;
> >
> >  	for (i = 1; i < hdr->e_shnum; i++) {
> >  		if (sechdrs[i].sh_type != SHT_NOBITS
>
> Jaswinder, could you please try the fix from Lai, but first do:
>
> git revert ec625cb # tracepoints: dont update zero-sized tracepoint sections
> git revert 09933a1 # tracing: fix oops in tracepoint_update_probe_range()
>
> ?
>
> Ingo

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68