Re: 2.6.18-rc4-mm1

From: Michal Piotrowski
Date: Mon Aug 14 2006 - 18:05:23 EST


Dave Jones wrote:
> On Mon, Aug 14, 2006 at 11:13:14PM +0200, Michal Piotrowski wrote:
> > On 14/08/06, Dave Jones <davej@xxxxxxxxxx> wrote:
> > > > Aug 14 22:30:09 ltg01-fedora kernel: general protection fault: 0000 [#1]
> > > > Aug 14 22:30:09 ltg01-fedora kernel: 4K_STACKS PREEMPT SMP
> > > > Aug 14 22:30:09 ltg01-fedora kernel: last sysfs file: /devices/platform/i2c-9191/9191-0290/temp2_input
> > > > Aug 14 22:30:09 ltg01-fedora kernel: CPU: 0
> > > > Aug 14 22:30:09 ltg01-fedora kernel: EIP: 0060:[<c0205249>] Not tainted VLI
> > > > Aug 14 22:30:09 ltg01-fedora kernel: EFLAGS: 00010246 (2.6.18-rc4-mm1 #101)
> > > > Aug 14 22:30:09 ltg01-fedora kernel: EIP is at __list_add+0x3d/0x7a
> > > > Aug 14 22:30:09 ltg01-fedora kernel: eax: 00000000 ebx: c0670a80 ecx: c038d4dc edx: 00000000
> > > > Aug 14 22:30:09 ltg01-fedora kernel: esi: ffffffff edi: f50ebee8 ebp: f50ebed0 esp: f50ebec4
> > >
> > > __list_add will still be dereferencing prev->next, so you should see exactly
> > > the same gpf. Note that you're not triggering the BUG()'s here, you're hitting
> > > some other fault just walking the list.
> >
> > How can I debug this?
>
> Not sure. I'm somewhat puzzled.
> Disassembling the Code: of your oops shows that we were trying to dereference esi,
> which was -1 for some bizarre reason. (my objdump really hated disassembling that
> function, but I think thats my tools rather than breakage in the oops).
>
> Question is how did that list member get to be -1 ?
> One pie-in-the-sky possibility is that we've corrupted memory recently, and
> this link-list manipulation just stumbled across it. Note that the last file
> opened before we blew up was reading i2c.

Or cpufreq

Aug 14 15:35:10 ltg01-fedora kernel: last sysfs file: /devices/platform/i2c-9191/9191-0290/temp2_input

Aug 14 15:39:12 ltg01-fedora kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed

Aug 14 22:30:09 ltg01-fedora kernel: last sysfs file: /devices/platform/i2c-9191/9191-0290/temp2_input

I haven't seen this bug on previous (2006-08-08) mm snapshot. Here are new i2c patches
apple-motion-sensor-driver-kconfig-fix.patch
kill-include-linux-configh.patch - it's my patch, I'm testing it for tree days.


Aug 14 15:35:11 ltg01-fedora kernel: <6>note: firefox-bin[2210] exited with preempt_count 1

Aug 14 15:39:13 ltg01-fedora kernel: <6>note: sendmail[1719] exited with preempt_count 1

Aug 14 22:30:10 ltg01-fedora kernel: <6>note: thunderbird-bin[2288] exited with preempt_count 1

Is it possible that this is a problem with net stack?

Binary search will take ages - I don't know how to reproduce this bug.

> Can you try and reproduce this
> (if you can reproduce it at all) without the sensors stuff loaded ?
>
> It's a long-shot, but without further clues, I'm stabbing in the dark.
>
> Dave
>

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/