Re: Kernel bug: mm/rmap.c:483 and related {now 2.6.8}

From: Hugh Dickins
Date: Mon Jan 31 2005 - 15:53:22 EST


On Mon, 31 Jan 2005, Jayant Roplekar wrote:
> I had reported a week ago that with a custom 2.6.10 (tainted with ndiswrapper
> a windows driver loader) I got error above and that I could not reproduce it

Yes, at that time you had a page being allocated while both page_count
and page_mapcount were 1, strongly suggesting it was already in use.
And not matching any other Bad page warnings or rmap.c BUGs reported.

> after Hugh Dickins patch to 2.6.10. Now I have sudden rash of similar errors
> and lock ups with 2.6.8.1-10mdk (my daily use kernel) in the last couple of
> days. This *does not* use Hugh's patch.

> I ran memtest overnight with 19 passes and no errors.

Thanks for giving that a thorough run, memory seems exonerated... yet I
don't trust this machine at all: have you tried manufacturer diagnostics?

> I have VIA motherboard and a matrox AGP card. Following might be relevant
> CONFIG_AGP_VIA=m
> CONFIG_DRM=y
> CONFIG_DRM_MGA=m

> #### Following message appears three times in syslog at (Jan 29 08:27:03)
> #### and at (Jan 29 08:27:08) with identical addresses and
> #### then the machine was rebooted
> Jan 29 08:25:02 Bad page state at prep_new_page ('X', page c1251ae0)
> Jan 29 08:25:02 flags:0x20000004 mapping:00006a00 mapcount:0 count:0

Again, something I've not seen reported before: mapping:00006a00, when
mapping should be NULL (or at least a pointer into kernel memory). You
say message reappeared twice with identical addresses: was that mapping
the same each time? bad_page would reset it to NULL each time non-NULL
found there. But since that non-NULL doesn't look at all like a kernel
address, it seems something is scribbling on your mem_map array of
struct pages: I've no idea what.

Hugh

> #### After the reboot there were no issues. On the next boot half an hour
> #### after the boot following stuff appears in the log
>
> Jan 30 08:33:38 kernel BUG at mm/rmap.c:407!
> Jan 30 08:33:38 invalid operand: 0000 [#1]
> Jan 30 08:33:38 EIP: 0060:[page_remove_rmap+68/112] Not tainted VLI
> Jan 30 08:33:38 eax: 00000000 ebx: 00104000 ecx: c039f1d0 edx: c1327000
> Jan 30 08:33:38 esi: cda49530 edi: 0032b000 ebp: cd02bb78 esp: cd02bb78
> Jan 30 08:33:38 Process net_applet (pid: 5428, threadinfo=cd02a000 task=d125da30)

> Jan 30 08:53:37 kernel BUG at mm/rmap.c:407!
> Jan 30 08:53:37 invalid operand: 0000 [#2]
> Jan 30 08:53:37 EIP: 0060:[page_remove_rmap+68/112] Not tainted VLI
> Jan 30 08:53:37 eax: 00000000 ebx: 0031c000 ecx: c039f1d0 edx: c1191a00
> Jan 30 08:53:37 esi: cd5edc70 edi: 0038d000 ebp: d1475e74 esp: d1475e74
> Jan 30 08:53:37 Process pixie (pid: 4271, threadinfo=d1474000 task=d1471480)

> Jan 30 08:53:37 Bad page state at prep_new_page ('kjournald', page c1191a00)
> Jan 30 08:53:37 flags:0x20000014 mapping:00000000 mapcount:0 count:0

> #### The machine was up for 4 hours after the above, then there was
> #### following error which I think is due to above, then I had to do a power
> #### button due to lockup
>
> Jan 30 13:32:01 Assertion failure in journal_dirty_data() at
> fs/jbd/transaction.c:985: "jh->b_transaction == journal->j_committing_transaction"
> Jan 30 13:32:01 kernel BUG at fs/jbd/transaction.c:985!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/