Re: Fw: crash on x86_64 - mm related?

From: Kai Makisara
Date: Wed Jan 04 2006 - 16:46:35 EST


On Wed, 4 Jan 2006, Ryan Richter wrote:

> On Tue, Dec 27, 2005 at 06:21:39PM +0200, Kai Makisara wrote:
...
> > I don't have any new ideas but I will tell you what I have tried. In order
> > to get more information about what is happening, I inserted the patch at
> > the end of this message to my kernel. The purpose of the patch was to
> > print something about the page mappings (prt_pages) and to catch possible
> > double freeing from st earlier (clear page pointers).
> >
> > Running dump gave me the following output:
>
> Here's what I got:
>
>
> st: page attributes before page_release 8
> 0: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1392956
> 1: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1397945
> 2: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1537473
> 3: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1398161
> 4: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1398261
> 5: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1392778
> 6: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1402858
> 7: flags:0x060000000000006c mapping:ffff810102bba238 mapcount:2 count:4 pfn:1396799

This looks similar to my output except that the page flags are
0x060000000000006c in your case whereas I have 0x010000000000006c.

I have run amanda with a patch modified so that it prints results for 32
consecutive writes each 10000 writes (if ((ctr++ % 10000) < 32)). This
showed that taper (at least in my case) is actually using a buffer of 20 x
32 kB = 640 kB. In all writes the mapping, counts, and flags were the
same. Instead of remapping a single 32 kB buffer, st is cyclically mapping
20 32 kB buffers.

> Bad page state at free_hot_cold_page (in process 'taper', page ffff81000427ff60)
> flags:0x010000000000000c mapping:ffff810102bbaaf0 mapcount:2 count:0

The mapping and flags have changed. The flags have lost one bit at the
high end and the PG_lru and PG_active flags.

The patch sets the page pointer to NULL after page_cache_release() is
called in st. This means that st is not calling page_cache_release() more
than once for each page returned by get_user_pages(). The page state
seems to get corrupted somewhere.

I have to continue thinking.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/