On Mon, Oct 23, 2000 at 02:20:17PM -0700, H. Peter Anvin wrote:
> Hi there,
>
> I wanted to let you know that I was trying 2.2.18-pre17 on
> hera.kernel.org, a uniprocessor with an SMP motherboard. After about six
> hours, it went catatonic, responding to pings and TCP SYNs but not doing
> anything that required user space.
>
> On the console, it had multiple copies of the message:
>
> "Kernel panic: LRU list corrupted" [fs/buffer.c:438]
>
> ... but no register dump.
>
> I have fallen back to 2.2.17 and it has run stably for a few days now.
I found one bug that can cause that kind of corruption and lockups, and it's
in 2.2.17 too. (It was also in the 2.2.18pre*aa kernels, even if there it was
extremely hard to reproduce because of some VM changes I made.)
I fixed it in 2.2.18pre17aa1 (I suggest giving 2.2.18pre17aa1 a try, btw).
I also included the fix in a new VM-global patch against vanilla 2.2.18pre17.
(The VM-global patch is also available as a single patch inside the
2.2.18pre17aa1/ directory, but I have to maintain a separate version of it
against clean 2.2.18pre17 due to silly rejects that I can't avoid.)
(I could reproduce the hang with 2.2.18pre17aa1 while testing LVM
snapshotting, because while an LV is under snapshot [as also while using
raid5] WRITEA will block too.)
Vanilla 2.2.18pre17 reproduces the bug about an order of magnitude more easily,
since it blocks there all the time, whereas in my tree I partly changed that
blocking behaviour for performance reasons. That's why people reported that the
VM-global patch "cured" the problem. In reality it still had a small window for
the bug too.
So I have now ported the strict fix to clean 2.2.18pre17. It's untested, but
I'm almost sure it will fix the problem there too.
--- 2.2.18pre17/fs/buffer.c.~1~ Tue Sep 5 02:28:47 2000
+++ 2.2.18pre17/fs/buffer.c Wed Oct 25 04:38:34 2000
@@ -1468,10 +1468,13 @@
 #define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
 #define buffer_busy(bh) ((bh)->b_count || ((bh)->b_state & BUFFER_BUSY_BITS))
-static int sync_page_buffers(struct buffer_head *bh, int wait)
+static int sync_page_buffers(struct page * page, int wait)
 {
+	struct buffer_head * bh = page->buffers;
 	struct buffer_head * tmp = bh;
+	page->buffers = NULL;
+
 	do {
 		struct buffer_head *p = tmp;
 		tmp = tmp->b_this_page;
@@ -1482,6 +1485,8 @@
 			ll_rw_block(WRITE, 1, &p);
 	} while (tmp != bh);
+	page->buffers = bh;
+
 	do {
 		struct buffer_head *p = tmp;
 		tmp = tmp->b_this_page;
@@ -1533,7 +1538,7 @@
 busy:
 	too_many = (nr_buffers * bdf_prm.b_un.nfract/100);
-	if (!sync_page_buffers(bh, wait)) {
+	if (!sync_page_buffers(page_map, wait)) {
 		/* If a high percentage of the buffers are dirty,
 		 * wake kflushd
The above strict version of the fix is downloadable from here too:
Andrea