Ok, but I did see head->next == NULL in __remove_wait_queue ().
The calling sequence is
sync_inodes () -> wait_on_inode () -> __wait_on_inode ()
-> remove_wait_queue () -> __remove_wait_queue ().
>
> Note that "next" is initialized to WAIT_QUEUE_HEAD(p), which cannot be
> NULL unless "p" is a totally invalid pointer (in which case you'd get a
> fault at the *p assignment anyway). And the "next = head" assignment
> obviously only happens if head is non-NULL, so next cannot become NULL
> there either.
>
> Yes, I do believe it is some inode corruption/race. Mark Hemment sent
> me some detailed analysis of some possible problems in the inode code,
> which I will be looking into, so I think those bugs might be
> percolating here in the form of corrupted inode wait queues...
>
Here is another patch. The problem is the wait queue is not really
"circular". The last element is always a dummy with the next field
pointed to WAIT_QUEUE_HEAD (p). __remove_wait_queue () fails
when the element to be removed is the first one on the queue.
H.J.
--- Index: ./include/linux/sched.h =================================================================== RCS file: /home/work/cvs/linux/linux/include/linux/sched.h,v retrieving revision 1.1.1.12 diff -u -r1.1.1.12 sched.h --- sched.h 1997/05/24 01:39:23 1.1.1.12 +++ sched.h 1997/06/02 21:11:54 @@ -522,16 +522,24 @@ extern inline void __remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait) { - struct wait_queue * next = wait->next; - struct wait_queue * head = next; + if (*p == wait) + { + *p = wait->next; + } + else + { + struct wait_queue * next = wait->next; + struct wait_queue * head = next; - for (;;) { - struct wait_queue * nextlist = head->next; - if (nextlist == wait) - break; - head = nextlist; + for (;;) + { + struct wait_queue * nextlist = head->next; + if (nextlist == wait) + break; + head = nextlist; + } + head->next = next; } - head->next = next; } extern inline void remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)