Re: A small patch?

H.J. Lu (hjl@lucon.org)
Mon, 2 Jun 1997 14:16:04 -0700 (PDT)


>
> Date: Mon, 2 Jun 1997 13:12:46 -0700 (PDT)
> From: Linus Torvalds <torvalds@transmeta.com>
>
> H.J. can you trace it back one more function? If I know where the
> wait queue things are getting called from, and on what type of object,
> I can better think about fixing it properly.

Ok, but I did see head->next == NULL in __remove_wait_queue ().
The calling sequence is

sync_inodes () -> wait_on_inode () -> __wait_on_inode ()
-> remove_wait_queue () -> __remove_wait_queue ().

>
> Note that "next" is initialized to WAIT_QUEUE_HEAD(p), which cannot be
> NULL unless "p" is a totally invalid pointer (in which case you'd get a
> fault at the *p assignment anyway). And the "next = head" assignment
> obviously only happens if head is non-NULL, so next cannot become NULL
> there either.
>
> Yes, I do believe it is some inode corruption/race. Mark Hemment sent
> me some detailed analysis of some possible problems in the inode code,
> which I will be looking into, so I think those bugs might be
> percolating here in the form of corrupted inode wait queues...
>

Here is another patch. The problem is the wait queue is not really
"circular". The last element is always a dummy with the next field
pointed to WAIT_QUEUE_HEAD (p). __remove_wait_queue () fails
when the element to be removed is the first one on the queue.

H.J.

---
Index: ./include/linux/sched.h
===================================================================
RCS file: /home/work/cvs/linux/linux/include/linux/sched.h,v
retrieving revision 1.1.1.12
diff -u -r1.1.1.12 sched.h
--- sched.h	1997/05/24 01:39:23	1.1.1.12
+++ sched.h	1997/06/02 21:11:54
@@ -522,16 +522,24 @@
 
 extern inline void __remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
 {
-	struct wait_queue * next = wait->next;
-	struct wait_queue * head = next;
+	if (*p == wait)
+	{
+		*p = wait->next;
+	}
+	else
+	{
+		struct wait_queue * next = wait->next;
+		struct wait_queue * head = next;
 
-	for (;;) {
-		struct wait_queue * nextlist = head->next;
-		if (nextlist == wait)
-			break;
-		head = nextlist;
+		for (;;)
+		{
+			struct wait_queue * nextlist = head->next;
+			if (nextlist == wait)
+				break;
+			head = nextlist;
+		}
+		head->next = next;
 	}
-	head->next = next;
 }
 
 extern inline void remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)