Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

From: George Anzinger
Date: Thu Aug 25 2005 - 13:54:49 EST


Robert Love wrote:
On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:

On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:

~
dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4
dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024

Note the incrementing wd value even though we are removing them as we go..


What kernel are you running? The wd's should ALWAYS be incrementing, you
should never get the same wd as you did before. From your log, you are
getting the same wd (after you inotify_rm_watch it). I can reproduce
this bug on 2.6.13-rc7.

idr_get_new_above

isn't returning something above.

Also, the idr layer seems to be breaking when we pass in 1024. I can
reproduce that on my 2.6.13-rc7 system as well.


This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel.

Robert, John, what do you think? Is this possibly related to the oops seen in the log that I reported earlier? (Which is still showing up 2-3 times per day, btw)

There is definitely something broken here.


Jim, George-

We are seeing a problem in the idr layer. If we do idr_find(1024) when,
say, a low valued idr, like, zero, is unallocated, NULL is returned.

I think the best thing is to take idr into user space and emulate the problem usage. To this end, from the log it appears that you _might_ be moving between 0, 1 and 2 entries increasing the number each time. It also appears that the failure happens here:
add 1023
add 1024
find 1024 or is it the remove that fails? It also looks like 1024 got allocated twice. Am I reading the log correctly?

So, is it correct to assume that the tree is empty save these two at this time? I am just trying to figure out what the test program needs to do.



--
George Anzinger george@xxxxxxxxxx
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/