Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

From: John McCutchan
Date: Thu Aug 25 2005 - 14:04:47 EST


On Thu, 2005-08-25 at 11:54 -0700, George Anzinger wrote:
> Robert Love wrote:
> > On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:
> >
> >>On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:
> >>
> ~
> >>>dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4
> >>>dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023
> >>>dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024
> >>>dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4
> >>>dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed:
> >>>Invalid argument
> >>>dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4
> >>>dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
> >>>dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
> >>>
> >>>Note the incrementing wd value even though we are removing them as we go..
> >>>
> >>
> >>What kernel are you running? The wd's should ALWAYS be incrementing, you
> >>should never get the same wd as you did before. From your log, you are
> >>getting the same wd (after you inotify_rm_watch it). I can reproduce
> >>this bug on 2.6.13-rc7.
> >>
> >>idr_get_new_above
> >>
> >>isn't returning something above.
> >>
> >>Also, the idr layer seems to be breaking when we pass in 1024. I can
> >>reproduce that on my 2.6.13-rc7 system as well.
> >>
> >>
> >>>This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel.
> >>>
> >>>Robert, John, what do you think? Is this possibly related to the oops seen
> >>>in the log that I reported earlier? (Which is still showing up 2-3 times per
> >>>day, btw)
> >>
> >>There is definitely something broken here.
> >
> >
> > Jim, George-
> >
> > We are seeing a problem in the idr layer. If we do idr_find(1024) when,
> > say, a low valued idr, like, zero, is unallocated, NULL is returned.
>
> I think the best thing is to take idr into user space and emulate the
> problem usage. To this end, from the log it appears that you _might_ be
> moving between 0, 1 and 2 entries increasing the number each time. It
> also appears that the failure happens here:
> add 1023
> add 1024
> find 1024 or is it the remove that fails? It also looks like 1024 got
> allocated twice. Am I reading the log correctly?

You are reading the log correctly. There are two bugs. One is that if we
pass X to idr_get_new_above, it can return X again (doesn't ever seem to
return < X). The other problem is that the find fails on 1024 (and 2048
if we skip 1024).

>
> So, is it correct to assume that the tree is empty save these two at
> this time? I am just trying to figure out what the test program needs
> to do.

Yes that is the exact scenario. Only 2 id's are used at any given time,
and once we hit 1024 things break. This doesn't happen when the tree is
not empty.

Thanks for looking at this!
--
John McCutchan <ttb@xxxxxxxxxxxxxxxx>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/