[RFC] readdir mess

From: Al Viro
Date: Tue Aug 12 2008 - 02:22:58 EST


When getdents() had been introduced in 1.3.0, ->readdir() has grown
a callback and became an iterator. The type of that iterator had been
a Bloody Bad Idea(tm); it returns a value that gets reduced to one bit
by all callers. Basically it's "do we stop or do we move to the next
entry?". 0 means "go on" for all ->readdir() instances, negative - "stop".
Positives are treated inconsistently.

Note that it can not be used to return an error. In particular,
the callback of getdents(2) returns "stop" when we run out of buffer.
If any instance of ->readdir() would ever treat that as an error to
pass back to caller, it would break spectaculary on the first ls(1) on
a directory large enough to require more than one getdents(2) call.

Anything that wants to pass errors back to caller of ->readdir()
has to store it in whatever struct we are passing to the callback.

That had lead to two recurring classes of bugs:
1) something_readdir() treats "callback returned negative" as "return
than negative". Breaks pretty fast.
2) whatever_filldir() happily returns some error value and expects it
to be seen by caller of ->readdir(). Somehow. Doesn't work. We do
have such bugs right now.

In addition, we have other kinds of crap going on around ->readdir()
(the most popular being "whaddya mean, somebody might've done lseek(2) and
set offset to hell knows what???"), but those are separate story.

An obvious way to deal with that would be to have filldir_t return
bool; true => stop, false => go on. However, that's going to cause silent
breakage of out-of-tree filesystems. Even though e.g. ext2 had always
treated any non-zero return value of filldir as "stop", we'd grown filesystems
that check for return value < 0 instead. Having it return true (i.e. 1)
will break all of those. Note that generic callbacks currently return
negative values for "stop", so they work with those filesystems right now.

FWIW, I'm seriously tempted to go for the following variant:
new __bitwise type with two values - READDIR_STOP and READDIR_CONTINUE,
-1 and 0 resp. The type of filldir_t would become filldir_res_t (*)(......),
all existing instances of ->readdir() would keep working and sparse
would catch all mismatches.

Another variant is to change filldir_t in visible incompatible way
that would have bool as return value and let readdir instances adjust or
refuse to compile; maintainers of out-of-tree code would presumably notice
that, so we would just have to document the calling conventions and say
that checking for < 0 doesn't work anymore.

Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/