Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

From: Eric W. Biederman
Date: Tue Jun 05 2018 - 13:47:48 EST

"'tj@xxxxxxxxxx'" <tj@xxxxxxxxxx> writes:

> Hello,
> On Tue, Jun 05, 2018 at 10:31:36AM -0500, Eric W. Biederman wrote:
>> What I have above is not the clearest, and in fact the logic could be
>> better.
>> The fundamental challenge is that, because hash collisions are possible,
>> a file offset does not hold complete position information in a directory.
>> So the kernfs node that is to be read/displayed next is saved in the
>> struct file. Then it is tested whether the saved kernfs node is usable
>> for finding the location in the directory. Several things may have
>> gone wrong.
>> - Someone may have called seekdir.
>> - The saved kernfs node may have been renamed.
>> - The saved kernfs node may have been moved to a different directory in
>> kernfs.
>> - The saved kernfs node may have been deleted.
>> If any of those are true the code needs to do the rbtree lookup.
> So, given that the whole thing is protected by a mutex which protects
> modifications, it could be an option to simply keep track of who's
> iterating what and shift them on removals. IOW, always keep cursor
> pointing to the next thing to visit and if that gets removed shift the
> cursor to the next one.

Yes. We could.

The primary case we have to worry about is someone using seekdir,
and for that we always need the rbtree lookup. For seekdir
we could invalidate the saved entry and make things simpler
that way.
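
As a rough userspace sketch of that check (not the actual kernfs code;
struct node, struct dir_file, saved_if_usable() and dir_seek() are
hypothetical names used only for illustration), the saved entry is
trusted only when every test passes, and seekdir simply clears it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct kernfs_node; fields are illustrative. */
struct node {
	struct node *parent;	/* directory currently holding this node */
	unsigned int hash;	/* name hash, used as the directory offset */
	bool active;		/* false once the node has been removed */
};

/* Per-open-file state: the node expected to be returned next, or NULL. */
struct dir_file {
	struct node *saved;
};

/*
 * Return the saved node if it still describes offset 'pos' in 'dir'.
 * Return NULL otherwise, so the caller falls back to the rbtree lookup.
 */
struct node *saved_if_usable(const struct dir_file *df,
			     const struct node *dir, unsigned int pos)
{
	struct node *kn;

	assert(df && dir);
	kn = df->saved;

	if (!kn)			/* nothing saved, e.g. after seekdir */
		return NULL;
	if (!kn->active)		/* deleted */
		return NULL;
	if (kn->parent != dir)		/* moved to a different directory */
		return NULL;
	if (kn->hash != pos)		/* renamed, or the offset was changed */
		return NULL;
	return kn;
}

/* seekdir: drop the saved node so the next readdir searches by offset. */
void dir_seek(struct dir_file *df)
{
	assert(df);
	df->saved = NULL;
}
```

In the kernel all of this would have to run under the mutex that
protects directory modifications, and reference counting on the saved
node is left out here.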

We could add a list_head to the kernfs_node and create:

	struct kernfs_dir_file {
		struct list_head entry;
		struct kernfs_node *kn;
	};

And point at that from struct file->private_data.

I don't know if it would be worth the trouble to do that over a quick
check to make certain the kernfs_node is what it is expected to be.
But that is an option.
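
To make the cursor option concrete, here is a minimal userspace model.
It assumes plain singly linked lists in place of struct list_head and
the rbtree, and struct dir, struct cursor, dir_add_cursor() and
dir_remove() are hypothetical names. On removal, every cursor pointing
at the victim is shifted to the victim's successor before the node is
unlinked:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified directory entry; only the ordering link is modelled. */
struct node {
	struct node *next;	/* next sibling in directory order */
};

/* Models the proposed per-open-file cursor. */
struct cursor {
	struct cursor *next;	/* link in the directory's cursor list */
	struct node *kn;	/* next node to visit, NULL at end of dir */
};

struct dir {
	struct node *children;	/* ordered list of entries */
	struct cursor *cursors;	/* everyone currently iterating this dir */
};

/* Register a cursor, positioned at the first entry. */
void dir_add_cursor(struct dir *d, struct cursor *c)
{
	assert(d && c);
	c->kn = d->children;
	c->next = d->cursors;
	d->cursors = c;
}

/*
 * Remove 'victim' from the directory. The caller is assumed to hold
 * the lock that protects both lists.
 */
void dir_remove(struct dir *d, struct node *victim)
{
	struct node **pp;
	struct cursor *c;

	assert(d && victim);

	/* Shift any cursor that was about to visit the victim. */
	for (c = d->cursors; c; c = c->next)
		if (c->kn == victim)
			c->kn = victim->next;

	/* Unlink the victim from the ordered child list. */
	for (pp = &d->children; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			break;
		}
	}
}
```

The cost is a walk over all open cursors on every removal, which is the
trade-off against the quick validity check.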

Part of the pain of supporting seekdir is that the offset we expose
to userspace has to be 32bit to support 32bit userspace applications.
Which unfortunately is small enough that if nothing else a name
collision can be brute forced. So we can not avoid handling collisions.

Sigh, I have found another issue with kernfs_fop_readdir.

We are not currently protecting file->private_data with the kernfs_mutex
or any other kind of serialization. Which means if two processes are
calling readdir on the same file descriptor we might get unpredictable
results.

It doesn't look too bad and should be easy enough to fix, but definitely something
to be watchful of.