Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

From: Lennert Buytenhek
Date: Fri Jan 29 2021 - 06:55:08 EST


On Fri, Jan 29, 2021 at 07:37:03AM +0200, Lennert Buytenhek wrote:

> > > > One open question is whether IORING_OP_GETDENTS64 should be more like
> > > > pread(2) and allow passing in a starting offset to read from the
> > > > directory from. (This would require some more surgery in fs/readdir.c.)
> > >
> > > Since directories are seekable this ought to work.
> > > Modulo horrid issues with 32bit file offsets.
> >
> > The incremental patch below does this. (It doesn't apply cleanly on
> > top of v1 of the IORING_OP_GETDENTS patch as I have other changes in
> > my tree -- I'm including it just to illustrate the changes that would
> > make this work.)
> >
> > This change seems to work, and makes IORING_OP_GETDENTS take an
> > explicitly specified directory offset (instead of using the file's
> > ->f_pos), making it more like pread(2) [...]
>
> ...but the fact that this patch avoids taking file->f_pos_lock (as this
> proposed version of IORING_OP_GETDENTS avoids using file->f_pos) means
> that ->iterate_shared() can then be called concurrently on the same
> struct file, which breaks the mutual exclusion guarantees provided here.
>
> If possible, I'd like to keep the ability to explicitly pass in a
> directory offset to start reading from into IORING_OP_GETDENTS, so
> perhaps we can simply satisfy the mutual exclusion requirement by
> taking ->f_pos_lock by hand -- but then I do need to check that it's OK
> for ->iterate{,_shared}() to be called successively with discontinuous
> offsets without ->llseek() being called in between.
>
> (If that's not OK, then we can either have IORING_OP_GETDENTS just
> always start reading at ->f_pos like before (which might then require
> adding a IORING_OP_GETDENTS2 at some point in the future if we'll
> ever want to change that), or we could have IORING_OP_GETDENTS itself
> call ->llseek() for now whenever necessary.)

Having IORING_OP_GETDENTS seek to sqe->off if needed seems easy
enough to implement, and it simplifies the other code as well, so
I'll send out a v2 RFC shortly that does this.