Re: [RFC][PATCH] pipe: Convert ring to head/tail

From: Will Deacon
Date: Sun Sep 15 2019 - 10:59:34 EST


Hi David, [+Peter]

I have a few drive-by comments on the ordering side of things. See below.

On Fri, Sep 13, 2019 at 02:00:39PM +0100, David Howells wrote:
> Convert pipes to use head and tail pointers for the buffer ring rather than
> pointer and length as the latter requires two atomic ops to update (or a
> combined op) whereas the former only requires one.
>
> (1) The head pointer is the point at which production occurs and points to
> the slot in which the next buffer will be placed. This is equivalent
> to pipe->curbuf + pipe->nrbufs.
>
> The head pointer belongs to the write-side.
>
> (2) The tail pointer is the point at which consumption occurs. It points
> to the next slot to be consumed. This is equivalent to pipe->curbuf.
>
> The tail pointer belongs to the read-side.
>
> (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They
> are only masked off when the array is being accessed, e.g.:
>
> pipe->bufs[head & mask]
>
> This means that it is not necessary to have a dead slot in the ring as
> head == tail isn't ambiguous.
>
> (4) The ring is empty if head == tail.
>
> (5) The occupancy of the ring is head - tail.
>
> (6) The amount of space in the ring is (tail + pipe->buffers) - head.
>
> (7) The ring is full if head == (tail + pipe->buffers) or
> head - tail == pipe->buffers.
>
> Barriers are also inserted, wrapped in inline functions:
>
> (1) unsigned int tail = pipe_get_tail_for_write(pipe);
>
> Read the tail pointer from the write-side. This acts as a barrier to
> order the tail read before the data in the ring is overwritten. It
> also prevents the compiler from re-reading the pointer.
>
> (2) unsigned int head = pipe_get_head_for_read(pipe);
>
> Read the head pointer from the read-side. This acts as a barrier to
> order the head read before the data read. It also prevents the
> compiler from re-reading the pointer.
>
> (3) pipe_post_read_barrier(pipe, unsigned int tail);
>
> Update the tail pointer from the read-side. This acts as a barrier to
> order the pointer update after the data read. The consumed data slot
> must not be touched after this function.
>
> (4) pipe_post_write_barrier(pipe, unsigned int head);
>
> Update the head pointer from the write-side. This acts as a barrier
> to order the pointer update after the data write. The produced data
> slot should not normally be touched after this function[*].
>
> [*] Unless pipe->mutex is held.
> ---
> fs/fuse/dev.c | 23 ++-
> fs/pipe.c | 154 ++++++++++++++++----------
> fs/splice.c | 161 +++++++++++++++++----------
> include/linux/pipe_fs_i.h | 76 ++++++++++++-
> include/linux/uio.h | 4
> lib/iov_iter.c | 268 ++++++++++++++++++++++++++--------------------
> 6 files changed, 438 insertions(+), 248 deletions(-)

[...]

> diff --git a/fs/pipe.c b/fs/pipe.c
> index 8a2ab2f974bd..aa410ee0f423 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -43,10 +43,10 @@ unsigned long pipe_user_pages_hard;
> unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
>
> /*
> - * We use a start+len construction, which provides full use of the
> + * We use a start+len construction, which provides full use of the
> * allocated memory.
> * -- Florian Coosmann (FGC)
> - *
> + *
> * Reads with count = 0 should always return 0.
> * -- Julian Bradfield 1999-06-07.
> *
> @@ -285,10 +285,15 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> ret = 0;
> __pipe_lock(pipe);
> for (;;) {
> - int bufs = pipe->nrbufs;
> - if (bufs) {
> - int curbuf = pipe->curbuf;
> - struct pipe_buffer *buf = pipe->bufs + curbuf;
> + /* Barrier: head belongs to the write side, so order reading
> + * the data after reading the head pointer.
> + */
> + unsigned int head = READ_ONCE(pipe->head);

Hmm, I don't understand this. Since READ_ONCE() doesn't imply a barrier,
how are you enforcing the read-read ordering in the CPU?

> @@ -104,6 +104,76 @@ struct pipe_buf_operations {
> bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
> };
>
> +/**
> + * pipe_get_tail_for_write - Get pipe buffer tail pointer for write-side use
> + * @pipe: The pipe in question
> + *
> + * Get the tail pointer for use in the write-side code. This may need to
> + * insert a barrier against the reader to order reading the tail pointer
> + * against the reader reading the buffer.

What is the purpose of saying "This may need to insert a barrier"? Can this
function be overridden or something?

> + */
> +static inline unsigned int pipe_get_tail_for_write(struct pipe_inode_info *pipe)
> +{
> + return READ_ONCE(pipe->tail);
> +}
> +
> +/**
> + * pipe_post_read_barrier - Set pipe buffer tail pointer in the read-side
> + * @pipe: The pipe in question
> + * @tail: The new tail pointer
> + *
> + * Update the tail pointer in the read-side code. This inserts a barrier
> + * against the writer such that the data write is ordered before the tail
> + * pointer update.
> + */
> +static inline void pipe_post_read_barrier(struct pipe_inode_info *pipe,
> + unsigned int tail)
> +{
> + smp_store_release(&pipe->tail, tail);
> +}
> +
> +/**
> + * pipe_get_head_for_read - Get pipe buffer head pointer for read-side use
> + * @pipe: The pipe in question
> + *
> + * Get the head pointer for use in the read-side code. This inserts a barrier
> + * against the reader such that the head pointer is read before the data it
> + * points to.
> + */
> +static inline unsigned int pipe_get_head_for_read(struct pipe_inode_info *pipe)
> +{
> + return READ_ONCE(pipe->head);
> +}

Saying that "This inserts a barrier" feels misleading, because READ_ONCE()
doesn't do that.

Will