[splice PATCH 2/3] tee: don't return 0 when another task drains/fillsa pipe

From: Max Kellermann
Date: Thu Apr 30 2009 - 08:11:44 EST


Cite from the tee() manual page:

"A return value of 0 means that there was no data to transfer, and it
would not make sense to block, because there are no writers connected
to the write end of the pipe"

There is however a race condition in the tee() implementation, which
violates this definition:

- do_tee() ensures that ipipe is readable and opipe is writable by
calling link_ipipe_prep() and link_opipe_prep()
- these two functions unlock the pipe after they have waited
- during this unlocked phase, there is a short window where other
tasks may drain the input pipe or fill the output pipe
- do_tee() now calls link_pipe(), which re-locks both pipes
- link_pipe() sees that it is unable to read ("i >= ipipe->nrbufs ||
opipe->nrbufs >= PIPE_BUFFERS") and breaks from the loop
- link_pipe() returns 0

Although there may be writers connected to the input pipe, tee() now
returns 0, and the caller (spuriously) assumes this is the end of the
stream.

This patch wraps the link_[io]pipe_prep() invocation in a loop within
link_pipe(), and loops until the result is reliable.

Signed-off-by: Max Kellermann <mk@xxxxxxxxxx>
---

fs/splice.c | 47 ++++++++++++++++++++++++++++++++++-------------
1 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index f07e304..96135eb 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1594,15 +1594,41 @@ static long link_pipe(struct pipe_inode_info *ipipe,
size_t len, unsigned int flags)
{
struct pipe_buffer *ibuf, *obuf;
- long ret = 0;
+ long ret;
int i = 0, nbuf;

- /*
- * Potential ABBA deadlock, work around it by ordering lock
- * grabbing by pipe info address. Otherwise two different processes
- * could deadlock (one doing tee from A -> B, the other from B -> A).
- */
- pipe_double_lock(ipipe, opipe);
+ while (1) {
+ /* wait for ipipe to become ready to read */
+ ret = link_ipipe_prep(ipipe, flags);
+ if (ret)
+ return ret;
+
+ /* wait for opipe to become ready to write */
+ ret = link_opipe_prep(opipe, flags);
+ if (ret)
+ return ret;
+
+ /*
+ * Potential ABBA deadlock, work around it by ordering
+ * lock grabbing by inode address. Otherwise two
+ * different processes could deadlock (one doing tee
+ * from A -> B, the other from B -> A).
+ */
+ pipe_double_lock(ipipe, opipe);
+
+ /* see if the tee() is still possible */
+ if ((ipipe->nrbufs > 0 || ipipe->writers == 0) &&
+ opipe->nrbufs < PIPE_BUFFERS)
+ /* yes, it is - keep the locks and end this
+ loop */
+ break;
+
+ /* no - someone has drained ipipe or has filled opipe
+ between link_[io]pipe_pre()'s lock and our lock.
+ Drop both locks and wait again. */
+ pipe_unlock(ipipe);
+ pipe_unlock(opipe);
+ }

do {
if (!opipe->readers) {
@@ -1691,12 +1717,7 @@ static long do_tee(struct file *in, struct file *out, size_t len,
* Keep going, unless we encounter an error. The ipipe/opipe
* ordering doesn't really matter.
*/
- ret = link_ipipe_prep(ipipe, flags);
- if (!ret) {
- ret = link_opipe_prep(opipe, flags);
- if (!ret)
- ret = link_pipe(ipipe, opipe, len, flags);
- }
+ ret = link_pipe(ipipe, opipe, len, flags);
}

return ret;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/