Re: [PATCH] zerocopy NFS updated

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Fri Apr 12 2002 - 16:22:52 EST


Andi Kleen wrote:
> > I wondered if regular truncate() and read() might have the same
> > problem, so I tested again and again.
> > And I realized it will occur on any local filesystem.
> > Sometimes I could get partly zero-filled data instead of the file contents.
> >
> I don't see it as a big problem and would just leave it as it is (for NFS
> and local).
> Adding more locking would slow down read() a lot, and there would have to be
> a good reason to take such a performance hit. Linux has done this forever
> and I don't think anybody has ever reported it as a bug, so we can probably
> safely assume that this behaviour (non-atomic truncate) is not a problem for
> users in practice.

Ouch! I have a program which can output incorrect results if this is
the case. It may seem to use an esoteric locking strategy, but I had no
idea it was acceptable for read() to return data that truncate() is in
the middle of zeroing.

The program keeps a cache on disk of generated files. For each cached
object, there is a metadata file. Metadata files are text files,
written in such a way that the first line looks similar to "=12296.0"
and so does the last line, but none of the intermediate lines have that
form.
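For illustration (the field names here are invented; only the marker
shape matters), a metadata file might look like:

    =12296.0
    object: 0004a3.data
    size: 182931
    =12296.0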

Multiple programs can access the disk cache at the same time, and must
be able to check the metadata files being written by other programs,
blocking if necessary while a cached object is being generated.

When a metadata file is being written, first it is created, then the
cached object is created and written to a related file, and finally the
metadata, including the first and last marker lines, is written to the
metadata file.
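In C, the writer's ordering is roughly this (a hypothetical sketch; all
function and field names are invented):

    #include <stdio.h>

    /* Placeholder: generate the cached object itself; the details
     * don't matter for the ordering being illustrated. */
    static int write_object(const char *objpath)
    {
        FILE *f = fopen(objpath, "w");
        if (!f)
            return -1;
        fputs("object data\n", f);
        return fclose(f) == 0 ? 0 : -1;
    }

    /* 1. create the (empty) metadata file
     * 2. generate and write the cached object
     * 3. write first marker, metadata body, last marker
     */
    static int write_cache_entry(const char *metapath, const char *objpath,
                                 const char *marker)
    {
        FILE *meta = fopen(metapath, "w");          /* step 1 */
        if (!meta)
            return -1;

        if (write_object(objpath) != 0) {           /* step 2 */
            fclose(meta);
            return -1;
        }

        fprintf(meta, "%s\n", marker);              /* step 3: first marker */
        fprintf(meta, "object: %s\n", objpath);     /*         metadata body */
        fprintf(meta, "%s\n", marker);              /*         last marker */

        return fclose(meta) == 0 ? 0 : -1;
    }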

When a program reads a metadata file, it reads as much as it can and
then checks that the first and last lines are identical. If they are, the
middle of the file is valid; otherwise it isn't -- perhaps the process
generating that file died, or has the metadata file locked.
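The check itself is simple; something like this (again just a sketch
with invented names; a one-line file is treated as invalid):

    #include <stdio.h>
    #include <string.h>

    static int metadata_valid(const char *metapath)
    {
        char first[256], line[256], last[256];
        FILE *f = fopen(metapath, "r");
        if (!f)
            return 0;

        if (!fgets(first, sizeof(first), f)) {  /* read the first line */
            fclose(f);
            return 0;
        }
        last[0] = '\0';
        while (fgets(line, sizeof(line), f))    /* keep only the last line */
            strcpy(last, line);
        fclose(f);

        /* valid only if the first and last marker lines match */
        return last[0] != '\0' && strcmp(first, last) == 0;
    }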

This strategy is used so that there's no need to lock the file in the
case of a cache hit with no complications. If there's a complication,
we lock with LOCK_SH and try again.
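The slow path just takes a shared lock and re-checks, on the assumption
that the writer holds LOCK_EX while generating (sketch, reusing
metadata_valid() from above):

    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int check_locked(const char *metapath)
    {
        int ok = 0;
        int fd = open(metapath, O_RDONLY);
        if (fd < 0)
            return 0;
        /* blocks while a writer holds LOCK_EX on the file */
        if (flock(fd, LOCK_SH) == 0) {
            ok = metadata_valid(metapath);  /* re-run the marker check */
            flock(fd, LOCK_UN);
        }
        close(fd);
        return ok;
    }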

From time to time, when a cached object is invalid, it's appropriate to
truncate the metadata file. If that were atomic, end of story.
Unfortunately, I've just now heard that a read() can successfully
interleave some zeros from a parallel truncate.

If the timing is right, that means it's possible for the reading
process to see a hole of zeros in the middle of the file. The first and
last lines would be intact, and the reader would think that the whole
file is therefore valid. Bad!

This occurs if the reader copies the initial bytes from the page, the
truncating process then catches up and zeros out some bytes behind the
reader, but the reader pulls ahead again and reaches the end of the
file before the zeroing does.
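Concretely, the bad interleaving looks like this (offsets invented):

    reader                             truncating process
    ------                             ------------------
    copies bytes 0..a
      (first marker intact)
                                       zeroes bytes a..b
    copies bytes a..b, sees zeros
    copies bytes b..end
      (last marker intact -- read
       before the zeroing reached it)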

I'm not advocating more locking in read() -- there's no need, and it is
quite important that it is fast! But I would very much appreciate an
understanding of the rules that relate reading, writing and truncating
processes. How much ordering & atomicity can I depend on? Anything at all?

cheers,
-- Jamie