Re: Lockless file reading

From: Jamie Lokier
Date: Thu Aug 28 2003 - 07:02:40 EST


Timo Sirainen wrote:
> Reorder on per-byte basis? Per-page/block would still be acceptable.

The _CPUs_ can reorder on a per-byte basis, on a multiprocessor. It
has nothing to do with the kernel.

> Anyway, the alternative would be shared mmap()ed file. You can trust
> 32bit memory updates to be atomic, right?

Atomic yes (if aligned), weakly ordered though.

> Or what about: write("12"), fsync(), write("12")? Is it still possible
> for read() to return "1x1x"?

Yes it is possible, in principle.

This is what happens: the writing CPU writes "1", "2" in order. The
reading CPU reads bytes 1, 3 before the writes are
observed and bytes 0, 2 after. The CPU can do this.

The kernel doesn't prevent this, because it doesn't hold any exclusive
lock between the writer and reader during the data transfers.
Furthermore the kernel transfers a byte at a time, on some
architecture (including x86), if any buffer is not aligned.

It is very unlikely to return "1x1x", but you should know it is not
architecturally impossible. Given your incomplete knowledge of every
architectural quirk, it is more likely to occur than an MD5 collision.

On 32-bit aligned atomicity: if the block of 4 bytes is aligned in
memory, then with shared mmap you will only see whole words
transferred because all (current) Linux SMP-capable architectures
offer 32 bit atomicity. It is not a very nice assumption: it doesn't
hold for 16 bits or 64 bits, and may not hold for a future 64 bit
architecture. Keep in mind the words stay whole, but multiple words
are read out of order.

With read() and write(), even aligned 32-bit words don't work. On
some 64-bit architectures, a 32-bit word read() will be issued as 4
byte reads at the machine level, with weak memory ordering. (I'm
reading Alpha and IA64 __copy_user right now, and they both do that).

Enjoy :)
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/