Re: [RESEND] [PATCH] VFS: make file->f_pos access atomic on 32bitarch

From: Andrew Morton
Date: Tue Oct 07 2008 - 13:52:21 EST


On Wed, 8 Oct 2008 03:27:44 +1100 Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> On Tuesday 07 October 2008 21:29, Andi Kleen wrote:
> > > Maybe cmpxchg8b is good for i486 or later x86, but i386 or other
> > > architectures that do not have similar instruction needs some locking
> > > primitive. I think lazy
> >
> > We have a cmpxchg emulation on 386. That works because only UP 386s are
> > supported, so it can be done in software.
> >
> > > seqlock is one option for making file->f_pos access atomic.
> >
> > The question is if it's the right option. At least all the common
> > operations on fds (read/write) are all writers, not readers.
>
> Common operations are read, do something, write. So seqlocks then cost
> one atomic operation, a couple of memory barriers (all noops on x86),
> and some predictable branches etc.
>
> cmpxchg based would require 2 lock ; cmpxchg8b on 32-bit. Fairly heavy.
> Also I don't think we have generic accessors to do this, so I think
> that is for another project.
>
> Anyway, I think importantly this creates some usable accessors for the
> f_pos problem. I think we actually need to touch a _lot_ of code to
> cover all f_pos accesses in the kernel, but I guess this gets the ball
> rolling.

Aneesh is proposing using using seqlocks to make percpu_counter.count
atomic on 32-bit.

This patch uses seqlocks to make file.f_pos atomic on 32-bit.

I think we should come up with a common atomic 64-bit type. We already
partly have that: atomic64_t. But for reasons which I don't recall,
atomic64_t is 64-bit-only at present.

If we generalise atomic64_t to all architectures then we can use it in
both the above applications and surely in other places in the future.

> So.. is everyone agreed that corrupting f_pos is a bad thing? (serious
> question) If so, then we should get something like this merged sooner
> rather than later.

- two threads/processes sharing the same fd

- both appending the same fd

- both hit the small race window right around the time when the file
flips over a multiple of 4G.

It's pretty damn improbable, and I think we can afford to spend the
time to get this right in 2.6.29.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/