Re: file offset corruption on 32-bit machines?
From: Lennart Sorensen
Date: Tue Apr 15 2008 - 15:49:34 EST
On Tue, Apr 15, 2008 at 09:12:38PM +0200, Pavel Machek wrote:
> It does not say "repositions the offset to the random number" nor
> "under certain conditions repositions the offsets" nor "it repositions
> the offset unless you are unlucky and hit kernel race". More
> seriously, it does not contain note "not safe from multithreaded
> programs" nor "multithreaded behaviour is undefined".
And if you debug it on a 64bit system then it won't be able to do that.
So not exactly a useful thing to try, and even trying 1000 times you are
unlikely to hit it, so you can't know for sure unless you happen to be
lucky and hit it.
> > So this pretty clearly is application bug.
> Really? I see an application to detecting if I'm being debugged. Try
> to hit the race 1000 times, if you hit it, you are probably not
> debugged (because debugger would be very likely to make that race hard
> to hit). Will only work on multicores, but...
If lseek not being atomic breaks your application, then your application
would be broken already. Any weird debug detection you might be able to
do using the fact that it isn't atomic could, I suppose, be considered a
kernel bug if you think being able to do such detection is a bug. Nothing
prevents the debugger from preloading an override for lseek that uses its
own locks to make the call atomic and hence prevent such use.
So other than that, is there any case in which lseek being not atomic
can cause an application to break if it wasn't already broken (due to
having a race condition by trying to do 2 or more seeks on the same file
handle at the same time)? If not, I think adding any kind of locking to
seek in the kernel (which would I think have to cause a slight slow
down) is a bad move. But hey that's just my opinion. :) I won't be
upset either way.
> [Plus, there's "strace seen it writing to either offset A or offset B,
> but I see the data at offset C, WTF?]
Most likely it would also be a program where you see it randomly seek to
A and write, or seek to A then B and then write, depending on how it
happens to get scheduled when you run it. The program is already clearly
doing something unreliable. And C can only differ from both A and B if A
and B differ in the upper 32 bits of the file position.
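
To make the A/B/C case concrete, here is a rough sketch of what a torn
64bit offset looks like when the two 32bit halves come from different
seeks. This only models the arithmetic, not the kernel code, and
torn_offset() is a name I made up for illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the offset you would end up with if the upper
 * 32 bits were stored by the seek to hi_src and the lower 32 bits by
 * the seek to lo_src.
 */
static uint64_t torn_offset(uint64_t hi_src, uint64_t lo_src)
{
    return (hi_src & UINT64_C(0xFFFFFFFF00000000)) |
           (lo_src & UINT64_C(0x00000000FFFFFFFF));
}

/* Example offsets that differ in both halves. */
static const uint64_t offset_a = UINT64_C(0x0000000100000010);
static const uint64_t offset_b = UINT64_C(0x0000000200000020);

/* Example offsets below 4GB, so the upper halves are identical. */
static const uint64_t small_a = UINT64_C(0x10);
static const uint64_t small_b = UINT64_C(0x20);
```

With offset_a and offset_b the mixed result matches neither input, which
is the "C" from the strace complaint. With small_a and small_b any mix of
halves is just one of the two inputs again, so nothing unexpected shows up.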
> I'm not saying this kernel bug is likely to hit in practice. It is
> still a kernel bug.
>
> Is the slowdown of lseek worth getting rid of this minor bug? Not
> sure, probably yes.
I think a slow down is the worse choice. Better to add a note to the
documentation saying: "By the way, on 32bit systems the seek call is
not atomic for 64bit file offsets, so if you happen to issue two at the
same time on the same file pointer to offsets that differ in the upper
32bits, then the result of the seek might be neither A nor B, but
will contain the upper 32bits of either A or B and the lower 32bits of
either A or B. You should of course use locking for your file access to
ensure you know where your threads end up writing, so this should be a
non issue."
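
For what it's worth, the locking I have in mind on the application side
could look something like the sketch below. It assumes all threads in the
process go through the one helper, and the names write_at_locked() and
demo_locked_writes() are made up for illustration. A single global mutex
is the simplest thing that works; a lock per file would scale better.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One lock shared by every thread that seeks and writes. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: seek and write as one unit, so no other thread
 * using this helper can move the offset in between.
 */
ssize_t write_at_locked(int fd, off_t offset, const void *buf, size_t len)
{
    ssize_t written = -1;

    pthread_mutex_lock(&file_lock);
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1)
        written = write(fd, buf, len);
    pthread_mutex_unlock(&file_lock);
    return written;
}

/* Small single-threaded check: returns 0 if the data landed where asked. */
int demo_locked_writes(void)
{
    char path[] = "/tmp/seek_demo_XXXXXX";
    char check[4] = {0};
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* file goes away once fd is closed */

    ok = write_at_locked(fd, 4, "BBBB", 4) == 4 &&
         write_at_locked(fd, 0, "AAAA", 4) == 4 &&
         pread(fd, check, 4, 4) == 4 &&
         memcmp(check, "BBBB", 4) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

An application that only needs positioned writes could also use pwrite(),
which takes the offset as an argument and never touches the shared file
position at all.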
--
Len Sorensen