Re: O_DIRECT question

From: Michael Tokarev
Date: Sat Jan 13 2007 - 15:27:44 EST


Bill Davidsen wrote:
> Linus Torvalds wrote:
>>
[]
>> But what O_DIRECT does right now is _not_ really sensible, and the
>> O_DIRECT propeller-heads seem to have some problem even admitting that
>> there _is_ a problem, because they don't care.
>
> You say that as if it were a failing. Currently if you mix access via
> O_DIRECT and non-DIRECT you can get unexpected results. You can screw
> yourself, mangle your data, or have no problems at all if you avoid
> trying to access the same bytes in multiple ways. There are lots of ways
> to get or write stale data, not all involve O_DIRECT in any way, and the
> people actually using O_DIRECT now are managing very well.
>
> I don't regard it as a system failing that I am allowed to shoot myself
> in the foot, it's one of the benefits of Linux over Windows. Using
> O_DIRECT now is like being your own lawyer, room for both creativity and
> serious error. But what's there appears portable, which is important as
> well.

If I got it right (and please someone tell me if I *really* got it right!),
the problem is elsewhere.

Suppose you have a filesystem, not at all related to databases and stuff.
Your usual root filesystem, with your /etc, /var and so on.

Some time ago you edited /etc/shadow, updating it by writing a new file and
renaming it into place. So the old content of your shadow file (now deleted)
is still somewhere on the disk, but no longer accessible through the
filesystem.
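
To make the "write new file and rename" step concrete, here is a minimal
sketch in C. The function name replace_file() and the ".tmp" suffix are made
up for illustration; real tools that edit /etc/shadow also take locks and
preserve ownership, which is left out here. The point is only that the old
file's blocks get freed, and as far as I understand nothing requires them to
be overwritten.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `data` by writing a temporary
 * file next to it and renaming it over the target.
 * Returns 0 on success, -1 on error. */
static int replace_file(const char *path, const char *data)
{
    char tmp[4096];
    size_t len, off = 0;
    int fd, n;

    assert(path != NULL && data != NULL);

    /* Build "<path>.tmp"; the suffix is an assumption for this sketch. */
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write the whole buffer, handling short writes. */
    len = strlen(data);
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* Push the new content to disk before it becomes visible as `path`. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() switches `path` to the new file. The old file loses its
     * last link here and its blocks are freed, not scrubbed. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

After two calls on the same path, only the second content is reachable by
name; the first one is gone from the namespace but may still sit in free
blocks.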

Now, a bad guy deliberately opens some file on this filesystem with the
O_DIRECT flag, ftruncate()s it to some huge size (or does seek+write), and
at the same time tries to read the data back with O_DIRECT.

Due to all the races etc, it is possible for him to read the old content of
the /etc/shadow file you deleted earlier, if the blocks his file ends up
with happen to be the ones that old copy occupied.
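
For reference, the calls involved look roughly like the sketch below. It is
single-threaded, so it does NOT try to hit the race; it only extends a file
with ftruncate() and checks that an O_DIRECT read of the resulting hole
returns zeros, which is what should always happen. The function name
hole_leaks_data() and the 4096-byte alignment are my assumptions (the real
alignment requirement depends on the device and filesystem), and some
filesystems refuse O_DIRECT altogether, in which case the function just
reports an error.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK 4096                /* assumed alignment, may differ per device */

/* Hypothetical check: create `path`, extend it with ftruncate(), and read
 * the first block back through O_DIRECT.
 * Returns 1 if the hole contains a nonzero byte (stale data exposed),
 * 0 if it reads as zeros, -1 on any error (e.g. O_DIRECT not supported). */
static int hole_leaks_data(const char *path)
{
    void *mem = NULL;
    unsigned char *buf;
    ssize_t n;
    size_t i;
    int fd, result = -1;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0)
        return -1;

    /* O_DIRECT wants the buffer, offset and length suitably aligned. */
    if (posix_memalign(&mem, BLK, BLK) != 0) {
        mem = NULL;
        goto out;
    }
    buf = mem;

    /* Extend the file without writing any data: this creates a hole. */
    if (ftruncate(fd, 1 << 20) < 0)
        goto out;

    /* Direct read of the first block of the hole. */
    n = pread(fd, buf, BLK, 0);
    if (n != BLK)
        goto out;

    result = 0;
    for (i = 0; i < BLK; i++) {
        if (buf[i] != 0) {
            result = 1;         /* something other than zeros came back */
            break;
        }
    }

out:
    free(mem);
    close(fd);
    unlink(path);
    return result;
}
```

The scenario above would need a second thread or process doing the read
while the first one is still extending the file, and that window is the part
I'm not sure about.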

> I do have one thought, WRT reading uninitialized disk data. I would hope
> that sparse files are handled right, and that when doing a write with
> O_DIRECT the metadata is not updated until the write is done.

"hope that sparse files are handled right" is a high hope. Exactly because
this very place IS racy.

Again, *IF* I got it right.

/mjt