Re: page->mapping == 0

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sun Oct 29 2000 - 14:33:19 EST


On Sun, 29 Oct 2000, Linus Torvalds wrote:
>
> Making it policy that we have to re-test page->mapping after aquireing the
> page lock might be the simplest fix for 2.4.x. It still means that we
> might end up allowing people to have a "bad" page in the VM space due to
> the "test->insert" race condition, but it woul dmake that event pretty
> much a harmless bug (and thus move it to the "beauty wart - to be fixed
> later" category).

I'd like to just re-iterate this point: re-testing "page->mapping" fixes
the oops, but is not a full fix for the conceptual problem.

The problem with just re-testing "page->mapping" is that you still have a
nasty potential race where you insert a (bogus) page into the VM space of
a process instead of giving a SIGBUS/SIGSEGV.

Now, I don't think this is really a valid usage pattern, so it's most
likely to be a result of a buggy application, but I can imagine having
some strange kind of user-space VM memory management scheme that depends
on SIGBUS to maintain a file length. I've never heard of such a thing, but
I could imagine somebody doing some kind of persistent data object store
in user space this way.

Does anybody actually know of an application that does something like
this? Because I'm more and more inclined to just going with the half-fix
for now. It would at least guarantee internal kernel data consistency (and
no oopses, of course).

Don't get me wrong - we need to clean this part up, but as far as I can
tell we have never done this "right", so in that sense it's not a new
2.4.x bug and it can't break existing applications.

[ In fact, with the current ordering inside "vmtruncate()" of doing the
  "truncate_inode_pages()" thing before doing the "vmtruncate_list()", I
  have this suspicion that the race might even be impossible to trigger.
  Even when the race "happens" in kernel space, we will end up unmapping
  the page immediately afterwards, and the only effect as far as the user
  is concerned is the disappearance of the SIGBUS.

  And the "disappearing SIGBUS" is actually explainable with a
  _user_level_ race: in order to get the kernel race at all, user level
  itself must have been inherently racing on the truncate/access, and
  depending on which one happened "first" you'd have lost the SIGBUS and
  the data you wrote anyway.

  So it may actually be that we can honestly claim that the half-fix is
  actually a proper fix. I would have to look a lot closer at the issue to
  be able to guarantee this, though. Comments? Anybody? ]

Al?

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:25 EST