Re: [RFC] page fault retry with NOPAGE_RETRY
From: Andrew Morton
Date: Wed Sep 20 2006 - 13:53:57 EST
On Wed, 20 Sep 2006 16:54:59 +1000
Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> > It's a choice between two behaviours:
> >
> > a) get stuck in the kernel until someone kills you and
> >
> > b) fault the page in and proceed as expected.
> >
> > Option b) is better, no?
>
> That's what I don't understand... where is the actual race that can
> cause the livelock you are mentioning.
Suppose a program (let's call it "DoS") is written which sits in a loop
doing fadvise(FADV_DONTNEED) against some parts of /lib/libc.so.
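
A rough userspace sketch of what such a "DoS" loop could look like. This
assumes the fadvise(FADV_DONTNEED) above maps to posix_fadvise() with
POSIX_FADV_DONTNEED; the helper name evict_loop() and its parameters are
made up for illustration, and the loop is bounded here so it terminates.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel `iterations` times to drop the cached
 * pages of `path` in the byte range [offset, offset + len).
 *
 * Returns the number of posix_fadvise() calls that succeeded, or -1 if the
 * file could not be opened. POSIX_FADV_DONTNEED is only advice, so a
 * successful call does not guarantee the pages were actually dropped.
 */
int evict_loop(const char *path, off_t offset, off_t len, int iterations)
{
    int fd = open(path, O_RDONLY);
    int ok = 0;

    if (fd < 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED) == 0)
            ok++;
    }

    close(fd);
    return ok;
}
```

The scenario described here corresponds to calling something like
evict_loop("/lib/libc.so", 0, 4096, n) with no upper bound on n. Read-only
access to the file is enough to make the call.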
Now suppose another process (let's call it "bash") tries to execute that
page. bash will take a major fault, will submit a read and then it will do
wait_on_page(). I/O completes and the page comes unlocked. Now there is a
time window in which DoS can shoot that page down again. And it's quite a
lengthy window - bash needs to wake up, get scheduled, take mmap_sem, return
to do_page_fault(), redo the vma lookup and the pagetable walk and the
pagecache lookup.
That's plenty of time in which DoS can shoot down the page again.
Particularly since every other program in the machine is stuck in disk wait
in its pagefault handler ;)
All of this will cause bash to get permanently stuck in the kernel. And I
don't think it's acceptable to just allow bash to be killed off:
- one would need a statically-linked shell to be able to do this.
- if one didn't kill off DoS first, it wouldn't help. A statically
linked `ps' is also needed.
- having to kill off sshd, xinetd, httpd, etc. isn't a very happy solution.
- you can't kill off /sbin/init.
So I think there's a nasty DoS here if we permit infinite retries. But
it's not just that - there might be other situations under really heavy
memory pressure where livelocks like this can occur. It's just a general
robustness-of-implementation issue.
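
To make the difference between bounded and unbounded retries concrete, here
is a toy model in plain C. It is not kernel code: toy_fault(), the evicted_fn
callback and the two sample callbacks are all hypothetical names, and the
model says nothing about what the non-retrying fallback path actually does.
It only shows that a retry cap guarantees the loop exits even when the page
is shot down on every attempt.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the attacker: returns true if the page was
 * evicted again in the window between I/O completion and the repeated
 * pagecache lookup on the given attempt.
 */
typedef bool (*evicted_fn)(int attempt);

/*
 * Toy model of a fault handler that takes the retry path at most
 * `max_retries` times. Returns the number of retry attempts that were
 * wasted; *fell_back is set to true if the cap was hit and the caller
 * must use some other, non-retrying path to finish the fault.
 */
int toy_fault(evicted_fn evicted, int max_retries, bool *fell_back)
{
    int attempt = 0;

    *fell_back = false;
    while (attempt < max_retries) {
        if (!evicted(attempt))
            return attempt;     /* page survived the window: fault completes */
        attempt++;              /* page was shot down again: go around */
    }

    *fell_back = true;          /* cap reached: stop retrying */
    return attempt;
}

/* Worst case from the scenario above: the page is evicted every time. */
bool always_evicted(int attempt) { (void)attempt; return true; }

/* Benign case: nobody is shooting the page down. */
bool never_evicted(int attempt) { (void)attempt; return false; }
```

With always_evicted and a cap of 3, toy_fault() gives up after three wasted
attempts and reports that the fallback is needed. Without the cap, the same
loop would never terminate, which is the livelock being described.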