On 03/10, Andrew Vagin wrote:Yes. We shouldn't call it at all, even if PF_USER.On 03/10/2011 05:28 PM, Oleg Nesterov wrote:Why?(add cc's)I thought for a bit more...
Subject: x86/mm: handle mm_fault_error() in kernel spaceWhy? I don't understand this part.
From: Andrey Vagin<avagin@xxxxxxxxxx>
mm_fault_error() should not execute oom-killer, if page fault occurs in
kernel space. E.g. in copy_from_user/copy_to_user.
I think we should not execute out_of_memory() in this case at all,
Btw, this may be true, but this is irrelevant. If we shouldn't call
out_of_memory() in this case, then we shouldn't call it at all, even
if PF_USER.
Andrew, I think you missed the point. Or I misunderstood. Or both ;)When does handle_mm_fault() return VM_FAULT_OOM? It may be if current task is killed or a file system returns VM_FAULT_OOM. All allocations of memory (alloc_pages(), kmalloc()) calls out_of_memory() themselves, wait it and try allocate memory again.
because when we return from page fault, we execute the same command andSure. And the same happens if the fault occurs in user-space and
provoke the "same" page fault again
handle_mm_fault() returns VM_FAULT_OOM. This is correct.
We wait memory in __alloc_pages_may_oom(). I think now handle_mm_fault() returns VM_FAULT_OOM only if OOM-killer killed current task.Now pls think what is theThe difference is that oom-killer should free the memory in between.
difference between these page faults?
_OR_ it can decide to kill us, and _this_ case should be fixed.
The kernel code may do it in one page fault...It has been generated from oneThe program does nothing, but the kernel does.
place and the program do nothing between those.
You don't understand. alloc_page calls oom-killer himself, then try allocate memory again. Pls look at __alloc_pages_slowpath(). __alloc_pages_slowpat may fail if order > 3 || gfp_mask & __GFP_NOFAIL || test_thread_flag(TIF_MEMDIE)If handle_mm_fault() returnsWhy do you think the current task should be killed? In this case we
VM_FAULT_OOM and pagefault occurred from userspace, the current task
should be killed by SIGKILL,
do not need oom-killer at all, we could always kill the caller of
alloc_page/etc.
Suppose that the innocent task (which doesn't use a lot of memory) calls,alloc_page() calls oom-killer, which kill a memory hog.
say, sys_read() into the unpopulated memory. Suppose that alloc_page()
fails because we have a memory hog which tries to eat all memory.
Do you think the innocent task should be punished in this case?Probaly you think that oom-killer is called from mm_fault_error() only. It's incorrect.
Assuming that mm/oom_kill.c:out_of_memory() is correct, it should find
the memory hog and kill it, after that we can retry the fault in a hope
we have more memory.
PF_USER is not relevant. If the application does mmap() and then
accesses this memory, memcpy() or copy_from_user() should follow the
same logic wrt OOM.
If handle_mm_fault()Only if current was killed by oom-killer, that is why my patch checks
returns VM_FAULT_OOM and pagefault occurred in kernel space, we should
execute no_context() to return from syscall.
fatal_signal_pending().
Also note that out_of_memory is usually called from handle_mm_fault() ->And note that pagefault_out_of_memory() checks TIF_MEMDIE and calls
... -> alloc_page()->...->out_of_memory().
schedule_timeout_uninterruptible(). This is exactly because if we are
_not_ killed by oom-killer, we are going to retry later once the killed
task frees the memory.
See?
Oleg.