Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

From: Tetsuo Handa
Date: Fri Aug 11 2017 - 11:46:27 EST


Michal Hocko wrote:
> On Fri 11-08-17 16:54:36, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 11-08-17 11:28:52, Tetsuo Handa wrote:
> > > > Will you explain the mechanism why random values are written instead of zeros
> > > > so that this patch can actually fix the race problem?
> > >
> > > I am not sure what you mean here. Were you able to see a write with an
> > > unexpected content?
> >
> > Yes. See http://lkml.kernel.org/r/201708072228.FAJ09347.tOOVOFFQJSHMFL@xxxxxxxxxxxxxxxxxxx .
>
> Ahh, I've missed that random part of your output. That is really strange
> because AFAICS the oom reaper shouldn't really interact here. We are
> only unmapping anonymous memory and even if a refault slips through we
> should always get zeros.
>
> Your test case doesn't mmap MAP_PRIVATE of a file so we shouldn't even
> get any uninitialized data from a file by missing CoWed content. The
> only possible explanations would be that a page fault returned a
> non-zero data which would be a bug on its own or that a file write
> extend the file without actually writing to it which smells like a fs
> bug to me.

As I wrote at http://lkml.kernel.org/r/201708112053.FIG52141.tHJSOQFLOFMFOV@xxxxxxxxxxxxxxxxxxx ,
I don't think it is a fs bug.

>
> Anyway I wasn't able to reproduce this and I was running your usecase
> in the loop for quite some time (with xfs storage). How reproducible
> is this? If you can reproduce easily can you simply comment out
> unmap_page_range in __oom_reap_task_mm and see if that makes any change
> just to be sure that the oom reaper can be ruled out?

Frequency of writing not-zero values is lower than frequency of writing zero values.
But if I comment out unmap_page_range() in __oom_reap_task_mm(), I can't even
reproduce writing zero values. As far as I tested, writing not-zero values occurs
only if the OOM reaper is involved.