Re: Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

From: Michal Hocko
Date: Tue Aug 15 2017 - 02:55:53 EST


On Tue 15-08-17 07:51:02, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > Were you able to reproduce with other filesystems?
>
> Yes, I can reproduce this problem using both xfs and ext4 on 4.11.11-200.fc25.x86_64
> on Oracle VM VirtualBox on Windows.
>
> I believe that this is not old data from disk, for I can reproduce this problem
> using newly attached /dev/sdb which has never written any data (other than data
> written by mkfs.xfs and mkfs.ext4).
>
> /dev/sdb /tmp ext4 rw,seclabel,relatime,data=ordered 0 0
>
> The garbage pattern (the last 4096 bytes) is identical for both xfs and ext4.

Thanks a lot for retesting. It is now obvious that FS doesn't have
anything to do with this issue which is in line with my investigation
from yesterday and Friday. I simply cannot see any way the file position
would be updated with a zero length write. So this must be something
else. I have double checked the MM side of the page fault I couldn't
find anything there either so this smells like a stray pte while the
underlying page got reused or something TLB related.

> > I wonder what is
> > different in my testing because I cannot reproduce this at all. Well, I
> > had to reduce the number of competing writer threads to 128 because I
> > quickly hit the trashing behavior with more of them (and 4 CPUs). I will
> > try on a larger machine.
>
> I don't think a larger machine is necessary.
> I can reproduce this problem with 8 competing writer threads on 4 CPUs.

OK, I will try with fewer writers which should make it easier to have it
run for long time without any trashing.

> I don't have native Linux environment. Maybe that is the difference.
> Can you try VMware Workstation Player or Oracle VM VirtualBox environment?

Hmm, I do not have any of those handy for use, unfortunately. I will
keep focusing on the native HW and KVM for today.

Thanks!
--
Michal Hocko
SUSE Labs