Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads
From: Ross Zwisler
Date: Tue Apr 25 2017 - 18:59:53 EST
On Tue, Apr 25, 2017 at 01:10:43PM +0200, Jan Kara wrote:
> [...]
> Hum, but now thinking more about it I have hard time figuring out why write
> vs fault cannot actually still race:
>
> CPU1 - write(2)                         CPU2 - read fault
>
>                                         dax_iomap_pte_fault()
>                                           ->iomap_begin() - sees hole
> dax_iomap_rw()
>   iomap_apply()
>     ->iomap_begin - allocates blocks
>   dax_iomap_actor()
>     invalidate_inode_pages2_range()
>       - there's nothing to invalidate
>                                         grab_mapping_entry()
>                                           - we add zero page in the radix
>                                             tree & map it to page tables
>
> Similarly, a read fault may race with a write fault in the wrong way and
> try to replace an already existing exceptional entry with a hole page?
Yep, this race seems real to me, too. This seems very much like the issues
that exist when a thread is doing direct I/O. One thread is doing I/O to an
intermediate buffer (page cache for direct I/O case, zero page for us), and
the other is going around it directly to media, and they can get out of sync.
IIRC the direct I/O code looked something like:
1/ invalidate existing mappings
2/ do direct I/O to media
3/ invalidate mappings again, just in case. Should be cheap if there weren't
any conflicting faults. This makes sure any new allocations we made are
faulted in.
I guess one option would be to replicate that logic in the DAX I/O path, or we
could try and enhance our locking so page faults can't race with I/O since
both can allocate blocks.
I'm not sure, but will think on it.