Re: [rfc patch] mm: protect set_page_dirty() from ongoing truncation
From: Andrew Morton
Date: Wed Nov 26 2014 - 17:00:14 EST
On Tue, 25 Nov 2014 14:48:41 -0500 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> Tejun, while reviewing the code, spotted the following race condition
> between the dirtying and truncation of a page:
>
> __set_page_dirty_nobuffers() __delete_from_page_cache()
> if (TestSetPageDirty(page))
> page->mapping = NULL
> if (PageDirty())
> dec_zone_page_state(page, NR_FILE_DIRTY);
> dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> if (page->mapping)
> account_page_dirtied(page)
> __inc_zone_page_state(page, NR_FILE_DIRTY);
> __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>
> which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
>
> Dirtiers usually lock out truncation, either by holding the page lock
> directly, or in case of zap_pte_range(), by pinning the mapcount with
> the page table lock held. The notable exception to this rule, though,
> is do_wp_page(), for which this race exists. However, do_wp_page()
> already waits for a locked page to unlock before setting the dirty
> bit, in order to prevent a race where clear_page_dirty() misses the
> page bit in the presence of dirty ptes. Upgrade that wait to a fully
> locked set_page_dirty() to also cover the situation explained above.
>
> Afterwards, the code in set_page_dirty() dealing with a truncation
> race is no longer needed. Remove it.
>
> Reported-by: Tejun Heo <tj@xxxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
> mm/memory.c | 11 ++---------
> mm/page-writeback.c | 33 ++++++++++++---------------------
> 2 files changed, 14 insertions(+), 30 deletions(-)
>
> It is unfortunate to hold the page lock while balancing dirty pages,
> but I don't see what else would protect mapping at that point.
Yes.
I'm a bit surprised that calling balance_dirty_pages() under
lock_page() doesn't just go and deadlock. Memory fails me.
And yes, often the only thing which protects the address_space is
lock_page().
set_page_dirty_balance() and balance_dirty_pages() don't actually need
the address_space - they just use it to get at the backing_dev_info.
So perhaps what we could do here is the change those functions to take
a bdi directly, then change do_wp_page() to do something like
lock_page(dirty_page);
bdi = page->mapping->backing_dev_info;
need_balance = set_page_dirty2(bdi);
unlock_page(page);
if (need_balance)
balance_dirty_pages_ratelimited2(bdi);
so we no longer require that the address_space be stabilized after
lock_page(). Of course something needs to protect the bdi and I'm not
sure what that is, but we're talking about umount and that quiesces and
evicts lots of things before proceeding, so surely there's something in
there which will save us ;)
> The
> same btw applies for the page_mkwrite case: how is mapping safe to
> pass to balance_dirty_pages() after unlocking page table and page?
I'm not sure which code you're referring to here, but it's likely that
the switch-balancing-to-bdi approach will address that as well?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/