Re: [PATCH RFC] page_writeback: cleanup mess around cancel_dirty_page()

From: Константин Хлебников
Date: Fri Jan 16 2015 - 08:18:58 EST


16.01.2015, 04:16, "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx>:
> On Thu, 15 Jan 2015 18:57:31 +0300 Konstantin Khebnikov <khlebnikov@xxxxxxxxxxxxxx> wrote:
>>  This patch replaces cancel_dirty_page() with helper account_page_cleared()
>>  which only updates counters. It's called from delete_from_page_cache()
>>  and from try_to_free_buffers() (hack for ext3). Page is locked in both cases.
>>
>>  Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough here.
>>
>>  cancel_dirty_page() in nfs_wb_page_cancel() is redundant. This is helper
>>  for nfs_invalidate_page() and it's called only in case complete invalidation.
>>
>>  Open-coded kludge at the end of __delete_from_page_cache() is redundant too.
>>
>>  This mess was started in v2.6.20, after commit 3e67c09 ("truncate: clear page
>>  dirtiness before running try_to_free_buffers()") reverted back in v2.6.25
>>  by commit a2b3456 ("Fix dirty page accounting leak with ext3 data=journal").
>>  Custom fixes were introduced between them. NFS in in v2.6.23 in commit
>>  1b3b4a1 ("NFS: Fix a write request leak in nfs_invalidate_page()").
>>  Kludge __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do dirty
>>  page accounting when removing a page from the page cache").
>>
>>  It seems safe to leave dirty flag set on truncated page, free_pages_check()
>>  will clear it before returning page into buddy allocator.
>
> account_page_cleared() is not a good name - "clearing a page" means
> filling it with zeroes.  account_page_cleaned(), perhaps?

Ok. account_page_cleaned is better.

>
> I don't think your email cc'ed all the correct people?  lustre, nfs,
> ext3?

oops

>>  ...
>>
>>  --- a/fs/buffer.c
>>  +++ b/fs/buffer.c
>>  @@ -3243,8 +3243,8 @@ int try_to_free_buffers(struct page *page)
>>            * to synchronise against __set_page_dirty_buffers and prevent the
>>            * dirty bit from being lost.
>>            */
>>  - if (ret)
>>  - cancel_dirty_page(page, PAGE_CACHE_SIZE);
>>  + if (ret && TestClearPageDirty(page))
>>  + account_page_cleared(page, mapping);
>
> OK.
>>           spin_unlock(&mapping->private_lock);
>>   out:
>>           if (buffers_to_free) {
>>
>>  ...
>>
>>  --- a/fs/nfs/write.c
>>  +++ b/fs/nfs/write.c
>>  @@ -1811,11 +1811,6 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
>>                    * request from the inode / page_private pointer and
>>                    * release it */
>>                   nfs_inode_remove_request(req);
>>  - /*
>>  - * In case nfs_inode_remove_request has marked the
>>  - * page as being dirty
>>  - */
>>  - cancel_dirty_page(page, PAGE_CACHE_SIZE);
>
> hm, if you say so..

That is main reason of this patch.
I dont like these obsoleted pieces of duct tape here and there.

>>                   nfs_unlock_and_release_request(req);
>>           }
>>
>>  ...
>>
>>  --- a/mm/filemap.c
>>  +++ b/mm/filemap.c
>>  @@ -201,18 +201,6 @@ void __delete_from_page_cache(struct page *page, void *shadow)
>>           if (PageSwapBacked(page))
>>                   __dec_zone_page_state(page, NR_SHMEM);
>>           BUG_ON(page_mapped(page));
>>  -
>>  - /*
>>  - * Some filesystems seem to re-dirty the page even after
>>  - * the VM has canceled the dirty bit (eg ext3 journaling).
>>  - *
>>  - * Fix it up by doing a final dirty accounting check after
>>  - * having removed the page entirely.
>>  - */
>>  - if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
>>  - dec_zone_page_state(page, NR_FILE_DIRTY);
>>  - dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>>  - }
>>   }
>>
>>   /**
>>  @@ -230,6 +218,9 @@ void delete_from_page_cache(struct page *page)
>>
>>           BUG_ON(!PageLocked(page));
>>
>>  + if (PageDirty(page))
>>  + account_page_cleared(page, mapping);
>>  +
>
> OK, but we lost the important comment - transplant that?
>
> It's strange that we left the dirty bit set after accounting for its
> clearing.  How does this work?  Presumably the offending fs dirtied the
> page without accounting for it?  I have a bad feeling I wrote that code :(

account_page_dirtyed() must be always called after dirtying non-truncated pages.
Here page is truncating from mapping, dirty accounting never will see it again.

This is the only place where dirty page might be truncated. All other places:
replace_page_cache_page, invalidate_complete_page2, __remove_mapping
(in memory reclaimer) forbid removing dirty pages.

As I see PageDirty means nothing for truncated pages, it's never be written anywhere.
We could clear dirty bit here, but probably it might appear again: set_page_dirty()
for some reason has branch for pages without page->mapping.

>>           freepage = mapping->a_ops->freepage;
>>           spin_lock_irq(&mapping->tree_lock);
>>           __delete_from_page_cache(page, NULL);
>>  diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>  index 4da3cd5..f371522 100644
>>  --- a/mm/page-writeback.c
>>  +++ b/mm/page-writeback.c
>>  @@ -2106,6 +2106,25 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>>   EXPORT_SYMBOL(account_page_dirtied);
>>
>>   /*
>>  + * Helper function for deaccounting dirty page without doing writeback.
>>  + * Doing this should *normally* only ever be done when a page
>>  + * is truncated, and is not actually mapped anywhere at all. However,
>>  + * fs/buffer.c does this when it notices that somebody has cleaned
>>  + * out all the buffers on a page without actually doing it through
>>  + * the VM. Can you say "ext3 is horribly ugly"? Tought you could.
>
> "Thought".

Ah, ok. That is copy-paste from cancel_dirty_page().

>>  + */
>>  +void account_page_cleared(struct page *page, struct address_space *mapping)
>>  +{
>>  + if (mapping_cap_account_dirty(mapping)) {
>>  + dec_zone_page_state(page, NR_FILE_DIRTY);
>>  + dec_bdi_stat(mapping->backing_dev_info,
>>  + BDI_RECLAIMABLE);
>>  + task_io_account_cancelled_write(PAGE_CACHE_SIZE);
>>  + }
>>  +}
>>  +EXPORT_SYMBOL(account_page_cleared);
>>
>>  ...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/