Re: [patch 2/4] vfs: add set_page_dirty_notag
From: Edward Shishkin
Date: Mon Feb 16 2009 - 17:42:43 EST
Peter Zijlstra writes:
[...]
> Maybe something like
>
> /*
> * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers()
> * except it doesn't tag the page dirty in the page-cache radix tree.
> * This means that the address space using this cannot use the regular
> * filemap ->writepages() helpers and must provide its own means of
> * tracking and finding dirty pages.
> *
> * NOTE: furthermore, this version also doesn't handle truncate races.
> */
>
Yup, your version is better. I just have replaced
^finding dirty pages^finding non-tagged dirty pages,
see below..
> > +int set_page_dirty_notag(struct page *page)
> > +{
> > + struct address_space *mapping = page->mapping;
> > +
> > + if (!TestSetPageDirty(page)) {
> > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> > + if (mapping_cap_account_dirty(mapping)) {
> > + /*
> > + * Since we don't tag the page as dirty,
> > + * acquiring the tree_lock is replaced
> > + * with disabling preemption to protect
> > + * per-cpu data used for accounting.
> > + */
>
> This should be local_irq_save(flags)
>
> > + preempt_disable();
> > + __inc_zone_page_state(page, NR_FILE_DIRTY);
> > + __inc_bdi_stat(mapping->backing_dev_info,
> > + BDI_RECLAIMABLE);
> > + task_dirty_inc(current);
> > + task_io_account_write(PAGE_CACHE_SIZE);
> > + preempt_enable();
>
> local_irq_restore()
>
> These accounting functions rely on being atomic wrt interrupts.
>
Thanks!
> > + }
> > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > + return 1;
> > + }
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(set_page_dirty_notag);
>
> How much performance gain do you see by avoiding that radix tree op?
>
Nop. We want to use it with extended semantics.
All dirty pages are divided into 2 categories:
A) tagged in the radix tree (with PAGECACHE_TAG_DIRTY).
B) captured by atoms (usual linked lists).
reiser4_writepages() looks for pages of "A" in the radix tree
and moves them to "B". set_page_dirty_notag(), introduced by
my patch, is needed for pages of "B".
If "B" is empty, then we get the traditional semantics with
regular ->writepages().
That's all!
> I take it the only reason you don't use the regular
> __set_page_dirty_nobuffers() and just clear the tag when you do the
> write-out by whatever alternative means you have to find the page, is
> that it gains you some performance.
>
> It would be good to have some numbers to judge this on.
>
So, performance is not a concern. We want to extend
functionality, which is currently restricted by the unnecessary(*)
property "dirty bit is set <=> dirty tag is installed".
(*) Reiser4 successfully uses such design since 2002.
I don't remember any BUG_ONs, and related problems.
Please, consider the appended patch.
Thanks,
Edward.
Add set_page_dirty_notag() to the core library to enable
extended functionality of radix tree attached to inode->i_mapping.
Signed-off-by: Edward Shishkin<edward.shishkin@xxxxxxxxx>
---
include/linux/mm.h | 1 +
mm/page-writeback.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
--- mmotm.orig/include/linux/mm.h
+++ mmotm/include/linux/mm.h
@@ -841,6 +841,7 @@ int redirty_page_for_writepage(struct wr
struct page *page);
int set_page_dirty(struct page *page);
int set_page_dirty_lock(struct page *page);
+int set_page_dirty_notag(struct page *page);
int clear_page_dirty_for_io(struct page *page);
extern unsigned long move_page_tables(struct vm_area_struct *vma,
--- mmotm.orig/mm/page-writeback.c
+++ mmotm/mm/page-writeback.c
@@ -1248,6 +1248,42 @@ int __set_page_dirty_nobuffers(struct pa
EXPORT_SYMBOL(__set_page_dirty_nobuffers);
/*
+ * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers()
+ * except it doesn't tag the page dirty in the page-cache radix tree.
+ * This means that the address space using this cannot use the regular
+ * filemap ->writepages() helpers and must provide its own means of
+ * tracking and finding non-tagged dirty pages.
+ *
+ * NOTE: furthermore, this version also doesn't handle truncate races.
+ */
+int set_page_dirty_notag(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+
+ if (!TestSetPageDirty(page)) {
+ WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+ if (mapping_cap_account_dirty(mapping)) {
+ /*
+ * The accounting functions rely on
+ * being atomic wrt interrupts.
+ */
+ unsigned long flags;
+ local_irq_save(flags);
+ __inc_zone_page_state(page, NR_FILE_DIRTY);
+ __inc_bdi_stat(mapping->backing_dev_info,
+ BDI_RECLAIMABLE);
+ task_dirty_inc(current);
+ task_io_account_write(PAGE_CACHE_SIZE);
+ local_irq_restore(flags);
+ }
+ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+ return 1;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(set_page_dirty_notag);
+
+/*
* When a writepage implementation decides that it doesn't want to write this
* page for some reason, it should redirty the locked page via
* redirty_page_for_writepage() and it should then unlock the page and return 0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/