[PATCH] mm/page-writeback - highmem_is_dirtyable option (replaces dirty_highmem patch)
From: Bron Gondwana
Date: Tue Nov 27 2007 - 08:06:22 EST
On Tue, Nov 27, 2007 at 11:10:28PM +1100, Bron Gondwana wrote:
> On Mon, Nov 26, 2007 at 09:53:15PM -0800, Andrew Morton wrote:
> > umm, really you want
> > /proc/sys/vm/dont-account-highmem-in-dirty-memory-calculations, only
> > shorter.
> >
> > Do you agree?
>
> I still read dirty_highmem as:
>
> /proc/sys/vm/do-account-highmem-in-dirty-memory-calculations
>
> ... so we're still talking one negative apart!
>
> > It would be simpler to have
> > /proc/sys/vm/do-account-highmem-in-dirty-memory-calculations,
> > defaulting to "true" - this has no negations.
>
> No, that's not true. The whole point is that between 2.6.16 and
> 2.6.20 the kernel behaviour changed. It currently doesn't count
> highmem in dirty memory calculations, which is why the memory pressure
> appears to be so great when actually there's still 4Gb of unused
> memory in the box.
>
> > OK, I give up. Please see if you can think of something less confusing
> > which involves no negations?
>
> I think this might be slightly clearer:
>
> /proc/sys/vm/highmem_is_dirtyable - defaults to false
>
> [ ... ]
>
> Would you like me to re-submit the patch based on this? I'm
> certainly not happy with dirty_highmem as it is now in mm
> because it looks backwards and unclear to me.
Well, it was quick enough to just do it, so here's the patch. I've
also updated the documentation a bit to clarify the intention and
the reasons why you might want to use it (based in part on the
comments on the original change that made highmem uncountable for
dirtiness purposes).
Tested and applied against 2.6.23.9. Our build script makes Debian
packages from a clean unpack of the kernel major version, plus the
patch for the minor version, plus an svn checkout of our quilt series,
and applies them regardless, so it was just as easy to bump the
version number while I was at it.
Builds, boots, passes a quick run of the test program I used last
time around.
Bron.
Add vm.highmem_is_dirtyable toggle
A 32-bit machine with HIGHMEM64 enabled, running DCC, has an mmapped
file of approximately 2GB which holds data in a hash format and is
written randomly by the dbclean process. On 2.6.16 this process took a
few minutes. With lowmem-only accounting of the dirty ratios, it takes
about 12 hours of 100% disk IO, all random writes.
This patch includes some code cleanup from Linus (removing the
unmapped_ratio clamp on the dirty ratio) and a toggle in
/proc/sys/vm/highmem_is_dirtyable, which can be set to 1 to add
highmem back into the count of dirtyable memory.
Signed-off-by: Bron Gondwana <brong@xxxxxxxxxxx>
Index: linux-2.6.23.8-reiserfix-fai-vmdirty/mm/page-writeback.c
===================================================================
--- linux-2.6.23.8-reiserfix-fai-vmdirty.orig/mm/page-writeback.c 2007-11-21 21:58:20.000000000 -0500
+++ linux-2.6.23.8-reiserfix-fai-vmdirty/mm/page-writeback.c 2007-11-27 07:27:51.000000000 -0500
@@ -70,6 +70,12 @@
int dirty_background_ratio = 5;
/*
+ * highmem will not be subtracted from the total dirtyable memory
+ * when calculating dirty ratios if vm_highmem_is_dirtyable is true
+ */
+int vm_highmem_is_dirtyable;
+
+/*
* The generator of dirty data starts writeback at this percentage
*/
int vm_dirty_ratio = 10;
@@ -153,7 +159,10 @@
x = global_page_state(NR_FREE_PAGES)
+ global_page_state(NR_INACTIVE)
+ global_page_state(NR_ACTIVE);
- x -= highmem_dirtyable_memory(x);
+
+ if (!vm_highmem_is_dirtyable)
+ x -= highmem_dirtyable_memory(x);
+
return x + 1; /* Ensure that we never return 0 */
}
@@ -163,20 +172,12 @@
{
int background_ratio; /* Percentages */
int dirty_ratio;
- int unmapped_ratio;
long background;
long dirty;
unsigned long available_memory = determine_dirtyable_memory();
struct task_struct *tsk;
- unmapped_ratio = 100 - ((global_page_state(NR_FILE_MAPPED) +
- global_page_state(NR_ANON_PAGES)) * 100) /
- available_memory;
-
dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio > unmapped_ratio / 2)
- dirty_ratio = unmapped_ratio / 2;
-
if (dirty_ratio < 5)
dirty_ratio = 5;
Index: linux-2.6.23.8-reiserfix-fai-vmdirty/include/linux/writeback.h
===================================================================
--- linux-2.6.23.8-reiserfix-fai-vmdirty.orig/include/linux/writeback.h 2007-10-09 16:31:38.000000000 -0400
+++ linux-2.6.23.8-reiserfix-fai-vmdirty/include/linux/writeback.h 2007-11-27 07:22:17.000000000 -0500
@@ -95,6 +95,7 @@
extern int vm_dirty_ratio;
extern int dirty_writeback_interval;
extern int dirty_expire_interval;
+extern int vm_highmem_is_dirtyable;
extern int block_dump;
extern int laptop_mode;
Index: linux-2.6.23.8-reiserfix-fai-vmdirty/kernel/sysctl.c
===================================================================
--- linux-2.6.23.8-reiserfix-fai-vmdirty.orig/kernel/sysctl.c 2007-10-09 16:31:38.000000000 -0400
+++ linux-2.6.23.8-reiserfix-fai-vmdirty/kernel/sysctl.c 2007-11-27 07:26:43.000000000 -0500
@@ -776,6 +776,7 @@
/* Constants for minimum and maximum testing in vm_table.
We use these as one-element integer vectors. */
static int zero;
+static int one = 1;
static int two = 2;
static int one_hundred = 100;
@@ -1066,6 +1067,19 @@
.extra1 = &zero,
},
#endif
+#ifdef CONFIG_HIGHMEM
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "highmem_is_dirtyable",
+ .data = &vm_highmem_is_dirtyable,
+ .maxlen = sizeof(vm_highmem_is_dirtyable),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+#endif
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
Index: linux-2.6.23.8-reiserfix-fai-vmdirty/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.23.8-reiserfix-fai-vmdirty.orig/Documentation/filesystems/proc.txt 2007-10-09 16:31:38.000000000 -0400
+++ linux-2.6.23.8-reiserfix-fai-vmdirty/Documentation/filesystems/proc.txt 2007-11-27 07:21:22.000000000 -0500
@@ -1253,6 +1253,21 @@
Data which has been dirty in-memory for longer than this interval will be
written out next time a pdflush daemon wakes up.
+highmem_is_dirtyable
+--------------------
+
+Only present if CONFIG_HIGHMEM is set.
+
+This defaults to 0 (false), meaning that the ratios set above are calculated
+as a percentage of lowmem only. This protects against excessive scanning
+in page reclaim, swapping and general VM distress.
+
+Setting this to 1 can be useful on 32-bit machines where you want to make
+random changes within an mmapped file that is larger than your available
+lowmem without causing large quantities of random IO. It is safe if the
+behavior of all programs running on the machine is known and memory will
+not be otherwise stressed.
+
legacy_va_layout
----------------
Index: linux-2.6.23.8-reiserfix-fai-vmdirty/Documentation/sysctl/vm.txt
===================================================================
--- linux-2.6.23.8-reiserfix-fai-vmdirty.orig/Documentation/sysctl/vm.txt 2007-10-09 16:31:38.000000000 -0400
+++ linux-2.6.23.8-reiserfix-fai-vmdirty/Documentation/sysctl/vm.txt 2007-11-27 07:13:30.000000000 -0500
@@ -22,6 +22,7 @@
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_writeback_centisecs
+- highmem_is_dirtyable (only if CONFIG_HIGHMEM set)
- max_map_count
- min_free_kbytes
- laptop_mode
@@ -36,10 +37,10 @@
==============================================================
-dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
-dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump, swap_token_timeout, drop-caches,
-hugepages_treat_as_movable:
+dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
+dirty_writeback_centisecs, highmem_is_dirtyable,
+vfs_cache_pressure, laptop_mode, block_dump, swap_token_timeout,
+drop-caches, hugepages_treat_as_movable:
See Documentation/filesystems/proc.txt