[PATCH] Sanely size hash tables when using large base pages.

From: Paul Mundt
Date: Tue Dec 26 2006 - 01:42:47 EST


At the moment both the pidhash and inode/dentry cache hash tables (common
by way of alloc_large_system_hash()) are incorrectly sized by their
respective detection logic when we attempt to use large base pages on
systems with little memory.

This results in odd behaviour when using a 64kB PAGE_SIZE, such as:

PID hash table entries: 512 (order: 9, 2048 bytes)
...
Dentry cache hash table entries: 8192 (order: -1, 32768 bytes)
Inode-cache hash table entries: 4096 (order: -2, 16384 bytes)

The mount cache hash table is seemingly the only one that gets this right
by directly taking PAGE_SIZE in to account.

The following patch attempts to catch the bogus values and round it up to
at least 0-order (or down, in the PID hash case).

Signed-off-by: Paul Mundt <lethal@xxxxxxxxxxxx>

--

kernel/pid.c | 17 +++++++++++++----
mm/page_alloc.c | 4 ++++

2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index 2efe9d8..198c6a9 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -383,20 +383,29 @@ void free_pid_ns(struct kref *kref)
/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
- * more.
+ * more. As a safety net for large base page on small memory systems
+ * the hash table is scaled to a 0-order allocation in the event that the
+ * initial detection logic sizes the table incorrectly (which can result
+ * in a very large number of slots).
*/
void __init pidhash_init(void)
{
- int i, pidhash_size;
+ int i, pidhash_size, size;
unsigned long megabytes = nr_kernel_pages >> (20 - PAGE_SHIFT);

pidhash_shift = max(4, fls(megabytes * 4));
pidhash_shift = min(12, pidhash_shift);
pidhash_size = 1 << pidhash_shift;

+ size = pidhash_size * sizeof(struct hlist_head);
+ if (unlikely(size < PAGE_SIZE)) {
+ size = PAGE_SIZE;
+ pidhash_size = size / sizeof(struct hlist_head);
+ pidhash_shift = 0;
+ }
+
printk("PID hash table entries: %d (order: %d, %Zd bytes)\n",
- pidhash_size, pidhash_shift,
- pidhash_size * sizeof(struct hlist_head));
+ pidhash_size, pidhash_shift, size);

pid_hash = alloc_bootmem(pidhash_size * sizeof(*(pid_hash)));
if (!pid_hash)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c1a116..4a9a83f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3321,6 +3321,10 @@ void *__init alloc_large_system_hash(con
numentries >>= (scale - PAGE_SHIFT);
else
numentries <<= (PAGE_SHIFT - scale);
+
+ /* Make sure we've got at least a 0-order allocation.. */
+ if (unlikely((numentries * bucketsize) < PAGE_SIZE))
+ numentries = PAGE_SIZE / bucketsize;
}
numentries = roundup_pow_of_two(numentries);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/