o Tightened which pools are used for fallbacks, making fragmentation less likely
o Many micro-optimisations to match the performance of the standard allocator.
  The modified allocator is now faster than the standard allocator when built
  with gcc 3.3.5
o Increased the size of reserve for fallbacks from 10% to 12.5%.
Differences in the aim9 operations report, generated by diff-aim9.sh from VMRegress 0.14
N Test Standard MBuddy V10 Diff % diff Test description
Ops/sec Ops/sec Ops/sec
-- ---------- --------- ---------- -------- ------ ----------------
1 add_double 460569.72 465222.46 4652.74 1.01% Thousand Double Precision Additions/second
2 add_float 460523.25 465322.45 4799.20 1.04% Thousand Single Precision Additions/second
3 add_long 1421763.04 1436042.64 14279.60 1.00% Thousand Long Integer Additions/second
4 add_int 1421763.04 1436042.64 14279.60 1.00% Thousand Integer Additions/second
5 add_short 1421363.11 1435760.71 14397.60 1.01% Thousand Short Integer Additions/second
7 page_test 121048.16 123059.49 2011.33 1.66% System Allocations & Pages/second
8 brk_test 445743.79 452407.93 6664.14 1.50% System Memory Allocations/second
9 jmp_test 4158416.67 4232083.33 73666.66 1.77% Non-local gotos/second
10 signal_test 94417.60 94584.24 166.64 0.18% Signal Traps/second
11 exec_test 65.04 66.69 1.65 2.54% Program Loads/second
12 fork_test 1537.82 1730.51 192.69 12.53% Task Creations/second
13 link_test 6411.28 6477.45 66.17 1.03% Link/Unlink Pairs/second
The aim9 results show consistent improvements for common page-related
operations. The results are compiler dependent, and figures vary by 1-2%
between compiler versions.
-#define __GFP_BITS_SHIFT 16 /* Room for 16 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 18 /* Room for 18 __GFP_FOO bits */
+#ifdef CONFIG_ALLOCSTATS
+ /*
+ * These are beancounters that track how the placement policy
+ * of the buddy allocator is performing
+ */
+ unsigned long fallback_count[ALLOC_TYPES];
+ unsigned long alloc_count[ALLOC_TYPES];
+ unsigned long reserve_count[ALLOC_TYPES];
+ unsigned long kernnorclm_full_steal;
+ unsigned long kernnorclm_partial_steal;
+ unsigned long bulk_requests[MAX_ORDER];
+ unsigned long bulk_alloced[MAX_ORDER];
+#endif
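To illustrate how these beancounters would be consumed, here is a minimal
userspace sketch. The struct fields mirror two of the CONFIG_ALLOCSTATS
counters above, but `note_alloc` and `fallback_pct` are hypothetical helpers
invented for this example, not functions from the patch:

```c
#include <assert.h>

#define ALLOC_TYPES 4   /* assumed number of allocation types */

/* Toy per-zone stats mirroring two of the CONFIG_ALLOCSTATS fields */
struct alloc_stats {
	unsigned long fallback_count[ALLOC_TYPES];
	unsigned long alloc_count[ALLOC_TYPES];
};

/* Hypothetical helper: record one allocation, noting whether it was
 * satisfied from its own pool or had to fall back to another pool */
void note_alloc(struct alloc_stats *s, int alloctype, int fellback)
{
	s->alloc_count[alloctype]++;
	if (fellback)
		s->fallback_count[alloctype]++;
}

/* Percentage of allocations of this type that fell back; a low number
 * means the placement policy is keeping allocations in their own pools */
unsigned long fallback_pct(const struct alloc_stats *s, int alloctype)
{
	if (s->alloc_count[alloctype] == 0)
		return 0;
	return 100 * s->fallback_count[alloctype] / s->alloc_count[alloctype];
}
```

A report tool would dump these per zone to judge how well the placement
policy is performing over time.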
+int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
+	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
+	{ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},
+	{ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},
+};
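Each row of `fallback_allocs` lists, in order of preference, the pools to
try for one allocation type: the native pool first, then the global
fallback pool, then the remaining pools. A self-contained sketch of the
lookup (the enum values and `next_pool` helper are assumptions for
illustration; only the table shape comes from the patch):

```c
#include <assert.h>

/* Stand-ins for the patch's allocation types; exact values assumed */
enum {
	ALLOC_KERNNORCLM,	/* kernel, non-reclaimable */
	ALLOC_KERNRCLM,		/* kernel, reclaimable */
	ALLOC_USERRCLM,		/* userspace, reclaimable */
	ALLOC_FALLBACK,		/* global fallback pool */
	ALLOC_TYPES
};

/* Row i gives the pool search order when an alloctype-i request
 * cannot be satisfied from its own pool */
static const int fallback_allocs[3][ALLOC_TYPES] = {
	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
	{ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},
	{ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},
};

/* Hypothetical helper: which pool to try on the given retry attempt */
int next_pool(int alloctype, int retry_count)
{
	return fallback_allocs[alloctype][retry_count];
}
```

Note that every row tries ALLOC_FALLBACK before stealing from another
type's pool, which is what confines fallback damage to the dedicated pool.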
+ * Here, the alloc type's free lists have been depleted as well as the global
+ * pool, so fallback. When falling back, the largest possible block
+ * will be taken to keep the fallbacks clustered if possible
+ */
+ alloctype = fallback_list[retry_count];
+
+ /* Find a block to allocate */
+ area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
+ current_order = MAX_ORDER;
+ do {
+ current_order--;
+ if (list_empty(&area->free_list)) {
+ area--;
+ continue;
+ }
+
+ goto remove_page;
+ } while (current_order != order);
+ }
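The search direction in the loop above is the key detail: it walks from the
largest order downwards so a fallback steals the biggest block available,
keeping fallback allocations clustered. A toy model of that search, with
the free lists reduced to per-order counts (`find_fallback_order` and the
`nfree` array are illustrative inventions, not the patch's data structures):

```c
#include <assert.h>

#define MAX_ORDER 11	/* as in the kernel's buddy allocator */

/* Toy model: nfree[o] counts free blocks of order o in the pool
 * being searched (a real zone keeps list_head free lists) */
int find_fallback_order(const int nfree[MAX_ORDER], int order)
{
	int current_order = MAX_ORDER;

	/* Mirror the patch's loop: scan from the largest order down
	 * to the requested order, taking the first non-empty area */
	do {
		current_order--;
		if (nfree[current_order] == 0)
			continue;
		return current_order;	/* the "goto remove_page" case */
	} while (current_order != order);

	return -1;	/* no block of sufficient size in this pool */
}
```

As in the patch, a found block larger than `order` would then be split,
with the remainder staying in the fallback pool rather than polluting
another type's free lists.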