Linus Torvalds wrote:
>
> In article <392C0AC4.7794EC3F@colorfullife.com>,
> Manfred Spraul <manfreds@colorfullife.com> wrote:
> >
> >I'm still testing our memory allocators, and I added a per-cpu linked
> >list for order==0 to page_alloc:
>
> Manfred, if you _really_ want to speed up the buddy allocator on an SMP
> machine, there's a much simpler way: make sure that the
> "test_and_change_bit()" thing is not run with the "lock" prefix.
>
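For reference, what that means in free_pages_ok() - a minimal sketch
with the 2.3.x names, coalescing details elided: the buddy bitmap is
only ever touched under page_alloc_lock, so the non-atomic bitop is
enough.

	spin_lock_irqsave(&page_alloc_lock, flags);
	while (mask + (1 << (NR_MEM_LISTS-1))) {
		/* was: test_and_change_bit() - the "lock" prefix buys
		 * nothing here, the bitmap is spinlock-protected */
		if (!__test_and_change_bit(index, area->map))
			break;
		/* ... unlink the buddy, coalesce, go up one order ... */
	}
	spin_unlock_irqrestore(&page_alloc_lock, flags);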
Around 45 cpu ticks faster. On a P II:

	before:                                631
	without "lock;":                       586
	without "lock;" and a per-cpu list:    385

One gfp()/free() pair executes two superfluous "lock;" prefixes.
But a per-cpu list hit saves ~200 cpu cycles, and it avoids touching
the spinlock - no cache line thrashing.
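The list itself is nothing fancy; roughly this (a rough sketch, not
the final patch - the names and the reuse of page->next_hash as the
link are just for illustration, and the caller must have local
interrupts disabled so an allocation from an irq cannot corrupt the
list):

#define PCP_MAX 8			/* 8 entries caught ~30% of allocs */

static struct pcp_list {
	int count;
	struct page *head;
} pcp_list[NR_CPUS];

static struct page *pcp_alloc(void)
{
	struct pcp_list *pcp = &pcp_list[smp_processor_id()];
	struct page *page = pcp->head;

	if (page) {			/* hit: no lock, no bitmap write */
		pcp->head = page->next_hash;
		pcp->count--;
	}
	return page;			/* NULL -> fall back to rmqueue() */
}

static int pcp_free(struct page *page)
{
	struct pcp_list *pcp = &pcp_list[smp_processor_id()];

	if (pcp->count >= PCP_MAX)
		return 0;		/* full: free to the buddy instead */
	page->next_hash = pcp->head;
	pcp->head = page;
	pcp->count++;
	return 1;
}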
But I'm mainly collecting stats:
* During a kernel compile, 99.6% of all allocations were single-page
allocations; the rest were 2 pages. An 8-entry per-cpu list caught
~30% of all allocations.
* Serving a static web page generates a 99% hit rate, but the hits
probably come from getname().
--> First we should decide whether getname() can use kmalloc(), then
I'll retest the gfp changes.
* Using kmalloc(PAGE_SIZE) for getname() might be dangerous: for that
size, kmalloc() internally calls gfp(order==1).
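The getname() side of the change would be trivial - assuming the
current __getname()/putname() macros in <linux/fs.h>, which today
expand to __get_free_page()/free_page(), roughly:

#define __getname()	kmalloc(PAGE_SIZE, GFP_KERNEL)
#define putname(name)	kfree(name)

But with the slab allocator as it stands, that turns every path lookup
into an order==1 gfp(), hence: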
Perhaps we should modify the slab allocator:
* kmalloc(PAGE_SIZE) should internally use gfp(order==0).
* If a slab contains only one object, then both the bufctl and the slab
structure are superfluous: there cannot be any internal fragmentation,
so a simple singly linked list is sufficient, and the pointers could be
stored somewhere in "struct page" (sketch below).
--
	Manfred