Re: [Announce] BKL shifting into drivers and filesystems - beware

From: Andrea Arcangeli (andrea@suse.de)
Date: Sun Jul 16 2000 - 15:41:19 EST


On Sun, 16 Jul 2000, Arjan van de Ven wrote:

>I hope you didn't want to insult linux-kernel's intelligence by using such a
>flawed logic; I assume you know that Rik had a good point (ie. on most
>systems kernel memory is much less than userspace memory) and you were just
>making a joke...

Note that BUG() happens in very fast paths (unlike most of the GFP_KERNEL
allocations that you were referring to), thus defining BUG as a no-op would
probably make a difference.
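
A minimal sketch of that point, not the kernel's actual BUG() definition:
SKETCH_NO_BUG, struct sketch_page and sketch_get_page() are hypothetical
names made up for illustration. With the no-op variant the compiler is free
to drop the compare and branch from the fast path entirely.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build switch (not a real kernel option): compile with
 * -DSKETCH_NO_BUG to turn every BUG() into a no-op. */
#ifdef SKETCH_NO_BUG
#define BUG() do { } while (0)
#else
#define BUG() do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
} while (0)
#endif

/* Hypothetical stand-in for a reference-counted page. */
struct sketch_page {
        int count;
};

/* Fast-path helper: the sanity check costs a compare and a branch on
 * every call unless BUG() expands to nothing. */
static inline void sketch_get_page(struct sketch_page *p)
{
        if (p->count < 0)
                BUG();
        p->count++;
}
```

Building the same file with and without -DSKETCH_NO_BUG and comparing the
generated code for sketch_get_page() is one way to see the difference.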

What I wanted to say is that network skbs, buffer cache, dcache and icache
are_n't_ allocated in ZONE_NORMAL and they can't handle HIGHMEM. It would
be costly in performance to change ipv4, the VFS and probably also the
filesystems to allow them to handle high memory, even in the long term
(since it would involve changing a pte mapping and flushing a TLB entry for
a small read or write of a core structure that should instead be very
efficient to read and write). So I think changing the kernel to be able to
use HIGHMEM all over the place is the wrong thing to do for performance.

Could you tell me why an alloc_skb done by tcp_sendmsg should rebalance
the highmem zone instead of leaving the tasks that can use highmem and
kswapd to rebalance it (asynchronously from the tcp_sendmsg point of view)?
Is that just to save three pushes on the stack of a zone_t pointer in the
slowww GFP path? Do you think that allowing the VM to do the right
thing isn't worth a few pushes on the stack? IMHO even if
icache/dcache, buffer cache and skbs could handle highmem it would still
be worth doing the few pushes on the stack to pass the zone_t for the rest
of the thousand slow-path GFP_KERNEL allocations (since they will happen
eventually) and I don't see any improvement in saving the three pushes on
the stack in the GFP slow path.
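
The idea can be sketched in user space as follows. This is an illustration
under stated assumptions, not the real allocator: struct sketch_zone,
sketch_balance_zone() and sketch_alloc() are hypothetical, and "reclaim" is
modelled as simply bumping a counter. The point is only that a request
limited to ZONE_NORMAL never does synchronous work on the highmem zone.

```c
/* Zones ordered from most to least restrictive, as in the kernel. */
enum { SKETCH_DMA, SKETCH_NORMAL, SKETCH_HIGHMEM, SKETCH_NR_ZONES };

/* Hypothetical, heavily simplified zone descriptor. */
struct sketch_zone {
        unsigned long free_pages;
        unsigned long pages_low;  /* below this the zone needs balancing */
        unsigned long reclaimed;  /* pages freed synchronously by allocators */
};

/* Modelled synchronous reclaim: refill one zone up to its watermark. */
static void sketch_balance_zone(struct sketch_zone *z)
{
        while (z->free_pages < z->pages_low) {
                z->free_pages++;
                z->reclaimed++;
        }
}

/* Allocate one page from the zones [0, highest]. The caller passes the
 * highest zone it can use; zones above it are never touched here and are
 * left to kswapd and to the tasks that can actually use them.
 * Returns the zone index used, or -1 on failure. */
static int sketch_alloc(struct sketch_zone *zones, int highest)
{
        int i;

        /* Fast path: any usable zone still above its watermark. */
        for (i = highest; i >= 0; i--) {
                if (zones[i].free_pages > zones[i].pages_low) {
                        zones[i].free_pages--;
                        return i;
                }
        }

        /* Slow path: rebalance only the zones this request can use. */
        for (i = highest; i >= 0; i--) {
                sketch_balance_zone(&zones[i]);
                if (zones[i].free_pages > 0) {
                        zones[i].free_pages--;
                        return i;
                }
        }
        return -1;
}
```

An skb-like caller would pass SKETCH_NORMAL as highest, so a depleted
SKETCH_HIGHMEM zone stays untouched from its point of view.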

Also do you think that breaking the LRU ordering is worth the three pushes
on the stack?

Also thanks to the zone_t pointer and to the _right_ work that the memory
balancing code is doing, I'm sure I'll refill the local per-process
freelist with memory that will be useful (see GFP-race-fix-3 for reference
about the local freelist) and so in case the order of the allocation is 0
I can refill the local freelist with only 1 page and leave the other pages
to the shared freelist for other tasks.
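
A rough sketch of such a lazy refill, assuming a much simpler model than
the GFP-race-fix-3 patch itself (which is not reproduced here):
struct sketch_task, SKETCH_MAX_ORDER and sketch_refill_local() are
hypothetical names, and freelists are reduced to page counters.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 10  /* assumed upper bound, for the sketch only */

/* Hypothetical per-process state: pages held on the local freelist. */
struct sketch_task {
        unsigned long local_pages;
};

/* After balancing freed `freed` pages on behalf of an allocation of the
 * given order, keep only what the request needs (1 << order pages) on the
 * task's local freelist and hand the rest to the shared freelist.
 * Returns the number of pages kept locally. */
static unsigned long sketch_refill_local(struct sketch_task *t,
                                         unsigned long *shared_pages,
                                         unsigned long freed,
                                         unsigned int order)
{
        unsigned long want, keep;

        assert(order < SKETCH_MAX_ORDER);
        want = 1UL << order;
        keep = freed < want ? freed : want;

        t->local_pages += keep;
        *shared_pages += freed - keep;
        return keep;
}
```

For an order 0 request that freed 8 pages, this keeps 1 page locally and
leaves 7 on the shared list for other tasks.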

To do smart things (like local freelist lazy-refill and also like page
colouring and doing better at order > 0 allocations) we'll have to pass
way more info than zone_t from the allocator to the memory balancing
algorithm, so I don't really see the point in going in the backwards
direction by making the memory balancing code allocator-request-blind.
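
As an illustration of what "more info than zone_t" could look like, here is
a hypothetical request descriptor and a check the balancing code could run
against a candidate free block. None of these names exist in the kernel;
the colour test uses the usual "page frame number modulo number of colours"
convention as an assumption.

```c
/* Hypothetical description of what the allocator is asking for. */
struct sketch_request {
        int zone_id;              /* highest zone the caller can use */
        unsigned int order;       /* request size is 1 << order pages */
        unsigned int colour;      /* wanted cache colour */
        unsigned int nr_colours;  /* 0 means the caller does not care */
};

/* Would a free block starting at page frame `pfn`, living in zone
 * `block_zone` and of size 1 << block_order pages, be useful to this
 * request? Returns 1 if so, 0 otherwise. */
static int sketch_block_fits(const struct sketch_request *req,
                             unsigned long pfn, int block_zone,
                             unsigned int block_order)
{
        if (block_zone > req->zone_id)
                return 0;  /* zone the caller cannot use */
        if (block_order < req->order)
                return 0;  /* block too small for the request */
        if (req->nr_colours && pfn % req->nr_colours != req->colour)
                return 0;  /* wrong cache colour */
        return 1;
}
```

A request-blind balancer has none of these fields to look at, so it cannot
tell a useful freed block from a useless one.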

Andrea
