Re: [00/41] Large Blocksize Support V7 (adds memmap support)

From: Goswin von Brederlow
Date: Sun Sep 23 2007 - 01:50:46 EST


mel@xxxxxxxxx (Mel Gorman) writes:

> On (16/09/07 23:31), Andrea Arcangeli didst pronounce:
>> On Sun, Sep 16, 2007 at 09:54:18PM +0100, Mel Gorman wrote:
>> Allocating ptes from slab is fairly simple but I think it would be
>> better to allocate ptes in PAGE_SIZE (64k) chunks and preallocate the
>> nearby ptes in the per-task local pagetable tree, to reduce the number
>> of locks taken and not to enter the slab at all for that.
>
> It runs the risk of pinning up to 60K of data per task that is unusable for
> any other purpose. On average, it'll be more like 32K but worth keeping
> in mind.

Two things, one for each of you.

Why should we try to stay out of the pte slab? Isn't the slab made
for exactly this: efficiently handling a large number of equal-sized
objects for quick allocation and deallocation? If it is a locking
problem, then there should be a per-CPU cache of ptes, say 0-32
ptes. If you run out, you allocate 16 from slab. When you overflow,
you free 16 (which would give you your 64k allocations, but in
multiple objects).
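
To make the idea concrete, here is a rough sketch of such a per-CPU
cache; none of this is existing kernel code, and pte_slab plus all
the function names are hypothetical:

#include <linux/percpu.h>
#include <linux/slab.h>

#define PTE_CACHE_HIGH	32	/* cache at most 32 ptes per CPU */
#define PTE_CACHE_BATCH	16	/* refill/drain 16 at a time */

extern struct kmem_cache *pte_slab;	/* hypothetical pte slab */

struct pte_cache {
	int count;
	void *page[PTE_CACHE_HIGH + 1];
};

static DEFINE_PER_CPU(struct pte_cache, pte_cache);

static void *pte_alloc_cached(void)
{
	struct pte_cache *pc = &get_cpu_var(pte_cache);
	void *page;

	if (!pc->count) {
		/* Ran out: allocate a batch of 16 from the slab.
		 * GFP_ATOMIC because preemption is off here; a real
		 * version would refill with GFP_KERNEL outside the
		 * per-CPU section. */
		int i;
		for (i = 0; i < PTE_CACHE_BATCH; i++) {
			page = kmem_cache_alloc(pte_slab, GFP_ATOMIC);
			if (!page)
				break;
			pc->page[pc->count++] = page;
		}
	}
	page = pc->count ? pc->page[--pc->count] : NULL;
	put_cpu_var(pte_cache);
	return page;
}

static void pte_free_cached(void *page)
{
	struct pte_cache *pc = &get_cpu_var(pte_cache);

	pc->page[pc->count++] = page;
	if (pc->count > PTE_CACHE_HIGH) {
		/* Overflowed: give a batch of 16 back to the slab. */
		int i;
		for (i = 0; i < PTE_CACHE_BATCH; i++)
			kmem_cache_free(pte_slab, pc->page[--pc->count]);
	}
	put_cpu_var(pte_cache);
}

That way the slab still sees 16-object batches instead of single-pte
traffic, without handing out 64k contiguous chunks.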

As for the wastage: every pte page can map 2MB on amd64, 4MB on
i386, 8MB on sparc (?). A 64k pte chunk would cover 32MB, 64MB and
32MB (?) respectively. For the sbrk() and mmap() usage from glibc
malloc() that would be fine, as those grow linearly, and the mmap()
call in glibc could be made to align to those chunks. But for a
program like rtorrent, which uses mmap to bring in chunks of a 4GB
file, this looks disastrous.
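
Spelling out the arithmetic (assuming 4k pages, with 8-byte pte
entries on amd64 and 4-byte entries on non-PAE i386):

  amd64: 4096/8 = 512 entries per table,  512 * 4k = 2MB mapped;
         a 64k chunk holds 16 tables => 16 * 2MB = 32MB
  i386:  4096/4 = 1024 entries per table, 1024 * 4k = 4MB mapped;
         a 64k chunk holds 16 tables => 16 * 4MB = 64MB

so each scattered mapping rtorrent creates could pin a whole 64k
chunk while using only a fraction of it.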

>> In fact we
>> could allocate the 4 levels (or anyway more than one level) in one
>> single alloc_pages(0) and track the leftovers in the mm (or similar).

Personally I would really go with a per-CPU cache. When mapping a
page, reserve 4 tables. Then you walk the tree and add entries as
needed. And last you release the 0-4 unused tables back to the
cache.
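
Roughly like this, building on the pte_alloc_cached()/
pte_free_cached() sketch above; level_missing() and install_level()
are made-up placeholders for the real pagetable walk:

#include <linux/errno.h>
#include <linux/mm.h>

#define PT_LEVELS 4

static int map_page_prealloc(struct mm_struct *mm, unsigned long addr)
{
	void *reserve[PT_LEVELS];
	int i, used = 0, ret = 0;

	/* 1. Reserve one table per level up front: the worst case
	 *    is a miss at every level. */
	for (i = 0; i < PT_LEVELS; i++) {
		reserve[i] = pte_alloc_cached();
		if (!reserve[i]) {
			ret = -ENOMEM;
			goto out;
		}
	}

	/* 2. Walk the tree, consuming a reserved table only where
	 *    a level does not exist yet. */
	for (i = 0; i < PT_LEVELS; i++)
		if (level_missing(mm, addr, i))
			install_level(mm, addr, i, reserve[used++]);

out:
	/* 3. Release the 0-4 unused tables back to the cache. */
	while (i > used)
		pte_free_cached(reserve[--i]);
	return ret;
}

The point being that the walk never has to allocate while holding
pagetable locks, because everything it might need was reserved
beforehand.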

MfG
Goswin