Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment

From: Ray Bryant
Date: Mon Apr 05 2004 - 11:22:37 EST


Ken,

Chen, Kenneth W wrote:


A simple counter won't work for different file offset mapping. It has to
be some sort of per-inode, per-block reservation tracking. I think we are
steering in the right direction though.




OK, pardon my question about test code, that is trivial enough I guess.

Anyway, the only way I can see to make this work with non-zero offset is to hang a list of segment descriptors (offset and size) for each reserved segment off of the inode. Then when a new mapping comes in, we search the segment list to see if the new offset and size overlaps with any of the existing reserved segments. If it doesn't, then we make a new reservation (and request file system quota) for the current size, and add the current request to the reserved segment list. If it does, and it fits entirely in a previously reserved segement, then no change to reservation/quota needs to be made. If it only partially fits, then we need to make a new reservation/quota request for the number of new huge pages required and update the overlapping segment's length to reflect the new reservation.

Then in truncate_hugepages() we can search the segment list again, discarding full or partial segments that occur either entirely or partially beyond "lstart", as appropropriate and doing hugetlb_unreserve() and hugetlbfs_put_quota() for the appropriate number of pages.

This will be quite a bit of code and complexity. Do we still think this is all worth it to follow Andrew's suggestion of no API changes for "allocate on
fault" hugetlbpages? It would be a lot cleaner just to return SIGBUS if we run out of hugepages and be done with it, in spite of the API change.

Is there a simpler way to do the correct reservation? (One could allocate the pages at mmap() time, resurrecting hugetlb_prefault(), but zero the pages at fault time, this would solve the original problem we ran into at SGI, but would not solve Andi's requirement to postpone allocation so NUMA API's can control placement.)

--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@xxxxxxx raybry@xxxxxxxxxxxxx
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/