Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment
From: Ray Bryant
Date: Mon Apr 05 2004 - 11:22:37 EST
Ken,
Chen, Kenneth W wrote:
> A simple counter won't work for different file offset mapping. It has to
> be some sort of per-inode, per-block reservation tracking. I think we are
> steering in the right direction though.
OK, pardon my question about test code; that is trivial enough, I guess.
Anyway, the only way I can see to make this work with non-zero offset is to
hang a list of segment descriptors (offset and size) for each reserved segment
off of the inode. Then when a new mapping comes in, we search the segment
list to see if the new offset and size overlaps with any of the existing
reserved segments. If it doesn't, then we make a new reservation (and request
file system quota) for the current size, and add the current request to the
reserved segment list. If it does, and it fits entirely in a previously
reserved segment, then no change to reservation/quota needs to be made. If
it only partially fits, then we need to make a new reservation/quota request
for the number of new huge pages required and update the overlapping segment's
length to reflect the new reservation.
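
The bookkeeping described above could be sketched roughly as follows, in plain
userspace C rather than kernel code. The names struct hseg and hseg_reserve()
are hypothetical, offsets are assumed to be in huge-page units, and overlapping
segments are merged into one rather than just lengthened, which is a
simplification of the scheme. The return value is the number of huge pages that
would still need a new reservation and quota request.

```c
#include <stdlib.h>

/* Hypothetical descriptor: one reserved range [start, end), huge-page units. */
struct hseg {
    unsigned long start, end;
    struct hseg *next;      /* list is kept sorted by start, non-overlapping */
};

/*
 * Record a mapping of [start, end) on the per-inode list and return how many
 * huge pages in that range were not reserved before (0 if it fits entirely
 * in an existing segment), or -1 if no memory is available.
 */
long hseg_reserve(struct hseg **head, unsigned long start, unsigned long end)
{
    struct hseg **pp = head, *s, *n;
    long covered = 0;

    if (start >= end)
        return 0;

    /* Skip segments that end before the new range begins. */
    while (*pp && (*pp)->end < start)
        pp = &(*pp)->next;

    n = malloc(sizeof(*n));
    if (!n)
        return -1;          /* list has not been modified yet */
    n->start = start;
    n->end = end;

    /* Absorb every segment that overlaps or touches the new range. */
    while ((s = *pp) != NULL && s->start <= n->end) {
        unsigned long lo = s->start > start ? s->start : start;
        unsigned long hi = s->end < end ? s->end : end;

        if (hi > lo)
            covered += hi - lo;     /* pages already reserved */
        if (s->start < n->start)
            n->start = s->start;
        if (s->end > n->end)
            n->end = s->end;
        *pp = s->next;
        free(s);
    }

    n->next = *pp;
    *pp = n;
    return (long)(end - start) - covered;
}
```

A caller would request reservation and quota only for the returned count, so a
mapping that fits entirely inside an existing segment costs nothing.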
Then in truncate_hugepages() we can search the segment list again, discarding
or trimming segments that lie entirely or partially beyond "lstart", as
appropriate, and doing hugetlb_unreserve() and hugetlbfs_put_quota() for the
corresponding number of pages.
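
The truncate side might look something like the sketch below, again as
userspace C under assumptions: struct hseg and hseg_truncate() are hypothetical
names, and "lstart" is taken to be already converted to huge-page units. The
function only computes how many pages to give back; the calls to
hugetlb_unreserve() and hugetlbfs_put_quota() would be made by the caller with
that count.

```c
#include <stdlib.h>

/* Hypothetical descriptor: one reserved range [start, end), huge-page units. */
struct hseg {
    unsigned long start, end;
    struct hseg *next;
};

/*
 * Drop all reservations at or beyond lstart and return the number of huge
 * pages released. Segments entirely beyond the cut are freed; a segment
 * that straddles the cut is trimmed to end at lstart.
 */
unsigned long hseg_truncate(struct hseg **head, unsigned long lstart)
{
    struct hseg **pp = head, *s;
    unsigned long released = 0;

    while ((s = *pp) != NULL) {
        if (s->end <= lstart) {
            /* Entirely below the cut: keep as is. */
            pp = &s->next;
        } else if (s->start >= lstart) {
            /* Entirely beyond the cut: discard the whole segment. */
            released += s->end - s->start;
            *pp = s->next;
            free(s);
        } else {
            /* Straddles the cut: give back only the tail. */
            released += s->end - lstart;
            s->end = lstart;
            pp = &s->next;
        }
    }
    return released;
}
```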
This will be quite a bit of code and complexity. Do we still think this is
all worth it to follow Andrew's suggestion of no API changes for "allocate on
fault" hugetlbpages? It would be a lot cleaner just to return SIGBUS if we
run out of hugepages and be done with it, in spite of the API change.
Is there a simpler way to do the correct reservation? (One could allocate the
pages at mmap() time, resurrecting hugetlb_prefault(), but zero the pages at
fault time. This would solve the original problem we ran into at SGI, but
would not solve Andi's requirement to postpone allocation so NUMA APIs can
control placement.)
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@xxxxxxx raybry@xxxxxxxxxxxxx
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------