Re: [patch] rfc: introduce /dev/hugetlb

From: Nish Aravamudan
Date: Sat Mar 24 2007 - 01:33:07 EST

On 3/23/07, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, 23 Mar 2007 01:44:38 -0700 "Ken Chen" <kenchen@xxxxxxxxxx> wrote:

> On 3/21/07, Adam Litke <agl@xxxxxxxxxx> wrote:
> > The main reason I am advocating a set of pagetable_operations is to
> > enable the development of a new hugetlb interface. During the hugetlb
> > BOFs at OLS last year, we talked about a character device that would
> > behave like /dev/zero. Many of the people were talking about how they
> > just wanted to create MAP_PRIVATE hugetlb mappings without all the fuss
> > about the hugetlbfs filesystem. /dev/zero is a familiar interface for
> > getting anonymous memory so bringing that model to huge pages would make
> > programming for anonymous huge pages easier.
> I think we have enough infrastructure currently in hugetlbfs to
> implement what Adam wants for something like a /dev/hugetlb char
> device (except we can't afford to have a zero hugetlb page, since it
> would be too costly on some architectures).
> I really like the idea of having something similar to /dev/zero for
> hugetlb pages. So I coded it up on top of existing hugetlbfs. The
> core change is really small and half of the patch is really just
> moving things around. I think this at least can partially fulfill the
> goal.

Standing back and looking at this...

afaict the whole reason for this work is to provide a quick-n-easy way to
get private mappings of hugetlb pages. With the emphasis on quick-n-easy.

I agree.

We can do the same with hugetlbfs, but that involves (horror) "fuss".


The way to avoid "fuss" is of course to do it once, do it properly, then stick
it in a library which everyone uses.

That's sort of what libhugetlbfs is for (there are stable releases as
well as development snapshots/a git tree); while it currently only
tries to abstract/provide functionality via hugetlbfs, that's mostly
because that is the only interface available (or was, pending some
sort of char dev being merged).

But libraries are hard, for a number of distributional reasons. It is
easier for us to distribute this functionality within the kernel. In fact,
if Linus's tree included a ./userspace/libkernel/libhugetlb/ then we'd
probably provide this functionality in there.

libhugetlbfs is available for both SLES10 and RHEL5.

This comes up regularly, and it's pretty sad.

I agree. There is simply some functionality that is *very* closely
tied to the kernel.

Probably the kernel team should be maintaining, via existing processes, a
separate libkernel project, to fix these distributional problems. The
advantage in this case is of course that our new hugetlb functionality
would be available to people on 2.6.18 kernels, not only on 2.6.22 and later.

That sounds like a good idea. For this hugetlb stuff, though, I plan
on simply taking advantage of /dev/hugetlb (or whatever it is called)
if it exists, and otherwise falling back to hugetlbfs (which
admittedly requires some admin intervention (mounting hugetlbfs,
permissions, and such), but then again, so does using hugepages in the
first place (either at boot-time or via /proc/sys/vm/nr_hugepages)).
Is that what you mean by available to 2.6.18 (falling back to
hugetlbfs) and 2.6.22 (using the chardev)?

Am I wrong?

I don't think so. And hugepages are hard enough to use (and come with
enough architecture-specific quirks) that it was worth creating
libhugetlbfs. While it has some nice features like segment remapping
and malloc overriding, it is also meant to provide an API that is
useful for general use of hugepages: we currently export
gethugepagesize(), hugetlbfs_test_path() (verify a path is a valid
hugetlbfs mount), hugetlbfs_find_path() (returns the hugetlbfs mount
point), and hugetlbfs_unlinked_fd() (returns a file descriptor for an
unlinked file in the hugetlbfs mount).

Then again, maybe I'm missing some much bigger picture here and you
meant something completely different -- sorry for the noise in that
case :/

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx