Re: [RFC] free_pages stuff

From: Al Viro
Date: Mon Dec 21 2015 - 21:22:43 EST


On Mon, Dec 21, 2015 at 05:16:44PM -0800, Linus Torvalds wrote:
> On Dec 21, 2015 17:04, "Al Viro" <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > And quite frankly, even the "new name" is likely a bad idea. If you
> > > want to allocate a page, and get a pointer, just use "kmalloc()".
> > > Boom, done!
> >
> > Erm... You've just described the absolute majority of callers. Really.
>
> That wasn't my point.
>
> I totally believe that most of the legacy users actually wanted a pointer.
>
> But that doesn't mean that we should just convert a legacy interface. We
> should either just create a new interface and leave old users alone, or if
> we care about that code and really want to remove the cast, maybe it should
> just use kmalloc() instead.
>
> Long ago, allocating a page using kmalloc() was a bad idea, because there
> was overhead for it in the allocation and the code.
>
> These days, kmalloc() not only doesn't have the allocation overhead, but
> may actually scale better too, thanks to percpu caches etc.
>
> So my point here is that not only is it wrong to change the calling
> convention for a legacy function (and it really probably doesn't get much
> more legacy than get_free_page - I think it's been around forever), but

Yes - present in v0.01, with similar situation re callers even back then ;-)

> even the "let's make up a new name" conversion may be wrong, because it's
> entirely possible that the code in question should just be using kmalloc().
>
> So I don't think an automatic conversion is a good idea. I suspect that old
> code that somebody isn't actively working on should just be left alone, and
> code that *is* actively worked on should maybe consider kmalloc().

Agreed. Again, what I really wanted was to get the clear picture of what
uses _are_ there. In more details than just "grepping seems to indicate
that...". It's really pretty much all of them.

> And if the code really explicitly wants a page (or set of aligned pages)
> for some vm reason, I suspect having the cast there isn't a bad thing. It's
> clearly not just a random pointer allocation if the bit pattern of the
> pointer matters.

BTW, I'm not sure we don't have code that would assume that
kmalloc(PAGE_SIZE,...) always returns something PAGE_SIZE-aligned.

FWIW, pointer-returning get_free_page() would not be a flagday change at
all - we only have __get_free_page()/__get_free_pages() right now. And I'm
not sure that it wouldn't make sense to add void *-returning variants without
underscores - not for bulk conversion of existing callers, but for new
places that want a page. Because most of the new ones (and new ones keep
appearing; it's not just ancient code) still want a pointer. And yes, quite
a few of those should be using something else.

Example (went into the tree just three months ago):
static inline void *scif_zalloc(size_t size)
{
void *ret = NULL;
size_t align = ALIGN(size, PAGE_SIZE);

if (align && get_order(align) < MAX_ORDER)
ret = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(align));
return ret ? ret : vzalloc(align);
}

This clearly should be using kzalloc() instead of __get_free_pages() (and
I'm not sure whether the cutoff is right - similar "kmalloc if not too
large, vmalloc otherwise" tends to have cutoff lower than MAX_ORDER;
PAGE_ALLOC_COSTLY_ORDER is more common). The callers do not look like
they would care about page alignment - at least quite a few of them do
not.

Incidentally, those caller include the following example of lousy naming:
(*pages)->phys_addr = scif_zalloc(nr_pages * sizeof(dma_addr_t));
First of all, the address is clearly virtual - it's an array! What's more,
I really wonder whether it's DMA or physical addresses that are stored there.
It's declared as an array of dma_addr_t, but...
(*pages)->phys_addr[i] =
__scif_off_to_dma_addr(window, offset +
(i * PAGE_SIZE));
(*pages)->phys_addr[i] = scif_get_phys((*pages)->phys_addr[i],
ep);
and
static phys_addr_t scif_get_phys(phys_addr_t phys, struct scif_endpt *ep)
seems to indicate something fishy going on.

Typechecking for different kinds of addresses is really too weak, and the
amount of casts we have around them doesn't help either ;-/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/