Re: [PATCH] Allocate module.ref array dynamically

From: Rusty Russell
Date: Sat Nov 15 2008 - 19:00:46 EST


On Friday 14 November 2008 10:50:41 Christoph Lameter wrote:
> On Fri, 14 Nov 2008, Rusty Russell wrote:
> > It is defined to be zeroing.
>
> But some uses do not need zeroing. They currently have no choice.

OK, of the 32 in-tree users, only 4 don't need zeroing (rough audit; they
might need zeroing in some subtle way). Performance arguments are not valid
in any of these cases though, and it's hard to see that they would be.

> And other flags can become necessary if percpu areas gain the ability of
> being dynamically extended.

And other flags become impossible, eg. GFP_ATOMIC.

> > It absolutely does. That's why it takes a type!
>
> alloc_percpu is based on __alloc_percpu which currently does not take an
> alignment. Guess we could get there by removing the intermediate
> functions and making alloc_percpu call cpu_alloc.

Yes. When I originally wrote this, it hooked up to the percpu allocator which
took an alignment (this was reduced to the old module.c percpu allocator in
the end).

The strange indirection was from someone's failed experiment at only
allocating only online cpus. We should kill that as a separate patch.

> However, there are only a few places that use allocpercpu and all of
> those are touched by necesssity by the full cpu alloc patchset.

No, that's my point. There are 28 places whihc use alloc_percpu, and they
should be left. There are 4 places which use __alloc_percpu, and all of them
can be converted to alloc_percpu (see patch below).

Then, you change alloc_percpu(type) to hand the size and align to your
infrastructure (call it __alloc_percpu(size, align) if you want, or stick with
cpu_alloc()).

> At mininum
> we need to add the allocation flags and add cpu ops as well as review the
> preemption at each call site etcin order to use the correct
> call. So why bother with the old API?

We don't need a flags arg, there's no evidence of any problem, and it's
useless churn to change it.

There are several seperate things here (this is to make sure my head is
straight and to clarify for others skimming):

1) Make the dynamic percpu allocator use the static percpu system.
- Agreed, always the aim, never quite happened.

2) Make zeroing optional
- Disagree, adds API complexity for corner case with no gain. Using
gfp_t for it is even worse, since it implies GFP_ATOMIC is valid or
sensible.

3) Change API to use CAPS for macros
- Strongly disagree, Linus doesn't use CAPS for type-taking macros
(list_entry, container_of, min_t), it's ugly, and status-quo wins.

4) Get rid of unused "online-only" percpu allocators.
- Agree, and will simplify implementation and macro tangle.

5) Make dynamic percpu var access more efficient.
- Agree, obviously. x86 for the moment, the rest can follow (or not).

6) Use percpu allocations more widely.
- Definitely, I have some other patches which use it too. And makes even
more sense once (5) is done.

7) Make percpu area grow dynamically.
- Yes, but a thorny tangle esp. with IA64. The cmdline hack is probably
sufficient meanwhile, and parallels can be drawn with vmalloc.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/