Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

From: Rusty Russell
Date: Thu Feb 19 2009 - 07:07:40 EST


On Thursday 19 February 2009 20:40:15 Andrew Morton wrote:
> afacit nobody has answered your "is num_possible_cpus() ever a lot
> larger than num_online_cpus()" question.
>
> It is fairly important.

Hi Andrew,

It can be: suspend a giant machine and it goes down to 1 cpu.

But I don't think there's much point worrying about a
potentially-giant-but-actually-tiny machine. No one else has, so we wait
until someone actually creates such a thing, then they can fix this, as well
as all the others.

(The only place I can see that this makes sense is in the virtualization space,
where you might be on a 4096 CPU host, so all guests might want the capability
to expand to fill the machine.)
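
To put a rough number on the concern, here is a minimal userspace sketch. It
assumes every possible CPU gets its own copy of each percpu object. The helper
names percpu_footprint() and percpu_idle_bytes() are hypothetical and are not
kernel APIs.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: bytes backing one percpu
 * object when a separate copy is reserved for each of ncpus CPUs.
 */
static size_t percpu_footprint(size_t obj_size, unsigned int ncpus)
{
	return obj_size * (size_t)ncpus;
}

/*
 * Bytes reserved for CPUs that are possible but not currently online,
 * under the assumption that copies are sized by the possible count.
 */
static size_t percpu_idle_bytes(size_t obj_size, unsigned int possible,
				unsigned int online)
{
	if (online > possible)
		online = possible;
	return percpu_footprint(obj_size, possible - online);
}
```

Under that assumption, a 64-byte object on a guest with 4096 possible CPUs
and 1 online would carry 64 * 4095 bytes for CPUs that are not running.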

> > + struct page *page[]; /* #cpus * UNIT_PAGES */
>
> "pages" ;)

Heh, disagree: users are clearer if it's page :)

> > +static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int off, int size)
> > +{
> > + const gfp_t alloc_mask = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;
>
> A designed decision has been made to not permit the caller to specify
> the allocation mode?
>
> Usually a mistake. Probably appropriate in this case. Should be
> mentioned up-front and discussed a bit.

Yes, it derives from alloc_percpu which (1) zeroes, and (2) can sleep.

I chose this way-back-when because I didn't want to require atomic allocs
when it was implemented properly, and I couldn't think of a single sane use
case, so I'd rather that pioneer be the one to add the flags.

> > + if (unlikely(!size))
> > + return NULL;
>
> hm. Why do we do this? Perhaps emitting this warning:

Yes, I prefer size++ myself, maybe with a warn_on until someone uses it.
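
Roughly what I mean, as a userspace sketch only: alloc_nonzero() is a
hypothetical stand-in built on calloc(), not the percpu allocator, and the
fprintf() plays the part of the warn_on.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in, not the percpu allocator: a
 * zero-size request is warned about and bumped to one byte, so the
 * caller still gets a valid, zeroed pointer instead of NULL.
 */
static void *alloc_nonzero(size_t size)
{
	if (size == 0) {
		/* Stand-in for a kernel warning on the unexpected case. */
		fprintf(stderr, "alloc_nonzero: zero-size request\n");
		size++;
	}
	return calloc(1, size);
}
```

The point of bumping the size is that NULL then only ever means allocation
failure, so callers do not need a special case for empty requests.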



> > +void free_percpu(void *ptr)
> > +{
> > + void *addr = __pcpu_ptr_to_addr(ptr);
> > + struct pcpu_chunk *chunk;
> > + int off;
> > +
> > + if (!ptr)
> > + return;
>
> Do we ever do this? Should it be permitted? Should we warn?

I want to. Yes. No.

Any generic free function should take NULL; it's a bug otherwise, and just
makes for gratuitous over-cautious branches in callers when we equivocate.
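
A small userspace sketch of the pattern, assuming a made-up struct buf with
hypothetical buf_new() and buf_free() helpers (none of this is from the patch):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example type, not from the patch. */
struct buf {
	char *data;
	size_t len;
};

/* Allocate a zeroed buffer of len bytes; returns NULL on failure. */
static struct buf *buf_new(size_t len)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/* Accepts NULL, like free(), so callers need no check of their own. */
static void buf_free(struct buf *b)
{
	if (!b)
		return;
	free(b->data);
	free(b);
}
```

With that, an error path can call buf_free(b) unconditionally whether or not
buf_new() succeeded, rather than wrapping each call in "if (b)".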

BTW Andrew, this was an excellent example of how to review kernel code.

Thanks,
Rusty.