Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

From: Dennis Zhou
Date: Fri Mar 01 2019 - 15:35:05 EST


Hi Barret,

On Fri, Mar 01, 2019 at 01:30:15PM -0500, Barret Rhoden wrote:
> Hi -
>
> On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
> >
>
> Your main issue was that you only sent this patch to LKML, but not the
> maintainers of the file. If you don't, your patch might get lost. To get
> the appropriate people and lists, run:
>
> scripts/get_maintainer.pl YOUR_PATCH.patch.
>
> For this patch, you'll get this:
>
> Dennis Zhou <dennis@xxxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Tejun Heo <tj@xxxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Christoph Lameter <cl@xxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
> linux-kernel@xxxxxxxxxxxxxxx (open list)
>
> I added the three maintainers to this email.
>
> I have a few minor comments below.
>
> > [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is
> set
>
> You misspelled 'reservation'. Also, I'd just say: "percpu: increase module
> reservation size if X86_VSMP is set". ('change' -> 'increase'), only says
> 'reservation' once.)
>
> > as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
>
> I think you can add a tag for this right above your Signed-off-by tags.
> e.g.:
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339
>
> > by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
> > causing the struct size to exceed the size ok 8KB.
> ^of
>
> Which struct are you talking about? I have one in mind, but others might
> not know from reading the commit message.
>
> I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511. In
> that case, it was because modules (drm and amdkfd) were using DEFINE_SRCU,
> which does a DEFINE_PER_CPU on struct srcu_data, and that used
> ____cacheline_internodealigned_in_smp.
>
> >
> > in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
> ^increase
>
> >
> > the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
> ^calculated
>
> > for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>
> Not sure how useful the one-liner is, versus a description of what you're
> doing. i.e. "the size of all module percpu data sections, or something."
>
> Also, how close was that calculated value to 64K? If more modules start
> using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.
>
> Thanks,
> Barret
>
> > Signed-off-by: Eial Czerwacki <eial@xxxxxxxxxxx>
> > Signed-off-by: Shai Fultheim <shai@xxxxxxxxxxx>
> > Signed-off-by: Oren Twaig <oren@xxxxxxxxxxx>
> > ---
> > include/linux/percpu.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> > index 70b7123..6b79693 100644
> > --- a/include/linux/percpu.h
> > +++ b/include/linux/percpu.h
> > @@ -14,7 +14,11 @@
> > /* enough to cover all DEFINE_PER_CPUs in modules */
> > #ifdef CONFIG_MODULES
> > +#ifdef X86_VSMP
> > +#define PERCPU_MODULE_RESERVE (1 << 16)
> > +#else
> > #define PERCPU_MODULE_RESERVE (8 << 10)
> > +#endif
> > #else
> > #define PERCPU_MODULE_RESERVE 0
> > #endif
> >
>

Thanks for sending this to me.

I must say, I really do not want to expand the reserved region. In most
cases, it can easily end up unused and thus wasted memory as it is hard
allocated on boot. This is done because code gen assumes static
variables are close to the program counter. This would not be true with
dynamic allocations which being at the end of the vmalloc area
(Summarized from Tejun's account in [1]).

Another note on the reserved region. It starts at the end of the static
region which means it generally isn't page aligned. So while an 8kb
allocation would fit, a 4kb alignment more than likely would fail.
Something as large as 8kb should probably be dynamically allocated as
well.

I read through the bugzilla report and it seems that the culprits are:
drivers/gpu/drm/amd/amdkfd/kfd_process.c:DEFINE_SRCU(kfd_processes_srcu);
drivers/gpu/drm/drm_drv.c:DEFINE_STATIC_SRCU(drm_unplug_srcu);

Is there a reason we cannot dynamically initialize these structs? I've
cced Paul McKenney because we saw an issue with ipmi in December [1].

[1] https://lore.kernel.org/linux-mm/CAJM9R-JWO1P_qJzw2JboMH2dgPX7K1tF49nO5ojvf=iwGddXRQ@xxxxxxxxxxxxxx/

Thanks,
Dennis