Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

From: Barret Rhoden
Date: Fri Mar 01 2019 - 16:28:02 EST


Hi -

On 03/01/2019 03:34 PM, Dennis Zhou wrote:
Hi Barret,

On Fri, Mar 01, 2019 at 01:30:15PM -0500, Barret Rhoden wrote:
Hi -

On 01/21/2019 06:47 AM, Eial Czerwacki wrote:


Your main issue was that you only sent this patch to LKML, but not the
maintainers of the file. If you don't, your patch might get lost. To get
the appropriate people and lists, run:

scripts/get_maintainer.pl YOUR_PATCH.patch.

For this patch, you'll get this:

Dennis Zhou <dennis@xxxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
Tejun Heo <tj@xxxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
Christoph Lameter <cl@xxxxxxxxx> (maintainer:PER-CPU MEMORY ALLOCATOR)
linux-kernel@xxxxxxxxxxxxxxx (open list)

I added the three maintainers to this email.

I have a few minor comments below.

[PATCH] percpu/module resevation: change resevation size iff X86_VSMP is
set

You misspelled 'reservation'. Also, I'd just say: "percpu: increase module
reservation size if X86_VSMP is set". ('change' -> 'increase'), only says
'reservation' once.)

as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)

I think you can add a tag for this right above your Signed-off-by tags.
e.g.:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339

by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
causing the struct size to exceed the size ok 8KB.
^of

Which struct are you talking about? I have one in mind, but others might
not know from reading the commit message.

I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511. In
that case, it was because modules (drm and amdkfd) were using DEFINE_SRCU,
which does a DEFINE_PER_CPU on struct srcu_data, and that used
____cacheline_internodealigned_in_smp.


in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
^increase


the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
^calculated

for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc

Not sure how useful the one-liner is, versus a description of what you're
doing. i.e. "the size of all module percpu data sections, or something."

Also, how close was that calculated value to 64K? If more modules start
using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.

Thanks,
Barret

Signed-off-by: Eial Czerwacki <eial@xxxxxxxxxxx>
Signed-off-by: Shai Fultheim <shai@xxxxxxxxxxx>
Signed-off-by: Oren Twaig <oren@xxxxxxxxxxx>
---
include/linux/percpu.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 70b7123..6b79693 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -14,7 +14,11 @@
/* enough to cover all DEFINE_PER_CPUs in modules */
#ifdef CONFIG_MODULES
+#ifdef X86_VSMP
+#define PERCPU_MODULE_RESERVE (1 << 16)
+#else
#define PERCPU_MODULE_RESERVE (8 << 10)
+#endif
#else
#define PERCPU_MODULE_RESERVE 0
#endif



Thanks for sending this to me.

I must say, I really do not want to expand the reserved region. In most
cases, it can easily end up unused and thus wasted memory as it is hard
allocated on boot. This is done because code gen assumes static
variables are close to the program counter. This would not be true with
dynamic allocations which being at the end of the vmalloc area
(Summarized from Tejun's account in [1]).

Another note on the reserved region. It starts at the end of the static
region which means it generally isn't page aligned. So while an 8kb
allocation would fit, a 4kb alignment more than likely would fail.
Something as large as 8kb should probably be dynamically allocated as
well.
>
I read through the bugzilla report and it seems that the culprits are:
drivers/gpu/drm/amd/amdkfd/kfd_process.c:DEFINE_SRCU(kfd_processes_srcu);
drivers/gpu/drm/drm_drv.c:DEFINE_STATIC_SRCU(drm_unplug_srcu);

Is there a reason we cannot dynamically initialize these structs? I've
cced Paul McKenney because we saw an issue with ipmi in December [1].

I looked at the AMD driver, and it looks like they could dynamically initialize it. It would require a little extra plumbing. I imagine the DRM one is the same way.

To catch this in the future, should we disallow DEFINE_SRCU in modules or something? Otherwise, this will pop up again the next time someone uses DEFINE_SRCU in a module and builds with CONFIG_X86_VSMP.

That might be a little much, and it still won't be sufficient to catch all cases. This will also come up any time a module has a static per-cpu data structure that uses __cacheline_aligned_in_smp, so it's not limited to SRCU either.

I'm not familiar with VSMP - how bad is it to use L1 cache alignment instead of 4K page alignment? Maybe some structures can use the smaller alignment? Or maybe have VSMP require SRCU-using modules to be built-in?

Thanks,

Barret