Re: [PATCH 1/2] numa: Add simple generic NUMA emulation

From: Tvrtko Ursulin
Date: Mon Aug 12 2024 - 12:36:01 EST



Hi Jonathan,

On 08/08/2024 17:27, Jonathan Cameron wrote:
On Thu, 8 Aug 2024 12:56:44 +0100
Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx> wrote:

[Please excuse the re-send, but as I heard nothing my concern is that it
got lost in your busy mailbox.]

Hi Greg,

Gentle reminder on the opens from this thread. Let me re-summarise the
question below:

On 26/06/2024 12:47, Tvrtko Ursulin wrote:

Hi Greg,

On 26/06/2024 08:38, Greg Kroah-Hartman wrote:
On Tue, Jun 25, 2024 at 01:58:02PM +0100, Tvrtko Ursulin wrote:
From: Maíra Canal <mcanal@xxxxxxxxxx>

Add some common code for splitting the memory into N emulated NUMA
memory nodes.

Individual architectures can then enable selecting this option and use
the existing numa=fake=<N> kernel argument to enable it.

Memory is always split into equally sized chunks.

Signed-off-by: Maíra Canal <mcanal@xxxxxxxxxx>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
---
  drivers/base/Kconfig          |  7 ++++
  drivers/base/Makefile         |  1 +
  drivers/base/arch_numa.c      |  6 ++++
  drivers/base/numa_emulation.c | 67 +++++++++++++++++++++++++++++++++++
  drivers/base/numa_emulation.h | 21 +++++++++++

Why not just properly describe the numa topology in your bootloader or
device tree and not need any such "fake" stuff at all?

Also, you are now asking me to maintain these new files, not something
I'm comfortable doing at all sorry.

Mostly because of ae3c107cd8be ("numa: Move numa implementation to common
code") and the existing common code in drivers/base/arch_numa.c, it
appeared it could be acceptable to add the simple NUMA emulation into the
common code too. It builds upon the same concept as on x86, where no
firmware changes are needed for experimenting with different
configurations.
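
To illustrate the idea, splitting a memory range into N equally sized
emulated nodes could look roughly like the sketch below. This is a
user-space illustration only and not the code from the patch: the
struct fake_node type and the split_equal() helper are hypothetical
names, and giving the division remainder to the last node is an
assumption made here for completeness.

```c
#include <stdint.h>

/* Hypothetical illustration type: one emulated node covers [start, end). */
struct fake_node {
	uint64_t start;
	uint64_t end;
};

/*
 * Split [start, end) into nr equally sized chunks. Any remainder from the
 * division is given to the last node so the whole range stays covered
 * (an assumption of this sketch, not necessarily what the patch does).
 * Returns 0 on success, -1 if the arguments cannot produce a valid split.
 */
static int split_equal(uint64_t start, uint64_t end, unsigned int nr,
		       struct fake_node *nodes)
{
	uint64_t size;
	unsigned int i;

	if (nr == 0 || end <= start || (end - start) < nr)
		return -1;

	size = (end - start) / nr;
	for (i = 0; i < nr; i++) {
		nodes[i].start = start + (uint64_t)i * size;
		nodes[i].end = nodes[i].start + size;
	}
	nodes[nr - 1].end = end;	/* absorb the remainder */

	return 0;
}
```

Booting with numa=fake=4 would then correspond to calling such a helper
with nr set to 4 over the available memory range.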

Would folding it into arch_numa.c, so no new files are added, address your
concern, or is your main issue the emulation in general?

Re-iterating and slightly re-formulating this question I see three options:

a)
Fold the new simple generic code into the existing arch_numa.c,
addressing the "no new files" objection, if that was the main objection.

b)
Move completely into arch code - aka you don't want to see it under
drivers/base at all, ever, regardless of how simple the new code is, or
that common NUMA code is already there.

c)
Strong nack for either a) or b) - i.e. the "do it in the firmware" comment stands.

Trying to understand your position so we can progress this.

See:
https://lore.kernel.org/all/20240807064110.1003856-20-rppt@xxxxxxxxxx/
and rest of thread
https://lore.kernel.org/all/20240807064110.1003856-1-rppt@xxxxxxxxxx/
[PATCH v4 00/26] mm: introduce numa_memblks

Much larger rework and unification set from Mike Rapoport
that happens to end up adding numa emulation as part of making
the x86 numa_memblk work for arm64 etc.

It's in mm-unstable now so getting some test coverage etc.

Sorry, I'd kind of assumed this also went to linux-mm so
the connection would have been made.

This is great - I did not see it since I don't read linux-mm regularly!

I gave Mike's implementation a spin on top of the RPi 6.11 kernel and it mostly works fine.

Is the decision that this is going in pretty much set, that is, high level acks are there?

One area to potentially improve is working around CMA areas when the firmware places them at a range which straddles two nodes. In my series, albeit not in the version posted so far, I have some code to fudge that so that CMA ends up wholly in one node and CMA initialisation can succeed.

I can try and adapt that patch to this series and post it as an RFC.
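
As a rough illustration of the kind of adjustment meant above (and not
the actual unposted code), the boundary between two neighbouring
emulated nodes could be nudged to the nearer edge of a reserved region
that it would otherwise cut through. The adjust_boundary() helper and
its parameters are hypothetical, and choosing the nearer edge is just
one possible policy assumed for this sketch.

```c
#include <stdint.h>

/*
 * Return an adjusted boundary between two neighbouring emulated nodes so
 * that the reserved region [res_start, res_end) does not straddle it.
 * If the boundary already lies outside the region it is returned unchanged,
 * otherwise it is moved to whichever edge of the region is nearer
 * (ties go to the upper edge).
 */
static uint64_t adjust_boundary(uint64_t boundary,
				uint64_t res_start, uint64_t res_end)
{
	if (boundary <= res_start || boundary >= res_end)
		return boundary;	/* region is wholly in one node */

	if (boundary - res_start < res_end - boundary)
		return res_start;	/* region ends up in the upper node */

	return res_end;			/* region ends up in the lower node */
}
```

The trade-off is that the two affected nodes are no longer exactly the
same size, which seems acceptable for an emulation aimed at testing.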

Then there are some odd things about NUMA, memory pressure and swap behaviour, but that is not specific to this series and not something I have got to the bottom of just yet. It could be specific to my board and IO, for instance.

Regards,

Tvrtko