Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?

From: Gerald Schaefer
Date: Tue Sep 22 2020 - 09:56:29 EST


On Thu, 10 Sep 2020 12:20:34 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

> Hi everybody,
>
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.
>
> "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
> back when they were still part of s390-tools. They were later replaced
> [5] by the variants in util-linux. For example, RHEL6 and RHEL7 contain
> lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux
> on s390x [4].
>
> "phys_device" was added with sysfs support for memory hotplug in commit
> 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions")
> in 2005. It always returned 0.
>
> s390x started returning something != 0 on some setups (if sclp.rzm is
> set by HW) in 2010 via commit 57b552ba0b2f("memory hotplug/s390: set
> phys_device").
>
> For s390x, it allowed for identifying which memory block devices belong
> to the same memory increment (RZM). Only if all memory block devices
> comprising a single memory increment were offline, the memory could
> actually be removed in the hypervisor.
>
> Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
> memory_block_size_bytes() function") in 2013, a memory block device
> spans at least one memory increment - which is why the interface isn't
> really helpful/used anymore (except by old lsmem/chmem tools).

Correct, so I do not see any problem for s390 with removing / changing
that for the upstream kernel. BTW, that commit also gave some relief
on the scaling issue, at least for s390. With increasing total memory
size, we also have increasing increment and thus memory block size.

Of course, that also has some limitations, IIRC max. 1 GB increment
size, but still better than the 256 MB default size.
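
To put rough numbers on the scaling point: the count of memoryX devices
is roughly total memory divided by the memory block size. Below is a
minimal Python sketch of that arithmetic. The 1 TiB total is an arbitrary
illustrative value, and memory_block_count() is a name made up for this
example, not anything from the kernel or the tools.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def memory_block_count(total_bytes, block_size_bytes):
    """Number of memory block devices needed to cover total_bytes.

    Rounds up, so a partially filled last block still counts as one device.
    """
    return -(-total_bytes // block_size_bytes)


# Illustrative comparison only: 256 MB default block size vs. a 1 GB block size.
total = 1 * TiB
print("256 MiB blocks:", memory_block_count(total, 256 * MiB))
print("1 GiB blocks:  ", memory_block_count(total, 1 * GiB))
```

So going from 256 MB to 1 GB blocks cuts the number of sysfs memory
block devices by a factor of four for the same amount of memory.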

>
> There were once RFC patches to make use of it in ACPI, but it could be
> solved using different interfaces [1].
>
>
> While I'd love to rip it out completely, I think it would break old
> lsmem/chmem completely - and I assume that's not acceptable. I was
> wondering what would be considered safe to do now/in the future:
>
> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> s390x). This will make old lsmem/chmem behave differently after
> switching to a new kernel, like if sclp.rzm would not be set by HW -
> AFAIU, it will assume all memory is in a single memory increment. Do we
> care?

No, at least not until that kernel change would be backported to some
old distribution level where we still use lsmem/chmem from s390-tools.
Given that this is just some clean-up w/o any functional benefit, and
hopefully w/o any negative impact, I think we can safely assume that no
distributor will do that "just for fun".

Even if there were good reasons for a backport, I guess we would then also
have good reasons for backporting / switching to the util-linux version
of lsmem / chmem for such distribution levels. Alternatively, adjust the
s390-tools lsmem / chmem there.

But I would rather "rip it out completely" than just return 0. You'd
need some lsmem / chmem changes anyway, at least in case this would
ever be backported.
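
As a rough idea of what such a tool-side change could look like, here is
a Python sketch (not the actual lsmem / chmem code) that groups memory
block devices by phys_device and tolerates the attribute being absent.
The function names are hypothetical. Two assumptions are baked in: the
attribute is parsed as a decimal integer, and when it is missing each
block is treated as its own increment, which matches the post-2013
situation where a memory block spans at least one increment.

```python
import os
import re
from collections import defaultdict

SYSFS_MEMORY = "/sys/devices/system/memory"


def read_phys_device(block_dir):
    """Return phys_device as an int, or None if the attribute is absent."""
    try:
        with open(os.path.join(block_dir, "phys_device")) as f:
            # Assumption: the value is printed as a decimal integer.
            return int(f.read().strip(), 10)
    except FileNotFoundError:
        return None


def group_by_increment(root=SYSFS_MEMORY):
    """Map an increment key to the list of memoryX names belonging to it.

    The key is the phys_device value when the attribute exists. When it
    does not, the block name itself is the key (assumption: one block per
    increment), so every such block ends up in its own group.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(root)):
        # Skip non-block entries such as block_size_bytes.
        if not re.fullmatch(r"memory\d+", name):
            continue
        dev = read_phys_device(os.path.join(root, name))
        key = name if dev is None else dev
        groups[key].append(name)
    return dict(groups)
```

Calling group_by_increment() without arguments reads the real sysfs
directory; passing a different root makes it easy to try against a
mock-up tree.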

> 2. Restrict it to s390x only. It always returned 0 on other
> architectures, I was not able to find any user.
>
> I think 2 should be safe to do (never used on other archs). I do wonder
> what the feelings are about 1.

Please don't add any s390-specific workarounds here; that does not
really sound like a clean-up, rather the opposite.

That being said, I do not really see the benefit of this change at
all. As Michal mentioned, there really should be some more fundamental
change. And from the rest of this thread, it also seems that phys_device
usage might not be the biggest issue here.