Re: [PATCH] /dev/mem: Add a new parameter strict_devmem to bypass strict devmem

From: David Hildenbrand
Date: Fri Nov 22 2024 - 05:58:43 EST


On 22.11.24 03:14, Yafang Shao wrote:
On Thu, Nov 21, 2024 at 11:23 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 21.11.24 16:14, Greg KH wrote:
On Thu, Nov 21, 2024 at 10:31:12PM +0800, Yafang Shao wrote:
On Thu, Nov 21, 2024 at 4:51 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 20.11.24 13:28, Yafang Shao wrote:
When CONFIG_STRICT_DEVMEM is enabled, writing to /dev/mem to override
kernel data for debugging purposes is prohibited. This configuration is
always enabled on our production servers. However, there are times when we
need to use the crash utility to modify kernel data to analyze complex
issues.

As suggested by Ingo, we can add a boot time knob of soft-enabling it.
Therefore, a new parameter "strict_devmem=" is added. The reuslt are as
follows,

- Before this change
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed

- After this change
- default
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed

- strict_devmem=off
crash> p panic_on_oops
panic_on_oops = $1 = 1
crash> wr panic_on_oops 0
crash> p panic_on_oops
panic_on_oops = $2 = 0 <<<< succeeded

- strict_devmem=invalid
[ 0.230052] Invalid option string for strict_devmem: 'invalid'
crash> wr panic_on_oops 0
wr: cannot write to /proc/kcore <<<< failed

Suggested-by: Ingo Molnar <mingo@xxxxxxxxxx>
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
---
.../admin-guide/kernel-parameters.txt | 16 ++++++++++++++
drivers/char/mem.c | 21 +++++++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..7fe0f66d0dfb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6563,6 +6563,22 @@
them frequently to increase the rate of SLB faults
on kernel addresses.

+ strict_devmem=
+ [KNL] Under CONFIG_STRICT_DEVMEM, whether strict devmem
+ is enabled for this boot. Strict devmem checking is used
+ to protect the userspace (root) access to all of memory,
+ including kernel and userspace memory. Accidental access
+ to this is obviously disastrous, but specific access can
+ be used by people debugging the kernel. Note that with
+ PAT support enabled, even in this case there are
+ restrictions on /dev/mem use due to the cache aliasing
+ requirements.
+ on If IO_STRICT_DEVMEM=n, the /dev/mem file only allows
+ userspace access to PCI space and the BIOS code and data
+ regions. This is sufficient for dosemu and X and all
+ common users of /dev/mem. (default)
+ off Disable strict devmem checks.
+
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]

This will allow to violate EXCLUSIVE_SYSTEM_RAM, and I am afraid I don't
enjoy seeing devmem handling+config getting more complicated.

That poses a challenge. Perhaps we should also consider disabling
functions that rely on EXCLUSIVE_SYSTEM_RAM when strict_devmem=off,
but implementing such a change seems overly complex.

Our primary goal is to temporarily bypass STRICT_DEVMEM for live
kernel debugging. In an earlier version, I proposed making the
fucntion devmem_is_allowed() error-injectable, but Ingo pointed out
that it violates the principles of STRICT_DEVMEM.

I think that "primary goal" is the problem here. We don't want to do
that, at all, for all the reasons why we implemented STRICT_DEVMEM and
for why people enable it.

+1


Either you enable it because you want the protection and "security" it
provides, or you do not. Don't try to work around it please.

Do you have any suggestions on enabling write access to /dev/mem in
debugging tools like the crash utility, while maintaining
compatibility with the existing rules?

I think you just don't provide write access to /dev/mem for debugging
tools as it's a huge security hole that people realized and have plugged
up. If you want to provide access to this for "debugging" then just
don't enable that option and live with the risk involved, I don't see
how you can have it both ways.

Exactly. And I think a reasonable approach would be to have a debug
kernel around into which you can boot, and make sure the debug kernel
has such security features turned off.

If you rely on distros, maybe you could convince the distro to ship the
debug kernel with STRICT_DEVMEM off. I just checked RHEL9, and it only
seems to be off in debug kernels on arm64 and s390x (IIUC). Maybe there
is a reason we don't even want that off on debug kernels on x86_64, or
nobody requested it so far, because using the crash utility with write
access on a live system ... is a rather weird ... debugging mechanism in
2024 IMHO.

It seems I might be a bit outdated.
Could you share how you typically modify a live system these days? Are
you using live patching, writing kernel modules, or perhaps some
clever tools or techniques I'm not familiar with?

I think modifying live systems is something people usually don't do anymore. The common debugging workflow is to use kdump and analyze it offline.

I mean, people like me working for distributions analyze *a lot* of issues, and never really rely on /dev/mem or crash on a production system. Well, and apparently not even in debug kernels where some of them have STRICT_DEVMEM enabled.

If you find yourself having to modify a live production system, you are probably something wrong.

If you really want to modify your live system, there is kdb/kgdb. Alternatively, use a debug kernel where you disable security/safety mechanisms.

--
Cheers,

David / dhildenb