Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

From: Rick Lindsley
Date: Wed Jun 24 2020 - 05:06:29 EST


Thanks, Tejun, appreciate the feedback.

On 6/23/20 4:13 PM, Tejun Heo wrote:

> The problem is fitting that into an interface which wholly doesn't fit that
> particular requirement. It's not that difficult to imagine different ways to
> represent however many memory slots, right?

Perhaps we have different views of how this is showing up. systemd is the primary instigator of the boot issues.

Systemd runs

/usr/lib/systemd/system/systemd-udev-trigger.service

which does a udev trigger, specifically

/usr/bin/udevadm trigger --type=devices --action=add

as part of its post-initramfs coldplug. It then waits for that to finish, under the watch of a timer.

So the kernel itself is reporting these devices to systemd, and it gets that information from the hardware. That means any obfuscation must start either in the kernel itself (the kernel lies to systemd) or in systemd when it handles the devices it received from the kernel. If the kernel lies, the actual granularity is not available to any user utilities.

Unless you're suggesting that a new interface be created to let utilities determine the "real" memory addresses available for manipulation? Even then, the changes you describe could not be confined to one place; they would have to reach an unknown number of auxiliary utilities.

Having one subsystem lie to another seems like the start of a bad idea, anyway. When the hardware management console, separate from Linux, reports a memory error, or adds or deletes memory in a guest system, it's not going to be manipulating spoofed addresses that are only a Linux construct.

In contrast, the provided patch fixes the observed problem with no ripple effect to other subsystems or utilities.

Greg had suggested:
> Treat the system as a whole please, don't go for a short-term
> fix that we all know is not solving the real problem here.

Your solution affects multiple subsystems; this one affects one. Which is the whole-system approach in terms of risk? You mentioned that you support 30k SCSI disks, but only because they are slow enough that the inefficiencies of kernfs don't show. Doesn't that bother you?

Rick