RE: [PATCH v9 0/4] shut down devices asynchronously
From: Michael Kelley
Date: Fri Oct 11 2024 - 00:22:31 EST
From: Stuart Hayes <stuart.w.hayes@xxxxxxxxx> Sent: Wednesday, October 9, 2024 10:58 AM
>
> This adds the ability for the kernel to shutdown devices asynchronously.
>
> Only devices with drivers that enable it are shut down asynchronously.
>
> This can dramatically reduce system shutdown/reboot time on systems that
> have multiple devices that take many seconds to shut down (like certain
> NVMe drives). On one system tested, the shutdown time went from 11 minutes
> without this patch to 55 seconds with the patch.
I've been testing this series against a 6.11.0 kernel in an Azure VM, which
is running as a guest on the Microsoft Hyper-V hypervisor. The VM has 16 vCPUs,
128 Gbytes of memory, and two physical NVMe controllers that are mapped
into the VM so that it can access them directly.
But I wanted to confirm that the two NVMe controllers are being shutdown
in parallel. So before doing a shutdown, I set
/sys/module/kernel/parameters/initcall_debug to "Y" so the shutdown
of each device is recorded in the console output. Here's the full set of
device shutdown messages:
172.609825 platform intel_rapl_msr.0: shutdown
172.611940 mlx5_ib.rdma mlx5_core.rdma.0: shutdown
172.613931 mlx5_core.eth mlx5_core.eth.0: shutdown
172.618116 nvme c2b7:00:00.0: shutdown
172.618262 nvme 132e:00:00.0: shutdown
172.618349 mlx5_core 1610:00:02.0: shutdown
172.618359 mlx5_core 1610:00:02.0: Shutdown was called
172.782768 hv_pci ba152dae-1610-4c67-b925-81ac4902e4ce: shutdown
172.786405 sd 0:0:0:1: shutdown
172.788788 sd 0:0:0:0: shutdown
172.789949 sd 0:0:0:0: [sda] Synchronizing SCSI cache
172.794209 atkbd serio0: shutdown
172.795974 hv_utils 242ff919-07db-4180-9c2e-b86cb68c8c55: shutdown
172.800432 hv_pci 0cdfe983-132e-434b-8025-fc9ab43c0fc5: shutdown
172.802812 hv_pci 2394da4f-c2b7-43bd-b72f-d3482ef6850a: shutdown
172.805145 hv_netvsc 0022487e-1043-0022-487e-10430022487e: shutdown
172.807575 hyperv_fb 5620e0c7-8062-4dce-aeb7-520c7ef76171: shutdown
172.810026 hyperv_keyboard d34b2567-b9b6-42b9-8778-0a4ec0b955bf: shutdown
172.812522 hid_hyperv 58f75a6d-d949-4320-99e1-a2a2576d581c: shutdown
172.814982 hv_balloon 1eccfd72-4b41-45ef-b73a-4a6e44c12924: shutdown
172.817376 vmbus c4e5e7d1-d748-4afc-979d-683167910a55: shutdown
172.819789 hv_storvsc f8b3781b-1e82-4818-a1c3-63d806ec15bb: shutdown
172.822324 hv_storvsc f8b3781a-1e82-4818-a1c3-63d806ec15bb: shutdown
172.824813 hv_utils 2dd1ce17-079e-403c-b352-a1921ee207ee: shutdown
172.827199 hv_utils b6650ff7-33bc-4840-8048-e0676786f393: shutdown
172.829653 hv_utils fd149e91-82e0-4a7d-afa6-2a4166cbd7c0: shutdown
172.836408 platform eisa.0: shutdown
172.838558 alarmtimer alarmtimer.0.auto: shutdown
172.842461 platform Fixed MDIO bus.0: shutdown
172.864709 kgdboc kgdboc: shutdown
172.878009 serial8250 serial8250: shutdown
172.889725 platform pcspkr: shutdown
172.904386 rtc_cmos 00:02: shutdown
172.906217 serial 00:01: shutdown
172.907799 serial 00:00: shutdown
172.910427 platform efivars.0: shutdown
172.913341 platform rtc-efi.0: shutdown
172.915470 vmgenid HYPER_V_GEN_COUNTER_V1:00: shutdown
172.917479 vmbus VMBUS:00: shutdown
172.919012 platform PNP0003:00: shutdown
172.926707 reg-dummy reg-dummy: shutdown
172.961360 ACPI: PM: Preparing to enter system sleep state S5
You see the Mellanox CX-5 NIC, the two NVMe devices, various Hyper-V
virtual devices, and platform devices being shutdown. Everything seems to
work properly, so that's good. The two NVMe devices are shutdown very
close in time, so they are probably being done in parallel.
As a comparison, I did the same thing with an unmodified 6.11.0 kernel.
Indeed, the NVMe device shutdowns are significantly more apart in time
(110 milliseconds). That's not noticeably slow like the NVMe devices you
were dealing with, but doing them in parallel helps a little bit.
But here's the kicker: The overall process of shutting down the devices
took *longer* with the patch set than without. Here's the same output
from a 6.11.0 kernel without the patch set:
745.455493 platform intel_rapl_msr.0: shutdown
745.456999 mlx5_ib.rdma mlx5_core.rdma.0: shutdown
745.458557 mlx5_core.eth mlx5_core.eth.0: shutdown
745.460166 mlx5_core 1610:00:02.0: shutdown
745.461570 mlx5_core 1610:00:02.0: Shutdown was called
745.466053 nvme 132e:00:00.0: shutdown
745.579284 nvme c2b7:00:00.0: shutdown
745.718739 hv_pci ba152dae-1610-4c67-b925-81ac4902e4ce: shutdown
745.721114 sd 0:0:0:1: shutdown
745.722254 sd 0:0:0:0: shutdown
745.723357 sd 0:0:0:0: [sda] Synchronizing SCSI cache
745.725259 atkbd serio0: shutdown
745.726405 hv_utils 242ff919-07db-4180-9c2e-b86cb68c8c55: shutdown
745.728375 hv_pci 0cdfe983-132e-434b-8025-fc9ab43c0fc5: shutdown
745.730347 hv_pci 2394da4f-c2b7-43bd-b72f-d3482ef6850a: shutdown
745.732281 hv_netvsc 0022487e-1043-0022-487e-10430022487e: shutdown
745.734318 hyperv_fb 5620e0c7-8062-4dce-aeb7-520c7ef76171: shutdown
745.736488 hyperv_keyboard d34b2567-b9b6-42b9-8778-0a4ec0b955bf: shutdown
745.738628 hid_hyperv 58f75a6d-d949-4320-99e1-a2a2576d581c: shutdown
745.740770 hv_balloon 1eccfd72-4b41-45ef-b73a-4a6e44c12924: shutdown
745.742835 vmbus c4e5e7d1-d748-4afc-979d-683167910a55: shutdown
745.744765 hv_storvsc f8b3781b-1e82-4818-a1c3-63d806ec15bb: shutdown
745.746861 hv_storvsc f8b3781a-1e82-4818-a1c3-63d806ec15bb: shutdown
745.748907 hv_utils 2dd1ce17-079e-403c-b352-a1921ee207ee: shutdown
745.750948 hv_utils b6650ff7-33bc-4840-8048-e0676786f393: shutdown
745.753012 hv_utils fd149e91-82e0-4a7d-afa6-2a4166cbd7c0: shutdown
745.755000 platform eisa.0: shutdown
745.756266 alarmtimer alarmtimer.0.auto: shutdown
745.757868 platform Fixed MDIO bus.0: shutdown
745.759447 kgdboc kgdboc: shutdown
745.760679 serial8250 serial8250: shutdown
745.762110 platform pcspkr: shutdown
745.763387 rtc_cmos 00:02: shutdown
745.764726 serial 00:01: shutdown
745.765898 serial 00:00: shutdown
745.767036 platform efivars.0: shutdown
745.768783 platform rtc-efi.0: shutdown
745.770240 vmgenid HYPER_V_GEN_COUNTER_V1:00: shutdown
745.771949 vmbus VMBUS:00: shutdown
745.773197 platform PNP0003:00: shutdown
745.774540 reg-dummy reg-dummy: shutdown
745.775964 ACPI: PM: Preparing to enter system sleep state S5
There's some modest variability in the individual steps, but the 110 ms
saved on the NVMe device seems to be given back on some other
devices. I did the comparison twice with similar results. (I have the
full data set with comparisons in an Excel spreadsheet.)
Any thoughts on what might be causing this? I haven't gone into the
details of your algorithms for parallelizing, but is there any extra
overhead that could be adding to the time? Or maybe this is
something unique to Hyper-V guests. The overall difference is only
a few 10's of milliseconds, so not that big of a deal. But maybe it's
an indicator that something unexpected is happening that we should
understand.
I'll keep thinking about the issue and see if I can get any more insight.
Michael Kelley
>
> Changes from V8:
>
> Deal with shutdown hangs resulting when a parent/supplier device is
> later in the devices_kset list than its children/consumers:
> * Ignore sync_state_only devlinks for shutdown dependencies
> * Ignore shutdown_after for devices that don't want async shutdown
> * Add a sanity check to revert to sync shutdown for any device that
> would otherwise wait for a child/consumer shutdown that hasn't
> already been scheduled
>
> Changes from V7:
>
> Do not expose driver async_shutdown_enable in sysfs.
> Wrapped a long line.
>
> Changes from V6:
>
> Removed a sysfs attribute that allowed the async device shutdown to be
> "on" (with driver opt-out), "safe" (driver opt-in), or "off"... what was
> previously "safe" is now the only behavior, so drivers now only need to
> have the option to enable or disable async shutdown.
>
> Changes from V5:
>
> Separated into multiple patches to make review easier.
> Reworked some code to make it more readable
> Made devices wait for consumers to shut down, not just children
> (suggested by David Jeffery)
>
> Changes from V4:
>
> Change code to use cookies for synchronization rather than async domains
> Allow async shutdown to be disabled via sysfs, and allow driver opt-in or
> opt-out of async shutdown (when not disabled), with ability to control
> driver opt-in/opt-out via sysfs
>
> Changes from V3:
>
> Bug fix (used "parent" not "dev->parent" in device_shutdown)
>
> Changes from V2:
>
> Removed recursive functions to schedule children to be shutdown before
> parents, since existing device_shutdown loop will already do this
>
> Changes from V1:
>
> Rewritten using kernel async code (suggested by Lukas Wunner)
>
>
> Stuart Hayes (4):
> driver core: don't always lock parent in shutdown
> driver core: separate function to shutdown one device
> driver core: shut down devices asynchronously
> nvme-pci: Make driver prefer asynchronous shutdown
>
> drivers/base/base.h | 4 +
> drivers/base/core.c | 137 +++++++++++++++++++++++++++-------
> drivers/nvme/host/pci.c | 1 +
> include/linux/device/driver.h | 2 +
> 4 files changed, 118 insertions(+), 26 deletions(-)
>
> --
> 2.39.3
>