Re: [PATCH v3 0/3] nvme: Add sysfs interface for APST configuration management

From: Yaxiong Tian
Date: Thu Apr 03 2025 - 03:06:36 EST




On 2025/4/3 12:25, Christoph Hellwig wrote:
> On Tue, Apr 01, 2025 at 05:22:06PM +0800, Yaxiong Tian wrote:
>> From: Yaxiong Tian <tianyaxiong@xxxxxxxxxx>
>>
>> This series enhances NVMe APST (Autonomous Power State Transition) support by:
>> 1. Adding warnings for PST table allocation failures
>
> That looks fine.
>
>> 2. Exposing APST tables via sysfs for runtime inspection
>> 3. Providing per-controller sysfs interface for APST configuration
>
> Who is going to use this and how? We'll need proper tools for that,
> and in general I'd prefer to have a common set of policies for APST
> configurations and not random people coming up with lots of random
> policies.

These two patches don't fundamentally change the APST configuration policy; they let users adjust the APST parameters at runtime, separately for each device. As mentioned in commit ebd8a93aa4f5 ("nvme: extend and modify the APST configuration algorithm"):

1) This patch only introduces part of the functionality of the Windows driver: it doesn't enable dynamic regeneration of the APST table when switching between AC power and battery power.

2) Additionally, using the default configuration on SSDs from certain vendors has been observed to increase power consumption.

Therefore, when a system contains multiple drives and users need different configurations on AC and on battery power, APST has to be reconfigurable at runtime on a per-device basis.

Admittedly, tuning these parameters is difficult for average users, but they can simply keep the existing defaults. Advanced users who want to optimize their devices, on the other hand, get the interface they need.

As for providing suitable tools for users, I suppose every advanced user has their own preferences and approaches. For example, one could trace device idle time statistics through trace events (e.g., nvme_complete_rq → nvme_setup_cmd) and use that data as a reference to configure apst_primary_timeout_ms.
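As a rough sketch of that idea (not part of the patches), the Python below derives idle gaps from an already-parsed event stream. The (timestamp_ns, event_name) tuple format and the sample events are made up for illustration, and a single controller is assumed; a real trace would have to be split per device first.

```python
from collections import Counter

def idle_gaps(events):
    """Return idle intervals (ns) between the completion that empties
    the device and the next command submission."""
    inflight = 0          # commands submitted but not yet completed
    idle_since = None     # timestamp at which the device became idle
    gaps = []
    for ts, name in events:
        if name == "nvme_setup_cmd":
            if inflight == 0 and idle_since is not None:
                gaps.append(ts - idle_since)
            inflight += 1
            idle_since = None
        elif name == "nvme_complete_rq":
            inflight = max(inflight - 1, 0)
            if inflight == 0:
                idle_since = ts
    return gaps

def log2_bucket(ns):
    """Lower bound of the power-of-two bucket that contains ns."""
    return 1 << (ns.bit_length() - 1) if ns > 0 else 0

# Hypothetical sample events, timestamps in nanoseconds.
events = [
    (1_000, "nvme_setup_cmd"),
    (5_000, "nvme_complete_rq"),
    (6_500, "nvme_setup_cmd"),
    (7_000, "nvme_setup_cmd"),
    (9_000, "nvme_complete_rq"),
    (9_500, "nvme_complete_rq"),
    (40_000_000, "nvme_setup_cmd"),
]

gaps = idle_gaps(events)
hist = Counter(log2_bucket(g) for g in gaps)
print(gaps)
print(dict(hist))
```

Only the gaps during which no command is in flight are counted, since those are the periods in which the controller could enter a non-operational power state.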

For example, the distribution of device idle times under one specific
workload is shown below (unit: nanoseconds). Setting
apst_primary_timeout_ms to 32 ms ensures that more than 99% of NVMe
commands are unaffected by APST.

@intervals:
[512, 1K)      749 |@@@@@@@@                                            |
[1K, 2K)      4856 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)      1002 |@@@@@@@@@@                                          |
[4K, 8K)       756 |@@@@@@@@                                            |
[8K, 16K)      413 |@@@@                                                |
[16K, 32K)      36 |                                                    |
[32K, 64K)      14 |                                                    |
[64K, 128K)      9 |                                                    |
[128K, 256K)    13 |                                                    |
[256K, 512K)     8 |                                                    |
[512K, 1M)       4 |                                                    |
[1M, 2M)         4 |                                                    |
[2M, 4M)         3 |                                                    |
[4M, 8M)         4 |                                                    |
[8M, 16M)        3 |                                                    |
[16M, 32M)       0 |                                                    |
[32M, 64M)       6 |                                                    |
[64M, 128M)      5 |                                                    |
[128M, 256M)     5 |                                                    |
[256M, 512M)    15 |                                                    |
[512M, 1G)       7 |                                                    |
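
The 99% figure can be checked against the histogram above with a short Python sketch. It assumes the bucket bounds are powers of two (K = 1024), and it counts only buckets that lie entirely below the timeout, so the result is a conservative lower bound. The helper name coverage() is illustrative only.

```python
K, M, G = 1 << 10, 1 << 20, 1 << 30

# (bucket upper bound in ns, count), copied from the histogram above.
buckets = [
    (1 * K, 749), (2 * K, 4856), (4 * K, 1002), (8 * K, 756),
    (16 * K, 413), (32 * K, 36), (64 * K, 14), (128 * K, 9),
    (256 * K, 13), (512 * K, 8), (1 * M, 4), (2 * M, 4),
    (4 * M, 3), (8 * M, 4), (16 * M, 3), (32 * M, 0),
    (64 * M, 6), (128 * M, 5), (256 * M, 5), (512 * M, 15),
    (1 * G, 7),
]

total = sum(count for _, count in buckets)

def coverage(timeout_ns):
    """Fraction of idle intervals in buckets that end at or before timeout_ns."""
    below = sum(count for upper, count in buckets if upper <= timeout_ns)
    return below / total

timeout_ns = 32 * 1_000_000  # 32 ms expressed in nanoseconds
cov = coverage(timeout_ns)
print(f"{total} intervals, {cov:.2%} shorter than 32 ms")
# prints "7912 intervals, 99.52% shorter than 32 ms"
```

Idle gaps shorter than the timeout never let the controller leave its operational state, so the commands that follow them see no APST exit latency.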