Re: [RFC PATCH -next v3 01/10] block: introduce BLK_FEAT_WRITE_ZEROES_UNMAP to queue limits features
From: Zhang Yi
Date: Thu Apr 10 2025 - 05:22:30 EST
On 2025/4/10 15:15, Christoph Hellwig wrote:
> On Thu, Apr 10, 2025 at 11:52:17AM +0800, Zhang Yi wrote:
>>
>> Thank you for your review and comments. However, I'm not sure I fully
>> understand your points. Could you please provide more details?
>>
>> AFAIK, the NVMe protocol has the following description in the latest
>> NVM Command Set Specification Figure 82 and Figure 114:
>>
>> ===
>> Deallocate (DEAC): If this bit is set to ‘1’, then the host is
>> requesting that the controller deallocate the specified logical blocks.
>> If this bit is cleared to ‘0’, then the host is not requesting that
>> the controller deallocate the specified logical blocks...
>>
>> DLFEAT:
>> Write Zeroes Deallocation Support (WZDS): If this bit is set to ‘1’,
>> then the controller supports the Deallocate bit in the Write Zeroes
>> command for this namespace...
>
> Yes. The host is requesting, not the controller shall. It's not
> guaranteed behavior and the controller might as well actually write
> zeroes to the media. That is rather stupid, but still.

IIUC, DEAC is requested by the host, but the WZDS and DRB bits in
DLFEAT are returned by the controller (no?). The host will only issue
a DEAC request when both WZDS and DRB are satisfied. So I think that
if the disk controller returns WZDS=1 and DRB=1, the kernel can only
trust it according to the protocol and set the
BLK_FEAT_WRITE_ZEROES_UNMAP flag. The kernel cannot, and does not
need to, identify disks that deviate from this.
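
A minimal sketch of the check described above, as standalone C rather
than actual driver code. The macro and function names are made up for
illustration. The bit positions assume the DLFEAT layout in the NVM
Command Set Specification (DRB in bits 2:0, WZDS in bit 3), and
"DRB=1" is taken to mean 001b, i.e. deallocated blocks read back as
zeroes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; layout assumed from the Identify Namespace DLFEAT field. */
#define DLFEAT_DRB_MASK   0x07u /* bits 2:0: Deallocation Read Behavior */
#define DLFEAT_DRB_ZEROES 0x01u /* 001b: deallocated blocks read as zeroes */
#define DLFEAT_WZDS       0x08u /* bit 3: DEAC supported in Write Zeroes */

/*
 * Return true when the controller reports both conditions needed to
 * advertise "write zeroes may unmap": it honors the DEAC bit in Write
 * Zeroes, and deallocated blocks read back as zeroes.
 * This only reflects what the controller reports; it says nothing
 * about how fast the device actually is.
 */
static bool dlfeat_allows_write_zeroes_unmap(uint8_t dlfeat)
{
	bool reads_zeroes = (dlfeat & DLFEAT_DRB_MASK) == DLFEAT_DRB_ZEROES;
	bool deac_supported = (dlfeat & DLFEAT_WZDS) != 0;

	return reads_zeroes && deac_supported;
}
```

With this helper, a DLFEAT value of 0x09 (WZDS set, DRB=001b) would
qualify, while 0x08 (WZDS set, read behavior not reported) would not.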
>
> Also note that some write zeroes implementations in consumer devices
> are really slow even when deallocation is requested so that we had
> to blacklist them.

Yes, indeed. For now, the kernel can only rely on what the protocol
specification lets the device report, and there seems to be no better
way to determine the actual behavior of a given disk. Perhaps we
should emphasize in the documentation that the write_zeroes_unmap
flag does not mean the disk supports 'fast' write zeroes.

Thanks.
Yi.