Re: [PATCH v34 1/4] scsi: ufs: Introduce HPB feature

From: Stanley Chu
Date: Mon May 17 2021 - 05:40:21 EST


Hi Daejun,

Sorry, I lost the cover letter, so I am replying to this mail instead.

For this series,

Reviewed-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
Tested-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>


On Thu, 2021-04-29 at 08:23 +0900, Daejun Park wrote:
> This patch implements HPB initialization and adds HPB function calls to
> the UFS core driver.
>
> NAND flash-based storage devices, including UFS, have mechanisms to
> translate the logical addresses of IO requests to the corresponding
> physical addresses of the flash storage.
> In UFS, the Logical-address-to-Physical-address (L2P) map data, which is
> required to identify the physical address for the requested IOs, can only
> be partially stored in SRAM from NAND flash. Due to this partial loading,
> accessing an address whose L2P entry is not currently loaded in SRAM can
> result in serious performance degradation.
>
> The basic concept of HPB is to cache L2P mapping entries in host system
> memory so that both the physical block address (PBA) and the logical
> block address (LBA) can be delivered in the HPB READ command.
> The HPB READ command allows data to be read faster than a regular READ
> command in UFS, since it provides the physical address (HPB entry) of the
> desired logical block in addition to its logical address. The UFS device
> can then access the physical block in NAND directly, without searching
> for and loading the L2P mapping table. This improves read performance
> because the NAND read operation for loading the L2P mapping table is
> eliminated.
>
> In HPB initialization, the host checks whether the UFS device supports
> the HPB feature and retrieves the related device capabilities. Then,
> some HPB parameters are configured in the device.
>
> We measured the total start-up time of popular applications and observed
> the difference made by enabling HPB.
> The application set consists of 12 game apps and 24 non-game apps. The
> target applications were launched in order; one cycle consists of running
> all 36 applications in sequence. We repeated the cycle to observe the
> performance improvement from L2P mapping cache hits in HPB.
>
> The following is the experiment environment:
> - kernel version: 4.4.0
> - RAM: 8GB
> - UFS 2.1 (64GB)
>
> Result:
> +-------+----------+----------+-------+
> | cycle | baseline | with HPB | diff  |
> +-------+----------+----------+-------+
> |   1   |   272.4  |   264.9  |  -7.5 |
> |   2   |   250.4  |   248.2  |  -2.2 |
> |   3   |   226.2  |   215.6  | -10.6 |
> |   4   |   230.6  |   214.8  | -15.8 |
> |   5   |   232.0  |   218.1  | -13.9 |
> |   6   |   231.9  |   212.6  | -19.3 |
> +-------+----------+----------+-------+
>
> We also measured HPB performance using iozone.
> Here is my iozone script:
> iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16
> -s $IO_RANGE/16 -F mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5
> mnt/tmp_6 mnt/tmp_7 mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12
> mnt/tmp_13 mnt/tmp_14 mnt/tmp_15 mnt/tmp_16
>
> Result:
> +----------+--------+---------+
> | IO range | HPB on | HPB off |
> +----------+--------+---------+
> |   1 GB   | 294.8  | 300.87  |
> |   4 GB   | 293.51 | 179.35  |
> |   8 GB   | 294.85 | 162.52  |
> |  16 GB   | 293.45 | 156.26  |
> |  32 GB   | 277.4  | 153.25  |
> +----------+--------+---------+