RE: [RFC PATCH 0/5] scsi: ufs: Add Host Performance Booster Support
From: Avri Altman
Date: Sat Jun 06 2020 - 08:02:46 EST
Hi,
>
> NAND flash memory-based storage devices use Flash Translation Layer (FTL)
> to translate logical addresses of I/O requests to corresponding flash
> memory addresses. Mobile storage devices typically have RAM with
> constrained size, thus lack in memory to keep the whole mapping table.
> Therefore, mapping tables are partially retrieved from NAND flash on
> demand, causing random-read performance degradation.
>
> To improve random read performance, we propose HPB (Host Performance
we propose --> JEDEC spec XXX proposes ...
This is also the place to disclose which version of the spec you are supporting.
> Booster) which uses host system memory as a cache for the FTL mapping
> table. By using HPB, FTL data can be read from host memory faster than from
> NAND flash memory.
>
> The current version only supports the DCM (device control mode).
> This patch consists of 4 parts to support HPB feature.
>
> 1) UFS-feature layer
> 2) HPB probe and initialization process
> 3) READ -> HPB READ using cached map information
> 4) L2P (logical to physical) map management
>
> The UFS-feature is an additional layer to avoid the structure in which the
> UFS-core driver and the UFS-feature are entangled with each other in a
> single module.
> By adding the layer, UFS-features composed of various combinations can be
> supported. Also, even if a new feature is added, modification of the
> UFS-core driver can be minimized.
Like Bart, I am not sure that this extra module is needed.
It only makes sense if there are indeed some common calls that can be shared by several features.
So far, 10 extended features are defined, but none of them can share a common API.
What other features can share this additional layer? And how can those ops be reused?
If you have some future implementations in mind, you should add this API when you add those.
>
> In the HPB probe and init process, the device information of the UFS is
> queried. After checking supported features, the data structure for the HPB
> is initialized according to the device information.
>
> A read I/O in the active sub-region where the map is cached is changed to
> HPB READ by the HPB module.
>
> The HPB module manages the L2P map using information received from the
> device. For active sub-region, the HPB module caches through ufshpb_map
> request. For the in-active region, the HPB module discards the L2P map.
> When a write I/O occurs in an active sub-region area, associated dirty
> bitmap checked as dirty for preventing stale read.
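To make sure I read the paragraph above correctly, here is a minimal toy sketch of the bookkeeping it describes: a write into an active sub-region marks the touched entries dirty, and a read may become HPB READ only if the sub-region is active and none of the entries it covers are dirty. All names here (toy_srgn, toy_on_write, etc.) and the 64-entry bitmap are hypothetical and for illustration only; this is not the code in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: one sub-region with up to 64 L2P entries. */
struct toy_srgn {
	bool active;     /* L2P map for this sub-region is cached in host memory */
	uint64_t dirty;  /* one bit per entry; set means the cached entry is stale */
};

/* Build a bitmask covering entries [first, first + count). */
static uint64_t toy_mask(unsigned int first, unsigned int count)
{
	uint64_t m;

	assert(count > 0 && first + count <= 64);
	m = (count == 64) ? ~0ULL : ((1ULL << count) - 1);
	return m << first;
}

/* Map loaded from the device: the cache is fresh, nothing is dirty. */
static void toy_on_map_load(struct toy_srgn *s)
{
	s->active = true;
	s->dirty = 0;
}

/* Region inactivated: the cached map is discarded. */
static void toy_on_inactivate(struct toy_srgn *s)
{
	s->active = false;
	s->dirty = 0;
}

/* A write into an active sub-region invalidates the touched entries. */
static void toy_on_write(struct toy_srgn *s, unsigned int first,
			 unsigned int count)
{
	if (s->active)
		s->dirty |= toy_mask(first, count);
}

/* READ may be converted to HPB READ only if no covered entry is dirty. */
static bool toy_can_use_hpb_read(const struct toy_srgn *s, unsigned int first,
				 unsigned int count)
{
	return s->active && !(s->dirty & toy_mask(first, count));
}
```

In this model a dirty entry falls back to a normal READ until the map is loaded again, which is how I understand "preventing stale read".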
>
> HPB is shown to have a performance improvement of 58 - 67% for random read
> workload. [1]
>
> This series patches are based on the "5.8/scsi-queue" branch.
>
> [1]:
> https://www.usenix.org/conference/hotstorage17/program/presentation/jeong
This 2017 study is cited by everyone, but it does not really describe its test setup in detail.
It does say, however, that they used 16MB subregions over a range of 1GB,
which can be covered by 64 active regions, even with a single subregion per region.
Meaning no eviction should take place, thus HPB overhead is minimized.
Are there any more recent public studies that support those impressive figures?
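For reference, the arithmetic behind the 64 figure can be sketched as below. The helper names are hypothetical, and the sketch assumes the worst case mentioned above of a single subregion per region, so each subregion of the test range occupies one active region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: subregions needed to cover a test range (rounded up). */
static uint64_t hpb_subregions_needed(uint64_t range_bytes, uint64_t srgn_bytes)
{
	assert(srgn_bytes != 0);
	return (range_bytes + srgn_bytes - 1) / srgn_bytes;
}

/*
 * Assumption: one subregion per region, so the number of active regions
 * required equals the number of subregions. If that fits within the
 * active-region budget, the whole range stays cached and nothing is evicted.
 */
static bool hpb_fits_without_eviction(uint64_t range_bytes, uint64_t srgn_bytes,
				      uint64_t max_active_rgns)
{
	return hpb_subregions_needed(range_bytes, srgn_bytes) <= max_active_rgns;
}
```

With a 1GB range (1ULL << 30) and 16MB subregions (16ULL << 20), hpb_subregions_needed() gives 64, so a budget of 64 active regions holds the whole working set.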
Thanks,
Avri