Re: [PATCH v6 0/5] scsi: ufs: Add Host Performance Booster Support

From: Bart Van Assche
Date: Wed Jul 22 2020 - 10:34:24 EST


On 2020-07-22 06:27, Martin K. Petersen wrote:
> Christoph Hellwig wrote:
>> As this monster seems to come back again and again, let me reiterate
>> my statement:
>>
>> I do not think Linux should support a broken standards extension that
>> creates huge shared state between the Linux initiator and the target
>> device like this, with all its associated problems.
>
> I spent a couple of hours looking at this series again last night. And
> while the code has improved, I do remain concerned about the general
> concept.
>
> I understand how caching the FTL in host memory can improve performance
> from a theoretical perspective. However, I am not sure how much of a
> difference this is going to make outside of synthetic benchmarks. What
> are the workloads that keep reading the same blocks from media? Or does
> the performance improvement exclusively come from the second-order
> pre-fetching effect for larger I/Os? If so, why is the device's internal
> L2P SRAM cache ineffective at handling that case?

Hi Martin,

These are great questions. The size of the L2P table is proportional to
the device capacity, and device capacities keep increasing. My
understanding is that on-device SRAM is much more expensive than (host)
DRAM. Caching the L2P table in host memory makes it possible to keep the
(UFS) device cost low. The Samsung HPB paper explains this as follows:
"Mobile storage devices typically have RAM with constrained size, thus
lack in memory to keep the whole mapping table."
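
To put a rough number on that (my own back-of-the-envelope figures, not
from the HPB spec): with 4 KiB mapping granularity and 4-byte L2P
entries, the mapping table is about 0.1% of the device capacity:

/*
 * Rough L2P table size estimate. The 4 KiB granularity and 4-byte
 * entry size are assumptions; actual devices may differ.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long capacity = 512ULL << 30;  /* 512 GiB device */
	unsigned long long granularity = 4096;       /* bytes mapped per entry */
	unsigned long long entry_size = 4;           /* bytes per L2P entry */

	unsigned long long l2p_size = capacity / granularity * entry_size;

	/* Prints: L2P table: 512 MiB for a 512 GiB device */
	printf("L2P table: %llu MiB for a %llu GiB device\n",
	       l2p_size >> 20, capacity >> 30);
	return 0;
}

Half a gigabyte of on-device SRAM is clearly not realistic for a mobile
storage device, while half a gigabyte of host DRAM is.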

This is not an entirely new approach. The L2P table of the Fusion-io
PCIe SSD adapters that were introduced more than ten years ago was
entirely kept in host DRAM. The manual of that device documented how
much memory the Fusion-io driver needed for the L2P table.

This issue is not unique to UFS devices. My understanding is that DRAM
cost is a significant part of the cost of enterprise and consumer SSD
devices, so SSD manufacturers are also interested in solutions that
reduce the amount of DRAM inside SSDs. One possible solution, paging the
L2P table, has a significant disadvantage: it doubles the number of
media accesses for random I/O with small transfer sizes.
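
To make the 2x concrete, here is a minimal model (illustrative numbers
and parameters, nothing from this patch series) of the worst case, where
every random read misses the mapping-table cache:

#include <stdio.h>

/*
 * Illustrative model of media accesses with a paged L2P table. Each
 * random small read that misses the map-page cache first fetches the
 * map page from media, then the data. The hit rate is a made-up
 * parameter; for small random I/O spread over a large address range
 * it tends toward zero.
 */
int main(void)
{
	unsigned int reads = 1000;     /* random 4 KiB reads */
	unsigned int hit_pct = 0;      /* map-page cache hit rate, worst case */

	unsigned int map_accesses = reads * (100 - hit_pct) / 100;
	unsigned int total = map_accesses + reads;

	/* Prints: 2000 media accesses for 1000 reads (2.0x) */
	printf("%u media accesses for %u reads (%.1fx)\n",
	       total, reads, (double)total / reads);
	return 0;
}

With the L2P table fully cached, whether in device RAM or, as with HPB,
in host DRAM, the map lookup never touches the media, so each random
read costs a single media access.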

The performance benefit of HPB comes from significantly reducing the
number of media accesses for random I/O.

I am not claiming that HPB is a perfect solution. But I wouldn't be
surprised if enterprise SSD vendors started looking into a similar
solution sooner or later.

Bart.