Re: [PATCH 0/2] nvme: Add kernel API for admin command

From: Christoph Hellwig
Date: Wed Sep 18 2019 - 09:26:16 EST


On Tue, Sep 17, 2019 at 10:39:09AM -0600, Keith Busch wrote:
> On Mon, Sep 16, 2019 at 12:13:24PM +0000, Baldyga, Robert wrote:
> > Ok, fair enough. We want to keep things hidden behind certain layers,
> > and that's definitely a good thing. But there is a problem with these
> > layers - they do not expose all the features. For example AFAIK there
> > is no clear way to use 512+8 format via block layer (nor even a way
> > to find out which lbaf a bdev is formatted with). It's impossible
> > to use it without layering violations, which nobody sees as a perfect
> > solution, but it is the only one that works.
>
> I think your quickest path to supporting such a format is to consider
> each part separately, then bounce and interleave/unmix the data +
> metadata at another layer that understands how the data needs to be laid
> out in host memory. Something like this RFC here:
>
> http://lists.infradead.org/pipermail/linux-nvme/2018-February/015844.html
>
> It appears connecting to infradead lists is a bit flaky at the moment,
> so not sure if you'll be able to read the above link right now.
>
> Anyway, the RFC would have needed a bit of work to be considered, like
> using a mempool for the purpose, but it does generically make such
> formats usable through the block stack so maybe try fleshing out that
> idea.

Even if we had a use case for that, the bounce buffer is just too ugly
to live. And I'm really sick and tired of Intel wasting our time on
their out-of-tree monster given that they haven't even tried helping
to improve the in-kernel write caching layers.
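
For reference, the interleave/unmix step discussed in the quoted text
amounts to roughly the following. This is only a userspace sketch,
assuming the 512+8 format means 512 bytes of data followed by 8 bytes of
metadata per block in one contiguous buffer. The names DATA_SZ, META_SZ,
interleave() and unmix() are hypothetical, and this is not the code from
the linked RFC.

```c
#include <stddef.h>
#include <string.h>

#define DATA_SZ 512 /* assumed data bytes per logical block */
#define META_SZ 8   /* assumed metadata bytes per logical block */

/*
 * Copy separate data and metadata buffers into one bounce buffer laid
 * out as data0, meta0, data1, meta1, ... (hypothetical helper).
 * The bounce buffer must hold nblocks * (DATA_SZ + META_SZ) bytes.
 */
static void interleave(unsigned char *bounce, const unsigned char *data,
                       const unsigned char *meta, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(bounce, data + i * DATA_SZ, DATA_SZ);
        memcpy(bounce + DATA_SZ, meta + i * META_SZ, META_SZ);
        bounce += DATA_SZ + META_SZ;
    }
}

/*
 * Reverse of interleave(): split the bounce buffer back into separate
 * data and metadata buffers (hypothetical helper).
 */
static void unmix(unsigned char *data, unsigned char *meta,
                  const unsigned char *bounce, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(data + i * DATA_SZ, bounce, DATA_SZ);
        memcpy(meta + i * META_SZ, bounce + DATA_SZ, META_SZ);
        bounce += DATA_SZ + META_SZ;
    }
}
```

Every block costs two extra copies in each direction plus a bounce
allocation of nblocks * 520 bytes, which is the overhead being objected
to here.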